Page MenuHome GnuPG

libgcrypt tests/basic.c and tests/keygen.c occasionally fail with "error generating RSA key: Number is not prime"
Closed, ResolvedPublic

Description

In the CI runs of the libgcrypt mirror on GitLab (https://gitlab.com/redhat-crypto/libgcrypt/libgcrypt-mirror/-/pipelines?page=1&scope=all&ref=master), tests/basic.c and tests/keygen.c have both failed once in forced FIPS mode with the error message

error generating RSA key: Number is not prime

The specific runs where this happened are:

It would be great if we could improve the stability of the tests so that this does not happen.

Event Timeline

I don't know the exact procedure by FIPS, but just setting the least significant bit in the generation (after _gcry_mpi_randomize) can reduce the probability by half.

I think that it is OK to loop forever until we find a prime.

diff --git a/cipher/rsa.c b/cipher/rsa.c
index 3f1cd722..3ff7fec4 100644
--- a/cipher/rsa.c
+++ b/cipher/rsa.c
@@ -386,7 +386,6 @@ generate_fips (RSA_secret_key *sk, unsigned int nbits, unsigned long use_e,
   gcry_mpi_t diff, mindiff;
   gcry_random_level_t random_level;
   unsigned int pbits = nbits/2;
-  unsigned int i;
   int pqswitch;
   gpg_err_code_t ec;
 
@@ -476,18 +475,18 @@ generate_fips (RSA_secret_key *sk, unsigned int nbits, unsigned long use_e,
 
  retry:
   /* generate p and q */
-  for (i = 0; i < 5 * pbits; i++)
+  for (;;)
     {
-    ploop:
       if (!testparms)
         {
           _gcry_mpi_randomize (p, pbits, random_level);
+          mpi_set_bit (p, 0);
         }
       if (mpi_cmp (p, minp) < 0)
         {
           if (testparms)
             goto err;
-          goto ploop;
+          continue;
         }
 
       mpi_sub_ui (p1, p, 1);
@@ -505,21 +504,19 @@ generate_fips (RSA_secret_key *sk, unsigned int nbits, unsigned long use_e,
       else if (testparms)
         goto err;
     }
-  if (i >= 5 * pbits)
-    goto err;
 
-  for (i = 0; i < 5 * pbits; i++)
+  for (;;)
     {
-    qloop:
       if (!testparms)
         {
           _gcry_mpi_randomize (q, pbits, random_level);
+          mpi_set_bit (q, 0);
         }
       if (mpi_cmp (q, minp) < 0)
         {
           if (testparms)
             goto err;
-          goto qloop;
+          continue;
         }
       if (mpi_cmp (p, q) > 0)
         {
@@ -535,7 +532,7 @@ generate_fips (RSA_secret_key *sk, unsigned int nbits, unsigned long use_e,
         {
           if (testparms)
             goto err;
-          goto qloop;
+          continue;
         }
 
       mpi_sub_ui (q1, q, 1);
@@ -553,8 +550,6 @@ generate_fips (RSA_secret_key *sk, unsigned int nbits, unsigned long use_e,
       else if (testparms)
         goto err;
     }
-  if (i >= 5 * pbits)
-    goto err;
 
   if (testparms)
     {
werner triaged this task as Normal priority.Apr 7 2022, 10:04 AM
werner added a subscriber: werner.

The set_bit is obvious but we should cross check with the specs. In the non-fips mode we also try w/o a limit.

I checked FIPS 186-4 (and FIPS 186-5-draft). It is Appendix A 1.3.

In the generation, we should call mpi_set_bit.

The constants for loops are different:
(1) In FIPS 186-4, it's 5 for p and it's also 5 for q.
(2) In FIPS 186-5-draft, it's 10 for p and it's 20 for q.

So, probably, this kind of failure is known and addressed in FIPS 186-5-draft.

That sounds reasonable. The FIPS 186-5 draft (https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-5-draft.pdf) covers this in section A.1.3, although I'm not quite sure why a lower bound for p was chosen compared to q. The comment that seems to have triggered this change is published on page 68 of https://csrc.nist.gov/CSRC/media/Publications/fips/186/4/final/documents/comments-received-fips186-4-december-2015.pdf by Allen Roginsky. It only contains a suggestion of 20, presumably for both numbers.

For future reference, the same algorithm in FIPS 186-4 (https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-4.pdf) can be found in section B.3.3.

Seems like following the 186-5 draft would be a reasonable course of action. Comments on 186-5 (see comment no. 38 on page 80 https://csrc.nist.gov/CSRC/media/Publications/fips/186/5/draft/documents/fips-186-5-draft-comments-received.pdf) suggest this has originally been copied from ANSI X9.80-2020, so it's not out of the question that this will change again in the final version of 186-5.

Feedback from the lab is that they'd recommend returning a specific error code that indicates that the prime search failed and then relying on the caller to decide whether to loop or bubble up the error. I'm not sure who we would consider to be the "caller" of the relevant generation function in this case, though.

I'd say let's consider this solved by switching to 10/20. According to the comments I linked above, the probability for generation of a p or q to fail for a specific k = nlen/2 is (1 - (2 / k * ln 2))^(a * k) with a = {5, 10, 20}.

For generation of an RSA key to fail, either the generation of p or the generation of q must fail, hence we know that the probability that the generation of a key fails is p(failure) = 1 - p(success) = 1 - (p(p success) * p(q success)). If I run the numbers, key size does not contribute a lot, and the 5/5 approach gives us a probability of failure between 1.0417e-6 and 1.0796e-6, i.e. just around 1.08 ppm. The 10/20 approach on the other hand is mostly dominated by the success probability of the generation of p and comes in between 2.7131e-13 and 2.9140e-13, i.e., around 0.0000003 ppm.

In other words, with the 5/5 approach, on average every 500,000th key generation fails, while the 10/20 approach fails on average every 1.7e12 keys generated. If we assume generating one key takes a second, that's ~54000 years until a failure.

werner changed the task status from Open to Testing.Sep 22 2022, 11:02 AM
werner removed a project: Restricted Project.