
New API for modern password hash function
Closed, Resolved (Public)

Description

We now have gcry_kdf_derive. For modern password hashing, it's better to have a new API. To avoid a dependency on thread support, it's good to decouple threading from the KDF computation.

  • gcry_kdf_open
  • gcry_kdf_ctl
  • gcry_kdf_compute
  • gcry_kdf_final
  • gcry_kdf_close

Possibly, user code can be something like the example code below.

Argon2 and Balloon will be implemented; possibly scrypt, too.
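
For orientation, the five entry points could have prototypes along these lines. This is only a sketch inferred from the example code below; the exact types (e.g. gcry_kdf_handle_t, the params representation) are assumptions, not a committed interface.

/* Hypothetical prototypes, inferred from the example below; all types
 * are assumptions for illustration, not a committed interface.  */
gcry_error_t gcry_kdf_open (gcry_kdf_handle_t **hd, int algo, int subalgo,
                            const unsigned long *params, unsigned int paramslen,
                            const void *pass, size_t passlen,
                            const void *salt, size_t saltlen);
gcry_error_t gcry_kdf_ctl (gcry_kdf_handle_t *hd, int cmd,
                           void *buffer, size_t buflen);
/* Returns <0 on error, 0 when done, 1 to request creation of a thread,
 * 2 to request a join; ARG carries the details of the request.  */
int gcry_kdf_compute (gcry_kdf_handle_t *hd, void **arg);
gcry_error_t gcry_kdf_final (gcry_kdf_handle_t *hd, size_t outlen, void *out);
void gcry_kdf_close (gcry_kdf_handle_t *hd);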

Event Timeline

gniibe triaged this task as Normal priority. (Edited Jan 24 2022, 10:52 AM)
gniibe created this task.
gniibe updated the task description.

#include <stdint.h>
#include <string.h>
#include <pthread.h>
#include <gcrypt.h>   /* assumed to provide the new gcry_kdf_* API sketched above */

struct thread_creation {
  void (*compute) (void *arg);
  void *arg;
  void (*create_done) (gcry_kdf_handle_t *hd, void *tid);
};

struct thread_termination {
  void *thread_id;
};

static void *
start_thread (void *a)
{
  struct thread_creation *p = a;
  p->compute (p->arg);
  pthread_exit (NULL);
}


#define MAX_THREAD 8

gcry_error_t
kdf_derive (int parallel, int algo, int subalgo,
            const unsigned long *params, unsigned int paramslen,
            const void *pass, size_t passlen,
            const void *salt, size_t saltlen,
            size_t outlen, void *out)
{
  gcry_error_t err;
  gcry_kdf_handle_t *hd;
  pthread_attr_t attr;
  pthread_t thr[MAX_THREAD];
  int i;

  err = gcry_kdf_open (&hd, algo, subalgo, params, paramslen,
                       pass, passlen, salt, saltlen);
  if (err)
    return err;

  if (parallel)
    {
      int max_thread = MAX_THREAD;

      memset (thr, 0, sizeof (thr));

      if (pthread_attr_init (&attr))
	{
	  gcry_kdf_close (hd);
	  return gpg_error_from_syserror ();
	}

      if (pthread_attr_setdetachstate (&attr, PTHREAD_CREATE_JOINABLE))
	{
	  pthread_attr_destroy (&attr);
	  gcry_kdf_close (hd);
	  return gpg_error_from_syserror ();
	}

      err = gcry_kdf_ctl (hd, GCRYCTL_SET_MAX_THREAD,
			  &max_thread, sizeof (max_thread));
      if (err)
	{
	  pthread_attr_destroy (&attr);
	  gcry_kdf_close (hd);
	  return err;
	}
    }

  i = 0;
  while (1)
    {
      int dispatch;
      void *arg;

      dispatch = gcry_kdf_compute (hd, &arg);
      if (dispatch < 0)
        {
          /* ERROR. */
          err = (gcry_error_t)(uintptr_t)arg;
          break;
        }
      else if (dispatch == 0)
        /* DONE */
        break;
      else if (dispatch == 1)
        {                       /* request to create a thread */
	  struct thread_creation *p = arg;

          if (parallel)
	    {
	      pthread_create (&thr[i], &attr, start_thread, p);
	      p->create_done (hd, &thr[i++]);
	    }
          else
	    {
	      p->compute (p->arg);
	      p->create_done (hd, NULL);
	    }
        }
      else if (dispatch == 2)
        {                       /* request to join a thread */
          if (parallel)
	    {
	      struct thread_termination *p = arg;
	      pthread_t *thr_p = p->thread_id;

	      pthread_join (*thr_p, NULL);
	      memset (thr_p, 0, sizeof (pthread_t));
	      --i;
	    }
        }
    }

  if (parallel)
    pthread_attr_destroy (&attr);

  gcry_kdf_final (hd, outlen, out);
  gcry_kdf_close (hd);
  return err;
}
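
A call site could then look like this; the algorithm constants and the Argon2 parameter convention (taglen, passes, memory in KiB, lanes) are assumptions for illustration, not the settled interface.

/* Hypothetical call site; GCRY_KDF_ARGON2 / GCRY_KDF_ARGON2ID and the
 * parameter order (taglen, passes, memory in KiB, lanes) are assumed
 * for illustration.  */
unsigned long params[4] = { 64, 3, 1UL << 16, 4 };
unsigned char out[64];
gcry_error_t err;

err = kdf_derive (1, GCRY_KDF_ARGON2, GCRY_KDF_ARGON2ID, params, 4,
                  "password", 8, "somesalt", 8, sizeof (out), out);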

jukivili added a comment.

I planned to reply to your email on the mailing list, but I just have too little time.

I thought about how to handle parallel jobs from kdf/libgcrypt and how to make the interface as simple as possible. New KDF algorithms with parallel processing basically have the following construction:

  • 1. initialization
  • 2.1. for (X = 0; X < work_count; X++) launch parallel work X
  • 2.2. synchronization point, wait all parallel work to complete
  • 2.3. if more work, goto 2.1.
  • 3. finalization
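
As a concrete illustration (my reading of Argon2's structure, not code from libgcrypt): each pass is divided into four slices, and within a slice the lanes can be filled in parallel, so the lane-segment fills are the work items and the end of each slice is the synchronization point. All function names here are hypothetical.

/* Illustration only: how Argon2 maps onto the structure above.
 * Each lane segment within a slice is one parallel work item (2.1);
 * the end of a slice is the synchronization point (2.2).  */
init_memory (state);                              /* 1. initialization */
for (pass = 0; pass < t_cost; pass++)
  for (slice = 0; slice < 4; slice++)             /* 2.3. more work */
    {
      for (lane = 0; lane < lanes; lane++)        /* 2.1. launch */
        launch (fill_segment, state, pass, slice, lane);
      wait_all ();                                /* 2.2. sync point */
    }
extract_tag (state, out, outlen);                 /* 3. finalization */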

For 2.1., libgcrypt needs a way to dispatch parallel work items to the caller's threads. For 2.2., it needs a way to wait for all dispatched work items to complete. My idea was to provide function pointers to the KDF algorithm:

typedef struct gcry_kdf_thread_ops
{
  /* Context pointer passed to the job function calls. */
  void *jobs_context;
  /* Launch a new JOB for parallel processing.  The JOB function is
   * executed in a separate thread or placed into a worker thread pool
   * for execution.  The WORK_PRIV parameter is passed to the JOB
   * function as is.  On success, an identifier for the launched job is
   * output to JOB_ID.  Returns '0' on success, '-1' on error.  If an
   * error is returned, LAUNCH_JOB has performed clean-up and no jobs
   * are left pending.  */
  int (*launch_job)(unsigned int *job_id, void *jobs_context,
		    void (*job)(void *work_priv), void *work_priv);
  /* Wait for pending jobs to complete.  Returns '0' on success,
   * '-1' on error.  */
  int (*wait_all_jobs_completion)(void *jobs_context);
} gcry_kdf_thread_ops_t;

With this approach, calling gcry_kdf_compute would look like:

struct user_defined_threads_ctx threads_ctx = { ... };
gcry_kdf_thread_ops_t ops =
{
  .jobs_context = &threads_ctx,
  .launch_job = pthread_jobs_launch_job,
  .wait_all_jobs_completion = pthread_jobs_wait_all
};

err = gcry_kdf_open (&hd, algo, subalgo, params, paramslen,
                     pass, passlen, salt, saltlen);
if (err)
  return err;
err = gcry_kdf_compute (hd, &ops); /* if ops == NULL, internally uses 'single_thread_ops' instead. */
gcry_kdf_final (hd, outlen, out);
gcry_kdf_close (hd);
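
The 'single_thread_ops' fallback mentioned in the comment could be as simple as running each job inline. A minimal sketch, assuming the semantics of the struct above (this is not libgcrypt's actual internal code):

/* Minimal single-threaded fallback: run each job inline in the
 * caller's thread, so there is never anything to wait for.  */
static int
single_launch_job (unsigned int *job_id, void *jobs_context,
                   void (*job)(void *work_priv), void *work_priv)
{
  (void)jobs_context;
  *job_id = 0;
  job (work_priv);      /* execute synchronously */
  return 0;
}

static int
single_wait_all (void *jobs_context)
{
  (void)jobs_context;
  return 0;             /* nothing pending; jobs already ran inline */
}

static const gcry_kdf_thread_ops_t single_thread_ops =
{
  .jobs_context = NULL,
  .launch_job = single_launch_job,
  .wait_all_jobs_completion = single_wait_all
};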

With simple launch_job and wait_all_jobs_completion, libgcrypt does not need to track each thread or handle joining. Instead, libgcrypt would just dispatch all work and then wait for all jobs to complete. This would also avoid the need for libgcrypt to see or handle thread identifiers. Handling the maximum number of threads would be up to the user (the number of running threads can be tracked in the launch_job function).

An example implementation of the thread operations with pthreads could look something like this:

#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 8

struct user_defined_threads_ctx
{
  int oldest_thread_idx;
  int next_thread_idx;
  int num_threads_running;
  pthread_t thread[MAX_THREADS];
  struct job_thread_param {
    void (*job)(void *work_priv);
    void *priv;
  } work[MAX_THREADS];
};

void *job_thread(void *p)
{
  struct job_thread_param *param = p;
  param->job (param->priv);
  pthread_exit (NULL);
}

int pthread_jobs_wait_all(void *jobs_context);

int pthread_jobs_launch_job(unsigned int *job_id, void *jobs_context,
                            void (*job)(void *work_priv), void *work_priv)
{
  struct user_defined_threads_ctx *ctx = jobs_context;

  if (ctx->num_threads_running == MAX_THREADS)
  {
    // thread limit reached, join the oldest thread before launching a new one
    assert(ctx->next_thread_idx == ctx->oldest_thread_idx);
    pthread_join(ctx->thread[ctx->oldest_thread_idx], NULL);
    ctx->oldest_thread_idx = (ctx->oldest_thread_idx + 1) % MAX_THREADS;
    ctx->num_threads_running--;
  }

  ctx->work[ctx->next_thread_idx].job = job;
  ctx->work[ctx->next_thread_idx].priv = work_priv;
  if (pthread_create (&ctx->thread[ctx->next_thread_idx], NULL, job_thread,
                      &ctx->work[ctx->next_thread_idx]))
  {
    // LAUNCH_JOB must leave no jobs pending on error: join the still
    // running threads before failing
    pthread_jobs_wait_all (ctx);
    return -1;
  }
  *job_id = (unsigned int)ctx->next_thread_idx;
  ctx->next_thread_idx = (ctx->next_thread_idx + 1) % MAX_THREADS;
  ctx->num_threads_running++;
  return 0;
}

int pthread_jobs_wait_all(void *jobs_context)
{
  struct user_defined_threads_ctx *ctx = jobs_context;

  int i, idx;
  for (i = 0; i < ctx->num_threads_running; i++)
  {
    idx = (ctx->oldest_thread_idx + i) % MAX_THREADS;
    pthread_join(ctx->thread[idx], NULL);
  }

  // reset context for next round of parallel work
  ctx->num_threads_running = 0;
  ctx->oldest_thread_idx = 0;
  ctx->next_thread_idx = 0;

  return 0;
}

Another implementation strategy could be to prepare a thread pool before calling gcry_kdf_compute, dispatch jobs to the pool, and at each synchronization point wait for the pool to become idle.
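
A minimal sketch of that variant, assuming the same gcry_kdf_thread_ops_t contract as above. Pool setup and teardown (pthread_create of the workers, setting SHUTDOWN, broadcasting NONEMPTY, joining) are omitted for brevity; all names here are hypothetical.

/* Sketch of the thread-pool variant: a fixed set of workers pulls jobs
 * from a ring buffer; waiting blocks until the queue is empty and no
 * job is executing.  */
#include <pthread.h>

#define POOL_SIZE 8
#define QUEUE_SIZE 64

struct pool_job { void (*fn)(void *); void *priv; };

struct pool_ctx {
  pthread_mutex_t lock;
  pthread_cond_t nonempty;        /* signalled when a job is queued */
  pthread_cond_t idle;            /* signalled when all work is done */
  struct pool_job queue[QUEUE_SIZE];
  int head, tail, queued;         /* ring buffer of pending jobs */
  int active;                     /* jobs currently executing */
  int shutdown;
  pthread_t worker[POOL_SIZE];
};

static int pool_wait_all (void *jobs_context);

static void *
pool_worker (void *arg)
{
  struct pool_ctx *ctx = arg;

  pthread_mutex_lock (&ctx->lock);
  for (;;)
    {
      while (!ctx->queued && !ctx->shutdown)
        pthread_cond_wait (&ctx->nonempty, &ctx->lock);
      if (ctx->shutdown && !ctx->queued)
        break;
      struct pool_job job = ctx->queue[ctx->head];
      ctx->head = (ctx->head + 1) % QUEUE_SIZE;
      ctx->queued--;
      ctx->active++;
      pthread_mutex_unlock (&ctx->lock);
      job.fn (job.priv);          /* run the job outside the lock */
      pthread_mutex_lock (&ctx->lock);
      ctx->active--;
      if (!ctx->queued && !ctx->active)
        pthread_cond_signal (&ctx->idle);
    }
  pthread_mutex_unlock (&ctx->lock);
  return NULL;
}

static int
pool_launch_job (unsigned int *job_id, void *jobs_context,
                 void (*job)(void *work_priv), void *work_priv)
{
  struct pool_ctx *ctx = jobs_context;

  pthread_mutex_lock (&ctx->lock);
  if (ctx->queued == QUEUE_SIZE)
    {
      /* Queue full; per the LAUNCH_JOB contract, drain pending jobs
       * before reporting the error.  */
      pthread_mutex_unlock (&ctx->lock);
      pool_wait_all (ctx);
      return -1;
    }
  *job_id = (unsigned int)ctx->tail;
  ctx->queue[ctx->tail].fn = job;
  ctx->queue[ctx->tail].priv = work_priv;
  ctx->tail = (ctx->tail + 1) % QUEUE_SIZE;
  ctx->queued++;
  pthread_cond_signal (&ctx->nonempty);
  pthread_mutex_unlock (&ctx->lock);
  return 0;
}

static int
pool_wait_all (void *jobs_context)
{
  struct pool_ctx *ctx = jobs_context;

  pthread_mutex_lock (&ctx->lock);
  while (ctx->queued || ctx->active)
    pthread_cond_wait (&ctx->idle, &ctx->lock);
  pthread_mutex_unlock (&ctx->lock);
  return 0;
}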

Maybe some of these ideas could be used in the new KDF API. Dispatching work by returning from gcry_kdf_compute looks better than passing an &ops pointer, as it would avoid blocking the main thread. Moving some of the complexity of the threading details outside libgcrypt might be something to consider.

gniibe added a comment.

@jukivili, thank you for your comment.

Yes, currently, my major point is decoupling thread support from the libgcrypt implementation.

Your idea looks better than the current one in 1.10-beta (an iterator, with dispatch of thread functions by the caller):

  • limiting the number of threads is controlled by the caller
    • thus, no worries about thread limits in the KDF computation in the libgcrypt implementation
  • the API is simpler (less exposure of symbols)

So, I'm going to adopt it for 1.10.0: first only the API, then the implementation.

gniibe removed a project: Restricted Project.