In libgcrypt 1.7.8, for modular exponentiation, it simply uses _gcry_mpih_divrem.
We can use Barret modular reduction here.
We have _gcry_mpi_mod_barrett in mpi/mpi-mod.c, but it requires optimization.
- We can only do partial multiplication for q2 = q1 * y, because lower bits will be removed,
and ignoring part of computation can have an error at most 1.
- We can only do partial multiplication for q3 * m, because higher bits will be removed.