mpi: optimize mpi_rshift and mpi_lshift to avoid extra MPI copying

* mpi/mpi-bit.c (_gcry_mpi_rshift): Refactor so that _gcry_mpih_rshift
does the copying along with the shifting when a copy is needed, and so
that the same code path is used for both in-place and copying
operations.
(_gcry_mpi_lshift): Refactor so that _gcry_mpih_lshift does the
copying along with the shifting when a copy is needed, and so that the
same code path is used for both in-place and copying operations.

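The idea of folding the copy into the shift can be illustrated with a
minimal sketch. This is not the code in mpi/mpi-bit.c: limb_t,
sketch_rshift and sketch_mpi_rshift are hypothetical stand-ins, limbs are
assumed to be 64 bits wide and stored least significant first, and the
result is not normalized. The same routine serves both the in-place case
(dst == src) and the copying case (dst is a separate buffer), so no
separate copy pass is needed before shifting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for the MPI limb type */
#define LIMB_BITS 64

/* Shift the N-limb number SRC right by COUNT bits (0 < COUNT < LIMB_BITS)
   and store the result in DST.  DST may be SRC itself or a separate
   buffer; each source limb is read before its slot is written, so the
   in-place case is safe.  Returns the shifted-out bits, left-aligned.  */
static limb_t
sketch_rshift (limb_t *dst, const limb_t *src, size_t n, unsigned count)
{
  limb_t out;
  size_t i;

  assert (n > 0 && count > 0 && count < LIMB_BITS);
  out = src[0] << (LIMB_BITS - count);
  for (i = 0; i + 1 < n; i++)
    dst[i] = (src[i] >> count) | (src[i + 1] << (LIMB_BITS - count));
  dst[n - 1] = src[n - 1] >> count;
  return out;
}

/* Shift right by NBITS: skip the whole limbs, then let the limb-level
   shift move the remaining limbs straight into DST.  Returns the number
   of limbs written to DST (0 if everything was shifted out).  */
static size_t
sketch_mpi_rshift (limb_t *dst, const limb_t *src, size_t n, unsigned nbits)
{
  size_t nlimbs = nbits / LIMB_BITS;
  unsigned count = nbits % LIMB_BITS;
  size_t i;

  if (nlimbs >= n)
    return 0;
  n -= nlimbs;
  if (count)
    sketch_rshift (dst, src + nlimbs, n, count);
  else
    for (i = 0; i < n; i++)     /* whole-limb shift only: plain move */
      dst[i] = src[i + nlimbs];
  return n;
}
```

Calling sketch_mpi_rshift (r, a, n, 3) writes the shifted value of a into
r without touching a, while sketch_mpi_rshift (a, a, n, 65) shifts a in
place through the same code path.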
Benchmark on AMD Ryzen 9 7900X:
Before:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 rshift3            |     0.039 ns/B     24662 MiB/s     0.182 c/B      4700
 lshift3            |     0.108 ns/B      8832 MiB/s     0.508 c/B      4700
 rshift65           |     0.137 ns/B      6968 MiB/s     0.643 c/B      4700
 lshift65           |     0.109 ns/B      8776 MiB/s     0.511 c/B      4700

After:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 rshift3            |     0.038 ns/B     25049 MiB/s     0.179 c/B      4700
 lshift3            |     0.039 ns/B     24709 MiB/s     0.181 c/B      4700
 rshift65           |     0.038 ns/B     24942 MiB/s     0.180 c/B      4700
 lshift65           |     0.040 ns/B     23671 MiB/s     0.189 c/B      4700

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>