mpi: optimize mpi_rshift and mpi_lshift to avoid extra MPI copying
* mpi/mpi-bit.c (_gcry_mpi_rshift): Refactor so that _gcry_mpih_rshift
does the copying along with the shifting when a copy is needed, and so
that the same code path is used for both in-place and copying operation.
(_gcry_mpi_lshift): Refactor so that _gcry_mpih_lshift does the copying
along with the shifting when a copy is needed, and so that the same code
path is used for both in-place and copying operation.
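The idea of folding the copy into the shift can be sketched as below. This is only an illustration under stated assumptions, not the code in mpi/mpi-bit.c: limb_t, LIMB_BITS, limbs_rshift and num_rshift are hypothetical names, limbs are assumed to be 64-bit and stored least-significant first, and the destination is assumed to have room for all source limbs. Because the limb loop reads each source limb before it writes the corresponding destination limb, the same routine works whether dst aliases src (in-place) or points to a separate buffer (copying), so no separate copy pass is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t limb_t;        /* hypothetical limb type */
#define LIMB_BITS 64            /* assumed limb width in bits */

/* Shift the n-limb number at SRC right by CNT bits (0 < CNT < LIMB_BITS)
   and store the result at DST.  DST may be equal to SRC or lie below it.
   Returns the bits shifted out, placed in the high bits of a limb.  */
static limb_t
limbs_rshift (limb_t *dst, const limb_t *src, size_t n, unsigned int cnt)
{
  assert (n > 0 && cnt > 0 && cnt < LIMB_BITS);

  limb_t low = src[0];
  limb_t out = low << (LIMB_BITS - cnt);

  for (size_t i = 1; i < n; i++)
    {
      limb_t high = src[i];     /* read before dst[i - 1] is written */
      dst[i - 1] = (low >> cnt) | (high << (LIMB_BITS - cnt));
      low = high;
    }
  dst[n - 1] = low >> cnt;
  return out;
}

/* Shift the n-limb number at SRC right by NBITS and store it at DST,
   using one code path for both the in-place (DST == SRC) and the
   copying (DST != SRC) case.  Returns the limb count of the result
   after dropping high zero limbs.  */
static size_t
num_rshift (limb_t *dst, const limb_t *src, size_t n, unsigned int nbits)
{
  size_t nlimbs = nbits / LIMB_BITS;    /* whole limbs to drop */
  unsigned int cnt = nbits % LIMB_BITS; /* remaining bit shift */

  if (nlimbs >= n)
    return 0;                           /* everything shifted out */

  size_t rn = n - nlimbs;
  if (cnt)
    limbs_rshift (dst, src + nlimbs, rn, cnt);  /* copy and shift at once */
  else
    memmove (dst, src + nlimbs, rn * sizeof (limb_t));

  while (rn > 0 && dst[rn - 1] == 0)    /* normalize */
    rn--;
  return rn;
}
```

Calling num_rshift (r, a, n, 3) leaves a untouched and writes the shifted value to r, while num_rshift (a, a, n, 65) shifts a in place. Both calls go through the same limb loop.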
Benchmark on AMD Ryzen 9 7900X:
Before:
          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 rshift3  |     0.039 ns/B     24662 MiB/s     0.182 c/B      4700
 lshift3  |     0.108 ns/B      8832 MiB/s     0.508 c/B      4700
 rshift65 |     0.137 ns/B      6968 MiB/s     0.643 c/B      4700
 lshift65 |     0.109 ns/B      8776 MiB/s     0.511 c/B      4700
After:
          |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 rshift3  |     0.038 ns/B     25049 MiB/s     0.179 c/B      4700
 lshift3  |     0.039 ns/B     24709 MiB/s     0.181 c/B      4700
 rshift65 |     0.038 ns/B     24942 MiB/s     0.180 c/B      4700
 lshift65 |     0.040 ns/B     23671 MiB/s     0.189 c/B      4700
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>