Improved ripemd160 performance
* cipher/rmd160.c (transform): Interleave the left and right lane rounds to introduce more instruction level parallelism.
The benchmarks on different systems:
Intel(R) Atom(TM) CPU N570 @ 1.66GHz
before:
Hash:
| nanosecs/byte mebibytes/sec cycles/byte
RIPEMD160 | 13.07 ns/B 72.97 MiB/s - c/B
after:
Hash:
| nanosecs/byte mebibytes/sec cycles/byte
RIPEMD160 | 11.37 ns/B 83.84 MiB/s - c/B
Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
before:
Hash:
| nanosecs/byte mebibytes/sec cycles/byte
RIPEMD160 | 3.31 ns/B 288.0 MiB/s - c/B
after:
Hash:
| nanosecs/byte mebibytes/sec cycles/byte
RIPEMD160 | 2.08 ns/B 458.5 MiB/s - c/B
- Signed-off-by: Andrei Scherer <andsch@inbox.com>