Tweak ARM inline assembly for mpi
* mpi/longlong.h [__arm__]: Enable inline assembly if __thumb2__ is
defined.
[__arm__]: Use __ARM_ARCH when defined.
[__arm__] [__ARM_ARCH >= 5] (count_leading_zeros): New.
Current ARM Linux distributions use an EABI that enables Thumb-2, and
therefore inline assembly is disabled (because of the !defined(__thumb__)
selector). However, Thumb-2 allows the use of the assembly instructions
that longlong.h contains for ARM. So this patch enables inline assembly
for ARM when __thumb2__ is defined in addition to __thumb__.
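The selector change described above can be sketched as follows. This is only an illustration, assuming the guard combines the compiler-predefined macros __arm__, __thumb__ and __thumb2__ in roughly this way; the exact condition in longlong.h may contain further terms. USE_ARM_INLINE_ASM and arm_asm_selected() are hypothetical names introduced for the sketch.

```c
#include <stdbool.h>

/* Compile-time form of the selector: plain ARM mode or Thumb-2 qualifies,
   Thumb-1 alone does not.  USE_ARM_INLINE_ASM is a hypothetical name. */
#if defined(__arm__) && (defined(__thumb2__) || !defined(__thumb__))
# define USE_ARM_INLINE_ASM 1
#else
# define USE_ARM_INLINE_ASM 0
#endif

/* Run-time mirror of the same condition, so the truth table can be
   checked on any host.  Hypothetical helper, not part of longlong.h. */
static bool arm_asm_selected(bool is_arm, bool has_thumb, bool has_thumb2)
{
    return is_arm && (has_thumb2 || !has_thumb);
}
```

With this shape, a Thumb-2 build (both __thumb__ and __thumb2__ defined) selects the assembly path, while a Thumb-1 only build still falls back to the generic C code.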
The patch also adds an optimized count_leading_zeros() macro for ARM.
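A minimal sketch of what such an optimization might look like, assuming it relies on the CLZ instruction available from ARMv5 onwards. The function names clz32() and clz32_generic() are hypothetical; the macro in longlong.h has a different interface, and the fallback loop here is only a stand-in for the generic C path.

```c
#include <stdint.h>

/* Generic stand-in: count leading zero bits with a shift loop.
   Returns 32 for x == 0, matching the behaviour of the CLZ instruction. */
static unsigned clz32_generic(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Use the single CLZ instruction when building for ARMv5 or later in ARM
   or Thumb-2 mode; otherwise fall back to the loop above. */
static unsigned clz32(uint32_t x)
{
#if defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 5 \
    && (defined(__thumb2__) || !defined(__thumb__))
    uint32_t r;
    __asm__ ("clz %0, %1" : "=r" (r) : "r" (x));
    return r;
#else
    return clz32_generic(x);
#endif
}
```

Replacing a loop with one instruction matters for mpi because leading-zero counts are needed when normalizing operands for multi-precision division.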
Results on Cortex-A8, 1 GHz:
Before:
Algorithm       generate  100*sign  100*verify
RSA 1024 bit       750ms    2780ms       110ms
RSA 2048 bit     14280ms   17250ms       300ms
RSA 3072 bit     38630ms   51300ms       650ms
RSA 4096 bit     60940ms  111430ms      1000ms
jussi@cubie:~/libgcrypt$ tests/benchmark dsa
Algorithm       generate  100*sign  100*verify
DSA 1024/160           -    1410ms      1680ms
DSA 2048/224           -    6100ms      7390ms
DSA 3072/256           -   14350ms     17120ms
jussi@cubie:~/libgcrypt$ tests/benchmark ecc
Algorithm       generate  100*sign  100*verify
ECDSA 192 bit       90ms    2160ms      3940ms
ECDSA 224 bit      110ms    2810ms      5400ms
ECDSA 256 bit      150ms    3570ms      6970ms
ECDSA 384 bit      340ms    8320ms     16420ms
ECDSA 521 bit      850ms   19760ms     38480ms
After:
jussi@cubie:~/libgcrypt$ tests/benchmark rsa
Algorithm       generate  100*sign  100*verify
RSA 1024 bit       590ms    2230ms        80ms
RSA 2048 bit      2320ms   13090ms       240ms
RSA 3072 bit     60580ms   38420ms       460ms
RSA 4096 bit    115130ms   82250ms       750ms
jussi@cubie:~/libgcrypt$ tests/benchmark dsa
Algorithm       generate  100*sign  100*verify
DSA 1024/160           -    1070ms      1290ms
DSA 2048/224           -    4500ms      5550ms
DSA 3072/256           -   10280ms     12200ms
jussi@cubie:~/libgcrypt$ tests/benchmark ecc
Algorithm       generate  100*sign  100*verify
ECDSA 192 bit       70ms    1900ms      3560ms
ECDSA 224 bit      100ms    2490ms      4750ms
ECDSA 256 bit      120ms    3140ms      5920ms
ECDSA 384 bit      270ms    6990ms     13790ms
ECDSA 521 bit      680ms   17080ms     33490ms
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>