cast5: add ARMv6 assembly implementation
* cipher/Makefile.am: Add 'cast5-armv6.S'. * cipher/cast5-armv6.S: New file. * cipher/cast5.c (USE_ARMV6_ASM): New macro. (CAST5_context) [USE_ARMV6_ASM]: New members 'Kr_arm_enc' and 'Kr_arm_dec'. [USE_ARMV6_ASM] (_gcry_cast5_armv6_encrypt_block) (_gcry_cast5_armv6_decrypt_block, _gcry_cast5_armv6_ctr_enc) (_gcry_cast5_armv6_cbc_dec, _gcry_cast5_armv6_cfb_dec): New prototypes. [USE_ARMV6_ASM] (do_encrypt_block, do_decrypt_block, encrypt_block) (decrypt_block): New functions. (_gcry_cast5_ctr_enc) [USE_ARMV6_ASM]: Use ARMv6 assembly function. (_gcry_cast5_cbc_dec) [USE_ARMV6_ASM]: Use ARMv6 assembly function. (_gcry_cast5_cfb_dec) [USE_ARMV6_ASM]: Use ARMv6 assembly function. (do_cast_setkey) [USE_ARMV6_ASM]: Initialize 'Kr_arm_enc' and 'Kr_arm_dec'. * configure.ac (cast5) [arm]: Add 'cast5-armv6.lo'.
Provides non-parallel implementations for small speed-up and 2-way parallel
implementations that gets accelerated on multi-issue CPUs (hand-tuned for
in-order dual-issue Cortex-A8). Unaligned access handling is done in assembly.
For now, only enable this on little-endian systems as big-endian correctness
have not been tested yet.
Old vs new (Cortex-A8, Debian Wheezy/armhf):
ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- ---------------
CAST5 1.15x 1.12x 1.12x 2.07x 1.14x 1.60x 1.12x 1.13x 1.62x 1.63x
- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>