blowfish: add amd64 assembly implementation
* cipher/Makefile.am: Add 'blowfish-amd64.S'. * cipher/blowfish-amd64.S: New file. * cipher/blowfish.c (USE_AMD64_ASM): New macro. [USE_AMD64_ASM] (_gcry_blowfish_amd64_do_encrypt) (_gcry_blowfish_amd64_encrypt_block) (_gcry_blowfish_amd64_decrypt_block, _gcry_blowfish_amd64_ctr_enc) (_gcry_blowfish_amd64_cbc_dec, _gcry_blowfish_amd64_cfb_dec): New prototypes. [USE_AMD64_ASM] (do_encrypt, do_encrypt_block, do_decrypt_block) (encrypt_block, decrypt_block): New functions. (_gcry_blowfish_ctr_enc, _gcry_blowfish_cbc_dec) (_gcry_blowfish_cfb_dec, selftest_ctr, selftest_cbc, selftest_cfb): New functions. (selftest): Call new bulk selftests. * cipher/cipher.c (gcry_cipher_open) [USE_BLOWFISH]: Register Blowfish bulk functions for ctr-enc, cbc-dec and cfb-dec. * configure.ac (blowfish) [x86_64]: Add 'blowfish-amd64.lo'. * src/cipher.h (_gcry_blowfish_ctr_enc, _gcry_blowfish_cbc_dec) (gcry_blowfish_cfb_dec): New prototypes.
Add non-parallel functions for small speed-up and 4-way parallel functions for
modes of operation that support parallel processing.
Speed old vs. new on AMD Phenom II X6 1055T:
ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- ---------------
BLOWFISH 1.21x 1.12x 1.17x 3.52x 1.18x 3.34x 1.16x 1.15x 3.38x 3.47x
Speed old vs. new on Intel Core i5-2450M (Sandy-Bridge):
ECB/Stream CBC CFB OFB CTR --------------- --------------- --------------- --------------- ---------------
BLOWFISH 1.16x 1.10x 1.17x 2.98x 1.18x 2.88x 1.16x 1.15x 3.00x 3.02x
- Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>