blowfish: add three rounds parallel handling to generic C implementation

Description

* cipher/blowfish.c (BLOWFISH_ROUNDS): Remove.
[BLOWFISH_ROUNDS != 16] (function_F): Remove.
(F): Replace big-endian and little-endian version with single
endian-neutral version.
(R3, do_encrypt_3, do_decrypt_3): New.
(_gcry_blowfish_ctr_enc, _gcry_blowfish_cbc_dec)
(_gcry_blowfish_cfb_dec): Use new three block functions.
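The entries above can be illustrated with a minimal sketch. It is not the code from cipher/blowfish.c: the `bf_ctx` layout, the helper names (`fill_placeholder`, `encrypt_1`, `encrypt_3`, `bf_sketch_selfcheck`) and the use of 32-bit words instead of byte buffers are assumptions made for illustration, and the tables are filled with placeholder values rather than the real Blowfish constants or a real key schedule. It shows an F that indexes the S-boxes by shifting (so it does not depend on byte order) and an R3 that runs the same round on three independent blocks, which presumably lets the CPU overlap the S-box lookups of one block with those of the others.

```c
#include <stdint.h>

/* Hypothetical context. Placeholder contents only: NOT the real
   Blowfish S-boxes, P-array, or key schedule. */
typedef struct {
  uint32_t s0[256], s1[256], s2[256], s3[256];
  uint32_t p[18];
} bf_ctx;

/* Endian-neutral F: the S-box indices come from shifts and masks on
   the 32-bit value, not from reading its bytes in memory order. */
#define F(c, x) ((((c)->s0[(x) >> 24] + (c)->s1[((x) >> 16) & 0xff]) \
                  ^ (c)->s2[((x) >> 8) & 0xff]) + (c)->s3[(x) & 0xff])

/* One round on one block. */
#define R(c, l, r, i) do { (l) ^= (c)->p[i]; (r) ^= F(c, l); } while (0)

/* The same round on three independent blocks (suffixes 0, 1, 2). */
#define R3(c, l, r, i) do { R(c, l##0, r##0, i); \
                            R(c, l##1, r##1, i); \
                            R(c, l##2, r##2, i); } while (0)

/* Fill the tables from a simple linear congruential generator. */
static void fill_placeholder(bf_ctx *c)
{
  uint32_t *tabs[4];
  uint32_t v = 0x12345678u;
  int t, i;

  tabs[0] = c->s0; tabs[1] = c->s1; tabs[2] = c->s2; tabs[3] = c->s3;
  for (t = 0; t < 4; t++)
    for (i = 0; i < 256; i++)
      tabs[t][i] = v = v * 1664525u + 1013904223u;
  for (i = 0; i < 18; i++)
    c->p[i] = v = v * 1664525u + 1013904223u;
}

/* Sixteen rounds on a single block held as two 32-bit halves. */
static void encrypt_1(const bf_ctx *c, uint32_t *ret_xl, uint32_t *ret_xr)
{
  uint32_t xl = *ret_xl, xr = *ret_xr;
  int i;

  for (i = 0; i < 16; i += 2) {
    R(c, xl, xr, i);
    R(c, xr, xl, i + 1);
  }
  xl ^= c->p[16];
  xr ^= c->p[17];
  *ret_xl = xr;
  *ret_xr = xl;
}

/* Sixteen rounds on three blocks at once; blk holds (xl, xr) pairs. */
static void encrypt_3(const bf_ctx *c, uint32_t blk[6])
{
  uint32_t xl0 = blk[0], xr0 = blk[1];
  uint32_t xl1 = blk[2], xr1 = blk[3];
  uint32_t xl2 = blk[4], xr2 = blk[5];
  int i;

  for (i = 0; i < 16; i += 2) {
    R3(c, xl, xr, i);
    R3(c, xr, xl, i + 1);
  }
  xl0 ^= c->p[16]; xr0 ^= c->p[17];
  xl1 ^= c->p[16]; xr1 ^= c->p[17];
  xl2 ^= c->p[16]; xr2 ^= c->p[17];
  blk[0] = xr0; blk[1] = xl0;
  blk[2] = xr1; blk[3] = xl1;
  blk[4] = xr2; blk[5] = xl2;
}

/* Returns 1 if the three-block path matches three single-block calls. */
int bf_sketch_selfcheck(void)
{
  static bf_ctx c;
  uint32_t three[6] = { 0x00112233u, 0x44556677u, 0x8899aabbu,
                        0xccddeeffu, 0x01234567u, 0x89abcdefu };
  uint32_t single[6];
  int i;

  fill_placeholder(&c);
  for (i = 0; i < 6; i++)
    single[i] = three[i];
  for (i = 0; i < 6; i += 2)
    encrypt_1(&c, &single[i], &single[i + 1]);
  encrypt_3(&c, three);
  for (i = 0; i < 6; i++)
    if (single[i] != three[i])
      return 0;
  return 1;
}
```

Calling `bf_sketch_selfcheck()` from a test program checks that the interleaved path produces the same words as the one-block path. Interleaving only applies where blocks are independent of each other, which is the case for the CTR encryption and CBC/CFB decryption paths named in the entries above.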

Benchmark on aarch64 (cortex-a53, 816 MHz):

Before:
BLOWFISH |  nanosecs/byte   mebibytes/sec   cycles/byte
 CBC dec |     29.58 ns/B     32.24 MiB/s     24.13 c/B
 CFB dec |     33.38 ns/B     28.57 MiB/s     27.24 c/B
 CTR enc |     34.18 ns/B     27.90 MiB/s     27.89 c/B

After (~60%-70% faster):
BLOWFISH |  nanosecs/byte   mebibytes/sec   cycles/byte
 CBC dec |     18.18 ns/B     52.45 MiB/s     14.84 c/B
 CFB dec |     19.67 ns/B     48.50 MiB/s     16.05 c/B
 CTR enc |     19.77 ns/B     48.25 MiB/s     16.13 c/B

Benchmark on i386 (haswell, 4000 MHz):

Before:
BLOWFISH |  nanosecs/byte   mebibytes/sec   cycles/byte
 CBC dec |      6.10 ns/B     156.4 MiB/s     24.39 c/B
 CFB dec |      6.39 ns/B     149.2 MiB/s     25.56 c/B
 CTR enc |      6.73 ns/B     141.6 MiB/s     26.93 c/B

After (~80% faster):
BLOWFISH |  nanosecs/byte   mebibytes/sec   cycles/byte
 CBC dec |      3.46 ns/B     275.5 MiB/s     13.85 c/B
 CFB dec |      3.53 ns/B     270.4 MiB/s     14.11 c/B
 CTR enc |      3.56 ns/B     268.0 MiB/s     14.23 c/B

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on Mar 31 2019, 5:30 PM
Parents
rC4ec566b3689e: cast5: add three rounds parallel handling to generic C implementation