cast5: add three rounds parallel handling to generic C implementation

Description

* cipher/cast5.c (do_encrypt_block_3, do_decrypt_block_3): New.
(_gcry_cast5_ctr_enc, _gcry_cast5_cbc_dec, _gcry_cast5_cfb_dec): Use
new three block functions.
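
The idea behind the three block functions can be sketched as follows. This is a minimal illustration, not the code in cipher/cast5.c: toy_f, toy_encrypt_block, toy_encrypt_block_3 and toy_selftest are hypothetical names, and the round function is a stand-in for the real CAST5 F functions and S-boxes. It assumes a plain Feistel structure and only shows how three independent blocks can be advanced through the same round together. CBC decryption, CFB decryption and CTR encryption qualify for this treatment because their block cipher calls do not depend on each other's output.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ROUNDS 16

/* Stand-in round function (hypothetical): NOT the CAST5 F function. */
static uint32_t
toy_f (uint32_t x, uint32_t k)
{
  x += k;
  x = (x << 7) | (x >> 25);     /* rotate left by 7 */
  return x ^ (x >> 3);
}

/* One block: every round has to wait for the previous round's result. */
static void
toy_encrypt_block (const uint32_t key[TOY_ROUNDS], uint32_t blk[2])
{
  uint32_t l = blk[0], r = blk[1], t;
  int i;

  for (i = 0; i < TOY_ROUNDS; i++)
    {
      t = l ^ toy_f (r, key[i]);
      l = r;
      r = t;
    }
  blk[0] = r;
  blk[1] = l;
}

/* Three blocks: the three chains within a round are independent of
   each other, so a superscalar CPU is free to overlap them. */
static void
toy_encrypt_block_3 (const uint32_t key[TOY_ROUNDS], uint32_t blks[3][2])
{
  uint32_t l0 = blks[0][0], r0 = blks[0][1];
  uint32_t l1 = blks[1][0], r1 = blks[1][1];
  uint32_t l2 = blks[2][0], r2 = blks[2][1];
  uint32_t t0, t1, t2;
  int i;

  for (i = 0; i < TOY_ROUNDS; i++)
    {
      uint32_t k = key[i];      /* round key loaded once, used three times */

      t0 = l0 ^ toy_f (r0, k);
      t1 = l1 ^ toy_f (r1, k);
      t2 = l2 ^ toy_f (r2, k);
      l0 = r0; r0 = t0;
      l1 = r1; r1 = t1;
      l2 = r2; r2 = t2;
    }
  blks[0][0] = r0; blks[0][1] = l0;
  blks[1][0] = r1; blks[1][1] = l1;
  blks[2][0] = r2; blks[2][1] = l2;
}

/* Checks that the interleaved version matches three single-block
   calls; returns 1 on success. */
int
toy_selftest (void)
{
  uint32_t key[TOY_ROUNDS];
  uint32_t a[3][2] = { { 1, 2 }, { 3, 4 }, { 0xdeadbeef, 0x01234567 } };
  uint32_t b[3][2];
  int i;

  for (i = 0; i < TOY_ROUNDS; i++)
    key[i] = 0x9e3779b9u * (uint32_t)(i + 1);   /* arbitrary test key */

  for (i = 0; i < 3; i++)
    {
      b[i][0] = a[i][0];
      b[i][1] = a[i][1];
      toy_encrypt_block (key, b[i]);
    }
  toy_encrypt_block_3 (key, a);

  for (i = 0; i < 3; i++)
    assert (a[i][0] == b[i][0] && a[i][1] == b[i][1]);
  return 1;
}
```

The interleaved function does the same work per block as the single-block one; any gain comes from giving the CPU three independent dependency chains to schedule, plus sharing the round key load.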

Benchmark on aarch64 (Cortex-A53, 816 MHz):

Before:
CAST5 | nanosecs/byte mebibytes/sec cycles/byte

CBC dec |     35.24 ns/B     27.07 MiB/s     28.75 c/B
CFB dec |     34.62 ns/B     27.54 MiB/s     28.25 c/B
CTR enc |     35.39 ns/B     26.95 MiB/s     28.88 c/B

After (~40%-50% faster):
CAST5 | nanosecs/byte mebibytes/sec cycles/byte

CBC dec |     23.05 ns/B     41.38 MiB/s     18.81 c/B
CFB dec |     24.49 ns/B     38.94 MiB/s     19.98 c/B
CTR enc |     24.57 ns/B     38.82 MiB/s     20.05 c/B

Benchmark on i386 (Haswell, 4000 MHz):

Before:
CAST5 | nanosecs/byte mebibytes/sec cycles/byte

CBC dec |      6.92 ns/B     137.7 MiB/s     27.69 c/B
CFB dec |      6.83 ns/B     139.7 MiB/s     27.32 c/B
CTR enc |      7.01 ns/B     136.1 MiB/s     28.03 c/B

After (~70% faster):
CAST5 | nanosecs/byte mebibytes/sec cycles/byte

CBC dec |      3.97 ns/B     240.1 MiB/s     15.89 c/B
CFB dec |      3.96 ns/B     241.0 MiB/s     15.83 c/B
CTR enc |      4.01 ns/B     237.8 MiB/s     16.04 c/B
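
For reference, the "faster" percentages and the cycles/byte column can be cross-checked from the quoted rows. The sketch below assumes the speedup is taken as the throughput ratio (MiB/s after over MiB/s before) and that cycles/byte is nanoseconds/byte multiplied by the clock in GHz; speedup_percent, cycles_per_byte and check_quoted_rows are hypothetical helper names, not part of libgcrypt.

```c
#include <assert.h>

/* Percent speedup from throughput before and after (assumed metric). */
double
speedup_percent (double before_mibs, double after_mibs)
{
  return (after_mibs / before_mibs - 1.0) * 100.0;
}

/* cycles/byte = nanoseconds/byte * clock frequency in GHz. */
double
cycles_per_byte (double ns_per_byte, double clock_mhz)
{
  return ns_per_byte * clock_mhz / 1000.0;
}

/* Cross-checks a few rows quoted in the tables; returns 1 on success. */
int
check_quoted_rows (void)
{
  double v;

  v = speedup_percent (27.07, 41.38);   /* aarch64 CBC dec, about 53% */
  assert (v > 52.0 && v < 54.0);
  v = speedup_percent (137.7, 240.1);   /* i386 CBC dec, about 74% */
  assert (v > 73.0 && v < 75.0);

  v = cycles_per_byte (35.24, 816.0);   /* table says 28.75 c/B */
  assert (v > 28.70 && v < 28.80);
  v = cycles_per_byte (6.92, 4000.0);   /* table says 27.69 c/B */
  assert (v > 27.64 && v < 27.74);
  return 1;
}
```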

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on Mar 31 2019, 5:26 PM
Parents
rC8a0e68be1020: cast5: read Kr four blocks at time and shift for current round