Add SM4 ARMv8/AArch64/CE assembly implementation

Description

* cipher/Makefile.am: Add 'sm4-armv8-aarch64-ce.S'.
* cipher/sm4-armv8-aarch64-ce.S: New.
* cipher/sm4.c (USE_ARM_CE): New.
(SM4_context) [USE_ARM_CE]: Add 'use_arm_ce'.
[USE_ARM_CE] (_gcry_sm4_armv8_ce_expand_key)
(_gcry_sm4_armv8_ce_crypt, _gcry_sm4_armv8_ce_ctr_enc)
(_gcry_sm4_armv8_ce_cbc_dec, _gcry_sm4_armv8_ce_cfb_dec)
(_gcry_sm4_armv8_ce_crypt_blk1_8, sm4_armv8_ce_crypt_blk1_8): New.
(sm4_expand_key) [USE_ARM_CE]: Use ARMv8/AArch64/CE key setup.
(sm4_setkey): Enable ARMv8/AArch64/CE if supported by HW.
(sm4_encrypt) [USE_ARM_CE]: Use SM4 CE encryption.
(sm4_decrypt) [USE_ARM_CE]: Use SM4 CE decryption.
(_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec)
(_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth) [USE_ARM_CE]: Add
ARMv8/AArch64/CE bulk functions.
* configure.ac: Add 'sm4-armv8-aarch64-ce.lo'.

This patch adds ARMv8/AArch64/CE bulk encryption/decryption. Bulk
functions process eight blocks in parallel.
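
The eight-block bulk pattern can be sketched roughly as follows. This is a minimal illustration, not the code from cipher/sm4.c: `crypt_blk1_8_fn`, `bulk_crypt` and the toy XOR kernel are hypothetical names. The real assembly routine is assumed (from the `_blk1_8` suffix in its name) to accept between one and eight 16-byte blocks per call.

```c
#include <stddef.h>
#include <stdint.h>

#define SM4_BLOCK_SIZE 16   /* SM4 uses a 128-bit block */
#define BULK_BLOCKS     8   /* blocks handed to the kernel per call */

/* Hypothetical kernel type: processes 1..8 blocks from 'in' to 'out'. */
typedef void (*crypt_blk1_8_fn) (const void *ctx, uint8_t *out,
                                 const uint8_t *in, size_t nblks);

/* Toy stand-in kernel (NOT SM4): XORs every byte with a one-byte key. */
static void
toy_xor_blk1_8 (const void *ctx, uint8_t *out, const uint8_t *in,
                size_t nblks)
{
  uint8_t key = *(const uint8_t *) ctx;
  size_t i;

  for (i = 0; i < nblks * SM4_BLOCK_SIZE; i++)
    out[i] = in[i] ^ key;
}

/* Feed the kernel up to eight blocks at a time.
   Returns the number of kernel calls that were made. */
static size_t
bulk_crypt (crypt_blk1_8_fn fn, const void *ctx, uint8_t *out,
            const uint8_t *in, size_t nblocks)
{
  size_t calls = 0;

  while (nblocks > 0)
    {
      size_t cur = nblocks > BULK_BLOCKS ? BULK_BLOCKS : nblocks;

      fn (ctx, out, in, cur);
      out += cur * SM4_BLOCK_SIZE;
      in += cur * SM4_BLOCK_SIZE;
      nblocks -= cur;
      calls++;
    }
  return calls;
}
```

With this sketch, a 19-block buffer is handled in three kernel calls (8 + 8 + 3 blocks). Only the final call sees a partial batch.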

Benchmark on T-Head Yitian-710 2.75 GHz:

Before:
     SM4 |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 CBC enc |     12.10 ns/B     78.79 MiB/s     33.28 c/B      2750
 CBC dec |      4.63 ns/B     205.9 MiB/s     12.74 c/B      2749
 CFB enc |     12.14 ns/B     78.58 MiB/s     33.37 c/B      2750
 CFB dec |      4.64 ns/B     205.5 MiB/s     12.76 c/B      2750
 CTR enc |      4.69 ns/B     203.3 MiB/s     12.90 c/B      2750
 CTR dec |      4.69 ns/B     203.3 MiB/s     12.90 c/B      2750
 GCM enc |      4.88 ns/B     195.4 MiB/s     13.42 c/B      2750
 GCM dec |      4.88 ns/B     195.5 MiB/s     13.42 c/B      2750
GCM auth |     0.189 ns/B      5048 MiB/s     0.520 c/B      2750
 OCB enc |      4.86 ns/B     196.0 MiB/s     13.38 c/B      2750
 OCB dec |      4.90 ns/B     194.7 MiB/s     13.47 c/B      2750
OCB auth |      4.79 ns/B     199.0 MiB/s     13.18 c/B      2750

After (10x - 19x faster than ARMv8/AArch64 impl):
     SM4 |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 CBC enc |      1.25 ns/B     762.7 MiB/s      3.44 c/B      2749
 CBC dec |     0.243 ns/B      3927 MiB/s     0.668 c/B      2750
 CFB enc |      1.25 ns/B     763.1 MiB/s      3.44 c/B      2750
 CFB dec |     0.245 ns/B      3899 MiB/s     0.673 c/B      2750
 CTR enc |     0.298 ns/B      3199 MiB/s     0.820 c/B      2750
 CTR dec |     0.298 ns/B      3198 MiB/s     0.820 c/B      2750
 GCM enc |     0.487 ns/B      1957 MiB/s      1.34 c/B      2749
 GCM dec |     0.487 ns/B      1959 MiB/s      1.34 c/B      2750
GCM auth |     0.189 ns/B      5048 MiB/s     0.519 c/B      2750
 OCB enc |     0.443 ns/B      2150 MiB/s      1.22 c/B      2749
 OCB dec |     0.486 ns/B      1964 MiB/s      1.34 c/B      2750
OCB auth |     0.369 ns/B      2585 MiB/s      1.01 c/B      2749
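
The quoted speedup range can be checked from the ns/B columns of the two tables. The snippet below is a small sketch for that arithmetic only; the `bench_row` type and the helper names are hypothetical, and the GCM auth row is left out because its timing is unchanged by this patch.

```c
#include <stddef.h>

/* One benchmark row: ns/B before and after the patch. */
struct bench_row
{
  const char *mode;
  double before_ns;
  double after_ns;
};

/* Values copied from the two tables above (GCM auth omitted). */
static const struct bench_row rows[] = {
  { "CBC enc", 12.10, 1.25 },  { "CBC dec", 4.63, 0.243 },
  { "CFB enc", 12.14, 1.25 },  { "CFB dec", 4.64, 0.245 },
  { "CTR enc", 4.69, 0.298 },  { "CTR dec", 4.69, 0.298 },
  { "GCM enc", 4.88, 0.487 },  { "GCM dec", 4.88, 0.487 },
  { "OCB enc", 4.86, 0.443 },  { "OCB dec", 4.90, 0.486 },
  { "OCB auth", 4.79, 0.369 },
};

#define N_ROWS (sizeof rows / sizeof rows[0])

/* Speedup factor of one row: time before divided by time after. */
static double
row_speedup (const struct bench_row *r)
{
  return r->before_ns / r->after_ns;
}

/* Smallest speedup over all rows. */
static double
min_speedup (void)
{
  double best = row_speedup (&rows[0]);
  size_t i;

  for (i = 1; i < N_ROWS; i++)
    if (row_speedup (&rows[i]) < best)
      best = row_speedup (&rows[i]);
  return best;
}

/* Largest speedup over all rows. */
static double
max_speedup (void)
{
  double best = row_speedup (&rows[0]);
  size_t i;

  for (i = 1; i < N_ROWS; i++)
    if (row_speedup (&rows[i]) > best)
      best = row_speedup (&rows[i]);
  return best;
}
```

The smallest ratio comes from CBC enc (12.10 / 1.25, roughly 9.7x) and the largest from CBC dec (4.63 / 0.243, roughly 19.1x), which matches the "10x - 19x" summary after rounding.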

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>

Details

Provenance
Tianjia Zhang <tianjia.zhang@linux.alibaba.com> authored on Mar 1 2022, 10:56 AM
jukivili committed on Mar 2 2022, 7:45 PM
Parents
rC7d2983979866: hwf-arm: add ARMv8.2 optional crypto extension HW features