Add SM4 ARMv8/AArch64 assembly implementation

Description

* cipher/Makefile.am: Add 'sm4-aarch64.S'.
* cipher/sm4-aarch64.S: New.
* cipher/sm4.c (USE_AARCH64_SIMD): New.
(SM4_context) [USE_AARCH64_SIMD]: Add 'use_aarch64_simd'.
[USE_AARCH64_SIMD] (_gcry_sm4_aarch64_crypt)
(_gcry_sm4_aarch64_ctr_enc, _gcry_sm4_aarch64_cbc_dec)
(_gcry_sm4_aarch64_cfb_dec, _gcry_sm4_aarch64_crypt_blk1_8)
(sm4_aarch64_crypt_blk1_8): New.
(sm4_setkey): Enable ARMv8/AArch64 if supported by HW.
(_gcry_sm4_ctr_enc, _gcry_sm4_cbc_dec, _gcry_sm4_cfb_dec)
(_gcry_sm4_ocb_crypt, _gcry_sm4_ocb_auth) [USE_AARCH64_SIMD]:
Add ARMv8/AArch64 bulk functions.
* configure.ac: Add 'sm4-aarch64.lo'.
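The `sm4_setkey` entry enables the assembly path only when the hardware supports it, and the bulk functions then branch on the `use_aarch64_simd` flag. Below is a minimal sketch of that "detect once at key setup, store a flag" pattern. It assumes Linux and uses the kernel's `AT_HWCAP` / `HWCAP_ASIMD` bits as the signal. libgcrypt has its own hardware-feature detection, which this sketch does not reproduce, and `sm4_ctx_sketch`, `cpu_has_asimd` and `sm4_select_impl` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#include <asm/hwcap.h>  /* HWCAP_ASIMD */
#endif

/* Hypothetical, trimmed-down stand-in for SM4_context:
 * only the dispatch flag is modelled. */
struct sm4_ctx_sketch
{
  int use_aarch64_simd;
};

/* Nonzero if the kernel reports AdvSIMD (NEON) support. */
static int cpu_has_asimd (void)
{
#if defined(__aarch64__) && defined(__linux__)
  return (getauxval (AT_HWCAP) & HWCAP_ASIMD) != 0;
#else
  return 0; /* Not AArch64 Linux: keep the generic C path. */
#endif
}

/* Decide once, at key-setup time, which implementation to use. */
static void sm4_select_impl (struct sm4_ctx_sketch *ctx)
{
  assert (ctx != NULL);
  ctx->use_aarch64_simd = cpu_has_asimd ();
}
```

In this model a bulk entry point tests `ctx->use_aarch64_simd` and falls back to the generic C code when the flag is clear. This keeps the per-call cost to a single branch.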

This patch adds ARMv8/AArch64 bulk encryption/decryption. Bulk
functions process eight blocks in parallel.
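The eight-way batching can be sketched as a CTR-mode loop. It builds up to eight counter blocks, encrypts them with a single call, and lets a 1..8-block helper cover the tail. The helper plays the role the name `_gcry_sm4_aarch64_crypt_blk1_8` suggests. This is a portable illustration, not the assembly. `toy_ctx`, `toy_crypt_blk1_8` and `ctr_enc_bulk` are hypothetical names, and the block function is a toy XOR stand-in, not SM4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCKSIZE 16

/* Toy stand-in for SM4 (NOT SM4): XOR each block with a 16-byte key.
 * It exists only so the batching logic below can run. */
typedef struct { uint8_t key[BLOCKSIZE]; } toy_ctx;

/* Hypothetical "crypt 1..8 blocks" helper; the real AArch64 code keeps
 * up to eight blocks in SIMD registers and processes them together. */
static void toy_crypt_blk1_8 (const toy_ctx *ctx, uint8_t *out,
                              const uint8_t *in, size_t nblocks)
{
  assert (nblocks >= 1 && nblocks <= 8);
  for (size_t b = 0; b < nblocks; b++)
    for (size_t i = 0; i < BLOCKSIZE; i++)
      out[b * BLOCKSIZE + i] = in[b * BLOCKSIZE + i] ^ ctx->key[i];
}

/* Increment a 128-bit big-endian counter in place. */
static void ctr_inc (uint8_t ctr[BLOCKSIZE])
{
  for (int i = BLOCKSIZE - 1; i >= 0; i--)
    if (++ctr[i] != 0)
      break;
}

/* CTR mode in batches of at most eight blocks: generate the counter
 * blocks, encrypt them in one call, XOR the keystream into the data. */
static void ctr_enc_bulk (const toy_ctx *ctx, uint8_t ctr[BLOCKSIZE],
                          uint8_t *out, const uint8_t *in, size_t nblocks)
{
  uint8_t ks[8 * BLOCKSIZE];

  while (nblocks > 0)
    {
      size_t n = nblocks < 8 ? nblocks : 8;

      for (size_t b = 0; b < n; b++)
        {
          memcpy (ks + b * BLOCKSIZE, ctr, BLOCKSIZE);
          ctr_inc (ctr);
        }
      toy_crypt_blk1_8 (ctx, ks, ks, n);
      for (size_t i = 0; i < n * BLOCKSIZE; i++)
        out[i] = in[i] ^ ks[i];

      in += n * BLOCKSIZE;
      out += n * BLOCKSIZE;
      nblocks -= n;
    }
}
```

Because each counter block is independent, batching gives the same output and the same final counter as processing one block at a time. The same holds for CBC/CFB decryption and OCB, which is why those modes benefit from the bulk functions.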

Benchmark on T-Head Yitian-710 2.75 GHz:

Before:
     SM4 |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 CBC enc |     12.10 ns/B     78.81 MiB/s     33.28 c/B      2750
 CBC dec |      7.19 ns/B     132.6 MiB/s     19.77 c/B      2750
 CFB enc |     12.14 ns/B     78.58 MiB/s     33.37 c/B      2750
 CFB dec |      7.24 ns/B     131.8 MiB/s     19.90 c/B      2750
 CTR enc |      7.24 ns/B     131.7 MiB/s     19.90 c/B      2750
 CTR dec |      7.24 ns/B     131.7 MiB/s     19.91 c/B      2750
 GCM enc |      9.49 ns/B     100.4 MiB/s     26.11 c/B      2750
 GCM dec |      9.49 ns/B     100.5 MiB/s     26.10 c/B      2750
GCM auth |      2.25 ns/B     423.1 MiB/s      6.20 c/B      2750
 OCB enc |      7.35 ns/B     129.8 MiB/s     20.20 c/B      2750
 OCB dec |      7.36 ns/B     129.6 MiB/s     20.23 c/B      2750
OCB auth |      7.29 ns/B     130.8 MiB/s     20.04 c/B      2749

After (~55% faster for the parallelizable modes; CBC and CFB encryption are serial and unchanged):
     SM4 |  nanosecs/byte   mebibytes/sec   cycles/byte  auto Mhz
 CBC enc |     12.10 ns/B     78.79 MiB/s     33.28 c/B      2750
 CBC dec |      4.63 ns/B     205.9 MiB/s     12.74 c/B      2749
 CFB enc |     12.14 ns/B     78.58 MiB/s     33.37 c/B      2750
 CFB dec |      4.64 ns/B     205.5 MiB/s     12.76 c/B      2750
 CTR enc |      4.69 ns/B     203.3 MiB/s     12.90 c/B      2750
 CTR dec |      4.69 ns/B     203.3 MiB/s     12.90 c/B      2750
 GCM enc |      4.88 ns/B     195.4 MiB/s     13.42 c/B      2750
 GCM dec |      4.88 ns/B     195.5 MiB/s     13.42 c/B      2750
GCM auth |     0.189 ns/B      5048 MiB/s     0.520 c/B      2750
 OCB enc |      4.86 ns/B     196.0 MiB/s     13.38 c/B      2750
 OCB dec |      4.90 ns/B     194.7 MiB/s     13.47 c/B      2750
OCB auth |      4.79 ns/B     199.0 MiB/s     13.18 c/B      2750
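The columns above are tied together by the clock. MiB/s is 10^9 / (ns/B) / 2^20, and c/B is ns/B times the clock in GHz. The sketch below uses these relations to reproduce the table's columns from the ns/B figures and to compute the relative gain behind the "~55%" figure. The helper names are hypothetical, and the formulas are assumptions that happen to match the rows shown.

```c
#include <assert.h>

/* Throughput in MiB/s (2^20 bytes) from a ns-per-byte figure. */
static double mib_per_s (double ns_per_byte)
{
  assert (ns_per_byte > 0.0);
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Cycles per byte from ns/B and the clock in MHz. */
static double cycles_per_byte (double ns_per_byte, double mhz)
{
  return ns_per_byte * mhz / 1000.0;
}

/* Throughput gain in percent going from "before" to "after". */
static double speedup_percent (double ns_before, double ns_after)
{
  assert (ns_after > 0.0);
  return (ns_before / ns_after - 1.0) * 100.0;
}
```

For CBC decryption, 7.19 -> 4.63 ns/B works out to about a 55% throughput gain (132.6 -> 205.9 MiB/s). At 2750 MHz, 4.63 ns/B is about 12.7 c/B, matching the table. GCM encryption goes from 9.49 to 4.88 ns/B, which nearly doubles throughput.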
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>

Details

Provenance
Authored by Tianjia Zhang <tianjia.zhang@linux.alibaba.com> on Feb 23 2022, 5:23 AM
Committed by jukivili on Feb 23 2022, 5:50 PM
Parents
rC83e1649edd5e: Move VPUSH_API/VPOP_API macros to common header