Add ARM NEON assembly implementation of Salsa20
3ff9d2571c18 (unpublished)

Not On Permanent Ref: This commit is not an ancestor of any permanent ref.

Description

* cipher/Makefile.am: Add 'salsa20-armv7-neon.S'.
* cipher/salsa20-armv7-neon.S: New.
* cipher/salsa20.c [USE_ARM_NEON_ASM]: New macro.
(struct SALSA20_context_s, salsa20_core_t, salsa20_keysetup_t)
(salsa20_ivsetup_t): New.
(SALSA20_context_t) [USE_ARM_NEON_ASM]: Add 'use_neon'.
(SALSA20_context_t): Add 'keysetup', 'ivsetup' and 'core'.
(salsa20_core): Change 'src' argument to 'ctx'.
[USE_ARM_NEON_ASM] (_gcry_arm_neon_salsa20_encrypt): New prototype.
[USE_ARM_NEON_ASM] (salsa20_core_neon, salsa20_keysetup_neon)
(salsa20_ivsetup_neon): New.
(salsa20_do_setkey): Set up keysetup, ivsetup and core with default
functions.
(salsa20_do_setkey) [USE_ARM_NEON_ASM]: When NEON support is detected,
set keysetup, ivsetup and core to the ARM NEON functions.
(salsa20_do_setkey): Call 'ctx->keysetup'.
(salsa20_setiv): Call 'ctx->ivsetup'.
(salsa20_do_encrypt_stream) [USE_ARM_NEON_ASM]: Process large buffers
in ARM NEON implementation.
(salsa20_do_encrypt_stream): Call 'ctx->core' instead of directly
calling 'salsa20_core'.
(selftest): Add test to check large buffer processing and block counter
updating.
* configure.ac [neonsupport]: Add 'salsa20-armv7-neon.lo'.
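
The salsa20.c entries above describe per-context function pointers (keysetup, ivsetup, core) that salsa20_do_setkey fills with the generic functions by default and with the NEON functions when NEON support is detected. The following is a minimal sketch of that dispatch pattern only, assuming simplified signatures: the demo_* names, the have_neon parameter (standing in for the runtime CPU feature check) and the stub "core" functions are hypothetical and are not libgcrypt's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down context: only what is needed to show how
   keysetup/ivsetup/core are chosen once per key. Not the real
   SALSA20_context_t. */
typedef struct demo_ctx demo_ctx_t;
typedef void (*demo_keysetup_t)(demo_ctx_t *ctx, const uint8_t *key, size_t keylen);
typedef void (*demo_ivsetup_t)(demo_ctx_t *ctx, const uint8_t *iv);
typedef const char *(*demo_core_t)(demo_ctx_t *ctx);

struct demo_ctx {
  uint8_t key[32];
  uint8_t iv[8];
  int use_neon;               /* plays the role of 'use_neon' */
  demo_keysetup_t keysetup;
  demo_ivsetup_t ivsetup;
  demo_core_t core;
};

/* Portable defaults. The cores are stubs that only report which path ran. */
static void keysetup_generic(demo_ctx_t *ctx, const uint8_t *key, size_t keylen) {
  memcpy(ctx->key, key, keylen);
}
static void ivsetup_generic(demo_ctx_t *ctx, const uint8_t *iv) {
  memcpy(ctx->iv, iv, sizeof ctx->iv);
}
static const char *core_generic(demo_ctx_t *ctx) { (void)ctx; return "generic"; }

/* Stand-ins for NEON variants; real ones may lay out the state differently. */
static void keysetup_neon(demo_ctx_t *ctx, const uint8_t *key, size_t keylen) {
  keysetup_generic(ctx, key, keylen);
}
static void ivsetup_neon(demo_ctx_t *ctx, const uint8_t *iv) {
  ivsetup_generic(ctx, iv);
}
static const char *core_neon(demo_ctx_t *ctx) { (void)ctx; return "neon"; }

static int demo_do_setkey(demo_ctx_t *ctx, const uint8_t *key, size_t keylen,
                          int have_neon) {
  if (keylen != 16 && keylen != 32)
    return -1;                          /* Salsa20 takes 128- or 256-bit keys */
  ctx->use_neon = 0;
  ctx->keysetup = keysetup_generic;     /* defaults first ... */
  ctx->ivsetup = ivsetup_generic;
  ctx->core = core_generic;
  if (have_neon) {                      /* ... overridden when NEON is present */
    ctx->use_neon = 1;
    ctx->keysetup = keysetup_neon;
    ctx->ivsetup = ivsetup_neon;
    ctx->core = core_neon;
  }
  ctx->keysetup(ctx, key, keylen);      /* always called through the pointer */
  return 0;
}

static void demo_setiv(demo_ctx_t *ctx, const uint8_t *iv) {
  assert(ctx->ivsetup != NULL);         /* setkey must have run first */
  ctx->ivsetup(ctx, iv);
}
```

Choosing the functions once at setkey time keeps the per-call paths (setiv, encrypt) free of feature checks; they simply call through the pointers.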

This patch adds a fast ARM NEON assembly implementation of Salsa20. The
implementation gains extra speed by processing three blocks in parallel with
the help of the ARM NEON vector processing unit.
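
To illustrate what processing three blocks in parallel means, here is a portable C sketch; it is neither the NEON assembly nor libgcrypt code, and the function names are hypothetical. It applies the standard Salsa20 double round to three states that differ only in their block counter, running each quarter-round on all three lanes before the next one, so the lanes are independent work a vector unit can overlap. The state layout (64-bit block counter in words 8 and 9) follows the Salsa20 specification. The accompanying check that three single-block calls give the same output and counter, including a carry into the high counter word, is in the spirit of the new selftest, not a copy of it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t rotl32(uint32_t v, int c) { return (v << c) | (v >> (32 - c)); }

/* Salsa20 quarter-round on words a, b, c, d of state x (in place). */
static void quarterround(uint32_t *x, int a, int b, int c, int d) {
  x[b] ^= rotl32(x[a] + x[d], 7);
  x[c] ^= rotl32(x[b] + x[a], 9);
  x[d] ^= rotl32(x[c] + x[b], 13);
  x[a] ^= rotl32(x[d] + x[c], 18);
}

/* Word indices for one double round: four column rounds, then four row rounds. */
static const int QR_IDX[8][4] = {
  {0, 4, 8, 12}, {5, 9, 13, 1}, {10, 14, 2, 6}, {15, 3, 7, 11},
  {0, 1, 2, 3},  {5, 6, 7, 4},  {10, 11, 8, 9}, {15, 12, 13, 14},
};

/* One block; the block counter lives in words 8 (low) and 9 (high). */
static void salsa20_block(uint32_t out[16], uint32_t in[16], unsigned rounds) {
  uint32_t x[16];
  assert(rounds % 2 == 0);              /* 20, 12, 8: whole double rounds */
  memcpy(x, in, sizeof x);
  for (unsigned r = 0; r < rounds; r += 2)
    for (int q = 0; q < 8; q++)
      quarterround(x, QR_IDX[q][0], QR_IDX[q][1], QR_IDX[q][2], QR_IDX[q][3]);
  for (int i = 0; i < 16; i++)
    out[i] = x[i] + in[i];              /* feed-forward */
  if (++in[8] == 0)
    in[9]++;                            /* advance counter with carry */
}

/* Three blocks in lockstep, plain C without NEON intrinsics. */
static void salsa20_3blocks(uint32_t out[3][16], uint32_t in[16], unsigned rounds) {
  uint32_t init[3][16], x[3][16];
  assert(rounds % 2 == 0);
  for (int l = 0; l < 3; l++) {
    memcpy(init[l], in, sizeof init[l]);
    init[l][8] = in[8] + (uint32_t)l;           /* counter + l ... */
    init[l][9] = in[9] + (init[l][8] < in[8]);  /* ... with carry */
    memcpy(x[l], init[l], sizeof x[l]);
  }
  for (unsigned r = 0; r < rounds; r += 2)
    for (int q = 0; q < 8; q++)
      for (int l = 0; l < 3; l++)               /* same step on every lane */
        quarterround(x[l], QR_IDX[q][0], QR_IDX[q][1], QR_IDX[q][2], QR_IDX[q][3]);
  for (int l = 0; l < 3; l++)
    for (int i = 0; i < 16; i++)
      out[l][i] = x[l][i] + init[l][i];
  uint32_t lo = in[8] + 3;                      /* counter += 3 */
  in[9] += (lo < in[8]);
  in[8] = lo;
}
```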

This implementation is based on public domain code by Peter Schwabe and D. J.
Bernstein, which is available in the SUPERCOP benchmarking framework. For more
details on this work, see the paper "NEON crypto" by Daniel J. Bernstein and
Peter Schwabe:

http://cryptojedi.org/papers/#neoncrypto

Benchmark results on Cortex-A8 (1008 MHz):

Before:
 SALSA20        |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     18.88 ns/B     50.51 MiB/s     19.03 c/B
     STREAM dec |     18.89 ns/B     50.49 MiB/s     19.04 c/B

 SALSA20R12     |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     13.60 ns/B     70.14 MiB/s     13.71 c/B
     STREAM dec |     13.60 ns/B     70.13 MiB/s     13.71 c/B

After:
 SALSA20        |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      5.48 ns/B     174.1 MiB/s      5.52 c/B
     STREAM dec |      5.47 ns/B     174.2 MiB/s      5.52 c/B

 SALSA20R12     |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      3.65 ns/B     260.9 MiB/s      3.68 c/B
     STREAM dec |      3.65 ns/B     261.6 MiB/s      3.67 c/B
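
The three benchmark columns are tied together by the 1008 MHz clock: cycles/byte is roughly ns/B × 1.008, and MiB/s is 10^9 / (ns/B) / 2^20. The small sketch below recomputes those columns and the resulting speedups (about 3.4× for SALSA20 and 3.7× for SALSA20R12) from the ns/B figures in the tables; the helper names are illustrative only, and small mismatches against the printed values come from rounding.

```c
#include <assert.h>
#include <stdio.h>

#define CPU_MHZ 1008.0   /* Cortex-A8 clock quoted with the results */

/* cycles/byte = ns/byte * cycles per ns = ns/B * MHz / 1000 */
static double cycles_per_byte(double ns_per_byte) {
  return ns_per_byte * CPU_MHZ / 1000.0;
}

/* MiB/s = bytes per second / 2^20 */
static double mib_per_sec(double ns_per_byte) {
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

static double speedup(double before_ns, double after_ns) {
  return before_ns / after_ns;
}

/* Tolerance compare, since the table values are rounded. */
static int close_to(double a, double b, double tol) {
  double d = a > b ? a - b : b - a;
  return d <= tol;
}

static void print_row(const char *name, double before_ns, double after_ns) {
  printf("%-11s %6.2f -> %5.2f c/B  (%.2fx faster)\n", name,
         cycles_per_byte(before_ns), cycles_per_byte(after_ns),
         speedup(before_ns, after_ns));
}
```

For example, print_row("SALSA20R12", 13.60, 3.65) summarizes the reduced-round variant's encryption gain in cycles per byte.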

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance: authored by jukivili on Oct 26 2013, 2:00 PM
Parents: rC5a3d43485efd (Add AMD64 assembly implementation of Salsa20)
Branches: Unknown
Tags: Unknown

Event Timeline

Jussi Kivilinna <jussi.kivilinna@iki.fi> committed rC3ff9d2571c18: Add ARM NEON assembly implementation of Salsa20 (authored by Jussi Kivilinna <jussi.kivilinna@iki.fi>) on Oct 28 2013, 3:12 PM.