poly1305: add AMD64/SSE2 optimized implementation
297532602ed2 (Unpublished)

Not On Permanent Ref: This commit is not an ancestor of any permanent ref.

Description

poly1305: add AMD64/SSE2 optimized implementation

* cipher/Makefile.am: Add 'poly1305-sse2-amd64.S'.
* cipher/poly1305-internal.h (POLY1305_USE_SSE2)
(POLY1305_SSE2_BLOCKSIZE, POLY1305_SSE2_STATESIZE)
(POLY1305_SSE2_ALIGNMENT): New.
(POLY1305_LARGEST_BLOCKSIZE, POLY1305_LARGEST_STATESIZE)
(POLY1305_STATE_ALIGNMENT): Use SSE2 versions when needed.
* cipher/poly1305-sse2-amd64.S: New.
* cipher/poly1305.c [POLY1305_USE_SSE2]
(_gcry_poly1305_amd64_sse2_init_ext)
(_gcry_poly1305_amd64_sse2_finish_ext)
(_gcry_poly1305_amd64_sse2_blocks, poly1305_amd64_sse2_ops): New.
(_gcry_poly1305_init) [POLY1305_USE_SSE2]: Use SSE2 version.
* configure.ac [host=x86_64]: Add 'poly1305-sse2-amd64.lo'.

Add Andrew Moon's public domain SSE2 implementation of Poly1305. Original
source is available at: https://github.com/floodyberry/poly1305-opt
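The ChangeLog entries above describe two mechanisms: size macros that switch to the SSE2 values when that backend is compiled in, and an ops table chosen at init time. The following is only a minimal sketch of that general pattern, not the code from this commit. Every DEMO_/demo_ name and every numeric size is a hypothetical stand-in, and the "blocks" functions are placeholders that only report how many bytes form whole blocks; they do not compute a MAC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build switch; in the real tree a macro like this would be
   derived from the configure result for x86-64 hosts. */
#define DEMO_USE_SSE2 1

/* Illustrative sizes only, not taken from cipher/poly1305-internal.h. */
#define DEMO_REF_BLOCKSIZE 16
#define DEMO_REF_STATESIZE 64

#ifdef DEMO_USE_SSE2
# define DEMO_SSE2_BLOCKSIZE 32
# define DEMO_SSE2_STATESIZE 248
/* The shared context must be large enough for the biggest backend. */
# define DEMO_LARGEST_BLOCKSIZE DEMO_SSE2_BLOCKSIZE
# define DEMO_LARGEST_STATESIZE DEMO_SSE2_STATESIZE
#else
# define DEMO_LARGEST_BLOCKSIZE DEMO_REF_BLOCKSIZE
# define DEMO_LARGEST_STATESIZE DEMO_REF_STATESIZE
#endif

/* One table of entry points per backend. */
typedef struct demo_ops
{
  const char *name;
  size_t block_size;
  /* Placeholder: returns the number of bytes that form whole blocks. */
  size_t (*blocks) (void *state, const unsigned char *m, size_t len);
} demo_ops_t;

static size_t
demo_ref_blocks (void *state, const unsigned char *m, size_t len)
{
  (void) state;
  (void) m;
  return len - len % DEMO_REF_BLOCKSIZE;
}

static const demo_ops_t demo_ref_ops =
  { "ref", DEMO_REF_BLOCKSIZE, demo_ref_blocks };

#ifdef DEMO_USE_SSE2
static size_t
demo_sse2_blocks (void *state, const unsigned char *m, size_t len)
{
  (void) state;
  (void) m;
  return len - len % DEMO_SSE2_BLOCKSIZE;
}

static const demo_ops_t demo_sse2_ops =
  { "sse2", DEMO_SSE2_BLOCKSIZE, demo_sse2_blocks };
#endif

/* Context sized by the LARGEST_* macros so any backend fits. */
typedef struct demo_ctx
{
  unsigned char state[DEMO_LARGEST_STATESIZE];
  unsigned char buffer[DEMO_LARGEST_BLOCKSIZE];
  const demo_ops_t *ops;
} demo_ctx_t;

/* Start with the generic backend, then override it when the optimized
   one was compiled in. */
static void
demo_init (demo_ctx_t *ctx)
{
  assert (ctx != NULL);
  ctx->ops = &demo_ref_ops;
#ifdef DEMO_USE_SSE2
  ctx->ops = &demo_sse2_ops;
#endif
}
```

Callers only go through ctx->ops, so adding another backend in this sketch would mean adding one more table and one more conditional override in demo_init.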

Benchmarks on Intel i5-4570 (Haswell):

Old:
           |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305  |     0.844 ns/B    1130.2 MiB/s      2.70 c/B

New:
           |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305  |     0.448 ns/B    2129.5 MiB/s      1.43 c/B

Benchmarks on Intel i5-2450M (Sandy Bridge):

Old:
           |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305  |      1.25 ns/B     763.0 MiB/s      3.12 c/B

New:
           |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305  |     0.605 ns/B    1575.9 MiB/s      1.51 c/B
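
The three benchmark columns are related by simple arithmetic, which the small helpers below sketch. The function names are made up for illustration and are not part of libgcrypt or its bench tool. The only inputs are the figures reported above, so small differences from the printed MiB/s values are expected because ns/B is rounded to three digits.

```c
#include <assert.h>

/* Throughput in MiB/s from nanoseconds per byte (1 MiB = 2^20 bytes). */
static double
demo_mib_per_sec (double ns_per_byte)
{
  assert (ns_per_byte > 0.0);
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Clock frequency in GHz implied by a (cycles/byte, ns/byte) pair. */
static double
demo_implied_ghz (double cycles_per_byte, double ns_per_byte)
{
  assert (ns_per_byte > 0.0);
  return cycles_per_byte / ns_per_byte;
}

/* Speedup factor of the new implementation over the old one. */
static double
demo_speedup (double old_ns_per_byte, double new_ns_per_byte)
{
  assert (old_ns_per_byte > 0.0 && new_ns_per_byte > 0.0);
  return old_ns_per_byte / new_ns_per_byte;
}
```

With the numbers above, demo_speedup (0.844, 0.448) is about 1.88 for the i5-4570 and demo_speedup (1.25, 0.605) is about 2.07 for the i5-2450M.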

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

Details

Provenance
jukivili authored on May 11 2014, 7:18 PM
Parents
rCe813958419b0: Add Poly1305 based cipher AEAD mode
Branches
Unknown
Tags
Unknown

Event Timeline

Jussi Kivilinna <jussi.kivilinna@iki.fi> committed rC297532602ed2: poly1305: add AMD64/SSE2 optimized implementation (authored by Jussi Kivilinna <jussi.kivilinna@iki.fi>). May 12 2014, 7:32 PM