New Poly1305 implementations

* cipher/Makefile.am: Include '../mpi' for 'longlong.h'; Remove
'poly1305-sse2-amd64.S', 'poly1305-avx2-amd64.S' and
'poly1305-armv7-neon.S'.
* cipher/poly1305-armv7-neon.S: Remove.
* cipher/poly1305-avx2-amd64.S: Remove.
* cipher/poly1305-sse2-amd64.S: Remove.
* cipher/poly1305-internal.h (POLY1305_BLOCKSIZE)
(POLY1305_STATE): New.
(POLY1305_SYSV_FUNC_ABI, POLY1305_REF_BLOCKSIZE)
(POLY1305_REF_STATESIZE, POLY1305_REF_ALIGNMENT)
(POLY1305_USE_SSE2, POLY1305_SSE2_BLOCKSIZE, POLY1305_SSE2_STATESIZE)
(POLY1305_SSE2_ALIGNMENT, POLY1305_USE_AVX2, POLY1305_AVX2_BLOCKSIZE)
(POLY1305_AVX2_STATESIZE, POLY1305_AVX2_ALIGNMENT)
(POLY1305_USE_NEON, POLY1305_NEON_BLOCKSIZE, POLY1305_NEON_STATESIZE)
(POLY1305_NEON_ALIGNMENT, POLY1305_LARGEST_BLOCKSIZE)
(POLY1305_LARGEST_STATESIZE, POLY1305_LARGEST_ALIGNMENT)
(POLY1305_STATE_BLOCKSIZE, POLY1305_STATE_STATESIZE)
(POLY1305_STATE_ALIGNMENT, OPS_FUNC_ABI, poly1305_key_s)
(poly1305_ops_s): Remove.
(poly1305_context_s): Rewrite.
* cipher/poly1305.c (_gcry_poly1305_amd64_sse2_init_ext)
(_gcry_poly1305_amd64_sse2_finish_ext)
(_gcry_poly1305_amd64_sse2_blocks, poly1305_amd64_sse2_ops)
(poly1305_init_ext_ref32, poly1305_blocks_ref32)
(poly1305_finish_ext_ref32, poly1305_default_ops)
(_gcry_poly1305_amd64_avx2_init_ext)
(_gcry_poly1305_amd64_avx2_finish_ext)
(_gcry_poly1305_amd64_avx2_blocks)
(poly1305_amd64_avx2_ops, poly1305_get_state): Remove.
(poly1305_init): Rewrite.
(USE_MPI_64BIT, USE_MPI_32BIT): New.
[USE_MPI_64BIT] (ADD_1305_64, MUL_MOD_1305_64, poly1305_blocks)
(poly1305_final): New implementation using 64-bit limbs.
[USE_MPI_32BIT] (UMUL_ADD_32, ADD_1305_32, MUL_MOD_1305_32)
(poly1305_blocks): New implementation using 32-bit limbs.
(_gcry_poly1305_update, _gcry_poly1305_finish)
(_gcry_poly1305_init): Adapt to new implementation.
* configure.ac: Remove 'poly1305-sse2-amd64.lo',
'poly1305-avx2-amd64.lo' and 'poly1305-armv7-neon.lo'.
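
As a rough illustration of what a Poly1305 built on 64-bit limbs involves (the [USE_MPI_64BIT] path above), here is a portable C sketch. It is not the code from this commit: it assumes a compiler providing the GCC/Clang 'unsigned __int128' extension instead of the 'longlong.h' macros, and the names poly1305_sketch_ctx, sketch_init, sketch_block, sketch_finish and poly1305_sketch are hypothetical. The accumulator is kept as three 64-bit limbs with only a few bits in the top one, and anything at or above 2^130 is folded back using 2^130 = 5 (mod 2^130 - 5).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef unsigned __int128 u128;  /* GCC/Clang extension (assumption) */
typedef struct  /* hypothetical sketch, not libgcrypt's poly1305_context_s */
{
  uint64_t r0, r1, s1;  /* clamped r = r0 + r1*2^64; s1 = 5*r1/4 */
  uint64_t h0, h1, h2;  /* accumulator h = h0 + h1*2^64 + h2*2^128 */
  uint64_t pad0, pad1;  /* s, added to h at the end */
} poly1305_sketch_ctx;

static uint64_t load64_le (const uint8_t *p)
{
  uint64_t v = 0;
  for (int i = 7; i >= 0; i--) v = (v << 8) | p[i];
  return v;
}
static void store64_le (uint8_t *p, uint64_t v)
{
  for (int i = 0; i < 8; i++) p[i] = (uint8_t)(v >> (8 * i));
}

static void sketch_init (poly1305_sketch_ctx *c, const uint8_t key[32])
{
  c->r0 = load64_le (key) & 0x0ffffffc0fffffffULL;  /* clamp r */
  c->r1 = load64_le (key + 8) & 0x0ffffffc0ffffffcULL;
  c->s1 = c->r1 + (c->r1 >> 2);  /* r1 % 4 == 0, so this is 5*r1/4 */
  c->h0 = c->h1 = c->h2 = 0;
  c->pad0 = load64_le (key + 16);
  c->pad1 = load64_le (key + 24);
}

/* h = (h + m + padbit*2^128) * r, partially reduced mod p = 2^130 - 5. */
static void sketch_block (poly1305_sketch_ctx *c, const uint8_t m[16],
                          uint64_t padbit)
{
  uint64_t h0 = c->h0, h1 = c->h1, h2 = c->h2, fold;
  u128 d0, d1;
  /* h += m, propagating carries. */
  d0 = (u128)h0 + load64_le (m);
  h0 = (uint64_t)d0;
  d1 = (u128)h1 + load64_le (m + 8) + (uint64_t)(d0 >> 64);
  h1 = (uint64_t)d1;
  h2 += (uint64_t)(d1 >> 64) + padbit;
  /* h *= r.  As 2^130 = 5 (mod p): h1*r1*2^128 = h1*s1 and
   * h2*r1*2^192 = h2*s1*2^64 (mod p). */
  d0 = (u128)h0 * c->r0 + (u128)h1 * c->s1;
  d1 = (u128)h0 * c->r1 + (u128)h1 * c->r0 + (u128)h2 * c->s1;
  h2 = h2 * c->r0;
  h0 = (uint64_t)d0;
  d1 += (uint64_t)(d0 >> 64);
  h1 = (uint64_t)d1;
  h2 += (uint64_t)(d1 >> 64);
  /* Partial reduction: replace (h2 >> 2) * 2^130 with (h2 >> 2) * 5. */
  fold = (h2 >> 2) + (h2 & ~(uint64_t)3);
  h2 &= 3;
  d0 = (u128)h0 + fold;
  h0 = (uint64_t)d0;
  d1 = (u128)h1 + (uint64_t)(d0 >> 64);
  h1 = (uint64_t)d1;
  h2 += (uint64_t)(d1 >> 64);
  c->h0 = h0; c->h1 = h1; c->h2 = h2;
}

static void sketch_finish (poly1305_sketch_ctx *c, uint8_t tag[16])
{
  uint64_t g0, g1, g2, mask, h0 = c->h0, h1 = c->h1;
  u128 t;
  /* g = h + 5; if g >= 2^130 then h >= p and h - p = g mod 2^130. */
  t = (u128)h0 + 5;
  g0 = (uint64_t)t;
  t = (u128)h1 + (uint64_t)(t >> 64);
  g1 = (uint64_t)t;
  g2 = c->h2 + (uint64_t)(t >> 64);
  mask = 0 - (g2 >> 2);             /* all ones when h >= p */
  h0 = (h0 & ~mask) | (g0 & mask);
  h1 = (h1 & ~mask) | (g1 & mask);
  t = (u128)h0 + c->pad0;           /* tag = (h + s) mod 2^128 */
  store64_le (tag, (uint64_t)t);
  store64_le (tag + 8, h1 + c->pad1 + (uint64_t)(t >> 64));
}

/* One-shot MAC over msg[0..len) with a 32-byte one-time key. */
static void poly1305_sketch (uint8_t tag[16], const uint8_t *msg, size_t len,
                             const uint8_t key[32])
{
  poly1305_sketch_ctx c;
  sketch_init (&c, key);
  for (; len >= 16; msg += 16, len -= 16)
    sketch_block (&c, msg, 1);      /* full block: add the 2^128 bit */
  if (len > 0)
    {
      uint8_t last[16] = { 0 };
      memcpy (last, msg, len);
      last[len] = 1;                /* short block: append 0x01, zero-pad */
      sketch_block (&c, last, 0);
    }
  sketch_finish (&c, tag);
}
```

Full 16-byte blocks get the 2^128 pad bit, while a short final block is instead padded with a 0x01 byte and zeros, as specified in RFC 8439. Running the sketch on the RFC 8439 section 2.5.2 test vector is a quick way to check the limb arithmetic.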

Intel Core i7-4790K CPU @ 4.00GHz (x86_64):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305       |     0.284 ns/B    3358.6 MiB/s      1.14 c/B

Intel Core i7-4790K CPU @ 4.00GHz (i386):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305       |     0.888 ns/B    1073.9 MiB/s      3.55 c/B

Cortex-A53 @ 1152 MHz (armv7):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305       |      4.40 ns/B     216.7 MiB/s      5.07 c/B

Cortex-A53 @ 1152 MHz (aarch64):
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 POLY1305       |      2.60 ns/B     367.0 MiB/s      2.99 c/B
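
The three columns in each benchmark row are related by plain unit conversions: MiB/s = 10^9 / (ns/B) / 2^20, and c/B = (ns/B) x (clock in GHz), using 4.00 for the i7-4790K and 1.152 for the Cortex-A53. The helpers below are hypothetical and simply encode those two formulas; because ns/B is printed rounded, recomputed values agree with the table only to within rounding.

```c
/* Hypothetical helpers, not part of libgcrypt's benchmark tool. */

/* 10^9 ns per second, 2^20 bytes per MiB. */
static double mib_per_sec (double ns_per_byte)
{
  return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* One nanosecond at f GHz is f clock cycles. */
static double cycles_per_byte (double ns_per_byte, double clock_ghz)
{
  return ns_per_byte * clock_ghz;
}
```

For the armv7 row, cycles_per_byte(4.40, 1.152) is about 5.07 and mib_per_sec(4.40) is about 216.7, matching the table.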

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>