4c0e244fc53e

Camellia AES-NI/AVX/AVX2 size optimization

* cipher/camellia-aesni-avx-amd64.S: Use loop for handling repeating
'(enc|dec)_rounds16/fls16' portions of encryption/decryption.
* cipher/camellia-aesni-avx2-amd64.S: Use loop for handling repeating
'(enc|dec)_rounds32/fls32' portions of encryption/decryption.

Use a round+fls loop to reduce the binary size of the Camellia AES-NI/AVX/AVX2
implementations. This also gives a small performance boost on AMD Zen2.
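
As a rough illustration of the round+fls loop idea, here is a minimal C sketch. It is not the assembly from this commit: `six_rounds`, `fl_layer`, `encrypt_looped`, `struct sketch_state` and the subkey indexing are hypothetical stand-ins. It assumes only the standard Camellia structure, in which 128-bit keys use 18 rounds with 2 FL/FL^-1 layers and 192/256-bit keys use 24 rounds with 3 layers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping only; the real code keeps 16 or 32 blocks
   in vector registers and does actual cipher work. */
struct sketch_state {
    unsigned rounds_done;  /* Feistel rounds executed so far */
    unsigned fls_done;     /* FL/FL^-1 layers executed so far */
};

/* Stand-in for one '(enc|dec)_rounds16/32' portion: six rounds. */
static void six_rounds(struct sketch_state *st, size_t subkey_index)
{
    (void)subkey_index;    /* a real implementation would load subkeys here */
    st->rounds_done += 6;
}

/* Stand-in for one 'fls16/fls32' portion: one FL/FL^-1 layer. */
static void fl_layer(struct sketch_state *st, size_t subkey_index)
{
    (void)subkey_index;
    st->fls_done += 1;
}

/* Looped form: the round and fls bodies each appear once in the code,
   instead of once per six-round group as in a fully unrolled version. */
static void encrypt_looped(struct sketch_state *st, unsigned key_length_bytes)
{
    assert(key_length_bytes == 16 || key_length_bytes == 24 ||
           key_length_bytes == 32);

    unsigned groups = (key_length_bytes == 16) ? 3 : 4;
    size_t k = 0;          /* illustrative subkey index, not the real layout */

    for (unsigned g = 0; ; g++) {
        six_rounds(st, k);
        k += 6;
        if (g + 1 == groups)
            break;         /* no FL layer after the final group */
        fl_layer(st, k);
        k += 2;
    }
}
```

Calling `encrypt_looped` on a zeroed `struct sketch_state` with a key length of 16 leaves 18 rounds and 2 FL layers counted; with 32 it leaves 24 rounds and 3 FL layers. The code-size saving comes from emitting the round and FL bodies once rather than three or four times.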

Before:

 text    data     bss     dec     hex filename
63877       0       0   63877    f985 cipher/.libs/camellia-aesni-avx2-amd64.o
59623       0       0   59623    e8e7 cipher/.libs/camellia-aesni-avx-amd64.o

After:

 text    data     bss     dec     hex filename
22999       0       0   22999    59d7 cipher/.libs/camellia-aesni-avx2-amd64.o
25047       0       0   25047    61d7 cipher/.libs/camellia-aesni-avx-amd64.o
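
To put the text-size figures above in relative terms, the following small C helper (the name `size_reduction_percent` is made up for this illustration) computes the rounded percentage reduction from the 'text' columns.

```c
#include <assert.h>

/* Percentage by which 'after' is smaller than 'before', rounded to the
   nearest whole percent. Inputs are section sizes in bytes. */
static unsigned size_reduction_percent(unsigned long before, unsigned long after)
{
    assert(before > 0 && after <= before);
    return (unsigned)(((before - after) * 100 + before / 2) / before);
}
```

With the numbers reported above, `size_reduction_percent(63877, 22999)` gives 64 for the AVX2 object and `size_reduction_percent(59623, 25047)` gives 58 for the AVX object.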

Benchmark on AMD Ryzen 7 3700X:

Before:
Cipher:
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz

 CBC dec |     0.670 ns/B      1424 MiB/s      2.88 c/B      4300
 CFB dec |     0.667 ns/B      1430 MiB/s      2.87 c/B      4300
 CTR enc |     0.677 ns/B      1410 MiB/s      2.91 c/B      4300
 CTR dec |     0.676 ns/B      1412 MiB/s      2.90 c/B      4300
 OCB enc |     0.696 ns/B      1370 MiB/s      2.98 c/B      4275
 OCB dec |     0.698 ns/B      1367 MiB/s      2.98 c/B      4275
OCB auth |     0.683 ns/B      1395 MiB/s      2.94 c/B      4300

After (~8% faster):
CAMELLIA128 | nanosecs/byte mebibytes/sec cycles/byte auto Mhz

 CBC dec |     0.611 ns/B      1561 MiB/s      2.64 c/B      4313
 CFB dec |     0.616 ns/B      1549 MiB/s      2.65 c/B      4312
 CTR enc |     0.625 ns/B      1525 MiB/s      2.69 c/B      4300
 CTR dec |     0.625 ns/B      1526 MiB/s      2.69 c/B      4299
 OCB enc |     0.639 ns/B      1493 MiB/s      2.75 c/B      4307
 OCB dec |     0.642 ns/B      1485 MiB/s      2.76 c/B      4301
OCB auth |     0.631 ns/B      1512 MiB/s      2.71 c/B      4300
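
The "~8% faster" figure can be cross-checked from the ns/B columns. The sketch below is one way to do that, using a hypothetical helper `throughput_gain`; it treats "faster" as the relative throughput gain, which is an assumption about how the figure was derived.

```c
#include <assert.h>

/* Relative throughput gain when the cost per byte drops from
   'before_ns_per_byte' to 'after_ns_per_byte'; 0.08 means 8% faster. */
static double throughput_gain(double before_ns_per_byte, double after_ns_per_byte)
{
    assert(before_ns_per_byte > 0.0 && after_ns_per_byte > 0.0);
    return before_ns_per_byte / after_ns_per_byte - 1.0;
}
```

For example, `throughput_gain(0.677, 0.625)` for CTR enc is a little over 0.08, and `throughput_gain(0.670, 0.611)` for CBC dec is a little under 0.10, so the measured modes fall roughly in the 8% to 10% range.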