Tested patch with small change so that HWF_PPC_ARCH_3_00 is used instead of HWF_PPC_ARCH_3_10. Building bench-slope with "-O3 -flto" makes bug in new implementation visible. Without new implementations bench-slope is ok (testing with QEMU):
$ tests/bench-slope --disable-hwf ppc-arch_3_00 cipher chacha20 Cipher: CHACHA20 | nanosecs/byte mebibytes/sec cycles/byte STREAM enc | 2.35 ns/B 405.0 MiB/s - c/B STREAM dec | 2.32 ns/B 410.7 MiB/s - c/B POLY1305 enc | 2.46 ns/B 388.0 MiB/s - c/B POLY1305 dec | 2.34 ns/B 408.1 MiB/s - c/B POLY1305 auth | 0.238 ns/B 4003 MiB/s - c/B