- g10/Makefile.am (tofu_source) [USE_TOFU]: Remove sqrtu32.h and sqrtu32.c.
- g10/sqrtu32.h, g10/sqrtu32.c: Removed files.
- g10/tofu.c: Compare squares instead of square roots.
--
I am not sure you measured the performance of the original code, but I'd think that even using floats and a real square root would be faster than all those branch points.
I made a quick test: The original code is a factor 11.5 slower than using libm's sqrt(), which in turn is a factor 3.5 slower than using one multiplication on the other side of the comparison. So, all in all, this patch is a micro-optimization by a factor of about 40 (11.5 * 3.5), but more importantly, it simplifies the code by a lot.
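
To illustrate the idea, here is a minimal sketch of moving the work to the other side of the comparison. The function name sqrt_less_than and its parameters are hypothetical and are not taken from g10/tofu.c; the sketch only assumes that both operands are unsigned 32-bit values, in which case squaring preserves the ordering.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not the code in g10/tofu.c.
   Returns true if sqrt(n) < limit without computing a square root.
   Both sides are non-negative, so sqrt(n) < limit holds exactly when
   n < limit * limit.  The product is formed in 64 bits so that it
   cannot overflow for any 32-bit LIMIT.  */
static bool
sqrt_less_than (uint32_t n, uint32_t limit)
{
  return (uint64_t) n < (uint64_t) limit * limit;
}
```

For example, sqrt_less_than (15, 4) is true and sqrt_less_than (16, 4) is false, matching what a comparison against the real square root would give. The single multiplication replaces the square root routine entirely, which is why the helper files are no longer needed.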