I think I am going to try to do this on top of the work of Szabolcs Nagy, with the goal of making it portable and also having it serve as a test case for my carry-less multiplication intrinsic RFC. Hopefully I can also remove the manual register allocation that still makes it a derivative work of Andy's. However, this algorithm takes advantage of the commutative property of carry-less multiplication (mult(H) on page 5 of the GCM spec), and this property works differently than it does for ordinary addition and multiplication, in a way I do not entirely understand.
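For reference, here is a minimal portable sketch of carry-less multiplication, not the code from the patch. The name `clmul32` and the 32-bit operand width are assumptions made for illustration only; GCM itself multiplies 128-bit blocks and then reduces the product. The sketch only shows that partial products are combined with XOR instead of addition, which is why swapping the operands gives the same result.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): carry-less multiply of two
   32-bit values, producing a product of up to 63 bits.  Each set bit of B
   selects a shifted copy of A, and the copies are combined with XOR, so
   no carries propagate between bit positions.  */
static uint64_t
clmul32 (uint32_t a, uint32_t b)
{
  uint64_t result = 0;

  for (int i = 0; i < 32; i++)
    if ((b >> i) & 1)
      result ^= (uint64_t) a << i;

  return result;
}
```

With this definition, `clmul32 (a, b)` equals `clmul32 (b, a)`, and the operation distributes over XOR rather than over integer addition. For example, `clmul32 (3, 3)` is 5, where ordinary multiplication gives 9.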
OK, I updated the patch, including adjustments to GNU style, even though I feel that the lack of clang-format support for GNU style drives away contributors. It also credits Andy, and I have personally e-mailed Andy before.
I do want to do a wholly original implementation (and a portable one), but I am hoping this is sufficient for now (the performance certainly is).
I do not think there is any point in contributing the original code when my code deviates from it quite a bit.
Can you please also update the AUTHORS and LICENSE files? I would also ask you to add
 * SPDX-License-Identifier: (BSD-3-Clause OR GPL-2.0-only) */
as the last line of the header blurb. Here I assumed that the author meant GPL-2.0-only, as used by Linux.