twofish-amd64: do not use xchg instruction
* cipher/twofish-amd64.S (g1g2_3): Swap ab and cd registers using
'movq' instructions instead of 'xchgq'.
Avoiding the xchg instruction improves three-block parallel
performance by ~3% on Intel Haswell.
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>