Commit Graph

8 Commits (b1050636a6148a937c61d19d100f4f808bde04aa)

Author SHA1 Message Date
Jeffrey Walton b1050636a6
Add ChaCha NEON implementation 2018-10-25 12:08:32 -04:00
Jeffrey Walton babdf8b38b
Add XOP aware CHAM and LEA 2018-10-24 17:12:03 -04:00
Jeffrey Walton ed4d57cecb
Add XOP aware ChaCha
ChaCha is about 50% faster using XOP for the rotates on AMD machines
2018-10-24 16:15:13 -04:00
Jeffrey Walton b4c4c5aa14
Add SSSE3 rotates when available
This change obtains the remaining 0.1 to 0.15 cpb. It should be engaged with -march=native
2018-10-24 15:34:54 -04:00
Jeffrey Walton 18dcbdf514
Move input xor to ChaCha_OperateKeystream_SSE2
This picks up about 0.2 cpb in ChaCha::OperateKeystream. It may not sound like much but it puts SSE2 intrinsics version on par with the ASM version of Salsa20. Salsa20 leads ChaCha by 0.1 to 0.15 cpb, which equates to about 50 MB/s.
2018-10-24 11:00:35 -04:00
Jeffrey Walton d230999b40
Fix ChaCha compile on ARM and MIPS 2018-10-24 01:11:45 -04:00
Jeffrey Walton 6a5d2ab03d
Remove unneeded params from ChaCha_OperateKeystream_SSE2 2018-10-23 08:52:29 -04:00
Jeffrey Walton 916c4484a2
Add ChaCha SSE2 implementation
Thanks to Jack Lloyd and Botan for allowing us to use the implementation.
The numbers for SSE2 are very good. When compared with Salsa20 ASM the results are:
  * Salsa20 2.55 cpb; ChaCha/20 2.90 cpb
  * Salsa20/12 1.61 cpb; ChaCha/12 1.90 cpb
  * Salsa20/8 1.34 cpb; ChaCha/8 1.5 cpb
2018-10-23 07:57:59 -04:00