20 Commits (776a2195bd78c80130b1809b22a5e4d3aecb5b95)

Author SHA1 Message Date
Jeffrey Walton f839e5093c
Enable SSE2 intrinsics for SunCC 2018-11-09 20:35:27 -05:00
Jeffrey Walton a7615a8c7c
Add packed 32-bit Shuffle specializations for ChaCha on Power8 2018-10-28 00:48:18 -04:00
Jeffrey Walton 542140621a
Update comments 2018-10-27 14:01:25 -04:00
Jeffrey Walton d7d76fa5f7
Add ChaCha Power8 implementation 2018-10-27 08:40:07 -04:00
Jeffrey Walton ca97f6fafb
Add addition helper for Aarch32 and Aarch64
Update comments
2018-10-26 13:42:09 -04:00
Jeffrey Walton c0b273dac8
Remove xorInput parameter from ChaCha SIMD functions
We can use the input pointer directly after checking KeystreamOperation
2018-10-26 10:10:52 -04:00
Jeffrey Walton 61a696f710
Update comments 2018-10-26 04:26:18 -04:00
Jeffrey Walton 76ab8ffa4b
Update comments 2018-10-26 03:12:46 -04:00
Jeffrey Walton c992fe98a9
Fix failed compile on Ubuntu with -msse2
Also see https://github.com/noloader/cryptopp-cmake/issues/36
2018-10-26 02:43:35 -04:00
Jeffrey Walton 99c65bdb35
Rename ARM Shuffle() to Extract()
Extract() is the equivalent of SSE's _mm_shuffle_epi32(), but ARM calls the operation a vector extract
2018-10-26 00:44:10 -04:00
Jeffrey Walton d3a3189ba3
Sync CRYPTOPP_ARM_ACLE_AVAILABLE with Autotools 2018-10-25 14:08:09 -04:00
Jeffrey Walton b4b3623938
Whitespace check-in 2018-10-25 12:15:33 -04:00
Jeffrey Walton b1050636a6
Add ChaCha NEON implementation 2018-10-25 12:08:32 -04:00
Jeffrey Walton babdf8b38b
Add XOP aware CHAM and LEA 2018-10-24 17:12:03 -04:00
Jeffrey Walton ed4d57cecb
Add XOP aware ChaCha
ChaCha is about 50% faster using XOP for the rotates on AMD machines
2018-10-24 16:15:13 -04:00
Jeffrey Walton b4c4c5aa14
Add SSSE3 rotates when available
This change recovers the remaining 0.1 to 0.15 cpb. The SSSE3 code path should be enabled when building with -march=native
2018-10-24 15:34:54 -04:00
Jeffrey Walton 18dcbdf514
Move input xor to ChaCha_OperateKeystream_SSE2
This picks up about 0.2 cpb in ChaCha::OperateKeystream. It may not sound like much, but it puts the SSE2 intrinsics version on par with the ASM version of Salsa20. Salsa20 still leads ChaCha by 0.1 to 0.15 cpb, which equates to about 50 MB/s.
2018-10-24 11:00:35 -04:00
Jeffrey Walton d230999b40
Fix ChaCha compile on ARM and MIPS 2018-10-24 01:11:45 -04:00
Jeffrey Walton 6a5d2ab03d
Remove unneeded params from ChaCha_OperateKeystream_SSE2 2018-10-23 08:52:29 -04:00
Jeffrey Walton 916c4484a2
Add ChaCha SSE2 implementation
Thanks to Jack Lloyd and Botan for allowing us to use the implementation.
The numbers for SSE2 are very good. Compared with the Salsa20 ASM implementation, the results are:
  * Salsa20 2.55 cpb; ChaCha/20 2.90 cpb
  * Salsa20/12 1.61 cpb; ChaCha/12 1.90 cpb
  * Salsa20/8 1.34 cpb; ChaCha/8 1.5 cpb
2018-10-23 07:57:59 -04:00
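
Note on the rotate commits above ("Add SSSE3 rotates when available", "Add XOP aware ChaCha"): the ChaCha quarter round rotates 32-bit words left by 16, 12, 8 and 7 bits. The 16-bit and 8-bit rotates move whole bytes, so a byte shuffle can presumably perform them in one instruction per vector. The following is a minimal scalar sketch of that idea in portable C++, not the library's SIMD code. The names rotl32, rotl32_by_bytes, quarter_round and matches_rfc_vector are hypothetical and chosen only for illustration.

```cpp
#include <cstdint>

// Generic 32-bit rotate left by n bits (0 < n < 32).
inline std::uint32_t rotl32(std::uint32_t v, unsigned n) {
    return (v << n) | (v >> (32 - n));
}

// Hypothetical helper: rotate left by a whole number of bytes (1..3) by
// moving bytes between positions, the way a per-lane byte shuffle would.
inline std::uint32_t rotl32_by_bytes(std::uint32_t v, unsigned bytes) {
    std::uint32_t out = 0;
    for (unsigned i = 0; i < 4; ++i) {
        const std::uint32_t b = (v >> (8 * i)) & 0xffu;  // source byte i
        out |= b << (8 * ((i + bytes) % 4));             // lands at byte i + bytes
    }
    return out;
}

// ChaCha quarter round as specified in RFC 8439, section 2.1.
// The 16-bit and 8-bit rotates use the byte-permutation helper;
// the 12-bit and 7-bit rotates need real shifts.
inline void quarter_round(std::uint32_t& a, std::uint32_t& b,
                          std::uint32_t& c, std::uint32_t& d) {
    a += b; d ^= a; d = rotl32_by_bytes(d, 2);  // rotate left by 16
    c += d; b ^= c; b = rotl32(b, 12);
    a += b; d ^= a; d = rotl32_by_bytes(d, 1);  // rotate left by 8
    c += d; b ^= c; b = rotl32(b, 7);
}

// Checks the quarter-round test vector from RFC 8439, section 2.1.1.
inline bool matches_rfc_vector() {
    std::uint32_t a = 0x11111111u, b = 0x01020304u;
    std::uint32_t c = 0x9b8d6f43u, d = 0x01234567u;
    quarter_round(a, b, c, d);
    return a == 0xea2a92f4u && b == 0xcb1cf8ceu &&
           c == 0x4581472eu && d == 0x5881c4bbu;
}
```

In a vectorized build the same byte permutation would be applied to four words at once, while the 12-bit and 7-bit rotates still need two shifts and an OR unless the CPU has a native vector rotate, which is the gain the XOP commit describes.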