Jeffrey Walton
7bc621da62
Enable NEON/ASIMD for Simon and Speck on Aarch32/Aarch64 (GH #545 )
2017-12-05 14:02:48 -05:00
Jeffrey Walton
9b61d4143d
Add big- and little-endian rotates for Aarch32 and Aarch64
2017-12-05 12:32:26 -05:00
Jeffrey Walton
9faa504a24
Fix Aarch32 and Aarch64 rotates
2017-12-05 11:15:26 -05:00
Jeffrey Walton
c18793f862
Fix SIMON-64 missing transform
2017-12-05 09:14:58 -05:00
Jeffrey Walton
b208c8c1b4
Add 4 additional lanes to SPECK-64 for ARM
2017-12-05 07:16:34 -05:00
Jeffrey Walton
e09e6af1f8
Enable multi-block for SPECK-64 and SIMON-64
...
Also cleaned up SIMON-64 vector permute code. Thanks again to Peter Cordes
2017-12-05 04:19:44 -05:00
Jeffrey Walton
147ecba5df
Add temp working variable for SPECK64_AdvancedProcessBlocks_SSE41
...
Avoid potential undefined behavior by using aligned words
2017-12-04 14:52:36 -05:00
Jeffrey Walton
076937eb81
Update comments for vector permutes in SPECK-128
2017-12-04 12:31:32 -05:00
Jeffrey Walton
25709d2597
Fix SPECK64 vector permutes
...
Thanks to Peter Cordes for the suggestion on handling the case
2017-12-04 09:47:26 -05:00
Jeffrey Walton
46271660a1
Switch to uint64x2_t for SIMON-128
2017-12-04 05:47:34 -05:00
Jeffrey Walton
e9714b40d2
Switch to _mm_unpacklo_epi32 and _mm_unpackhi_epi32
...
The manual _mm_extract_epi32 and _mm_insert_epi32 are required during setup, be we can use SSE on teardown
2017-12-04 05:01:27 -05:00
Jeffrey Walton
cd31fa29dc
Switch to uint64x2_t for SPECK-128
2017-12-04 03:38:39 -05:00
Jeffrey Walton
1de143203e
Add SPECK-64 NEON intrinsics
2017-12-03 18:47:39 -05:00
Jeffrey Walton
f0e49785f6
Fix incorrect SPECK-128 decrypt when blocks >= 6
...
Add defines for CRYPTOPP_SPECK64_ADVANCED_PROCESS_BLOCKS and CRYPTOPP_SPECK128_ADVANCED_PROCESS_BLOCKS
2017-12-03 09:00:39 -05:00
Jeffrey Walton
6bb1f1d9c4
Add SPECK-64 SSE intrinsics
...
Performance went from about 11.9 cpb (C++) to about 4.5 cpb (SSE)
2017-12-03 02:28:40 -05:00
Jeffrey Walton
25493ded49
Add AVX512VL rotate support
2017-12-01 09:39:05 -05:00
Jeffrey Walton
a7fec9c0f6
Fix assert in Debug builds
...
This was copy/paste from the template function
2017-11-30 11:54:21 -05:00
Jeffrey Walton
22257c4b6e
Remove SunCC const cast workaround
...
This code does not suffer SunCC losing const-ness
2017-11-29 12:56:19 -05:00
Jeffrey Walton
39594a53b0
Add fast rotate-by-8 for Aarch32 and Aarch64
2017-11-29 12:33:34 -05:00
Jeffrey Walton
532f13fe53
Fix compile using SunCC 12.4
2017-11-29 12:10:19 -05:00
Jeffrey Walton
16ebfa72bf
Cleanup comments and whitespace
2017-11-29 10:15:41 -05:00
Jeffrey Walton
6e829cebee
Use EPI8 Shuffle rather than Shifts and Or for rotate when R=8
...
Louis Wingers and Bryan Weeks from the Simon and Speck team offered the suggestion. The change save 0.7 cpb for Speck, and 5 cpb for Simon on x86_64.
Speck is now running very close to the Team's time sor SSE4. Simon is still off, but we know the root cause. For Simon, the Team used a fast bit-sliced implementation
2017-11-29 08:53:48 -05:00
Jeffrey Walton
07c2047cec
Add simon-simd.cpp to file list and nmake file
2017-11-27 01:20:15 -05:00
Jeffrey Walton
4f2d6f713f
Switch to rotlConstant and rotrConstant
...
Update comments
2017-11-24 17:54:12 -05:00
Jeffrey Walton
2e63e46747
Fix Speck compile error with iOS Watch
2017-11-23 09:45:53 -05:00
Jeffrey Walton
304809a65d
Add NEON and ASIMD intrinsics for SPECK-128 (GH #538 )
...
Performance increased by about 115% on a 980 MHz BananaPi dev-board. Throughput went from about 46.2 cpb to about 21.5 cpb.
2017-11-23 02:47:44 -05:00
Jeffrey Walton
f5784c1634
Update comments
2017-11-22 17:35:59 -05:00
Jeffrey Walton
f2bc3cd0ca
Add speck-simd.cpp to project files (GH #538 , #539 )
...
Cleaned up whitespace
2017-11-22 08:45:38 -05:00
Jeffrey Walton
e7fee716d6
Add SSSE3 intrinsics for SPECK-128 (GH #538 )
...
Performance increased by about 100% on a 3.1 GHz Core i5 Skylake. Throughput went from about 7.3 cpb to about 3.5 cpb. Not bad for a software-based implementation of a block cipher
2017-11-22 08:01:41 -05:00