Jeffrey Walton
46271660a1
Switch to uint64x2_t for SIMON-128
2017-12-04 05:47:34 -05:00
Jeffrey Walton
f0e49785f6
Fix incorrect SPECK-128 decrypt when blocks >= 6
...
Add defines for CRYPTOPP_SPECK64_ADVANCED_PROCESS_BLOCKS and CRYPTOPP_SPECK128_ADVANCED_PROCESS_BLOCKS
2017-12-03 09:00:39 -05:00
Jeffrey Walton
081afde0fd
Add SIMON-64 SSE intrinsics
...
Performance went from about 29 cpb (C++) to about 11.1 cpb (SSE)
2017-12-03 04:10:55 -05:00
Jeffrey Walton
25493ded49
Add AVX512VL rotate support
2017-12-01 09:39:05 -05:00
Jeffrey Walton
4792578f09
Rearrange statements and avoid intermediates
...
The folding of statements helps GCC elimate some of the intermediate stores it was performing. The elimination saved about 1.0 cpb. SIMON-128 is now running around 10 cpb, but it is still off the Simon and Speck team's numbers of 3.5 cpb
2017-12-01 04:11:31 -05:00
Jeffrey Walton
b7ced67892
Update comments
2017-12-01 02:38:19 -05:00
Jeffrey Walton
a7fec9c0f6
Fix assert in Debug builds
...
This was copy/paste from the template function
2017-11-30 11:54:21 -05:00
Jeffrey Walton
22257c4b6e
Remove SunCC const cast workaround
...
This code does not suffer SunCC losing const-ness
2017-11-29 12:56:19 -05:00
Jeffrey Walton
39594a53b0
Add fast rotate-by-8 for Aarch32 and Aarch64
2017-11-29 12:33:34 -05:00
Jeffrey Walton
532f13fe53
Fix compile using SunCC 12.4
2017-11-29 12:10:19 -05:00
Jeffrey Walton
16ebfa72bf
Cleanup comments and whitespace
2017-11-29 10:15:41 -05:00
Jeffrey Walton
6e829cebee
Use EPI8 Shuffle rather than Shifts and Or for rotate when R=8
...
Louis Wingers and Bryan Weeks from the Simon and Speck team offered the suggestion. The change save 0.7 cpb for Speck, and 5 cpb for Simon on x86_64.
Speck is now running very close to the Team's time sor SSE4. Simon is still off, but we know the root cause. For Simon, the Team used a fast bit-sliced implementation
2017-11-29 08:53:48 -05:00
Jeffrey Walton
a29b36c197
Whitespace check-in
2017-11-27 01:51:27 -05:00
Jeffrey Walton
568e608ea6
Add NEON and ASIMD intrinsics for SPECK-128 (GH #539 )
...
Performance increased by about 200% on a 980 MHz BananaPi dev-board. Throughput went from about 176.6 cpb to about 60.3 cpb.
2017-11-27 00:36:45 -05:00