Jeffrey Walton
923cf95571
ByteReverseArray → ReverseByteArrayLE
2017-09-18 18:40:19 -04:00
Jeffrey Walton
2c18fe8af8
Refactor LoadT() and StoreT(). Add separate ReverseT() for little endian machines
...
The refactoring has no effect on little endian machines. However, on big endian GCC119 using GCC 7.1 the performance improved by 2.5x for ECB and CTR modes:
BEFORE:
<TR><TH>AES/CTR (128-bit key)<TD>2723<TD>1.4<TD>0.163<TD>670
<TR><TH>AES/CTR (192-bit key)<TD>2560<TD>1.5<TD>0.175<TD>719
<TR><TH>AES/CTR (256-bit key)<TD>2728<TD>1.4<TD>0.183<TD>749
<TR><TH>AES/CBC (128-bit key)<TD>1204<TD>3.2<TD>0.135<TD>554
<TR><TH>AES/CBC (192-bit key)<TD>1066<TD>3.7<TD>0.148<TD>605
<TR><TH>AES/CBC (256-bit key)<TD>948<TD>4.1<TD>0.155<TD>635
<TR><TH>AES/OFB (128-bit key)<TD>1019<TD>3.8<TD>0.158<TD>648
<TR><TH>AES/CFB (128-bit key)<TD>949<TD>4.1<TD>0.192<TD>787
<TR><TH>AES/ECB (128-bit key)<TD>3564<TD>1.1<TD>0.082<TD>337
AFTER:
<TR><TH>AES/CTR (128-bit key)<TD>6484<TD>0.6<TD>0.163<TD>677
<TR><TH>AES/CTR (192-bit key)<TD>5641<TD>0.7<TD>0.176<TD>728
<TR><TH>AES/CTR (256-bit key)<TD>5005<TD>0.8<TD>0.183<TD>761
<TR><TH>AES/CBC (128-bit key)<TD>1223<TD>3.2<TD>0.135<TD>559
<TR><TH>AES/CBC (192-bit key)<TD>1080<TD>3.7<TD>0.147<TD>611
<TR><TH>AES/CBC (256-bit key)<TD>966<TD>4.1<TD>0.155<TD>642
<TR><TH>AES/OFB (128-bit key)<TD>1057<TD>3.7<TD>0.158<TD>656
<TR><TH>AES/CFB (128-bit key)<TD>1217<TD>3.3<TD>0.186<TD>774
<TR><TH>AES/ECB (128-bit key)<TD>7289<TD>0.5<TD>0.082<TD>342
2017-09-18 18:15:25 -04:00
Jeffrey Walton
f0c2324f6b
Fix armeabi and armv7-a for Android (GH #509 )
2017-09-17 20:07:53 -04:00
Jeffrey Walton
adea69ab68
Avoid increment during stores of 6x blocks
...
This provides another 0.1 cpb with GCC
2017-09-14 21:06:44 -04:00
Jeffrey Walton
25efb7a140
Use 6x blocks for ARMv8 AES rather than 4x
...
We gain 0.1 to 0.3 cpb, depending on the mode
2017-09-14 20:32:06 -04:00
Jeffrey Walton
58890ff053
Use 6x blocks for Power8 AES rather than 4x
...
Perforamnce increased for all modes when performing 6x vs 4x. 8x and 12x performed worse.
Here are the numbers:
4x Blocks:
<TR><TH>AES/CTR (128-bit key)<TD>1563<TD>2.1<TD>0.409<TD>1392
<TR><TH>AES/CTR (192-bit key)<TD>1403<TD>2.3<TD>0.450<TD>1529
<TR><TH>AES/CTR (256-bit key)<TD>1280<TD>2.5<TD>0.482<TD>1639
<TR><TH>AES/CBC (128-bit key)<TD>582<TD>5.6<TD>0.359<TD>1222
<TR><TH>AES/CBC (192-bit key)<TD>517<TD>6.3<TD>0.394<TD>1339
<TR><TH>AES/CBC (256-bit key)<TD>474<TD>6.8<TD>0.432<TD>1469
<TR><TH>AES/OFB (128-bit key)<TD>533<TD>6.1<TD>0.402<TD>1368
<TR><TH>AES/CFB (128-bit key)<TD>563<TD>5.8<TD>0.461<TD>1568
<TR><TH>AES/ECB (128-bit key)<TD>1829<TD>1.8<TD>0.240<TD>817
6x Blocks:
<TR><TH>AES/CTR (128-bit key)<TD>1750<TD>1.7<TD>0.406<TD>1300
<TR><TH>AES/CTR (192-bit key)<TD>1638<TD>1.9<TD>0.447<TD>1432
<TR><TH>AES/CTR (256-bit key)<TD>1528<TD>2.0<TD>0.482<TD>1541
<TR><TH>AES/CBC (128-bit key)<TD>582<TD>5.2<TD>0.358<TD>1145
<TR><TH>AES/CBC (192-bit key)<TD>517<TD>5.9<TD>0.394<TD>1260
<TR><TH>AES/CBC (256-bit key)<TD>474<TD>6.4<TD>0.431<TD>1379
<TR><TH>AES/OFB (128-bit key)<TD>533<TD>5.7<TD>0.400<TD>1281
<TR><TH>AES/CFB (128-bit key)<TD>563<TD>5.4<TD>0.461<TD>1476
<TR><TH>AES/ECB (128-bit key)<TD>1950<TD>1.6<TD>0.238<TD>763
2017-09-14 16:07:21 -04:00
Jeffrey Walton
08e4ee422e
Avoid increment during stores of 4x blocks
...
This provides another 0.1 cpb with GCC
2017-09-14 15:12:07 -04:00
Jeffrey Walton
ddeae859d0
Use vec_xl_be and vec_xst_be for IBM XL C/C++ compiler
2017-09-14 13:27:49 -04:00
Jeffrey Walton
63a0af4efa
Fix endianess for s_one on ARM big-endian
2017-09-13 22:52:29 -04:00
Jeffrey Walton
8e52ce6dd2
Load correct value fo 1 under ARM big endian
2017-09-13 21:42:15 -04:00
Jeffrey Walton
c22507e38b
Clear unused variable warnings under Clang
2017-09-13 21:37:55 -04:00
Jeffrey Walton
8d98417306
Add Aarch64 specific defines to Android cross-compile
...
Move <arm_acle.h> logic into "sonfig.h". Detecting when we can/should include <arm_acle.h> is proving to be troublesome
2017-09-13 17:16:57 -04:00
Jeffrey Walton
397ccd7e49
remove commented code for Power8
2017-09-13 03:59:25 -04:00
Jeffrey Walton
2b24f5b9fe
VectorLoadAligned → VectorLoadKey
...
Add comments for the Load and Store functions
2017-09-12 20:38:58 -04:00
Jeffrey Walton
5659acb704
Cleanup vector casts
2017-09-12 19:44:34 -04:00
Jeffrey Walton
6899d3f8bb
Add AdvancedProcessBlocks for Power8
...
This increases performance to about 1.6 cpb. We are about 0.5 cpb behind Botan, and about 1.0 cpb behind OpenSSL. However, it beats the snot out of C/C++, which runs at 20 to 30 cpb
2017-09-12 18:15:55 -04:00
Jeffrey Walton
b090e5f69f
Add Power8 AES decryption
2017-09-12 05:53:17 -04:00
Jeffrey Walton
7fb34e9b08
Add Power8 AES encryption
...
This is the forward direction on encryption only. Crypto++ uses the "Equivalent Inverse Cipher" (FIPS-197, Section 5.3.5, p.23), and it is not compatible with IBM hardware. The library library will need to re-work the decryption key scheduling routines. (We may be able to work around it another way, but I have not investigated it).
2017-09-11 22:52:22 -04:00
Jeffrey Walton
37e02f9e0e
Revert AltiVec and Power8 commits
...
The strategy of "cleanup under-aligned buffers" is not scaling well. Corner cases are still turing up. The library has some corner-case breaks, like old 32-bit Intels. And it still has not solved the AltiVec and Power8 alignment problems.
For now we are backing out the changes and investigating other strategies
2017-09-05 16:28:00 -04:00
Jeffrey Walton
75aef9bded
Fixup under-aligned buffers when using AES on AltiVec and Power8
...
This commit supports the upcoming AltiVec and Power8 processor. This commit affects a number of classes due to the ubiquitous use of AES. The commit provides the data alignment requirements.
2017-09-04 11:21:47 -04:00
Jeffrey Walton
5c6a32ba0f
Support Base Implementation + SIMD implementation on Solaris (PR #461 )
2017-08-24 19:17:21 -04:00
Jeffrey Walton
a19f0c663b
Update asserts
...
Change 'rounds' to size_t in Rijndael_AdvancedProcessBlocks_ARMV8
2017-08-19 01:55:20 -04:00
Jeffrey Walton
a1b3102eab
Update comments
2017-08-19 01:35:36 -04:00
Jeffrey Walton
51fe8a7776
Guard use of SIGILL probes on Apple platforms
2017-08-17 18:06:57 -04:00
Jeffrey Walton
e2c377effd
Split source files to support Base Implementation + SIMD implementation (GH #461 )
...
Split source files to support Base Implementation + SIMD implementation
2017-08-17 12:33:43 -04:00