Commit Graph

45 Commits (1d0df34ae8304fa964cb7702e4f4476bbf6e9e7c)

Author SHA1 Message Date
Jeffrey Walton ea3c80c949
Move Rijndael_AdvancedProcessBlocks_ARMV8 into anonymous namespace 2017-09-23 05:28:59 -04:00
Jeffrey Walton 26597059d9
Move to anonymous namespaces in rijndael-simd.cpp 2017-09-23 02:13:16 -04:00
Jeffrey Walton 12953fd0e4
Add IncrementPointerAndStore
This speeds up XL C/C++ by 0.1 to 0.2 cpb
2017-09-22 20:35:18 -04:00
Jeffrey Walton 1057f89363
Move Power8 crypto functions into ppc-crypto.h 2017-09-22 05:23:29 -04:00
Jeffrey Walton 3e55817819
Add C++ templates for additional Vector ops
Removed lower-level C-like functions such as Store8x16 and Store64x2
2017-09-22 04:15:33 -04:00
Jeffrey Walton 441e944a66
Switch to vec_vsx_ld, remove unaligned loads
Partially unroll loop Rijndael_UncheckedSetKey_POWER8 loop. It saves about another 60 cycles
2017-09-22 02:53:08 -04:00
Jeffrey Walton d9592a303c
Updated comments 2017-09-21 21:45:23 -04:00
Jeffrey Walton dabad4b409
Cleanup asserts and casts 2017-09-21 20:55:35 -04:00
Jeffrey Walton 1edea5a80f
Vectorize tail of Rijndael_UncheckedSetKey_POWER8 2017-09-21 20:02:40 -04:00
Jeffrey Walton e43c0eee74
Fold ConditionalByteReverse for non-Power8 paths 2017-09-21 19:17:42 -04:00
Jeffrey Walton f763bf3da6
Updated comments 2017-09-21 12:08:54 -04:00
Jeffrey Walton e78464a1af
Enable little endian Rijndael_UncheckedSetKey_POWER8 using built-ins
The problem was vec_sld is endian sensitive. The built-in required more than us setting up arguments to ensure the vsx load resulted in a big endian value. Thanks to Paul R on Stack Overflow for sharing the information that IBM did not provide. Also see http://stackoverflow.com/q/46341923/608639
2017-09-21 09:56:37 -04:00
Jeffrey Walton c6b096ddd4
Move Rijndael_UncheckedSetKey_POWER8 prior to GetUserKey call
Arg... GetUserKey was performing a 32-bit word reverse. It was part of the problem on little endian machines
2017-09-21 01:08:44 -04:00
Jeffrey Walton 9fd5d023f9
Load r5 mask once for key expansion 2017-09-20 20:27:58 -04:00
Jeffrey Walton 35c0fa82fd
Use <time.h> for Borland/Embarcadero (GH #512) 2017-09-20 18:10:07 -04:00
Jeffrey Walton c5a427d690
Add PowerPC VectorLoadKeyUnaligned for AES-192
Make internal functions static. We get better optimizations depsice using unnamed namespaces
Add PowerPC uint32x4 functions for handling 32-bit rcon and mask
2017-09-20 08:57:53 -04:00
Jeffrey Walton c94d076aa1 Move r1 write to caller; remove from Rijndael_Subkey_POWER8
Signed-off-by: Jeffrey Walton <noloader@gmail.com>
2017-09-20 04:38:53 -04:00
Jeffrey Walton 5159d0803d
Add Power8 key expansion for big endian
This is AES-128 key expansion for big endian. Little endian has a bug in it so it can't be enabled at the moment. GDB is acting up on GCC112, so I've had trouble investigating it
2017-09-20 03:34:54 -04:00
Jeffrey Walton 6102333fc3
Add CRYPTOPP_NO_CPU_FEATURE_PROBES (GH #511)
We determine machine capabilities by performing an os/platform *query* first, like getauxv(). If the *query* fails, we move onto a cpu *probe*. The cpu *probe* tries to exeute an instruction and then catches a SIGILL on Linux or the exception EXCEPTION_ILLEGAL_INSTRUCTION on Windows. Some OSes fail to hangle a SIGILL gracefully, like Apple OSes. Apple machines corrupt memory and variables around the probe.
2017-09-19 21:08:37 -04:00
Jeffrey Walton 6440921723
Add Rijndael_UncheckedSetKey_POWER8
We are going to attempt to perform key setup using Power8 in-core vector instructions
2017-09-19 04:55:15 -04:00
Jeffrey Walton 923cf95571
ByteReverseArray → ReverseByteArrayLE 2017-09-18 18:40:19 -04:00
Jeffrey Walton 2c18fe8af8
Refactor LoadT() and StoreT(). Add separate ReverseT() for little endian machines
The refactoring has no effect on little endian machines. However, on big endian GCC119 using GCC 7.1 the performance improved by 2.5x for ECB and CTR modes:

BEFORE:

<TR><TH>AES/CTR (128-bit key)<TD>2723<TD>1.4<TD>0.163<TD>670
<TR><TH>AES/CTR (192-bit key)<TD>2560<TD>1.5<TD>0.175<TD>719
<TR><TH>AES/CTR (256-bit key)<TD>2728<TD>1.4<TD>0.183<TD>749
<TR><TH>AES/CBC (128-bit key)<TD>1204<TD>3.2<TD>0.135<TD>554
<TR><TH>AES/CBC (192-bit key)<TD>1066<TD>3.7<TD>0.148<TD>605
<TR><TH>AES/CBC (256-bit key)<TD>948<TD>4.1<TD>0.155<TD>635
<TR><TH>AES/OFB (128-bit key)<TD>1019<TD>3.8<TD>0.158<TD>648
<TR><TH>AES/CFB (128-bit key)<TD>949<TD>4.1<TD>0.192<TD>787
<TR><TH>AES/ECB (128-bit key)<TD>3564<TD>1.1<TD>0.082<TD>337

AFTER:

<TR><TH>AES/CTR (128-bit key)<TD>6484<TD>0.6<TD>0.163<TD>677
<TR><TH>AES/CTR (192-bit key)<TD>5641<TD>0.7<TD>0.176<TD>728
<TR><TH>AES/CTR (256-bit key)<TD>5005<TD>0.8<TD>0.183<TD>761
<TR><TH>AES/CBC (128-bit key)<TD>1223<TD>3.2<TD>0.135<TD>559
<TR><TH>AES/CBC (192-bit key)<TD>1080<TD>3.7<TD>0.147<TD>611
<TR><TH>AES/CBC (256-bit key)<TD>966<TD>4.1<TD>0.155<TD>642
<TR><TH>AES/OFB (128-bit key)<TD>1057<TD>3.7<TD>0.158<TD>656
<TR><TH>AES/CFB (128-bit key)<TD>1217<TD>3.3<TD>0.186<TD>774
<TR><TH>AES/ECB (128-bit key)<TD>7289<TD>0.5<TD>0.082<TD>342
2017-09-18 18:15:25 -04:00
Jeffrey Walton f0c2324f6b
Fix armeabi and armv7-a for Android (GH #509) 2017-09-17 20:07:53 -04:00
Jeffrey Walton adea69ab68
Avoid increment during stores of 6x blocks
This provides another 0.1 cpb with GCC
2017-09-14 21:06:44 -04:00
Jeffrey Walton 25efb7a140
Use 6x blocks for ARMv8 AES rather than 4x
We gain 0.1 to 0.3 cpb, depending on the mode
2017-09-14 20:32:06 -04:00
Jeffrey Walton 58890ff053
Use 6x blocks for Power8 AES rather than 4x
Perforamnce increased for all modes when performing 6x vs 4x. 8x and 12x performed worse.

Here are the numbers:
4x Blocks:

<TR><TH>AES/CTR (128-bit key)<TD>1563<TD>2.1<TD>0.409<TD>1392
<TR><TH>AES/CTR (192-bit key)<TD>1403<TD>2.3<TD>0.450<TD>1529
<TR><TH>AES/CTR (256-bit key)<TD>1280<TD>2.5<TD>0.482<TD>1639
<TR><TH>AES/CBC (128-bit key)<TD>582<TD>5.6<TD>0.359<TD>1222
<TR><TH>AES/CBC (192-bit key)<TD>517<TD>6.3<TD>0.394<TD>1339
<TR><TH>AES/CBC (256-bit key)<TD>474<TD>6.8<TD>0.432<TD>1469
<TR><TH>AES/OFB (128-bit key)<TD>533<TD>6.1<TD>0.402<TD>1368
<TR><TH>AES/CFB (128-bit key)<TD>563<TD>5.8<TD>0.461<TD>1568
<TR><TH>AES/ECB (128-bit key)<TD>1829<TD>1.8<TD>0.240<TD>817

6x Blocks:

<TR><TH>AES/CTR (128-bit key)<TD>1750<TD>1.7<TD>0.406<TD>1300
<TR><TH>AES/CTR (192-bit key)<TD>1638<TD>1.9<TD>0.447<TD>1432
<TR><TH>AES/CTR (256-bit key)<TD>1528<TD>2.0<TD>0.482<TD>1541
<TR><TH>AES/CBC (128-bit key)<TD>582<TD>5.2<TD>0.358<TD>1145
<TR><TH>AES/CBC (192-bit key)<TD>517<TD>5.9<TD>0.394<TD>1260
<TR><TH>AES/CBC (256-bit key)<TD>474<TD>6.4<TD>0.431<TD>1379
<TR><TH>AES/OFB (128-bit key)<TD>533<TD>5.7<TD>0.400<TD>1281
<TR><TH>AES/CFB (128-bit key)<TD>563<TD>5.4<TD>0.461<TD>1476
<TR><TH>AES/ECB (128-bit key)<TD>1950<TD>1.6<TD>0.238<TD>763
2017-09-14 16:07:21 -04:00
Jeffrey Walton 08e4ee422e
Avoid increment during stores of 4x blocks
This provides another 0.1 cpb with GCC
2017-09-14 15:12:07 -04:00
Jeffrey Walton ddeae859d0
Use vec_xl_be and vec_xst_be for IBM XL C/C++ compiler 2017-09-14 13:27:49 -04:00
Jeffrey Walton 63a0af4efa
Fix endianess for s_one on ARM big-endian 2017-09-13 22:52:29 -04:00
Jeffrey Walton 8e52ce6dd2
Load correct value fo 1 under ARM big endian 2017-09-13 21:42:15 -04:00
Jeffrey Walton c22507e38b
Clear unused variable warnings under Clang 2017-09-13 21:37:55 -04:00
Jeffrey Walton 8d98417306
Add Aarch64 specific defines to Android cross-compile
Move <arm_acle.h> logic into "sonfig.h". Detecting when we can/should include <arm_acle.h> is proving to be troublesome
2017-09-13 17:16:57 -04:00
Jeffrey Walton 397ccd7e49
remove commented code for Power8 2017-09-13 03:59:25 -04:00
Jeffrey Walton 2b24f5b9fe
VectorLoadAligned → VectorLoadKey
Add comments for the Load and Store functions
2017-09-12 20:38:58 -04:00
Jeffrey Walton 5659acb704
Cleanup vector casts 2017-09-12 19:44:34 -04:00
Jeffrey Walton 6899d3f8bb
Add AdvancedProcessBlocks for Power8
This increases performance to about 1.6 cpb. We are about 0.5 cpb behind Botan, and about 1.0 cpb behind OpenSSL. However, it beats the snot out of C/C++, which runs at 20 to 30 cpb
2017-09-12 18:15:55 -04:00
Jeffrey Walton b090e5f69f
Add Power8 AES decryption 2017-09-12 05:53:17 -04:00
Jeffrey Walton 7fb34e9b08
Add Power8 AES encryption
This is the forward direction on encryption only.  Crypto++ uses the "Equivalent Inverse Cipher" (FIPS-197, Section 5.3.5, p.23), and it is not compatible with IBM hardware. The library library will need to re-work the decryption key scheduling routines. (We may be able to work around it another way, but I have not investigated it).
2017-09-11 22:52:22 -04:00
Jeffrey Walton 37e02f9e0e
Revert AltiVec and Power8 commits
The strategy of "cleanup under-aligned buffers" is not scaling well. Corner cases are still turing up. The library has some corner-case breaks, like old 32-bit Intels. And it still has not solved the AltiVec and Power8 alignment problems.
For now we are backing out the changes and investigating other strategies
2017-09-05 16:28:00 -04:00
Jeffrey Walton 75aef9bded
Fixup under-aligned buffers when using AES on AltiVec and Power8
This commit supports the upcoming AltiVec and Power8 processor. This commit affects a number of classes due to the ubiquitous use of AES. The commit provides the data alignment requirements.
2017-09-04 11:21:47 -04:00
Jeffrey Walton 5c6a32ba0f
Support Base Implementation + SIMD implementation on Solaris (PR #461) 2017-08-24 19:17:21 -04:00
Jeffrey Walton a19f0c663b
Update asserts
Change 'rounds' to size_t in Rijndael_AdvancedProcessBlocks_ARMV8
2017-08-19 01:55:20 -04:00
Jeffrey Walton a1b3102eab
Update comments 2017-08-19 01:35:36 -04:00
Jeffrey Walton 51fe8a7776
Guard use of SIGILL probes on Apple platforms 2017-08-17 18:06:57 -04:00
Jeffrey Walton e2c377effd Split source files to support Base Implementation + SIMD implementation (GH #461)
Split source files to support Base Implementation + SIMD implementation
2017-08-17 12:33:43 -04:00