cryptopp

Commit Graph

Author	SHA1	Message	Date
Jeffrey Walton	04908cca48	Improve x86 and x64 ARIA performance. The changes were meant to improve Windows, but GCC benefited more. Windows gained 0.3 cpb, while GCC gained 1.2 cpb.	2017-04-13 06:52:56 -04:00
Jeffrey Walton	35f95fb739	Fix unaligned pointer crash on Win32 due to _mm_load_si128. The SSSE3 intrinsics were performing aligned loads with _mm_load_si128 on user-supplied pointers. The pointers are only byte pointers, so their alignment can drop to 1 or 2. Switching to _mm_loadu_si128 will sidestep potential problems. The crash surfaced under Win32 testing. Switch to memcpy when performing bulk assignment x[0]=y[0] ... x[3]=y[3]. I believe Yun used the pattern to promote vectorization. Some compilers appear to be braindead and issue integer moves one word at a time. Non-braindead compilers will still take the optimization when advantageous, and slower compilers will benefit from the bulk move. We also cherry-picked vectorization opportunities, like in ARIA_GSRK_NEON. Remove keyBits variable. We now use UncheckedSetKey's keylen throughout. Also fix a typo in CRYPTOPP_BOOL_SSSE3_INTRINSICS_AVAILABLE. __SSSE3__ was listed twice.	2017-04-13 04:28:02 -04:00
Jeffrey Walton	59767be52e	Add Intel and ARM intrinsics. Win32 and Win64 benefited from the Intel intrinsics. A32 and Aarch64 benefited from the ARM intrinsics. The intrinsics shaved 150 to 350 cycles from key setup. The intrinsics slowed modern GCC down slightly, and did not appear to affect old GCC. As such, Intel intrinsics were only enabled for Microsoft compilers. We were not able to improve encryption and decryption. In fact, some of the attempted macro conversions and intrinsics slowed things down considerably. For example, GCC 5.4 on x86_64 went from 120 MB/s to about 70 MB/s when we tried to improve code around the Key XOR Layer (ARIA_KXL).	2017-04-12 23:28:41 -04:00
Jeffrey Walton	f44e705c16	Add NEON intrinsics for ARIA_GSRK_NEON. Update documentation.	2017-04-12 12:15:32 -04:00
Jeffrey Walton	af561758df	Rework ARIA_GSRK to have MSVC generate "rotate imm" rather than "rot reg". The immediate version of rotate can be 4 to 6 times faster than the register version.	2017-04-11 20:47:54 -04:00
Jeffrey Walton	d6b295203b	Additional library integration for ARIA	2017-04-11 16:19:36 -04:00
Jeffrey Walton	0d742591e0	Switch to code based on 32-bit implementation. The 32-bit code is based on Aaram Yun's code. Yun's code combined with a few library-specific tweaks improves performance to roughly that of Camellia.	2017-04-11 11:39:45 -04:00
Jeffrey Walton	8ca0f47939	Add ARIA block cipher. This is the reference implementation, test data and test vectors from the ARIA.zip package on the KISA website. The website is located at http://seed.kisa.or.kr/iwt/ko/bbs/EgovReferenceList.do?bbsId=BBSMSTR_000000000002. We have optimized routines that improve Key Setup and Bulk Encryption performance, but they are not being checked in at the moment. The ARIA team is updating its implementation for contemporary hardware and we would like to use it as a starting point before we wander too far away from the KISA implementation.	2017-04-10 10:52:40 -04:00
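
Commit 35f95fb739 describes two related changes: stop assuming alignment on user-supplied byte pointers, and replace word-by-word assignment (x[0]=y[0] ... x[3]=y[3]) with a bulk copy. The following is a minimal portable sketch of that idea, not the code from the commit. The helper names `LoadWords` and `CopyBlock` are hypothetical, and the SSE path using `_mm_loadu_si128` is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the library): read four 32-bit words
// from a byte pointer that carries no alignment guarantee. memcpy is
// valid for any alignment, unlike dereferencing a casted word pointer.
// The words are copied in host byte order; no endian conversion is done.
inline void LoadWords(std::uint32_t out[4], const unsigned char* in)
{
    assert(out != nullptr && in != nullptr);
    std::memcpy(out, in, 4 * sizeof(std::uint32_t));
}

// Hypothetical helper: bulk replacement for x[0]=y[0] ... x[3]=y[3].
// A compiler is free to lower this to a single 128-bit move, or to
// four word moves if that is all the target offers.
inline void CopyBlock(std::uint32_t x[4], const std::uint32_t y[4])
{
    assert(x != nullptr && y != nullptr);
    std::memcpy(x, y, 4 * sizeof(std::uint32_t));
}
```

Calling `LoadWords(w, buf + 1)` on a byte buffer is well defined even though `buf + 1` is unlikely to be 4-byte or 16-byte aligned, which is the situation the commit message says triggered the Win32 crash with aligned loads.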

8 Commits (2bb36c790e7cbe6fc7aeef2998d1b9a49935762b)
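
Commit af561758df reworks ARIA_GSRK so that MSVC emits a rotate with an immediate count rather than a count held in a register. One common way to make the count visible to the compiler as a constant is to pass it as a template parameter. The sketch below illustrates that pattern under stated assumptions: `RotrConst` and `RotrVar` are hypothetical names, this is not the library's ARIA_GSRK code, and whether a given compiler actually selects the immediate form has to be confirmed by inspecting its generated assembly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rotate right by a compile-time constant R.
// Because R is a template argument, the shift counts are constants
// in every instantiation.
template <unsigned R>
inline std::uint32_t RotrConst(std::uint32_t v)
{
    static_assert(R > 0 && R < 32, "rotate amount must be in 1..31");
    return (v >> R) | (v << (32 - R));
}

// Runtime-count variant for comparison. The masking keeps both shift
// counts in 0..31, so r == 0 does not produce an undefined 32-bit shift.
inline std::uint32_t RotrVar(std::uint32_t v, unsigned r)
{
    assert(r < 32);
    r &= 31;
    return (v >> r) | (v << ((32 - r) & 31));
}
```

For example, `RotrConst<8>(0x12345678u)` and `RotrVar(0x12345678u, 8)` both yield `0x78123456`; only the first guarantees that the count is a constant at the call site.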