Wei's AESNI routines use ARMV8_Enc_Block, ARMV8_Enc_4_Blocks, ARMV8_Dec_Block, ARMV8_Dec_4_Blocks. They increased performance for ECB, CTR and CBC modes. Formerly ECB mode was running at 2.3 cpb. After the cut-over, ECB dropped to 1.1 cpb.
Both Apple and LLVM Clang want -msse4.2 even when only SSE4.1 is used. Sidestep it because we don't know how it will affect some of the lower end Atoms.
84877:/usr/include/clang/3.5.0/include/nmmintrin.h:28:2: error: "SSE4.2 instruction set not enabled"
84878:#error "SSE4.2 instruction set not enabled"
84880:rijndael-simd.cpp:466:26: error: use of undeclared identifier '_mm_extract_epi32'; did you mean '_mm_extract_epi16'?
84887:rijndael-simd.cpp:480:11: error: use of undeclared identifier '_mm_insert_epi32'; did you mean '_mm_insert_epi16'?
84894:rijndael-simd.cpp:485:11: error: use of undeclared identifier '_mm_insert_epi32'; did you mean '_mm_insert_epi16'?
...
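The undeclared identifiers in the log, _mm_extract_epi32 and _mm_insert_epi32, are SSE4.1 intrinsics. One way to sidestep the compiler's demand for -msse4.2 is to build the same two operations from SSE2 intrinsics only. The following is a minimal sketch under that assumption. ExtractLane32, InsertLane32, LaneRoundTripCheck and SKETCH_HAVE_SSE2 are hypothetical names used for illustration, and this is not necessarily the change that was committed.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64)
# include <emmintrin.h>   // SSE2 only; no -msse4.1 or -msse4.2 required
# define SKETCH_HAVE_SSE2 1
#endif

#if defined(SKETCH_HAVE_SSE2)
// Hypothetical helper: read 32-bit lane N (0..3) with SSE2 instructions.
template <int N>
inline uint32_t ExtractLane32(__m128i v)
{
    static_assert(N >= 0 && N <= 3, "lane index out of range");
    // Byte-shift the wanted lane down to lane 0, then move it to a register.
    return static_cast<uint32_t>(_mm_cvtsi128_si32(_mm_srli_si128(v, 4 * N)));
}

// Hypothetical helper: overwrite 32-bit lane N (0..3) with x.
template <int N>
inline __m128i InsertLane32(__m128i v, uint32_t x)
{
    static_assert(N >= 0 && N <= 3, "lane index out of range");
    // Two 16-bit inserts (SSE2) stand in for one _mm_insert_epi32 (SSE4.1).
    v = _mm_insert_epi16(v, static_cast<int>(x & 0xFFFF), 2 * N);
    v = _mm_insert_epi16(v, static_cast<int>(x >> 16), 2 * N + 1);
    return v;
}

// Sanity check: a value written to one lane reads back from that lane
// and leaves its neighbor untouched.
inline void LaneRoundTripCheck()
{
    __m128i v = _mm_setzero_si128();
    v = InsertLane32<2>(v, 0xDEADBEEFu);
    assert(ExtractLane32<2>(v) == 0xDEADBEEFu);
    assert(ExtractLane32<1>(v) == 0u);
}
#endif
```

The lane index is a template parameter because the underlying intrinsics require a compile-time immediate. The SSE2 versions cost an extra instruction or two per call compared with the SSE4.1 forms.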
getauxval() is the recommended way to determine features on Linux. It's likely less expensive than probing the CPU for SIGILLs. We gave up some portability, but gained stability.
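As an illustration of the getauxval() approach, the sketch below asks the kernel for the hardware capability word and tests the AES bit. It assumes AArch64 Linux, where HWCAP_AES comes from <asm/hwcap.h>; on other platforms it simply reports false so a caller can fall back to another probe. HasFeatureBits and HasAESFromAuxv are hypothetical names, not the library's own functions.

```cpp
#include <cassert>

#if defined(__linux__)
# include <sys/auxv.h>    // getauxval(), AT_HWCAP
#endif
#if defined(__linux__) && defined(__aarch64__)
# include <asm/hwcap.h>   // HWCAP_AES on AArch64 Linux
#endif

// Hypothetical helper: true when every bit in mask is set in hwcap.
inline bool HasFeatureBits(unsigned long hwcap, unsigned long mask)
{
    assert(mask != 0);    // an empty mask would trivially match
    return (hwcap & mask) == mask;
}

// Hypothetical probe: does the kernel report the AES instructions?
inline bool HasAESFromAuxv()
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_AES)
    // The kernel fills in the auxiliary vector at process start,
    // so no instruction is executed speculatively and no SIGILL can occur.
    return HasFeatureBits(getauxval(AT_HWCAP), HWCAP_AES);
#else
    return false;         // no answer from auxv; caller should probe another way
#endif
}
```

The portability cost mentioned above shows up in the preprocessor guards: the call and the capability macros are specific to Linux and to the target architecture.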
rijndael-simd.cpp(318): warning C4267: 'argument': conversion from 'size_t' to 'unsigned int', possible loss of data [C:\projects\cryptopp\cryptlib.vcxproj]
rijndael-simd.cpp(376): note: see reference to function template instantiation 'size_t CryptoPP::Rijndael_AdvancedProcessBlocks_AESNI<void(__cdecl *)(__m128i &,const __m128i *,unsigned int),void(__cdecl *)(__m128i &,__m128i &,__m128i &,__m128i &,const __m128i *,unsigned int)>(F1,F4,const __m128i *,::size_t,const CryptoPP::byte *,const CryptoPP::byte *,CryptoPP::byte *,::size_t,CryptoPP::word32)' being compiled
with
[
F1=void (__cdecl *)(__m128i &,const __m128i *,unsigned int),
F4=void (__cdecl *)(__m128i &,__m128i &,__m128i &,__m128i &,const __m128i *,unsigned int)
]
rijndael-simd.cpp(355): warning C4267: 'argument': conversion from 'size_t' to 'unsigned int', possible loss of data
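C4267 fires when a size_t argument is passed where an unsigned int parameter is expected. One common way to address it is an explicit cast that checks the range in debug builds. The sketch below assumes the value is small in practice (a round count, for instance). NarrowToUInt is a hypothetical helper, and this is not necessarily how the warning was resolved in the library.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper: narrow size_t to unsigned int.
// The assert catches an out-of-range value in debug builds;
// the static_cast makes the narrowing explicit, which silences C4267.
inline unsigned int NarrowToUInt(std::size_t n)
{
    // Parentheses around max guard against a max() macro from <windows.h>.
    assert(n <= (std::numeric_limits<unsigned int>::max)());
    return static_cast<unsigned int>(n);
}
```

A call site would then pass NarrowToUInt(value) instead of the raw size_t.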
ARMv8 AES decryption is not working at the moment. This check-in will allow us to test the current changes more widely. We expect AES decryption failures only.