The Crypto++ functions follow IBM's lead and provide VectorLoad, VectorLoadBE, VectorStore, and VectorStoreBE. Additionally, VectorLoadKey was removed in favor of vanilla VectorLoad.
Fix two compilation errors encountered with C++Builder (Starter Edition):
- In `cpu.cpp`, 0ccdc197b introduced a dependency on `_xgetbv()` from `<immintrin.h>` that doesn't exist on C++Builder. Enlist it for the workaround, similar to SunCC in 692ed2a2b.
- In `adv-simd.h`, `<pmmintrin.h>` is being #included under the `CRYPTOPP_SSE2_INTRIN_AVAILABLE` macro. This header, [which apparently provides SSE3 intrinsics](https://stackoverflow.com/a/11228864/1433768), is not shipped with C++Builder. (This section of code was recently downgraded from a SSSE3 to a SSE2 block in 09c8ae28, followed by moving away from `<immintrin.h>` in bc8da71a, followed by reintroducing the SSSE3 check in d1e646a5.) Split the SSE2 and SSSE3 cases such that `<pmmintrin.h>` is not #included for SSE2. This seems safe to do, because some `git grep` analysis shows that:
- `adv-simd.h` is not #included by any other header, but only directly #included by some `.cpp` files.
- Among those `.cpp` files, only `sm4-simd.cpp` has a `CRYPTOPP_SSE2_INTRIN_AVAILABLE` preprocessor block, and there it again includes the other two headers (`<emmintrin.h>` and `<xmmintrin.h>`).
NOTE: I was compiling via the IDE after [setting up a project file](https://github.com/tanzislam/cryptopals/wiki/Importing-into-Embarcadero-C%E2%94%BC%E2%94%BCBuilder-Starter-10.2#using-the-crypto-library). My compilation command was effectively:
```
bcc32c.exe -DCRYPTOPP_NO_CXX11 -DCRYPTOPP_DISABLE_SSSE3 -D__SSE2__ -D__SSE__ -D__MMX__
```
Travis testing on GitHub showed a RDSEED failure with a "no implementation" failure. Stepping back the RDRAND and RDSEED impl logic was too complex. It offered choices when there was no need for them. For MSC we only need the MASM implementation. For U&L we only need the inline assembly that emits the byte codes (and not the instruction). The byte codes cover from GCC 3.2, Clang 2.8 and onwards
We were using aligned loads of the key table SHA256_K. The key table was declared as 16-byte aligned but it appears the table was not aligned in memory.