Well, the IBM docs were not quite correct when they stated "The block is aligned so that it can be used for any type of data". The vector data types are pretty standard, even across different machines from diffent manufacturers
TweetNaCl is a compact reimplementation of the NaCl library by Daniel J. Bernstein, Bernard van Gastel, Wesley Janssen, Tanja Lange, Peter Schwabe and Sjaak Smetsers. The library is less than 20 KB in size and provides 25 of the NaCl library functions.
The compact library uses curve25519, XSalsa20, Poly1305 and SHA-512 as default primitives, and includes both x25519 key exchange and ed25519 signatures. The complete list of functions can be found in TweetNaCl: A crypto library in 100 tweets (20140917), Table 1, page 5.
Crypto++ retained the function names and signatures but switched to data types provided by <stdint.h> to promote interoperability with Crypto++ and avoid size problems on platforms like Cygwin. For example, NaCl typdef'd u64 as an unsigned long long, but Cygwin, MinGW and MSYS are LP64 systems (not LLP64 systems). In addition, Crypto++ was missing NaCl's signed 64-bit integer i64.
Crypto++ enforces the 0-key restriction due to small points. The TweetNaCl library allowed the 0-keys to small points. Also see RFC 7748, Elliptic Curves for Security, Section 6.
TweetNaCl is well written but not well optimized. It runs 2x to 3x slower than optimized routines from libsodium. However, the library is still 2x to 4x faster than the algorithms NaCl was designed to replace.
The Crypto++ wrapper for TweetNaCl requires OS features. That is, NO_OS_DEPENDENCE cannot be defined. It is due to TweetNaCl's internal function randombytes. Crypto++ used DefaultAutoSeededRNG within randombytes, so OS integration must be enabled. You can use another generator like RDRAND to avoid the restriction.
Power4 lacks 'vector long long'
Rename datatypes such as 'uint8x16_p8' to 'uint8x16_p'. Originally the p8 suffix indicated use with Power8 in-core crypto. We are now using Altivec/Power4 for general vector operations.
Both BLAKE2 and SPECK slow down when using NEON/ASIMD. When just BLAKE2 experienced the issue, it was a one-off problem. Its now wider than a one-off, so add the formal define
* Set default target Windows version for MinGW to XP
The original MinGW from mingw.org targets Windows 2000 by default, but lacks
the <wspiapi.h> include needed for Windows 2000 support.
* Disable CRYPTOPP_CXX11_SYNCHRONIZATION for original MinGW
std::mutex is only available in libstdc++ if _GLIBCXX_HAS_GTHREADS is defined,
which is not the case for original MinGW. Make the existing fix for AIX more
general to fix this. Unfortunately, any C++ header has to be included to
detect the standard library and the otherwise empty <ciso646> is going to be
removed from C++20, so use <cstddef> instead.
This cleans up the compile on old PwerMac G5's. Our Altivec and Crypto code relies on Power7 and Power8 extensions. There's no need to shoehorn Altivec and Power4 into old platforms, so we disable Altivec and Crypto unless Power7 is available. The GNUmakefile sets CRYPTOPP_DISABLE_ALTIVEC if Power7 is not available.
Enable aligned allocations under IBM XL C/C++. Based on the AIX malloc man pages, "... the block is aligned so that it can be used for any type of data". Previously CRYPTOPP_NO_ALIGNED_ALLOC was in effect.
Use malloc instead of calloc on OS X. Based on the OS X malloc man pages, "... the allocated memory is aligned such that it can be used for any data type, including AltiVec- and SSE-related types". Additionally, calloc zero'd the memory it allocated which slowed things down on Apple systems.
We determine machine capabilities by performing an os/platform *query* first, like getauxv(). If the *query* fails, we move onto a cpu *probe*. The cpu *probe* tries to exeute an instruction and then catches a SIGILL on Linux or the exception EXCEPTION_ILLEGAL_INSTRUCTION on Windows. Some OSes fail to hangle a SIGILL gracefully, like Apple OSes. Apple machines corrupt memory and variables around the probe.
The strategy of "cleanup under-aligned buffers" is not scaling well. Corner cases are still turing up. The library has some corner-case breaks, like old 32-bit Intels. And it still has not solved the AltiVec and Power8 alignment problems.
For now we are backing out the changes and investigating other strategies