This was added for compatibility with BouncyCastle and other libraries. ElGamals paper and the HAC says to select x over the interval [1,p-1]. Crypto++ selects x over [1,q-1] as with other GFP schemes. Crypto++ fails to validate some of the keys of other libraries.
DL_PublicKey_GFP_OldFormat used to perform a reduction on x, but I think it treated a symptom and not the underlying cause. The underlying cause was, Crypto++ wass too strict in validating the parameter.
Note that wikipedia says to select the privaye key x over [1,q-1]. We are unable to find a reference for the practice, though it is OK.
DWORD and ULONG are 32-bit. The conversion from size_t could fail, and the RNG would return a truncated result. I think it is low risk, but the test for the conversion test is cheap.
We are seeing RNG falures on HURD, but we are not throwing when constructing BlockingRng or NonblockingRng. This is despite the fact that /dev/urandom is missing during testing. NonblockingRng should always thwo when /dev/urandom is missing.
Debian HURD was slipping between the cracks. HURD appeared to be a minor failure because entropy on the heap improved the test result. After we zero'd the block, it was a catastrophic failure.
Calls to `MASM_RDSEED_GenerateBlock` would hang for an unknown reasons on Windows 10 and VS2017/VS2019 toolchains. Similar calls to `MASM_RDRAND_GenerateBlock` worked as expected. They were effectively the same code. The only differences were the function names and the opcodes (they were literally copy/paste).
Splitting `rdrand.asm` (with both `RDRAND` and `RDSEED`) into `rdrand.asm` (with `RDRAND`) and `rdseed.asm` (with `RDSEED`) resolved the issue. We don't know why.
This check-in provides the fix for leaks in ECP's Add() and Double(). The fixes were taken from Joost Renes, Craig Costello, and Lejla Batina's [Complete addition formulas for prime order elliptic curves](https://eprint.iacr.org/2015/1060.pdf).
The Pull Request includes two additional changes that were related to testing the primary fix. First, an `AuthenticatedKeyAgreementWithRolesValidate` interface was added. It allows us to test key agreement when roles are involved. Roles are "client", "server", "initiator", "recipient", etc.
Second, `SetGlobalSeed` was added to `test.cpp` to help with reproducible results. We had code in two different places that set the seed value for the random number generator. But it was sloppy and doing a poor job since results could not be reproduced under some circumstances.
This fixes the timing leakage of bit-length of nonces in ECDSA by essentially
fixing the bit-length, by using a nonce equivalent modulo the subgroup order.
This is a minor fix to AutoSeededX917RNG::Reseed. Valgrind produces a finding if user input is too small or seed size is too large. The constraints make it a little tricky to use correctly. HKDF will always produce the correct amount of material with provable security, and avoid the Valgrind finding.
Travis CI fails "deep tests" of DLIES with #857 applied. Let's revert it for now and get back to
```c++
cipherKey = key + MAC::DEDAULT_KEYLENGTH;
```
and see if it improves the situation.
The Gentoo folks caught a bug at https://bugs.gentoo.org/689162. The 689162 bug uses -march=bdver1 -msse4.1 on a AMD Bulldozer machine.
Investigating the issue we are missing the XOP header blake2b_simd.cpp. However, adding the XOP header is not enough for this particular config. Four source files fail to compile with the expected headers. We are waiting on the GCC folks to get back to us with a fix.
Change from `MAC::DEFAULT_KEYLENGTH` to `MAC::DIGESTSIZE` in `DL_EncryptionAlgorithm_Xor` was only partially done. This was discovered when null hash was used. This, along with the proposed fix, was discovered by Andrew Wason (thanks!).
In file included from test.cpp:31:0:
validate.h:213:93: error: operator '||' has no right operand
#elif (_POSIX_C_SOURCE >= 1 || _XOPEN_SOURCE || _BSD_SOURCE || _SVID_SOURCE || _POSIX_SOURCE)
In file included from test.cpp:31:0:
validate.h:213:93: error: operator '||' has no right operand
#elif (_POSIX_C_SOURCE >= 1 || _XOPEN_SOURCE || _BSD_SOURCE || _SVID_SOURCE || _POSIX_SOURCE)
Using BMI2 saves about 0.03 ms on a Core i5 6400 @ 2.7 GHz. It is small but measurable. It also gives GCC more freedom in selecting memory or register operands
Using BMI2 saves about 0.03 ms on a Core i5 6400 @ 2.7 GHz. It is small but measurable. It also gives GCC more freedom in selecting memory or register operands
The single buffer IncrementCounterByOne generated a Valgrind finding on ARM. This commit uses the same pattern for both overloads in case Valgrind wants to fire on the two-buffer version.
This was introduced earlier in the day when clearing a Valgrind finding. It tested good with the self tests. However, we double process byte[0] if there's a carry.
This was a mistake when porting from Cryptogams to Crypto++. The macros VFP_ABI_PUSH and VFP_ABI_POP needed to be defined because they save and restore SIMD register state. They were originally missing during the port. The benchmarks would hang because the doubles we used for benchmarking were blown away in sha512_block_data_order_neon.
Andy advised against removing the global caps variable. This commit reintroduces CRYPTOGAMS_armcap_P. However, due to the shared object symbol loading problem, we needed to use CRYPTOGAMS_armcap_P as a global, and not CRYPTOGAMS_armcap as a local. Using CRYPTOGAMS_armcap_P directly caused the symbol to be marked as R_ARM_ABS32 which avoids the problem with R_ARM_REL32.
Previous to the Cryptogams cut-in we could be sloppy and return anything for ARMv8. Now e have real code backing ARMv7 we need to return an accurate value.
This fixes the query under Clang. This appears to be a trickier problem because there is no explicit define for HWCAP_ARMv7. We rely on HWCAP_NEON as a proxy, or fallback to a CPU_ProbeARMv7.
Also declare storage for CRYPTOGAMS_armcaps. This moves the symbol from BSS to initialized data. The Cryptogams module declares the symbol as common, so they are weak and use our declaration.
Cryptogams is Andy Polyakov's project used to create high speed crypto algorithms and share them with other developers. Cryptogams has a dual license. First is the OpenSSL license because Andy contributes to OpenSSL. Second is a BSD license for those who want a more permissive license.
Andy's implementation runs about 45% faster than C/C++ code. Testing on a 1.8 GHz Cortex-A17 shows Cryptograms at 45 cpb, and C++ at 79 cpb.
The integration instructions are documented at [Cryptogams SHA](https://wiki.openssl.org/index.php/Cryptogams_SHA) on the OpenSSL wiki.
Cryptogams is Andy Polyakov's project used to create high speed crypto algorithms and share them with other developers. Cryptogams has a dual license. First is the OpenSSL license because Andy contributes to OpenSSL. Second is a BSD license for those who want a more permissive license.
Andy's implementation runs about 45% faster than C/C++ code. Testing on a 1 GHz Cortex-A7 shows Cryptograms at 17 cpb, and C++ at 30 cpb.
The integration instructions are documented at [Cryptogams SHA](https://wiki.openssl.org/index.php/Cryptogams_SHA) on the OpenSSL wiki.
Add ARM SHA1 asm implementation from Cryptogams.
Cryptogams is Andy Polyakov's project used to create high speed crypto algorithms and share them with other developers. Cryptogams has a dual license. First is the OpenSSL license because Andy contributes to OpenSSL. Second is a BSD license for those who want a more permissive license.
Andy's implementation runs about 30% faster than C/C++ code. Testing on a 1 GHz Cortex-A7 shows Cryptograms at 16 cpb, and C++ at 23 cpb.
The integration instructions are documented at [Cryptogams SHA](https://wiki.openssl.org/index.php/Cryptogams_SHA) on the OpenSSL wiki.
Use PowerPC unaligned loads and stores with Power8. Formerly we were using Power7 as the floor because the IBM POWER Architecture manuals said unaligned loads and stores were available. However, some compilers generate bad code for unaligned loads and stores using `-march=power7`, so bump to a known good.
This should have been checked-in during GH #783 and PR #784. I think there was one mailing list message about missing symbols GF2NT_233_Multiply_Reduce_CLMUL and GF2NT_233_Square_Reduce_CLMUL. I missed it when attempting to reproduce the issue. I can duplicate it now using VS2013. I think the addition of CRYPTOPP_DLL caused the issue to surface.
As done with others tests. This will avoid a miss-detection of aarch64 features
when using flags such as _FORTIFY_SOURCE that needs to be filtered for testing
This fixes https://github.com/weidai11/cryptopp/issues/812
V2: Fix all cases
Signed-off-by: Nicolas Chauvet <kwizart@gmail.com>
The makefile tries to pre-qualify NEON (for lack of a better term), and sets IS_NEON accordingly. If IS_NEON=1, then we go on to perform test compiles to see if -mfloat-abi=X -mfpu=neon (and friends) actually work. Effectively we are performing a test to see if we should perform another test.
The IS_NEON flag predates our compile time feature tests. It was kind of helpful when we were trying to sort out if a platform and compiler options supported NEON without a compile test. That was an absolute mess and we quickly learned we needed a real compile time feature test (which we now have).
Additionally, Debian and Fedora ARMEL builds are failing because we are misdetecting NEON availability. It looks like we fail to set IS_NEON properly, so we never get into the code paths that set either (1) -mfloat-abi=X -mfpu=neon or (2) -DCRYPTOPP_DISABLE_NEON or -DCRYPTOPP_DISABLE_ASM. Later, the makefile builds a *_simd.cpp and the result is an error that NEON needs to be activated (or disabled).
This commit removes IS_NEON so we immediately move to compile time feature tests.
This was added to misc.h due to the noise created by NumericLimitsMin and NumericLimitsMax. It should make it easier to remove -Wno-unused-function from config.h.
We tweaked ChaCha to arrive at the IETF's implementation specified by RFC 7539. We are not sure how to handle block counter wrap. At the moment the caller is responsible for managing it. We were not able to find a reference implementation so we disable SIMD implementations like SSE, AVX, NEON and Power4. We need the wide block tests for corner cases to ensure our implementation is correct.
We also clamp the private key and recalculate the public key. Note: we already know some IETF keys fail to validate because they are not clamped as specified in Bernsteain's paper or the RFCs (derp....)
Embarcadero C++Builder v10.3 [has a bug](https://quality.embarcadero.com/browse/RSP-22883) where its old Intel intrinsics headers try to use retired Clang builtins and fail to compile. In devising a workaround with `-DCRYPTOPP_DISABLE_ASM`, I found that `sse_simd.cpp` includes `<emmintrin.h>` even when its code doesn't need the intrinsics.
With this patch, `-DCRYPTOPP_DISABLE_ASM` will be a sufficient workaround because `CRYPTOPP_SSE2_INTRIN_AVAILABLE` is derived from it in `config.h`.
The initial cut-in was missing preamble present in Moon's curve25519_donna function. It originally tested good because we only perform a pairwise consistency check in release builds. Comprehensive testing with debug builds revealed the problem. Debug builds cross-validate against Bernstein's TweetNaCl library.
The initial cut-in was missing preamble present in Moon's curve25519_donna function. It originally tested good because we only perform a pairwise consistency check in release builds. Comprehensive testing with debug builds revealed the problem. Debug builds cross-validate against Bernstein's TweetNaCl library.
The initial cut-in was missing preamble present in Moon's curve25519_donna function. It originally tested good because we only perform a pairwise consistency check in release builds. Comprehensive testing with debug builds revealed the problem. Debug builds cross-validate against Bernstein's TweetNaCl library.
Add is_clamped for secret key validation.
Cleanup paramter names in Donna::curve25519 to follow function.
Overload Donna::curve25519 to implicitly use base point if not provided.
Add additional asserts to let the code debug itself.
Update documentation.
Some comments in config.h were old. Time for a refresh.
Switch from CRYPTOPP_BOOL_ARM64 to CRYPTOPP_BOOL_ARMV8. Aarch32 is ARMv8, and that's the important part.
ANDROID_HOME is /c/Users/Jeff/.android on desktops. It is a place where user's private data goes, like Android debug signing keys. It is not the SDK directory like answered on Stack Overflow.
The library used to provide DEFAULT_CHANNEL and AAD_CHANNEL this way. We experienced Static Initialization Order Fiasco crashes on occassion, so we moved them into cryptlib.h with internal linkage. The cost was, each translation unit got a copy of the strings which contributed to bloat. Issue 751 shows Clang compiles the global constructors for DEFAULT_CHANNEL and AAD_CHANNEL above the base ISA so we caught crashes on OS X with down-level hardware.
We are now at a "pick your poison" point. We selected Static Initialization Order Fiasco because it seems to be less prevalent.
Hat tip to the C++ Committee for allowing this problem to fester for three decades.
I thought this was due to trying to call the darn instruction even though g_hasDARN == false on Power8. However, the problem turned out to be a Power9 load was used when DARN class threw a DARN_Err.
According to the OpenPOWER specs, unsigned long long vectors first appeared in ISA 2.06, which is POWER7. However some support functions, like vec_add, did not arrive until ISA 2.07, which is POWER8.
This assert checks the array we return to the caller is large enough. Spoiler alert... it is not always large enough, like on 64-bit AIX. The linker on AIX appears to align smaller than 8-bytes
This will allow users to specify agreesive warning flags without accidentally failing a feature test. The feature tests are minimal but the system headers could be noisy under elevated warnings
The ParameterBlocks for BLAKE2 had undefined behavior. We relied on the compiler packing the bytes in the structure, then we used the first byte as the start of an array.
This rewrite does things correctly. We don't memset the structure, and we don't treat the structure as a contiguous array.
The CORE function provides the implementation for ChaCha_OperateKeystream_ALTIVEC, ChaCha_OperateKeystream_POWER7, BLAKE2_Compress32_ALTIVEC and BLAKE2_Compress32_POWER7. Depending on the options used to compile the source files, either POWER7 or ALTIVEC will be used.
This is needed to support the "new toolchain, ancient hardware" use case.
This is kind of tricky. We automatically drop from POWER7 to POWER4 if 7 is notavailable. However, if POWER7 is available the runtime test checks for HasAltivec(), and not HasPower7(), if the drop does not occur.
All of this goodness is happening on an old Apple G4 laptop with Gentoo. It is a "new toolchain on old hardware".
This is kind of tricky. We automatically drop from POWER7 to POWER4 if 7 is not available. However, if POWER7 is available the runtime test checks for HasAltivec(), and not HasPower7(), if the drop does not occur.
All of this goodness is happening on an old Apple G4 laptop with Gentoo. It is a "new toolchain on old hardware".
GCM can do some bulk XOR's using the SIMD unit. However, we still need loads and stores to be fast. Fast loads and stores of unaligned data requires the VSX unit
The way the existing ppc_simd.h is written makes it hard to to switch between the old Altivec loads and stores and the new POWER7 loads and stores. This checkin rewrites the wrappers to use _ALTIVEC_, _ARCH_PWR7 and _ARCH_PWR8. The wrappers in this file now honor -maltivec, -mcpu-power7 and -mcpu=power8. It allows users to compile a source file, like chacha_simd.cpp, with a lower ISA and things just work for them.
This costs about 0.6 cpb (700 MB/s on GCC112), but it makes the faster algorithm available to more machines. In the future we may want to provide both POWER7 and POWER8
This switches to line oriented parsing for the test files. Previously we we using streams for names, and lines for values. We can now use whitespace and make the tests a bit more readable by grouping similar tests. AlgorithmType will clear the current accumlated values.
This is only a work-around for the moment. The issue only affects SIMD code. The problem is, the algorithm we use performs a 32-bit add as an intermediate result, but we really need a 64-bit add. We are running 4 transforms in parallel, and we can't add and carry the way we need to.
The workaround is, whenever we could cross the 32-bit counter boundary we use the C version of the transform. We determine the cross-over point by 'bool safe = 0xffffffff - state.low > 4'. When not safe we skip the SIMD version of the algorithm and use the C version. Once we are safe again we use the SIMD version again.
The work-around costs us about 0.1 to 0.2 cpb. At 1.10 or 1.15 cpb that equates to about 200 MB/s on a Skylake. We'd like to get it back eventually.
I think this may be related to the VectorSource check-in. The error is:
algparam.h: In constructor 'ConstByteArrayParameter::ConstByteArrayParameter(const T&, bool) [with T = std::vector<byte, std::allocator<byte> >]':
filters.h:1444: instantiated from here
algparam.h:56: error: 'const class std::vector<byte, std::allocator<byte> >' has no member named 'data'
This allows users to -DCRYPTOPP_ALTIVEC_AVAILABLE=0 on the command line. It is especially important on PPC, which varies wildly among compilers dating back to the 2000's
adhoc.cpp was a bit uncomfortable because we had to copy it out from adhoc.cpp.proto. For some reason CMake could not perform the copy, so we started using pch.cpp in CMake. This commit keeps them consistent.
We may have problems with one test, and that is the Newlib tests. I seem to recall they a C++ header included to properly identify its use. We cross that bridge during MinGW testing.
Autotools sets up its config.h file with the '#define XXX 0' or '#define XXX 1' pattern. This check-in makes the sources Autotools aware. We need to verify CMake does the same
This picks up about 0.2 cpb in ChaCha::OperateKeystream. It may not sound like much but it puts SSE2 intrinsics version on par with the ASM version of Salsa20. Salsa20 leads ChaCha by 0.1 to 0.15 cpb, which equates to about 50 MB/s.
Thanks to Jack Lloyd and Botan for allowing us to use the implementation.
The numbers for SSE2 are very good. When compared with Salsa20 ASM the results are:
* Salsa20 2.55 cpb; ChaCha/20 2.90 cpb
* Salsa20/12 1.61 cpb; ChaCha/12 1.90 cpb
* Salsa20/8 1.34 cpb; ChaCha/8 1.5 cpb
Commit afbd3e60f6 effectively treated a symptom and not the underlying problem. The problem was linkers on 32-bit systems ignore CRYPTOPP_ALIGN_DAT(16) passed down by the compiler and align to 8-bytes or less. We have to use Wei's original code in some places. It is not a bad thing, but the bit fiddling is something we would like to contain a little more by depending more on language or platform features.
This commit keeps the original changes which improve partial specializations; but fixes 32-bit linker behavior by effectively reverting afbd3e60f6 and e054d36dc8. We also add more comments so the next person has understands why things are done they way they are.
These fixes were interesting in a morbid sort of way. I thought the FixedSizeAllocatorWithCleanup specializations faithfully reproduced semantics but I was wrong on Win32 and Sparc. Also see Commit e054d36dc8.
It seems there was another requirement or dependency that we missed, but it was not readily apparent. If I am parsing results correctly (which I may not be), it appears the bit twiddling using 8 byte alignment had more influence on alignment than I originally thought based on use of CRYPTOPP_BOOL_ALIGN16 and T_Align16. Or maybe the alignment attributes specified by CRYPTOPP_ALIGN_DATA are not being honored like they should for stack allocations.
This check-in avoids some uses of x86 movdqa (aligned) in favor of movdqu (unaligned). The uses were concentrated on memory operands which were 8-byte aligned instead of 16-byte aligned. It is not clear to me how the specializations lost 8-bytes of alignment. The check-in also enlists CRYPTOPP_ASSERT to tell us when there's a problem so we don't need to go hunting for bugs.
This allocator still has some demons buried inside due to the bit fiddling. This commit should isolate the demons to aligned stack allocations when an alignment facility from the platform or OS is not available. That is, we use CRYPTOPP_ALIGN_DATA when we can because it is most reliable.
We can tell when things have gone sideways using Debug builds. The CRYPTOPP_ASSERT(m_allocated) will fire on destruction because the flag gets overwritten.
This commit also favors init priorities over C++ dynamic initialization. After the std::call_once problems on Sparc and PowerPC I'm worried about problems with Dynamic Initialization and Destruction with Concurrency.
We also do away with supressing warnings and use CRYPTOPP_UNUSED instead.
I was not able to duplicate it under GCC. That includes the GCC's supplied with Debian 8 and Ubuntu 14. It looks like the problem was with Asan insread of the library
It looks like GCC is rejecting the -pthread option but it is advertising Pthread support by defining 39 related macros. I'm not sure what to make of it, but we can't use -pthread because it breaks the compile.
On my MinGW-w64 setup, the build failed:
>mingw32-make: *** No rule to make target 'winpipes.o', needed by 'libcryptopp.a'.
>mingw32-make: Target 'default' not remade because of errors.
Looks like `winpipes.cpp` was removed in f2171cbe2 but not de-listed from the `GNUmakefile`. Remove it.
Also use CRYPTOPP_DISABLE_XXX_ASM consistently. The pattern is needed for Clang which still can't compile Intel assembly language. Also see http://llvm.org/bugs/show_bug.cgi?id=24232.
SIMON-64 and SIMON-128 have different ISA requirements. The same applies to SPECK-64 and SPECK-128. GCC generated code that resulted in a SIGILL due to the ISA differences on a down level machine. The instructions was a mtfprwz from POWER8. It was prsent in a function prologue on a POWER7 machine.
SIMON-64 and SPECK-64 don't use 64-bit type so they can run on Power7. We may be able to drop to Power4, but we need to test the effects of Loads and Stores without vec_vxs_ld and vec_vsx_st
Commit 3ed38e42f6 added the POWER8 infrastructure for GCM mode. It also added GCM_SetKeyWithoutResync_VMULL, GCM_Multiply_VMULL and GCM_Reduce_VMULL. This commit adds the remainder, which includes GCM_AuthenticateBlocks_VMULL.
GCC is OK on Linux (ppc64-le) and AIX (ppc64-be). We may need some touchups for XLC compiler
GCM_SetKeyWithoutResync_VMULL, GCM_Multiply_VMULL and GCM_Reduce_VMULL work as expected on Linux (ppc64-le) and AIX (ppc64-be). We are still working on GCM_AuthenticateBlocks_VMULL.
The Crypto++ functions follow IBM's lead and provide VectorLoad, VectorLoadBE, VectorStore, and VectorStoreBE. Additionally, VectorLoadKey was removed in favor of vanilla VectorLoad.
Fix two compilation errors encountered with C++Builder (Starter Edition):
- In `cpu.cpp`, 0ccdc197b introduced a dependency on `_xgetbv()` from `<immintrin.h>` that doesn't exist on C++Builder. Enlist it for the workaround, similar to SunCC in 692ed2a2b.
- In `adv-simd.h`, `<pmmintrin.h>` is being #included under the `CRYPTOPP_SSE2_INTRIN_AVAILABLE` macro. This header, [which apparently provides SSE3 intrinsics](https://stackoverflow.com/a/11228864/1433768), is not shipped with C++Builder. (This section of code was recently downgraded from a SSSE3 to a SSE2 block in 09c8ae28, followed by moving away from `<immintrin.h>` in bc8da71a, followed by reintroducing the SSSE3 check in d1e646a5.) Split the SSE2 and SSSE3 cases such that `<pmmintrin.h>` is not #included for SSE2. This seems safe to do, because some `git grep` analysis shows that:
- `adv-simd.h` is not #included by any other header, but only directly #included by some `.cpp` files.
- Among those `.cpp` files, only `sm4-simd.cpp` has a `CRYPTOPP_SSE2_INTRIN_AVAILABLE` preprocessor block, and there it again includes the other two headers (`<emmintrin.h>` and `<xmmintrin.h>`).
NOTE: I was compiling via the IDE after [setting up a project file](https://github.com/tanzislam/cryptopals/wiki/Importing-into-Embarcadero-C%E2%94%BC%E2%94%BCBuilder-Starter-10.2#using-the-crypto-library). My compilation command was effectively:
```
bcc32c.exe -DCRYPTOPP_NO_CXX11 -DCRYPTOPP_DISABLE_SSSE3 -D__SSE2__ -D__SSE__ -D__MMX__
```
Travis testing on GitHub showed a RDSEED failure with a "no implementation" failure. Stepping back the RDRAND and RDSEED impl logic was too complex. It offered choices when there was no need for them. For MSC we only need the MASM implementation. For U&L we only need the inline assembly that emits the byte codes (and not the instruction). The byte codes cover from GCC 3.2, Clang 2.8 and onwards
We were using aligned loads of the key table SHA256_K. The key table was declared as 16-byte aligned but it appears the table was not aligned in memory.
Use std::ostringstream instead. Eventually I'd like to see the output stream passed into the function of interest. It will avoid problems on some mobile OSes that don't have standard inputs and outputs.
These changes made it in by accident at Commit b74a6f4445. We were going to try to let them ride but they broke versioning. They may be added later but we should avoid the change at this time.
We also switched to "IsAligned<HashWordType>(input)". Using word64 was due to debug testing on Solaris (the alignment check is needed). Hard coding word64 should not have been checked in.
It looks like GetAlignmentOf was returning the "UnsignedMin(4U, sizeof(T))" for SunCC. It was causing SIGBUSes on Sparc when T=word64. OpenCSW provided access to their build farm and we were able to test "__alignof__(T)" back to an early SunCC on Solaris 9.
Man, Sparc does not mess around with unaligned buffers. Without -xmemalign=4i the hardware wants 8-byte aligned word64's so it can use the high performance 64-bit move or add.
Since we do not use -xmemalign we get the default behavior of either -xmemalgin=8i or -xmemalgin=8s. It shoul dnot matter to us since we removed unaligned data access at GH #682.
IteratedHashBase::Update aligns the buffer, but IteratedHashBase::HashBlock does not. It was causing a fair number of asserts to fire when the code was instrumented with alignment checks. Linux benchmarks shows the code does not run materially slower on i686 or x86_64.
I believe Andrew Marlow first reported it. At the time we could not get our hands on hardware to fully test things. Instead we were using -xmemalign=4i option as a band-aide to avoid running afoul of the Sparc instruction that moves 64-bits of data in one shot.
Local scopes and loading the constants with _mm_set_epi32 saves about 0.03 cpb. It does not sound like much but it improves GMAC by about 500 MB/s. GMAC is just shy of 8 GB/s.
We got reports that x86_64 was producing incorrect results. Also, the problem persisted in i386 builds. I don't think we can work around this issue. Oracle must fix it.
I think we have this issue somewhat sorted out. First, there is a compiler bug. Second, it seems to be triggered when function parameters mix const and non-const references. Third, to work around it, all parameters need to be non-const (as in this patch).
I'm really glad we kind of got to the bottom of things. The crash when compiling GCM has been bothering me for nearly 3 years.
This PR adds ARMv8.4 cpu feature detection support. Previously we only needed ARMv8.1 and things were much easier. For example, ARMv8.1 `__ARM_FEATURE_CRYPTO` meant PMULL, AES, SHA-1 and SHA-256 were available. ARMv8.4 `__ARM_FEATURE_CRYPTO` means PMULL, AES, SHA-1, SHA-256, SHA-512, SHA-3, SM3 and SM4 are available.
We still use the same pattern as before. We make something available based on compiler version and/or preprocessor macros. But this time around we had to tighten things up a bit to ensure ARMv8.4 did not cross-pollinate down into ARMv8.1.
ARMv8.4 is largely untested at the moment. There is no hardware in the field and CI lacks QEMU with the relevant patches/support. We will probably have to revisit some of this stuff in the future.
Since this update applies to ARM gadgets we took the time to expand Android and iOS testing on Travis. Travis now tests more platforms, and includes Autotools and CMake builds, too.
We were able to gut CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS for everything except Rijndael. Rijndael uses unaligned accesses on x86 to harden against timing attacks.
There's a little more to CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS and Rijndael. If we remove unaligned access then AliasedWithTable hangs in an endless loop on non-AESNI machines. So care must be taken when trying to remove the vestige from Rijndael.
cryptest.sh and UBsan caught a "secblock.h:389:4: runtime error: load of value 206, which is not a valid value for type 'bool'". m_t[4] is accessed in UncheckedSetKey. The extra m_t[] element was inadvertently removed when ProcessAndXorBlock no longer used it.
This changes both the encryption and decryption loops to perform 4 rounds per iteration rather than 8 rounds. Decryption was necessary for this bug. Encryption was included to keep things symmetrical in case of future maintenance
GCC 8 was producing bad decryption results for CBC mode on x86. NEON and Aarch64 was fine. We lose 0.6 cpb so LEA runs around 3.5 cpb instead of 2.9 cpb. It would be nice to pinpoint the GCC issue but it is kind of difficult at the moment.
This was needed a while ago but we mostly side-stepped the issues with casts. CHAM64 uses a word16 type for subkeys and a cast won't fix it because we favor word32 for 64-bit block sizes.
There are no corresponding defines in config.h at the moment. Programs will have to use the preprocessor macros __AVX__ and __AVX2__ to determine when they are available.
* remove superfluous semicolon
* Remove C-style casts from public headers
clang warns about them with -Wold-style-cast. It also warns about
implicitly casting away const with -Wcast-qual. Fix both by removing
unnecessary casts and converting the remaining ones to C++ casts.
Some distros don't want to install cryptest.exe. For folks who don't want to install the test program, they can issue 'make install-lib'.
install-lib is a non-standard target, but the GNU Coding Standard does not have a standard target for the task.
As described in https://github.com/weidai11/cryptopp/issues/652 for consistency we should add assert in all hash transformations. The expectation is to have a good pointer and a non-0 length or a null pointer and 0-length.
The finding was an out-of-bounds read but Coverity does not realize the API takes a byte count, not element count. This change may produce the same finding.
Scrypt performance jumps as expected. For example, on a machine with 4 logical cores:
$ time OMP_NUM_THREADS=1 ./test.exe
Threads: 1
Key: DCF073537D25A10C9733...
real 0m17.959s
user 0m16.165s
sys 0m1.759s
$ time OMP_NUM_THREADS=4 ./test.exe
Threads: 4
Key: B37A0127DBE178ED604F...
real 0m4.488s
user 0m15.391s
sys 0m1.981s
Visual Studio 2008 kind of forced out hand with this. VS2008 lacks <stdint.h> and <cstdint> and it caused compile problems in NaCl gear. We were being a tad bit lazy by relying on int8_t, int32_t and int64_t, but the compiler errors made us act
We could fix aria.cpp by using word32. However, NaCl gear uses int64_t and we don't have a typedef setup for it. So we will need <cstdint> later for NaCl
cryptest.sh revealed a corner case still producing an incorrect result. We need to check for '*this > m', not '*this > 2m-1'.
The corner case looks obscure. The failure surfaced as 1 failed self test for about every 2048 tests. It was also in a code path where 'a' was explicitly set to '2m-1', with 'm' random.
The test result can be duplicated with 'cryptest.exe v 9996 1521969687'. The value '1521969687' is a seed for the random number generator to reproduce.
It looks like the 0 return value for _SC_LEVEL1_DCACHE_LINESIZE is not a 1-off problem with PPC. It appears Glibc regularly returns 0 instead of failure. Also see https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/posix/sysconf.c.
We were OK before the change. The difference now is, we expect all Glibc queries to misbehave
Perforance increases significantly, but there's still room for improvement. Even OpenSSL's numbers are relatively dull. We expect Power8's SHA-256 to be somewhere between 2 to 8 cpb but we are not hitting them.
SHA-256, GCC112 (ppc64-le): C++ 23.43, Power8 13.24 cpb (+ 110 MiB/s)
SHA-256, GCC119 (ppc64-be): C++ 10.16, Power8 9.74 cpb (+ 50 MiB/s)
SHA-512, GCC112 (ppc64-le): C++ 14.00, Power8 9.25 cpb (+ 150 MiB/s)
SHA-512, GCC119 (ppc64-be): C++ 21.05, Power8 6.17 cpb (+ 450 MiB/s)
This one should have been fixed before the Crypto++ 6.1 release. Its no big deal, however. Power8 accelerated SHA-256 is 1.5x to 2x slower than straight C++. SHA-512 may be better, but the implementation is not ready to performance test.
If CRYPTOPP_GETAUXV_AVAILABLE is undefined, getauxval function is
defined to return 0 however AT_HWCAP and AT_HWCAP2 are not defined so
compilation on toolchain without getauxval and these variables such as
uclibc-ng will fail.
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
The commit also adds an assert on memcpy_s pointers. GCC 8 claims the pointers are the same. We think it is a spurious finding. The assert never fired during test.
The ALTIVEC function required an inline declaration. Lack of inline caused the self test failure. Two NEON functions needed the same. We also cleaned up constants in unnamed namespaces
This is a convention that binary compatibity uses one number.
Using that, it's possible to have bugfixes releases (patchlevel
incremented) and enhancement release (minor incremented with no
public interface removed).
Here is more information about convention
https://autotools.io/libtool/version.html
(libtool isn't relevant to this project, but the explanation hold)
Signed-off-by: Nicolas Chauvet <kwizart@gmail.com>
The code below was flagged by undefined behavior santizier under GCC 8. The offender was the doubling at "r4 = vec_add(r4, r4)". R4 is rcon and an unsigned type. It depends on integer wrap but GCC is generating code that is being flagged for signed overflow. GCC 7 and below is OK.
for (unsigned int i=0; i<8; ++i)
{
r1 = Rijndael_Subkey_POWER8(r1, r4, r5);
r4 = vec_add(r4, r4);
skptr = IncrementPointerAndStore(r1, skptr);
}
// Final two rounds using table lookup
...
Throughout the library the message "FAILED" (not "failed") is used to signal failures. It makes it easy to grep for them. This change makes the message consistent.
Of the 200+ test vectors only 10 are semi-authentic. The ten are from the Simon and Speck paper but they had permutations applied to them so they worked with the algorithms described in the paper. The remaining 200 or so were generated with Crypto++ using straight C++ code. The library generated the test vectors because we don't have a reference implementation
$ CXXFLAGS=-std=gnu++17 gmake
clang++ -std=gnu++17 -fPIC -pthread -pipe -c cryptlib.cpp
In file included from cryptlib.cpp:19:
./misc.h:2542:43: error: no member named 'bind2nd' in namespace 'std'
return std::find_if(first, last, std::bind2nd(std::not_equal_to<T>(), value));
~~~~~^
1 error generated.
We recently learned our Simon and Speck implementation was wrong. The removal will stop harm until we can loop back and fix the issue.
The issue is, the paper, the test vectors and the ref-impl do not align. Each produces slightly different result. We followed the test vectors but they turned out to be wrong for the ciphers.
We have one kernel test vector but we don't have a working implementation to observe it to fix our implementation. Ugh...
It is easier to defer to the runtime for aligned allocations. We found the preprocessor macros needed to identitify the availability. Also see https://forum.kde.org/viewtopic.php?p=66274
It is easier to defer to the runtime for aligned allocations. We found the preprocessor macros needed to identitify the availability. Also see https://forum.kde.org/viewtopic.php?p=66274
This commit does a few things. First, it uses the compiler's triplet and the build component to determine the machine we are targeting. Second, it adds an 'X' prefix so we don't collide with someone else's variables. Third it cleans up some of the recipes. Fourth, it removes X32 detection since the system differences are handled in config.h and the source files
Linuxbrew is a fork of Homebrew on Linux.
In which, the `gcc --version` will report "homebrew".
Therefore, the current code will incorrectly set OSXPORT_COMPILER
under such environment, which results to the following compiling errors:
gcm.cpp:823: Error: too many memory references for `add'
gcm.cpp:824: Error: too many memory references for `pxor'
gcm.cpp:825: Error: ambiguous operand size for `shr'
gcm.cpp:826: Error: too many memory references for `movzx'
gcm.cpp:827: Error: too many memory references for `add'
gcm.cpp:828: Error: too many memory references for `pxor'
gcm.cpp:829: Error: too many memory references for `movzx'
gcm.cpp:830: Error: too many memory references for `add'
gcm.cpp:831: Error: too many memory references for `pxor'
gcm.cpp:832: Error: ambiguous operand size for `add'
gcm.cpp:833: Error: ambiguous operand size for `sub'
gcm.cpp:835: Error: too many memory references for `movdqa'
g++-5 -DNDEBUG -g2 -O3 -fPIC -Wa,-q -DCRYPTOPP_CLANG_INTEGRATED_ASSEMBLER=1 -pthread -pipe -c md4.cpp
make: *** [GNUmakefile:1120: gcm.o] Error 1
make: *** Waiting for unfinished jobs....
Fix this problem by checking IS_DARWIN before setting OSXPORT_COMPILER.
Analysis tools are generating findings when the pointer xorBlocks is used as the flag. The other missing piece is, xorBlocks is never NULL when either BT_XorInput or BT_XorOuput. But we don't know how to train the analyzers with the information, so we make it explicit with the boolean flags xorInput and xorOutput.
Switching to the explicit flags costs us about 0.01 cpb on a modern Intel Core processor. In the typical case 0.01 is negligible.
vector_ptr was added at Crypto++ 5.6.5 to manage an array acquired with new[] in C++03. We can now use a combination of SecBlock and SetMark(0) to achieve the same effect.
Well, the IBM docs were not quite correct when they stated "The block is aligned so that it can be used for any type of data". The vector data types are pretty standard, even across different machines from diffent manufacturers
Travis is having infrastructure problems since it migrated in November 2017. Our OS X and iOS tests hang for days. When the current job hangs, new jobs that enter the queue later hang too because the original job is still waiting.
The subsequent hangs effect Android and Linux, too. Our Travis scripts test Android, Linux, OS X and iOS. A hang effects everything.
We are going to disable Travis OS X and iOS tests until things improve.
Fix Environment setup for android to match the new unified headers.
Adjust the Makefile accordingly.
Updated the test scripts and travis to test these changes.
This check-in adds three additional functions for backwards compatibility: crypto_box_unchecked, crypto_box_open_unchecked and crypto_box_beforenm_unchecked. The functions can be used for interoperability with downlevel clients, like old versions of NaCl and libsodium. It should also help some cryptocurrencies, like Bitcoin, Ethereum, Monero and Zcash.
Also see https://eprint.iacr.org/2017/806.pdf (low order element attack) and https://github.com/jedisct1/libsodium/issues/662 (Zcash break).
TweetNaCl is a compact reimplementation of the NaCl library by Daniel J. Bernstein, Bernard van Gastel, Wesley Janssen, Tanja Lange, Peter Schwabe and Sjaak Smetsers. The library is less than 20 KB in size and provides 25 of the NaCl library functions.
The compact library uses curve25519, XSalsa20, Poly1305 and SHA-512 as default primitives, and includes both x25519 key exchange and ed25519 signatures. The complete list of functions can be found in TweetNaCl: A crypto library in 100 tweets (20140917), Table 1, page 5.
Crypto++ retained the function names and signatures but switched to data types provided by <stdint.h> to promote interoperability with Crypto++ and avoid size problems on platforms like Cygwin. For example, NaCl typdef'd u64 as an unsigned long long, but Cygwin, MinGW and MSYS are LP64 systems (not LLP64 systems). In addition, Crypto++ was missing NaCl's signed 64-bit integer i64.
Crypto++ enforces the 0-key restriction due to small points. The TweetNaCl library allowed the 0-keys to small points. Also see RFC 7748, Elliptic Curves for Security, Section 6.
TweetNaCl is well written but not well optimized. It runs 2x to 3x slower than optimized routines from libsodium. However, the library is still 2x to 4x faster than the algorithms NaCl was designed to replace.
The Crypto++ wrapper for TweetNaCl requires OS features. That is, NO_OS_DEPENDENCE cannot be defined. It is due to TweetNaCl's internal function randombytes. Crypto++ used DefaultAutoSeededRNG within randombytes, so OS integration must be enabled. You can use another generator like RDRAND to avoid the restriction.
* Conditionally use a lambda rather than the older `bind2nd` style.
* Duplicate the if statements.
* Centralise the conditional compilation to an implementation of find_if_not.
* Refactoring of name and code placement after review.
* Use `FindIfNot` where appropriate.
* Remove whitespace.
This was a latent bug that just surfaced on a Sun Core2 workstation. RDSEED caused an illegal instruction exception on the Core2. It seems we managed to miss it because old processors had family and stepping values so low they never set CPUID.EBX.RDSEED[bit 18] = 1. Newer processors had the feature so CPUID.EBX.RDSEED[bit 18] = 1 was accurate.
This is initial testing support for N4713, "Working Draft, Standard for Programming Language C++". We know GCC uses -std=c++20 and -std=gnu++20, so we can start testing things
This commit also tweaks the way Integer parses byte arrays. The modified routines are slightly faster. On a Core-i5 6400 the self tests are 0.1 to 0.2 seconds faster
As reported in #520, C++Builder standard libraries don't have a `log()` function at global namespace. Change the invocations to unqualified name lookup, and apply a using-declaration to `std::log()` when compiling under C++Builder.
This may cause GH #300, "Clang 3.9 and missing member definitions for template classes" or GH #294, "Fix clang warnings about undefined variable templates in pkcspad.h" to resurface. Man I hope not...
This may cause GH #300, "Clang 3.9 and missing member definitions for template classes" or GH #294, "Fix clang warnings about undefined variable templates in pkcspad.h" to resurface. Man I hope not...
We were cating UBsan findings under Clang similar to "adv-simd.h:1138:26: runtime error: addition of unsigned offset to 0x000002d41410 overflowed to 0x000002d41400". The problem was CRYPTOPP_CONSTANT, which used an enum. The compiler is allowed to pick the underlying data type, and Clang was picking a signed type
* GNUmakefile-cross: Fix install target
The install target was not working: missing mkdir before copying files,
wrong dynamic library copied, missing ldconf.
The fix is mostly taken from the install target from GNUmakefile.
* Makefile: call 'ln -sf' instead of 'ln -sf -sf'
Power4 lacks 'vector long long'
Rename datatypes such as 'uint8x16_p8' to 'uint8x16_p'. Originally the p8 suffix indicated use with Power8 in-core crypto. We are now using Altivec/Power4 for general vector operations.
Power4 lacks 'vector long long'
Rename datatypes such as 'uint8x16_p8' to 'uint8x16_p'. Originally the p8 suffix indicated use with Power8 in-core crypto. We are now using Altivec/Power4 for general vector operations.
Power4 lacks 'vector long long'
Rename datatypes such as 'uint8x16_p8' to 'uint8x16_p'. Originally the p8 suffix indicated use with Power8 in-core crypto. We are now using Altivec/Power4 for general vector operations.
AES does not have a NEON implementation. However, because it includes "adv-simd.h", it needs the compiler options so NEON types are available. Otherwise the compile fails.
We can't guard "adv-simd.h" and NEON on just AES because Simon and Speck use the templates in their NEON implementations.
This also fixes the SPECK64 bug where CTR mode self tests fail. It was an odd failure because it only affected 64-bit SPECK. SIMON was fine and it used nearly the same code. We tracked it down through trial and error to the table based rotates.
It looks like the delay was due to some GCC 7 issue. We had to disable parallel blocks on Aarch64 with GCC 7. We may be running out of registers and that could be causing problems. It looks like GCC uses up to v30.
For Simon-64 and Speck-64 this means we are effectively using 12x-4x-1x. We are mostly at the threshold for IA-32 and parallelization. At any time 10 to 13 XMM registers are being used.
Prefer movsd by way of _mm_load_sd and _mm_store_sd.
Fix "error C3861: _mm_cvtsi128_si64x identifier not found".
The tests were generated using Crypto++ and the straight C++ implementation. It should allow us to test the SSE and NEON impelmentations and multiple blocks
The tests were generated using Crypto++ and the straight C++ implementation. It should allow us to test the SSE and NEON impelmentations and multiple blocks
The folding of statements helps GCC elimate some of the intermediate stores it was performing. The elimination saved about 1.0 cpb. SIMON-128 is now running around 10 cpb, but it is still off the Simon and Speck team's numbers of 3.5 cpb
Louis Wingers and Bryan Weeks from the Simon and Speck team offered the suggestion. The change save 0.7 cpb for Speck, and 5 cpb for Simon on x86_64.
Speck is now running very close to the Team's time sor SSE4. Simon is still off, but we know the root cause. For Simon, the Team used a fast bit-sliced implementation
The class declaration needs to always include the functions for the platform. The implementation can simply return a different number, and that is hidden from the user
China’s SM2 uses an identity field for digital signatures. We used a ConstByteArrayParameter rather than a char* because the identifier may not be a C-string. The observation is based on experience with Thomas Wu’s Secure Remote Protocol (SRP)
The template functions take the rotate amount as a template parameter, which will allow the constexpr to propagate into the rotate expression. It should avoid some of the compile problems we were seeing under Clang and C++11
This was interesting... The C&-D is an early 2000's 32-bit processor with SSE2 and SSSE3. Using a destination register constraint of "xm" witnessed a crash, while a constraint of "m" does not
ARM64 is kind of useless. We need A-32 (old ARM), Aarch32 (new 32-bit ARM) and Aarch64 (new 64-bit ARM). Aarch32 and Aarch64 is captured by IS_ARMV8, and A-32 is captured by IS_ARM
Both BLAKE2 and SPECK slow down when using NEON/ASIMD. When just BLAKE2 experienced the issue, it was a one-off problem. Its now wider than a one-off, so add the formal define
Performance increased by about 100% on a 3.1 GHz Core i5 Skylake. Throughput went from about 7.3 cpb to about 3.5 cpb. Not bad for a software-based implementation of a block cipher
Performance increased by about 100% on a 3.1 GHz Core i5 Skylake. Throughput went from about 7.3 cpb to about 3.5 cpb. Not bad for a software-based implementation of a block cipher
These were generated by Crypto++ using the C/C++ implementation, which operates on 1 block at a time. They are consumed by the SSSE3 implementation, which operates on 4 blocks at a time. Its not ideal, but it will have to do.
We need to ensure SSE2 does not cross pollinate into other CPU functions since SSE2 is greater than the minimum arch. The minimum arch is i586/i686, and both lack SSE2 instructions
* Set default target Windows version for MinGW to XP
The original MinGW from mingw.org targets Windows 2000 by default, but lacks
the <wspiapi.h> include needed for Windows 2000 support.
* Disable CRYPTOPP_CXX11_SYNCHRONIZATION for original MinGW
std::mutex is only available in libstdc++ if _GLIBCXX_HAS_GTHREADS is defined,
which is not the case for original MinGW. Make the existing fix for AIX more
general to fix this. Unfortunately, any C++ header has to be included to
detect the standard library and the otherwise empty <ciso646> is going to be
removed from C++20, so use <cstddef> instead.
Add tests for Support for Control-flow Enforcement Technology (CET). This is an upcoming processor feature. We want to be out in front of breaks to our inline assembly
This cleans up the compile on old PwerMac G5's. Our Altivec and Crypto code relies on Power7 and Power8 extensions. There's no need to shoehorn Altivec and Power4 into old platforms, so we disable Altivec and Crypto unless Power7 is available. The GNUmakefile sets CRYPTOPP_DISABLE_ALTIVEC if Power7 is not available.
Enable aligned allocations under IBM XL C/C++. Based on the AIX malloc man pages, "... the block is aligned so that it can be used for any type of data". Previously CRYPTOPP_NO_ALIGNED_ALLOC was in effect.
Use malloc instead of calloc on OS X. Based on the OS X malloc man pages, "... the allocated memory is aligned such that it can be used for any data type, including AltiVec- and SSE-related types". Additionally, calloc zero'd the memory it allocated which slowed things down on Apple systems.
This is the first of possibly two or three for Borland compilers. We have to be careful because its very easy to break something due to math overloads with other compilers like SunCC or XL/C
This avoids the probe for SSE2 in most circumstances. The SSE2 test is mostly benign nowadays since SSE2 and OS support is nearly ubiquitous. But the define CRYPTOPP_NO_CPU_FEATURE_PROBES added for Apple OSes was interacting badly on x86 machines. Also see GH #511.
UncheckedSetKey has traditionally been a protected member function. The public API traditionally uses SetKey (and friends) to set the key. Internally, SetKey may call UncheckedSetKey. It looks like UncheckedSetKey was made public when authenticated encryption support was added.
Its probably not a good idea to have users calling UncheckedSetKey. Most (all?) of the time it does nothing for authenc modes. The other remaining cases it may not work as expected.
Move m_aliasBlock into Rijndael::Base. m_aliasBlock is now an extra data member for Dec because the aliased table is only used for Enc when unaligned data access is in effect. However, the SecBlock is not allocated in the Dec class so there is no runtime penalty.
Moving m_aliasBlock into Base also allowed us to remove the Enc::Enc() constructor, which always appeared as a wart in my eyes. Now m_aliasBlock is sized in UncheckedSetKey, so there's no need for the ctor initialization.
Also see https://stackoverflow.com/q/46561818/608639 on Stack Overflow. The SO question had an unusual/unexpected interaction with CMake, so the removal of the Enc::Enc() ctor should help the problem.
Add m_isSpecial, m_mandatoryBlockSize and m_optimalBufferSize members. The additional members stabilize running times and avoid some unnecessary calculations. Previously we were calculating some values in each call to Put and LastPut.
CC optimizes things best when isSpecial uses the two predicates. If the 'm_cipher.MandatoryBlockSize() > 0' is removed, then some block ciphers and modes lose up to 0.2 cpb. Apparently GCC can optimize away the second predicate easier than the first predicate.
Some authenticated encryption modes have needs that are not expressed well with MandatoryBlockSize() and MinLastBlockSize(). When IsLastBlockSpecial() returns true three things happen. First, standard block cipher padding is not applied. Second, the ProcessLastBlock() is used that provides inString and outString lengths. Third, outString is larger than inString by 2*MandatoryBlockSize(). That is, there's a reserve available when processing the last block.
The return value of ProcessLastBlock() indicates how many bytes were written to outString. A filter driving data will send outString and returned length to an AttachedTransformation() for additional processing.
StreamTransformationFilter had a small hack to accomodate AuthenticatedEncryptionFilter and AuthenticatedDecryptionFilter. The hack was enough to support CCM, EAX and GCM modes, which looks a lot like a regular stream cipher from the filter framework point of view.
OCB is slightly different. To the filter framework it looks like a block cipher with an unusual last block size and padding scheme. OCB uses MandatoryBlockSize() == BlockSize() and MinLastBlockSize() == 1 with custom padding of the last block (see the handling of P_* and A_* in the RFC). The unusual config causes the original StreamTransformationFilter assert to fire even though OCB is in a normal configuration.
For the time being, we are trying to retain the assert becuase it is a useful diagnostic. Its possible another authenticated encryption mode, like AEZ or NORX, will cause the assert to incorrectly fire (yet again). We will cross that bridge when we come to it.
This provides the functions needed for an implementation. It does not provide the implementation itself
Signed-off-by: Jeffrey Walton <noloader@gmail.com>
The problem was vec_sld is endian sensitive. The built-in required more than us setting up arguments to ensure the vsx load resulted in a big endian value. Thanks to Paul R on Stack Overflow for sharing the information that IBM did not provide. Also see http://stackoverflow.com/q/46341923/608639
Make internal functions static. We get better optimizations depsice using unnamed namespaces
Add PowerPC uint32x4 functions for handling 32-bit rcon and mask
This is AES-128 key expansion for big endian. Little endian has a bug in it so it can't be enabled at the moment. GDB is acting up on GCC112, so I've had trouble investigating it
We determine machine capabilities by performing an os/platform *query* first, like getauxv(). If the *query* fails, we move onto a cpu *probe*. The cpu *probe* tries to exeute an instruction and then catches a SIGILL on Linux or the exception EXCEPTION_ILLEGAL_INSTRUCTION on Windows. Some OSes fail to hangle a SIGILL gracefully, like Apple OSes. Apple machines corrupt memory and variables around the probe.
The commit also attempts to avoid the shell command for Windows machines.
If no one has patches to offer for the outstanding CMake bugs, then this is the version that will be moved to the Wiki Patch Page. The community will have to tend to the outstanding bugs when someone with domain experience can work them
The inline ASM code now uses local variables to save the EAX-EDX registers, and then copies the locals into the function parameters. It side steps problems with calling conventions
This increases performance to about 1.6 cpb. We are about 0.5 cpb behind Botan, and about 1.0 cpb behind OpenSSL. However, it beats the snot out of C/C++, which runs at 20 to 30 cpb
This is the forward direction on encryption only. Crypto++ uses the "Equivalent Inverse Cipher" (FIPS-197, Section 5.3.5, p.23), and it is not compatible with IBM hardware. The library library will need to re-work the decryption key scheduling routines. (We may be able to work around it another way, but I have not investigated it).
The strategy of "cleanup under-aligned buffers" is not scaling well. Corner cases are still turing up. The library has some corner-case breaks, like old 32-bit Intels. And it still has not solved the AltiVec and Power8 alignment problems.
For now we are backing out the changes and investigating other strategies
This commit supports the upcoming AltiVec and Power8 processor. This commit affects a number of classes due to the ubiquitous use of AES. The commit adds debug asserts to warn of under-aligned and misaligned buffers in debug builds.
This commit supports the upcoming AltiVec and Power8 processor. This commit affects a number of classes due to the ubiquitous use of AES. The commit provides the data alignment requirements.
This commit supports the upcoming AltiVec and Power8 processor support for stream ciphers. This commit affects GlobalRNG() most because its an AES-based generator. The commit favors AlignedSecByteBlock over SecByteBlock in places where messages are handled on the AltiVec and Power8 processor data paths. The data paths include all block cipher modes of operation, and some filters like FilterWithBufferedInput.
Intel and ARM processors are tolerant of under-aligned buffers when using crypto instructions. AltiVec and Power8 are less tolerant, and they simply ignore the three low-order bits to ensure an address is aligned. The AltiVec and Power8 have caused a fair number of wild writes on the stack and in the heap.
Testing on a 64-bit Intel Skylake show a marked improvement in performance. We suspect GCC is generating better code since it knows the alignment of the pointers, and does not have to emit fixup code for under-aligned and mis-aligned data. Testing on an mid-2000s 32-bit VIA C7-D with SSE2+SSSE3 showed no improvement, and no performance was lost.
This commit supports the upcoming AltiVec and Power8 processor support for DefaultEncryptors and DefaultDecryptors. The commit favors AlignedSecByteBlock over SecByteBlock in places where messages are handled on the AltiVec and Power8 processor data paths. The data paths include all block cipher modes of operation, and some filters like FilterWithBufferedInput.
Intel and ARM processors are tolerant of under-aligned buffers when using crypto intstructions. AltiVec and Power8 are less tolerant, and they simply ignore the three low-order bits to ensure an address is aligned. The AltiVec and Power8 have caused a fair number of wild writes on the stack and in the heap.
Testing on a 64-bit Intel Skylake show a marked improvement in performance. We suspect GCC is generating better code since it knows the alignment of the pointers, and does not have to emit fixup code for under-aligned and mis-aligned data. Testing on an mid-2000's 32-bit VIA C7-D with SSE2+SSSE3 showed no improvement, and no performance was lost.
This commit supports the upcoming AltiVec and Power8 processor support. The commit favors AlignedSecByteBlock over SecByteBlock in places where messages are handled on the AltiVec and Power8 processor data paths. The data paths include all block cipher modes of operation, and some filters like
Intel and ARM processors are tolerant of under-aligned buffers when using crypto intstructions. AltiVec and Power8 are less tolerant, and they simply ignore the three low-order bits to ensure an address is aligned. The AltiVec and Power8 have caused a fair number of wild writes on the stack and in the heap.
Testing on a 64-bit Intel Skylake show a marked improvement in performance. We suspect GCC is generating better code since it knows the alignment of the pointers, and does not have to emit fixup code for under-aligned and mis-aligned data. Here are some data points:
SecByteBlock
- Poly1305: 3.4 cpb
- Blake2s: 6.7 cpb
- Blake2b: 4.5 cpb
- SipHash-2-4: 3.1 cpb
- SipHash-4-8: 3.5 cpb
- ChaCha20: 7.4 cpb
- ChaCha12: 4.6 cpb
- ChaCha8: 3.5 cpb
AlignedSecByteBlock
- Poly1305: 2.9 cpb
- Blake2s: 5.5. cpb
- Blake2b: 3.9 cpb
- SipHash-2-4: 1.9 cpb
- SipHash-4-8: 3.3 cpb
- ChaCha20: 6.0 cpb
- ChaCha12: 4.0 cpb
- ChaCha8: 2.9 cpb
Testing on an mid-2000's 32-bit VIA C7-D with SSE2+SSSE3 showed no improvement, and no performance was lost.
Since removing the allocator overloards that handled the wipe mark, we have to route deallocate into the standard one. The standard one fires an assert for [now] normal operation
There was no aliasing violation in practice. We used a to assign the right pointer. If the compiler would have removed the unneeded assignment based on T_64bit, then we would not have been flagged.
* Fix build on FreeBSD 10.3 x86 with clang++ v. 3.4.1. The x64 build (also clang++ 3.4.1) doesn't require CRYPTOPP_DISABLE_SHA_ASM. It seems to be a bug specific to the x86 version of clang++.
* Based on suggestion from @noloader, don't split x86/x64 clang++ version detection. Just wait until clang++ is consistently working in both x86/x64.
16-byte aligned is the default for most systems nowadays, so we side stepped alignment problems on all platforms except 32-bit Solaris. We need the 16-byte alignment for all Intel compatibles since the late 1990s, which is nearly all processors in the class.
The worst case is, if a processor lacks SSE2, then it gets an aligned SecBlock anyways. The last time we saw processors without the features was 486 and early Pentiums, and that was 1996 or so. Even low-end processors like Intel Atoms and VIA have SSE2+SSSE3.
Also see "Enable 16-byte alignment full-time for i386 and x86_64?" (https://groups.google.com/forum/#!topic/cryptopp-users/ubp-gFC1BJI) for a discussion.
This reverts commit 64def346cd. It broke AppVeyor and Travis builds (it tested good locally on Intel, Aarch and Solaris i86). CMake is so fucked up. I regret the day we added it to the project.
We don't know if this is going to fix the issue because we don't have a MinGW platofrm for testing. However, from VRE's answer on Stack Overflow (and the chronic CMake problems with execute_process), we believe this may be the fix.
The fix tested OK on WIndows, Linux, OS X and Cygwin. At worse, it won't do any harm
Currently the CRYPTOPP_BOOL_XXX macros set the macro value to 0 or 1. If we remove setting the 0 value (the #else part of the expression), then the self tests speed up by about 0.3 seconds. I can't explain it, but I have observed it repeatedly.
This check-in prepares for the removal in Upstream master
The Padlock SDK sample code leaves a lot to be desired. Regariding the 64-bit samples and instr_linux64.asm... it looks like the sample sill uses 32-bit constants, but most anything related to extended registers, like rdi, is commented out
After more investigation it appears the issue was either Undefined Behavior or a Strict Aliasing violation in GCC; and it was in the test program and not the library. We're not sure which at the moment, but we were able to identify the problematic code. See the comments with Issue 414 (https://github.com/weidai11/cryptopp/issues/414)
I wish this god damn compiler would stop pretending to be other compilers when it can't consume the same program. Even the GCC devs have told the LLVM devs to stop ding that crap
I wish GCC would get its head out of its ass and define the apprpriate defines. NEON/ASIMD cannot be disgorged from Aarch32/Aarch64 just like SSE2 cannot be disgorged from x86_64. They are core instruction sets
Clang causes too many problems. Early versions of the compiler simply crashes. Later versions of the compiler still have trouble with Intel ASM and still produce incorrect results on occassion. Additionally, we have to special case the integrated assemvler. Its making a mess of the code and causing self test failures
This shows up under debug builds when testing instantiations.
warning: binding dereferenced null pointer to reference has
undefined behavior [-Wnull-dereference]
DH2 dh(*(SimpleKeyAgreementDomain*)NULLPTR);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The implementation came from Jack Lloyd and the Botan team. Jack and the Botan was gracious and allowed us to use Botan's x86_encrypt_blocks function. They also allowed us to release it under the Crypto++ licensing terms. Also see https://github.com/randombit/botan/pull/1151/files
passed: 128 deflates and inflates
passed: 128 zlib decompress and compress
default.cpp:69:2: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/x86_64-linux-gnu/bits/string3.h:53:71: runtime error: null pointer passed as argument 2, which is declared to never be null
Information Dispersal and Secret Sharing...
GCC117 is a Aarch64/ARM64 server with AMD's ARM chip and GCC 7.10. It looks like GCC is performing some std::string optimizations that generates a finding. We did not witness the finding on other platforms, like other Aarch64 devices and x86_64.
We will need to check if taking the address of element-0 is still approved way to get the non-const pointer to the elements
GCC117 is a Aarch64/ARM64 server powered by AMD's ARM chip. It runs GCC 7.10. It looks like GCC is performing some std::string optimizations that generates a finding. We have not witnessed the finding on other platforms
Reworked SHA class internals to align all the implementations. Formerly all hashes were software based, IterHashBase handled endian conversions, IterHashBase repeatedly called the single block SHA{N}::Transform. The rework added SHA{N}::HashMultipleBlocks, and the SHA classes attempt to always use it.
Now SHA{N}::Transform calls into SHA{N}_HashMultipleBlocks, which is a free standing function. An added wrinkle is hardware wants little endian data and software presents big endian data, so HashMultipleBlocks accepts a ByteOrder for the incoming data. Hardware based SHA{N}_HashMultipleBlocks can often perform the endian swap much easier by setting an EPI mask so it was profitable to defer to hardware when available.
The rework also removed the hacked-in pointers to implementations. The class now looks more like AES, GCM, etc.
kalyna.cpp:432: error: integer constant is too large for 'long' type
kalyna.cpp:509: error: integer constant is too large for 'long' type
kalyna.cpp:608: error: integer constant is too large for 'long' type
kalyna.cpp:713: error: integer constant is too large for 'long' type
kalyna.cpp:833: error: integer constant is too large for 'long' type
...
Effectively this creates a workspace for encrypting the nonce. The zeroizer will run when the class is destroyed, rather than each invocation of UncheckedSetKey.
Performance went from 3.6 cpb as a temporary to 2.9 cpb as a class member
This broke MSbuild, which can no longer build a static library. Attempting to build with 'msbuild /t:Build cryptlib.vcxproj' results in:
...
X64\cryptlib\Debug\zinflate.obj
X64\cryptlib\Debug\zlib.obj
LINK : fatal error LNK1561: entry point must be defined [c:\Users\cryptopp\cryptlib.vcxproj]
Done Building Project "c:\Users\Jeff\Desktop\cryptopp\cryptlib.vcxproj" (Build target(s)) -- FAILED.
Microsoft tools are so fucked up. It should be illegal to sell them.
Users of OldRandomPool must use the new interface. All that means is they must call IncorporateEntropy instead of Put, and GenerateBlock instead of Get
The existing interface still exists. The new interface is routed into the old methods. Without the new interface, using OldRandPool could result in:
$ ./cryptest.exe v
terminate called after throwing an instance of CryptoPP::NotImplemented
what(): RandomNumberGenerator: IncorporateEntropy not implemented
Aborted (core dumped)
RandomPool used to be a PGP-style deterministic generator and folks used it as a key generation function. At Crypto++ 5.5 the design changed to harden it agianst rollback attacks. The design change resulted in an upgrade barrier. That is, some folks are stuck at Crypto++ 4.2 or Crypto++ 5.2 because they must interoperate with existing software.
Below is the test program we used for the test vector. It was run against Crypto++ 5.4.
RandomPool prng;
SecByteBlock seed(0x00, 384), result(64);
prng.Put(seed, seed.size());
prng.GenerateBlock(result, result.size());
HexEncoder encoder(new FileSink(std::cout));
std::cout << "RandomPool: ";
encoder.Put(result, sizeof(result));
std::cout << std::endl;
Commit 4630a5dab6 broke compilation for
Windows 2000 and earlier as getaddrinfo was introduced in Windows XP.
Fix this by including <wspiapi.h> when targeting Windows 2000 and
earlier, which falls back to an inline implementation of getaddrinfo
when necessary.
Some MinGW flavors still target Windows 2000 by default.
Ref:
https://msdn.microsoft.com/en-us/library/windows/desktop/ms738520.aspx,
section "Support for getaddrinfo on Windows 2000 and older versions"
Benchmark2 is used to benchmark shared key algorithms. At the moment Benchmark2 is all or nothing. It does not understand SharedKeyMAC, SharedKeyStream, SharedKeyBlock. It will be fixed in the future.
CRYPTOPP_NO_UNALIGNED_DATA_ACCESS was required in Crypto++ 5.6 and earlier because unaligned data access was the norm. It caused problems at -O3 and on ARM NEON.
At Crypto++ 6.0 no unaligned data access became a first class citizen. Folks who want to allow it must now define CRYPTOPP_ALLOW_UNALIGNED_DATA_ACCESS
- don't enable SSE2 explicitly for x64, it's always enabled and causes
warnings (issue #445)
- remove newlines in project files that Visual Studio doesn't like and
removes on every change to project options
* Fix compilation on Windows with /DUNICODE
* Fix linking of fipstest for MSVC targeting ARM (__crt_debugger_hook is not available).
* Fix build for Clang on Windows with optimizations on.
* Fix a warning about a non-existant warning under Clang.
* Fix compilation under Intel C++ 18.0 on Windows
This check-in supports Romain Geissler's work on cleaning up our use of ::byte when it collides with std::byte. Regardless of what happens, such as removing ::byte and adding CryptoPP::byte, providing the typedef here makes Kalyna immune to the outside changes. Also see Pull Request 437 and 438.
When compiling with Visual Studio 2015+, Crypto++ uses CryptoNG by
default. CryptoNG is only available on Windows Vista and later and
Crypto++ currently ignores if the user explicitly wants to target
Windows XP. Unlike with other Windows SDK features, everything
compiles, but the application doesn't start on Windows XP because
bcrypt.dll is missing. That is an issue when updating Visual Studio
because the root cause is hard to find.
Making use of CryptoNG when targeting Windows 8+ instead by default,
regardless of the Visual Studio version, to fix this.
In the bigger picture, the code to use inline ASM when intrinsics are not available still needs to be checked-in. Its a big change since we moved into SSE4, AVX and SHA. Design changes are still being evaluated, and its still being tested.
This reverts commit eb3b27a6a5. The change broke GCC 4.8 and unknown version of Clang on OS X. UB reported the OS X break, and JW found duplicated the break on a ARM CubieTruck with GCC 4.8.
Its a new Google Group created at https://groups.google.com/forum/#!forum/cryptopp-build. The list should allow us to run services on unprivileged forks and other unrelated accounts while making it easy to find results.
They seemed to produce a hang when running self tests in AppVeyor.
Also use IsDebuggerPresent() to determine when we should call DebugBreak(). The OS killed our debug build when fuzzing caused an assert to fail
Make builds reproducible
See https://reproducible-builds.org/ for why this is good.
Without this patch g++ would order functions in libcryptopp.so.5.6.5
depending on random order of files in the build system's filesystem.
We have not been able to determine a reliable way to detect cpu's and platforms with Cmake. We are side stepping the Cmake problem by building rdrand.cpp all the time. If its not avilable for a cpu or platform, then RDRAND or RDSEED throw an exception.
Most of these appear to have been cleared over the last couple of years.
C4127 is too prevelant. We are probably going to have to live with it.
We may be able to clear C4250 with a using statement. For example 'using ASN1CryptoMaterial::Load'.
MSVC resisted clearing C4661 by pushing/poping in iterhash.h and osrng.h. It was like MSVC simply ignored it.
We also add a helper to PutDecodedDatumInto which reverses the little-endian values from the Threefish test vectors. Test vectors will follow at next check-in.
Formatting of data for a failed self test was still off a bit. It was due to retaining a whitespace character from the test vector file. The problem was, the whitespace was a tab on occasion.
This will support Threefish and its 1024-bit block size. I believe this is correct, but it may be wrong. According to "Table of Low-Weight Binary Irreducible Polynomials" (http://www.hpl.hp.com/techreports/98/HPL-98-135.pdf), the polynomial is x^1024 + x^19 + x^6 + x + 1.
This will support Threefish and its 1024-bit block size. I believe this is correct, but it may be wrong. According to "Table of Low-Weight Binary Irreducible Polynomials" (http://www.hpl.hp.com/techreports/98/HPL-98-135.pdf), the polynomial is x^1024 + x^19 + x^6 + x + 1.
I believe this is correct, but it may be wrong. According to the Kalyna team, the polynomial for GCM mode is x^512 + x^8 + x^5 + x^2 + 1. It appears the polinomial applies to other block cipher modes of operations, like CMAC.Dropping the first term and evaluating the remaining terms at X=2 results in 293 (0x125)
Variable block size ciphers need the key set before they can return an accurate size for BlockSize(). This issue surfaced during Kalyna testing with authenticated encryption modes. In particular, EAX mode, which effectively uses CMAC:
AlgorithmParameters params = MakeParameters(Name::BlockSize(), 64)
(Name::IV(), ConstByteArrayParameter((const byte *)iv, 64));
EAX<Kalyna>::Encryption kalyna;
kalyna.SetKey(key, 64, params);
This was introduced at Commit e456cd2275, and affected Uri during his rounds of testing.
We also took the opportunity to write it in modern C++ (and remove the VC++ 6.0 bug workaround)
Previously the parsed string would look as follows. You would get this on a failed self test.
Key: 0000000000000000
0000000000000000
0000000000000000
0000000000000000
The new behavior eats the leading whitespace, so the key is reported as:
Key: 0000000000000000000000000000000000000000000000000000000000000000
zinflate.cpp:553:41: runtime error: index 30 out of bounds for type 'unsigned int [30]'
zinflate.cpp:553:11: runtime error: load of address 0x0000011806b8 with insufficient space for an object of type 'const unsigned int'
zinflate.cpp:32:32: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int'
The code at check-in a5c67cfdd6 did not include it. Unlike Threefish, it looks like Kalyna could benefit from the cache hardening given how similar Kalyna is to AES. The hardening costs less than 0.1 cpb, which equates to about 199 MB/s vs 201 MB/s on a 6th gen Skylake
CRYPTOPP_COVERAGE was added at 9614307ab7 to increase code coverage support. This commit enables additional validation routines when CRYPTOPP_COVERAGE is in effect.
VariableBlockSize and VariableBlockCipherImpl were added at Commit bd8edfa87b. Reflecting on FixedKeyLength and VariableKeyLength, the const KEYLENGTH is only provided by FixedKeyLength. VariableKeyLength provides DEFAULT_KEYLENGTH. This check-in makes VariableBlockSize follow VariableKeyLength.
This check-in also splits block size and iv length. Its conceivable we will encounter a cipher with a block size of 128-bits with an iv of 256-bits. The bd8edfa87b check-in could not handle the difference, so we fix it now.
This should lead the way for more modern block ciphers like Threefish and Kalyna. It tested well with both regular cipher modes (the mode has an instance of the cipher) and external cipher modes (the cipher and mode are distinct objects, and the mode holds a reference to the cipher).
We still have to work out the details of naming a cipher. For example, Kalyna with a 128-bit key can use a 128-bit or 256-bit block size. Kalyna-128 is not enough to describe the algorithm and locate it in the object registry. Kalyna-128-128 looks kind of weird; maybe Kalyna-128(128) or Kalyna-128(256) would be better.
Here are the initial test cases to verify functionality:
byte key[64] = {}, iv[32] = {};
ECB_Mode<Kalyna>::Encryption enc1;
enc1.SetKey(key, 16);
CBC_Mode<Kalyna>::Encryption enc2;
enc2.SetKeyWithIV(key, 16, iv);
AlgorithmParameters params = MakeParameters
(Name::BlockSize(), 32)
(Name::IV(), ConstByteArrayParameter(iv, 32));
CTR_Mode<Kalyna>::Encryption enc3;
enc3.SetKey(key, 16, params);
CBC_Mode<Kalyna>::Encryption enc4;
enc4.SetKey(key, 32, params);
Kalyna::Encryption enc5;
ECB_Mode_ExternalCipher::Encryption ecb(enc5);
ecb.SetKey(key, 16, params);
Kalyna::Encryption enc6;
ECB_Mode_ExternalCipher::Encryption cbc(enc6);
cbc.SetKey(key, 32, params);
This should have happened when we removed most of MAINTAIN_BACKWARDS_COMPATIBILITY artifacts. Its not practical move SHA1 into Weak:: namespace or "typedef SHA256 SHA" because SHA1 is too intertwined at the moment.
In the interim, maybe we can place SHA1 in both CryptoPP:: and Weak:: namespaces. This will allow us to transition into Weak::SHA1 over time, and signal to users SHA1 should be avoided.
They are giving ARIA and BLAKE2 trouble. It looks like SSE4 support appeared in the GCC compiler around 4.1 or 4.2. It looks like SHA support appeared in the GNU assembler around 2.18
Initially we performed a 32-bit word-size ByteReverse() on the entire 64-byte buffer being hashed. Then we performed another fix-up when loading each 16-byte portion of the buffer into the SSE2 registers for SHA processing. The [undesired] consequence was byte swapping and reversals happened twice. Worse, the call to ByteReverse() produced 16 bswaps instead of 1 call pshufb, so it was orders of magnitude slower than it needed to be.
This check-in takes the sane approach to byte reversals and swapping. It performs it once when the message is loaded for SSE processing. The result is SHA1 calculations drop from about 3.0 cpb to about 2.5 cpb.
Previously, all 1024-bit tests were run, and then 2048-bit tests were run. Splitting them meant there were two entries for DSA-RFC6979/SHA-1, two entries for DSA-RFC6979/SHA-256 and so on. Now there will be one entry output during testing.
sha2.txt and sha3.txt are just collections of other files, so they don't take up much space.
This commit stens from and exception when running 'cryptest.exe tv sha2' and 'cryptest.exe tv sha3'. Its not obvious the name of the file to be run sha2_224_fips_180.txt. Users should not have to hunt for the reason sha2 and sha3 do not work.
sha2.txt and sha3.txt are just collections of other files, so they don't take up much space.
This commit stens from and exception when running 'cryptest.exe tv sha2' and 'cryptest.exe tv sha3'. Its not obvious the name of the file to be run sha2_224_fips_180.txt. Users should not have to hunt for the reason sha2 and sha3 do not work.
regtest.cpp is where ciphers register by name. The library has added a number of ciphers over the last couple of years and the source file has experienced bloat. Most of the ARM and MIPS test borads were suffering Out of Memory (OOM) kills as the compiler processed the source fille and the included header files.
This won't stop the OOM kills, but it will help the situation. An early BeagleBoard with 512 MB of RAM is still going to have trouble, but it can be worked around by building with 1 make job as opposed to 2 or 4.
The ARIA S-boxes could leak timining information. This commit applies the counter measures present in Rijndael and Camellia to ARIA. We take a penalty of about 0.05 to 0.1 cpb. It equates to about 0 MiB/s on an ARM device, and about 2 MiB/s on a modern Skylake.
We recently gained some performance though use of SSE and NEON in ProcessAndXorBlock, so the net result is an improvement.
Tune CRYPTOPP_ENABLE_ARIA_SSE2_INTRINSICS and CRYPTOPP_ENABLE_ARIA_SSSE3_INTRINSICS macro for older GCC and Clang. Clang needs some more tuning on Aarch64 becuase performance is off by about 15%.
Add additional NEON code paths.
Remove keyBits from Aarch64 code paths.
The SSSE3 intrinsics were performing aligned loads using _mm_load_si128 using user supplied pointers. The pointers are only a byte pointer, so its alignment can drop to 1 or 2. Switching to _mm_loadu_si128 will sidestep potential problems. The crash surfaced under Win32 testing.
Switch to memcpy's when performing bulk assignment x[0]=y[0] ... x[3]=y[3]. I believe Yun used the pattern to promote vectorization. Some compilers appear to be braindead and issue integer move's one word at a time. Non-braindead compiler will still take the optimization when advantageous, and slower compilers will benefit from the bulk move. We also cherry picked vectorization opportunities, like in ARIA_GSRK_NEON.
Remove keyBits variable. We now use UncheckedSetKey's keylen throughout.
Also fix a typo in CRYPTOPP_BOOL_SSSE3_INTRINSICS_AVAILABLE. __SSSE3__ was listed twice.
Win32 and Win64 benefited from the Intel intrinsics. A32 and Aarch64 benefited from the ARM intrinsics. The intrinsics shaved 150 to 350 cycles from key setup.
The intrinsics slowed modern GCC down a small bit, and did not appear to affect old GCC. As such, Intel intrinsics were only enabled for Microsoft compilers.
We were not able to improve encryption and decryption. In fact, some of the attempted macro conversions and intrinsics attempts slowed things down considerably. For example, GCC 5.4 on x86_64 went from 120 MB/s to about 70 MB/s when we tried to improve code around the Key XOR Layer (ARIA_KXL).
This is the reference implementation, test data and test vectors from the ARIA.zip package on the KISA website. The website is located at http://seed.kisa.or.kr/iwt/ko/bbs/EgovReferenceList.do?bbsId=BBSMSTR_000000000002.
We have optimized routines that improve Key Setup and Bulk Encryption performance, but they are not being checked-in at the moment. The ARIA team is updating its implementation for contemporary hardware and we would like to use it as a starting point before we wander too far away from the KISA implementation.
Couple use of initialization priorities to no NO_OS_DEPENDENCE
Add comments explaining what integer does, how it does it, and why we want to inprove on the Singleton pattern as a resource manager.
Update documentation.
This effectively decouples Integer and Public Key from the rest of the library. The change means a compile time define is used rather than a runtime pointer. It avoids the race with Issue 389.
The Public Key algorithms will fail if you use them. For example, running the self tests with CRYPTOPP_NO_ASSIGN_TO_INTEGER in effect results in "CryptoPP::Exception caught: NameValuePairs: type mismatch for 'EquivalentTo', stored 'i', trying to retrieve 'N8CryptoPP7IntegerE'". The exception is expected, and the same happend when g_pAssignIntToInteger was present.
This check-in also enables the 64-bit RDRAND routines for X32. The changes were with held until they could be tested. The testing occurred with Issue 395
Wrap DetectArmFeatures and DetectX86Features in InitializeCpu class
Use init_priority for InitializeCpu
Remove HAVE_GCC_CONSTRUCTOR1 and HAVE_GCC_CONSTRUCTOR0
Use init_seg(<name>) on Windows and explicitly insert at XCU segment
Simplify logic for HAVE_GAS
Remove special recipies for MACPORTS_GCC_COMPILER
Move C++ static initializers into anonymous namespace when possible
Add default NullNameValuePairs ctor for Clang
When MSVC init_seg or GCC init_priority is available, we don't need to use the Singleton. We only need to create a file scope class variable and place it in the segment for MSVC or provide the attribute for GCC.
An additional upside is we cleared all the memory leaks that used to be reported by MSVC for debug builds.
StringWiden converts a narrow C-style string to a wide string. It serves the opposite role of StringNarrow function. The function is useful on Windows platforms where the OS favors wide functions with the UTF-16 character set. For example, the Data Proction API (DPAPI) allows a description, but its a wide character C-string. There is no narrwo version of the API.
Port rdrand.S to Solaris
Port rdrand.S to X32
The X32 port is responsible for the loop unwinding. The unwind generates a 32-byte block (X64 and X32) or 16-byte block (X86). On X32, it increases throughut by 100% (doubles it). On X86 and X64, throughput increases by about 6%. Anything over 4 machine words slows things down.
Port rdrand.S to Cygwin and OS X
Add DISABLE_NATIVE_ARCH to CmakefileList and GNUmakefile. It supresses the addition of -march=native. DISABLE_NATIVE_ARCH replaces DISABLE_CXXFLAGS_OPTIMIZATIONS in CmakefileList (the latter is now deprecated).
Add benchmarks for SHA1 and SHA256 variants
Hash_DRBG sped-up by about 2 MiB/s by using word128 and word64 in the initial update loop. It did not benefit other loops
HMAC_DRBG sped-up by about 5 MiB/s by reworking variables, access and loop control
Move HTML header and footer into benchmark functions
Switch to <cmath> and standard math routines
Switch to <ctime> and standard clock and time routines
Move static variable^Cinto anonymous namespace
Add TimeToString function for printing start and end times
By default the member, named m_mark, is set to the maximum number of elements. If SetMark() is called, then m_mark is adjusted. Upon deallocation and zeroization, STDMIN(m_size, m_mark) elements are zeroized.
We wanted to use a high water mark, but we could not track the writes to the allocation. operator[] would have been OK, but ::memcpy would have been problematic
These function are intended to catch mining and matching of library versions. BuildVersion provides CRYPTOPP_VERSION when the shared object was built. RuntimeVersion provides CRYPTOPP_VERSION the app compiled against, which could be different than the shared object's version
MOVBE is a modest gain over BSWAP. Though its guarded by CRYPTOPP_MOVBE_AVAILABLE, we cannot detect availability with a preprocessor macro. That is, GCC does not provide __MOVBE__ or similar. It has to be enabled manually
Since we switched to CRYPTOPP_ASSERT we don't have to worry about an accidental assert in production. We can now assert ValidateElement and ValidateGroup and let the code warn of potential problems during development.
This came about because ECGDSA inadvertently used GetGroupOrder() rather than GetSubgroupOrder(). The assert alerted to the problem area without the need for debugging
The macros that invoke GCC inline ASM have better code generation and speedup GCM ops by about 70 MiB/s on an Opteron 1100. The intrinsics are still available for Windows platforms and Visual Studio 2017 and above
It appears Apple Clang disgorges carryless multiply (PMULL) from Crypto (AES and SHA). The breakout added CRYPTOPP_BOOL_ARM_PMULL_INTRINSICS_AVAILABLE for PMULL, and retained CRYPTOPP_BOOL_ARM_CRYPTO_INTRINSICS_AVAILABLE for AES and SHA only
Targets with only object inputs do not work correctly with some
generators (like Xcode, see issue #355). Defining these directly in
terms of the source code files (rather than a reused set of object
files) allows correct builds in such cases. This can now be controlled
through a new option USE_INTERMEDIATE_OBJECTS_TARGET which defaults to
ON.
In release builds replace assert with void instruction `(void)0`. Otherwise in some places you will end up with statements like `if (...) ;` and some compiler will complain about it.
The library was a tad bit fast and loose with respect to parsing some of the ASN.1 presented to it. It was kind of like we used Alternate Encoding Rules (AER), which was more relaxed than BER, CER or DER. This commit closes most of the gaps.
The changes are distantly related to Issue 346. Issue 346 caught a CVE bcause of the transient DoS. These fixes did not surface with negative effcts. Rather, the library was a bit too accomodating to the point it was not conforming
Default backlog value was 5, which appears to stem back to the maximum supported by Windows Sockets 1. This was bound to cause problems for applications receiving many connections at the same time. Changed to SOMAXCONN, which is the standard way on Windows and POSIX to use a maximum reasonable backlog value.
Also add missing validation functions to test.cpp. The test and functions were present, but only accessible with 'cryptest.ex v', where all the tests were run
The typedefs were only commented so folks could search for a missing symbol, like Crypto++ 4.0 PK_FixedLengthEncryptor or PK_FixedLengthDecryptor
This is a distinct change from CRYPTOPP_MAINTAIN_BACKWARDS_COMPATIBILITY_562
This avoids a potential problem when OR'ing with 0 that results in a WordCount() of 1. Integer's minimum reg[] size is 2 due to RoundupSize(), and there could be implicit assumptions for the minimum that did not surface under testing
This avoids a potential problem when OR'ing with 0 that results in a WordCount() of 1. Integer's minimum reg[] size is 2 due to RoundupSize(), and there could be implicit assumptions for the minimum that did not surface under testing
This should not be needed, but it does not hurt. According to Ian Lance Taylor (http://gcc.gnu.org/ml/gcc-help/2014-02/msg00023.html), the CC clobber causes GCC to forget its internal representation of flag state. It should not be needed for cpuid. However, Clang has some odd behave in a couple of versions of its compiler when using cpuid. Both JW and UB experienced it on separate occassions.
It went missing after cleaning up the local fileystem, and was subsequently deleted with a 'git commit -S -am...'. An explict 'git rm' was not used, so I am not sure why it got whacked
Debian Hurd defines __MACH__, and it was picking up "#define CRYPTOPP_SECTION_INIT __attribute__((section (__DATA,__data)))" intended for Apple linkers
handy when packaging should control optimization without build system
masking. Especially handy when building to common architecture.
no change of behavior if DISABLE_CXXFLAGS_OPTIMIZATIONS is unset.
Signed-off-by: Alon Bar-Lev <alon.barlev@gmail.com>
Thanks for taking the time to report an issue. Reporting issues helps us improve stability and reliability for all users, so it is a valuable contribution.
Please do not ask questions in the bug tracker. Please ask questions on the Crypto++ Users List at http://groups.google.com/forum/#!forum/cryptopp-users.
There is a wiki page with information on filing useful bug reports. If you have some time please visit http://www.cryptopp.com/wiki/Bug_Report on the wiki. The executive summary is:
* State the operating system and version (Ubutnu 17 x86_64, Windows 7 Professional x64, etc)
* State the version of the Crypto++ library (Crypto++ 5.6.5, Master, etc)
* State how you built the library (Makefile, Cmake, distro, etc)
* Show a typical command line (the output of the compiler for cryptlib.cpp)
* Show the link command (the output of the linker for libcryptopp.so or cryptest.exe)
* Show the exact error message you are receiving (copy and paste it); or
* Clearly state the undesired behavior (and state the expected behavior)
Crypto++ Library is a free C++ class library of cryptographic algorithms and schemes. It was written and placed in public domain by Wei Dai. The library homepage is at http://www.cryptopp.com/. The latest library source code can be found at http://github.com/weidai11/cryptopp. For licensing and copyright information, please see License.txt.
Crypto++ Library is a free C++ class library of cryptographic algorithms and schemes. The library was originally written and placed in public domain by Wei Dai, but it is now maintained by the community. The library homepage is at http://www.cryptopp.com/. The latest library source code can be found at http://github.com/weidai11/cryptopp. For licensing and copyright information, please see License.txt.
These are general instructions for the BSDs, Linux, OS X, Solaris and Unix. The library uses a GNU makefile, which combines configuration and a non-anemic make. On BSD and Solaris you will likely have to use `gmake` to build the library. On Linux, OS X and Unix, the system's make should be OK. On Windows, Crypto++ provides Borland and Visual Studio solutions.
These are general instructions for the AIX, BSDs, Linux, OS X, Solaris and Unix. The library uses a GNU makefile, which combines configuration and a non-anemic make. On AIX, BSD and Solaris you will likely have to use `gmake` to build the library. On Linux, OS X and Unix, the system's make should be OK. On Windows, Crypto++ provides Visual Studio solutions.
You should look through the GNUmakefile and config.h to ensure settings look reasonable before building. If you need compatibility with Crypto++ 5.6.2, then you can use `config.compat` in place of `config.h`. You are discouraged from using `config.compat` because it re-introduces undefined behavior that was cleared at Crypto++ 5.6.3.
You should look through the GNUmakefile and config.h to ensure settings look reasonable before building. There are two wiki pages that help explain them at http://www.cryptopp.com/wiki/GNUmakefile and http://www.cryptopp.com/wiki/Config.h.
Wiki pages are available for some platforms with specific build instructions. The wiki can be found at http://cryptopp.com/wiki/. The pages include Android, ARM, iOS and Solaris. Solaris users should visit the wiki for important information on compiling the library with different versions of SunCC and options, and information on improving library performance and features.
Wiki pages are available for some platforms with specific build instructions. The pages include Android, ARM, iOS, MSBuild and Solaris. Solaris users should visit the wiki for important information on compiling the library with different versions of SunCC and options, and information on improving library performance and features.
Crypto++ does not depend upon other tools or libraries. It does not use Autotools, does not use CMake, and does not use Boost. If you use an alternate build system, like Autotools or CMake, then see the warning below about CXXFLAGS and lack of -DNDEBUG. CMake is available in Master as a matter of convenience, but its not officially supported.
Crypto++ does not depend upon other tools or libraries. It does not use Autotools, does not use CMake, and does not use Boost. If you use an alternate build system, like Autotools or CMake, then see the warning below about CXXFLAGS and lack of -DNDEBUG. CMake is available in Master as a matter of convenience, but its not officially supported.
There is a partially complete CmakeList.txt available on the wiki at http://www.cryptopp.com/wiki/CMake. It is not recommended for use because it is not in a good state. If you have CMake expertise and can work some problems, then please see the wiki page for tasks related to CMake.
BUILDING THE LIBRARY
BUILDING THE LIBRARY
--------------------
--------------------
@ -33,7 +36,9 @@ In general, all you should have to do is open a terminal, and then:
make test
make test
sudo make install
sudo make install
The command above builds the static library and cryptest.exe program. If you want to build the shared object, then issue:
The command above builds the static library and cryptest.exe program. It also uses a sane set of default flags, which are usually "-DNDEBUG -g2 -O3 -fPIC".
If you want to build the shared object, then issue:
make static dynamic cryptest.exe
make static dynamic cryptest.exe
@ -47,33 +52,50 @@ If you would like to use a different compiler, the set CXX:
export CXX=/opt/intel/bin/icpc
export CXX=/opt/intel/bin/icpc
make
make
Or:
CXX=/opt/solarisstudio12.4/bin/CC make
If you want to build using C++11, then:
If you want to build using C++11, then:
make CXXFLAGS="-std=c++11"
export CXXFLAGS="-DNDEBUG -g2 -O3 -std=c++11"
make
Or:
Or:
CXXFLAGS="-std=c++11"
CXXFLAGS="-DNDEBUG -g2 -O3 -std=c++11" make
make
LLVM's libc++ is also supported, so you can:
LLVM's libc++ is also supported, so you can:
CXXFLAGS="-std=c++11 -stdlib=libc++"
export CXXFLAGS="-std=c++11 -stdlib=libc++"
make
make
If you target 32-bit IA-32 machines (i386, i586 or i686), then the makefile forgoes -fPIC due to register pressures. You should add -fPIC yourself in this case:
CXXFLAGS="-DNDEBUG -g2 -O3 -fPIC" make
You can also override a variable so that only your flags are present. That is, the makefile will not add additional flags. For example, the following builds with only -std=c++11:
make CXXFLAGS="-std=c++11"
Crypto++ does not enagage Specter remediations at this time. You can build with Specter resistance with the following flags:
CXXFLAGS="-DNDEBUG -g2 -O3 -mfunction-return=thunk -mindirect-branch=thunk" make
ALTERNATE BUILD SYSTEMS
ALTERNATE BUILD SYSTEMS
-----------------------
-----------------------
The Crypto++ library is Make based and uses GNU Make by default. The makefile uses '-DNDEBUG -g2 -O2' CXXFLAGS by default. If you use an alternate build system, like Autotools or CMake, then ensure the build system includes '-DNDEBUG' for production or release builds. The Crypto++ library uses asserts for debugging and diagnostics during development; it does not rely on them to crash a program at runtime.
The Crypto++ library is Make based and uses GNU Make by default. The makefile uses '-DNDEBUG -g2 -O2' CXXFLAGS by default. If you use an alternate build system, like Autotools or CMake, then ensure the build system includes '-DNDEBUG' for production or release builds. The Crypto++ library uses asserts for debugging and diagnostics during development; it does not rely on them to crash a program at runtime.
If an assert triggers in production software, then unprotected sensitive information could be egressed from the program to the filesystem or the platform's error reporting program, like Apport on Ubuntu or CrashReporter on Apple.
If an assert triggers in production software, then unprotected sensitive information could be written to the filesystem and could be egressed to the platform's error reporting program, like Apport on Ubuntu, CrashReporter on Apple and Windows Error Reporting on Microsoft OSes.
The makefile orders object files to help remediate problems associated with C++ static initialization order. The library does not use custom linker scripts. If you use an alternate build system, like Autotools or CMake, and collect source files into a list, then ensure these three are at the head of the list: 'cryptlib.cpp cpu.cpp integer.cpp <other sources>'. They should be linked in the same order: 'cryptlib.o cpu.o integer.o <other objects>'.
The makefile orders object files to help remediate problems associated with C++ static initialization order. The library does not use custom linker scripts. If you use an alternate build system, like Autotools or CMake, and collect source files into a list, then ensure these three are at the head of the list: 'cryptlib.cpp cpu.cpp integer.cpp <other sources>'. They should be linked in the same order: 'cryptlib.o cpu.o integer.o <other objects>'.
If your linker supports initialization attributes, like init_priority, then you can define CRYPTOPP_INIT_PRIORITY to control object initialization order. Set it to a value like 250. User programs can use CRYPTOPP_USER_PRIORITY to avoid conflicts with library values. Initialization attributes are more reliable than object file ordering, but its not ubiquitously supported by linkers.
If your linker supports initialization attributes, like init_priority, then you can define CRYPTOPP_INIT_PRIORITY to control object initialization order. Set it to a value like 250. User programs can use CRYPTOPP_USER_PRIORITY to avoid conflicts with library values. Initialization attributes are more reliable than object file ordering, but its not ubiquitously supported by linkers.
The makefile links to the static version of the Crypto++ library to avoid binary planting and other LD_PRELOAD tricks. You should use the static version of the library in your programs to help avoid unwanted redirections.
The makefile links to the static version of the Crypto++ library to avoid missing dynamic libraries, incorrect dynamic libraries, binary planting and other LD_PRELOAD tricks. You should use the static version of the library in your programs to help avoid unwanted redirections.
INSTALLING THE LIBRARY
INSTALLING THE LIBRARY
----------------------
----------------------
@ -120,7 +142,7 @@ The following are some of the targets provided by the GNU makefile.
DYNAMIC ANALYSIS
DYNAMIC ANALYSIS
----------------
----------------
The Crypto++ embraces tools like Undefined Behavior sanitizer (UBsan), Address sanitizer (Asan) and Valgrind. Both Clang 3.2 and above and GCC 4.8 and above provide sanitizers. Please check with your distribution on how to install the compiler with its sanitizer libraries (they are sometimes a separate install item).
The Crypto++ embraces tools like Undefined Behavior sanitizer (UBsan), Address sanitizer (Asan), Bounds sanitizer (Bsan), Coverity Scan and Valgrind. Both Clang 3.2 and above and GCC 4.8 and above provide sanitizers. Please check with your distribution on how to install the compiler with its sanitizer libraries (they are sometimes a separate install item).
UBsan and Asan are mutually exclusive options, so you can perform only one of these at a time:
UBsan and Asan are mutually exclusive options, so you can perform only one of these at a time:
@ -150,13 +172,15 @@ ACCEPTANCE TESTING
Crypto++ uses five security gates in its engineering process. The library must maintain the quality provided by the review system and integrity of the test suites. You can use the information to decide if the Crypto++ library suits your needs and provides a compatible security posture.
Crypto++ uses five security gates in its engineering process. The library must maintain the quality provided by the review system and integrity of the test suites. You can use the information to decide if the Crypto++ library suits your needs and provides a compatible security posture.
The first gate is code review and discussion of proposed patches. Git commits often cross reference a User Group discussions.
The first gate is code review and discussion of proposed chnages. Git commits often cross reference a User Group discussions.
Second is the compiler warning system. The code must clean compile under the equivalent of GCC's -Wall -Wextra (modulo -Wno-type-limits -Wno-unknown-pragmas). This is a moving target as compiler analysis improves.
Second is the compiler warning system. The code must clean compile under the equivalent of GCC's -Wall -Wextra (modulo -Wno-type-limits -Wno-unknown-pragmas). This is a moving target as compiler analysis improves.
Third, the code must pass cleanly though GCC and Clang's Undefined Behavior sanitizer (UBsan) and Address sanitizer (Asan) with CRYPTOPP_NO_UNALIGNED_DATA_ACCESS defined in config.h. See DYNAMIC ANALYSIS above on how to execute them.
Third, the code must pass cleanly though GCC and Clang's Undefined Behavior sanitizer (UBsan), Address sanitizer (Asan) and Bounds sanitizer (Bsan). See DYNAMIC ANALYSIS above on how to execute them.
Fourth, the test harness provides a "validation" option which performs basic system checks (like endianess and word sizes) and exercises algorithms (like AES and SHA). You run the validation suite as shown below. The tail of the output should indicate 0 failed tests.
Fourth, the code must pass cleanly though Coverity Scan and Valgrind. Valgrind testing is performed at -O1 in accordance with Valgrind recommendations.
Fifth, the test harness provides a "validation" option which performs basic system checks (like endianness and word sizes) and exercises algorithms (like AES and SHA). You run the validation suite as shown below. The tail of the output should indicate 0 failed tests.
./cryptest.exe v
./cryptest.exe v
...
...
@ -165,7 +189,7 @@ Fourth, the test harness provides a "validation" option which performs basic sys
Test ended at Sun Jul 26 02:10:57 2015
Test ended at Sun Jul 26 02:10:57 2015
Seed used was: 1437891055
Seed used was: 1437891055
Fifth, the test harness provides a "test vector" option which uses many known test vectors, even those published by other people (like Brian Gladman for AES). You run the test vectors as shown below. The tail of the output should indicate 0 failed tests.
Sixth, the test harness provides a "test vector" option which uses many known test vectors, even those published by other people (like Brian Gladman for AES). You run the test vectors as shown below. The tail of the output should indicate 0 failed tests.
./cryptest.exe tv all
./cryptest.exe tv all
...
...
@ -182,4 +206,4 @@ REPORTING PROBLEMS
Dirty compiles and failures in the validation suite or test vectors should be reported at the Crypto++ User Group. The User Group is located at http://groups.google.com/forum/#!forum/cryptopp-users.
Dirty compiles and failures in the validation suite or test vectors should be reported at the Crypto++ User Group. The User Group is located at http://groups.google.com/forum/#!forum/cryptopp-users.
Also see http://www.cryptopp.com/wiki/Bug_Report.
The library uses Wei Dai's GitHub to track issues. The tracker is located at http://github.com/weidai11/cryptopp/issues. Please do not ask questions in the bug tracker; ask questions on the mailing list instead. Also see http://www.cryptopp.com/wiki/Bug_Report.