Motivation is similar to NO_UDBL_DIVISION.
The alternative implementation of 64-bit mult is straightforward and aims at
obvious correctness. Also, visual examination of the generate assembly show
that it's quite efficient with clang, armcc5 and arm-clang. However current
GCC generates fairly inefficient code for it.
I tried to rework the code in order to make GCC generate more efficient code.
Unfortunately the only way to do that is to get rid of 64-bit add and handle
the carry manually, but this causes other compilers to generate less efficient
code with branches, which is not acceptable from a side-channel point of view.
So let's keep the obvious code that works for most compilers and hope future
versions of GCC learn to manage registers in a sensible way in that context.
See https://bugs.launchpad.net/gcc-arm-embedded/+bug/1775263
This reduces clutter, making the functions more readable.
Also, it makes lcov see each line as covered. This is not cheating, as the
lines that were previously seen as not covered are not supposed to be reached
anyway (failing branches of the selftests).
Thanks to this and previous test suite enhancements, lcov now sees chacha20.c
and poly1305.c at 100% line coverage, and for chachapoly.c only two lines are
not covered (error returns from lower-level module that should never happen
except perhaps if an alternative implementation returns an unexpected error).
This module used (len, pointer) while (pointer, len) is more common in the
rest of the library, in particular it's what's used in the CMAC API that is
very comparable to Poly1305, so switch to (pointer, len) for consistency.
- in .h files: only put the context declaration inside the #ifdef _ALT
(this was changed in 2.9.0, ie after the original PR)
- in .c file: only leave selftest out of _ALT: even though some function are
trivial to build from other parts, alt implementors might want to go another
way about them (for efficiency or other reasons)
This change permits users of the ChaCha20/Poly1305 algorithms
(and the AEAD construction thereof) to pass NULL pointers for
data that they do not need, and avoids the need to provide a valid
buffer for data that is not used.