Repeatedly retrieving the vectors and registers from unicorn involves
copying the entire set of registers and vectors by value instead of
simply retrieving a reference to them. Instead, we can just do the work
once and print out the values.
While we're at it, also make our bracing consistent.
* Return both the upper and lower parts of the multiply if required
* SSE2 does not support the pmuldq instruction, do sign correction to an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)
enum classes are still considered complete types when forward declared
(as the compiler knows the exact size of the type from the declaration
alone). The only difference in this case being that the members of the
enum class aren't visible. Given we don't use the members within this
header in any way, we can simply forward declare them here and remove
the inclusions.
x64 rounds before flushing to zero
AArch64 rounds after flushing to zero
This difference of behaviour is noticable if something would round to a smallest normalized number
Discovered by @Subv.
Fixes incomplete fix begun in 5a91c94dca47c9702dee20fbd5ae1f4c07eef9df.
That fix fails to take into account that LinkBlock doesn't update ticks until there
are no remaining ticks to be executed.
Test added to confirm fix.
Centralizes all register and vector array definitions to a single set of
aliases, so if these are ever changed, then the rest of the testing code
will follow suit without the need to manually change them.
Avoids constructing and destructing the vector repeatedly, we can just
alter the contents of the vector on each iteration instead. Also move
out the std::array instances as well, like with the floating-point test
case and the single random instruction test case.
We can also use the regular form of std::generate and avoid hardcoding
size values twice.