In MSVC, having files with identical filenames will result into massive slowdowns when compiling.
The approach I have taken to resolve this is renaming the identically named files in frontend/(A32, A64) to (a32, a64)_filename.cpp/h
Math operations such as Matrix multiplication utilize these particular
instructions enough that there should be some unit tests for thesein particular.
The lane-splatting form of FMUL and FMLA instructions are of particular
interest and I've found them to be very common in retail game binaries
such as Pokemon Sword.
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/coding-for-neon---part-3-matrix-multiplication
I'm primarily adding this unit test so that I can ensure compatibility
while I tune and optimize them.
We might want to allocate different sizes for each of them.
e.g. for the unsafe fastmem approach without bounds checking.
Or for using the full 48bit adress range (with mirrors) by allocating our real arena as close to 1<<47 as possible.
Tests the TBL instruction with implementation with {1-4} register
lookups and the handling of out-of-bound indices.
Intended to target the implementation of VectorTableLookup128
Repeatedly retrieving the vectors and registers from unicorn involves
copying the entire set of registers and vectors by value instead of
simply retrieving a reference to them. Instead, we can just do the work
once and print out the values.
While we're at it, also make our bracing consistent.
* Return both the upper and lower parts of the multiply if required
* SSE2 does not support the pmuldq instruction, do sign correction to an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)
enum classes are still considered complete types when forward declared
(as the compiler knows the exact size of the type from the declaration
alone). The only difference in this case being that the members of the
enum class aren't visible. Given we don't use the members within this
header in any way, we can simply forward declare them here and remove
the inclusions.
x64 rounds before flushing to zero
AArch64 rounds after flushing to zero
This difference of behaviour is noticable if something would round to a smallest normalized number