Lioncash
fc731dddae
ir: Add opcodes for performing vector absolute floating-point values
...
This will be usable for implementing FACGE and FACGT
2020-04-22 20:46:18 +01:00
MerryMage
2fc6b33829
CMakeLists: Add missing files
2020-04-22 20:46:18 +01:00
Lioncash
0bee648b4f
emit_x64_vector: Deduplicate a bit of code in EmitVectorSetElement{8, 32, 64} functions
...
Given both branches are the same, we can hoist out the common code.
2020-04-22 20:46:18 +01:00
Lioncash
d86fea0d28
A64: Implement FCMEQ (zero)'s vector single and double precision variant
2020-04-22 20:46:18 +01:00
Lioncash
593eca7fb1
A64: Implement load/store single structure instructions
...
Implements LD{1, 2, 3, 4}, LD{1, 2, 3, 4}R, and ST{1, 2, 3, 4} single
structure variants.
2020-04-22 20:46:18 +01:00
Lioncash
9bec354791
A64: Implement FCMEQ (register)'s vector single and double precision variant
2020-04-22 20:46:18 +01:00
Lioncash
b6e223fc58
emit_x64_vector: Deduplicate a bit of code within EmitVectorGetElement8()
...
Given both branches use the same destination register size, we can hoist
the common code out.
2020-04-22 20:46:18 +01:00
Lioncash
5ce187a54e
ir: Add opcodes for floating-point vector equalities
2020-04-22 20:46:18 +01:00
MerryMage
be354dbfd0
ir/basic_block: Add missing U16 immediate type to DumpBlock
2020-04-22 20:46:18 +01:00
Lioncash
cf188448d4
emit_x64_vector: Vectorize fallback case in EmitVectorMultiply64()
...
Gets rid of the need to perform a fallback.
2020-04-22 20:46:18 +01:00
MerryMage
5503ff28c3
llvm_disassemble: Allow disassembly of invalid AArch64 instructions
2020-04-22 20:46:18 +01:00
Lioncash
954deff2d4
emit_x64_vector: Add break to final case in EmitVectorRoundingHalvingAddUnsigned()
...
This doesn't alter behavior, but it guards against accidental fallthrough
if another case is ever added to this function in the future.
2020-04-22 20:46:18 +01:00
Lioncash
11a92eaaef
A64: Implement SRHADD and URHADD
2020-04-22 20:46:18 +01:00
Lioncash
9e75d08860
A64: Implement FABD's scalar single/double precision variant
2020-04-22 20:46:18 +01:00
Lioncash
bc718c5b28
ir: Add opcodes for performing rounding halving adds
2020-04-22 20:46:18 +01:00
Lioncash
d898d1779d
A64: Implement FABD's vector single/double precision variant
2020-04-22 20:46:18 +01:00
Lioncash
054549da35
emit_x64_vector: Simplify AVX-512 codepath in EmitVectorMultiply64
...
I realized I introduced a helper for simple AVX operation emitting, so
use that instead of writing it all out long-form.
2020-04-22 20:46:18 +01:00
Lioncash
8a4f8aed06
ir: Add opcode for performing FP vector absolute differences
2020-04-22 20:46:18 +01:00
Lioncash
cb456f914b
A64: Implement UMLAL{2}, UMLSL{2}, and UMULL{2}
...
Now that we have the helper function set up for the signed variants, we
can also modify it to be used with the unsigned ones by performing a zero
extension instead of a sign extension.
2020-04-22 20:46:18 +01:00
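The sign-extension versus zero-extension distinction in the commit above can be sketched for a single lane as follows. This is only an illustration, not the project's helper: the name WideningMultiply32 and the is_signed flag are hypothetical, and the real code operates on IR vector values rather than scalars.

```cpp
#include <cstdint>

// Hypothetical helper: widen two 32-bit lanes to 64 bits, then multiply.
// is_signed selects sign extension (SMULL-like) or zero extension (UMULL-like).
inline std::uint64_t WideningMultiply32(std::uint32_t a, std::uint32_t b, bool is_signed) {
    if (is_signed) {
        // Reinterpret as signed, then sign-extend to 64 bits.
        const std::int64_t wa = static_cast<std::int32_t>(a);
        const std::int64_t wb = static_cast<std::int32_t>(b);
        // The product of two 32-bit values always fits in 64 bits.
        return static_cast<std::uint64_t>(wa * wb);
    }
    // Zero-extend to 64 bits.
    const std::uint64_t wa = a;
    const std::uint64_t wb = b;
    return wa * wb;
}
```

The same bit pattern gives different products depending on the extension: 0xFFFFFFFE is -2 when sign-extended and 4294967294 when zero-extended.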
MerryMage
ba84e7a8de
A64: Implement FNMSUB
2020-04-22 20:46:18 +01:00
Lioncash
3576c02d91
A64: Implement SMLSL{2}
2020-04-22 20:46:18 +01:00
MerryMage
a1042cfcd8
A64: Implement FNMADD
2020-04-22 20:46:18 +01:00
Lioncash
ada5c0b2fa
A64: Implement SMLAL{2}
2020-04-22 20:46:18 +01:00
MerryMage
0d83032a6f
A64: Implement FMSUB
2020-04-22 20:46:18 +01:00
Lioncash
2d1aca25e6
A64: Implement SMULL{2}
2020-04-22 20:46:18 +01:00
MerryMage
69e00d225c
A64: Implement FMADD
2020-04-22 20:46:18 +01:00
MerryMage
8c90fcf58e
IR: Implement FPMulAdd
2020-04-22 20:46:18 +01:00
Lioncash
c5ae9107a9
A64: Implement SABAL/SABAL2 and SABDL/SABDL2
...
Now that we have a helper function for the unsigned variants, we can
modify it to also be usable with the signed variants.
2020-04-22 20:46:18 +01:00
Lioncash
24e3299276
A64: Implement FCMGT, FCMGE (register) vector double and single precision variants
2020-04-22 20:46:18 +01:00
Lioncash
26d4473851
A64: Implement UABAL/UABAL2
2020-04-22 20:46:18 +01:00
Lioncash
350bc70be8
A64: Implement FCMGT, FCMGE, FCMLE, FCMLT (zero) vector double and single precision variants.
2020-04-22 20:46:18 +01:00
Lioncash
3397742c74
A64: Implement UABDL/UABDL2
2020-04-22 20:46:18 +01:00
Lioncash
c695da1cf3
ir: Add opcode for floating-point GE and GT comparisons
...
The rest of the comparisons can be implemented in terms of these two
2020-04-22 20:46:18 +01:00
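A minimal per-lane sketch of how the remaining comparisons might be expressed in terms of GE and GT, assuming the derivation is done by swapping operands. MaskGE and MaskGT are hypothetical stand-ins for the two IR opcodes; the actual opcode names and the translator code are not shown in this log.

```cpp
#include <cstdint>

// Hypothetical per-lane primitives standing in for the two IR opcodes.
// Each returns an all-ones mask when the comparison holds, else zero.
inline std::uint32_t MaskGE(float a, float b) { return a >= b ? 0xFFFFFFFFu : 0u; }
inline std::uint32_t MaskGT(float a, float b) { return a > b ? 0xFFFFFFFFu : 0u; }

// a <= b is b >= a, and a < b is b > a, so no extra primitives are needed.
// Ordered comparisons are false when either operand is NaN, before and after the swap.
inline std::uint32_t MaskLE(float a, float b) { return MaskGE(b, a); }
inline std::uint32_t MaskLT(float a, float b) { return MaskGT(b, a); }
```

Comparisons against zero follow the same pattern with one operand fixed to 0.0.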
Lioncash
6de5ed96e5
emit_x64_vector: Emit VPMULLQ in EmitVectorMultiply64 on AVX-512{DQ, VL} capable CPUs
...
Shortens code-gen down to a single instruction in the 64-bit path.
2020-04-22 20:46:18 +01:00
Lioncash
9054d1c20b
A64: Implement LDR (literal, SIMD&FP)
2020-04-22 20:46:18 +01:00
Lioncash
0da5e949a8
Correct typo in DataCacheOperation enum
...
Fixes a typo for the InvalidateByVAToPoC enum entry. Given yuzu is the
only known user of 64-bit mode and it doesn't use this value, we can get
away with changing this.
2020-04-22 20:46:18 +01:00
Lioncash
9736e2cce2
A64: Implement FABS' half-precision variant
2020-04-22 20:46:18 +01:00
Lioncash
6e5750e4ec
A64: Implement FABS' single and double precision variant
2020-04-22 20:46:18 +01:00
Lioncash
7bce8d8757
A64: Implement URSHR (scalar) and URSRA (scalar)
...
Now that the utility function is set up from implementing SRSRA, the
unsigned variants can be trivially implemented by modifying the
utility function to perform a logical shift right instead of an
arithmetic shift right for the unsigned case.
2020-04-22 20:46:18 +01:00
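The logical versus arithmetic shift distinction can be sketched for one 32-bit lane as below. This is an assumption-laden illustration rather than the utility function from the commit: RoundingShiftRight32 and is_signed are hypothetical names, and the sketch widens to 64 bits so the rounding constant cannot overflow.

```cpp
#include <cstdint>

// Hypothetical helper: rounding shift right of one 32-bit lane.
// shift must be in the range 1..32.
inline std::uint32_t RoundingShiftRight32(std::uint32_t value, unsigned shift, bool is_signed) {
    // Adding half of the discarded range rounds to nearest.
    const std::int64_t round = std::int64_t{1} << (shift - 1);
    if (is_signed) {
        // Sign-extend, then shift arithmetically. Right-shifting a negative
        // signed value is arithmetic in C++20 and on mainstream compilers before that.
        const std::int64_t wide = static_cast<std::int32_t>(value);
        return static_cast<std::uint32_t>((wide + round) >> shift);
    }
    // Zero-extend, then shift logically.
    const std::uint64_t wide = value;
    return static_cast<std::uint32_t>((wide + static_cast<std::uint64_t>(round)) >> shift);
}
```

The accumulating forms (SRSRA, URSRA) would add this result to the destination lane.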
Lioncash
1e70a589b0
A64: Implement SRSRA (scalar)
2020-04-22 20:46:18 +01:00
Lioncash
998aef07f6
A64: Implement SRSHR (scalar)
2020-04-22 20:46:17 +01:00
Lioncash
7c0250e9f8
A64: Implement SABA
2020-04-22 20:46:17 +01:00
Lioncash
f00789e6f7
A64: Implement SABD
2020-04-22 20:46:17 +01:00
Lioncash
1e10017f4b
ir: Add opcodes for signed absolute differences
2020-04-22 20:46:17 +01:00
Tillmann Karras
d3b44c1b5a
decoder_detail: use structured bindings
2020-04-22 20:46:17 +01:00
Lioncash
f745eb28bf
simd_two_register_misc: Handle 64-bit case for SCVTF_int_4
2020-04-22 20:46:17 +01:00
Lioncash
3f6c529da2
ir: Add opcode to perform the vector conversion S64->F64
...
Unfortunately x86 prior to AVX-512 doesn't really give us any convenient instruction to do the work for us
2020-04-22 20:46:17 +01:00
Lioncash
0e61ee6bf6
A64: Implement SHLL/SHLL2
2020-04-22 20:46:17 +01:00
Lioncash
43e6e98c3b
A64: Add missing decoding for PRFM (unscaled offset)
2020-04-22 20:46:17 +01:00
Lioncash
f2a85d5601
A64: Implement UHSUB
2020-04-22 20:46:17 +01:00
Lioncash
b33360a324
A64: Implement SHSUB
2020-04-22 20:46:17 +01:00
Lioncash
44a5f8095a
ir: Add opcodes for performing vector halving subtracts
2020-04-22 20:46:17 +01:00
Lioncash
4f37c0ec5a
A64: Implement SM4EKEY
2020-04-22 20:46:17 +01:00
Lioncash
3bde3347a5
A64: Implement SM4E
2020-04-22 20:46:17 +01:00
Lioncash
b312d28295
ir: Add an opcode for doing an SM4 lookup table query
2020-04-22 20:46:17 +01:00
Lioncash
27a6d5f6ce
emit_x64_vector: Use VPOPCNTB in EmitVectorPopulationCount() if AVX-512 BITALG is available
2020-04-22 20:46:17 +01:00
Lioncash
4dcc7724e0
A64: Implement UHADD
2020-04-22 20:46:17 +01:00
Lioncash
f8714f7250
A64: Implement SHADD
2020-04-22 20:46:17 +01:00
Lioncash
089096948a
ir: Add opcodes for performing halving adds
2020-04-22 20:46:17 +01:00
Lioncash
3d00dd63b4
emit_x64_vector: Emit VPMINSQ and VPMINUQ for 64-bit vector min operations if AVX-512VL is available
2020-04-22 20:46:17 +01:00
Lioncash
b97b71b8aa
emit_x64_vector: Emit VPMAXSQ and VPMAXUQ for 64-bit vector max operations if AVX-512VL is available
2020-04-22 20:46:17 +01:00
Lioncash
033e400df0
emit_x64_vector_floating_point: Deduplicate accurate NaN handling code
...
Allows the code to both be used from the 32 bit and 64 bit operations without duplicating code.
2020-04-22 20:46:17 +01:00
Lioncash
0f067b7330
emit_x64_vector: Emit VPABSQ in EmitVectorAbs() for the 64-bit case if AVX-512VL is available
2020-04-22 20:46:17 +01:00
Lioncash
d4ee878cbd
emit_x64_vector: Use VPSRAQ in EmitVectorArithmeticShiftRight64() if AVX-512VL is available
2020-04-22 20:46:17 +01:00
Lioncash
b38dd191bd
disassembler_arm: Remove rotation helper function in favor of Common::RotateRight
...
Mildly reduces the amount of duplicated behavior
2020-04-22 20:46:17 +01:00
Lioncash
51e4f1d9db
emit_x64_vector: Vectorize fallback path of EmitVectorMaxS32()
2020-04-22 20:46:17 +01:00
Lioncash
c692ccdd6d
emit_x64_vector: Vectorize fallback path of EmitVectorMaxS8()
2020-04-22 20:46:17 +01:00
Lioncash
b194313d8c
emit_x64_vector: Vectorize fallback path in EmitVectorMinU32()
2020-04-22 20:46:17 +01:00
Lioncash
7ceda6d919
emit_x64_vector: Vectorize fallback path in EmitVectorMinU16()
2020-04-22 20:46:17 +01:00
Lioncash
cda85a1da0
emit_x64_vector: Vectorize fallback path in EmitVectorMinS32()
2020-04-22 20:46:17 +01:00
Lioncash
6e08eed210
emit_x64_vector: Vectorize fallback path in EmitVectorMinS8()
2020-04-22 20:46:17 +01:00
Lioncash
0fb6dce689
emit_x64_vector: Remove unnecessary if constexpr expression in LogicalVShift
...
This can simply be merged with the previous one.
2020-04-22 20:46:17 +01:00
Lioncash
5b71b1337b
emit_x64_vector: Avoid left shift of negative value in LogicalVShift
...
Now that we handle the signed variants, we also have to be careful about left shifts with negative values,
as this is considered undefined behavior.
2020-04-22 20:46:17 +01:00
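One common way to avoid this undefined behavior is to perform the shift on the unsigned counterpart of the type. The sketch below assumes that approach; ShiftLeftNoUB is a hypothetical name, and the commit's actual fix inside LogicalVShift may differ.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: left shift that is well-defined for negative values.
// The caller must ensure shift is less than the bit width of T.
template <typename T>
T ShiftLeftNoUB(T value, unsigned shift) {
    using U = std::make_unsigned_t<T>;
    // Shifting an unsigned value never invokes undefined behavior; bits
    // shifted past the top are discarded when truncating back to U.
    const U shifted = static_cast<U>(static_cast<U>(value) << shift);
    // Converting back to signed is modular in C++20 and on mainstream
    // compilers before that.
    return static_cast<T>(shifted);
}
```

For example, ShiftLeftNoUB<std::int32_t>(-1, 4) yields -16 without ever shifting a negative operand.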
Lioncash
9954d28868
a64_jitstate: Zero SP and PC on construction of A64JitState
...
Given we zero out/reset everything else in the struct, do the same for these members to keep initialization consistent
2020-04-22 20:46:17 +01:00
Lioncash
4efbd40ea4
backend_x64/callback: Default virtual destructor in the cpp file
...
Prevents the vtable being generated in each translation unit that includes the header (and silences -Wweak-vtables warnings)
2020-04-22 20:46:17 +01:00
Lioncash
edd0b5c8c7
a32_interface/a64_interface: Change reinterpret_casts to static_casts in GetCurrentBlock thunks
...
It's well-defined to static_cast a void* to its proper type.
2020-04-22 20:46:17 +01:00
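A minimal sketch of the pattern, assuming a callback thunk that receives the object as an opaque void*. The Jit struct and its current_block member are hypothetical and exist only for this illustration; they are not the project's types.

```cpp
#include <cstdint>

// Hypothetical object whose address is passed through a void* parameter.
struct Jit {
    std::uint64_t current_block = 0;
};

// Thunk taking an opaque pointer. Since the pointer originally came from
// a Jit*, static_cast recovers it; reinterpret_cast is not required.
inline std::uint64_t GetCurrentBlockThunk(void* this_voidptr) {
    Jit& self = *static_cast<Jit*>(this_voidptr);
    return self.current_block;
}
```

The cast is only valid when the void* really was obtained from a pointer to the same type.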
Lioncash
e71612d394
A64: Implement SSHL (scalar)
2020-04-22 20:46:17 +01:00
Lioncash
ef1e69a1e3
A64: Implement SSHL (vector)
2020-04-22 20:46:17 +01:00
Lioncash
21974ee57e
backend_x64/ir: Amend generic LogicalVShift() template to also handle signed variants
...
Also adds IR opcodes to dispatch said variants
2020-04-22 20:46:17 +01:00
Lioncash
9fc89f0a0e
emit_x64_vector_floating_point: Use arrays for retrieving size instead of hardcoding the size
...
Similar changes were done in emit_x64_vector, but these were missed.
2020-04-22 20:46:17 +01:00
Lioncash
af28e89a13
emit_x64_vector: Vectorize fallback path in EmitVectorMaxU16()
2020-04-22 20:46:17 +01:00
Lioncash
cda75e2079
A64: Implement CMTST's scalar variant
2020-04-22 20:46:17 +01:00
Lioncash
0d20423ad5
emit_x64_vector: Vectorize non-SSE4.1 fallback path for VectorMultiply32()
2020-04-22 20:46:17 +01:00
Lioncash
d70ee7c0d1
emit_x64_vector: Use VPBROADCAST where applicable and available
...
Uses the instruction that does what it says in its name if available. Allows avoiding the use
of a scratch register in EmitVectorBroadcast8() and EmitVectorBroadcastLower8()'s SSSE3 path.
2020-04-22 20:46:17 +01:00
Lioncash
bebe7235ae
A64: Implement UZP1 and UZP2
2020-04-22 20:46:17 +01:00
Lioncash
26d77c6f09
ir: Add opcodes for performing vector deinterleaving
2020-04-22 20:46:17 +01:00
Lioncash
d6f9ed47d9
A64: Implement FNEG (half-precision)
2020-04-22 20:46:17 +01:00
Lioncash
7efbd73bac
A64: Implement USHL (scalar)
2020-04-22 20:46:17 +01:00
Lioncash
41f4717f2b
A64: Implement FNEG (vector)
2020-04-22 20:46:17 +01:00
Lioncash
ba1cc6366d
A64: Implement RSUBHN/RSUBHN2
2020-04-22 20:46:17 +01:00
Lioncash
e41640fe33
A64: Implement RADDHN/RADDHN2
2020-04-22 20:46:17 +01:00
Lioncash
b719a6b3f7
A64: Implement XAR
2020-04-22 20:46:17 +01:00
Lioncash
0b1b131ec2
simd_two_register_misc: Factor out common comparison code
...
Gets rid of a tiny bit of duplicated code.
2020-04-22 20:46:17 +01:00
Lioncash
ed0b84da70
A64: Implement CMLE (zero)'s vector variant
2020-04-22 20:46:17 +01:00
Lioncash
b595a68ffa
A64: Implement CMTST (vector)
2020-04-22 20:46:17 +01:00
Lioncash
48c7f8630c
A64: Implement ADDHN{2} and SUBHN{2}
2020-04-22 20:46:17 +01:00
Lioncash
3acd9c9200
translate: zero extend result in Vpart when storing to lower part of vector
2020-04-22 20:46:17 +01:00
Lioncash
87ca63699f
emit_x64_vector: Emit PMAXUD in EmitVectorMaxU32 on SSE4.1-capable CPUs
2020-04-22 20:46:17 +01:00
Lioncash
f17702f608
emit_x64_vector: Emit PMINUD in EmitVectorMinU32 on SSE4.1-capable CPUs
2020-04-22 20:46:17 +01:00
Lioncash
596a8dd1dd
emit_x64_vector: Emit PMINSD in EmitVectorMinS32 on SSE4.1-capable CPUs
...
Provides a better alternative to a fallback operation.
2020-04-22 20:46:17 +01:00