Commit graph

1553 commits

Author SHA1 Message Date
MerryMage
7162f6f254 emit_x64_vector_floating_point: SSE4.1 implementation of EmitFPVectorToFixed 2020-04-22 20:55:50 +01:00
MerryMage
e7a5592699 emit_x64_vector_floating_point: EmitFPVectorRoundInt: Use FCODE 2020-04-22 20:55:50 +01:00
MerryMage
b8fde48732 emit_x64_vector: AVX implementation for EmitVectorCountLeadingZeros8 2020-04-22 20:55:50 +01:00
MerryMage
fd37b637aa emit_x64_vector: SSE implementation of EmitVectorCountLeadingZeros16 2020-04-22 20:55:50 +01:00
MerryMage
09bf273bc8 A64: Implement SCVTF, UCVTF (vector, fixed-point), scalar variant 2020-04-22 20:55:06 +01:00
MerryMage
03ad2072a7 emit_x64_floating_point: Reduce fallback LUT code in EmitFPToFixed 2020-04-22 20:55:06 +01:00
MerryMage
f9129db6fd A64: Implement FCVTZS, FCVTZU, UCVTF, SCVTF (vector, fixed-point), vector variant 2020-04-22 20:55:06 +01:00
Lioncash
48df9b9a7d A64: Implement UQSHL's vector immediate and register variants 2020-04-22 20:55:06 +01:00
Lioncash
d426dfe942 ir: Add opcodes for unsigned saturating left shifts 2020-04-22 20:55:06 +01:00
Lioncash
ab60720418 A64/translate/impl: Make signatures consistent for unimplemented by-element SIMD variants
Makes them all consistent, so it isn't necessary to change the
prototypes over when implementing them.
2020-04-22 20:55:06 +01:00
Lioncash
6b5ea6ee66 A64: Implement BRK
Currently, we can just implement this as part of the exception
interface, similar to how it's done for the A32 interface with BKPT.
2020-04-22 20:55:06 +01:00
Lioncash
b915364c16 A64/imm: Add full range of comparison operators to Imm template
Makes the comparison interface consistent by providing all of the
relevant members. This also modifies the comparison operators to take
the Imm instance by value, as it's really only a u32 under the covers,
and it's cheaper to shuffle around a u32 than a 64-bit pointer address.
2020-04-22 20:55:06 +01:00
MerryMage
02150bc0b7 IR: Add fbits argument to FPVectorFrom{Signed,Unsigned}Fixed 2020-04-22 20:55:06 +01:00
MerryMage
027b0ef725 A64: Implement SCVTF, UCVTF (scalar, fixed-point) 2020-04-22 20:55:06 +01:00
MerryMage
8051f60db0 opcodes.inc: Align columns to a tabstop of 4 2020-04-22 20:55:06 +01:00
MerryMage
90193b0e3d IR: Add fbits argument to FixedToFP-related opcodes 2020-04-22 20:55:06 +01:00
Lioncash
616a153c16 A64: Implement SQSHL's vector immediate variant 2020-04-22 20:55:06 +01:00
Lioncash
e8b0f25dff A64: Implement SQSHL's vector register variant 2020-04-22 20:55:06 +01:00
Lioncash
b14eaaec46 ir: Add opcodes for left signed saturated shifts 2020-04-22 20:55:06 +01:00
Lioncash
da55ed7b31 branch: Make variables const where applicable 2020-04-22 20:55:06 +01:00
Lioncash
867b666285 move_wide: Make variables const where applicable 2020-04-22 20:55:06 +01:00
Lioncash
78024a9dc4 load_store_register_unprivileged: Make variables const where applicable 2020-04-22 20:55:06 +01:00
Lioncash
e45e5da610 load_store_register_immediate: Place conditional bodies on their own line
Makes the conditionals visually consistent with the rest of the
codebase.
2020-04-22 20:55:06 +01:00
Lioncash
b586cf3f56 load_store_load_literal: Make variables const where applicable 2020-04-22 20:55:06 +01:00
Lioncash
c3a3b9687e data_processing_logical: Move datasize declarations after early-exit conditionals
While we're at it, make variables const where applicable.
2020-04-22 20:55:06 +01:00
Lioncash
ed797e6540 data_processing_conditional_select: Make variables const where applicable
Makes CSEL's function consistent with all of the others.
2020-04-22 20:55:06 +01:00
Lioncash
c82fa5ec5a data_processing_addsub: Move datasize declarations after early-exit conditionals
While we're at it, also make relevant variables const where applicable
2020-04-22 20:55:06 +01:00
Lioncash
f4a66d2477 data_processing_bitfield: Move datasize variables after early-exit conditionals
Moves the declaration of datasize to the scope that it's used within.
This also takes the opportunity to apply const where applicable, and
make early-exits all vertically consistent with one another.
2020-04-22 20:55:06 +01:00
Lioncash
2e0fcd6161 A64: Implement CLS's vector variant
Leverages CLZ like the integral variant does.
2020-04-22 20:55:06 +01:00
Lioncash
a2cd643525 emit_x64_vector: Make EmitVectorUnsignedSaturatedAccumulateSigned() internally linked
Given this is just an internal helper function, it can be marked static.
2020-04-22 20:55:06 +01:00
Lioncash
c39ea2e3c9 perf_map: Use std::string_view instead of std::string for PerfMapRegister()
We can just use a non-owning view into a string in this case instead of
potentially allocating a std::string instance.
2020-04-22 20:55:06 +01:00
MerryMage
12243692f5 A64: Implement SQRDMULH (vector), vector variant 2020-04-22 20:55:06 +01:00
MerryMage
a9ffcf08b1 A64: Implement SQDMULL (vector), vector variant 2020-04-22 20:55:06 +01:00
MerryMage
3e447614c6 IR: Add VectorSignedSaturatedDoublingMultiplyLong 2020-04-22 20:55:06 +01:00
MerryMage
06b31448aa emit_x64_vector: Changes to VectorSignedSaturatedDoublingMultiply
* Return both the upper and lower parts of the multiply if required
* SSE2 does not support the pmuldq instruction, do sign correction to an unsigned result instead
* Improve port utilisation where possible (punpck instructions were a bottleneck)
2020-04-22 20:55:06 +01:00
MerryMage
08c0e017a5 IR: Implement Vector{Signed,Unsigned}Multiply{16,32} 2020-04-22 20:55:06 +01:00
Lioncash
b6df34cdde backend_x64/a64_interface: Re-enable the constant folding pass
This was disabled for debugging, but never re-enabled. Just to be sure,
testing was done downstream in yuzu to make sure this didn't happen to
break anything (which seems to be the case).
2020-04-22 20:55:06 +01:00
MerryMage
06ba397af2 emit_x64_vector_floating_point: Hardware FMA implementation for RSqrtStepFused 2020-04-22 20:55:06 +01:00
MerryMage
e553c4fe8d emit_x64_vector_floating_point: Hardware FMA implementation of FPVectorRecipStepFused 2020-04-22 20:55:06 +01:00
MerryMage
3caeb62ef1 emit_x64_floating_point: Hardware FMA implementation of FPRSqrtStepFused 2020-04-22 20:55:06 +01:00
MerryMage
344ee76aba emit_x64_floating_point: Hardware FMA implementation of FPRecipStepFused{32,64} 2020-04-22 20:55:06 +01:00
MerryMage
1492573267 emit_x64_vector: SSE implementation of VectorSignedSaturatedAccumulateUnsigned{8,16,32} 2020-04-22 20:55:06 +01:00
Lioncash
26df6e5e7b emit_x64_vector: Correct static asserts for < 64-bit type checks in saturated accumulate fallbacks
I had initially meant to use BitSize() here, not sizeof()
2020-04-22 20:55:06 +01:00
MerryMage
a4a26ac226 emit_x64_vector: EmitVectorSignedSaturatedAccumulateUnsigned64: SSE implementation 2020-04-22 20:55:06 +01:00
MerryMage
a7c66d2d28 emit_x64_vector: Simplify fpsr_qc related code
Move the bool conversion into A64JitState::GetFpsr so we don't have to continuously
pay the cost of conversion for every saturation instruction.
2020-04-22 20:55:06 +01:00
Lioncash
112cff9ab9 A64: Implement CLZ's vector variant 2020-04-22 20:55:06 +01:00
Lioncash
e739624296 ir: Add opcodes for vector CLZ operations
We can optimize these cases further for with the use of a fair bit of
shuffling via pshufb and the use of masks, but given the uncommon use of
this instruction, I wouldn't consider it to be beneficial in terms of
amount of code to be worth it over a simple manageable naive solution
like this.

If we ever do hit a case where vectorized CLZ happens to be a
bottleneck, then we can revisit this. At least with AVX-512CD, this can
be done with a single instruction for the 32-bit word case.
2020-04-22 20:55:05 +01:00
MerryMage
d4c37a68a8 A64/translate: VectorZeroUpper for V(64) stores
Ensures correctness.
2020-04-22 20:55:05 +01:00
MerryMage
b8daa4feac simd_two_register_misc: FNEG (vector) with Q == 0 had dirty upper 2020-04-22 20:55:05 +01:00
Lioncash
5653e7637e emit_x64_vector: Remove unnecessary [[maybe_unused]] attributes
These were unintentionally left in when introducing SUQADD and USQADD
2020-04-22 20:55:05 +01:00
Lioncash
14e026a7f0 A64: Implement USQADD's scalar and vector variants 2020-04-22 20:55:05 +01:00
Lioncash
d4a76aaa04 ir: Add opcodes form unsigned saturated accumulations of signed values 2020-04-22 20:55:05 +01:00
Lioncash
18ad7f237d A64: Implement SUQADD's scalar and vector variants 2020-04-22 20:55:05 +01:00
Lioncash
6f911a26da ir: Add opcodes for signed saturated accumulations of unsigned values 2020-04-22 20:55:05 +01:00
Lioncash
9a3d38d2ee A64: Implement SMLAL{2}, SMLSL{2}, UMLAL{2}, and UMLSL{2}'s vector by-element variants
We can simply modify the general function made for SMULL{2} and
UMULL{2}'s by-element variants to also handle the other multiply-based
by-element variants.
2020-04-22 20:55:05 +01:00
Lioncash
6ccfbc9b39 A64: Implement UMULL{2}'s vector by-element variant 2020-04-22 20:55:05 +01:00
Lioncash
58e21f175c A64: Implement SMULL{2}'s vector by-element variant 2020-04-22 20:55:05 +01:00
Lioncash
134bb02e19 ir/value: Replace includes with forward declarations
enum classes are still considered complete types when forward declared
(as the compiler knows the exact size of the type from the declaration
alone). The only difference in this case being that the members of the
enum class aren't visible. Given we don't use the members within this
header in any way, we can simply forward declare them here and remove
the inclusions.
2020-04-22 20:55:05 +01:00
Lioncash
2c8e07e7d0 ir/cond: Migrate to C++17 nested namespace specifiers 2020-04-22 20:55:05 +01:00
Lioncash
c3b7819a55 CMakeLists: Add missing cond.h header to file listing
Allows the file to show up within IDEs more easily.
2020-04-22 20:55:05 +01:00
Lioncash
0a3976059f A64: Implement URSQRTE 2020-04-22 20:55:05 +01:00
Lioncash
b6e74fd17d ir: Add opcodes for performing unsigned reciprocal square root estimates 2020-04-22 20:55:05 +01:00
Lioncash
bd3582e811 A64: Implement URECPE 2020-04-22 20:55:05 +01:00
Lioncash
af83360f89 ir: Add opcodes for unsigned reciprocal estimate 2020-04-22 20:55:05 +01:00
Lioncash
740ffa52ae A64: Implement SQNEG's scalar and vector variant 2020-04-22 20:53:46 +01:00
Lioncash
fca7eddb9e A64: Add opcodes for signed saturating negations 2020-04-22 20:53:46 +01:00
Lioncash
f1ebbcd7bc emit_x64_vector: Simplify "position == 0" case for EmitVectorExtract()
In the event position is zero, we can just treat it as a NOP, given
there's no need to move the data.
2020-04-22 20:53:46 +01:00
Lioncash
87372917f9 emit_x64_vector: Simplify "position == 0" case for EmitVectorExtractLower()
In the event position == 0, we can just treat it as a simple movq,
clearing the upper half of the XMM register. This also makes that case
use only one register.
2020-04-22 20:53:46 +01:00
Lioncash
f5fb496e7e A64: Implement SQDMULH's by-element scalar variant 2020-04-22 20:53:46 +01:00
Lioncash
40f0576995 A64: Implement SQDMULH's by-element vector variant 2020-04-22 20:53:46 +01:00
MerryMage
8f9206901d backend/x64: Do not clear fast_dispatch_table if not enabled
There is no need to pay for the cost of setting a large block of memory if we're not using it.
2020-04-22 20:53:46 +01:00
MerryMage
9b65100660 A64: Implement FastDispatchHint 2020-04-22 20:53:46 +01:00
MerryMage
f96c43d422 A32: Implement FastDispatchHint 2020-04-22 20:53:46 +01:00
MerryMage
aa8d826c13 ir/terminal: Add FastDispatchHint 2020-04-22 20:53:46 +01:00
Lioncash
1a69a61cb4 A64: Implement SQDMULH's scalar variant 2020-04-22 20:53:46 +01:00
Lioncash
7ebfd0f31c ir: Add opcodes for scalar signed saturated doubling multiplies 2020-04-22 20:53:46 +01:00
Lioncash
9c03311fed A64: Implement SQDMULH's vector variant 2020-04-22 20:53:46 +01:00
Lioncash
a0231e5546 ir: Add opcodes for signed saturated doubling multiplies 2020-04-22 20:53:46 +01:00
Lioncash
db24e1f09b A64: Implement SQABS' scalar variant 2020-04-22 20:53:46 +01:00
Lioncash
bda5d14c7f A64: Implement SQABS' vector variant. 2020-04-22 20:53:46 +01:00
Lioncash
0507e47420 ir: Add opcodes for signed saturated absolute values 2020-04-22 20:53:46 +01:00
MerryMage
27427595b7 emit_x64_floating_point: EmitFPToFixed: maxsd optimization
maxsd is not required when doing a signed conversion, because x64
produces a 0x80...00 value for out of range values.
2020-04-22 20:53:46 +01:00
MerryMage
1abf82ac4a emit_x64_floating_point: ZeroIfNaN: pxor -> xorps
xorps is shorter and more appropriate here.
2020-04-22 20:53:46 +01:00
MerryMage
3415828fb4 IR: Simplify FP{Single,Double}ToFixed{U,S}{32,64} 2020-04-22 20:53:46 +01:00
Lioncash
e30f9816ec A32/decoder: Add missing <algorithm> includes
These includes should be present, as we use std::find_if() within these headers.
2020-04-22 20:53:46 +01:00
Lioncash
4507627905 emit_x64_vector: Provide AVX path for EmitVectorMinU64() 2020-04-22 20:53:46 +01:00
Lioncash
fd49a62b06 emit_x64_vector: Provide AVX path for EmitVectorMinS64() 2020-04-22 20:53:46 +01:00
Lioncash
770723f449 emit_x64_vector: Provide AVX path for EmitVectorMaxU64() 2020-04-22 20:53:46 +01:00
Lioncash
8fb90c0cf1 emit_x64_vector: Provide AVX path for EmitVectorMaxS64() 2020-04-22 20:53:46 +01:00
Lioncash
2cac6ad129 emit_x64_vector: Simplify EmitVectorLogicalLeftShift8()
Similar to EmitVectorLogicalRightShift8(), we can determine a mask ahead
of time and just and the results of a halfword left shift.
2020-04-22 20:53:46 +01:00
Lioncash
135107279d emit_x64_vector: Simplify EmitVectorLogicalShiftRight8()
We can generate the mask and AND it against the result of a halfword
shift instead of looping.
2020-04-22 20:53:46 +01:00
Lioncash
2952b46b16 emit_x64_vector: Amend value definition in SSE 4.1 path for EmitVectorSignExtend16()
We should be defining the value after the results have been calculated
to be consistent with the rest of the code.
2020-04-22 20:53:46 +01:00
Lioncash
fda19095ea emit_x64_vector: Remove fallback in EmitVectorSignExtend64()
This is fairly trivial to do manually.
2020-04-22 20:53:46 +01:00
Lioncash
39593fcd26 emit_x64_vector: Remove fallback for EmitVectorSignExtend32()
We can just do the extension manually, which gets rid of the need to
fall back here.
2020-04-22 20:53:46 +01:00
Lioncash
053175f69b ir_emitter: Rename fpscr_controlled parameters to fpcr_controlled
Part of addressing #333
2020-04-22 20:53:46 +01:00
MerryMage
f0184c4b8d a32/exception_generating: BPKT: Define unpredictable behaviour
Define unpredictable behaviour to be BKPT executes conditionally
2020-04-22 20:53:46 +01:00
MerryMage
a12854857b A32: Add define_unpredictable_behaviour option 2020-04-22 20:53:46 +01:00
MerryMage
b0abaa8312 A32/location_descriptor: Change formatting to use hex 2020-04-22 20:53:46 +01:00
MerryMage
ccbf6c7f63 microinstruction: A32ExceptionRaised causes CPU exception 2020-04-22 20:53:46 +01:00
MerryMage
6595e49a31 A32/types: CondToString: Add nv 2020-04-22 20:53:46 +01:00
MerryMage
d5b9c4a4bb block_of_code: Hide NX support behind compiler flag
Systems that require W^X can use the DYNARMIC_ENABLE_NO_EXECUTE_SUPPORT cmake option.
2020-04-22 20:53:46 +01:00
MerryMage
de4494ffa5 Implement perfmap 2020-04-22 20:53:46 +01:00
MerryMage
f73104633b a32_emit_x64: Fix incorrect BMI2 implementation for SetCpsr
* The MSB for each byte in cpsr_ge were not being appropriately set.
* We also expand test coverage to test this case.
* We fix the disassembly of the MSR (imm) and MSR (reg) instructions as well.
2020-04-22 20:53:46 +01:00
MerryMage
3432a08e0a backend/x64: Support W^X systems
Closes #176.
2020-04-22 20:53:46 +01:00
BreadFish64
2a65442933 Backend: Create "backend" folder
similar to the "frontend" folder
2020-04-22 20:53:46 +01:00
MerryMage
3b13f1eb12 A64/translate: Standardize arguments of helper functions
Don't pass in IREmitter when TranslatorVisitor is already available.
2020-04-22 20:53:45 +01:00
MerryMage
a4e556d59c A64/translate: Standardize TranslatorVisitor abbreviation
Prefer v to tv.
2020-04-22 20:53:45 +01:00
MerryMage
9a0dc61efd emit_x64_vector: Avoid recalculating addresses in EmitVectorTableLookup 2020-04-22 20:53:45 +01:00
Lioncash
3d465e2c36 A64: Implement SQXTN, SQXTUN, and UQXTN's scalar variants
We can implement these in terms of the vector variants
2020-04-22 20:53:45 +01:00
Lioncash
4ff39c6ea8 A64: Implement SDOT and UDOT's (by element) variants
Gets all of the dot product instructions out of the way.
2020-04-22 20:53:45 +01:00
MerryMage
21df1fb539 emit_x64_vector: Don't load zero constant from memory in EmitVectorTableLookup 2020-04-22 20:53:45 +01:00
MerryMage
3bbcca8757 emit_x64_vector: Special-case is_defaults_zero && table_size == 2 in EmitVectorTableLookup 2020-04-22 20:53:45 +01:00
MerryMage
9cc00f900c emit_x64_vector: Release registers when possible in EmitVectorTableLookup 2020-04-22 20:53:45 +01:00
MerryMage
a12afd1065 reg_alloc: Add the ability to Release an allocation early 2020-04-22 20:53:45 +01:00
MerryMage
e68bd3c6c1 emit_x64_vector: Special-case table_size == 1 in EmitVectorTableLookup 2020-04-22 20:53:45 +01:00
MerryMage
a4e1f8a63a emit_x64_vector: SSE4.1 implementation of EmitVectorTableLookup 2020-04-22 20:53:45 +01:00
MerryMage
0c18b85c27 A64: Implement TBL and TBX 2020-04-22 20:53:45 +01:00
MerryMage
89d08c7d61 IR: Add VectorTable and VectorTableLookup IR instructions 2020-04-22 20:53:45 +01:00
MerryMage
0288974512 opcodes: Cleanup opcodes table
* Remove T:: prefix from types.
* Add another column for a 4th argument.
2020-04-22 20:53:45 +01:00
Lioncash
d9fc6cf31f A64: Implement SDOT and UDOT's vector variant 2020-04-22 20:53:45 +01:00
Lioncash
cb5e5c5d49 A64: Implement SADALP and UADALP
While we're at it we can join the code for SADDLP and UADDLP with these
instructions, since the only difference is we do an accumulate at the
end of the operation.
2020-04-22 20:53:45 +01:00
Lioncash
29f8b30634 A64: Implement SRSHL and URSHL
Implements both scalar and vector variants.
2020-04-22 20:53:45 +01:00
Lioncash
0efa2ce3b0 ir: Add opcodes for performing rounding left shifts 2020-04-22 20:53:45 +01:00
MerryMage
656ceff225 emit_x64_floating_point: Fix smallest normal check in EmitFPMulAdd 2020-04-22 20:53:45 +01:00
Lioncash
f3f60cd179 A64: Implement ISB
Given we want to ensure that all instructions are fetched again, we can
treat an ISB instruction as a code cache flush.
2020-04-22 20:53:45 +01:00
Lioncash
be53e356a2 A64: Implement FCVTN{2} 2020-04-22 20:53:45 +01:00
Lioncash
4c3d7c5a8d A64: Implement FCVTL{2} 2020-04-22 20:53:45 +01:00
Lioncash
7eb6be7a6a A64: Implement FMAXNM and FMINNM vector variants.
Currently we can implement these in terms of the scalar IR variants.
2020-04-22 20:53:45 +01:00
Lioncash
8b65ea68c0 A64: Implement FMAXP, FMAXNMP, FMINP, and FMINNMP's vector variants
We can just implement these in terms of scalars for the time being.
2020-04-22 20:53:45 +01:00
MerryMage
ec76f95f5a emit_x64_vector_floating_point: Correct value of smallest_normal_number 2020-04-22 20:53:45 +01:00
MerryMage
e60d6c0d20 fp/info: Incorrect point_position in FPValue 2020-04-22 20:53:45 +01:00
MerryMage
8a3b6364c2 load_store_exclusive: Define s == t state to be Constraint_NONE
Downstream (yuzu) mentioned that the instruction:

STXR W9, W9, [X0]

was executed in the program "Crash N-Sane Trilogy".
2020-04-22 20:53:45 +01:00
MerryMage
cd40e4dae0 A64/translate: Allow for unpredictable behaviour to be defined 2020-04-22 20:53:45 +01:00
MerryMage
d1d6f4feb5 system: Implement MRS CNTFRQ_EL0 2020-04-22 20:53:45 +01:00
Lioncash
7ef7def661 A64: Implement SQ{ADD, SUB}, and UQ{ADD, SUB}'s vector variants
Currently we implement these in terms of the scalar variants. Falling
back to the interpreter is slow enough to make it more effective than
doing that.
2020-04-22 20:46:23 +01:00
Lioncash
a4b0e2ace6 A64: Implement UQADD/UQSUB's scalar variants 2020-04-22 20:46:23 +01:00
Lioncash
acbaf04fef ir: Add opcodes for unsigned saturating add and subtract 2020-04-22 20:46:23 +01:00
Lioncash
c41b5a3492 x64/reg_alloc: Use type alias for array returned by GetArgumentInfo()
This way if the number ever changes, we don't need to change the type in
other places.
2020-04-22 20:46:23 +01:00
Lioncash
2188765e28 ir/value: Use type alias CoprocessorInfo for std::array<u8, 8>
Provides a more descriptive label for the interface, and avoids the need
to hardcode the array size in multiple places.
2020-04-22 20:46:23 +01:00
MerryMage
71e137715d status_register_access: Add support for bits 0 and 1 of mask to MSR 2020-04-22 20:46:23 +01:00
MerryMage
ac51c2547d A32/translate/load_store: Correct detection of writeback 2020-04-22 20:46:23 +01:00
MerryMage
d345220251 A32/translate: Add TranslateSingleInstruction 2020-04-22 20:46:23 +01:00
MerryMage
5fc197c564 A32/ir_emitter: Bug fix: IREmitter::ExceptionRaised using incorrect opcode 2020-04-22 20:46:23 +01:00
MerryMage
ff3805e332 A32/decoders: Split instruction list into include file 2020-04-22 20:46:23 +01:00
MerryMage
3f4d118d73 microinstruction: Improve assert messages 2020-04-22 20:46:23 +01:00
MerryMage
a7e6f2a235 emit_x64_vector: EmitVectorNarrow16: AVX512 implementation 2020-04-22 20:46:23 +01:00
MerryMage
b6350e3947 emit_x64_vector: EmitVectorNarrow32: prefer pblendw to loading constant 2020-04-22 20:46:23 +01:00
MerryMage
8fdba189cb emit_x64_vector: packusdw is SSE4.1 2020-04-22 20:46:23 +01:00
MerryMage
1ef388d1cd emit_x64_vector_floating_point: Simplify FPVector{Min,Max} 2020-04-22 20:46:23 +01:00
MerryMage
4a1ce797cb emit_x64_vector_floating_point: Simplify Get*Vector functions 2020-04-22 20:46:23 +01:00
MerryMage
bcaced297a emit_x64_floating_point: Remove EmitProcessNaNs 2020-04-22 20:46:23 +01:00
MerryMage
2e0885388e devirtualize: Replace DEVIRT macro with function template 2020-04-22 20:46:23 +01:00
Lioncash
54d8552177 a32_emit_x64: std::move A32::UserConfig in the constructor
This avoids a few redundant atomic increments and decrements,
considering the UserConfig instance contains a std::array of
std::shared_ptr<Coprocessor> instances.
2020-04-22 20:46:23 +01:00
MerryMage
b098c650df emit_x64_floating_point: Use EmitPostProcessNaNs in EmitFPMulX 2020-04-22 20:46:23 +01:00
MerryMage
c1babf41b2 emit_x64_floating_point: Remove unnecessary DenormalsAreZero from EmitFPSingleToDouble and EmitFPDoubleToSingle 2020-04-22 20:46:23 +01:00
MerryMage
700088408d emit_x64_floating_point: Simplify EmitFP{Min,Max}{,Numeric}{32,64} 2020-04-22 20:46:23 +01:00
MerryMage
07e0585994 emit_x64_floating_point: Reduce NaN processing overhead 2020-04-22 20:46:23 +01:00
MerryMage
f5e11d117a A64: Implement FMULX, scalar single/double variant 2020-04-22 20:46:23 +01:00
MerryMage
17f73974f2 IR: Implement FPMulX IR instruction 2020-04-22 20:46:23 +01:00
Lioncash
391e16be64 emit_x64_vector: Vectorize 32-bit variants of paired min/max
Gets rid of the fallbacks for these cases.
2020-04-22 20:46:23 +01:00
MerryMage
5ae045d67e emit_x64_vector: Improve code emission of VectorGetElement* for index == 0 2020-04-22 20:46:23 +01:00
MerryMage
e9ab7f7664 reg_alloc: Do a UseScratch if a Use destination is too small 2020-04-22 20:46:23 +01:00
MerryMage
90f8dda966 emit_x64_floating_point: AVX implementation of ForceToDefaultNaN 2020-04-22 20:46:23 +01:00
MerryMage
dfb660cd16 emit_x64_vector_floating_point: Prefer blendvp{s,d} to vblendvp{s,d} where possible
It's a cheaper instruction.
2020-04-22 20:46:23 +01:00
MerryMage
476c0f15da backend_x64: Remove all use of xmm0 2020-04-22 20:46:23 +01:00
MerryMage
8252efd7b1 emit_x64_vector_floating_point: AVX implementation of ForceToDefaultNaN 2020-04-22 20:46:23 +01:00
MerryMage
746dc521b9 emit_x64_vector_floating_point: Reduce codesize of ForceToDefaultNaN 2020-04-22 20:46:23 +01:00
MerryMage
7731dcdca9 emit_x64_vector_floating_point: Reduce codesize of EmitTwoOpVectorOperation 2020-04-22 20:46:23 +01:00
MerryMage
bb93353f94 emit_x64_vector_floating_point: Correct FMA in FTZ mode
x64 rounds before flushing to zero
AArch64 rounds after flushing to zero

This difference of behaviour is noticable if something would round to a smallest normalized number
2020-04-22 20:46:23 +01:00
MerryMage
8ef195db3c emit_x64_floating_point: DenormalsAreZero is redundant as hardware already does DAZ
Exceptions: F{MIN,MAX}{,NM}
2020-04-22 20:46:23 +01:00
MerryMage
de9d8c461c emit_x64_floating_point: FlushToZero is redundant as hardware already does FTZ 2020-04-22 20:46:23 +01:00
MerryMage
822fd4a875 backend_x64: Fix FPVectorMulAdd and FPMulAdd NaN handling with denormals
Denormals should be treated as zero in NaN handler
2020-04-22 20:46:23 +01:00
MerryMage
b393e15ab6 backend_x64: Fix bugs when FPCR.FZ=1
Bugs:
* DenormalsAreZero flushed to positive zero instead of preserving sign.
* FMAXNM/FMINNM (scalar) should perform DAZ *before* special zero handling.
* FMAX/FMIN/FMAXNM/FMINNM (vector) did not DAZ.
2020-04-22 20:46:23 +01:00
MerryMage
5e88d66470 fp/info: Deduplicate functions 2020-04-22 20:46:23 +01:00
MerryMage
2019d32743 emit_x64_floating_point: Deduplicate EmitFPMulAdd implementation 2020-04-22 20:46:23 +01:00
MerryMage
e038fe72df emit_x64_floating_point: Deduplicate code 2020-04-22 20:46:23 +01:00
MerryMage
ec82a845b7 emit_x64_vector_floating_point: Fix FPVector{Max,Min} when FPCR.DN = 1 2020-04-22 20:46:23 +01:00
MerryMage
7f27945411 emit_x64_floating_point: Fix FP{Max,Min} when FPCR.DN = 1 2020-04-22 20:46:23 +01:00
MerryMage
21a28c2545 IR: SSE4.1 implementation of FPVectorRoundInt 2020-04-22 20:46:23 +01:00
MerryMage
9669e49817 A64: Implement FRINT{N,M,P,Z,A,X,I} (vector), single/double variant 2020-04-22 20:46:23 +01:00
MerryMage
f976c47008 IR: Initial implementation of FPVectorRoundInt 2020-04-22 20:46:23 +01:00
MerryMage
f2393488fe A64: Implement SQADD and SQSUB, scalar variant 2020-04-22 20:46:23 +01:00
MerryMage
10e196480f IR: Generalise SignedSaturated{Add,Sub} to support more bitwidths 2020-04-22 20:46:23 +01:00
MerryMage
71db0e67ae a64_emit_x64: Bugfix EmitA64OrQC - Incorrect argument 2020-04-22 20:46:23 +01:00
Lioncash
d0fdd3c6e6 simd_three_same: Extract non-paired SMAX, SMIN, UMAX, UMIN code to a common function
Deduplicates a bit of code and makes its layout consistent with the
paired variants
2020-04-22 20:46:23 +01:00
Lioncash
2bea2d0512 A64: Implement SMAXP, SMINP, UMAXP, UMINP 2020-04-22 20:46:23 +01:00
Lioncash
463b9a3d02 ir: Add opcodes for vector paired maximum and minimums
For the time being, we can just do a naive implementation which avoids
falling back to the interpreter a bit. Horizontal operations aren't
necessarily x86 SIMD's forte anyways.
2020-04-22 20:46:23 +01:00
Lioncash
43344c5400 A64: Implement SMAXV, SMINV, UMAXV, and UMINV 2020-04-22 20:46:23 +01:00
Lioncash
2501bfbfae ir: Add opcodes for performing scalar integral min/max 2020-04-22 20:46:23 +01:00
Lioncash
7fdd8b0197 A64: Implement PMULL{2} 2020-04-22 20:46:23 +01:00
Lioncash
5ebf496d4e translate: Deduplicate GetDataSize() functions
Avoids defining the same function multiple times in different files.
2020-04-22 20:46:22 +01:00
Lioncash
f83cd2da9a floating_point_{conditional}_compare: Deduplicate code
Deduplicates the implementation code of instructions by extracting the
code to a common function.
2020-04-22 20:46:22 +01:00
MerryMage
f9c6d5e1a0 common: Move all cryptographic function to common/crypto 2020-04-22 20:46:22 +01:00
MerryMage
5dc23e49d7 a32_emit_x64: BMI2 implementation of A32SetCpsr 2020-04-22 20:46:22 +01:00
MerryMage
0f85305933 a32_emit_x64: Shorten EmitA32GetCpsr 2020-04-22 20:46:22 +01:00
MerryMage
9fe2bf8733 a32_emit_x64: Assert that memory layout assumption in EmitA32GetCpsr is valid 2020-04-22 20:46:22 +01:00
Lioncash
b48fb8ca6b A64: Implement PMUL 2020-04-22 20:46:22 +01:00
Lioncash
affa312d1d ir: Add opcode for performing polynomial multiplication 2020-04-22 20:46:22 +01:00
MerryMage
dd4ac86f8e A64: Implement FCVT{N,M,A,P}{U,S} (vector), FCVTZU (vector, integer), single/double variant 2020-04-22 20:46:22 +01:00
MerryMage
28b38916a8 A64: Implement FCVTZS (vector, integer), single/double variant 2020-04-22 20:46:22 +01:00