Commit graph

173 commits

Author SHA1 Message Date
gdkchan
e26d5f3f59 Fix silly copy/paste error on float variant of the FMINNM instruction 2018-08-05 18:56:30 -03:00
gdkchan
21fa932514 More accurate impl of FMINNM/FMAXNM, add vector variants (#296)
* More accurate impl of FMINNM/FMAXNM, add vector variants

* Optimize for the 0 case when op1 != op2

* Address PR feedback
2018-08-05 02:54:21 -03:00
LDj3SNuD
f736be2efb Add SQADD, UQADD, SQSUB, UQSUB, SUQADD, USQADD, SQABS, SQNEG (Scalar, Vector) instructions; add 24 Tests. Most saturation instructions now on ASoftFallback. (#314)
* Update AOpCodeTable.cs

* Update AInstEmitSimdHelper.cs

* Update AInstEmitSimdArithmetic.cs

* Update Pseudocode.cs

* Update Instructions.cs

* Update CpuTestSimd.cs

* Update CpuTestSimdReg.cs

* Update AInstEmitSimdHelper.cs

* Update AInstEmitSimdHelper.cs

* Update AInstEmitSimdHelper.cs

* Update AInstEmitSimdHelper.cs

* Update ASoftFallback.cs

* Update AInstEmitSimdHelper.cs

* Update ASoftFallback.cs

* Update AInstEmitSimdHelper.cs

* Update CpuTestSimd.cs

* Update CpuTestSimdReg.cs

* Update ASoftFallback.cs

* Update AInstEmitSimdHelper.cs

* Opt. (retest).
2018-08-04 16:58:54 -03:00
ReinUsesLisp
e51ccf206b Cache changes (#302)
* Skip repeated cache tests between same sync

* Skip some checks for regions where just one resource is resident

* Dehardcode residency page size

* Some cleanup
2018-07-29 01:39:15 -03:00
Arthur Chen
dcbeed72c4 update encoding for branch instruction (#305) 2018-07-26 13:46:05 -03:00
ReinUsesLisp
34ebf2acbe Send data to OpenGL host without client-side copies (#285)
* Directly send host address to buffer data

* Cleanup OGLShader

* Directly copy vertex and index data too

* Revert shader bind "cache"

* Address feedback
2018-07-19 16:02:51 -03:00
Merry
ee2452c136 AOpCodeTable: Speed up instruction decoding (#284) 2018-07-19 02:32:37 -03:00
LDj3SNuD
a3a5545c05 Implement Ssubw_V and Usubw_V instructions. (#287)
* Update AOpCodeTable.cs

* Update AInstEmitSimdHelper.cs

* Update AInstEmitSimdArithmetic.cs

* Update AInstEmitSimdMove.cs

* Update AInstEmitSimdCmp.cs

* Update Instructions.cs

* Update CpuTestSimd.cs

* Update CpuTestSimdReg.cs
2018-07-18 21:06:28 -03:00
gdkchan
02b8d80068 Fix LDXP/LDAXP when Rt == Rn (#274) 2018-07-16 15:57:15 -03:00
LDj3SNuD
1f2400ed18 Fix EmitHighNarrow(), EmitSaturatingNarrowOp() when Rd == Rn || Rd == Rm (& Part != 0). Optimization of EmitVectorTranspose(), EmitVectorUnzip(), EmitVectorZip() algorithms (reduction of the number of operations and their complexity). Add 12 Tests about Trn1/2, Uzp1/2, Zip1/2 (V) instructions. (#268)
* Update CpuTestSimdArithmetic.cs

* Update CpuTestSimd.cs

* Update CpuTestSimdReg.cs

* Update Instructions.cs

* Update AInstEmitSimdArithmetic.cs

* Update AInstEmitSimdHelper.cs

* Update AInstEmitSimdMove.cs

* Delete CpuTestSimdMove.cs
2018-07-15 00:53:26 -03:00
LDj3SNuD
97e469e315 Improve CountLeadingZeros() algorithm, nits. (#219)
* Update AInstEmitSimdArithmetic.cs

* Update ASoftFallback.cs

* Update ASoftFallback.cs

* Update ASoftFallback.cs

* Update AInstEmitSimdArithmetic.cs
2018-07-14 15:07:44 -03:00
gdkchan
e64f484521 Add SMLSL, SQRSHRN and SRSHR (Vector) cpu instructions, nits (#225)
* Add SMLSL, SQRSHRN and SRSHR (Vector) cpu instructions

* Address PR feedback

* Address PR feedback

* Remove another useless temp var

* nit: Alignment

* Replace Context.CurrOp.GetBitsCount() with Op.GetBitsCount()

* Fix encodings and move flag bit test out of the loop
2018-07-14 13:13:02 -03:00
Merry
1e35b99f74 AInstEmitSimdCvt: Half-precision to single-precision conversion (#235) 2018-07-12 15:51:02 -03:00
gdkchan
c590201b2a Fix ZIP/UZP/TRN instructions when Rd == Rn || Rd == Rm (#239) 2018-07-09 22:48:28 -03:00
gdkchan
0daf70a1d9 Query multiple pages at once with GetWriteWatch (#222)
* Query multiple pages at once with GetWriteWatch

* Allow multiple buffer types to share the same page, aways use the physical address as cache key

* Remove a variable that is no longer needed
2018-07-08 16:55:15 -03:00
Merry
a47b96571e ChocolArm64: More accurate implementation of Frecpe & Frecps (#228)
* ChocolArm64: More accurate implementation of Frecpe

* ChocolArm64: Handle infinities and zeros in Frecps
2018-07-08 16:54:47 -03:00
Merry
666e7e2e4c ASoftFloat: Fix InvSqrtEstimate for negative values (#233) 2018-07-08 12:41:46 -03:00
gdkchan
8aee846940 Remove broken adds/cmn with condition check optimization (#218) 2018-07-03 21:54:05 -03:00
gdkchan
392c5b7d98 Add SMAXP, SMINP, UMAX, UMAXP, UMIN and UMINP cpu instructions (#200) 2018-07-03 03:31:48 -03:00
LDj3SNuD
8d7582a918 Add Rbit_V instruction. Add 8 tests (Rbit_V; Rev16_V, Rev32_V, Rev64_V). Improve CountSetBits8() algorithm. (#212)
* Update AOpCodeTable.cs

* Update AInstEmitSimdArithmetic.cs

* Update AInstEmitSimdLogical.cs

* Update AVectorHelper.cs

* Update ASoftFallback.cs

* Update Instructions.cs

* Update CpuTestSimd.cs

* Update CpuTestSimdReg.cs

* Improve CountSetBits8() algorithm.

* Improve CountSetBits8() algorithm.
2018-07-03 03:31:16 -03:00
Thomas Guillemard
051740f344 Add linux-x64 to RID property to make tests works on linux (#205) 2018-06-30 12:43:04 -03:00
LDj3SNuD
ac5c1e5107 Add Saba_V, Sabal_V, Sabd_V, Sabdl_V, Uaba_V, Uabal_V; Update Uabd_V, Uabdl_V. Add 16 tests. (#204)
* Update AOpCodeTable.cs

* Update AInstEmitSimdArithmetic.cs

* Update AInstEmitSimdHelper.cs

* Update Instructions.cs

* Update CpuTest.cs

* Update CpuTestSimd.cs

* Update CpuTestSimdReg.cs
2018-06-30 12:40:41 -03:00
gdkchan
7e59d1b639 Add Sse2 fallback to Vector{Extract|Insert}Single methods on the CPU (#193) 2018-06-28 20:52:32 -03:00
gdkchan
f58651d009 Add support for the FMLA (by element/scalar) instruction (#187)
* Add support for the FMLA (by element/scalar) instruction

* Fix encoding
2018-06-28 20:51:38 -03:00
gdkchan
a12f31867c Implement SvcGetThreadContext3 2018-06-26 01:10:15 -03:00
LDj3SNuD
31871077e2 Add Sqxtun_S, Sqxtun_V with 3 tests. (#188)
* Update AInstEmitSimdArithmetic.cs

* Update Instructions.cs

* Update CpuTestSimd.cs
2018-06-25 23:36:20 -03:00
LDj3SNuD
86aae79b9d Add Sse Opt. for Cmeq_V_2D, Cmgt_V_2D (Reg). Add Sse Opt. for Crc32cb, Crc32ch, Crc32cw, Crc32cx. Add 10 simple tests for Fcmgt, Fcmge, Fcmeq, Fcmle, Fcmlt (S, V) (Reg, Zero). Add 2 Cnt_V tests. (#183)
* Add files via upload

* Add files via upload

* Add files via upload

* CPE

* Add EmitSse42Crc32()

* Update CpuTestSimdCmp.cs

* Update Pseudocode.cs

* Update Instructions.cs

* Update CpuTestSimd.cs

* Update Instructions.cs
2018-06-25 22:32:29 -03:00
gdkchan
216bcd7a65 Add REV16/32 (vector) instructions and fix REV64 2018-06-25 18:40:55 -03:00
Rygnus
3f81e1c795 Add opcodes SQXTUN_S and SQXTUN_V (#184)
* Add SQXTUN_S and SQXTUN_V

Part 1/2 of commit

* Add SQXTUN_S and SQXTUN_V (2/2)

Part 2/2 of commit
2018-06-25 14:23:46 -03:00
gdkchan
c9813159d1 Small OpenGL Renderer refactoring (#177)
* Call OpenGL functions directly, remove the pfifo thread, some refactoring

* Fix PerformanceStatistics calculating the wrong host fps, remove wait event on PFIFO as this wasn't exactly was causing the freezes (may replace with an exception later)

* Organized the Gpu folder a bit more, renamed a few things, address PR feedback

* Make PerformanceStatistics thread safe

* Remove unused constant

* Use unlimited update rate for better pref
2018-06-23 21:39:25 -03:00
gdkchan
f6ff678834 Fix some thread sync issues (#172)
* Fix some thread sync issues

* Remove some debug stuff

* Ensure that writes to the mutex address clears the exclusive monitor
2018-06-21 23:05:42 -03:00
riperiperi
32900cc223 Rework signed multiplication. Fixed an edge case and passes all tests. (#174) 2018-06-20 10:45:20 -03:00
LDj3SNuD
7084bf58a4 Add Cmeq_S, Cmge_S, Cmgt_S, Cmhi_S, Cmhs_S, Cmle_S, Cmlt_S (Reg, Zero) & Cmtst_S compare instructions. Add 22 compare tests (Scalar, Vector). Add Eor_V, Not_V tests. (#171)
* Add files via upload

* Add files via upload

* Delete CpuTestScalar.cs

* Update CpuTestSimdArithmetic.cs
2018-06-18 14:55:26 -03:00
gdkchan
ed80772500 Add the FADDP (scalar) instruction 2018-06-18 00:41:28 -03:00
riperiperi
05ef572474 Faster soft implementation of smulh and umulh (#134)
* Faster soft implementation of smulh and umulh

* smulh: Fixed mul with 0 acting like it had a negative result.

* Use compliment for negative smulh result.
2018-06-13 10:55:45 -03:00
Lordmau5
d99c39b448 Implement Fabs_V (#146) 2018-06-12 09:29:16 -03:00
gdkchan
231539a9e8 Move WriteBytes to AMemory, implement it with a Marshal copy like ReadBytes, fix regression on address range checking 2018-06-09 13:05:41 -03:00
gdkchan
0e8fd39636 Small cleanup in AMemory and removed some unused usings 2018-06-08 23:54:50 -03:00
gdkchan
1743dde334 Do not inline the scalar vector load methods as a workaround to a .net JIT bug 2018-06-08 23:49:53 -03:00
gdkchan
eafe47fee0 Texture/Vertex/Index data cache (#132)
* Initial implementation of the texture cache

* Cache vertex and index data aswell, some cleanup

* Improve handling of the cache by storing cached ranges on a list for each page

* Delete old data from the caches automatically, ensure that the cache is cleaned when the mapping/size changes, and some general cleanup
2018-06-08 21:15:56 -03:00
riperiperi
81b59077f8 ReadBytes function in AMemory, with cleaner range check. (#136) 2018-06-08 21:15:02 -03:00
gdkchan
f1027d5511 Force inline some of the vector read/write methods 2018-06-04 16:11:11 -03:00
gdkchan
65f781ae7b Fix mistake on astc conversion, make some static methods that shouldn't be public private, remove old commmented out code 2018-06-02 11:44:52 -03:00
gdkchan
7869b7e257 Added support for more shader instructions and texture formats, fix swapped channels in RGB565 and RGBA5551? texture formats, allow zero values on blending registers, initial work to build CFG on the shader decoder, update the BRA instruction to work with it (WIP) 2018-05-29 20:37:10 -03:00
gdkchan
09b194aaf0 Initial work to support AArch32 with a interpreter, plus nvmm stubs (not used for now) 2018-05-26 17:50:47 -03:00
gdkchan
d29632d7de Fix wrong type on CMTST instruction 2018-05-23 12:57:28 -03:00
gdkchan
e54a0ff9c6 Remove some calls generated on the CPU for inexistent intrinsic methods 2018-05-23 00:27:48 -03:00
gdkchan
173c3e616d Add scalar variants of FCVTZS/FCVTZU, fix a issue on Ryushader 2018-05-18 14:44:49 -03:00
gdkchan
1aa96453ef Add intrinsics support (#121)
* Initial intrinsics support

* Update tests to work with the new Vector128 type and intrinsics

* Drop SSE4.1 requirement

* Fix copy-paste mistake
2018-05-11 20:10:27 -03:00
gdkchan
428360c5ac NvServices refactoring (#120)
* Initial implementation of NvMap/NvHostCtrl

* More work on NvHostCtrl

* Refactoring of nvservices, move GPU Vmm, make Vmm per-process, refactor most gpu devices, move Gpu to Core, fix CbBind

* Implement GetGpuTime, support CancelSynchronization, fix issue on InsertWaitingMutex, proper double buffering support (again, not working properly for commercial games, only hb)

* Try to fix perf regression reading/writing textures, moved syncpts and events to a UserCtx class, delete global state when the process exits, other minor tweaks

* Remove now unused code, add comment about probably wrong result codes
2018-05-07 15:53:23 -03:00