dynarmic

Author	SHA1	Message	Date
merry	b8dd1c7510	emit_x64_floating_point: Correct dead-code warning in MSVC 2019	2022-02-12 22:07:26 +00:00
merry	95a1ebfb97	backend/x64: Bugfix: A32 frontend also uses FPSCR.QC	2022-02-12 21:46:45 +00:00
merry	473bbd422e	test_arm_instructions: Add vmsr/vcmp/vmrs test	2022-02-12 21:43:05 +00:00
Fernando Sahmkow	a8cbfd9af4	X86_Backend: set fences correctly for memory barriers and synchronization.	2022-02-01 14:27:54 +00:00
Alexandre Bouvier	0cafcc1af9	cmake: Always build static externals	2022-01-08 14:23:34 +00:00
Mai M	1635958d06	Merge pull request #658 from liushuyu/master disassembler_thumb: fix formatting issues with fmt 8.1.x	2022-01-06 00:17:16 -05:00
liushuyu	40afbe1927	disassembler_thumb: fix formatting issues with fmt 8.1.x. fmt 8.1.0 added more formatting checks, and Cond can't be formatted directly now	2022-01-05 21:49:51 -07:00
Wunkolo	ad5465d6ce	constant_pool: Use `tsl::robin_map` rather than `unordered_map` Finding a much more drastic improvement with `robin_map`. `map`: ``` [master] % hyperfine -r 100 "./dynarmic_tests --durations yes" Benchmark 1: ./dynarmic_tests --durations yes Time (mean ± σ): 567.0 ms ± 6.9 ms [User: 513.1 ms, System: 53.2 ms] Range (min … max): 554.4 ms … 588.1 ms 100 runs ``` `unordered_map`: ``` [opt_const_pool] % hyperfine -r 100 "./dynarmic_tests --durations yes" Benchmark 1: ./dynarmic_tests --durations yes Time (mean ± σ): 561.1 ms ± 4.5 ms [User: 508.1 ms, System: 52.3 ms] Range (min … max): 552.6 ms … 574.2 ms 100 runs ``` `tsl::robin_map`: ``` [opt_const_pool] % hyperfine -r 100 "./dynarmic_tests --durations yes" Benchmark 1: ./dynarmic_tests --durations yes Time (mean ± σ): 553.5 ms ± 5.6 ms [User: 500.7 ms, System: 52.1 ms] Range (min … max): 545.7 ms … 569.3 ms 100 runs ```	2022-01-01 12:13:13 +00:00
Wunkolo	e57bb0569a	constant_pool: Convert hashtype from `tuple` to `pair`	2022-01-01 12:13:13 +00:00
Wunkolo	befc22a61e	constant_pool: Use `unordered_map` rather than `map` `map` is an ordered structure with O(log n) searches. `unordered_map` has O(1) average-time searches and O(n) in the worst case, where a bucket holds colliding hashes and has to start chaining. The unordered version should speed up our general case when looking up constants. I've added a trivial order-dependent (_(0,1) and (1,0) will return a different hash_) hash to combine a 128-bit constant into a 64-bit hash that generally will not collide, using a bit-rotate to preserve entropy.	2022-01-01 12:13:13 +00:00
Morph	28714ee75a	general: Rename files with duplicate names In MSVC, having files with identical filenames results in massive slowdowns when compiling. The approach I have taken to resolve this is renaming the identically named files in frontend/(A32, A64) to (a32, a64)_filename.cpp/h	2021-12-23 11:38:58 +00:00
Andrea Pappacoda	4dcebc1822	build(cmake): add install target This makes dynarmic installable, and also adds a CMake package config file, that allows projects to use `find_package(dynarmic)` to import the library. I know #636 adds the same thing, but while experimenting with the different install options in https://github.com/merryhime/dynarmic/pull/636#discussion_r725656034 I ended up with a working patch, so I'm proposing this as well. This implements solution 2.	2021-10-30 19:03:23 +01:00
Mai M	cce7e4ee5d	Merge pull request #651 from ameerj/fmt-cmake externals/cmake: Fix fmt target check	2021-10-12 14:33:36 -04:00
ameerj	4cfbbe3df2	externals/cmake: Fix fmt target check	2021-10-11 13:44:19 -04:00
Andrea Pappacoda	b87a889d98	build(cmake): add version and soversion to the library This adds versioning information to the built library. When building the shared library on Linux systems, a new object will be created: libdynarmic.so.5 This is really useful when talking about ABI compatibility. The variables dynarmic_VERSION and dynarmic_VERSION_MAJOR are implicitly created when calling project(dynarmic VERSION x.y.z)	2021-10-11 06:53:05 +01:00
ameerj	55bede81f8	CMake: Fix fmt target check	2021-10-11 06:52:52 +01:00
Fernando S	e4146ec3a1	x64 Interface: Allow for asynchronous invalidation (#647) * x64 Interface: Make Invalidation asynchronous. * Apply suggestions from code review	2021-10-05 15:06:41 +01:00
Wunkolo	5e7d2afe0f	IR: Introduce `VectorReduceAdd{8,16,32,64}` opcode Adds all elements of vector and puts the result into the lowest element. Accelerates the `addv` instruction into a vectorized implementation rather than a serial one.	2021-09-27 19:54:11 +01:00
Wunkolo	69b831d7d2	tests: Add {S,V}ADD{V,P} tests These are the instructions emitted for each variant of the `vaddv{q}_{s}{8,16,32,64}` intrinsic.	2021-09-27 19:54:11 +01:00
Marshall Mohror	0b8fd755d8	Fix `signal_stack_size` for glibc 2.34 `SIGSTKSZ` is now defined as `sysconf(_SC_SIGSTKSZ)` which is not constexpr, and returns a long which throws off the `std::max` template deduction.	2021-09-22 20:38:11 +01:00
Ben	6ce8bfaf32	Add API function to retrieve disassembly as vector of strings (#644) Co-authored-by: ben <Avuxo@users.noreply.github.com>	2021-09-16 16:45:20 -04:00
Macchiarch	f88aa570a3	cpu_info: remove tSSE4a and tSSE5 (#643) tSSE4a and tSSE5 have been removed from xbyak	2021-09-06 20:49:10 +01:00
merry	1697902948	Merge pull request #641 from abouvier/unbundle CMakeLists: Add options to unbundle most external libraries	2021-08-25 07:56:12 +01:00
Alexandre Bouvier	352898e88b	cmake: Add options to unbundle Zydis	2021-08-24 12:28:44 +02:00
Merry	517e35f845	decoder_detail: Avoid MSVC ICE MSVC has an internal compiler error when assume is present in this constexpr function	2021-08-15 19:32:05 +01:00
Merry	2e4f99ae3d	CMakeLists: Expose DYNARMIC_IGNORE_ASSERTS option	2021-08-15 16:09:37 +01:00
Merry	3b4459d112	CMakeLists: Enable C++20 support	2021-08-15 15:17:01 +01:00
Merry	4988d9fab3	disassembler_arm: Fix format strings for vfp_VMOV_from_i{8,16}	2021-08-15 15:16:53 +01:00
Merry	615ce8c7c5	IR: Remove A32 IR instructions Get{N,Z,V}Flag	2021-08-12 13:06:15 +01:00
Alexandre Bouvier	04b1c78166	cmake: Add checks for projects using dynarmic as subproject	2021-08-10 16:16:02 +02:00
Alexandre Bouvier	33b89cca08	cmake: Add options to unbundle some externals	2021-08-10 16:05:38 +02:00
Merry	72f8abe11d	externals: Update mp to latest Merge commit '163b59390c32745f95838b121be3ef5e2cf08e8c'	2021-08-10 12:30:46 +01:00
Merry	163b59390c	Squashed 'externals/mp/' changes from 649fde1e..b50053ce b50053ce function_info: Implement equivalent_function_type_with_class git-subtree-dir: externals/mp git-subtree-split: b50053cef50385419c59fb3aebb78974547318bc	2021-08-10 12:30:46 +01:00
Merry	2bc86209bd	catch: Correct include directory	2021-08-08 12:52:55 +01:00
Wunkolo	1e94acff66	ir: Add VectorBroadcastElement{Lower} IR instruction The lane-splatting variant of `FMUL` and `FMLA` is very common in instruction streams when implementing things like matrix multiplication. When used, they are used very densely. https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/coding-for-neon---part-3-matrix-multiplication The way this is currently implemented is by grabbing the particular lane into a general purpose register and then broadcasting it into a simd register through `VectorGetElement` and `VectorBroadcast`. ```cpp const IR::U128 operand2 = v.ir.VectorBroadcast(esize, v.ir.VectorGetElement(esize, v.V(idxdsize, Vm), index)); ``` What could be done instead is to keep it within the vector-register and use a permute/shuffle to "splat" the particular lane across all other lanes, removing the GPR-round-trip. This is implemented as the new IR instruction `VectorBroadcastElement`: ```cpp const IR::U128 operand2 = v.ir.VectorBroadcastElement(esize, v.V(idxdsize, Vm), index); ```	2021-08-07 23:03:57 +01:00
Wunkolo	46b8cfabc0	bit_util: Protect Replicate from automatic up-casting Recursive calls to `Replicate` beyond the first call might cause an unintentional up-casting to an `int` type due to `|` and `<<` operations on types such as `uint8_t` and `uint16_t`. This makes sure calls such as `Replicate<u8>` stay as the `u8` type throughout.	2021-08-07 23:03:57 +01:00
Wunkolo	f171ce7859	tests: Add FMLA(lane) test Math operations such as matrix multiplication utilize these particular instructions enough that there should be some unit tests for these in particular. The lane-splatting forms of the FMUL and FMLA instructions are of particular interest and I've found them to be very common in retail game binaries such as Pokemon Sword. https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/coding-for-neon---part-3-matrix-multiplication I'm primarily adding this unit test so that I can ensure compatibility while I tune and optimize them.	2021-08-07 23:03:57 +01:00
Merry	d41bc492fe	{a32,a64}_jitstate: Remove unnecessary headers	2021-08-07 19:35:33 +01:00
Merry	07b5734fb0	xbyak: Correct xbyak include directory xbyak is intended to be installed in /usr/local/include/xbyak. Since we desire not to install xbyak before using it, we copy the headers to the appropriate directory structure and use that instead	2021-08-07 15:13:49 +01:00
Merry	31cefb22a0	fuzz_with_unicorn: Correct printing of vectors	2021-08-06 15:29:43 +01:00
Merry	59fb568b27	tests: Use Zydis for disassembly	2021-08-06 15:29:43 +01:00
Wunkolo	f33bd69ec2	emit_x64_vector_floating_point: AVX512 implementation of EmitFPVectorToFixed AVX512 introduces the _unsigned_ variant of float-to-integer conversion functions via `vcvttp{sd}2u{dq}q`. In the case that a value is not representable as an unsigned integer, it will result in `0xFFFFF...` which can be utilized to get "free" saturation when the floating point value exceeds the unsigned range, after masking away negative values. https://www.felixcloutier.com/x86/vcvttps2udq https://www.felixcloutier.com/x86/vcvttpd2uqq This PR also speeds up the _signed_ conversion function for fp64->int64 https://www.felixcloutier.com/x86/vcvttpd2qq	2021-07-17 22:13:11 +01:00
SachinVin	048da372e9	block_of_code.cpp: remove redundant `align()`	2021-07-17 22:12:31 +01:00
Kappamalone	6ca6461450	docs/Design: Fix links (#633)	2021-07-11 19:22:46 +01:00
Merry	65309eb6bc	gitignore: Update mig path	2021-07-11 11:38:43 +01:00
Wunkolo	5971361160	IR: Add AndNot{32,64} IR instruction Also includes BMI1-acceleration for x64, when available	2021-07-02 22:27:29 +01:00
Wunkolo	49d00634f9	IR: Add VectorAndNot IR instruction And(a, Not(b)) is a common enough operation that this can be fused into a single `AndNot` operation. On x64 this is also a single `pandn` instruction rather than two.	2021-07-02 22:27:29 +01:00
Wunkolo	253713baf1	opcodes.inc: Disable clang format	2021-07-02 22:27:29 +01:00
Wunkolo	1fc96fd0c2	emit_x64{_vector}_floating_point: Unsafe AVX512 implementation of Emit{RSqrt,Recip}Estimate This implementation exists within the unsafe optimization paths and utilizes the 14-bit-precision `vrsqrt14` and `vrcp14p` instructions provided by AVX512F+VL. These are _more_ accurate than the fallback path and the current `rsqrt`-based unsafe code-path but still fall in line with what is expected of the `Unsafe_ReducedErrorFP` optimization flag. Having AVX512 available will mean these functions have 14 bits of precision. Not having AVX512 available will mean these functions have 11 bits of precision.	2021-06-27 11:18:58 +01:00
MerryMage	ea02a7d05d	conditional_state: Break from translation when invalid NV instruction is hit	2021-06-25 22:09:39 +01:00
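
Commit befc22a61e describes combining the two 64-bit halves of a 128-bit constant into one order-dependent 64-bit hash, using a bit-rotate. A minimal sketch of that idea follows. It is not dynarmic's actual constant_pool code: the names `ConstantKey`, `RotateLeft`, `ConstantKeyHash` and `ConstantSlots` are hypothetical, and the rotate amount of 1 is an assumption chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>

// Hypothetical names throughout; a sketch, not dynarmic's constant_pool code.
// A 128-bit constant is keyed by its (lower, upper) 64-bit halves.
using ConstantKey = std::pair<std::uint64_t, std::uint64_t>;

// Rotate a 64-bit value left by `amount` bits (amount must be in 1..63).
constexpr std::uint64_t RotateLeft(std::uint64_t value, unsigned amount) {
    return (value << amount) | (value >> (64 - amount));
}

// Order-dependent combine: rotating one half before the XOR means that
// (a, b) and (b, a) generally hash differently, and no bits are discarded.
struct ConstantKeyHash {
    std::size_t operator()(const ConstantKey& key) const noexcept {
        return static_cast<std::size_t>(key.first ^ RotateLeft(key.second, 1));
    }
};

// Maps each distinct 128-bit constant to a slot index, reusing the slot
// when the same constant is requested again.
class ConstantSlots {
public:
    std::size_t GetSlot(std::uint64_t lower, std::uint64_t upper) {
        // emplace leaves an existing entry untouched and returns it.
        return slots.emplace(ConstantKey{lower, upper}, slots.size()).first->second;
    }

private:
    std::unordered_map<ConstantKey, std::size_t, ConstantKeyHash> slots;
};
```

A plain XOR of the two halves would hash (0,1) and (1,0) identically. The rotate breaks that symmetry at the cost of one extra instruction.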


2981 commits