forked from suyu/suyu
suyu/src/common
Lioncash 0d8ef2d3b9 common/swap: Improve codegen of the default swap fallbacks
Uses arithmetic that compilers can more readily recognize as a byte swap
and optimize. Rather than shifting the halves of the value, swapping each
half, and combining the results, we can swap the bytes in place.
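
To make the difference concrete, here is a minimal sketch of the two styles. The function names (swap32_halves, swap32_inplace, and so on) are hypothetical and chosen for illustration; the mask-and-shift form is an assumption about what "in place" means here, and the actual code in swap.h may differ in detail.

```cpp
#include <cstdint>

// Halves-based style: byte-swap each half, then shift and combine.
constexpr std::uint16_t swap16(std::uint16_t v) {
    return static_cast<std::uint16_t>((v >> 8) | (v << 8));
}

constexpr std::uint32_t swap32_halves(std::uint32_t v) {
    return (static_cast<std::uint32_t>(swap16(static_cast<std::uint16_t>(v))) << 16) |
           swap16(static_cast<std::uint16_t>(v >> 16));
}

constexpr std::uint64_t swap64_halves(std::uint64_t v) {
    return (static_cast<std::uint64_t>(swap32_halves(static_cast<std::uint32_t>(v))) << 32) |
           swap32_halves(static_cast<std::uint32_t>(v >> 32));
}

// In-place style: mask each byte and shift it straight to its final position.
constexpr std::uint32_t swap32_inplace(std::uint32_t v) {
    return ((v & 0xFF000000U) >> 24) | ((v & 0x00FF0000U) >> 8) |
           ((v & 0x0000FF00U) << 8) | ((v & 0x000000FFU) << 24);
}

constexpr std::uint64_t swap64_inplace(std::uint64_t v) {
    return ((v & 0xFF00000000000000ULL) >> 56) | ((v & 0x00FF000000000000ULL) >> 40) |
           ((v & 0x0000FF0000000000ULL) >> 24) | ((v & 0x000000FF00000000ULL) >> 8) |
           ((v & 0x00000000FF000000ULL) << 8) | ((v & 0x0000000000FF0000ULL) << 24) |
           ((v & 0x000000000000FF00ULL) << 40) | ((v & 0x00000000000000FFULL) << 56);
}

// Both styles compute the same result; only the generated code differs.
static_assert(swap32_halves(0x11223344U) == swap32_inplace(0x11223344U), "swap32 mismatch");
static_assert(swap64_halves(0x0102030405060708ULL) == swap64_inplace(0x0102030405060708ULL),
              "swap64 mismatch");
```

Both forms return the same values, so the choice only affects how easily an optimizer can pattern-match the expression to a single bswap instruction.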

e.g. for the original swap32 code on x86-64, clang 8.0 would generate:

    mov     ecx, edi
    rol     cx, 8
    shl     ecx, 16
    shr     edi, 16
    rol     di, 8
    movzx   eax, di
    or      eax, ecx
    ret

while GCC 8.3 would generate the ideal:

    mov     eax, edi
    bswap   eax
    ret

With the new code, both generate the same optimal output.

MSVC used to generate the following with the old code:

    mov     eax, ecx
    rol     cx, 8
    shr     eax, 16
    rol     ax, 8
    movzx   ecx, cx
    movzx   eax, ax
    shl     ecx, 16
    or      eax, ecx
    ret     0

With the new code, MSVC generates a slightly different but equally optimal
result compared to clang/GCC:

    bswap   ecx
    mov     eax, ecx
    ret     0

====

In the swap64 case, for the original code, clang 8.0 would generate:

    mov     eax, edi
    bswap   eax
    shl     rax, 32
    shr     rdi, 32
    bswap   edi
    or      rax, rdi
    ret

(almost there, but still missing the mark)

while, again, GCC 8.3 would generate the ideal:

    mov     rax, rdi
    bswap   rax
    ret

With the new code, clang generates the optimal sequence for this fallback
as well.

This is a case where MSVC unfortunately falls short: even with the new
code, it still generates a doozy of an output:

    mov     r8, rcx
    mov     r9, rcx
    mov     rax, 71776119061217280
    mov     rdx, r8
    and     r9, rax
    and     edx, 65280
    mov     rax, rcx
    shr     rax, 16
    or      r9, rax
    mov     rax, rcx
    shr     r9, 16
    mov     rcx, 280375465082880
    and     rax, rcx
    mov     rcx, 1095216660480
    or      r9, rax
    mov     rax, r8
    and     rax, rcx
    shr     r9, 16
    or      r9, rax
    mov     rcx, r8
    mov     rax, r8
    shr     r9, 8
    shl     rax, 16
    and     ecx, 16711680
    or      rdx, rax
    mov     eax, -16777216
    and     rax, r8
    shl     rdx, 16
    or      rdx, rcx
    shl     rdx, 16
    or      rax, rdx
    shl     rax, 8
    or      rax, r9
    ret     0

which is pretty unfortunate.
2019-04-12 00:07:39 -04:00
logging general: Use deducation guides for std::lock_guard and std::unique_lock 2019-04-01 12:53:47 -04:00
x64 common: Remove dependency on xbyak 2018-11-21 03:43:41 -05:00
alignment.h common: Add function for checking word alignment to alignment.h 2018-10-18 12:58:27 -04:00
assert.h Permit a Null Shader in case of a bad host_ptr. 2019-04-07 07:52:01 -04:00
bit_field.h common/bit_util: Fix bad merge duplicating the copy constructor 2019-03-20 23:48:37 -04:00
bit_util.h common/bit_util: Make CountLeading/CountTrailing functions have the same return types 2019-04-05 15:29:40 -04:00
cityhash.cpp Port #4182 from Citra: "Prefix all size_t with std::" 2018-09-15 15:21:06 +02:00
cityhash.h Port #4182 from Citra: "Prefix all size_t with std::" 2018-09-15 15:21:06 +02:00
CMakeLists.txt common/zstd_compression: Add Zstandard wrapper 2019-03-29 18:22:08 +01:00
color.h common/vector_math: Move Vec[x] types into the Common namespace 2019-02-26 22:38:36 -05:00
common_funcs.h Port #3732 from Citra: "common: Fix compilation on ARM" 2018-07-29 15:51:31 +02:00
common_paths.h file_util: Add shader directory 2019-02-06 22:20:57 -03:00
common_types.h gpu: Move GPUVAddr definition to common_types. 2019-03-20 22:36:02 -04:00
detached_tasks.cpp general: Use deducation guides for std::lock_guard and std::unique_lock 2019-04-01 12:53:47 -04:00
detached_tasks.h Review comments - part 5 2018-10-02 16:04:10 +02:00
file_util.cpp file_util: Add shader directory 2019-02-06 22:20:57 -03:00
file_util.h file_util: Add shader directory 2019-02-06 22:20:57 -03:00
hash.h Port #4182 from Citra: "Prefix all size_t with std::" 2018-09-15 15:21:06 +02:00
hex_util.cpp ips_layer: Deduplicate resource usage 2018-10-04 11:34:36 -04:00
hex_util.h ips_layer: Deduplicate resource usage 2018-10-04 11:34:36 -04:00
lz4_compression.cpp common/lz4_compression: Remove #pragma once directive from the cpp file 2019-04-03 22:07:04 -04:00
lz4_compression.h Addressed feedback 2019-03-29 18:12:42 +01:00
math_util.h common/math_util: Move contents into the Common namespace 2019-02-27 03:38:39 -05:00
memory_hook.cpp core: Move PageTable struct into Common. 2019-03-16 22:05:40 -04:00
memory_hook.h core: Move PageTable struct into Common. 2019-03-16 22:05:40 -04:00
microprofile.cpp Integrate the MicroProfile profiling library 2015-08-24 22:16:28 -03:00
microprofile.h Sources: Run clang-format on everything. 2016-09-18 09:38:01 +09:00
microprofileui.h Common: Remove section measurement from profiler (#1731) 2016-04-29 00:07:10 -07:00
misc.cpp Port #4182 from Citra: "Prefix all size_t with std::" 2018-09-15 15:21:06 +02:00
multi_level_queue.h common/multi_level_queue: Silence truncation warning in iterator operator++ 2019-04-05 15:35:46 -04:00
page_table.cpp gpu: Rewrite virtual memory manager using PageTable. 2019-03-20 22:36:02 -04:00
page_table.h gpu: Rewrite virtual memory manager using PageTable. 2019-03-20 22:36:02 -04:00
param_package.cpp citra_qt/configuration: misc input tab improvements 2018-10-06 15:43:49 +02:00
param_package.h citra_qt/configuration: misc input tab improvements 2018-10-06 15:43:49 +02:00
quaternion.h common/vector_math: Move Vec[x] types into the Common namespace 2019-02-26 22:38:36 -05:00
ring_buffer.h ring_buffer: Use std::atomic_size_t in a static assert 2018-09-18 23:36:04 -04:00
scm_rev.cpp.in gl_shader_disk_cache: Invalidate shader cache changes with CMake hash 2019-02-06 22:20:57 -03:00
scm_rev.h gl_shader_disk_cache: Invalidate shader cache changes with CMake hash 2019-02-06 22:20:57 -03:00
scope_exit.h Format: Run the new clang format on everything 2018-01-20 16:45:11 -07:00
string_util.cpp am: Deglobalize software keyboard applet 2018-11-18 10:53:47 -05:00
string_util.h am: Deglobalize software keyboard applet 2018-11-18 10:53:47 -05:00
swap.h common/swap: Improve codegen of the default swap fallbacks 2019-04-12 00:07:39 -04:00
telemetry.cpp common/telemetry: Migrate core-independent info gathering to common 2018-08-14 18:57:46 -04:00
telemetry.h compatdb: Use a seperate endpoint for testcase submission 2018-10-28 13:23:02 +01:00
thread.cpp common/thread: Remove unused functions 2019-03-29 13:26:21 -04:00
thread.h general: Use deducation guides for std::lock_guard and std::unique_lock 2019-04-01 12:53:47 -04:00
thread_queue_list.h common/thread_queue_list: Remove unnecessary dependency on boost 2019-03-16 05:01:39 -04:00
threadsafe_queue.h general: Use deducation guides for std::lock_guard and std::unique_lock 2019-04-01 12:53:47 -04:00
timer.cpp Port #3972 from Citra: "common/timer: use std::chrono, avoid platform-dependent code" 2018-07-29 14:58:30 +02:00
timer.h Port #3972 from Citra: "common/timer: use std::chrono, avoid platform-dependent code" 2018-07-29 14:58:30 +02:00
uint128.cpp common/uint128: Add missing top-file source text 2019-03-20 22:38:25 -04:00
uint128.h common/uint128: Add missing header guard 2019-03-20 22:39:00 -04:00
vector_math.h common/vector_math: Move Vec[x] types into the Common namespace 2019-02-26 22:38:36 -05:00
web_result.h web_backend: Make Client use the PImpl idiom 2018-10-10 22:29:35 -04:00
zstd_compression.cpp common/zstd_compression: simplify decompression interface 2019-03-29 18:22:08 +01:00
zstd_compression.h common/zstd_compression: simplify decompression interface 2019-03-29 18:22:08 +01:00