Due to BindBufferRangeNV limitations and poor quality code emission from
our side, assembly shaders are currently slower than GLSL. Their build
time and feature advantages are still relevant, but they are outweighted
by their runtime performance.
Detect when a memory region has been joined several times and increase
the size of the created buffer on those instances. The buffer is assumed
to be a "stream buffer", increasing its size should stop us from
constantly recreating it and fragmenting memory.
Ports from OpenGL the optimization to skip small 3D uniform buffer
uploads. This will take advantage of the previously introduced stream
buffer.
Fixes instances where the staging buffer offset was being ignored.
This uses a ring buffer similar to OpenGL's stream buffer for small
uploads. This stops us from allocating several small buffers, reducing
memory fragmentation and cache locality.
It uses dedicated allocations when possible.
Reimplement the buffer cache using cached bindings and page level
granularity for modification tracking. This also drops the usage of
shared pointers and virtual functions from the cache.
- Bindings are cached, allowing to skip work when the game changes few
bits between draws.
- OpenGL Assembly shaders no longer copy when a region has been modified
from the GPU to emulate constant buffers, instead GL_EXT_memory_object
is used to alias sub-buffers within the same allocation.
- OpenGL Assembly shaders stream constant buffer data using
glProgramBufferParametersIuivNV, from NV_parameter_buffer_object. In
theory this should save one hash table resolve inside the driver
compared to glBufferSubData.
- A new OpenGL stream buffer is implemented based on fences for drivers
that are not Nvidia's proprietary, due to their low performance on
partial glBufferSubData calls synchronized with 3D rendering (that
some games use a lot).
- Most optimizations are shared between APIs now, allowing Vulkan to
cache more bindings than before, skipping unnecesarry work.
This commit adds the necessary infrastructure to use Vulkan object from
OpenGL. Overall, it improves performance and fixes some bugs present on
the old cache. There are still some edge cases hit by some games that
harm performance on some vendors, this are planned to be fixed in later
commits.
Workaround an issue on Nvidia where creating a Vulkan instance from an
active OpenGL thread disables threaded optimization on the driver.
This optimization is important to have good performance on Nvidia
OpenGL.
Instead of using a two step initialization to report errors, initialize
the GPU renderer and rasterizer on the constructor and report errors
through std::runtime_error.
Some games usually write memory pages currently used by the GPU, causing
rendering issues (e.g. flashing geometry and shadows on Link's
Awakening). To workaround this issue, Guest CPU writes are delayed until
the command buffer finishes processing, but the pages are updated
immediately.
The overall behavior is:
- CPU writes are cached until they are flushed, they update the page
state, but don't change the modification state. Cached writes stop
pages from being flushed, in case games have meaningful data in it.
- Command processing writes (e.g. push constants) update the page state
and are marked to the command processor as dirty. They don't remove
the state of cached writes.
This implements KScopedReservation, allowing resource limit reservations to be more HW accurate, and release upon failure without requiring too many conditionals.
* kernel: Unify result codes
Drop the usage of ERR_NAME convention in kernel for ResultName. Removed seperation between svc_results.h & errors.h as we mainly include both most of the time anyways.
* oops
* rename errors to svc_results
Fixes assertion on Bloodstained Ritual of the Night.
We would over read sometimes, this is fixed by checking if the top bit is set in the first iteration. We also lock the loop off to be only the max size of the type we can fit. Finally we changed an incorrect print of "DEBUG" to "TRACE" to reflect the proper log severity
This is a useful function in a generic context or with types that
overload unary operator&. However, primitives and pointers will never do
this, so we can opt for a more straightforward syntax.
Also renames related CMake variables to match both the Find*FFmpeg* and
variables defined within the file. Fixes odd errors produced by the old
FindFFmpeg.
Citra's FindFFmpeg is slightly modified here: adds Citra's copyright at
the beginning, renames FFmpeg_INCLUDES to FFmpeg_INCLUDE_DIR, disables a
few components in _FFmpeg_ALL_COMPONENTS, and adds the missing avutil
component to the comment above.
For Linux, instructs CMake to use the FFmpeg submodule in externals.
This is HEAVILY based on our usage of the late Unicorn. Minimal change
to MSVC as it uses the yuzu-emu/ext-windows-bin. MinGW now targets the
same ext-windows-bin libraries as MSVC for FFmpeg. Adds FFMPEG_LIBRARIES
to WIN32 and simplifies video_core/CMakeLists.txt a bit.