The new Nvidia drivers have a bug where the FastReplicateTo6 function produces a lookup into the REPLICATE_TO_8 table rather than the REPLICATE_TO_6 table.
This seems to be an optimization gone wrong. Combining the logic of the FastReplicate functions seems to address the bug.
This buffer was a list of EncodingData structures sorted by their bit length, with some duplication from the cpu decoder implementation.
We can take advantage of its sorted property to optimize its usage in the shader.
Thanks to wwylele for the optimization idea.
These changes should help in reducing crashes/drivers panics that may
occur due to synchronization issues between the shader completion and
later access of the decoded texture.
Per the spec, L1 is clamped to the value 0xff if it is greater than 0xff. An oversight caused us to take the maximum of L1 and 0xff, rather than the minimum.
Huge thanks to wwylele for finding this.
Co-Authored-By: Weiyi Wang <wwylele@gmail.com>
yuzu requires CMake 3.15 yet find_program was using REQUIRED, which is
only available on 3.18 and later. Instead, we check for
"<VAR>-NOTFOUND".
In addition, check for additional requirements before building libusb or
FFmpeg with autotools. Otherwise, CMake configuration will pass yet
compilation will fail.
ASTC texture decoding is currently handled by a CPU decoder for GPU's without native ASTC decoding support (most desktop GPUs). This is the cause for noticeable performance degradation in titles which use the format extensively.
This commit adds support to accelerate ASTC decoding using a compute shader on OpenGL for GPUs without native support.
OpenGL does not natively support BGR internal formats, which causes many BGR textures to render incorrectly, with Red and Blue channels swapped.
This commit aims to address this by swizzling the blue and red channels on texture copies when a BGR format is encountered.
Reimplement the buffer cache using cached bindings and page level
granularity for modification tracking. This also drops the usage of
shared pointers and virtual functions from the cache.
- Bindings are cached, allowing to skip work when the game changes few
bits between draws.
- OpenGL Assembly shaders no longer copy when a region has been modified
from the GPU to emulate constant buffers, instead GL_EXT_memory_object
is used to alias sub-buffers within the same allocation.
- OpenGL Assembly shaders stream constant buffer data using
glProgramBufferParametersIuivNV, from NV_parameter_buffer_object. In
theory this should save one hash table resolve inside the driver
compared to glBufferSubData.
- A new OpenGL stream buffer is implemented based on fences for drivers
that are not Nvidia's proprietary, due to their low performance on
partial glBufferSubData calls synchronized with 3D rendering (that
some games use a lot).
- Most optimizations are shared between APIs now, allowing Vulkan to
cache more bindings than before, skipping unnecesarry work.
This commit adds the necessary infrastructure to use Vulkan object from
OpenGL. Overall, it improves performance and fixes some bugs present on
the old cache. There are still some edge cases hit by some games that
harm performance on some vendors, this are planned to be fixed in later
commits.