The check to flip faces when viewports are negative were a left over
from the old OpenGL code. This is not required on Vulkan where we have
negative viewports.
Hardware S2R special registers match gl_Thread*MaskNV. We can trivially
implement these using Nvidia's extension on OpenGL or naively stubbing
them with the ARB instructions to match. This might cause issues if the
host device warp size doesn't match Nvidia's. That said, this is
unlikely on proper shaders.
Refer to the attached url for more documentation about these flags.
https://www.khronos.org/registry/OpenGL/extensions/NV/NV_shader_thread_group.txt
Some operations like atomicMin were ignored because they returned were
being stored to RZ. This operations have a side effect and it was being
ignored.
Instead of using boost::icl::interval_map for caching, use
boost::intrusive::set. interval_map is intended as a container where the
keys can overlap with one another; we don't need this for caching
buffers and a std::set-like data structure that allows us to search with
lower_bound is enough.
Constant attributes (in OpenGL known disabled attributes) are not
supported on Vulkan, even with extensions. To emulate this behavior we
return zero on reads from disabled vertex attributes in shader code.
This has no caching cost because attribute formats are not dynamic state
on Vulkan and we have to store it in the pipeline cache anyway.
- Fixes Animal Crossing: New Horizons terrain borders
This was a left over from OpenGL when disabled buffers where not properly
emulated. We no longer have to assert this as it is checked in vertex
buffer initialization.
This should fix grass interactions on Breath of the Wild on Vulkan.
It is currently untested against validation layers.
Nvidia's Windows 443.09 beta driver or Linux 440.66.12 is required for
now.
Reduces some header churn and reduces rebuilds when some header
internals change.
While we're at it we can also resolve a missing include in buffer_cache.
Xenoblade 2 invokes a draw call with zero vertices.
This is likely due to indirect drawing (glDrawArraysIndirect).
This causes a crash in the staging buffer pool when trying to create a
buffer with a size of zero. To workaround this, skip index buffer setup
entirely when the number of indices is zero.
Drop MemoryBarrier from the buffer cache and use Maxwell3D's register
WaitForIdle.
To implement this on OpenGL we just call glMemoryBarrier with the
necessary bits.
Vulkan lacks this synchronization primitive, so we set an event and
immediately wait for it. This is not a pretty solution, but it's what
Vulkan can do without submitting the current command buffer to the queue
(which ends up being more expensive on the CPU).
Using deko3d as reference:
4e47ba0013/source/maxwell/gpu_3d_state.cpp (L42)
We were using bits 3 and 4 to determine depth clamping, but these are
the same both enabled and disabled:
state->depthClampEnable ? 0x101A : 0x181D
The same happens on Nvidia's OpenGL driver, where they do something like
this (default capabilities, GL 4.5 compatibility):
(state & DEPTH_CLAMP) != 0 ? 0x201a : 0x281c
There's always a difference between the first bits in this register, but
bit 11 is consistently disabled on both deko3d/NVN and OpenGL. This
commit changes yuzu's behaviour to use bit 11 to determine depth
clamping.
- Fixes depth issues on Super Mario Odyssey's intro.
This reverts commit 94b0e2e5da.
preserve_contents proved to be a meaningful optimization. This commit
reintroduces it but properly implemented on OpenGL.
We have to make sure the clear removes all the previous contents of the
image.
It's not currently implemented on Vulkan because we can do smart things
there that's preferred to be introduced in a separate commit.
Deduplicate code shared between vk_pipeline_cache and gl_shader_cache as
well as shader decoder code.
While we are at it, fix a bug in gl_shader_cache where compute shaders
had an start offset of a stage shader.
Sometimes for unknown reasons NVN games can bind a render target format
of 0. This may be a yuzu bug.
With the commits before this the formats were specified without being
"packed", assuming all formats and texceptions will be written like in
the color_attachments vector.
To address this issue, iterate all render targets and pack them as they
are valid. This way they will match color_attachments.
- Fixes validation errors and graphical issues on Breath of the Wild.
The intention behind this was to assign a float to from an uint32_t, but
it was unintentionally being copied directly into the std::optional.
Copy to a temporary and assign that temporary to std::optional. This can
be replaced with std::bit_cast<float> once we are in C++20.
All drivers (even Intel) seem to have a device local memory type that is
not host visible. Remove this flag so all devices follow the same path.
This fixes a crash when trying to map to host device local memory on
integrated devices.
Introduce a default buffer getter that lazily constructs an empty
buffer. This is intended to match OpenGL's buffer 0.
Use this for disabled vertex and uniform buffers.
While we are at it, include vertex buffer usages for staging buffers to
silence validation errors.
On NVN buffers can be enabled but have no size. According to deko3d and
the behavior we see in Animal Crossing: New Horizons these buffers get
the special address of 0x1000 and limit themselves to 0xfff.
Implement buffers without a size by binding a null buffer to OpenGL
without a side.
1d1930beea/source/maxwell/gpu_3d_vbo.cpp (L62-L63)
Render.Vulkan <Error> video_core/renderer_vulkan/renderer_vulkan.cpp:CreateInstance:131: Presentation not supported on this platform
Render.Vulkan <Error> video_core/renderer_vulkan/renderer_vulkan.cpp:CreateSurface:378: Presentation not supported on this platform
Core <Critical> core/core.cpp:Load:199: Failed to initialize system (Error 5)!
Sort discrete GPUs over the rest, Nvidia over AMD, AMD over Intel, Intel
over the rest. This gives us a somewhat consistent order when Optimus
is removed (renderdoc does this when it's attached).
This can break the configuration of users with an Intel GPU that
manually remove Optimus on yuzu. That said, it's a very unlikely to
happen.
Pad FixedPipelineState's size to 384 bytes to be a multiple of 16.
Compare the whole struct with std::memcmp and hash with CityHash. Using
CityHash instead of a naive hash should reduce the number of collisions.
Improve used type traits to ensure this operation is safe.
With these changes the improvements to the hashable pipeline state are:
Optimized structure
Hash: 89 ns
Comparison: 103 ns
Construction*: 164 ns
Struct size: 384 bytes
Original structure
Hash: 148 ns
Equal: 174 ns
Construction*: 281 ns
Size: 1384 bytes
* Attribute state initialization is not measured
These measures are averages taken with std::chrono::high_accuracy_clock
on MSVC shipped on Visual Studio 16.6.0 Preview 2.1.
Nvidia recently introduced a new memory type for data streaming
(awesome!), but yuzu was assuming that all heaps had enough memory
for the assumed stream buffer size (256 MiB).
This worked fine on AMD but Nvidia's new memory heap was smaller than
256 MiB. This commit changes this assumption and allocates a bit less
than the size of the preferred heap, with a maximum of 256 MiB (to avoid
allocating all system memory on integrated devices).
- Fixes a crash on NVIDIA 450.82.0.0
Implement indexed quads (GL_QUADS used with glDrawElements*) with a
compute pass conversion.
The compute shader converts from uint8/uint16/uint32 indices to uint32.
The format is passed through push constants to avoid having different
variants of the same shader.
- Used by Fast RMX
- Used by Xenoblade Chronicles 2 (it still has graphical due to
synchronization issues on Vulkan)
The original idea of returning pointers is that handles can be moved.
The problem is that the implementation didn't take that in mind and made
everything harder to work with. This commit drops pointer to handles and
returns the handles themselves. While it is still true that handles can
be invalidated, this way we get an old handle instead of a dangling
pointer.
This problem can be solved in the future with sparse buffers.
When the dynamic state is specified, pViewports and pScissors are
ignored, quoting the specification:
pViewports is a pointer to an array of VkViewport structures, defining
the viewport transforms. If the viewport state is dynamic, this member
is ignored.
That said, AMD's proprietary driver itself seem to read it regardless of
what the specification says.
Adds optional support for Nsight Aftermath. It is enabled through
ENABLE_NSIGHT_AFTERMATH in cmake. A path to the SDK has to be provided
by the environment variable NSIGHT_AFTERMATH_SDK.
Nsight Aftermath allows an application to generate "minidumps" of the
GPU state when a device loss happens. By analysing these on Nsight we
can know what a game was doing and why it triggered a device loss.
The dump is generated inside %APPDATA%\yuzu\log\gpucrash and this
directory is deleted every time a new instance is initialized with
Nsight enabled.
To enable it on yuzu there has a to be a driver and device capable of
running Nsight Aftermath on Vulkan. That means only Turing based GPUs
on the latest stable driver, beta drivers won't work for now.
It is manually enabled in Configuration>Debug>Enable Graphics Debugging
because when using all debugging capabilities there is a runtime cost.
preserve_contents was always true. We can't assume we don't have to
preserve clears because scissored and color masked clears exist.
This removes preserve_contents and assumes it as true at all times.