Removes common_sizes.h in favor of having `_KiB`, `_MiB`, `_GiB`, etc
user-literals within literals.h.
To keep the global namespace clean, users will have to use:
```
using namespace Common::Literals;
```
to access these literals.
These changes should help in reducing crashes/drivers panics that may
occur due to synchronization issues between the shader completion and
later access of the decoded texture.
The FPS counter was based on metrics in the nvdisp swapbuffers call. This metric would be accurate if the gpu thread/renderer were synchronous with the nvdisp service, but that's no longer the case.
This commit moves the frame counting responsibility onto the concrete renderers after their frame draw calls. Resulting in more meaningful metrics.
The displayed FPS is now made up of the average framerate between the previous and most recent update, in order to avoid distracting FPS counter updates when framerate is oscillating between close values.
The status bar update frequency was also changed from 2 seconds to 500ms.
In order to force the BGRA8 conversion on Nvidia using OpenGL, we need to forbid texture copies and views with other formats.
This commit also adds a boolean relating to this, as this needs to be done only for the OpenGL api, Vulkan must remain unchanged.
Load the current tick to a local variable, moving it out of an atomic
and allowing us to compare the value without going through a pointer
each time. This should make the loop more optimizable.
Fix a tragic off-by-one condition that causes Vulkan's stream buffer to
think it's always full, using fallback memory. The OpenGL was also
affected by this bug to a lesser extent.
There was still a code path that could wait on a timeline semaphore tick
that would never be signalled.
While we are at it, make use of more STL algorithms.
Games can bind a null index buffer (size=0) where all indices are
evaluated as zero. VK_EXT_robustness2 doesn't support this and all
drivers segfault when a null index buffer is passed to
vkCmdBindIndexBuffer.
Workaround this by creating a 4 byte buffer and filling it with zeroes.
If it's read out of bounds, robustness takes care of returning zeroes as
indices.
Avoids waiting idle while the GPU finishes to do work, and fixes an
issue where we'd wait forever if a single command buffer (logic tick)
all the data.
Ports from OpenGL the optimization to skip small 3D uniform buffer
uploads. This will take advantage of the previously introduced stream
buffer.
Fixes instances where the staging buffer offset was being ignored.
This uses a ring buffer similar to OpenGL's stream buffer for small
uploads. This stops us from allocating several small buffers, reducing
memory fragmentation and cache locality.
It uses dedicated allocations when possible.
Reimplement the buffer cache using cached bindings and page level
granularity for modification tracking. This also drops the usage of
shared pointers and virtual functions from the cache.
- Bindings are cached, allowing to skip work when the game changes few
bits between draws.
- OpenGL Assembly shaders no longer copy when a region has been modified
from the GPU to emulate constant buffers, instead GL_EXT_memory_object
is used to alias sub-buffers within the same allocation.
- OpenGL Assembly shaders stream constant buffer data using
glProgramBufferParametersIuivNV, from NV_parameter_buffer_object. In
theory this should save one hash table resolve inside the driver
compared to glBufferSubData.
- A new OpenGL stream buffer is implemented based on fences for drivers
that are not Nvidia's proprietary, due to their low performance on
partial glBufferSubData calls synchronized with 3D rendering (that
some games use a lot).
- Most optimizations are shared between APIs now, allowing Vulkan to
cache more bindings than before, skipping unnecesarry work.
This commit adds the necessary infrastructure to use Vulkan object from
OpenGL. Overall, it improves performance and fixes some bugs present on
the old cache. There are still some edge cases hit by some games that
harm performance on some vendors, this are planned to be fixed in later
commits.
Instead of using a two step initialization to report errors, initialize
the GPU renderer and rasterizer on the constructor and report errors
through std::runtime_error.
Vulkan 1.0 didn't support creating sRGB image views on an ABGR8 VkImage
with storage usage bits. VK_KHR_maintenance2 addressed this allowing to
reduce the usage bits on a VkImageView.
To allow image store on non-sRGB image views when the VkImage is created
with sRGB, always create VkImages without sRGB and add the sRGB format
on the view.
The VertexA stage is not yet implemented, but Vulkan is adding its
descriptors, causing a discrepancy in the pushed descriptors and the
template. This generally ends up in a driver side crash.
Bypass the VertexA stage for now.
Silence the new validation layer error about SPIR-V not allowing OpUndef
on a OpTypeVoid, even when the SPIR-V spec doesn't say anything against
it.
They will be inserted as an undefined int to avoid SPIRV-Cross and
validation errors, but only when a debugging tool is attached.
Allow users of the allocator to hint memory usage for downloads. This
removes the non-descriptive boolean passed for "host visible" or not
host visible memory commits, and uses an enum to hint device local,
upload and download usages.
Fix a bug where the memory allocator could leave gaps between commits.
To fix this the allocation algorithm was reworked, although it's still
short in number of lines of code.
Rework the allocation API to self-contained movable objects instead of
naively using an unique_ptr to do the job for us. Remove the VK prefix.
With timeline semaphores we can avoid creating objects. Instead of
creating an event, grab the current tick from the scheduler and flush
the current command buffer. When the fence has to be queried/waited, we
can do so against the master semaphore instead of spinning on an event.
If Vulkan supported NVN like events or fences, we could signal from the
command buffer and wait for that without splitting things in two
separate command buffers.
Intel and AMD proprietary drivers are incapable of rendering to texture
views of different formats than the original texture. Avoid creating
these at a cache level. This will consume more memory, emulating them
with copies.
The "VK" prefix predates the "Vulkan" namespace. It was carried around
the codebase for consistency. "VKDevice" currently is a bad alias with
"VkDevice" (only an upcase character of difference) that can cause
confusion. Rename all instances of it.
For listing the available physical devices we can use Vulkan 1.0.
Now that MoltenVK supports 1.1 we can require it for running games.
Add missing documentation.
VKDevice::IsSuitable was not being called. To address this issue, check
suitability before initialization and throw an exception if it fails.
By doing this, we can deduplicate some code on queue searches.
Previosuly we would first search if a present and graphics queue
existed, then on initialization we would search again to find the index.
Report device enumeration errors with exceptions to be consistent with
other initialization related function calls. Reduces the amount of code
to maintain.
Move surface initialization code to a separate file. It's unlikely to
use this code outside of Vulkan, but keeping platform-specific code
(Win32, Xlib, Wayland) in its own translation unit keeps things cleaner.
Move more Vulkan code to report errors with exceptions and report them
through a log before notifying it with an error boolean for backwards
compatibility. In the future we can replace the rasterizer two-step
initialization to always use exceptions.
Initialize debug callbacks (messenger) from a separate file. This allows
sharing code with different backends.
Change our Vulkan error handling to use exceptions instead of error
codes, simplifying the initialization process.
The current texture cache has several points that hurt maintainability
and performance. It's easy to break unrelated parts of the cache
when doing minor changes. The cache can easily forget valuable
information about the cached textures by CPU writes or simply by its
normal usage.The current texture cache has several points that hurt
maintainability and performance. It's easy to break unrelated parts
of the cache when doing minor changes. The cache can easily forget
valuable information about the cached textures by CPU writes or simply
by its normal usage.
This commit aims to address those issues.
Without using VK_EXT_robustness2, we can't consider the 'enabled' (not
null) vertex buffers as dynamic state, as this leads to invalid Vulkan
state. Move this to static state that is always hashed and compared in
the pipeline key.
The bits for enabled vertex buffers are moved into the attribute state
bitfield. This is not 'correct' as it's not an attribute state, but that
struct has bits to spare, and it's used in an array of 32 elements (the
exact same number of vertex buffer bindings).
fmt now automatically prints the numeric value of an enum class member
by default, so we don't need to use casts any more.
Reduces the line noise a bit.
The previous definition was:
#define NUM(field_name) (sizeof(Maxwell3D::Regs::field_name) / sizeof(u32))
In cases where `field_name` happens to refer to an array, Clang thinks
`sizeof(an array value) / sizeof(a type)` is an instance of the idiom
where `sizeof` is used to compute an array length. So it thinks the
type in the denominator ought to be the array element type, and warns if
it isn't, assuming this is a mistake.
In reality, `NUM` is not used to get array lengths at all, so there is no
mistake. Silence the warning by applying Clang's suggested workaround
of parenthesizing the denominator.
Migrates the video core code closer to enabling variable shadowing
warnings as errors.
This primarily sorts out shadowing occurrences within the Vulkan code.
Force early fragment tests when the 3D method is enabled.
The established pipeline cache takes care of recompiling if needed.
This is implemented only on Vulkan to avoid invalidating the shader
cache on OpenGL.
EmuWindow::PollEvents was called from the GPU thread (or the CPU thread
in sync-GPU mode) when swapping buffers. It had three implementations:
- In GRenderWindow, it didn't actually poll events, just set a flag and
emit a signal to indicate that a frame was displayed.
- In EmuWindow_SDL2_Hide, it did nothing.
- In EmuWindow_SDL2, it did call SDL_PollEvents, but this is wrong
because SDL_PollEvents is supposed to be called on the thread that set
up video - in this case, the main thread, which was sleeping in a
busyloop (regardless of whether sync-GPU was enabled). On macOS this
causes a crash.
To fix this:
- Rename EmuWindow::PollEvents to OnFrameDisplayed, and give it a
default implementation that does nothing.
- In EmuWindow_SDL2, do not override OnFrameDisplayed, but instead have
the main thread call SDL_WaitEvent in a loop.
Vulkan has requirements for primitive topologies that don't play nicely
with yuzu's. Since it's only 4 bits, we can move it to fixed state
without changing the size of the pipeline key.
- Fixes a regression on recent Nvidia drivers on Fire Emblem: Three
Houses.
RDNA devices seem to crash when using VK_EXT_extended_dynamic_state in
the latest 20.9.2 proprietary Windows drivers. As a workaround, for now
we block device names corresponding to current RDNA released products.
The old code had a sort function that was invalid and it didn't work as
expected when the base vector had a different order (e.g. renderdoc was
attached).
This sorts devices as expected and fixes a debug assert on MSVC.
The previous fix only partially solved the issue, as only certain GPUs that needed 9 or less MiB subtracted would work (i.e. GTX 980 Ti, GT 730). This takes from DXVK's example to divide `heap_size` by 2 to determine `allocable_size`. Additionally tested on my Quadro K4200, which previously required setting it to 12 to boot.
This is a hack to destroy all HostCounter instances before the base
class destructor is called. The query cache should be redesigned to have
a proper ownership model instead of using shared pointers.
For now, destroy the host counter hierarchy from the derived class
destructor.
This reworks how host<->device synchronization works on the Vulkan
backend. Instead of "protecting" resources with a fence and signalling
these as free when the fence is known to be signalled by the host GPU,
use timeline semaphores.
Vulkan timeline semaphores allow use to work on a subset of D3D12
fences. As far as we are concerned, timeline semaphores are a value set
by the host or the device that can be waited by either of them.
Taking advantange of this, we can have a monolithically increasing
atomic value for each submission to the graphics queue. Instead of
protecting resources with a fence, we simply store the current logical
tick (the atomic value stored in CPU memory). When we want to know if a
resource is free, it can be compared to the current GPU tick.
This greatly simplifies resource management code and the free status of
resources should have less false negatives.
To workaround bugs in validation layers, when these are attached there's
a thread waiting for timeline semaphores.
Now that the GPU is initialized when video backends are initialized,
it's no longer needed to query components once the game is running: it
can be done when yuzu is booting.
This allows us to pass components between constructors and in the
process remove all Core::System references in the video backend.
'driver_id' can only be known on Vulkan 1.1 after creating a logical
device. Move the driver id check to disable
VK_EXT_extended_dynamic_state after the logical device is successfully
initialized.
The Vulkan device will have the extension enabled but it will not be
used.
Vertex binding's <stride> is bugged on AMD's proprietary drivers when
using VK_EXT_extended_dynamic_state. Blacklist it for now while we
investigate how to report this issue to AMD.
State track the current primitive topology with a regular comparison
instead of using dirty flags.
This fixes a bug in dirty flags for this particular state and it also
avoids unnecessary state changes as this property is stored in a
frequently changed bit field.
Migrates a remaining common file over to the Common namespace, making it
consistent with the rest of common files.
This also allows for high-traffic FS related code to alias the
filesystem function namespace as
namespace FS = Common::FS;
for more concise typing.