Consolidate vertex buffers for increased CPU performance in RenderingDevice #11620
For clarity, we are actually doing the middle allocation strategy (labelled "the Bad"). AFAIK, our memory pool comes from VMA, so that is already handled nicely. What we should move towards is having one large buffer for each buffer type and then using offset commands instead of rebinding the buffer on each draw (you will need to reorder draw commands to benefit from this). Given the flexible nature of Godot, I'm not sure how much value there will actually be in doing that widely across the renderer. Doing it for shadow meshes is a no-brainer since so many share the same format (and thus can share a buffer). But we support a number of mesh formats for regular rendering, so the overhead may not be worth the savings there.
Can this also be done with Metal? cc @stuartcarnie
@Calinou yes, Metal can bind buffers at specific offsets. We also have the option of using Metal heaps to allocate from a pre-allocated memory buffer. Metal heaps can be aliased between render passes, so the memory can be reused if you are sure it is no longer needed from a prior pass. I don't think we can do that in Godot right now, but we can certainly implement the basics of this proposal.
To be clear, we shouldn't need to do anything at the driver level for this PR. The whole proposal can be implemented using only the existing RenderingDevice API.
Ah, thanks for the clarity! I got as far into my investigation as seeing that the drivers are doing an alloc in the … And agreed, I think I can make most of this happen.
Describe the project you are working on
I am working on improving rendering CPU performance by sharing vertex buffers for shadow meshes and for all meshes generally. These are two items on the 4.x rendering roadmap.
This will not impact the GLES3 driver.
Describe the problem or limitation you are having in your project
Our current implementation causes Godot to spend a lot of time in the graphics driver working with vertex buffers. For each mesh (and shadow mesh), Godot allocates memory on the GPU, creates a buffer with it, and copies the data in. During rendering, Godot then issues a buffer bind for each mesh, resulting in a lot of CPU time spent in the driver.
Our setup goes against Nvidia's recommendation for managing memory with Vulkan. We are currently using the rightmost memory layout in their diagram: a dedicated allocation and buffer per resource.
Describe the feature / enhancement and how it helps to overcome the problem or limitation
My proposal is to create a buffer per category of mesh and bind to that.
Ideally, we would move to the left option in the Nvidia diagram above. At the driver level, that would require building a memory manager for buffers and understanding the implications of sharing allocation flags across buffers. At the buffer level, it would require building a memory manager for usages of the buffer and understanding the implications for things like buffer usage flags. Even the middle option would require changes to the driver code to support it.
Fortunately, we can reduce the impact of excessive buffer bind calls by pooling categories of meshes into a single buffer. We will still have multiple memory allocations, each backing its own buffer, but there will be drastically fewer of them, thus reducing the number of bind buffer calls during rendering.
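To make the pooling idea concrete, here is a minimal sketch of a bump suballocator for one mesh-format category. Everything here is illustrative and hypothetical (`MeshBufferPool`, the alignment default, the bump-only strategy); it is not the RenderingDevice API, and a real implementation would need to handle freeing and possibly compaction:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pool: one large buffer per mesh-format category.
// Instead of one GPU allocation + one bind per mesh, meshes of the
// same vertex format share a buffer and receive byte offsets into it.
struct MeshBufferPool {
	uint64_t capacity;  // size of the shared buffer, in bytes
	uint64_t head = 0;  // bump-allocation cursor
	uint64_t alignment; // assumed per-format alignment requirement

	explicit MeshBufferPool(uint64_t p_capacity, uint64_t p_alignment = 16)
			: capacity(p_capacity), alignment(p_alignment) {}

	// Returns the byte offset for a mesh's vertex data, or -1 if the
	// pool is full. Offsets are what draw commands would use instead
	// of rebinding a per-mesh buffer.
	int64_t allocate(uint64_t p_size) {
		uint64_t aligned = (head + alignment - 1) & ~(alignment - 1);
		if (aligned + p_size > capacity) {
			return -1;
		}
		head = aligned + p_size;
		return int64_t(aligned);
	}
};
```

Each mesh in the category would upload its vertex data at its returned offset; the renderer then binds the pooled buffer once and passes the offset per draw.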
Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams
The diagram below demonstrates my proposal. It sits between the middle and right solutions in Nvidia's diagram.
You might ask: why not go all the way? I think this approach will give us measurable performance improvements without getting in the way of continuing to move further left. With this done, we can realize gains now while evaluating the impact of moving further left on Nvidia's diagram.
At this time I do not have code or pseudo-code for this proposal; I am still in the research phase. I will update this when I have more to share.
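In the meantime, a rough simulation can show where the savings come from. This is not Godot code; it only counts how many buffer binds the driver would see under each strategy, assuming draws can be sorted by their pooled buffer (the reordering requirement mentioned above):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative only: a mesh records which pooled buffer it was packed
// into and at what byte offset.
struct Mesh {
	int buffer_id;   // which pooled buffer this mesh lives in
	uint64_t offset; // byte offset within that buffer
};

// Current approach: every mesh has its own buffer, so every draw rebinds.
int count_binds_per_mesh(const std::vector<Mesh> &p_meshes) {
	return int(p_meshes.size());
}

// Proposed approach: draws are sorted by buffer, so a bind is only
// issued when the bound buffer changes; per-draw offsets do the rest.
int count_binds_pooled(std::vector<Mesh> p_meshes) {
	std::sort(p_meshes.begin(), p_meshes.end(),
			[](const Mesh &a, const Mesh &b) {
				return a.buffer_id != b.buffer_id
						? a.buffer_id < b.buffer_id
						: a.offset < b.offset;
			});
	int binds = 0;
	int bound = -1;
	for (const Mesh &m : p_meshes) {
		if (m.buffer_id != bound) {
			bound = m.buffer_id;
			binds++;
		}
	}
	return binds;
}
```

With two pooled buffers and five meshes, the pooled strategy issues two binds instead of five; the gap widens as mesh counts grow, which is why shadow meshes (one shared format) benefit most.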
If this enhancement will not be used often, can it be worked around with a few lines of script?
This enhancement will be used every time a mesh is rendered.
Is there a reason why this should be core and not an add-on in the asset library?
It requires changes to RenderingServer, so it is not exactly asset-library material!