Consolidate vertex buffers for increased CPU performance in RenderingDevice #11620

Open
tom-schultz opened this issue Jan 21, 2025 · 5 comments

@tom-schultz

tom-schultz commented Jan 21, 2025

Describe the project you are working on

I am working on improving rendering CPU performance by sharing vertex buffers for shadow meshes and for meshes generally. These are two items on the 4.x rendering roadmap.

This will not impact the gles3 driver.

Describe the problem or limitation you are having in your project

Our current implementation causes Godot to spend a lot of time in the graphics driver working with vertex buffers. For each mesh (and shadow mesh), Godot allocates memory on the GPU, creates a buffer with it, and copies the data in. During rendering Godot then calls bind buffer for each mesh, resulting in a lot of CPU time in the driver.

Our setup goes against Nvidia's recommendation for managing memory with Vulkan. We are currently using the rightmost memory layout in their diagram:

[Image: Nvidia's Vulkan memory management diagram]

Describe the feature / enhancement and how it helps to overcome the problem or limitation

My proposal is to create a buffer per category of mesh and bind to that.

Ideally we would move to the left option in the Nvidia diagram above. At the driver level, that would require building a memory manager for buffers and understanding the implications of sharing allocation flags across buffers. At the buffer level, it would require building a memory manager for sub-ranges of each buffer and understanding the implications for things like buffer usage flags. Even the middle option would require driver changes to support.

Fortunately, we can reduce the impact of excessive buffer bind calls by pooling each category of meshes into a single buffer. We'll still have multiple memory allocations, each with its own buffer, but far fewer of them, which reduces the number of bind buffer calls during rendering.
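A minimal sketch of the pooling idea, written in Python for brevity: one large buffer per mesh category, with each mesh placed at an aligned byte offset by a simple bump allocator. `VertexPool`, the 16-byte alignment, and the mesh names are hypothetical; only the offset bookkeeping is modeled, not the RenderingDevice calls that would create the shared buffer and upload vertex data into it.

```python
from dataclasses import dataclass, field


@dataclass
class VertexPool:
    """Hypothetical bookkeeping for one large vertex buffer shared by a mesh category.

    A real implementation would create the backing GPU buffer once and upload
    each mesh's vertex data at the offset returned by allocate().
    """

    capacity: int        # size of the shared buffer, in bytes
    alignment: int = 16  # assumed per-mesh offset alignment, in bytes
    head: int = 0        # first free byte (bump allocator, no freeing)
    offsets: dict = field(default_factory=dict)  # mesh name -> byte offset

    def allocate(self, mesh: str, size: int) -> int:
        """Reserve `size` bytes for `mesh` and return its byte offset."""
        # Round the current head up to the next multiple of `alignment`.
        offset = -(-self.head // self.alignment) * self.alignment
        if offset + size > self.capacity:
            raise MemoryError(f"pool full: cannot fit {size} bytes for {mesh}")
        self.head = offset + size
        self.offsets[mesh] = offset
        return offset


shadow_pool = VertexPool(capacity=1 << 20)
print(shadow_pool.allocate("rock", 100))  # 0
print(shadow_pool.allocate("tree", 50))   # 112 (100 rounded up to 16)
```

A bump allocator cannot release individual meshes, so a real pool would likely need a free list or periodic compaction to handle meshes being unloaded.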

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

The diagram below demonstrates my proposal. It sits between the middle and right options in Nvidia's diagram.

[Image: diagram of the proposed buffer layout]

You might ask, why not go all the way? I think this will give us measurable performance improvements without getting in the way of continuing to move further left. With this done, we can achieve performance improvements while evaluating the impact of moving further left on Nvidia's diagram.

At this time I do not have code or pseudo-code for this proposal, I am still in the research phase. I will update it when I have more to share there.

If this enhancement will not be used often, can it be worked around with a few lines of script?

This enhancement will be used every time a mesh is rendered.

Is there a reason why this should be core and not an add-on in the asset library?

It requires changes to RenderingServer - not exactly asset material!

@clayjohn
Member

For clarity, we are actually doing the middle allocation strategy (labelled "the Bad"). AFAIK our memory pool comes from VMA, so that is already handled nicely.

What we should move towards is having one large buffer for each buffer type and then using offset commands instead of rebinding the buffer on each draw (you will need to reorder draw commands to benefit from this).
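To illustrate why reordering matters, here is a small Python model that counts vertex-buffer binds when a bind is skipped whenever the next draw uses the buffer that is already bound. `Draw` and `count_binds` are hypothetical names, and treating an offset change as free is a simplification of this sketch, not a measured cost.

```python
from typing import NamedTuple


class Draw(NamedTuple):
    buffer_id: int  # vertex buffer holding this mesh's data
    offset: int     # byte offset of the mesh inside that buffer


def count_binds(draws):
    """Count binds, skipping a bind when the draw's buffer is already bound.

    A draw that reuses the bound buffer only changes its offset, which this
    model does not count as a bind.
    """
    binds = 0
    bound = None
    for draw in draws:
        if draw.buffer_id != bound:
            binds += 1
            bound = draw.buffer_id
    return binds


# Today: every mesh has its own buffer, so every draw rebinds.
per_mesh = [Draw(buffer_id=i, offset=0) for i in range(5)]
print(count_binds(per_mesh))  # 5

# Pooled: meshes live in two shared buffers, but draws arrive interleaved.
draws = [Draw(0, 0), Draw(1, 0), Draw(0, 256), Draw(1, 512), Draw(0, 1024)]
print(count_binds(draws))  # 5

# Reordering by buffer lets consecutive draws share one bind.
print(count_binds(sorted(draws, key=lambda d: d.buffer_id)))  # 2
```

Pooling alone does not help if draws stay interleaved; the savings only appear once draws are grouped by buffer. In practice the buffer would likely be one component of a combined sort key alongside other state such as shader or material, rather than the only one.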

Given the flexible nature of Godot, I'm not sure how much value there will actually be in doing that widely across the renderer. Doing it for shadow meshes is a no-brainer since so many share the same format (and thus can share a buffer). But we support a number of mesh formats for regular rendering, so the overhead may not be worth the savings there.
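The format argument can be made concrete by keying pools on the vertex format: meshes can only share a buffer if their vertex layouts match. The flags below are illustrative, not Godot's actual format bits, and the sketch assumes shadow meshes carry only positions.

```python
from collections import defaultdict

# Illustrative vertex-format flags (not Godot's actual format bits).
POSITION = 1 << 0
NORMAL = 1 << 1
UV = 1 << 2
COLOR = 1 << 3


def group_by_format(meshes):
    """Map each vertex format (a bitmask) to the meshes that could share one pool."""
    pools = defaultdict(list)
    for name, fmt in meshes:
        pools[fmt].append(name)
    return dict(pools)


# Assumed position-only shadow meshes collapse into a single pool...
shadow = [("rock", POSITION), ("tree", POSITION), ("wall", POSITION)]
# ...while regular meshes with differing attributes each need their own.
regular = [
    ("rock", POSITION | NORMAL | UV),
    ("tree", POSITION | NORMAL | UV | COLOR),
    ("wall", POSITION | NORMAL),
]

print(len(group_by_format(shadow)))   # 1
print(len(group_by_format(regular)))  # 3
```

When most formats are unique, as in the `regular` case, each pool holds only a mesh or two and pooling saves little.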

@Calinou
Member

Calinou commented Jan 21, 2025

Can this also be done with Metal? cc @stuartcarnie

@stuartcarnie

@Calinou yes, Metal can bind buffers at specific offsets. We also have the option of using Metal heaps to allocate from a pre-allocated memory buffer. Metal heaps can be aliased between render passes, so the memory can be reused if you are sure it is no longer needed from a prior pass. I don't think we can do that in Godot right now, but we can certainly implement the basics of this proposal.

@clayjohn
Member

To be clear, we shouldn't need to do anything at the driver level for this PR. The whole proposal can be implemented using only the existing RenderingDevice API.

@Calinou changed the title from "Consolidate vertex buffers for increased CPU performance in Vulkan/D3D12" to "Consolidate vertex buffers for increased CPU performance in RenderingDevice" on Jan 21, 2025
@tom-schultz
Author

Ah, thanks for the clarity! I got as far into my investigation as seeing that the drivers are doing an alloc in the buffer_create call but didn't look at what that alloc was doing. Thanks, I will update the proposal.

And agreed, I think I can make most of this happen in RenderingDevice. I'm tempted to share my initial thoughts on a design, but I want to understand the whole picture better first.
