Consolidate vertex buffers for increased CPU performance in RenderingDevice #11620

tom-schultz · 2025-01-21T19:49:21Z

Describe the project you are working on

I am working on improving rendering CPU performance by sharing vertex buffers for shadow meshes and for all meshes generally. These are two items on the 4.x rendering roadmap.

This will not impact the gles3 driver.

Describe the problem or limitation you are having in your project

Our current implementation causes Godot to spend a lot of time in the graphics driver working with vertex buffers. For each mesh (and shadow mesh), Godot allocates memory on the GPU, creates a buffer with it, and copies the data in. During rendering Godot then calls bind buffer for each mesh, resulting in a lot of CPU time in the driver.

Our setup goes against Nvidia's recommendation for managing memory with Vulkan. We are currently using the right memory layout in their diagram:

Describe the feature / enhancement and how it helps to overcome the problem or limitation

My proposal is to create a buffer per category of mesh and bind to that.

Ideally we move to the left option in the Nvidia diagram above. At the driver level that would require building a memory manager for buffers and understanding the implications of sharing allocation flags across buffers. At the buffer level it would require building a memory manager for usages of the buffer and understanding the implications for things like buffer usage flags. Even the middle option would require changes to the driver code.

Fortunately we can reduce the impact of the excessive buffer bind calls by pooling categories of meshes into a single buffer. We'll still have multiple memory allocations, each with its own buffer, but there will be far fewer of them, which reduces the number of bind buffer calls during rendering.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

The diagram below demonstrates my proposal. It is between the middle and right solutions in Nvidia's diagrams.

You might ask, why not go all of the way? I think that this will give us measurable performance improvements without getting in the way of continuing to move further left. With this done, we can bank those improvements while evaluating the impact of moving further left on Nvidia's diagram.

At this time I do not have code or pseudo-code for this proposal; I am still in the research phase. I will update it when I have more to share.

If this enhancement will not be used often, can it be worked around with a few lines of script?

This enhancement will be used every time a mesh is rendered.

Is there a reason why this should be core and not an add-on in the asset library?

It requires changes to RenderingServer - not exactly asset material!

clayjohn · 2025-01-21T19:56:13Z

For clarity, we are actually doing the middle allocation strategy (labelled "the Bad"). AFAIK, our memory pool comes from VMA, so that is already handled nicely.

What we should move towards is having one large buffer for each buffer type and then using offset commands instead of rebinding the buffer on each draw (you will need to reorder draw commands to benefit from this).
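To make the reordering point concrete, here is a minimal sketch of why draw order matters once meshes share buffers. It is a CPU-only model, not RenderingDevice code: `Draw`, `count_binds` and `sort_by_buffer` are hypothetical names invented for illustration, and it assumes a renderer that rebinds only when the buffer changes between consecutive draws.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical draw record: which shared buffer a mesh lives in, and where.
struct Draw {
    uint32_t buffer_id;    // Index of the shared vertex buffer (illustrative).
    uint32_t first_vertex; // Offset of this mesh inside that buffer, in vertices.
    uint32_t vertex_count; // Number of vertices to draw.
};

// Counts the bind calls issued by a renderer that rebinds only when the
// buffer differs from the one bound for the previous draw.
inline uint32_t count_binds(const std::vector<Draw> &draws) {
    uint32_t binds = 0;
    bool has_bound = false;
    uint32_t bound = 0;
    for (const Draw &d : draws) {
        assert(d.vertex_count > 0); // Empty draws are assumed to be culled earlier.
        if (!has_bound || d.buffer_id != bound) {
            bound = d.buffer_id;
            has_bound = true;
            ++binds;
        }
        // A real renderer would now draw using d.first_vertex as the offset.
    }
    return binds;
}

// Reorders draws so that draws sharing a buffer are adjacent. The sort is
// stable, so the relative order within one buffer is preserved.
inline void sort_by_buffer(std::vector<Draw> &draws) {
    std::stable_sort(draws.begin(), draws.end(),
            [](const Draw &a, const Draw &b) { return a.buffer_id < b.buffer_id; });
}
```

With four draws that alternate between two buffers, this model issues four binds as submitted and two after sorting. A real renderer has other sort keys as well (pipeline, material, depth), so the buffer would be only one part of the key.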

Given the flexible nature of Godot, I'm not sure how much value there will actually be in doing that widely across the renderer. Doing it for shadow meshes is a no-brainer since so many share the same format (and thus can share a buffer). But we support a number of mesh formats for regular rendering, so the overhead may not be worth the savings there.
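As a rough illustration of the "same format, same buffer" idea, the sketch below keeps one pool per vertex format and hands out aligned offsets from it. Everything here is an assumption made for illustration: `FormatPools`, `SharedBuffer`, `Allocation` and the integer `format_key` are hypothetical, no GPU calls are made, and freeing or growing a pool is not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Offset-only model of one shared buffer: a bump allocator over a fixed capacity.
struct SharedBuffer {
    uint64_t capacity = 0; // Total size in bytes; 0 means "not initialised yet".
    uint64_t used = 0;     // Bytes handed out so far, including alignment padding.
};

// Result of a sub-allocation inside the pool for one format.
struct Allocation {
    uint32_t format_key; // Hypothetical identifier of the vertex format.
    uint64_t offset;     // Byte offset inside the shared buffer.
    uint64_t size;       // Size in bytes.
    bool valid;          // False if the pool had no room left.
};

class FormatPools {
public:
    explicit FormatPools(uint64_t p_capacity) : capacity(p_capacity) {}

    // Sub-allocates `size` bytes from the pool for `format_key`.
    // `alignment` must be a non-zero power of two.
    Allocation allocate(uint32_t format_key, uint64_t size, uint64_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        SharedBuffer &buf = pools[format_key]; // Creates the pool on first use.
        if (buf.capacity == 0) {
            buf.capacity = capacity;
        }
        // Round the next free byte up to the requested alignment.
        const uint64_t offset = (buf.used + alignment - 1) & ~(alignment - 1);
        if (offset > buf.capacity || size > buf.capacity - offset) {
            return { format_key, 0, 0, false };
        }
        buf.used = offset + size;
        return { format_key, offset, size, true };
    }

    // Number of pools created so far, i.e. distinct formats seen.
    size_t pool_count() const { return pools.size(); }

private:
    uint64_t capacity; // Capacity given to every newly created pool.
    std::map<uint32_t, SharedBuffer> pools;
};
```

In this model, meshes that share a format land in a single pool, while every additional format creates another pool. That is the trade-off described above: the more formats in use, the more buffers there are to manage and the smaller the savings per buffer.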

Calinou · 2025-01-21T20:47:15Z

Can this also be done with Metal? cc @stuartcarnie

stuartcarnie · 2025-01-21T20:56:49Z

@Calinou yes, Metal can bind buffers at specific offsets. We also have the option of using Metal heaps to allocate from a pre-allocated memory buffer. Metal heaps can be aliased between render passes, so the memory can be reused if you are sure it is no longer needed from a prior pass. I don't think we can do that in Godot right now, but we can certainly implement the basics of this proposal.

clayjohn · 2025-01-21T20:58:57Z

To be clear, we shouldn't need to do anything at the driver level for this PR. The whole proposal can be implemented using only the existing RenderingDevice API.

tom-schultz · 2025-01-21T22:18:38Z

Ah, thanks for the clarity! I got as far into my investigation as seeing that the drivers are doing an alloc in the buffer_create call but didn't look at what that alloc was doing. Thanks, I will update the proposal.

And agreed, I think I can make most of this happen in RenderingDevice. I'm tempted to share my initial thoughts on a design, but I want to understand the whole picture better first.

Calinou added topic:rendering performance labels Jan 21, 2025

Calinou changed the title ~~Consolidate vertex buffers for increased CPU performance in Vulkan/D3D12~~ Consolidate vertex buffers for increased CPU performance in RenderingDevice Jan 21, 2025
