wip: 16bit shader conversions #1581

Draft
wants to merge 50 commits into
base: master

Conversation

Julusian
Member

@Julusian Julusian commented Oct 7, 2024

This is something I started late last year, but I haven't had the motivation to finish it. So I am pushing it here, in case someone wants to use it as inspiration or to copy pieces.

The work here was focussed on SDR 16bit compositing. At some point that would have evolved to HDR, but that wasn't considered yet. The intention was to get lossless SDR 10bit yuv through the system, rather than the slightly lossy flow that we have today.

The basic design was, on the producer side, to replace the point where we tell opengl to copy a buffer into a texture with an opengl compute shader. This would allow us to do yuv->rgb conversion, and even to unpack certain common packed formats, such as the decklink yuv10 packing. This was not implemented yet.
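
To make the producer-side idea concrete, here is a minimal CPU reference sketch in C++ of the arithmetic such a compute shader would need to perform: unpacking one v210 group (four 32-bit words holding six pixels of 10-bit 4:2:2) and converting a sample to RGB. This is not code from this branch. The names `V210Group`, `unpack_v210_group` and `ycbcr10_to_rgb` are made up for illustration, and the BT.709 narrow-range coefficients are an assumption, since the description does not say which matrix or range is used.

```cpp
#include <array>
#include <cstdint>

// Hypothetical container: the samples held by one v210 group (6 pixels, 4:2:2).
struct V210Group {
    std::array<std::uint16_t, 6> y;
    std::array<std::uint16_t, 3> cb;
    std::array<std::uint16_t, 3> cr;
};

// Unpack four little-endian 32-bit words. Each word carries three 10-bit
// samples in bits 0-9, 10-19 and 20-29; the top two bits are unused.
inline V210Group unpack_v210_group(const std::array<std::uint32_t, 4>& w)
{
    auto lo  = [](std::uint32_t v) { return static_cast<std::uint16_t>(v & 0x3FF); };
    auto mid = [](std::uint32_t v) { return static_cast<std::uint16_t>((v >> 10) & 0x3FF); };
    auto hi  = [](std::uint32_t v) { return static_cast<std::uint16_t>((v >> 20) & 0x3FF); };

    V210Group g{};
    // Word 0: Cb0 Y0 Cr0 | Word 1: Y1 Cb1 Y2 | Word 2: Cr1 Y3 Cb2 | Word 3: Y4 Cr2 Y5
    g.cb = {lo(w[0]), mid(w[1]), hi(w[2])};
    g.y  = {mid(w[0]), lo(w[1]), hi(w[1]), mid(w[2]), lo(w[3]), hi(w[3])};
    g.cr = {hi(w[0]), lo(w[2]), mid(w[3])};
    return g;
}

struct RgbF { float r, g, b; };

// ASSUMPTION: BT.709 matrix, narrow (video) range 10-bit input.
// Luma spans 64..940, chroma spans 64..960 centred on 512.
inline RgbF ycbcr10_to_rgb(std::uint16_t y, std::uint16_t cb, std::uint16_t cr)
{
    const float yn  = (static_cast<float>(y) - 64.0f) / 876.0f;
    const float cbn = (static_cast<float>(cb) - 512.0f) / 896.0f;
    const float crn = (static_cast<float>(cr) - 512.0f) / 896.0f;
    return {yn + 1.5748f * crn,
            yn - 0.1873f * cbn - 0.4681f * crn,
            yn + 1.8556f * cbn};
}
```

In a shader, each invocation would handle one such group, reading four words from the uploaded buffer and writing six texels. Pixels 2k and 2k+1 share `cb[k]` and `cr[k]`; how chroma is interpolated between them is a separate choice that this sketch leaves out.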

The hope was that doing it here (where opengl is likely already doing a copy and rearranging the bytes) would have minimal cost in memory and in gpu power. I was trying to avoid doing this on the cpu, as in my experience that is typically under higher pressure (decoding video and deinterlacing). Compute shaders are supported in our current minimum opengl version.
On the consumer side, the intention was to do something similar, using a compute shader to do the final copy from the composited texture into the buffer that is copied into cpu memory.
This would also mean that the composite shader could have the existing colour format handling code removed.

This consumer portion is fairly well implemented, with a working (but not verified for accuracy) decklink v210 implementation. This does carry a risk of doing more downloads from the gpu than before, but I think having more than a couple of consumers on a channel is uncommon, so being slightly more costly on pcie bandwidth, and less costly on cpu than a simd implementation, is reasonable.
To support this, when a consumer is constructed, it is passed a frame_converter, which it can use to convert the const_frame into whatever format it prefers. As part of this, the intention is to remove the 8bit rgba buffer from const_frame, so that it also has to be fetched through the frame_converter. Additionally, the intention is that the key_only and subregion options in the decklink consumer would make their way into this converter, so that only the subregion needs to be converted and downloaded from the gpu.
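
As a rough illustration of the shape this could take, here is a hypothetical C++ sketch of a converter interface with a CPU stand-in implementation. None of these types are taken from the branch: `Frame` stands in for const_frame, and `PixelLayout`, `Region`, `ConvertOptions`, `FrameConverter` and `CpuBgra8Converter` are invented names. The real converter would dispatch a compute shader and download the result. This version only shows how key_only and a subregion could shrink the work to the pixels a consumer actually needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class PixelLayout { bgra8, v210 };  // hypothetical set of output layouts

struct Region { int x = 0, y = 0, width = 0, height = 0; };  // zero size = whole frame

struct ConvertOptions {
    PixelLayout layout = PixelLayout::bgra8;
    bool key_only = false;  // output the alpha channel as a grey image
    Region region;          // only this part is converted and returned
};

// Stand-in for const_frame: a 16-bit RGBA composite held in cpu memory.
struct Frame {
    int width = 0;
    int height = 0;
    std::vector<std::uint16_t> rgba16;  // 4 samples per pixel, row-major
};

class FrameConverter {
public:
    virtual ~FrameConverter() = default;
    virtual std::vector<std::uint8_t> convert(const Frame& frame, const ConvertOptions& opts) = 0;
};

// CPU stand-in that only handles bgra8, to keep the sketch short.
class CpuBgra8Converter final : public FrameConverter {
public:
    std::vector<std::uint8_t> convert(const Frame& f, const ConvertOptions& o) override
    {
        assert(o.layout == PixelLayout::bgra8);
        Region r = o.region;
        if (r.width == 0 || r.height == 0) r = Region{0, 0, f.width, f.height};
        assert(r.x >= 0 && r.y >= 0 && r.x + r.width <= f.width && r.y + r.height <= f.height);

        std::vector<std::uint8_t> out;
        out.reserve(static_cast<std::size_t>(r.width) * r.height * 4);
        // Truncate 16-bit samples to 8 bits by dropping the low byte.
        auto to8 = [](std::uint16_t v) { return static_cast<std::uint8_t>(v >> 8); };
        for (int y = r.y; y < r.y + r.height; ++y) {
            for (int x = r.x; x < r.x + r.width; ++x) {
                const std::uint16_t* p = &f.rgba16[(static_cast<std::size_t>(y) * f.width + x) * 4];
                if (o.key_only) {
                    const std::uint8_t a = to8(p[3]);
                    out.push_back(a); out.push_back(a); out.push_back(a); out.push_back(255);
                } else {
                    out.push_back(to8(p[2])); out.push_back(to8(p[1]));
                    out.push_back(to8(p[0])); out.push_back(to8(p[3]));
                }
            }
        }
        return out;
    }
};
```

A consumer would hold a `FrameConverter&` and call `convert` once per frame with its own options, so two consumers wanting different layouts or regions each trigger their own conversion and download.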

For the status of this, it is possible to play yuv ffmpeg clips, or 16bit pngs, and output them as gpu generated yuv10 out of a decklink. The decklink consumer doesn't support k+f when fed yuv10 frames, but this can be done with a second port set to key-only using the sync-group added previously. (I wanted to explore using the 3D api to support k+f on the 4k extreme cards)
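
For anyone wanting to check the gpu generated yuv10 against a known-good result, a CPU reference along these lines may help. It is a sketch in C++, not the shader from this branch. `Rgb`, `Ycc10`, `rgb_to_ycbcr10` and `pack_v210_group` are hypothetical names, the BT.709 narrow-range coefficients are an assumption, and chroma is simply taken from the even pixel of each pair rather than filtered.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

struct Rgb { float r, g, b; };            // normalised 0..1 components
struct Ycc10 { std::uint16_t y, cb, cr; };

// Scale, offset, round, then clamp to 4..1019 (the 10-bit codes outside
// that range are reserved on SDI).
inline std::uint16_t quantise(float v, float scale, float offset)
{
    const long q = std::lround(v * scale + offset);
    return static_cast<std::uint16_t>(std::min(1019L, std::max(4L, q)));
}

// ASSUMPTION: BT.709 matrix, narrow range (luma 64..940, chroma centred on 512).
inline Ycc10 rgb_to_ycbcr10(Rgb p)
{
    const float y  = 0.2126f * p.r + 0.7152f * p.g + 0.0722f * p.b;
    const float cb = (p.b - y) / 1.8556f;
    const float cr = (p.r - y) / 1.5748f;
    return {quantise(y, 876.0f, 64.0f), quantise(cb, 896.0f, 512.0f), quantise(cr, 896.0f, 512.0f)};
}

// Pack six pixels into one v210 group of four 32-bit words.
// ASSUMPTION: chroma for each pixel pair comes from the even pixel (no filtering).
inline std::array<std::uint32_t, 4> pack_v210_group(const std::array<Rgb, 6>& px)
{
    std::array<Ycc10, 6> s{};
    for (std::size_t i = 0; i < px.size(); ++i) s[i] = rgb_to_ycbcr10(px[i]);

    auto word = [](std::uint32_t a, std::uint32_t b, std::uint32_t c) {
        return a | (b << 10) | (c << 20);
    };
    // Word 0: Cb0 Y0 Cr0 | Word 1: Y1 Cb1 Y2 | Word 2: Cr1 Y3 Cb2 | Word 3: Y4 Cr2 Y5
    return {word(s[0].cb, s[0].y, s[0].cr),
            word(s[1].y, s[2].cb, s[2].y),
            word(s[2].cr, s[3].y, s[4].cb),
            word(s[4].y, s[4].cr, s[5].y)};
}
```

Comparing words from this against the bytes downloaded from the gpu for a few flat-colour frames would be a cheap first accuracy check. Differences of one code value could come from rounding or from a different chroma siting choice rather than from a packing bug.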

A lot of things are hardcoded in testing setups, as this didn't progress beyond a POC.


2 participants