Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FFmpeg: Misc improvements, use MD5 muxer and add CUDA/NVDEC decoders #175

Merged
merged 15 commits into from
Jul 5, 2024

Conversation

Kwiboo
Copy link
Contributor

@Kwiboo Kwiboo commented Jul 1, 2024

This PR include misc improvements and refactorings to simplify building the FFmpeg command line.

  • Rename and refactor of utils.run_pipe_command_with_std_output()
  • Use shared @lru_cache for ffmpeg commands used in check()
  • Fix version detect of <master>.<minor> releases
  • Separate handling of -hw_output_format and -hwaccel_output_format
  • Convert output_format_to_ffformat() to hw_download_mapping property
  • Refactor to build command line in decode() using a string list
  • Validate and inform what codec to use using -codec
  • Use md5 muxer when supported to avoid always having to save output files
  • Use single-threaded decoding, fluster support running multiple jobs
  • Rename V4L2 mem2mem decoders
  • Add VDPAU VP9 decoder
  • Add CUDA/NVDEC decoders

This was tested using FFmpeg 4.2.1 and 7.0.1 along with GStreamer 1.24.5.

@ceyusa
Copy link
Contributor

ceyusa commented Jul 1, 2024

I've tested these patches wit ffmpeg/vulkan decoding and it worked ok.


# Get ffmpeg version
output = _run_ffmpeg_command(self.binary, "-version", verbose=verbose)
version = re.search(r"ffmpeg version (\d+)\.(\d+)(?:\.(\d+))?", output)
Copy link
Contributor

@ceyusa ceyusa Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: this re doesn't match when ffmpeg is compiled from git (N-git-describe)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have adjusted the pattern to also match builds from release/ branches (n-prefix) and changed the version check to presume that -fps_mode can be used when no version can be determined.

@Kwiboo
Copy link
Contributor Author

Kwiboo commented Jul 2, 2024

Updated to also work with older releases, e.g. release/2.8.

  • Update version regex to also match builds from release/ branch
  • Drop use of repeat-flag in -loglevel
  • Change to use -formats instead of -muxers

@rsanchez87 rsanchez87 self-requested a review July 2, 2024 14:12
@rsanchez87 rsanchez87 self-requested a review July 5, 2024 07:02
Kwiboo added 15 commits July 5, 2024 13:31
Print the verbose output inside run_pipe_command_with_std_output instead
of from its callers.

Signed-off-by: Jonas Karlman <[email protected]>
Rename to run_command_with_output and change it to return the full
output as a string instead of a list of lines.

Signed-off-by: Jonas Karlman <[email protected]>
Add a run_ffmpeg_command helper and use it to simplify the check for
supported decoders and hwaccels.

Signed-off-by: Jonas Karlman <[email protected]>
Current version check does not work on FFmpeg releases using a
major.minor version schema, e.g. 7.0 vs 7.0.1.

Save ffmpeg_version as a tuple, e.g. (7, 0) or (7, 0, 1), and use this
to simplify the check for use of -fps_mode or -vsync.

Signed-off-by: Jonas Karlman <[email protected]>
The ffmpeg command line is built using three different functions,
simplify and create the entire command line inside the decode method.

Signed-off-by: Jonas Karlman <[email protected]>
Normally the name used for -hwaccel and -hwaccel_output_format match.
However, that may not always be the case, add hw_output_format to
improve support for future hwaccels.

Signed-off-by: Jonas Karlman <[email protected]>
Change output_format_to_ffformat to a property to improve support for
future hwaccels that need a different hwdownload format mapping.

Signed-off-by: Jonas Karlman <[email protected]>
Refactor how the ffmpeg command line is built, change to use a list and
pass it directly to run_command. Also add hide_banner and change to use
the verbose version of parameters, e.g. -codec instead of -c:v.

Signed-off-by: Jonas Karlman <[email protected]>
Add -codec parameter to avoid ffmpeg having to analyze the stream to
figure out the codec of the input file.

Signed-off-by: Jonas Karlman <[email protected]>
Use md5 muxer when supported to avoid having to write an output file of
rawvideo.

Signed-off-by: Jonas Karlman <[email protected]>
Fluster already can start multiple jobs, restrict ffmpeg to use a single
thread per job.

Signed-off-by: Jonas Karlman <[email protected]>
Change to use warning loglevel when non-verbose mode is used.

Signed-off-by: Jonas Karlman <[email protected]>
FFmpeg VDPAU hwaccel support VP9, add a decoder for it.

Signed-off-by: Jonas Karlman <[email protected]>
The V4L2 mem2mem decoders contain the codec twice in the name, change
to use a more simple name, e.g. FFmpeg-H.264-v4l2m2m instead of
FFmpeg-H.264-h264_v4l2m2m. Also add the H.265 decoder.

Signed-off-by: Jonas Karlman <[email protected]>
Add FFmpeg CUDA/NVDEC hwaccel decoders.

Signed-off-by: Jonas Karlman <[email protected]>
@mdimopoulos mdimopoulos merged commit 466c1b4 into fluendo:master Jul 5, 2024
3 checks passed
@mdimopoulos
Copy link
Contributor

Thank you all for your awesome contributions, it's been a pleasure!
Some test results on my Intel machine actually improved with these changes.

@Kwiboo
Copy link
Contributor Author

Kwiboo commented Jul 9, 2024

@mdimopoulos Thanks!

I am just guessing that the improved score may have to do with the change to use -threads 1.

Prior to the use of -threads 1 I noticed several warnings of too many surfaces being used (more than 32) due to high cpu core count on my machine/server. The reduced / explicit thread_count help limit the number of surfaces some hwaccels uses.

With CUDA/NVDEC some tests still cause such issue, and it report that 33 surfaces is being used, and that is more than the supported 32. That was 32 for the reference and reordering frames, and then it added an extra surface for each thread_count making it 33 with ´-threads 1`, something that may need to be fixed/changed in FFmpeg for such hwaccels.

@mdimopoulos
Copy link
Contributor

mdimopoulos commented Jul 15, 2024

@Kwiboo
Great, appreciate the info, i hadn't noticed the warnings myself 🙈.

Regarding

With CUDA/NVDEC some tests still cause such issue, and it report that 33 surfaces is being used, and that is more than the supported 32. That was 32 for the reference and reordering frames, and then it added an extra surface for each thread_count making it 33 with ´-threads 1`, something that may need to be fixed/changed in FFmpeg for such hwaccels.

is it an issue because of FFmpeg itself or are you referring to the definition of FFmpeg decoder within fluster?

@Kwiboo
Copy link
Contributor Author

Kwiboo commented Jul 15, 2024

is it an issue because of FFmpeg itself or are you referring to the definition of FFmpeg decoder within fluster?

I see this purely as a FFmpeg issue, I think there should not be any need to add an extra surface when a single thread is being used, but have not tried to look into that.

My main goal with this PR was to prepare for #179, adding CUDA/NVDEC was more of a bonus and it also helped me run fluster tests directly on my server equipped with a NVIDIA Quadro T400 card.

@mdimopoulos
Copy link
Contributor

Great, thanks again!
Also, good to know that tests run fine with the pro line of Nvidia GPUs ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants