detect audio file tracks #1439

peetucket · 2024-12-10T17:59:03Z

Why was this change made? 🤔

Fixes #1436. Alters the existing filenames_to_stt method to reject files that do not meet specific criteria. Rejected files are filtered out before anything is sent to whisper, the same way files with an invalid mimetype are.

detect whether an audio track exists in a file and is not mostly silent before allowing the file to be sent for speech to text
only files with audio tracks will be sent to whisper
if no files are left after rejecting those without usable audio tracks, then either (1) if the user kicked off the process from Argo, the rest of the speechToTextWF is skipped and nothing is sent to whisper, or (2) if the process was kicked off via a pre-assembly job, the end-accession step goes into an error state indicating that speech to text is not possible for this object

HOLD: this work depends on sul-dlss/technical-metadata-service#572 being done first, so that we can use the additional volume metadata here the same way we check for the existence of audio tracks

Note: once merged, if you try to start a speechToTextWF and technical metadata is not present as generated by sul-dlss/technical-metadata-service#572, then the workflow will go into an error state. This means all media objects that may be sent through speechToTextWF will need to have technical metadata regenerated as described in #1441

How was this change tested? 🤨

Specs
Integration test

peetucket · 2024-12-10T18:15:33Z

lib/robots/dor_repo/speech_to_text/start_stt.rb

@@ -10,9 +10,6 @@ def initialize
        end

        def perform_work
-          # TODO: Note that the `possible?` method is not complete until we further refine the mimetypes available
-          # see https://github.com/sul-dlss/common-accessioning/issues/1346
-          # and lib/dor/text_extraction/speech_to_text.rb#allowed_mimetypes


no longer true, remove this TODO

peetucket · 2024-12-13T22:16:16Z

lib/dor/text_extraction/speech_to_text.rb

-        stt_files.map(&:filename)
+        available_files = stt_files.map(&:filename)
+        available_files.select { |filename| has_audio_track?(filename) && has_useful_audio_track?(filename) }
+      end


this is what filters the files sent to speech to text down to only those with an audio track that is not mostly silent
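To make the filter above concrete, here is a minimal standalone sketch. It is not the PR's code: the helper names, the sample metadata values, and especially the SILENCE_THRESHOLD_DB cutoff are assumptions for illustration. Only the hash keys ('dro_file_parts', 'part_type', 'audio_metadata', 'max_volume', 'mean_volume') come from the diff.

```ruby
# Hypothetical cutoff in dB; the real threshold used by the PR is not shown here.
SILENCE_THRESHOLD_DB = -40.0

# Find the audio part of a file's technical metadata, if any.
def audio_part(file_md)
  file_md&.dig('dro_file_parts')&.find { |part| part['part_type'] == 'audio' }
end

# True when the file has an audio part at all.
def audio_track?(file_md)
  !audio_part(file_md).nil?
end

# True when volume metadata is present and the mean volume is above the
# (assumed) silence cutoff.
def useful_audio_track?(file_md)
  audio_metadata = audio_part(file_md)&.dig('audio_metadata')
  return false unless audio_metadata && audio_metadata['max_volume'] && audio_metadata['mean_volume']

  audio_metadata['mean_volume'] > SILENCE_THRESHOLD_DB
end

# Keep only filenames whose metadata passes both checks.
def filenames_with_speech(tech_md_by_filename)
  tech_md_by_filename.select { |_filename, md| audio_track?(md) && useful_audio_track?(md) }.keys
end

# Made-up sample metadata, keyed by filename.
sample = {
  'interview.mp4' => { 'dro_file_parts' => [
    { 'part_type' => 'video' },
    { 'part_type' => 'audio', 'audio_metadata' => { 'max_volume' => -0.5, 'mean_volume' => -21.3 } }
  ] },
  'silent.mp4' => { 'dro_file_parts' => [
    { 'part_type' => 'audio', 'audio_metadata' => { 'max_volume' => -88.0, 'mean_volume' => -91.0 } }
  ] },
  'slides.mp4' => { 'dro_file_parts' => [{ 'part_type' => 'video' }] }
}

puts filenames_with_speech(sample).inspect
```

With this sample data only interview.mp4 survives: silent.mp4 falls below the assumed cutoff and slides.mp4 has no audio part.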

peetucket · 2024-12-13T22:17:14Z

lib/dor/text_extraction/speech_to_text.rb

+          raise "Technical-metadata-service returned #{resp.status} when requesting techmd for #{bare_druid}: #{resp.body}" unless resp.success?
+
+          JSON.parse(resp.body)
+        end


this is the code that does that detection by examining technical metadata

note that volume detection level technical metadata is dependent on sul-dlss/technical-metadata-service#572 being merged first and technical metadata updated on any object that needs it
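The response handling in the hunk above can be exercised without a network call. The sketch below assumes a response object that answers status, body, and success? (as the diff implies); FakeResponse, parse_techmd, the druid, and the JSON payload are hypothetical stand-ins, not part of the PR.

```ruby
require 'json'

# Hypothetical stand-in for the HTTP response object used in the diff.
FakeResponse = Struct.new(:status, :body) do
  def success?
    (200..299).cover?(status)
  end
end

# Mirrors the two lines from the diff: fail loudly on a non-success
# response, otherwise return the parsed technical metadata.
def parse_techmd(resp, bare_druid)
  raise "Technical-metadata-service returned #{resp.status} when requesting techmd for #{bare_druid}: #{resp.body}" unless resp.success?

  JSON.parse(resp.body)
end

ok = FakeResponse.new(200, '[{"filename":"interview.mp4","dro_file_parts":[{"part_type":"audio"}]}]')
techmd = parse_techmd(ok, 'bc123df4567')
puts techmd.first['filename']

begin
  parse_techmd(FakeResponse.new(404, 'not found'), 'bc123df4567')
rescue RuntimeError => e
  puts e.message
end
```

Raising here rather than returning an empty result means a missing techmd record surfaces as a workflow error instead of silently skipping the file.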

peetucket · 2024-12-13T22:53:35Z

lib/dor/text_extraction/speech_to_text.rb

+      # rubocop:disable Metrics/PerceivedComplexity
+      def has_useful_audio_track?(filename)
+        audio_metadata = file_level_tech_metadata(filename)&.dig('dro_file_parts')&.find { |parts| parts['part_type'] == 'audio' }&.dig('audio_metadata')
+        return false unless audio_metadata && audio_metadata['max_volume'] && audio_metadata['mean_volume']


note that this "return false" means that if the audio technical metadata with max/mean volume is NOT present, it will not send the file to speech to text. this means any existing media objects won't work unless they have technical metadata re-generated.

we could be more inclusive here and return true instead of false in that situation (essentially skipping this check when we can't perform it and reverting to the current behavior of no check at all)
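The trade-off discussed in this comment can be sketched as a single check with a configurable policy for missing volume metadata. This is illustrative only: the on_missing parameter, the MissingAudioMetadata error class, and the MEAN_VOLUME_CUTOFF_DB value are assumptions. The :raise option corresponds to the later commit in this PR titled "raise an exception for missing expected audio technical metadata".

```ruby
# Hypothetical cutoff in dB, chosen only for illustration.
MEAN_VOLUME_CUTOFF_DB = -40.0

# Hypothetical error class for the :raise policy.
class MissingAudioMetadata < StandardError; end

# on_missing controls what happens when max/mean volume is absent:
#   :reject -> false (strict, as in the diff above)
#   :allow  -> true  (inclusive, skip the check)
#   :raise  -> raise (surface the problem as an error)
def useful_audio_track?(audio_metadata, on_missing: :reject)
  complete = audio_metadata && audio_metadata['max_volume'] && audio_metadata['mean_volume']
  unless complete
    raise MissingAudioMetadata, 'audio volume technical metadata is missing' if on_missing == :raise

    return on_missing == :allow
  end

  audio_metadata['mean_volume'] > MEAN_VOLUME_CUTOFF_DB
end

legacy = { 'codec' => 'aac' } # older techmd without volume levels
puts useful_audio_track?(legacy)                     # strict policy
puts useful_audio_track?(legacy, on_missing: :allow) # inclusive policy
```

The strict policy keeps silent files away from whisper at the cost of requiring regenerated technical metadata; the inclusive policy keeps existing objects working but loses the silence check for them.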

peetucket added 3 commits December 9, 2024 15:29

add methods for file detection

f0ff149

call out to technical metadata service to see if file has an audio track

b807d34

test for has_audio_track?

0942540

peetucket changed the title ~~[WIP] detect audio files~~ [WIP] detect audio file tracks Dec 10, 2024

peetucket commented Dec 10, 2024

peetucket mentioned this pull request Dec 11, 2024

add volume detection to media tech metadata sul-dlss/technical-metadata-service#572

Merged

update code comment

109d394

peetucket commented Dec 13, 2024

use tech metadata to check volume levels

778e53b

peetucket force-pushed the 1436-detect-audio-files branch from 4f0a4d8 to 778e53b Compare December 13, 2024 22:28

peetucket changed the title ~~[WIP] detect audio file tracks~~ [HOLD] detect audio file tracks Dec 13, 2024

peetucket commented Dec 13, 2024

peetucket mentioned this pull request Dec 16, 2024

Re-run technical metadata for all media items #1441

Closed

7 tasks

peetucket added 2 commits December 16, 2024 14:52

raise an exception for missing expected audio technical metadata

e0c11da

add extra test

27db579

peetucket force-pushed the 1436-detect-audio-files branch from 987ce0a to 27db579 Compare December 16, 2024 23:01

edsu approved these changes Dec 20, 2024

peetucket changed the title ~~[HOLD] detect audio file tracks~~ detect audio file tracks Jan 27, 2025

peetucket merged commit 7579717 into main Jan 27, 2025
5 checks passed

peetucket deleted the 1436-detect-audio-files branch January 27, 2025 17:43
