detect audio file tracks #1439

Merged
merged 7 commits into from
Jan 27, 2025

Conversation

peetucket
Member

@peetucket peetucket commented Dec 10, 2024

Why was this change made? 🤔

Fixes #1436. Alters the existing filenames_to_stt method to reject files that do not meet specific criteria. Rejected files are filtered out automatically before anything is sent to whisper (the same way files without a valid mimetype are).

  • detect whether a file has an audio track that is not mostly silent before allowing it to be sent for speech to text
  • only files with audio tracks will be sent to whisper
  • if no files are left after rejecting those without audio tracks, then either (1) if the user kicked off the process from Argo, the rest of speechToTextWF is skipped, with no further action taken and nothing sent to whisper, or (2) if the process was kicked off via a pre-assembly job, the end-accession step goes into an error state indicating that speech to text is not possible for this object.
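The two outcomes in the last bullet can be sketched as a small decision function. This is only an illustration of the described behavior: the method name, the started_from argument, and the returned symbols are all hypothetical and do not come from the common-accessioning codebase.

```ruby
# Hypothetical sketch of the outcomes described above.
# Nothing here is the project's real API: the method name, the
# :argo / :preassembly symbols, and the return values are assumptions.
def stt_outcome(filenames_with_audio, started_from:)
  # At least one file has a usable audio track: carry on as normal.
  return :send_to_whisper unless filenames_with_audio.empty?

  case started_from
  when :argo then :skip_remaining_steps          # nothing is sent to whisper
  when :preassembly then :error_at_end_accession # speech to text not possible
  else raise ArgumentError, "unknown start source: #{started_from.inspect}"
  end
end

stt_outcome(['interview.mp4'], started_from: :argo) # => :send_to_whisper
stt_outcome([], started_from: :argo)                # => :skip_remaining_steps
stt_outcome([], started_from: :preassembly)         # => :error_at_end_accession
```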

HOLD: this work depends on sul-dlss/technical-metadata-service#572 being done first, so that we can use the additional metadata to check for mostly silent audio here the same way we check for audio tracks existing

Note: once this is merged, starting a speechToTextWF for an object that lacks the technical metadata generated by sul-dlss/technical-metadata-service#572 will put the workflow into an error state. This means all media objects that may be sent through speechToTextWF will need to have technical metadata regenerated, as described in #1441

How was this change tested? 🤨

Specs
Integration test

@peetucket peetucket changed the title [WIP] detect audio files [WIP] detect audio file tracks Dec 10, 2024
@@ -10,9 +10,6 @@ def initialize
end

def perform_work
-# TODO: Note that the `possible?` method is not complete until we further refine the mimetypes available
-# see https://github.com/sul-dlss/common-accessioning/issues/1346
-# and lib/dor/text_extraction/speech_to_text.rb#allowed_mimetypes
Member Author


no longer true, remove this TODO

-stt_files.map(&:filename)
+available_files = stt_files.map(&:filename)
+available_files.select { |filename| has_audio_track?(filename) && has_useful_audio_track?(filename) }
end
Member Author


this is what filters the files sent to speech to text down to only those with an audio track that is not mostly silent

raise "Technical-metadata-service returned #{resp.status} when requesting techmd for #{bare_druid}: #{resp.body}" unless resp.success?

JSON.parse(resp.body)
end
Member Author


this is the code that does that detection by examining technical metadata

note that the volume detection technical metadata depends on sul-dlss/technical-metadata-service#572 being merged first, and on technical metadata being regenerated for any object that needs it
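As a rough illustration of detection from technical metadata, the sketch below works on a single file's metadata hash. The keys (dro_file_parts, part_type, audio_metadata, max_volume, mean_volume) are the ones visible in this PR's diff. Everything else is an assumption made for the example: the method names, the threshold values, and the rule that a track counts as "not mostly silent" when both volumes exceed their thresholds.

```ruby
# Assumed thresholds in dB, for illustration only; the values the
# project actually uses are not shown in this conversation.
SILENCE_MAX_VOLUME_DB = -30.0
SILENCE_MEAN_VOLUME_DB = -40.0

# Returns the first audio part of a file's technical metadata, or nil.
def audio_part(file_techmd)
  file_techmd&.dig('dro_file_parts')&.find { |part| part['part_type'] == 'audio' }
end

# True when the file has any audio part at all.
def audio_track?(file_techmd)
  !audio_part(file_techmd).nil?
end

# True when volume metadata is present and above the assumed thresholds.
def useful_audio_track?(file_techmd)
  audio_metadata = audio_part(file_techmd)&.dig('audio_metadata')
  return false unless audio_metadata && audio_metadata['max_volume'] && audio_metadata['mean_volume']

  audio_metadata['max_volume'].to_f > SILENCE_MAX_VOLUME_DB &&
    audio_metadata['mean_volume'].to_f > SILENCE_MEAN_VOLUME_DB
end

# Made-up sample metadata for one file.
sample = {
  'dro_file_parts' => [
    { 'part_type' => 'video' },
    { 'part_type' => 'audio', 'audio_metadata' => { 'max_volume' => -12.5, 'mean_volume' => -25.1 } }
  ]
}
audio_track?(sample)        # => true
useful_audio_track?(sample) # => true
```

Sharing one audio_part helper keeps the "has a track" and "track is useful" checks looking at the same part of the metadata.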

@peetucket peetucket force-pushed the 1436-detect-audio-files branch from 4f0a4d8 to 778e53b Compare December 13, 2024 22:28
@peetucket peetucket changed the title [WIP] detect audio file tracks [HOLD] detect audio file tracks Dec 13, 2024
# rubocop:disable Metrics/PerceivedComplexity
def has_useful_audio_track?(filename)
audio_metadata = file_level_tech_metadata(filename)&.dig('dro_file_parts')&.find { |parts| parts['part_type'] == 'audio' }&.dig('audio_metadata')
return false unless audio_metadata && audio_metadata['max_volume'] && audio_metadata['mean_volume']

@peetucket peetucket Dec 13, 2024


note that this "return false" means that if the audio technical metadata with max/mean volume is NOT present, the file will not be sent to speech to text. this means existing media objects won't work unless their technical metadata is re-generated.

we could be more inclusive here and return true instead of false in that situation (essentially skipping this check if we can't do it, and reverting to the current behavior of no check at all)
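For comparison, a minimal sketch of the more inclusive alternative might look like the following. It is not the behavior that was merged. The method name, the keyword arguments, and the default threshold values are hypothetical; only the max_volume and mean_volume keys come from the diff above.

```ruby
# Hypothetical lenient variant: when volume metadata is missing, skip
# the silence check and let the file through, as before this change.
# Threshold defaults are illustrative assumptions, not project values.
def useful_audio_track_lenient?(audio_metadata, max_threshold: -30.0, mean_threshold: -40.0)
  # The check cannot be performed, so do not reject the file.
  return true unless audio_metadata && audio_metadata['max_volume'] && audio_metadata['mean_volume']

  audio_metadata['max_volume'].to_f > max_threshold &&
    audio_metadata['mean_volume'].to_f > mean_threshold
end

useful_audio_track_lenient?(nil)                                             # => true
useful_audio_track_lenient?({ 'max_volume' => -80.0, 'mean_volume' => -91.0 }) # => false
```

The trade-off is that objects without regenerated technical metadata would still reach speech to text, including any that are in fact silent.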

@peetucket peetucket force-pushed the 1436-detect-audio-files branch from 987ce0a to 27db579 Compare December 16, 2024 23:01
@peetucket peetucket changed the title [HOLD] detect audio file tracks detect audio file tracks Jan 27, 2025
@peetucket peetucket merged commit 7579717 into main Jan 27, 2025
5 checks passed
@peetucket peetucket deleted the 1436-detect-audio-files branch January 27, 2025 17:43