detect audio file tracks #1439
@@ -10,9 +10,6 @@ def initialize
end

def perform_work
# TODO: Note that the `possible?` method is not complete until we further refine the mimetypes available
# see https://github.com/sul-dlss/common-accessioning/issues/1346
# and lib/dor/text_extraction/speech_to_text.rb#allowed_mimetypes
This is no longer true, so this TODO is removed.
stt_files.map(&:filename)
available_files = stt_files.map(&:filename)
available_files.select { |filename| has_audio_track?(filename) && has_useful_audio_track?(filename) }
end
This is what filters the files sent to speech to text down to only those with an audio track that is not mostly silent.
raise "Technical-metadata-service returned #{resp.status} when requesting techmd for #{bare_druid}: #{resp.body}" unless resp.success?

JSON.parse(resp.body)
end
This is the code that does the detection by examining technical metadata.
Note that the volume-level technical metadata depends on sul-dlss/technical-metadata-service#572 being merged first, and on technical metadata being updated for any object that needs it.
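As a rough illustration of the lookup described above, here is a minimal sketch of pulling the audio metadata out of one file's technical metadata. The sample payload and the `audio_metadata_for` helper are hypothetical; only the keys `dro_file_parts`, `part_type`, `audio_metadata`, `max_volume` and `mean_volume` appear in this PR.

```ruby
require 'json'

# Hypothetical techmd payload for a single file. The 'filename' key and all
# values are made up for illustration; the other keys are the ones the diff reads.
SAMPLE_TECHMD = JSON.parse(<<~PAYLOAD)
  {
    "filename": "interview.mp4",
    "dro_file_parts": [
      { "part_type": "video" },
      { "part_type": "audio", "audio_metadata": { "max_volume": -0.5, "mean_volume": -21.3 } }
    ]
  }
PAYLOAD

# Hypothetical helper: returns the audio_metadata hash of the first audio part,
# or nil when the file has no techmd, no parts, or no audio part.
def audio_metadata_for(file_techmd)
  audio_part = file_techmd&.dig('dro_file_parts')&.find { |part| part['part_type'] == 'audio' }
  audio_part&.dig('audio_metadata')
end
```

A file with no audio part yields `nil` here, which is the case the volume check has to handle explicitly.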
4f0a4d8 to 778e53b
# rubocop:disable Metrics/PerceivedComplexity
def has_useful_audio_track?(filename)
audio_metadata = file_level_tech_metadata(filename)&.dig('dro_file_parts')&.find { |parts| parts['part_type'] == 'audio' }&.dig('audio_metadata')
return false unless audio_metadata && audio_metadata['max_volume'] && audio_metadata['mean_volume']
Note that this "return false" means that if the audio technical metadata with max/mean volume is NOT present, the file will not be sent to speech to text. Any existing media objects therefore won't work unless their technical metadata is regenerated.
We could be more inclusive here and return true instead of false in that situation (essentially skipping this check if we can't do it and reverting to the current behavior of no check at all).
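To make the trade-off concrete, here is a minimal sketch of the two behaviors side by side. The `useful_audio_track?` function, the `inclusive:` flag, and the `SILENCE_THRESHOLD_DB` cutoff are all assumptions for illustration; the actual "mostly silent" test used in this PR is not shown in the diff excerpt.

```ruby
# Hypothetical cutoff in dB; the real threshold used by the PR is not shown here.
SILENCE_THRESHOLD_DB = -40.0

# inclusive: false mirrors the guard in the diff (missing volume data => do not send).
# inclusive: true is the alternative suggested above (missing volume data => skip the check).
def useful_audio_track?(audio_metadata, inclusive: false)
  volumes_present = audio_metadata && audio_metadata['max_volume'] && audio_metadata['mean_volume']
  return inclusive unless volumes_present

  # Assumption: a track counts as useful when its mean volume is above the cutoff.
  audio_metadata['mean_volume'].to_f > SILENCE_THRESHOLD_DB
end

legacy_object = nil # techmd generated before volume detection existed
quiet_track   = { 'max_volume' => -60.0, 'mean_volume' => -80.0 }

useful_audio_track?(legacy_object)                  # strict: file is filtered out
useful_audio_track?(legacy_object, inclusive: true) # inclusive: file is still sent
useful_audio_track?(quiet_track, inclusive: true)   # volume data present, so the check still applies
```

With the inclusive variant, objects whose technical metadata has not been regenerated behave as they do today, while objects with volume data get the new filtering.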
987ce0a to 27db579
Why was this change made? 🤔
Fixes #1436. Alter the existing `filenames_to_stt` method to reject files that do not meet specific criteria. This automatically filters them out before they are sent to Whisper (the same as if they did not have a valid mimetype, for example).
HOLD: this work depends on sul-dlss/technical-metadata-service#572 being done first, so that we can use the additional metadata for the check here, the same way we check that an audio track exists.
Note: once merged, if you try to start speechToTextWF and the technical metadata generated by sul-dlss/technical-metadata-service#572 is not present, the workflow will go into an error state. This means all media objects that may be sent through speechToTextWF will need to have technical metadata regenerated as described in #1441.
How was this change tested? 🤨
Specs
Integration test