I wanted to use this for my lectures, but I can only transcribe 1 minute out of the full hour #3

Is this normal? I looked at the code and there is audio splitting (great work, btw) that looks like it can handle really large file lengths.
No, that's not normal. Audio splitting is there to allow large files by splitting the requests sent to the API, but no audio splitting is done for the open-source model (it shouldn't be required; the whisper code already processes the audio in chunks). Are you using the API or the open-source model (Colab or local CLI)? Also, is any error displayed while you're transcribing?
I've updated the whisper and openai dependencies to the latest releases; check if it works for you now. I see no changes whatsoever, and it's working fine for my test files.
Hi there, I had success using the API. Using the Google Colab document does work, but it limits the transcript length. I made a script to cut my 60-minute lectures into 20-minute parts, and these are small enough to be transcribed. I can then join these together and finally use GPT to trim a lot of the unnecessary content in the transcription with the following prompt:

'
I have transcribed my audio lectures. You will simulate a preprocessor that has the task of reducing the word count and removing information that is off-topic. You are a simple text editor; do not converse with the user, do not thank the user, and do not mention an explanation. The user is not delivering the lecture, they are processing a file. First the user will enter their transcription input, and you must store this text for your next response. Your first response must only be one word; afterward, you will create the output of the preprocessor. Can you start the machine?
'
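For what it's worth, running that prompt programmatically looks roughly like this (a sketch against the openai Python library as it was in mid-2023; the model name and the message framing are my own assumptions, not part of this repo):

```python
import openai

openai.api_key = "sk-..."  # your API key

# The preprocessor prompt from above, abbreviated here.
PREPROCESSOR_PROMPT = (
    "I have transcribed my audio lectures. You will simulate a preprocessor ..."
)

def trim_transcript(transcript: str) -> str:
    """Ask the model to reduce word count and drop off-topic content."""
    # openai<1.0 style chat completion call.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": PREPROCESSOR_PROMPT},
            {"role": "user", "content": transcript},
        ],
    )
    return response["choices"][0]["message"]["content"]
```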
Thanks for making this. I plan to fork the code, remove the API (as it may be against the TOS to upload the audio to a third party), and bundle in my pre/post-processing scripts so that other students can use this audio -> concise text pipeline.
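For anyone reading along, the splitting/joining step I described is roughly this (a sketch assuming ffmpeg is installed; the segment length, file names, and formats are just examples):

```python
import subprocess
from pathlib import Path

def split_audio(path: str, segment_seconds: int = 1200) -> None:
    """Cut a long lecture into ~20-minute parts without re-encoding."""
    subprocess.run(
        ["ffmpeg", "-i", path,
         "-f", "segment", "-segment_time", str(segment_seconds),
         "-c", "copy", "part_%03d.mp3"],
        check=True,
    )

def join_transcripts(folder: str = ".", pattern: str = "part_*.txt") -> str:
    """Concatenate the per-part transcripts back into a single text."""
    parts = sorted(Path(folder).glob(pattern))
    return "\n".join(p.read_text(encoding="utf-8") for p in parts)
```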
Oh, I tested it with files up to 30 minutes with the open-source model, as they take a while to process. I know that longer files can be processed with the API because of the audio splitting I implemented; I'm glad that worked for you, but it's not currently implemented for the open-source model. As I remember, whisper already processes the audio in chunks in the open-source model, but maybe I should implement the audio splitting for the open-source model in the AudioToText code too, if it's causing problems with large files. The audio splitting also joins the transcriptions together in the resulting txt/srt/vtt files, as you want.

Using GPT to condense the text is out of the scope of AudioToText, but transcribing 1h+ audio files should be fine with the open-source model (although it takes a while with the Tesla T4 GPU Google Colab offers to free users). Use of the OpenAI whisper API is up to the user; it is not mandatory in the AudioToText Google Colab (just fill in or empty the api_key field).
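For reference, transcribing directly with the open-source model needs no manual splitting. A minimal sketch (the model size and file name are just examples):

```python
import whisper

model = whisper.load_model("small")       # downloaded on first use
result = model.transcribe("lecture.mp3")  # whisper chunks the audio internally

print(result["text"])                     # the full transcript
for seg in result["segments"]:            # timestamped segments (used for srt/vtt)
    print(f"[{seg['start']:.1f}-{seg['end']:.1f}] {seg['text']}")
```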