streamlit-webrtc doesn't work with faster-whisper model #252

aqiao opened this issue Jul 21, 2024 · 0 comments

Hi, I'm working on a voice translator app and trying streamlit-webrtc with the faster-whisper model.
I noticed this example: https://github.com/whitphx/streamlit-webrtc/blob/main/pages/10_sendonly_audio.py
and found that it handles the audio data in a while loop like the one below:

while True:
    if webrtc_ctx.audio_receiver:
        try:
            audio_frames = webrtc_ctx.audio_receiver.get_frames(timeout=1)
        except queue.Empty:
            logger.warning("Queue is empty. Abort.")
            break

        sound_chunk = pydub.AudioSegment.empty()
        for audio_frame in audio_frames:
            sound = pydub.AudioSegment(
                data=audio_frame.to_ndarray().tobytes(),
                sample_width=audio_frame.format.bytes,
                frame_rate=audio_frame.sample_rate,
                channels=len(audio_frame.layout.channels),
...


Doesn't this block the UI thread, since the while loop runs in the UI thread (as far as I can tell)? Once I use code like this, the page just hangs.
Below is my code; please help me figure out how to fix it:

import queue
import time

import numpy as np
import pydub
import streamlit as st
from faster_whisper import WhisperModel
from streamlit_webrtc import WebRtcMode, webrtc_streamer

webrtc_ctx = webrtc_streamer(
    key="speech-to-text",
    mode=WebRtcMode.SENDONLY,
    audio_receiver_size=10240,
    rtc_configuration={"iceServers": [{"urls": ["stun:stun.l.google.com:19302"]}]},
    media_stream_constraints={"video": False, "audio": True},
)

status_indicator = st.empty()
text_output = st.empty()
stream = None

while True:
    if webrtc_ctx.audio_receiver:
        sound_chunk = pydub.AudioSegment.empty()
        try:
            audio_frames = webrtc_ctx.audio_receiver.get_frames(timeout=1)
        except queue.Empty:
            time.sleep(0.1)
            status_indicator.write("No frame arrived.")
            continue  # without this, audio_frames is undefined or stale below

        status_indicator.write("Running. Say something!")

        for audio_frame in audio_frames:
            sound = pydub.AudioSegment(
                data=audio_frame.to_ndarray().tobytes(),
                sample_width=audio_frame.format.bytes,
                frame_rate=audio_frame.sample_rate,
                channels=len(audio_frame.layout.channels),
            )
            sound_chunk += sound
        print(sound_chunk)
        if len(sound_chunk) > 0:
            sound_chunk = sound_chunk.set_channels(1).set_frame_rate(44100)
            buffer = np.array(sound_chunk.get_array_of_samples())

            segments, _ = WhisperModel(
                model_size,
                device="cuda" if supports_gpu else "cpu",
                compute_type=compute_type,
            ).transcribe(buffer)
            text = " ".join(segment.text for segment in segments)
            print(text)
            text_output.markdown(f"**Text:** {text}")
    else:
        status_indicator.write("AudioReceiver is not set. Abort.")
