This repository contains Python code that uses OpenAI's Whisper, GPT-3, and TTS models to build a simple voice pipeline: it records audio, transcribes it with the Whisper Speech-to-Text (STT) model, generates a response with GPT-3, and converts that response into speech with OpenAI's Text-to-Speech (TTS) model. This README provides an overview of the code and its usage.
Before using this code, ensure you have the following prerequisites installed:
- Python 3.x
- OpenAI Python SDK (`openai`)
- PyAudio (`pyaudio`)
- Wave (`wave`, part of the Python standard library)
- PyDub (`pydub`)
- Dotenv (`python-dotenv`, imported as `dotenv`)
- Pygame (`pygame`)
You should also have an OpenAI API key, which you can obtain by signing up for an account on the OpenAI platform.
- Clone this repository to your local machine.
- Install the required Python packages listed above using `pip install -r requirements.txt`.
- Create a `.env` file in the project directory containing your OpenAI API key: `PROJECT_API_KEY=your_api_key_here`
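The `.env` loading step can be illustrated with a stdlib-only stand-in for `python-dotenv`'s `load_dotenv()`. This is a minimal sketch for illustration; the actual code simply calls `load_dotenv()` and reads the key from the environment:

```python
import os

def load_env_file(path=".env"):
    """Minimal stand-in for python-dotenv's load_dotenv():
    read KEY=value lines from a file into os.environ."""
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# After loading, the key is read the same way the real code reads it:
#   load_env_file()
#   api_key = os.environ.get("PROJECT_API_KEY")
```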
- The code records audio for a specified duration (5 seconds by default) using the PyAudio library.
- The recorded audio is then saved as an MP3 file named `input.mp3`.
- The `openai` library is used to transcribe the audio using the Whisper STT model.
- The transcribed text is stored in the `transcript` variable.
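The recording and transcription steps can be sketched as follows. This is an illustrative sketch, not the repository's exact code: the real code fills `frames` from a PyAudio input stream and converts the result to MP3 with PyDub, and `transcribe` assumes the current OpenAI SDK's `client.audio.transcriptions.create` interface:

```python
import wave

RATE = 44100       # samples per second
CHANNELS = 1
SAMPLE_WIDTH = 2   # bytes per sample (16-bit audio)
CHUNK = 1024       # frames read per loop iteration
SECONDS = 5        # default recording duration

# In the real code each chunk comes from stream.read(CHUNK) on a PyAudio
# input stream; silent frames keep this sketch runnable without a microphone.
frames = [b"\x00" * (CHUNK * SAMPLE_WIDTH * CHANNELS)
          for _ in range(int(RATE / CHUNK * SECONDS))]

with wave.open("input.wav", "wb") as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(SAMPLE_WIDTH)
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))  # PyDub would then convert this to input.mp3

def transcribe(client, audio_path):
    """Send the recorded file to Whisper and return the transcript text.
    `client` is an openai.OpenAI instance (assumed SDK v1 interface)."""
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(model="whisper-1", file=f)
    return result.text
```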
- The transcribed text is used as a prompt for the GPT-3 model to generate a response.
- The code sends a system message and user message to the GPT-3 model.
- The response generated by GPT-3 is extracted and printed to the console.
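The exchange with the chat model can be sketched as below. The system prompt and model name here are placeholder assumptions for illustration, not the repository's exact values:

```python
def build_messages(transcript):
    """Pair a fixed system message with the Whisper transcript.
    The system prompt below is a placeholder, not the repo's actual prompt."""
    return [
        {"role": "system", "content": "You are a helpful voice assistant."},
        {"role": "user", "content": transcript},
    ]

# With the OpenAI SDK the request then looks roughly like:
#   response = client.chat.completions.create(
#       model="gpt-3.5-turbo",                # assumed model name
#       messages=build_messages(transcript),
#   )
#   reply = response.choices[0].message.content
```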
- The generated response from GPT-3 is passed to the OpenAI TTS model for conversion.
- The TTS model generates an MP3 file named `blah.mp3` containing the synthesized speech.
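The TTS call can be sketched as a small helper, assuming the OpenAI Python SDK v1 interface; the `"tts-1"` model and `"alloy"` voice are assumed defaults, not necessarily the repository's choices:

```python
def synthesize_speech(client, text, out_path="blah.mp3"):
    """Request synthesized speech from OpenAI's TTS endpoint and save it
    as an MP3. `client` is an openai.OpenAI instance; the model and voice
    names are assumptions for illustration."""
    response = client.audio.speech.create(model="tts-1", voice="alloy", input=text)
    response.stream_to_file(out_path)
    return out_path
```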
- The Pygame library is used to play the synthesized speech.
- The code loads the `blah.mp3` file and plays it through the speakers.
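Playback with Pygame typically follows this pattern (a sketch; the dummy-audio-driver line lets it run on machines without a sound device and should be removed for real playback):

```python
import os
os.environ.setdefault("SDL_AUDIODRIVER", "dummy")  # headless-safe; remove for real speakers

import pygame

def play_mp3(path="blah.mp3"):
    """Load an MP3 with pygame's mixer and block until playback finishes."""
    pygame.mixer.init()
    pygame.mixer.music.load(path)
    pygame.mixer.music.play()
    clock = pygame.time.Clock()
    while pygame.mixer.music.get_busy():
        clock.tick(10)  # poll ~10 times per second instead of busy-waiting
    pygame.mixer.quit()
```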
- The generated audio files (`input.mp3` and `blah.mp3`) are created in the project directory and can be used as needed.
- Make sure your microphone is properly configured and connected to your computer to record audio.
- You can customize the recording parameters such as duration and audio format in the code.
This code is provided under the MIT License for personal and open-source use. Please refer to the license file for more details.
This code uses the OpenAI GPT-3 and Whisper models. Make sure to review OpenAI's usage policies and pricing details on their website.
Feel free to modify and extend this code as needed and provide proper attribution to OpenAI when using their models. Enjoy experimenting with Speech-to-Text and Text-to-Speech conversion!