For years, many users have struggled with the inconsistent performance of macOS's built-in dictation feature. This tool aims to solve that problem by providing a reliable, customizable, and powerful alternative using OpenAI's Whisper model for speech-to-text transcription.
A macOS dictation tool that uses OpenAI's Whisper model for speech-to-text transcription. This tool allows you to dictate text using your microphone and have it transcribed and pasted into the active application on your Mac.
Installation time: 5 to 10 minutes
- Real-time dictation using OpenAI's Whisper model.
- Customizable models: Choose from different Whisper model sizes for varying accuracy and performance.
- Keyboard shortcut activation: Start and stop dictation with a key press.
- Automatic pasting: Transcribed text is automatically pasted into the active application.
- Easy startup: Create a clickable application to start the dictation tool from your desktop.
- Run at startup: Optionally configure the tool to run automatically when you log in.
- Prerequisites
- Installation
- Usage
- Creating a Clickable Application
- Running the Tool at Startup
- Configuration
- Troubleshooting
- Contributing
- License
- Acknowledgments
- Planned Future Features
- Issues and Feature Requests
- macOS: This tool is designed to run on macOS systems.
- Python 3.10 or higher: Ensure you have Python installed.
- Homebrew: Recommended for installing dependencies.
git clone https://github.com/tristancgardner/macos-dictate.git
cd macos-dictate
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew update
brew install portaudio
brew install ffmpeg
This is required for Whisper to process audio files.
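Whisper runs the ffmpeg command-line tool when it loads audio from a file, so ffmpeg has to be on your PATH. The short check below is a sketch for confirming that. ffmpeg_path is a hypothetical helper, not part of dictate.py.

```python
import shutil
from typing import Optional


def ffmpeg_path() -> Optional[str]:
    """Return the full path to the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = ffmpeg_path()
    if path is None:
        print("ffmpeg not found: run `brew install ffmpeg`")
    else:
        print(f"ffmpeg found at {path}")
```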
Download and install Python from the official website:
Verify the installation:
python3 --version
Ensure it shows Python 3.10 or higher.
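If you prefer to check from Python itself (for example, in a setup script), here is a minimal sketch. meets_minimum is a hypothetical helper; 3.10 is the minimum stated in the prerequisites.

```python
import sys

MIN_VERSION = (3, 10)  # minimum Python version from the prerequisites


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old, install 3.10 or higher"
    print(f"Python {current}: {status}")
```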
Create a virtual environment using the built-in venv module:
python3 -m venv venv
Activate the virtual environment:
source venv/bin/activate
pip install --upgrade pip
Install the necessary Python packages, including torch, sounddevice, numpy, and pyobjc:
pip install torch sounddevice numpy pyobjc
Install the latest version of OpenAI Whisper directly from the GitHub repository to get the most recent ASR updates (rather than the PyPI release installed with pip install openai-whisper):
pip install git+https://github.com/openai/whisper.git
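To confirm the installs worked, you can ask Python whether each package can be found without actually importing it. This is a sketch with a hypothetical missing_modules helper. pyobjc is imported as objc and the Whisper package as whisper, so those are the names checked.

```python
import importlib.util

# Import names for the packages installed above (pyobjc -> "objc").
REQUIRED_MODULES = ["torch", "sounddevice", "numpy", "objc", "whisper"]


def missing_modules(names=REQUIRED_MODULES) -> list:
    """Return the top-level module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```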
Grant the accessibility and microphone permissions below. (You can skip this step and enable the permissions later, once you run the application.)
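- Go to System Preferences > Security & Privacy > Privacy > Accessibility (on macOS 13 Ventura and later: System Settings > Privacy & Security > Accessibility).
- Click the lock to make changes and enter your password.
- Add your Terminal application (e.g., iTerm, Terminal) or Python interpreter to the list and ensure it's checked.
- Go to System Preferences > Security & Privacy > Privacy > Microphone (on macOS 13 Ventura and later: System Settings > Privacy & Security > Microphone).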
- Add your Terminal application or Python interpreter to the list and ensure it's checked.
Ensure your virtual environment is activated:
source venv/bin/activate
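If you're unsure whether the environment is active, the check below compares sys.prefix with sys.base_prefix, which differ when a venv-created virtual environment is active. in_virtualenv is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a virtual environment created by venv.

    venv points sys.prefix at the environment directory, while
    sys.base_prefix keeps pointing at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active: run `source venv/bin/activate`")
```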
Run the script with your desired model size:
python dictate.py --model base
Replace base with the desired Whisper model size (tiny, base, small, medium, or large).
If you omit the --model flag, the base model is used by default.
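The --model option behaves like a standard command-line flag with a fixed set of choices and base as the default. The sketch below shows one way such a flag could be declared with argparse. It illustrates the behavior described above and is not necessarily how dictate.py parses its arguments; build_parser is a hypothetical name.

```python
import argparse

# Model sizes listed in this README; "base" is the documented default.
MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Dictate text with OpenAI Whisper.")
    parser.add_argument(
        "--model",
        choices=MODEL_SIZES,  # any other value makes argparse exit with an error
        default="base",
        help="Whisper model size (default: base)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Using Whisper model: {args.model}")
```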
The base model uses roughly 400 to 500 MB of RAM; the small model uses roughly 400 to 500 MB more. Use Activity Monitor to watch RAM and CPU usage and pick a model that doesn't noticeably slow your system when left running persistently.
- Press the F1 key (the default trigger key) to start recording. If your function keys still control macOS features such as brightness and volume (the symbols printed on the keys), press fn + F1 instead.
- You will receive a notification indicating that recording has started.
You can customize the trigger key for starting and stopping dictation. For instructions on how to change the trigger key, refer to the Change the Trigger Key section in the Configuration part of this document.
- Press the F1 key (default trigger key) again to stop recording.
- You will receive a notification indicating that recording has stopped.
- The transcribed text will be automatically pasted into the active application.
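The start/stop cycle above is a two-state toggle: the first press starts recording, and the second press stops it and hands the captured audio to transcription and pasting. The sketch below models only that control flow. DictationToggle, transcribe, and paste are hypothetical names; the two callbacks stand in for the real Whisper and paste code in dictate.py, and the returned strings are placeholders for the notifications.

```python
from typing import Callable, List


class DictationToggle:
    """Minimal model of the press-to-start / press-to-stop cycle."""

    def __init__(self, transcribe: Callable[[List[float]], str], paste: Callable[[str], None]):
        self.transcribe = transcribe  # stand-in for Whisper transcription
        self.paste = paste            # stand-in for pasting into the active app
        self.recording = False
        self.buffer: List[float] = []

    def on_audio(self, samples: List[float]) -> None:
        # Audio is only kept while recording is active.
        if self.recording:
            self.buffer.extend(samples)

    def toggle(self) -> str:
        if not self.recording:
            self.recording = True
            self.buffer = []
            return "Recording started"
        # Second press: stop, transcribe what was captured, and paste the text.
        self.recording = False
        text = self.transcribe(self.buffer)
        self.paste(text)
        return "Recording stopped"


if __name__ == "__main__":
    pasted = []
    d = DictationToggle(transcribe=lambda audio: f"{len(audio)} samples", paste=pasted.append)
    print(d.toggle())
    d.on_audio([0.0, 0.1, 0.2])
    print(d.toggle())
    print(pasted)
```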
You can create an application that you can double-click to start the dictation tool without opening Terminal manually.
- Open Script Editor by searching for it in Spotlight (press Cmd + Space and type "Script Editor").
Paste the following script into the editor:
-- Prompt the user to choose a Whisper model size
set modelSizes to {"tiny", "base", "small", "medium", "large"}
set defaultModel to "base"
set chosenModel to choose from list modelSizes with prompt "Select Whisper model size:" default items defaultModel
if chosenModel is false then
display alert "No model selected. Exiting."
return
end if
set modelSize to item 1 of chosenModel
-- Path to your virtual environment activation script
set venvPath to "/Users/yourusername/path/to/your/venv/bin/activate"
-- Path to your Python script
set scriptPath to "/Users/yourusername/path/to/your/macos-dictate/dictate.py"
-- Build the command to run ("quoted form of" keeps paths containing spaces intact)
set shellCommand to "source " & quoted form of venvPath & " && python " & quoted form of scriptPath & " --model " & modelSize
-- Run the command in Terminal
tell application "Terminal"
activate
do script shellCommand
end tell
- Go to File > Export.
- Set File Format to Application.
- Name it Start Dictation.
- Choose Desktop as the location.
- Click Save.
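The shell command that the script builds has to survive paths containing spaces (for example, a folder named "My Projects"). In AppleScript that is the job of "quoted form of"; the equivalent in Python is shlex.quote. The sketch below uses a hypothetical build_command helper that mirrors the script's shellCommand line, which is handy if you ever launch the tool from Python instead.

```python
import shlex


def build_command(venv_activate: str, script_path: str, model_size: str) -> str:
    """Build the same one-line shell command as the AppleScript, with each argument quoted."""
    return (
        f"source {shlex.quote(venv_activate)} && "
        f"python {shlex.quote(script_path)} --model {shlex.quote(model_size)}"
    )


if __name__ == "__main__":
    print(build_command(
        "/Users/yourusername/My Projects/venv/bin/activate",
        "/Users/yourusername/My Projects/macos-dictate/dictate.py",
        "base",
    ))
```

shlex.quote leaves simple values such as base untouched and wraps anything containing spaces or shell metacharacters in single quotes.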
Instructions:
- When you double-click the app on your desktop, Terminal opens. Dictation is ready to use once you see output ending with the following line:
"...ting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature. > checkpoint = torch.load(fp, map_location=device)"
- For now, you can ignore any warnings about torch or about malicious content loaded via pickle.
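The message quoted above appears to be the FutureWarning that recent PyTorch releases emit when torch.load is called without weights_only=True, which is how Whisper loads its checkpoint. If you'd rather not see it, a filter like the one below could be added near the top of dictate.py, before the model is loaded. This is an assumption about where you would put it, not something the script does today; it only hides the message and does not change how the checkpoint is loaded. suppress_torch_load_warning is a hypothetical name.

```python
import warnings


def suppress_torch_load_warning() -> None:
    """Ignore the FutureWarning that torch.load emits about the weights_only default.

    Only FutureWarnings whose message mentions "weights_only" are hidden;
    all other warnings are still shown.
    """
    warnings.filterwarnings(
        "ignore",
        message=r".*weights_only",  # regex matched against the start of the message
        category=FutureWarning,
    )


# Call once at startup, before the Whisper model is loaded.
suppress_torch_load_warning()
```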
You can configure the dictation tool to run automatically when you log in.
Note: Be cautious when setting scripts to run at startup. Ensure that the script does not require user interaction at startup, or it may hinder the login process.
- Create a Launch Agent: create a property list file (.plist) in the ~/Library/LaunchAgents directory.
- Create the .plist file. Open Terminal and run:
touch ~/Library/LaunchAgents/com.yourusername.macos-dictate.plist
- Edit the .plist file. Open it in a text editor:
open -e ~/Library/LaunchAgents/com.yourusername.macos-dictate.plist
Paste the following content:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.yourusername.macos-dictate</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>-c</string>
        <string>source /path/to/your/venv/bin/activate &amp;&amp; python /path/to/your/dictate.py --model base</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/tmp/macos-dictate.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/macos-dictate.error.log</string>
</dict>
</plist>
Note that the shell's && is written as &amp;&amp; because a bare & is not allowed in XML; without the escape the .plist will not parse.
Replace /path/to/your/venv/bin/activate and /path/to/your/dictate.py with the actual paths.
- Load the Launch Agent:
launchctl load ~/Library/LaunchAgents/com.yourusername.macos-dictate.plist
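As an alternative to creating and editing the .plist by hand, Python's standard plistlib module can generate it and takes care of XML escaping (such as &&) for you. The sketch below reproduces the keys from the example above; launch_agent is a hypothetical helper, and the label and both paths are placeholders you must change. Write the output to the Launch Agent file, then load it with the launchctl load command above.

```python
import plistlib
import shlex

LABEL = "com.yourusername.macos-dictate"  # placeholder label


def launch_agent(label: str, venv_activate: str, script: str, model: str = "base") -> dict:
    """Return the Launch Agent settings from the XML example as a Python dict."""
    command = (
        f"source {shlex.quote(venv_activate)} && "
        f"python {shlex.quote(script)} --model {shlex.quote(model)}"
    )
    return {
        "Label": label,
        "ProgramArguments": ["/bin/bash", "-c", command],
        "RunAtLoad": True,
        "KeepAlive": True,
        "StandardOutPath": "/tmp/macos-dictate.log",
        "StandardErrorPath": "/tmp/macos-dictate.error.log",
    }


if __name__ == "__main__":
    # Placeholder paths: replace them before using the output.
    config = launch_agent(LABEL, "/path/to/your/venv/bin/activate", "/path/to/your/dictate.py")
    # plistlib writes XML by default and escapes "&&" as "&amp;&amp;".
    print(plistlib.dumps(config).decode("utf-8"), end="")
```

For example, if you saved it as make_plist.py (a hypothetical filename): python make_plist.py > ~/Library/LaunchAgents/com.yourusername.macos-dictate.plist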
Important Notes:
- User Interaction: Since the script may require user interaction (e.g., pressing the trigger key), ensure it doesn't block the startup process.
- Logging: Output and errors are logged to /tmp/macos-dictate.log and /tmp/macos-dictate.error.log, respectively.
- Unloading the Launch Agent: To stop the script from running at startup:
launchctl unload ~/Library/LaunchAgents/com.yourusername.macos-dictate.plist
To change the key that starts and stops dictation, modify the keycode in the dictate.py script:
# For example, to use F5 (keycode 96)
if keycode == 96:
toggle_recording()
return None # Suppress the event to prevent system beep
Save the file, then restart the script so the new trigger key takes effect.
Refer to macOS Virtual Keycodes for keycode values.
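For quick reference, the function-key values below are the kVK_F1 to kVK_F12 constants from Carbon's HIToolbox Events.h (F5 = 96 matches the example above). handle_key is a hypothetical, simplified stand-in for the check inside dictate.py's keyboard callback: it toggles recording and returns None to swallow the trigger key, and passes every other event through unchanged.

```python
from typing import Any, Callable, Optional

# macOS virtual keycodes for the function keys (kVK_F1 ... kVK_F12 in HIToolbox/Events.h).
FUNCTION_KEYCODES = {
    "F1": 122, "F2": 120, "F3": 99, "F4": 118,
    "F5": 96, "F6": 97, "F7": 98, "F8": 100,
    "F9": 101, "F10": 109, "F11": 103, "F12": 111,
}

TRIGGER_KEYCODE = FUNCTION_KEYCODES["F1"]  # default trigger key per this README


def handle_key(keycode: int, event: Any, toggle_recording: Callable[[], None],
               trigger: int = TRIGGER_KEYCODE) -> Optional[Any]:
    """Toggle recording on the trigger key and swallow it; pass other events through."""
    if keycode == trigger:
        toggle_recording()
        return None  # suppress the event so macOS doesn't also act on it (e.g., beep)
    return event
```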
- Use a larger model size (e.g., small, medium, or large) when running the script.
- Ensure you're in a quiet environment with minimal background noise.
- Use a high-quality microphone.
- Accessibility Permissions: Ensure your Terminal application or Python interpreter has the necessary permissions under System Preferences > Security & Privacy > Privacy.
- Microphone Permissions: Ensure your Terminal application or Python interpreter is allowed to access the microphone.
- PortAudio Errors: If you encounter errors related to PortAudio or sounddevice, ensure that portaudio is installed via Homebrew and reinstall sounddevice:
brew install portaudio
pip uninstall sounddevice
pip install sounddevice
- No Text Pasted: Ensure the active application accepts paste commands and is not blocking automated inputs.
- Script Not Running at Startup: Check the contents of your .plist file for correctness and verify that the paths are accurate.
- Application Not Opening: If the application created via AppleScript or Automator doesn't open, ensure that the script paths are correct and that you have execution permissions.
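To tell apart the PortAudio and sounddevice problems listed above, a small diagnostic like the one below can help. It is a sketch: check_audio_backend is a hypothetical helper, and the exact exception messages depend on your sounddevice version. sounddevice raises OSError when it cannot load the PortAudio library, and its query_devices(kind="input") call reports the default input device.

```python
def check_audio_backend() -> str:
    """Return a one-line diagnosis of the sounddevice / PortAudio setup."""
    try:
        import sounddevice as sd
    except ImportError:
        return "sounddevice is not installed: run `pip install sounddevice`"
    except OSError as exc:
        # Raised when the PortAudio shared library cannot be found or loaded.
        return f"PortAudio could not be loaded ({exc}): run `brew install portaudio`"
    try:
        device = sd.query_devices(kind="input")  # info dict for the default input device
    except (sd.PortAudioError, ValueError) as exc:
        return f"PortAudio loaded, but no usable input device was found ({exc})"
    return f"Default input device: {device['name']}"


if __name__ == "__main__":
    print(check_audio_backend())
```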
We're constantly working to improve the macOS Dictation Tool. Here are some features we're planning to implement in future updates:
- Responsive cursor updates for certain keywords like "New Line" or "New Paragraph"
- Custom voice commands for text formatting (e.g., "Bold this", "Italicize that")
- Real-time transcription display with on-the-fly editing
- Better UI/UX for dictation settings including always-on-top indicator and top-bar icon
- Extended punctuation auto-correction and smart capitalization abilities
- Multi-language support with language detection
- User-defined custom vocabulary and acronym expansion
- Voice-activated undo and redo functionality
- Master log of all dictations saved to file for any use: training on your own speech patterns, etc.
- Customizable noise cancellation and audio filtering options
We're excited about these upcoming improvements and welcome any suggestions for additional features!
We welcome feedback, bug reports, and feature requests! If you encounter any problems or have ideas for improvements, please use our GitHub issue tracker:
- For bug reports: Submit an issue
- For feature requests: Submit a feature request
When submitting an issue, please provide as much detail as possible, including:
- Steps to reproduce the problem
- Expected behavior
- Actual behavior
- Your operating system version
- Your Python version
- Any relevant error messages or screenshots
Your contributions help make this tool better for everyone. Thank you for your support!
Contributions are welcome! Please fork the repository and create a pull request with your changes.
This project is licensed under the MIT License. See the LICENSE file for details.
- OpenAI Whisper
- PyObjC
- PortAudio
- Homebrew
- Community contributions and support.