A React hook that converts speech from the microphone to text and makes it available to your React components.
useSpeechRecognition is a React hook that gives a component access to a transcript of speech picked up from the user's microphone.
SpeechRecognition manages the global state of the Web Speech API, exposing functions to turn the microphone on and off.
Under the hood, it uses the Web Speech API. Note that browser support for this API is currently limited, with Chrome offering the best experience; see supported browsers for more information.
This version requires React 16.8 so that React hooks can be used. If you're used to version 2.x of react-speech-recognition or want to use an older version of React, you can see the old README here. If you want to migrate to version 3.x, see the migration guide here.
- Basic example
- Supported browsers
- API docs
- Version 3 migration guide
- TypeScript declaration file in DefinitelyTyped
To install:
npm install --save react-speech-recognition
To import in your React code:
import SpeechRecognition, { useSpeechRecognition } from 'react-speech-recognition'
The most basic example of a component using this hook would be:
import React from 'react'
import SpeechRecognition, { useSpeechRecognition } from 'react-speech-recognition'

const Dictaphone = () => {
  const { transcript, resetTranscript } = useSpeechRecognition()

  if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
    return null
  }

  return (
    <div>
      <button onClick={SpeechRecognition.startListening}>Start</button>
      <button onClick={SpeechRecognition.stopListening}>Stop</button>
      <button onClick={resetTranscript}>Reset</button>
      <p>{transcript}</p>
    </div>
  )
}

export default Dictaphone
You can see more examples in the example React app attached to this repo. See Developing.
Currently, this feature is not supported in all browsers, with the best experience being available on desktop Chrome. However, it fails gracefully on other browsers. It is recommended that you render some fallback content if it is not supported by the user's browser:
if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
  // Render some fallback content
}
As of June 2020, the following browsers support the Web Speech API:
- Chrome (desktop): this is by far the smoothest experience
- Microsoft Edge
- Chrome (Android): be warned that there can be an annoying beeping sound when turning the microphone on. This is part of the Android OS and cannot be controlled from the browser
- Android webview
- Samsung Internet
For all other browsers, you can render fallback content using the SpeechRecognition.browserSupportsSpeechRecognition function described above.
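For context, a support check of this kind usually comes down to looking for a recognition constructor on the global object. The sketch below is an assumption about how such a check can be written against the Web Speech API, not the library's own implementation. hasSpeechRecognition is a hypothetical helper; inside a component you would call SpeechRecognition.browserSupportsSpeechRecognition() instead.

```javascript
// Hypothetical helper (not part of react-speech-recognition): reports whether
// the given global object exposes a Web Speech API recognition constructor.
// Chrome exposes it under the prefixed name webkitSpeechRecognition.
function hasSpeechRecognition (globalObject) {
  return Boolean(
    globalObject &&
      (globalObject.SpeechRecognition || globalObject.webkitSpeechRecognition)
  )
}

// In a browser you would pass window; outside a browser there is nothing to check.
const speechSupported = hasSpeechRecognition(
  typeof window === 'undefined' ? undefined : window
)
```

Passing the global object in as a parameter keeps the helper easy to exercise with a stub when no browser is available.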
Before consuming the transcript, you should be familiar with SpeechRecognition, which gives you control over the microphone. The state of the microphone is global, so any functions you call on this object will affect all components using useSpeechRecognition.
To start listening to speech, call the startListening function.
SpeechRecognition.startListening()
This is an asynchronous function, so it will need to be awaited if you want to do something after the microphone has been turned on.
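As a minimal sketch of what awaiting looks like in practice, the function below waits for startListening before running a follow-up step. To keep the snippet free of imports, the SpeechRecognition object is passed in as the recognition parameter; startThenNotify and onMicrophoneOn are hypothetical names used only for illustration.

```javascript
// Sketch only: `recognition` stands in for the SpeechRecognition object exported
// by react-speech-recognition. `onMicrophoneOn` is a hypothetical callback
// supplied by the caller.
async function startThenNotify (recognition, onMicrophoneOn) {
  // startListening is asynchronous; awaiting it holds back the next line
  // until its promise has settled.
  await recognition.startListening()
  onMicrophoneOn()
}
```

Without the await, onMicrophoneOn could run before the microphone is actually on.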
To turn the microphone off, but still finish processing any speech in progress, call stopListening.
SpeechRecognition.stopListening()
To turn the microphone off, and cancel the processing of any speech in progress, call abortListening.
SpeechRecognition.abortListening()
To make the microphone transcript available in your component, simply add:
const { transcript } = useSpeechRecognition()
To set the transcript to an empty string, you can call the resetTranscript function provided by useSpeechRecognition. Note that this is local to your component and does not affect any other components using Speech Recognition.
const { resetTranscript } = useSpeechRecognition()
To respond when the user says a particular phrase, you can pass in a list of commands to the useSpeechRecognition hook. Each command is an object with the following properties:
- command: This is a string or RegExp representing the phrase you want to listen for
- callback: The function that is executed when the command is spoken. The last argument that this function receives will always be an object containing the following properties:
  - resetTranscript: A function that sets the transcript to an empty string
- matchInterim: Boolean that determines whether "interim" results should be matched against the command. This will make your component respond faster to commands, but also makes false positives more likely - i.e. the command may be detected when it is not spoken. This is false by default and should only be set for simple commands.
- isFuzzyMatch: Boolean that determines whether the comparison between speech and command is based on similarity rather than an exact match. Fuzzy matching is useful for commands that are easy to mispronounce or likely to be misinterpreted by the Speech Recognition engine (e.g. names of places, sports teams, restaurant menu items). It is intended for commands that are string literals without special characters. If command is a string with special characters or a RegExp, it will be converted to a string without special characters when fuzzy matching. The similarity that is needed to match the command can be configured with fuzzyMatchingThreshold. isFuzzyMatch is false by default. When it is set to true, it will pass four arguments to callback:
  - The value of command
  - The speech that matched command
  - The similarity between command and the speech
  - The object mentioned in the callback description above
- fuzzyMatchingThreshold: If the similarity of speech to command is higher than this value when isFuzzyMatch is turned on, the callback will be invoked. You should set this only if isFuzzyMatch is true. It takes values between 0 (will match anything) and 1 (needs an exact match). The default value is 0.8.
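This README does not say how the similarity score is computed. One common measure for short phrases is the Sørensen-Dice coefficient over character bigrams, and the sketch below assumes that measure purely for illustration. bigrams, diceSimilarity and passesThreshold are hypothetical names, not part of the library's API. Under that assumption, 'Beijing' and 'Benji' score 0.4, which agrees with the 40% quoted in the command example in this README.

```javascript
// Collect the two-character substrings of a phrase, ignoring case and whitespace.
function bigrams (text) {
  const normalized = text.toLowerCase().replace(/\s+/g, '')
  const pairs = []
  for (let i = 0; i < normalized.length - 1; i++) {
    pairs.push(normalized.slice(i, i + 2))
  }
  return pairs
}

// Sørensen-Dice coefficient: 2 * shared bigrams / total bigrams, from 0 to 1.
function diceSimilarity (first, second) {
  const firstPairs = bigrams(first)
  const secondPairs = bigrams(second)
  const total = firstPairs.length + secondPairs.length
  if (total === 0) return 0 // phrases shorter than two characters have no bigrams
  let shared = 0
  for (const pair of firstPairs) {
    const index = secondPairs.indexOf(pair)
    if (index !== -1) {
      shared++
      secondPairs.splice(index, 1) // each bigram may only be matched once
    }
  }
  return (2 * shared) / total
}

// Assumption: >= is used so that a threshold of 1 still accepts an exact match.
function passesThreshold (command, speech, fuzzyMatchingThreshold = 0.8) {
  return diceSimilarity(command, speech) >= fuzzyMatchingThreshold
}
```

With these definitions, passesThreshold('Beijing', 'Benji', 0.2) accepts the phrase, while the default threshold of 0.8 rejects it.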
To make commands easier to write, the following symbols are supported:
- Splats: this is just a * and will match multi-word text:
  - Example: 'I would like to order *'
  - The words that match the splat will be passed into the callback, one argument per splat
- Named variables: this is written :<name> and will match a single word:
  - Example: 'I am :height metres tall'
  - The one word that matches the named variable will be passed into the callback
- Optional words: this is a phrase wrapped in parentheses ( and ), and is not required to match the command:
  - Example: 'Pass the salt (please)'
  - The above example would match both 'Pass the salt' and 'Pass the salt please'
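To make the three symbols concrete, here is a rough sketch of how a command string could be translated into a regular expression. It is an illustration under stated assumptions, not the library's actual matching code: patternToRegExp and matchCommand are hypothetical helpers, and case-insensitive matching is an assumption.

```javascript
// Hypothetical helper: turn a command string into an anchored RegExp.
function patternToRegExp (command) {
  // Escape regex metacharacters, leaving ( ) * and : for the step below.
  const escaped = command.replace(/[-{}[\]+?.,\\^$|#]/g, '\\$&')
  // Handle all three symbols in one pass so the replacements cannot interfere.
  const body = escaped.replace(
    /\s*\(([^)]*)\)\s*|:\w+|\*/g,
    (token, optionalWords) => {
      // Optional words: a non-capturing group that may be absent.
      if (optionalWords !== undefined) return `\\s*(?:${optionalWords})?\\s*`
      // Splat: capture any run of text, possibly several words.
      if (token === '*') return '(.*?)'
      // Named variable: capture exactly one word.
      return '(\\S+)'
    }
  )
  return new RegExp(`^${body}$`, 'i') // assumption: matching ignores case
}

// Returns the captured words (one entry per splat or named variable),
// or null when the speech does not match the command.
function matchCommand (command, speech) {
  const result = patternToRegExp(command).exec(speech.trim())
  return result ? result.slice(1) : null
}
```

For instance, matchCommand('My top sports are * and *', 'My top sports are football and rugby') yields one captured value per splat, which mirrors how the callback receives one argument per splat.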
import React, { useState } from 'react'
import SpeechRecognition, { useSpeechRecognition } from 'react-speech-recognition'

const Dictaphone = () => {
  const [message, setMessage] = useState('')
  const commands = [
    {
      command: 'I would like to order *',
      callback: (food) => setMessage(`Your order is for: ${food}`)
    },
    {
      command: 'The weather is :condition today',
      callback: (condition) => setMessage(`Today, the weather is ${condition}`)
    },
    {
      command: 'My top sports are * and *',
      callback: (sport1, sport2) => setMessage(`#1: ${sport1}, #2: ${sport2}`)
    },
    {
      command: 'Pass the salt (please)',
      callback: () => setMessage('My pleasure')
    },
    {
      command: 'Hello',
      callback: () => setMessage('Hi there!'),
      matchInterim: true
    },
    {
      command: 'Beijing',
      callback: (command, spokenPhrase, similarityRatio) => setMessage(`${command} and ${spokenPhrase} are ${similarityRatio * 100}% similar`),
      // If the spokenPhrase is "Benji", the message would be "Beijing and Benji are 40% similar"
      isFuzzyMatch: true,
      fuzzyMatchingThreshold: 0.2
    },
    {
      command: 'clear',
      callback: ({ resetTranscript }) => resetTranscript()
    }
  ]

  const { transcript } = useSpeechRecognition({ commands })

  if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
    return null
  }

  return (
    <div>
      <p>{message}</p>
      <p>{transcript}</p>
    </div>
  )
}

export default Dictaphone
By default, the microphone will stop listening when the user stops speaking. This reflects the approach taken by "press to talk" buttons on modern devices.
If you want to listen continuously, set the continuous property to true when calling startListening. The microphone will continue to listen, even after the user has stopped speaking.
SpeechRecognition.startListening({ continuous: true })
To listen for a specific language, you can pass a language tag (e.g. 'zh-CN' for Chinese) when calling startListening. See here for a list of supported languages.
SpeechRecognition.startListening({ language: 'zh-CN' })
Unfortunately, speech recognition will not function in Chrome when offline. According to the Web Speech API docs: "On Chrome, using Speech Recognition on a web page involves a server-based recognition engine. Your audio is sent to a web service for recognition processing, so it won't work offline."
If you are building an offline web app, you can detect when the browser is offline by inspecting the value of navigator.onLine. If it is true, you can render the transcript generated by React Speech Recognition. If it is false, it's advisable to render offline fallback content that signifies that speech recognition is disabled. The online/offline API is simple to use - you can read how to use it here.
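One possible way to keep track of connectivity is to read navigator.onLine once and then listen for the browser's online and offline events. The sketch below assumes that approach; watchOnlineStatus is a hypothetical helper rather than part of this library, and the event target and navigator are passed in as parameters so it can be tried without a browser.

```javascript
// Hypothetical helper: calls onChange(true | false) with the current
// connectivity and again whenever the browser goes online or offline.
// In a browser, `target` would be window and `nav` would be navigator.
function watchOnlineStatus (target, nav, onChange) {
  const report = () => onChange(nav.onLine)
  target.addEventListener('online', report)
  target.addEventListener('offline', report)
  report() // report the current state straight away

  // Return a cleanup function, e.g. to return from a React effect.
  return () => {
    target.removeEventListener('online', report)
    target.removeEventListener('offline', report)
  }
}
```

A component could store the reported value in state and render the transcript when it is true, or offline fallback content when it is false.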
You can run an example React app that uses react-speech-recognition with:
npm i
npm run dev
On http://localhost:3000, you'll be able to speak into the microphone and see your speech as text on the web page. There are also controls for turning speech recognition on and off. You can make changes to the web app itself in the example directory. Any changes you make to the web app or react-speech-recognition itself will be live reloaded in the browser.
View the API docs here or follow the guide above to learn how to use react-speech-recognition.
MIT