%% Use a Python Pretrained Audio AI System in MATLAB
%% This Example Shows How to:
%%
% * Execute a pretrained Python speech command recognition system in MATLAB.
% * Convert the Python system to a MATLAB system where Python is not required.
% * Use the MATLAB speech command recognition system in Simulink.
% * Generate C code from the MATLAB or Simulink system, and deploy it to a Raspberry
% Pi device.
%% Overview
% You can access deep learning models trained in non-MATLAB frameworks from
% within MATLAB in several ways, including:
%%
% * Co-executing models from other frameworks with MATLAB.
% * Converting models from other frameworks into MATLAB.
%%
% This example provides an overview of both approaches using a Python speech
% command recognition system as a starting point. In addition to a pretrained convolutional
% network, the system also includes audio feature extraction. You access both
% the neural network and the feature extraction in MATLAB.
%
% You use the converted command recognition system in both MATLAB and Simulink.
% You also learn how to generate C code from the system and deploy it to a Raspberry
% Pi device.
%% Requirements
%%
% * <https://www.mathworks.com/ MATLAB®> R2021b or later
% * <https://www.mathworks.com/products/deep-learning.html Deep Learning Toolbox™>
% * <https://www.mathworks.com/products/audio.html Audio Toolbox™>
%%
% The Python code uses the following packages:
%%
% * Librosa™ version 0.8.1
% * PyTorch™ version 1.10.2
%% System Description
% In this example, you start with a deep learning speech command recognition
% system that was trained in Python.
%
% The system recognizes the following commands:
%%
% * "yes"
% * "no"
% * "up"
% * "down"
% * "left"
% * "right"
% * "on"
% * "off"
% * "stop"
% * "go"
%%
% At the core of the system is a convolutional neural network. The network accepts
% auditory spectrograms as input. Auditory spectrograms are time-frequency representations
% of speech derived from the raw (time-domain) audio signal.
%
% The network was trained with a supervised learning approach: auditory spectrograms
% labeled with their commands were fed to the network.
%
%
% The following were used to train the command recognition system:
%%
% * *PyTorch* to design and train the model.
% * *Librosa* to perform feature extraction (auditory spectrogram computation).
%%
% You perform speech recognition in Python by first extracting an auditory spectrogram
% from an audio signal, and then feeding the spectrogram to the trained convolutional
% network.
%
%
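%% Sketch: From Raw Audio to a Time-Frequency Representation
% To make the idea of a spectrogram derived from the raw audio signal concrete,
% the following minimal sketch uses base MATLAB only. It cuts a signal into
% overlapping, Hann-windowed frames and computes the power of each frame's FFT.
% This is a simplified stand-in, not the Librosa or Audio Toolbox implementation:
% the mel filtering step is omitted, and the frame parameters (512-sample window,
% 160-sample hop, 1 s of 16 kHz white noise) are illustrative assumptions.
fsDemo = 16000;                                   % assumed sample rate (Hz)
xDemo = randn(fsDemo,1);                          % 1 s of white noise as a placeholder signal
winLen = 512;                                     % window length (samples)
hopLen = 160;                                     % hop length (samples)
win = 0.5 - 0.5*cos(2*pi*(0:winLen-1)'/winLen);   % periodic Hann window (column)
numFrames = floor((numel(xDemo) - winLen)/hopLen) + 1;  % full frames only, no padding
P = zeros(winLen/2 + 1, numFrames);               % one-sided power spectrogram (bins x frames)
for k = 1:numFrames
    idx = (k-1)*hopLen + (1:winLen);              % sample indices of frame k
    X = fft(xDemo(idx).*win);                     % spectrum of the windowed frame
    P(:,k) = abs(X(1:winLen/2+1)).^2;             % keep the non-negative frequencies
end
size(P)
% Each column of P describes one short time slice of the signal. A mel
% spectrogram additionally maps the frequency bins onto a smaller set of
% perceptually spaced bands.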
%% Using the Python System in MATLAB: Two Approaches
% You can use the Python system in MATLAB with co-execution. This approach allows
% you to access variables computed by running Python code in MATLAB, including
% the predicted speech command, the network activations, and the computed auditory
% spectrogram. This approach enables you to test the network in MATLAB (for example,
% by plotting the results, or using them to compute quality metrics implemented
% in MATLAB).
%
% Co-execution does not address all workflows. For example, if you want to leverage
% MATLAB's C code generation capabilities to deploy the system to an embedded
% target, you must first convert the Python system to a MATLAB version. To do this, you translate
% both feature extraction and the neural network to MATLAB. You then generate
% code from the MATLAB command recognition system.
%
% You go through both approaches in the next sections.
%% Supporting Python Files
% The deep learning model is defined in <./PythonCode\speechCommandModel.py
% |speechCommandModel.py|>.
%
% The script <./PythonCode\InferSpeechCommands.py |InferSpeechCommands.py|>
% performs command recognition using the pretrained system.
%% Running the System in MATLAB
% First, add the required files to the path.
addpath("HelperFiles")
addpath("samples")
%%
% You can call Python from MATLAB. For more information about this functionality,
% see <https://www.mathworks.com/help/matlab/call-python-libraries.html |Call Python from MATLAB|>
% in the documentation. In this example, you use <https://www.mathworks.com/help/matlab/ref/pyrunfile.html
% |pyrunfile|> to run a Python script from MATLAB.
%
% The Python script <./PythonCode\InferSpeechCommands.py |InferSpeechCommands.py|>
% performs speech command recognition. Instead of defining the sample audio file
% path in Python, you define it in MATLAB and pass it to Python. The argument name
% |filename| is the name of the variable that the Python script uses for the audio file path.
%
% Run Python inference from MATLAB. The Python script prints the recognized
% command.
command = fullfile(pwd,"samples","yes.flac");
cd("PythonCode")
pyrunfile("InferSpeechCommands.py",filename=command)
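%% Sketch: Passing Variables by Name with pyrun
% The same name-based exchange of variables can be tried on a single Python
% statement with |pyrun|, which was introduced together with |pyrunfile| in
% R2021b. In this minimal sketch, the MATLAB values passed as name-value arguments
% become the Python variables |b| and |c|, and the Python variable named in the
% second argument is returned to MATLAB. The variable names are arbitrary and
% chosen only for illustration; a configured Python environment is assumed.
product = pyrun("a = b*c","a",b=5,c=10)   % Python computes a = 5*10 and returns it
% With |pyrunfile|, the mechanism is the same, except that the code comes from a
% script file instead of a string.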
%%
% It is also possible to return Python variables to MATLAB. Call |InferSpeechCommands|
% again, and return the following:
%%
% * The mel spectrogram (computed by *Librosa*).
% * The network activations.
% * The parameter-value pairs passed to Librosa's mel spectrogram function.
[LibrosaMelSpectrogram,PyTorchActivations,MelSpectrogramArgs] = pyrunfile("InferSpeechCommands.py",...
["mel_spectrogram","z","args"],filename=command);
cd ..
%%
% Notice that the output arguments are Python data types:
PyTorchActivations
%%
% Convert the activations to a MATLAB array:
PyTorchActivations = double(PyTorchActivations)
%%
% Convert the mel spectrogram to a MATLAB matrix as well.
LibrosaMelSpectrogram = double(LibrosaMelSpectrogram);
size(LibrosaMelSpectrogram)
%% Converting the Python Speech Command Recognition System to a MATLAB System
% Executing Python code in MATLAB is useful for many workflows, but it is not
% sufficient for certain scenarios. In this example, you want to leverage MATLAB's
% code generation abilities to deploy your speech command recognition system to
% an embedded target. For this task, you need access to a full MATLAB version
% of the speech command recognition system.
%
% Recall that the system consists of two components: feature extraction
% and network inference. The following sections convert each component from
% Python to MATLAB.
% Converting the Network to MATLAB
% The trained network has already been saved for you in the Open Neural Network
% Exchange (ONNX) format. To import the ONNX network into MATLAB, use <https://www.mathworks.com/help/deeplearning/ref/importonnxnetwork.html
% |importONNXNetwork|>.
onnxFile = "cmdRecognitionPyTorch.onnx";
%%
% Import the network into MATLAB.
net = importONNXNetwork(onnxFile)
% Verify the Network Activations
% Verify that the MATLAB network gives the same activations as the PyTorch network
% for the same mel spectrogram input. Use the spectrogram you returned from the
% Python inference script.
MATLABActivations = predict(net,LibrosaMelSpectrogram);
%%
% Compare MATLAB and PyTorch activations.
figure
plot(MATLABActivations,"b*-")
hold on
grid on
plot(PyTorchActivations,"ro-")
xlabel("Activation #")
legend("MATLAB", "Python")
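%% Sketch: A Numeric Check of Activation Agreement
% A plot is convenient for a visual check; a numeric check can complement it.
% This sketch reports the largest absolute difference between two activation
% vectors and whether both select the same class. The vectors below are made-up
% placeholders; in this example you would pass |MATLABActivations| and
% |PyTorchActivations| instead. The 1e-4 tolerance is an assumption, not a
% documented threshold.
actA = [0.1 2.3 -1.2 0.7];                 % placeholder activations (framework A)
actB = actA + 1e-6*[1 -1 1 -1];            % placeholder activations (framework B)
maxAbsDiff = max(abs(actA(:) - actB(:)))   % worst-case elementwise discrepancy
[~,classA] = max(actA);                    % index of the top-scoring class, A
[~,classB] = max(actB);                    % index of the top-scoring class, B
sameDecision = (classA == classB) && (maxAbsDiff < 1e-4)
% Checking the arg max separately matters because small numeric differences are
% harmless only as long as they do not change the predicted command.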
%% Converting Feature Extraction to MATLAB
% You now have access to a MATLAB implementation of the pretrained network.
%
% Next, translate the Librosa feature extraction to its MATLAB equivalent.
%
% Feature extraction is part of the speech command recognition system. A MATLAB
% version of the feature extraction lets you use MATLAB functionality such as
% code generation, so that you can deploy the entire system, including feature
% extraction, to an embedded target.
% Mel Spectrogram Computation in MATLAB
% |audioFeatureExtractor| in Audio Toolbox supports mel spectrogram computations.
%
% But how do you configure it so that it produces the same mel spectrogram
% as Librosa? First, try it with only the sample rate specified.
[y, fs] = audioread(command);
afe = audioFeatureExtractor(melSpectrum=true, SampleRate = fs);
MATLABMelSpectrogram = extract(afe,y);
size(MATLABMelSpectrogram)
%%
% Compare with the size of the Librosa spectrogram. With default settings, the sizes do not match.
size(LibrosaMelSpectrogram)
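%% Sketch: Mel Band Edges
% Besides the framing settings that determine the spectrogram size, the two tools
% must also agree on the mel scale and the band layout. As background, this sketch
% computes the edges of 50 triangular mel bands over 0-8000 Hz using the common
% "HTK" (O'Shaughnessy) formula mel = 2595*log10(1 + f/700). Librosa's mel filter
% bank uses the Slaney scale unless |htk=True| is passed, so treat the formula
% choice here as an assumption. The band count and frequency range mirror the
% values used later in this example.
hz2mel = @(f) 2595*log10(1 + f/700);       % Hz -> mel (HTK/O'Shaughnessy)
mel2hz = @(m) 700*(10.^(m/2595) - 1);      % mel -> Hz (inverse)
numBandsDemo = 50;                         % number of triangular bands
melEdges = linspace(hz2mel(0),hz2mel(8000),numBandsDemo + 2);  % equally spaced in mel
bandEdgesHz = mel2hz(melEdges);            % NumBands+2 edges; band i spans edges i..i+2
bandEdgesHz([1 2 end-1 end])
% The edges are equally spaced in mel but widen in Hz as frequency increases,
% which is why low frequencies get finer resolution.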
% Librosa-to-Audio Toolbox Conversion
% Matching MATLAB and Librosa feature extraction by setting parameter-value
% pairs on |audioFeatureExtractor| is possible, but it can be a tedious and time-consuming
% task.
%
% Simplify the process by using the helper function <./HelperFiles\librosaToAudioToolbox.m
% |librosaToAudioToolbox|>. This function takes the parameter-value pair arguments
% used in Librosa's mel spectrogram function, and automatically maps them to an
% equivalent |audioFeatureExtractor| object.
afe = librosaToAudioToolbox("melSpectrogram", MelSpectrogramArgs)
%%
% Compute the log mel spectrogram based on this object.
MATLABMelSpectrogram = extract(afe,y);
MATLABMelSpectrogram = log10(MATLABMelSpectrogram + 1e-6);
%%
% Compare to the Librosa mel spectrogram.
figure
subplot(2,1,1)
smag = MATLABMelSpectrogram;
T = 1:size(MATLABMelSpectrogram,1);
F = 1:size(MATLABMelSpectrogram,2);
pcolor(F,T, smag)
ylabel("Time Bin #")
xlabel("Band #")
shading flat
colorbar
title("Audio Toolbox")
subplot(2,1,2)
smag = LibrosaMelSpectrogram;
pcolor(F,T, smag)
ylabel("Time Bin #")
xlabel("Band #")
shading flat
colorbar
title("Librosa")
%%
% The two spectrograms look identical. To confirm, compute the norm of their difference.
norm(MATLABMelSpectrogram(:)-LibrosaMelSpectrogram(:))
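%% Sketch: Relative Error
% The norm above is an absolute quantity, so its size depends on the scale of
% the spectrogram values. A relative error is often easier to interpret. This
% sketch defines it as ||a - b|| / ||b||, with |eps| guarding against division
% by zero. The demo matrices are placeholders for |MATLABMelSpectrogram| and
% |LibrosaMelSpectrogram|.
relErr = @(a,b) norm(a(:) - b(:))/max(norm(b(:)),eps);
refDemo = [1 2; 3 4];                      % placeholder reference values
estDemo = refDemo*(1 + 1e-3);              % placeholder estimate, 0.1% larger
relErr(estDemo,refDemo)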
%% Generate MATLAB Code for Feature Extraction
% You now have access to an |audioFeatureExtractor| that matches Librosa, but
% you may also want to see exactly how that object is configured.
%
% You can generate MATLAB code that instantiates and sets up |audioFeatureExtractor|
% by using the "code" option on |librosaToAudioToolbox|.
code = librosaToAudioToolbox("melSpectrogram", MelSpectrogramArgs,"code")
%% Speech Command Recognition in MATLAB
% You are now ready to perform speech command recognition entirely in MATLAB.
%
% Copy and paste the feature extraction code from above.
afe = audioFeatureExtractor(SampleRate=16000.000000,Window=hann(512,"periodic"),...
OverlapLength=352,FFTLength=512,melSpectrum=true);
setExtractorParameters(afe,"melSpectrum",SpectrumType="power",...
FilterBankDesignDomain="linear",...
FilterBankNormalization="bandwidth",...
WindowNormalization=false,...
NumBands=50,...
FrequencyRange=[0 8000]);
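%% Sketch: What the Extractor Settings Mean
% As a worked check of these settings at 16 kHz, the sketch below derives the
% hop length, the window and hop durations, the FFT bin spacing, and the number
% of one-sided FFT bins. The values are re-declared from the |audioFeatureExtractor|
% settings above so that the sketch runs on its own.
fsCfg = 16000;                          % sample rate (Hz)
winCfg = 512;                           % window length (samples)
overlapCfg = 352;                       % overlap between consecutive windows (samples)
nfftCfg = 512;                          % FFT length
hopCfg = winCfg - overlapCfg            % 160 samples between frame starts
winDurationSec = winCfg/fsCfg           % 0.032 s (32 ms) per window
hopDurationSec = hopCfg/fsCfg           % 0.01 s (10 ms) per hop
binSpacingHz = fsCfg/nfftCfg            % 31.25 Hz between FFT bins
numOneSidedBins = nfftCfg/2 + 1         % 257 bins from 0 Hz up to fs/2 = 8000 Hz
% The upper edge of FrequencyRange, 8000 Hz, is the Nyquist frequency at this
% sample rate, so the 50 mel bands cover the whole available spectrum.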
%%
% Compute the mel spectrogram.
S = extract(afe,y);
S = log10(S+1e-6);
%%
% Compute the network activations using the imported MATLAB network.
activations = predict(net,S)
%%
% Determine the spoken command from the index of the largest activation.
CLASSES = ["unknown" "yes" "no" "up" "down" "left" "right" "on" "off" "stop" "go"];
[~,ind] = max(activations);
fprintf("Recognized command: %s\n",CLASSES(ind))
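%% Sketch: Converting Activations to Scores
% The class decision only needs the largest activation. If you also want scores
% that sum to 1, you can apply a softmax, assuming (not confirmed here) that the
% network outputs unnormalized scores (logits) rather than probabilities. The
% sketch subtracts the maximum before exponentiating for numerical stability and
% uses made-up activations; the name |softmaxFcn| is hypothetical and chosen to
% avoid shadowing any toolbox function.
softmaxFcn = @(z) exp(z - max(z))/sum(exp(z - max(z)));
scoresDemo = softmaxFcn([2 1 0.5])      % made-up activations for three classes
% Softmax preserves the ordering of the activations, so the recognized command
% is the same whether you take the arg max of the activations or of the scores.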
%% Speech Command Recognition in Simulink
% You successfully implemented a pure MATLAB version of speech command recognition.
%
% You can also use this system in Simulink.
%% Using the Deep Learning Model in Simulink
% You can use the deep learning network in Simulink using the <https://www.mathworks.com/help/deeplearning/ref/predict.html
% Predict> block.
%
% Save the network to a MAT file. You can point the Predict block to this MAT
% file to use it in Simulink.
save net.mat net
% Implementing Feature Extraction for Simulink
% What about feature extraction in Simulink? Use the "block" option on |librosaToAudioToolbox|
% to generate an equivalent Mel Spectrogram block. This library block ships with
% Audio Toolbox. You need R2022a or later to access this block.
librosaToAudioToolbox("melSpectrogram", MelSpectrogramArgs,"block");
% Speech Command Recognition System in Simulink
% The following model uses the two blocks discussed above to perform speech
% command recognition in Simulink.
open_system("speechCommandRecognition")
%%
% Run the model to observe the network output activations.
%% Deploying the System to an Embedded Target
% The Simulink and MATLAB versions highlighted above both support C code generation
% and deployment to an embedded target.
%
% For more information on how to deploy the MATLAB version, see the following
% examples:
%%
% * <https://www.mathworks.com/help/deeplearning/ug/speech-command-recognition-code-generation-with-intel-mkl-dnn.html
% Speech Command Recognition Code Generation with Intel MKL-DNN>
% * <https://www.mathworks.com/help/deeplearning/ug/speech-command-recognition-code-generation-on-raspberry-pi.html
% Speech Command Recognition Code Generation on Raspberry Pi>
%%
% For more information on how to deploy the Simulink version, see the following
% examples:
%%
% * <https://www.mathworks.com/help/audio/ug/speech-command-recognition-code-generation-with-intel-mkl-dnn-using-simulink.html
% Speech Command Recognition Code Generation with Intel MKL-DNN Using Simulink>
% * <https://www.mathworks.com/help/audio/ug/speech-command-recognition-on-raspberry-pi-using-simulink.html
% Speech Command Recognition on Raspberry Pi Using Simulink>
%%
% _Copyright 2022 The MathWorks, Inc._