Questions about the data normalization #25

cqjjjzr · 2019-05-07T14:54:31Z

Hi Kim,

Apologies for bothering you so many times, but I am having trouble understanding your normalization code. I found this code in acoustic_feat_ex.m:

%% Save global normalization factor

global_mean = train_mean / length(audio_list);
global_std = train_std / length(audio_list);
save([save_dir, '/global_normalize_factor'], 'global_mean', 'global_std');

and in every data_reader_XXX.py:

norm_param = sio.loadmat(self._norm_dir+'/global_normalize_factor.mat')
self.train_mean = norm_param['global_mean']
self.train_std = norm_param['global_std']

My questions are:

Does acoustic_feat_ex.m save a single global normalization factor for the whole dataset? Why not calculate a factor for each training file and normalize that file with it?
If so, why is this factor also used during the prediction phase (the data_reader_XXX.py files are used during prediction as well)? Is this a mistake?

Thanks in advance!
Charlie Jiang


jtkim-kaist · 2019-08-13T03:05:55Z

If the sample files have very different noise characteristics and high noise energy, a per-file mean and variance can depend on the noise rather than on the speech signal. Since the purpose of VAD is to exploit the statistical characteristics of the speech signal, the global mean and variance are preferable: severe noise is not frequent, so they are more likely to reflect the speech signal than the noise.
It is not a mistake, because we cannot compute a global mean and variance from the test dataset, so the training statistics are reused at prediction time. However, if you normalize with the local mean and variance of each sample file, you can use the local mean and variance of the test file as well.
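
To make the two options concrete, here is a minimal NumPy sketch, not the repository's actual code. The function names, the synthetic data, and the `eps` guard are all hypothetical, and the assumption that the global factors are per-file statistics averaged over the file count is inferred only from the division by length(audio_list) in the MATLAB snippet.

```python
import numpy as np


def fit_global_stats(train_files):
    """Average the per-file mean and std over all training files.

    Assumption: this mirrors the division by the number of files in the
    MATLAB snippet; how the sums are accumulated there is not shown.
    """
    means = [feat.mean(axis=0) for feat in train_files]
    stds = [feat.std(axis=0) for feat in train_files]
    return np.mean(means, axis=0), np.mean(stds, axis=0)


def normalize_global(feat, global_mean, global_std, eps=1e-8):
    """Normalize with statistics estimated on the training set."""
    return (feat - global_mean) / (global_std + eps)


def normalize_local(feat, eps=1e-8):
    """Normalize with the statistics of this one file only."""
    return (feat - feat.mean(axis=0)) / (feat.std(axis=0) + eps)


# Synthetic stand-ins for feature matrices of shape (frames, dims).
rng = np.random.default_rng(0)
train_files = [rng.normal(loc=1.0, scale=2.0, size=(200, 4)) for _ in range(5)]
global_mean, global_std = fit_global_stats(train_files)

# Hypothetical test file whose statistics are dominated by loud noise.
noisy_test = rng.normal(loc=5.0, scale=6.0, size=(200, 4))

global_normed = normalize_global(noisy_test, global_mean, global_std)
local_normed = normalize_local(noisy_test)
```

With local normalization every file ends up with zero mean and unit variance per dimension, so a noise-dominated file is rescaled by its own noise statistics. With global normalization the same file keeps its offset and larger spread relative to the training data.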

cqjjjzr · 2019-08-14T09:56:33Z

Thanks for your reply!

One more question: when the program is used in a production environment, is there any difference between using the local mean and variance of each input file and using the global training mean and standard deviation? If so, which should I choose?
