In the speaker diarization process, we use only the first channel, because the SAD model and the speaker-embedding extractor do not support multi-channel wav files.
You can find the whole process, together with explanatory comments, in `sd/run.sh`.
The main stages are:
- First prepare the key file `wav.scp` for our tools. The script `local/make_aishell4_test.sh` is just a sample; you can use any language you like (Perl, Python, or shell). If you run into naming problems, replace this step with your own script to prepare the files.
- Change the data path and `kaldi-root` in `run.sh` and `path.sh` first. The script `local/do_segmentations.sh` produces the SAD result used by the later stages. You will find the segments file under `$sad_result_dir`.
- To use the `VBx` tools for diarization, convert the segments to the `.lab` format with `scripts/segment_to_lab.sh`.
- The speaker diarization code has two stages: speaker-embedding extraction and speaker-embedding clustering. Our baseline uses the `VBx` tools to extract the speaker embeddings. The feature extractor is built in, so you do not have to prepare features beforehand. Note that our scripts run on an SGE system, so we submit the embedding-extraction jobs through queue.pl; if you do not have it, extract the speaker embeddings one by one. We also recommend adding an `exit 1` after stage 3 so that you can wait for the extraction jobs to finish.
- For speaker-embedding clustering, use `run_cluster.sh`; it produces an RTTM file for each audio file in the wav.scp.
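
As a rough illustration of the two file-preparation steps above, the sketch below builds a `wav.scp` from a directory of wav files and converts a Kaldi `segments` file into per-recording `.lab` files. It is not the repository's `local/make_aishell4_test.sh` or `scripts/segment_to_lab.sh`. The function names `make_wav_scp` and `segments_to_lab` are hypothetical, and the sketch assumes that the recording ID is the wav file name without its extension, that `segments` lines follow the usual Kaldi layout `<segment-id> <recording-id> <start> <end>`, and that VBx reads `.lab` lines of the form `<start> <end> sp`.

```shell
# Illustrative sketch only; function names and the "sp" label are assumptions.

# Write a Kaldi-style wav.scp ("<recording-id> <wav-path>" per line,
# sorted by recording id) for every .wav file directly under a directory.
make_wav_scp() {
  local wav_dir="$1"
  local out="$2"
  for f in "$wav_dir"/*.wav; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    printf '%s %s\n' "$(basename "$f" .wav)" "$f"
  done | LC_ALL=C sort > "$out"
}

# Convert a Kaldi segments file ("<segment-id> <recording-id> <start> <end>")
# into one <recording-id>.lab file per recording, each line "<start> <end> sp".
segments_to_lab() {
  local segments="$1"
  local lab_dir="$2"
  mkdir -p "$lab_dir"
  awk -v dir="$lab_dir" '{
    out = dir "/" $2 ".lab"
    line = $3 " " $4 " sp"
    if (out in seen) {
      print line >> out       # later segments of a recording are appended
    } else {
      print line > out        # first segment truncates any stale file
    }
    seen[out] = 1
    close(out)                # keep the number of open files small
  }' "$segments"
}
```

A call might look like `make_wav_scp /path/to/wavs data/wav.scp` followed by `segments_to_lab "$sad_result_dir/segments" lab/`, where all three paths are placeholders for your own setup.
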
You need to download the model from the path, then move `exp` to `sd/` and `ResNet101_16kHz` to `VBx/models`.
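
The move step can be sketched as a small helper. The function name `place_models` and its three arguments are hypothetical, and the sketch assumes the download has already been unpacked into a directory that contains `exp` and `ResNet101_16kHz`.

```shell
# Illustrative sketch only; place_models and its arguments are assumptions.
# $1: directory holding the unpacked download (contains exp/ and ResNet101_16kHz/)
# $2: path of your sd/ directory
# $3: path of your VBx/models directory
place_models() {
  local download_dir="$1"
  local sd_dir="$2"
  local vbx_models_dir="$3"
  mkdir -p "$sd_dir" "$vbx_models_dir"
  mv "$download_dir/exp" "$sd_dir/"
  mv "$download_dir/ResNet101_16kHz" "$vbx_models_dir/"
}
```

For example, `place_models ~/Downloads/models sd VBx/models`, where the first path is a placeholder for wherever you unpacked the download.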