This repository contains the code and model weights for our paper (CVPR 2023):
A Light Weight Model for Active Speaker Detection
Junhua Liao, Haihan Duan, Kanghui Feng, Wanbing Zhao, Yanbing Yang, Liangyin Chen
- Install a Python runtime on the development board. Note:
  - On Ubuntu 20.04, RKNN-Toolkit supports only Python 3.8 and 3.9.
  - On Ubuntu 22.04, it supports Python 3.10 and 3.11.
- Copy this project to the RK3588 development board and extract models/rknn_models.zip into the models directory.
- Run the following commands on the board to install the required libraries:
  sudo apt-get install libxslt1-dev zlib1g zlib1g-dev libglib2.0-0 libsm6 libgl1-mesa-glx libprotobuf-dev gcc ffmpeg
  pip install -r requirements.txt
- Get Rockchip's official RKNN-Toolkit:
  git clone https://hub.nuaa.cf/airockchip/rknn-toolkit2.git
- Install RKNN-Toolkit on the board:
  cd rknn-toolkit2
  pip install rknn-toolkit-lite2/packages/rknn_toolkit_lite2-2.0.0b0-cpxx-cpxx-linux_aarch64.whl
  Replace cpxx with the tag for your Python version. Look in the rknn-toolkit-lite2/packages directory for the .whl file that matches your Python version. For example, for Python 3.8:
  pip install rknn-toolkit-lite2/packages/rknn_toolkit_lite2-2.0.0b0-cp38-cp38-linux_aarch64.whl
- Run inference with the ONNX model:
  python inference_onnx.py
  or with the RKNN model:
  python inference_rknn.py
- Speed optimization: run config_cpufreq.sh to adjust the board's operating frequency and speed up inference.
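- To regenerate the models, follow these steps:
  - Follow the RKNN-Toolkit2 quick start guide to install the RKNN-Toolkit2 Docker image on a PC.
  - Start the image, then run the following steps inside the container: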
    - Use docker cp to copy this project's source code into the container.
    - Run:
      python gen_asd_rknn.py
      This generates the LightASD pt, onnx, and rknn models in turn.
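    - Run:
      python gen_loss_rknn.py
      This generates the LossAV pt, onnx, and rknn models in turn.
    - Run:
      python evaluate_acc_asd.py
      This evaluates the accuracy difference between the LightASD onnx and rknn models.
    - Run:
      python evaluate_acc_loss.py
      This evaluates the accuracy difference between the LossAV onnx and rknn models.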
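    - Run:
      python evaluate_perf.py
      This evaluates the computational performance of the LightASD model. For this step, connect the PC to the board's OTG port over USB, or connect to the board over the network with adb connect.
    - Run:
      python evaluate_mem.py
      This reports the memory usage of the LightASD model.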
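    - Run:
      python gen_quantize_data.py
      python quantize.py
      This generates the quantization data and performs quantization. Note that the quantization data is extracted from the files produced by Columbia_test.py, so read the comments at the top of gen_quantize_data.py first.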
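For the RKNN-Toolkit installation step in the list above, the cpxx placeholder has to match the board's Python version. The following is a minimal sketch of how the wheel path could be derived. It assumes the file name pattern and the 2.0.0b0 version string shown in that step. The helper names cpython_tag and wheel_path are hypothetical and not part of this repository.

```python
import sys

# Assumption: the pattern and the 2.0.0b0 version string are copied from the
# install command in the list above; other toolkit releases may differ.
WHEEL_TEMPLATE = (
    "rknn-toolkit-lite2/packages/"
    "rknn_toolkit_lite2-2.0.0b0-{tag}-{tag}-linux_aarch64.whl"
)


def cpython_tag(major: int, minor: int) -> str:
    """Return the CPython wheel tag for a version, e.g. (3, 8) -> 'cp38'."""
    return f"cp{major}{minor}"


def wheel_path(major: int, minor: int) -> str:
    """Fill both cpxx placeholders for the given Python version."""
    return WHEEL_TEMPLATE.format(tag=cpython_tag(major, minor))


if __name__ == "__main__":
    # Print the pip command for the interpreter running this script.
    version = sys.version_info
    print("pip install " + wheel_path(version.major, version.minor))
```

Running the script on the board prints a candidate pip command. Check that the printed file actually exists in rknn-toolkit-lite2/packages before installing it.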
Use the following command to download and preprocess the AVA dataset.
python train.py --dataPathAVA AVADataPath --download
The AVA dataset and its labels will be downloaded into AVADataPath.
You can train the model on the AVA dataset by using:
python train.py --dataPathAVA AVADataPath
or, with a customized version that supports RKNN:
python train_2D.py --dataPathAVA AVADataPath
- exps/exps1/score.txt: output score file
- exps/exp1/model/model_00xx.model: trained model
- exps/exps1/val_res.csv: prediction for the validation set
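Training writes one checkpoint per saved epoch, so it can be handy to pick the most recent one automatically. The sketch below is illustrative only. It assumes that checkpoints follow the model_00xx.model naming shown above and that the digits are an increasing epoch counter. The function latest_checkpoint is a hypothetical helper, not part of this repository.

```python
import re
from typing import Iterable, Optional

# Assumption: checkpoint files are named like model_00xx.model, where the
# digits form an epoch counter.
CHECKPOINT_RE = re.compile(r"^model_(\d+)\.model$")


def latest_checkpoint(filenames: Iterable[str]) -> Optional[str]:
    """Return the checkpoint name with the highest epoch number, or None."""
    best_name, best_epoch = None, -1
    for name in filenames:
        match = CHECKPOINT_RE.match(name)
        if match is None:
            continue  # skip score files, CSVs, and anything else
        epoch = int(match.group(1))
        if epoch > best_epoch:
            best_name, best_epoch = name, epoch
    return best_name
```

A typical call would pass the result of os.listdir on the model directory of the experiment folder.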
Our model weights are in the weight folder. The model achieves an mAP of 94.06% on the validation set. You can verify this with:
python train.py --dataPathAVA AVADataPath --evaluation
The model weights trained on the AVA dataset are in the weight folder. To evaluate them on the Columbia ASD dataset, run:
python Columbia_test.py --evalCol --colSavePath colDataPath
The Columbia ASD dataset and its labels will be downloaded into colDataPath, and you should get the following F1 results.
Name | Bell | Boll | Lieb | Long | Sick | Avg. |
---|---|---|---|---|---|---|
F1 | 82.7% | 75.7% | 87.0% | 74.5% | 85.4% | 81.1% |
We also provide model weights fine-tuned on the TalkSet dataset. Due to space limitations, we did not report these results in the paper. Run the following command:
python Columbia_test.py --evalCol --pretrainModel weight/finetuning_TalkSet.model --colSavePath colDataPath
You should get the following F1 results.
Name | Bell | Boll | Lieb | Long | Sick | Avg. |
---|---|---|---|---|---|---|
F1 | 97.7% | 86.3% | 98.2% | 99.0% | 96.3% | 95.5% |
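As a quick sanity check on the two tables, the Avg. column appears to be the unweighted mean of the five per-speaker F1 scores. That reading is an assumption, although it is consistent with the reported numbers. The sketch below recomputes it from the values copied from the tables. The name average_f1 is hypothetical.

```python
# F1 scores in percent, copied from the two tables above.
AVA_TRAINED = {"Bell": 82.7, "Boll": 75.7, "Lieb": 87.0, "Long": 74.5, "Sick": 85.4}
TALKSET_FINETUNED = {"Bell": 97.7, "Boll": 86.3, "Lieb": 98.2, "Long": 99.0, "Sick": 96.3}


def average_f1(scores: dict) -> float:
    """Unweighted mean over speakers, rounded to one decimal place."""
    return round(sum(scores.values()) / len(scores), 1)


print(average_f1(AVA_TRAINED))        # 81.1
print(average_f1(TALKSET_FINETUNED))  # 95.5
```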
You can put a raw video (.mp4 and .avi are both fine) into the demo folder, for example 0001.mp4, and then run:
python Columbia_test.py --videoName 0001 --videoFolder demo
By default, the model loads weights trained on the AVA-ActiveSpeaker dataset. To load the weights fine-tuned on TalkSet instead, run:
python Columbia_test.py --videoName 0001 --videoFolder demo --pretrainModel weight/finetuning_TalkSet.model
You can obtain the output video demo/0001/pyavi/video_out.avi, where the active speaker is marked by a green box and the non-active speaker by a red box.
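To process several videos in one go, the demo command can be wrapped in a small script. This is a sketch under stated assumptions: it uses only the --videoName, --videoFolder, and --pretrainModel flags shown above, and it assumes the output path follows the demo/0001/pyavi/video_out.avi pattern for every video. The functions demo_command, output_video, and run_all are hypothetical helpers, not part of this repository.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional

VIDEO_SUFFIXES = {".mp4", ".avi"}


def demo_command(video_file: str, folder: str = "demo",
                 pretrain_model: Optional[str] = None) -> List[str]:
    """Build the Columbia_test.py command line for one video file."""
    cmd = [sys.executable, "Columbia_test.py",
           "--videoName", Path(video_file).stem,  # name without extension
           "--videoFolder", folder]
    if pretrain_model:
        cmd += ["--pretrainModel", pretrain_model]
    return cmd


def output_video(video_file: str, folder: str = "demo") -> str:
    """Expected output path, assuming the layout described above."""
    return (Path(folder) / Path(video_file).stem / "pyavi" / "video_out.avi").as_posix()


def run_all(folder: str = "demo", pretrain_model: Optional[str] = None) -> None:
    """Run the demo once for every .mp4 or .avi file in the folder."""
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in VIDEO_SUFFIXES:
            subprocess.run(demo_command(path.name, folder, pretrain_model), check=True)
            print("wrote", output_video(path.name, folder))
```

Calling run_all("demo", "weight/finetuning_TalkSet.model") from the repository root would process every video in the demo folder with the TalkSet weights. Omit the second argument to use the default weights.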
Please cite our paper if you use this code or model weights.
@InProceedings{Liao_2023_CVPR,
author = {Liao, Junhua and Duan, Haihan and Feng, Kanghui and Zhao, Wanbing and Yang, Yanbing and Chen, Liangyin},
title = {A Light Weight Model for Active Speaker Detection},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {22932-22941}
}
We thank TaoRuijie, whose open-source repository supported this research.