A framework that predicts drivers' maneuver behaviors.
PyTorch implementation of the paper "Driver Intention Anticipation Based on In-Cabin and Driving Scene Monitoring".
Here is the demo of our framework:
In this demo, a prediction is made every second. A correct prediction is marked with a ✓.
The dataset used is downloaded from Brain4cars.
- Videos are extracted into images at fps=25 under each directory. The file names follow the format "image-0001.png". You can use our script extract_frames.py in datasets/annotation to extract the images: copy this file to the "face_camera" directory and then run it.
- We split the dataset using 5-fold cross-validation. Run the script n_fold_Brain4cars.py in the directory datasets/annotation to create the split. Alternatively, you can use the five .csv files in datasets/annotation and skip this step.
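The two preparation steps above can be sketched as follows. This is only an illustration, not the code of extract_frames.py or n_fold_Brain4cars.py: the function names `ffmpeg_extract_command` and `k_fold_split`, the use of ffmpeg, the example file name, and the fixed shuffle seed are all assumptions.

```python
import random
from pathlib import Path


def ffmpeg_extract_command(video_path, out_dir, fps=25):
    """Build an ffmpeg command that writes image-0001.png, image-0002.png, ...

    Hypothetical helper; the project's extract_frames.py may work differently.
    """
    pattern = str(Path(out_dir) / "image-%04d.png")
    return ["ffmpeg", "-i", str(video_path), "-vf", f"fps={fps}", pattern]


def k_fold_split(items, k=5, seed=0):
    """Return a list of k (train, val) pairs; each item is in exactly one val set.

    Hypothetical helper; the seed and the round-robin assignment are assumptions.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled items round-robin into k folds.
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for n_fold in range(k):
        val = folds[n_fold]
        train = [x for i, fold in enumerate(folds) if i != n_fold for x in fold]
        splits.append((train, val))
    return splits


# Example: command for one video (file name is made up), and fold sizes for 10 dummy ids.
command = ffmpeg_extract_command("video.avi", "frames")
splits = k_fold_split(range(10))
print(command)
print([(len(train), len(val)) for train, val in splits])
```

The command list can be passed to `subprocess.run` once ffmpeg is installed. Splitting at the level of whole videos, rather than individual frames, keeps frames of the same video out of both the training and validation sets.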
The network, 3D-ResNet 50, and its pretrained model are downloaded from 3D ResNets.
We thank this project :)
Before running the run-3DResnet.sh script, please set the paths to:
- root_path: path to this project.
- annotation_path: path to the annotation directory in this project.
- video_path: path to the image frames of the driver videos.
- pretrain_path: path to the pretrained 3D ResNet 50 model.
Notes:
- n_fold: the number of the fold. Here, n_fold ranges from 0 to 4.
- sample_duration: the length of the input videos. Here, 16 frames.
- end_second: the second before the maneuver. Here, end_second ranges from 1 to 5. E.g., with end_second = 3, the frames that are 3 seconds (including the third second) before the maneuver are given as input.
For more details about the other arguments, please refer to opt.py.
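To make the value ranges above concrete, here is a minimal sketch of an argument parser that accepts the listed names and rejects out-of-range values. It is not the project's opt.py: `build_parser` is a hypothetical function, the example paths are placeholders, and marking n_fold and end_second as required is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical parser using the argument names listed above (not opt.py)."""
    parser = argparse.ArgumentParser(description="3D-ResNet 50 options (illustrative)")
    parser.add_argument("--root_path", required=True, help="path to this project")
    parser.add_argument("--annotation_path", required=True, help="path to the annotation directory")
    parser.add_argument("--video_path", required=True, help="path to the driver image frames")
    parser.add_argument("--pretrain_path", required=True, help="path to the pretrained model")
    # n_fold is restricted to 0..4 and end_second to 1..5, as described in the notes.
    parser.add_argument("--n_fold", type=int, choices=range(5), required=True)
    parser.add_argument("--sample_duration", type=int, default=16)
    parser.add_argument("--end_second", type=int, choices=range(1, 6), required=True)
    return parser


# Example invocation with placeholder paths.
args = build_parser().parse_args([
    "--root_path", "/path/to/project",
    "--annotation_path", "datasets/annotation",
    "--video_path", "/path/to/frames",
    "--pretrain_path", "/path/to/pretrained.pth",
    "--n_fold", "2",
    "--end_second", "3",
])
print(args.n_fold, args.sample_duration, args.end_second)
```

Using `choices` makes the parser exit with an error message for values such as `--n_fold 5` instead of failing later during training.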
The model trained with our script can be found under this link. The model name is "save_best_3DResNet50.pth".
We used FlowNet 2.0 to extract the optical flow of all outside images.
You can also find these optical flow images under this link.
We adapted our ConvLSTM network from this repo.
We thank those two projects :)
Before running the run-ConvLSTM.sh script, please set the paths to:
- root_path: path to this project.
- annotation_path: path to the annotation directory in this project.
- video_path: path to the image frames of the optical flow images.
Notes:
- n_fold: the number of the fold. Here, n_fold ranges from 0 to 4.
- sample_duration: the length of the input videos. Here, 5 frames.
- interval: the interval between two consecutive frames inside the input clip. It should be between 5 and 30. More details can be found in Section IV.B of the paper.
- end_second: the second before the maneuver. Here, end_second ranges from 1 to 5.
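The relation between sample_duration and interval can be illustrated with a short sketch that lists which frame numbers form one input clip. The function `clip_frame_indices`, the choice of the first frame, and the example interval of 10 are assumptions for illustration; the project's data loader may select frames differently.

```python
def clip_frame_indices(first_frame, sample_duration=5, interval=10):
    """Return the frame numbers of one clip, spaced `interval` frames apart.

    Hypothetical helper; not taken from the project's data loader.
    """
    if not 5 <= interval <= 30:
        raise ValueError("interval should be between 5 and 30")
    return [first_frame + i * interval for i in range(sample_duration)]


# Example: a clip of 5 frames starting at frame 1 with an interval of 10.
print(clip_frame_indices(1))
# With the largest interval, the last frame of the clip is 1 + 4 * 30.
print(clip_frame_indices(1, interval=30)[-1])
```

A larger interval lets the same five frames cover a longer stretch of the video, at the cost of skipping more intermediate motion.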
If you find this work useful, please cite as follows:
@INPROCEEDINGS{9294181,
author={Rong, Yao and Akata, Zeynep and Kasneci, Enkelejda},
booktitle={2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC)},
title={Driver Intention Anticipation Based on In-Cabin and Driving Scene Monitoring},
year={2020},
volume={},
number={},
pages={1-8},
doi={10.1109/ITSC45102.2020.9294181}}