This is the official PyTorch implementation of the paper:
Deep Depth Estimation from Thermal Images: Dataset, Benchmark, and Analysis
Ukcheol Shin, Jinsun Park
- 2025.01.29: Release source code & pre-trained weights.
This codebase was developed and tested with the following packages.
- OS: Ubuntu 20.04.1 LTS
- CUDA: 11.3
- PyTorch: 1.9.1
- Python: 3.9.16
You can build your conda environment with the following commands.
conda create python=3.9 pytorch=1.9.1 cudatoolkit=11.1 -c pytorch -c conda-forge --name MS2_bench
conda activate MS2_bench
conda install fvcore iopath pytorch3d -c pytorch -c conda-forge -c pytorch3d -y
pip install mmcv pytorch_lightning timm setuptools==59.5.0 matplotlib imageio path
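After installation, a quick sanity check (a convenience snippet, not part of the repo) confirms that PyTorch, the CUDA build, and the GPU are visible:

```python
# Sanity check: confirm PyTorch, the CUDA build, and GPU visibility after setup.
import torch

print("PyTorch:", torch.__version__)             # expected 1.9.1
print("CUDA build:", torch.version.cuda)         # expected 11.x
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```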
After building the conda environment, compile the CUDA kernel for the deformable convolution layer of AANet. If you have trouble, refer here.
cd models/network/aanet/deform_conv/
sh build.sh
You can download the MS2 dataset here. For the train/val/test lists of the MS2 dataset, copy the txt files from the "MS2dataset Github". After downloading the dataset, organize it as follows:
<datasets>
|-- <KAIST_MS2>
|-- <sync_data>
|-- <proj_depth>
|-- ...
|-- train_list.txt
|-- val_list.txt
|-- ...
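As a quick check that the dataset is laid out as above, a minimal sketch (the split-list format, one entry per line, is an assumption):

```python
# Minimal sketch: verify the MS2 dataset layout shown above and read a split list.
# The exact format of train_list.txt (one entry per line) is an assumption.
from pathlib import Path

root = Path("datasets/KAIST_MS2")               # adjust to your dataset location
for sub in ["sync_data", "proj_depth"]:
    print(sub, "found:", (root / sub).is_dir())

list_file = root / "train_list.txt"
if list_file.exists():
    samples = [ln.strip() for ln in list_file.read_text().splitlines() if ln.strip()]
    print(f"{len(samples)} entries in {list_file.name}")
```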
- Download each model's pre-trained or backbone weights if needed. For NeWCRF model training, please download the pre-trained Swin-V1 weights here.
After downloading the weights, place them in the "pt_weights" folder as follows:
<pt_weights>
|-- <swin_tiny_patch4_window7_224_22k.pth>
|-- ...
|-- <swin_large_patch4_window7_224_22k.pth>
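To verify that a downloaded backbone checkpoint is readable before training, a small sketch (the nested 'model' key is an assumption based on common Swin releases):

```python
# Sketch: inspect a downloaded Swin backbone checkpoint before training.
# The 'model' wrapper key is an assumption based on common Swin checkpoint releases.
import torch

ckpt = torch.load("pt_weights/swin_tiny_patch4_window7_224_22k.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # unwrap if the weights are nested under 'model'
print(f"{len(state_dict)} tensors; sample keys:", list(state_dict.keys())[:3])
```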
- Train a model with a config file. If you want to change hyperparameters (e.g., batch size, number of epochs, learning rate, etc.), edit the config file in the 'configs' folder (a programmatic sketch follows the training commands below).
Single GPU, MS2 dataset, MonoStereoCRF model
CUDA_VISIBLE_DEVICES=0 python train.py --config ./configs/StereoSupDepth/MSCRF.yaml --num_gpus 1 --exp_name MSCRF_MS2_singleGPU
Multiple GPUs, MS2 dataset, other models
# Mono depth estimation model
CUDA_VISIBLE_DEVICES=0,1 python train.py --config ./configs/MonoSupDepth/<target_model>.yaml --num_gpus 2 --exp_name <Model>_MS2_multiGPU
# Stereo matching model
CUDA_VISIBLE_DEVICES=0,1 python train.py --config ./configs/StereoSupDepth/<target_model>.yaml --num_gpus 2 --exp_name <Model>_MS2_multiGPU
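If you prefer to inspect or tweak a config programmatically rather than by hand, a minimal sketch (the config filename and key names are assumptions; check the actual YAML files in 'configs'):

```python
# Sketch: inspect/override config hyperparameters before training.
# The filename and key names (batch_size, lr) are assumptions; see the YAML files in 'configs'.
import yaml

cfg_path = "configs/MonoSupDepth/BTS.yaml"      # hypothetical target config
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

print("Top-level keys:", list(cfg.keys()))
# Example override with assumed key names:
# cfg["batch_size"] = 8
# cfg["lr"] = 1e-4
# with open(cfg_path, "w") as f:
#     yaml.safe_dump(cfg, f)
```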
- Start a TensorBoard session to check training progress.
tensorboard --logdir=checkpoints/ --bind_all
You can see the progress by opening http://localhost:6006 in your browser.
Evaluate the trained model by running
# MS2-day evaluation set / Monocular depth estimation
CUDA_VISIBLE_DEVICES=0 python test_monodepth.py --config ./configs/MonoSupDepth/<target_model>.yaml --ckpt_path "PATH for WEIGHT" --test_env test_day --save_dir ./results/<target_model>/thr_day --modality thr
# MS2-night evaluation set / Monocular depth estimation
CUDA_VISIBLE_DEVICES=0 python test_monodepth.py --config ./configs/MonoSupDepth/<target_model>.yaml --ckpt_path "PATH for WEIGHT" --test_env test_night --save_dir ./results/<target_model>/thr_day --modality thr
# MS2-rainy_day evaluation set / Monocular depth estimation
CUDA_VISIBLE_DEVICES=0 python test_monodepth.py --config ./configs/MonoSupDepth/<target_model>.yaml --ckpt_path "PATH for WEIGHT" --test_env test_rain --save_dir ./results/<target_model>/thr_day --modality thr
# MS2-day evaluation set / Stereo matching
CUDA_VISIBLE_DEVICES=0 python test_disparity.py --config ./configs/StereoSupDepth/<target_model>.yaml --ckpt_path "PATH for WEIGHT" --test_env test_day --save_dir ./results/<target_model>/thr_day --modality thr
# MS2-day evaluation set / MS_CRF model in monocular depth estimation mode
CUDA_VISIBLE_DEVICES=0 python test_disparity.py --config ./configs/StereoSupDepth/MSCRF.yaml --ckpt_path "PATH for WEIGHT" --test_env test_day --save_dir ./results/MSCRF_mono/thr_day --mono_mode --modality thr
# Timing & FLOP evaluation for monocular depth estimation network
CUDA_VISIBLE_DEVICES=0 python test_timingNflop.py --config configs/MonoSupDepth/<target_model>.yaml
# Timing & FLOP evaluation for stereo matching network
CUDA_VISIBLE_DEVICES=0 python test_timingNflop.py --config configs/StereoSupDepth/<target_model>.yaml --is_stereo
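For reference, GPU timing and FLOP counting are commonly measured along the following lines; this is only a sketch with a placeholder network (not the repo's test_timingNflop.py), assuming fvcore from the environment setup is available:

```python
# Sketch of GPU timing + FLOP counting with a placeholder network.
# Substitute the depth/stereo network and input resolution you actually benchmark.
import torch
from fvcore.nn import FlopCountAnalysis

model = torch.nn.Conv2d(3, 64, 3, padding=1).cuda().eval()   # placeholder network
dummy = torch.randn(1, 3, 256, 640, device="cuda")           # assumed input size

with torch.no_grad():
    print(f"GFLOPs: {FlopCountAnalysis(model, dummy).total() / 1e9:.2f}")

    # CUDA events give accurate GPU timing (plain wall-clock misses async kernel launches).
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(10):                                       # warm-up
        model(dummy)
    torch.cuda.synchronize()
    start.record()
    for _ in range(100):
        model(dummy)
    end.record()
    torch.cuda.synchronize()
    print(f"Avg latency: {start.elapsed_time(end) / 100:.2f} ms")
```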
Run inference on the demo images by running
# Download the pre-trained weights
bash ./checkpoints/download_pretrained_weights.sh
# Monocular depth estimation (RGB)
CUDA_VISIBLE_DEVICES=0 python inference_depth.py --config ./configs/MonoSupDepth/<target_model>.yaml --ckpt_path "PATH for WEIGHT" --save_dir ./demo_results/mono/ --modality rgb
# Monocular depth estimation (THR)
CUDA_VISIBLE_DEVICES=0 python inference_depth.py --config ./configs/MonoSupDepth/<target_model>.yaml --ckpt_path "PATH for WEIGHT" --save_dir ./demo_results/mono/ --modality thr
# Stereo matching (RGB)
CUDA_VISIBLE_DEVICES=0 python inference_disp.py --config ./configs/StereoSupDepth/<target_model>.yaml --ckpt_path "PATH for WEIGHT" --save_dir ./demo_results/stereo/ --modality rgb
# Stereo matching (THR)
CUDA_VISIBLE_DEVICES=0 python inference_disp.py --config ./configs/StereoSupDepth/<target_model>.yaml --ckpt_path "PATH for WEIGHT" --save_dir ./demo_results/stereo/ --modality thr
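To visually inspect a saved prediction from --save_dir, a quick sketch (the output filename and image format are assumptions; adjust to what the inference scripts actually write):

```python
# Sketch: visualize a saved prediction with a colormap.
# The filename and format are assumptions; adjust to the files written to --save_dir.
import imageio
import matplotlib.pyplot as plt

pred = imageio.imread("demo_results/mono/0000.png")  # hypothetical output file
plt.imshow(pred, cmap="magma")
plt.colorbar(label="predicted depth (scaled)")
plt.axis("off")
plt.show()
```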
We provide the pre-trained model weights reported in the extended paper. Unlike the CVPR paper, these models are trained and evaluated with the "filtered" depth/disparity GT labels.
To reproduce the reported results, follow the instructions below.
# Download the pre-trained weights
bash ./checkpoints/download_pretrained_weights.sh
# Run benchmark for monocular depth networks and stereo matching networks
bash ./scripts/run_benchmark_monodepth.sh && bash ./scripts/run_benchmark_stereomatching.sh
The results are averaged over MS^2 evaluation sets (test_day, test_night, test_rain).
Please refer to the extended paper for more detailed results.
Models | Modality | Abs Rel | Sq Rel | RMSE | RMSE(log) | Acc.1 | Acc.2 | Acc.3 | Weight |
---|---|---|---|---|---|---|---|---|---|
DORN | RGB | 0.130 | 0.908 | 4.840 | 0.175 | 0.840 | 0.970 | 0.992 | RGB |
DORN | NIR | 0.139 | 0.917 | 4.471 | 0.177 | 0.825 | 0.967 | 0.991 | NIR |
DORN | THR | 0.109 | 0.540 | 3.660 | 0.144 | 0.887 | 0.982 | 0.996 | THR |
BTS | RGB | 0.107 | 0.635 | 4.128 | 0.147 | 0.883 | 0.980 | 0.995 | RGB |
BTS | NIR | 0.123 | 0.784 | 4.159 | 0.158 | 0.862 | 0.972 | 0.992 | NIR |
BTS | THR | 0.086 | 0.380 | 3.163 | 0.117 | 0.927 | 0.990 | 0.998 | THR |
Adabin | RGB | 0.112 | 0.650 | 4.197 | 0.150 | 0.877 | 0.979 | 0.995 | RGB |
Adabin | NIR | 0.121 | 0.740 | 4.059 | 0.154 | 0.865 | 0.974 | 0.993 | NIR |
Adabin | THR | 0.088 | 0.377 | 3.152 | 0.119 | 0.924 | 0.990 | 0.998 | THR |
NewCRF | RGB | 0.099 | 0.520 | 3.729 | 0.133 | 0.905 | 0.987 | 0.997 | RGB |
NewCRF | NIR | 0.112 | 0.641 | 3.791 | 0.144 | 0.883 | 0.979 | 0.994 | NIR |
NewCRF | THR | 0.081 | 0.331 | 2.937 | 0.109 | 0.937 | 0.992 | 0.999 | THR |
Models | Modality | EPE-all(px) | D1-all(%) | >1px(%) | >2px(%) | Weight |
---|---|---|---|---|---|---|
PSMNet | RGB | 0.400 | 0.981 | 7.856 | 2.177 | RGB |
PSMNet | THR | 0.292 | 0.257 | 3.794 | 0.737 | THR |
GWCNet | RGB | 0.401 | 1.080 | 8.062 | 2.442 | RGB |
GWCNet | THR | 0.285 | 0.223 | 3.565 | 0.685 | THR |
AANet | RGB | 0.399 | 1.131 | 8.227 | 2.543 | RGB |
AANet | THR | 0.284 | 0.305 | 3.780 | 0.811 | THR |
ACVNet | RGB | 0.393 | 1.024 | 7.772 | 2.353 | RGB |
ACVNet | THR | 0.279 | 0.252 | 3.567 | 0.709 | THR |
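For reference, the metrics in the tables above follow the standard definitions; a self-contained sketch (the D1-all threshold follows the common KITTI convention, which is an assumption here):

```python
# Standard depth / disparity metrics matching the table columns above.
import numpy as np

def depth_metrics(gt, pred):
    """gt, pred: 1-D arrays of valid depths (m). Acc.k = ratio with max(pred/gt, gt/pred) < 1.25**k."""
    thresh = np.maximum(gt / pred, pred / gt)
    return {
        "Abs Rel": np.mean(np.abs(gt - pred) / gt),
        "Sq Rel": np.mean((gt - pred) ** 2 / gt),
        "RMSE": np.sqrt(np.mean((gt - pred) ** 2)),
        "RMSE(log)": np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2)),
        "Acc.1": np.mean(thresh < 1.25),
        "Acc.2": np.mean(thresh < 1.25 ** 2),
        "Acc.3": np.mean(thresh < 1.25 ** 3),
    }

def disparity_metrics(gt, pred):
    """gt, pred: 1-D arrays of valid disparities (px).
    D1-all assumes the KITTI convention: error > 3 px and > 5% of GT disparity."""
    err = np.abs(gt - pred)
    return {
        "EPE-all(px)": np.mean(err),
        "D1-all(%)": np.mean((err > 3) & (err > 0.05 * gt)) * 100,
        ">1px(%)": np.mean(err > 1) * 100,
        ">2px(%)": np.mean(err > 2) * 100,
    }
```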
Our code is licensed under the MIT License.
Please cite the following paper if you use our work in your research.
@inproceedings{shin2023deep,
title={Deep Depth Estimation From Thermal Image},
author={Shin, Ukcheol and Park, Jinsun and Kweon, In So},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={1043--1053},
year={2023}
}
@inproceedings{shin2025deep,
title={Deep Depth Estimation From Thermal Image: Dataset, Benchmark, and Analysis},
author={Shin, Ukcheol and Park, Jinsun},
TBA
}
Each network architecture is built upon the following codebases.