Official PyTorch implementation of the paper: Transformer based deep learning for digital image correlation
DICTr is a digital image correlation (DIC) network built on GMFlow for high-accuracy deformation measurement.
Previous models directly learn the relationship between grayscale value changes and displacements. DICTr instead recasts the problem as image registration driven by feature matching, which has a clearer physical meaning.
System: Ubuntu 22.04.2 LTS
Datasets generation:
- MATLAB ≥ R2020b
DICTr network:
- Conda ≥ 22.9.0
- PyTorch ≥ 1.13.1
- CUDA ≥ 11.6
- Python ≥ 3.8.11
We recommend creating a Conda environment through the YAML file provided in the repository:
conda env create -f environment.yaml
conda activate dictr
When generating datasets and training on a remote server, we recommend using tmux to prevent accidental session interruptions.
The dataset required for DICTr training can be generated through the MATLAB script provided in the repository:
cd ./dataset/DICTrDatasetGenerator
matlab -nodisplay -nosplash
>> main
Execute the following command in the root directory of the repository:
sh ./scripts/train.sh
Detailed explanation of parameters in the train script:
# name of dataset used for training
# you can create your own dataset in the dataset.py file
--stage speckle
# number of image pairs used to update model parameters in each training step
# the upper limit depends on your VRAM size
--batch_size 12
# name of dataset used for validation
# you can create your own dataset in the dataset.py and evaluate.py files
--val_dataset speckle
# learning rate
--lr 2e-4
# DICTr uses 12 transformer layers (6 blocks) to enhance image features
--num_transformer_layers 12
# DICTr obtains the full-resolution result by convex upsampling from 1/2 resolution
--upsample_factor 2
# DICTr uses features at 2 scales: 1/4 for global matching and 1/2 for refinement
--num_scales 2
# number of splits on feature map edge to form window layout for swin transformer
# first parameter is for 1/4 scale feature map
# second parameter is for 1/2 scale feature map
--attn_splits_list 2 8
# radius for feature matching, -1 indicates global matching
# first parameter is for 1/4 scale feature map
# second parameter is for 1/2 scale feature map
--corr_radius_list -1 4
# frequency (in training steps) of validation
--val_freq 5000
# frequency (in training steps) of saving the model
--save_ckpt_freq 5000
# total number of training steps, after which UNATTENDED TRAINING stops automatically
--num_steps 100000
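To make the scale and window parameters above more concrete, the following is a minimal sketch of how --num_scales and --attn_splits_list translate into feature map and attention window sizes. The function name window_layout, the 128x128 input size, and the divisibility check are assumptions made for illustration; the repository code may handle sizes differently (for example by padding).

```python
def window_layout(height, width, scales=(4, 2), attn_splits=(2, 8)):
    """Return feature-map and attention-window sizes for each scale.

    Assumption: each feature map is the input downsampled by `scale`, and
    each edge of that map is split evenly into `splits` windows.
    """
    layout = []
    for scale, splits in zip(scales, attn_splits):
        factor = scale * splits
        if height % factor or width % factor:
            raise ValueError(
                f"{height}x{width} is not divisible by {factor} "
                f"(1/{scale} scale with {splits} splits per edge)"
            )
        feat_h, feat_w = height // scale, width // scale
        layout.append({
            "scale": f"1/{scale}",
            "feature_map": (feat_h, feat_w),
            "window": (feat_h // splits, feat_w // splits),
        })
    return layout


# Hypothetical 128x128 input image.
for entry in window_layout(128, 128):
    print(entry)
```

With these defaults, a 128x128 image gives a 32x32 feature map with 16x16 windows at 1/4 scale and a 64x64 feature map with 8x8 windows at 1/2 scale.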
Due to differences in VRAM across GPU devices, you may need to adjust both batch_size and num_steps to complete the training.
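One possible way to pick num_steps after changing batch_size is to keep the total number of image pairs seen during training roughly constant. This is a common heuristic, not a rule stated by the authors, and the helper rescale_num_steps below is hypothetical. The learning rate may also need retuning when the batch size changes.

```python
def rescale_num_steps(batch_size, ref_batch_size=12, ref_num_steps=100000):
    """Scale num_steps so that batch_size * num_steps stays roughly constant.

    The reference values are the defaults from the train script; the
    constant-sample-count rule itself is an assumption, not the authors' rule.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    total_pairs = ref_batch_size * ref_num_steps
    # Integer ceiling division, so the total is never undershot.
    return (total_pairs + batch_size - 1) // batch_size


for bs in (12, 8, 6):
    print(f"--batch_size {bs} --num_steps {rescale_num_steps(bs)}")
```

For example, halving the batch size to 6 doubles the suggested number of steps to 200000.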
We use early stopping as a regularization approach to decide when to stop updating the model. Specifically, the network is trained on the training set, and the AEE on the validation set is evaluated periodically. To prevent overfitting, training should be halted once the validation performance no longer improves. The final model is then used to run inference on the test set. This approach means you do not need to complete all training steps (num_steps).
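The early stopping procedure described above can be sketched as follows. This is an illustration only: AEE is taken here to be the average endpoint error (the mean Euclidean distance between predicted and true displacement vectors), and the EarlyStopper class with its patience and min_delta parameters is hypothetical and not part of the repository. The validation values in the example are made up.

```python
import math


def average_endpoint_error(pred, truth):
    """Mean Euclidean distance between predicted and true (u, v) vectors."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equally long")
    total = sum(
        math.hypot(pu - tu, pv - tv)
        for (pu, pv), (tu, tv) in zip(pred, truth)
    )
    return total / len(pred)


class EarlyStopper:
    """Signal a stop once validation AEE fails to improve `patience` times."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.best_step = None
        self.bad_checks = 0

    def update(self, step, val_aee):
        """Record one validation result; return True when training should stop."""
        if val_aee < self.best - self.min_delta:
            self.best, self.best_step, self.bad_checks = val_aee, step, 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


# Made-up validation AEE values, one per validation interval of 5000 steps.
history = [0.050, 0.031, 0.024, 0.025, 0.026, 0.027]
stopper = EarlyStopper(patience=3)
for i, aee in enumerate(history, start=1):
    step = i * 5000
    if stopper.update(step, aee):
        print(f"stop at step {step}, keep checkpoint from step {stopper.best_step}")
        break
```

Because the default --val_freq and --save_ckpt_freq are both 5000, the step with the best validation AEE coincides with a saved checkpoint, which can then be passed to the experiment script.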
The training, validation, and test sets should not overlap to prevent data leakage. For further details, please refer to Wikipedia.
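If you build your own dataset, one simple way to guarantee non-overlapping sets is to shuffle the sample indices once and cut them into three parts. The sketch below is a generic illustration; the function split_indices, the split fractions, and the sample count are assumptions and do not describe how the repository's dataset generator works.

```python
import random


def split_indices(num_samples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle sample indices once and cut them into three disjoint sets."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_val = round(num_samples * val_fraction)
    n_test = round(num_samples * test_fraction)
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test


train, val, test = split_indices(1000)

# Every sample lands in exactly one set, so no data can leak between them.
assert not set(train) & set(val)
assert not set(train) & set(test)
assert not set(val) & set(test)
assert len(train) + len(val) + len(test) == 1000
print(len(train), len(val), len(test))
```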
For reference, DICTr was trained on a system equipped with an AMD Ryzen 7 5700X @ 3.40 GHz CPU, 128 GB RAM, and dual NVIDIA GeForce RTX 3090 Ti GPUs (24 GB VRAM each). With the default batch size of 12, training took 8 hours.
Execute the following command in the root directory of the repository to run inference:
sh ./scripts/experiment.sh
Detailed explanation of parameters in the experiment script:
# path of the model checkpoint to load
# you can replace it with a newly trained checkpoint
--resume checkpoints/step_080000.pth
# names of the experiments to run inference on
# you can create custom tests in the experiment.py file
--exp_type rotation tension star5 mei realcrack
# DICTr uses 12 transformer layers (6 blocks) to enhance image features
--num_transformer_layers 12
# DICTr obtains the full-resolution result by convex upsampling from 1/2 resolution
--upsample_factor 2
# DICTr uses features at 2 scales: 1/4 for global matching and 1/2 for refinement
--num_scales 2
# number of splits on feature map edge to form window layout for swin transformer
# first parameter is for 1/4 scale feature map
# second parameter is for 1/2 scale feature map
--attn_splits_list 2 8
# radius for feature matching, -1 indicates global matching
# first parameter is for 1/4 scale feature map
# second parameter is for 1/2 scale feature map
--corr_radius_list -1 4
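To illustrate the difference between global matching (radius -1) and matching within a limited radius, here is a simplified one-dimensional sketch in pure Python. It is not the DICTr implementation: the function match_1d, the scaled dot-product similarity, the softmax-weighted expected position, and the toy one-hot features are all assumptions chosen to keep the example small.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_1d(ref_feats, tar_feats, radius=-1):
    """Estimate per-position displacement by softmax-weighted feature matching.

    radius=-1 compares each reference position with every target position
    (global matching); radius=r only considers targets within r positions.
    """
    dim = len(ref_feats[0])
    displacements = []
    for i, ref in enumerate(ref_feats):
        if radius < 0:
            candidates = list(range(len(tar_feats)))
        else:
            lo, hi = max(0, i - radius), min(len(tar_feats), i + radius + 1)
            candidates = list(range(lo, hi))
        # Scaled dot-product similarity between feature vectors.
        scores = [
            sum(a * b for a, b in zip(ref, tar_feats[j])) / math.sqrt(dim)
            for j in candidates
        ]
        weights = softmax(scores)
        # Expected matching position under the softmax distribution.
        expected = sum(w * j for w, j in zip(weights, candidates))
        displacements.append(expected - i)
    return displacements


# Toy features: position i carries a scaled one-hot vector; target is shifted by 2.
size, shift = 6, 2
ref = [[20.0 if k == i else 0.0 for k in range(size)] for i in range(size)]
tar = [ref[j - shift] if j >= shift else [0.0] * size for j in range(size)]

print([round(d, 3) for d in match_1d(ref, tar, radius=-1)[:4]])
print([round(d, 3) for d in match_1d(ref, tar, radius=1)[1:4]])
```

In this toy case, global matching recovers the shift of 2 for the positions that remain inside the signal, while a radius of 1 cannot reach the true match and returns a displacement of about 0. This is one way to see why a global search at the coarse scale is followed by a small-radius search for refinement.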
The results will be saved in the ./test folder in .csv format, which store the full-field displacement information of
By default, all tests in the paper will be performed.
The REF and TAR images can be found in the ./test folder.
You can add custom tests in the ./experiment.py file.
The pretrained DICTr model used in the paper, ./checkpoints/step_080000.pth, is provided in the repository. It is loaded by the default experiment script.
@article{ZHOU2025108568,
title = {Transformer based deep learning for digital image correlation},
journal = {Optics and Lasers in Engineering},
volume = {184},
pages = {108568},
year = {2025},
issn = {0143-8166},
doi = {https://doi.org/10.1016/j.optlaseng.2024.108568},
url = {https://www.sciencedirect.com/science/article/pii/S0143816624005463},
author = {Yifei Zhou and Qianjiang Zuo and Nan Chen and Licheng Zhou and Bao Yang and Zejia Liu and Yiping Liu and Liqun Tang and Shoubin Dong and Zhenyu Jiang}
}
This project owes its existence to the indispensable contribution of GMFlow.