
VERY SLOW training on audio-video dataset like kinetics400 and UCF101 #7

Open
XinyuSun opened this issue Nov 29, 2021 · 3 comments

@XinyuSun

Hi authors!
Thank you for making the paper and code open source. It is very helpful.
I am trying to pretrain the GDT model on the Kinetics-400 dataset, but each epoch takes more than a day. I am running on a server with 8 RTX 3090 GPUs, with a per-GPU batch size of 16 for a total batch size of 128, which is a quarter of the original setting in the paper.
According to the paper, the authors spent 3 days on pretraining with a batch size of 512, so under normal circumstances an epoch should not take more than about 3 hours.
I changed the video decoding backend from PyAV to Decord, which brought a small improvement in training speed. Was the speed of the provided code tested before release? What should I do to find clues for speeding up training?
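
For reference, a minimal sketch of the Decord-based clip decoding I switched to. This is not the repository's actual loader; `load_clip`, `num_frames`, and `stride` are hypothetical names used only for illustration.

```python
import torch
import decord

decord.bridge.set_bridge("torch")  # have VideoReader return torch tensors directly

def load_clip(video_path, num_frames=32, stride=2):
    """Decode a fixed-length clip of `num_frames` frames sampled every `stride` frames."""
    vr = decord.VideoReader(video_path, num_threads=1)
    total = len(vr)
    # pick a random temporal window that fits inside the video
    start = torch.randint(0, max(1, total - num_frames * stride), (1,)).item()
    indices = [min(start + i * stride, total - 1) for i in range(num_frames)]
    clip = vr.get_batch(indices)               # (T, H, W, C) uint8 tensor
    return clip.permute(3, 0, 1, 2).float()    # (C, T, H, W) for a 3D CNN backbone
```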

Some logs below:

Epoch: [0]  [  360/14961]  eta: 13:42:52  lr: 0.01  clips/s: 16.263  loss: 2.7961 (2.8411)  batch_t/s: 1.0088 (1.4428)  time: 2.8681  data: 1.3705  max mem: 20040
Epoch: [0]  [  370/14961]  eta: 13:46:51  lr: 0.01  clips/s: 13.694  loss: 2.7992 (2.8464)  batch_t/s: 1.0067 (1.0740)  time: 4.3781  data: 3.3474  max mem: 20040
Epoch: [0]  [  370/14961]  eta: 13:46:48  lr: 0.01  clips/s: 13.769  loss: 2.7919 (2.8454)  batch_t/s: 1.0110 (1.7200)  time: 4.3779  data: 1.3611  max mem: 20040
Epoch: [0]  [  370/14961]  eta: 13:46:48  lr: 0.01  clips/s: 13.532  loss: 2.7913 (2.8402)  batch_t/s: 1.0089 (1.4563)  time: 4.3786  data: 2.4327  max mem: 20040
Epoch: [0]  [  380/14961]  eta: 13:31:23  lr: 0.01  clips/s: 14.072  loss: 2.7891 (2.8451)  batch_t/s: 1.0196 (1.0736)  time: 2.5644  data: 1.5199  max mem: 20040
Epoch: [0]  [  380/14961]  eta: 13:31:20  lr: 0.01  clips/s: 14.029  loss: 2.7738 (2.8434)  batch_t/s: 1.0512 (1.7027)  time: 2.5646  data: 0.5402  max mem: 20040
Epoch: [0]  [  380/14961]  eta: 13:31:19  lr: 0.01  clips/s: 14.026  loss: 2.7874 (2.8387)  batch_t/s: 1.0548 (1.4459)  time: 2.5643  data: 1.0631  max mem: 20040
Epoch: [0]  [  390/14961]  eta: 13:36:54  lr: 0.01  clips/s: 15.097  loss: 2.7765 (2.8417)  batch_t/s: 1.0534 (1.7432)  time: 2.6929  data: 0.5196  max mem: 20040
Epoch: [0]  [  390/14961]  eta: 13:36:56  lr: 0.01  clips/s: 14.988  loss: 2.7927 (2.8441)  batch_t/s: 1.0630 (1.0732)  time: 2.6932  data: 1.6344  max mem: 20040
Epoch: [0]  [  390/14961]  eta: 13:36:53  lr: 0.01  clips/s: 16.121  loss: 2.7775 (2.8376)  batch_t/s: 1.0481 (1.4640)  time: 2.6923  data: 1.0834  max mem: 20040
Epoch: [0]  [  400/14961]  eta: 13:43:48  lr: 0.01  clips/s: 16.551  loss: 2.7957 (2.8433)  batch_t/s: 1.0546 (1.0725)  time: 4.4575  data: 3.4058  max mem: 20040
Epoch: [0]  [  400/14961]  eta: 13:43:45  lr: 0.01  clips/s: 1.458  loss: 2.7986 (2.8373)  batch_t/s: 1.0390 (1.4786)  time: 4.4577  data: 2.3538  max mem: 20040
Epoch: [0]  [  400/14961]  eta: 13:43:46  lr: 0.01  clips/s: 0.679  loss: 2.7963 (2.8410)  batch_t/s: 1.0598 (1.7822)  time: 4.4580  data: 1.1610  max mem: 20040
Epoch: [0]  [  410/14961]  eta: 13:29:18  lr: 0.01  clips/s: 15.575  loss: 2.7954 (2.8418)  batch_t/s: 1.0273 (1.0715)  time: 2.8114  data: 1.7718  max mem: 20040
Epoch: [0]  [  410/14961]  eta: 13:29:15  lr: 0.01  clips/s: 15.525  loss: 2.7892 (2.8399)  batch_t/s: 1.0306 (1.7639)  time: 2.8114  data: 0.6421  max mem: 20040
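
In these logs the `data:` time (roughly 1-3 s) is often larger than the per-batch compute time (`batch_t/s` around 1 s), which suggests the bottleneck is video decoding and data loading rather than the model itself. Below is a minimal sketch of DataLoader settings that are commonly tuned in this situation; `ClipDataset` is a placeholder and the specific values are assumptions, not the repository's configuration.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ClipDataset(Dataset):
    """Placeholder standing in for the real Kinetics-400 clip dataset."""
    def __len__(self):
        return 1000

    def __getitem__(self, idx):
        # fake (C, T, H, W) clip and label; the real dataset would decode a video here
        return torch.zeros(3, 32, 112, 112), 0

loader = DataLoader(
    ClipDataset(),
    batch_size=16,            # per-GPU batch size from the issue
    num_workers=8,            # more decoding workers per GPU process
    pin_memory=True,          # faster host-to-device copies
    persistent_workers=True,  # keep workers alive between epochs
    prefetch_factor=4,        # batches each worker prefetches ahead of time
    drop_last=True,
)
```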

Sincerely yours.

@XinyuSun
Author

Average GPU utilization is relatively low compared with other video pretraining methods (a small sketch for sampling utilization follows the screenshots):

  • GDT pretraining: [screenshot: GPU utilization during GDT pretraining]
  • Other methods: [screenshot: GPU utilization for other video pretraining methods]
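
A minimal sketch for sampling per-GPU utilization with the nvidia-ml-py (`pynvml`) bindings; this is purely illustrative and not part of GDT.

```python
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]
for _ in range(10):                           # sample once per second for ~10 s
    utils = [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles]
    print("GPU util %:", utils)               # persistently low values point to a data-loading bottleneck
    time.sleep(1)
pynvml.nvmlShutdown()
```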

@billhhh

billhhh commented Nov 30, 2021

Thanks

@XinyuSun
Author

Hi, the authors only use the audio model during pretraining; for a fair comparison with other SOTAs, they did not use audio for finetuning.
