
VERY SLOW training on audio-video dataset like kinetics400 and UCF101 #7

Open
XinyuSun opened this issue Nov 29, 2021 · 3 comments

@XinyuSun

Hi authors!
Thank you for making the paper and code open source. It is very helpful.
I am trying to pretrain the GDT model on the Kinetics-400 dataset, but each epoch takes more than a day. I am running on a server with 8 RTX 3090 GPUs, with a per-GPU batch size of 16 for a total batch size of 128, which is a quarter of the original setting in the paper.
According to the paper, the authors spent 3 days on pretraining with a batch size of 512, so under normal circumstances an epoch should not take more than about 3 hours.
I changed the video decoding backend from PyAV to Decord, which brought a small improvement in training speed. Was the speed of the provided code tested before release? What should I do to find clues for speeding up training?
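
For reference, a minimal sketch of the Decord-based clip decoding I switched to. This is not the repository's actual loader; `load_clip`, `num_frames`, and `stride` are hypothetical names used only for illustration.

```python
import torch
import decord

decord.bridge.set_bridge("torch")  # have VideoReader return torch tensors directly

def load_clip(video_path, num_frames=32, stride=2):
    """Decode a fixed-length clip of `num_frames` frames sampled every `stride` frames."""
    vr = decord.VideoReader(video_path, num_threads=1)
    total = len(vr)
    # pick a random temporal window that fits inside the video
    start = torch.randint(0, max(1, total - num_frames * stride), (1,)).item()
    indices = [min(start + i * stride, total - 1) for i in range(num_frames)]
    clip = vr.get_batch(indices)               # (T, H, W, C) uint8 tensor
    return clip.permute(3, 0, 1, 2).float()    # (C, T, H, W) for a 3D CNN backbone
```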

Some logs below:

Epoch: [0]  [  360/14961]  eta: 13:42:52  lr: 0.01  clips/s: 16.263  loss: 2.7961 (2.8411)  batch_t/s: 1.0088 (1.4428)  time: 2.8681  data: 1.3705  max mem: 20040
Epoch: [0]  [  370/14961]  eta: 13:46:51  lr: 0.01  clips/s: 13.694  loss: 2.7992 (2.8464)  batch_t/s: 1.0067 (1.0740)  time: 4.3781  data: 3.3474  max mem: 20040
Epoch: [0]  [  370/14961]  eta: 13:46:48  lr: 0.01  clips/s: 13.769  loss: 2.7919 (2.8454)  batch_t/s: 1.0110 (1.7200)  time: 4.3779  data: 1.3611  max mem: 20040
Epoch: [0]  [  370/14961]  eta: 13:46:48  lr: 0.01  clips/s: 13.532  loss: 2.7913 (2.8402)  batch_t/s: 1.0089 (1.4563)  time: 4.3786  data: 2.4327  max mem: 20040
Epoch: [0]  [  380/14961]  eta: 13:31:23  lr: 0.01  clips/s: 14.072  loss: 2.7891 (2.8451)  batch_t/s: 1.0196 (1.0736)  time: 2.5644  data: 1.5199  max mem: 20040
Epoch: [0]  [  380/14961]  eta: 13:31:20  lr: 0.01  clips/s: 14.029  loss: 2.7738 (2.8434)  batch_t/s: 1.0512 (1.7027)  time: 2.5646  data: 0.5402  max mem: 20040
Epoch: [0]  [  380/14961]  eta: 13:31:19  lr: 0.01  clips/s: 14.026  loss: 2.7874 (2.8387)  batch_t/s: 1.0548 (1.4459)  time: 2.5643  data: 1.0631  max mem: 20040
Epoch: [0]  [  390/14961]  eta: 13:36:54  lr: 0.01  clips/s: 15.097  loss: 2.7765 (2.8417)  batch_t/s: 1.0534 (1.7432)  time: 2.6929  data: 0.5196  max mem: 20040
Epoch: [0]  [  390/14961]  eta: 13:36:56  lr: 0.01  clips/s: 14.988  loss: 2.7927 (2.8441)  batch_t/s: 1.0630 (1.0732)  time: 2.6932  data: 1.6344  max mem: 20040
Epoch: [0]  [  390/14961]  eta: 13:36:53  lr: 0.01  clips/s: 16.121  loss: 2.7775 (2.8376)  batch_t/s: 1.0481 (1.4640)  time: 2.6923  data: 1.0834  max mem: 20040
Epoch: [0]  [  400/14961]  eta: 13:43:48  lr: 0.01  clips/s: 16.551  loss: 2.7957 (2.8433)  batch_t/s: 1.0546 (1.0725)  time: 4.4575  data: 3.4058  max mem: 20040
Epoch: [0]  [  400/14961]  eta: 13:43:45  lr: 0.01  clips/s: 1.458  loss: 2.7986 (2.8373)  batch_t/s: 1.0390 (1.4786)  time: 4.4577  data: 2.3538  max mem: 20040
Epoch: [0]  [  400/14961]  eta: 13:43:46  lr: 0.01  clips/s: 0.679  loss: 2.7963 (2.8410)  batch_t/s: 1.0598 (1.7822)  time: 4.4580  data: 1.1610  max mem: 20040
Epoch: [0]  [  410/14961]  eta: 13:29:18  lr: 0.01  clips/s: 15.575  loss: 2.7954 (2.8418)  batch_t/s: 1.0273 (1.0715)  time: 2.8114  data: 1.7718  max mem: 20040
Epoch: [0]  [  410/14961]  eta: 13:29:15  lr: 0.01  clips/s: 15.525  loss: 2.7892 (2.8399)  batch_t/s: 1.0306 (1.7639)  time: 2.8114  data: 0.6421  max mem: 20040
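
In these logs the `data:` time (roughly 1-3 s) is often larger than the per-batch compute time (`batch_t/s` around 1 s), which suggests the bottleneck is video decoding and data loading rather than the model itself. Below is a minimal sketch of DataLoader settings that are commonly tuned in this situation; `ClipDataset` is a placeholder and the specific values are assumptions, not the repository's configuration.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ClipDataset(Dataset):
    """Placeholder standing in for the real Kinetics-400 clip dataset."""
    def __len__(self):
        return 1000

    def __getitem__(self, idx):
        # fake (C, T, H, W) clip and label; the real dataset would decode a video here
        return torch.zeros(3, 32, 112, 112), 0

loader = DataLoader(
    ClipDataset(),
    batch_size=16,            # per-GPU batch size from the issue
    num_workers=8,            # more decoding workers per GPU process
    pin_memory=True,          # faster host-to-device copies
    persistent_workers=True,  # keep workers alive between epochs
    prefetch_factor=4,        # batches each worker prefetches ahead of time
    drop_last=True,
)
```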

Sincerely yours.

@XinyuSun
Author

Average GPU utilization is relatively low compared with other video pretraining methods (a small sketch for sampling utilization follows the screenshots):

  • GDT pretraining: [screenshot: GPU utilization during GDT pretraining]
  • Other methods: [screenshot: GPU utilization for other video pretraining methods]
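
A minimal sketch for sampling per-GPU utilization with the nvidia-ml-py (`pynvml`) bindings; this is purely illustrative and not part of GDT.

```python
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]
for _ in range(10):                           # sample once per second for ~10 s
    utils = [pynvml.nvmlDeviceGetUtilizationRates(h).gpu for h in handles]
    print("GPU util %:", utils)               # persistently low values point to a data-loading bottleneck
    time.sleep(1)
pynvml.nvmlShutdown()
```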

@billhhh

billhhh commented Nov 30, 2021

Thanks

@XinyuSun
Author

Hi, the authors only use the audio model during pretraining; for a fair comparison with other SOTAs, they did not use audio for finetuning.
