Replies: 1 comment
-
@axife 👋 Hello! Thanks for asking about CUDA memory issues. While we don't have a specific table with requirements, we have general guidelines on reducing CUDA usage when training. I've pasted them here below. Note they were written for YOLOv5 but apply broadly to all PyTorch GPU training. YOLO 🚀 can be trained on CPU, single-GPU, or multi-GPU. When training on GPU it is important to keep your batch-size small enough that you do not use all of your GPU memory, otherwise you will see a CUDA Out Of Memory (OOM) Error and your training will crash. You can observe your CUDA memory utilization using either the CUDA Out of Memory SolutionsIf you encounter a CUDA OOM error, the steps you can take to reduce your memory usage are:
AutoBatchYou can use YOLOv5 AutoBatch (NEW) to find the best batch size for your training by passing Good luck 🍀 and let us know if you have any other questions! |
Beta Was this translation helpful? Give feedback.
-
I try to follow HUB forms to train model. I was able to train smallest COCO128 model in Google Colab.
But when I try to train bigger model like VisDrone I get error out of memory. I use Colab with T4 GPU that has 15GB of RAM.
I tried to play with values like Batch 8 and 4
I used proposed pre-trained model.
But it always tells out of memory.
Is there some table that tells what hardware I need to handle certain training.
Thanks
Artem
Ultralytics HUB: Authenticated ✅
Ultralytics HUB: View model at https://hub.ultralytics.com/models/79M1AWwDkOGUIIVgmpF9 🚀
Ultralytics YOLOv8.0.93 🚀 Python-3.10.11 torch-2.0.0+cu118 CUDA:0 (Tesla T4, 15102MiB)
yolo/engine/trainer: task=detect, mode=train, model=yolov8s.pt, data=VisDrone.yaml, epochs=100, patience=100, batch=4, imgsz=640, save=True, save_period=-1, cache=None, device=, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=SGD, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=0, resume=False, amp=True, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, line_width=None, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, v5loader=False, tracker=botsort.yaml, save_dir=runs/detect/train6
Overriding model.yaml nc=80 with nc=10
0 -1 1 928 ultralytics.nn.modules.Conv [3, 32, 3, 2]
1 -1 1 18560 ultralytics.nn.modules.Conv [32, 64, 3, 2]
2 -1 1 29056 ultralytics.nn.modules.C2f [64, 64, 1, True]
3 -1 1 73984 ultralytics.nn.modules.Conv [64, 128, 3, 2]
4 -1 2 197632 ultralytics.nn.modules.C2f [128, 128, 2, True]
5 -1 1 295424 ultralytics.nn.modules.Conv [128, 256, 3, 2]
6 -1 2 788480 ultralytics.nn.modules.C2f [256, 256, 2, True]
7 -1 1 1180672 ultralytics.nn.modules.Conv [256, 512, 3, 2]
8 -1 1 1838080 ultralytics.nn.modules.C2f [512, 512, 1, True]
9 -1 1 656896 ultralytics.nn.modules.SPPF [512, 512, 5]
10 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
11 [-1, 6] 1 0 ultralytics.nn.modules.Concat [1]
12 -1 1 591360 ultralytics.nn.modules.C2f [768, 256, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 [-1, 4] 1 0 ultralytics.nn.modules.Concat [1]
15 -1 1 148224 ultralytics.nn.modules.C2f [384, 128, 1]
16 -1 1 147712 ultralytics.nn.modules.Conv [128, 128, 3, 2]
17 [-1, 12] 1 0 ultralytics.nn.modules.Concat [1]
18 -1 1 493056 ultralytics.nn.modules.C2f [384, 256, 1]
19 -1 1 590336 ultralytics.nn.modules.Conv [256, 256, 3, 2]
20 [-1, 9] 1 0 ultralytics.nn.modules.Concat [1]
21 -1 1 1969152 ultralytics.nn.modules.C2f [768, 512, 1]
22 [15, 18, 21] 1 2119918 ultralytics.nn.modules.Detect [10, [128, 256, 512]]
Model summary: 225 layers, 11139470 parameters, 11139454 gradients, 28.7 GFLOPs
Transferred 349/355 items from pretrained weights
TensorBoard: Start with 'tensorboard --logdir runs/detect/train6', view at http://localhost:6006/
OutOfMemoryError Traceback (most recent call last)
in <cell line: 4>()
2
3 model = YOLO('https://hub.ultralytics.com/models/79M1AWwDkOGUIIVgmpF9')
----> 4 model.train()
10 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in convert(t)
1141 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
1142 non_blocking, memory_format=convert_to_format)
-> 1143 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
1144
1145 return self._apply(convert)
OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 14.75 GiB total capacity; 12.84 GiB already allocated; 832.00 KiB free; 13.36 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Beta Was this translation helpful? Give feedback.
All reactions