AnchorHead and add_gt_as_proposals option of the samplers are incompatible #12298

Open
cmhm7 opened this issue Jan 24, 2025 · 0 comments

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.

Describe the bug
When a sampler (such as RandomSampler) is used with add_gt_as_proposals=True in an AnchorHead (such as RPNHead), training sometimes crashes because of out-of-range index accesses.

Reproduction

  1. What command or script did you run?
python tools/train.py config.py

with a FasterRCNN in the config, and a RandomSampler with add_gt_as_proposals=True in the train_cfg for the RPN:

model = dict(
    type='FasterRCNN',
...
    rpn_head=dict(
        type='RPNHead',
        in_channels=96,
        feat_channels=96,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[1]),
...
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=10,
                pos_fraction=0.2,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
  2. Did you make any modifications on the code or config? Did you understand what you have modified?
    Yes and yes
  3. What dataset did you use?
    Custom one

Environment

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.

sys.platform: linux
Python: 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
CUDA available: False
MUSA available: False
numpy_random_seed: 2147483648
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.9.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2021.2-Product Build 20210312 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.10.0
OpenCV: 4.10.0
MMEngine: 0.10.6
MMDetection: 3.3.0+78cf2bc

  2. You may add additional information that may be helpful for locating the problem, such as:
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

I used the Dockerfile from MMDetection with

ARG PYTORCH="1.9.0"
ARG CUDA="11.1"
ARG CUDNN="8"

Error traceback
The error happens randomly during long training runs, and I did not note the exact message.

It reported an out-of-range access in anchor_head.py at line 292.

Bug fix
The reason is that, in the _get_targets_single method of AnchorHead, we have:

anchors = flat_anchors[inside_flags]
...

sampling_result = self.sampler.sample(assign_result, pred_instances,
                                              gt_instances)

The rest of the method uses the anchors computed above. However, if add_gt_as_proposals=True is set in the sampler, the sampler appends the GT boxes to its internal anchor list, which therefore becomes longer than anchors. As a result, sampling_result.pos_inds and sampling_result.neg_inds can contain indices >= len(anchors).
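The index mismatch can be illustrated without MMDetection at all. The snippet below is only a toy sketch in plain Python: toy_sample and out_of_range are hypothetical helpers, not library functions, and the sketch assumes that enabling add_gt_as_proposals enlarges the sampler's candidate pool by one entry per GT box.

```python
import random


def toy_sample(num_anchors, num_gts, num_samples, add_gt_as_proposals, seed=0):
    """Toy stand-in for a sampler (hypothetical, not MMDetection code).

    Returns indices drawn from the sampler's own candidate pool.
    """
    # Assumption: with add_gt_as_proposals=True the pool holds the anchors
    # plus one extra entry per ground-truth box.
    pool_size = num_anchors + (num_gts if add_gt_as_proposals else 0)
    rng = random.Random(seed)
    return sorted(rng.sample(range(pool_size), min(num_samples, pool_size)))


def out_of_range(indices, num_anchors):
    """Indices that would overflow a tensor of length num_anchors."""
    return [i for i in indices if i >= num_anchors]


num_anchors, num_gts = 6, 2

# Sampling the whole pool makes the outcome deterministic.
with_gt = toy_sample(num_anchors, num_gts, num_samples=8,
                     add_gt_as_proposals=True)
without_gt = toy_sample(num_anchors, num_gts, num_samples=8,
                        add_gt_as_proposals=False)

print(out_of_range(with_gt, num_anchors))     # [6, 7]
print(out_of_range(without_gt, num_anchors))  # []
```

In a real training run only a few indices are sampled, so an overflowing index is drawn only occasionally, which would be consistent with the crash appearing at random.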

Even with that fixed, there are still issues at anchor_head.py line 415, because images_to_levels expects the lists to have the same length across all images.
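The length constraint can be sketched the same way. toy_images_to_levels below is a hypothetical, simplified regrouping function written for this illustration (it is not the MMDetection implementation); it assumes each image contributes one flat target list that is then cut into per-level chunks, which only works if every image has exactly as many entries as there are anchors.

```python
def toy_images_to_levels(per_image_targets, num_level_anchors):
    """Toy per-image -> per-level regrouping (hypothetical helper)."""
    expected = sum(num_level_anchors)
    for img_id, targets in enumerate(per_image_targets):
        if len(targets) != expected:
            raise ValueError(
                f"image {img_id}: got {len(targets)} targets, "
                f"expected {expected}")
    levels, start = [], 0
    for n in num_level_anchors:
        # One chunk per image for this feature level.
        levels.append([t[start:start + n] for t in per_image_targets])
        start += n
    return levels


num_level_anchors = [4, 2]  # 6 anchors per image in total

# Every image has exactly 6 targets: regrouping works.
ok = [[0] * 6, [1] * 6]
levels = toy_images_to_levels(ok, num_level_anchors)

# Assumed failure case: 2 GTs added in image 0, 1 GT added in image 1.
bad = [[0] * (6 + 2), [1] * (6 + 1)]
try:
    toy_images_to_levels(bad, num_level_anchors)
    failed = False
except ValueError:
    failed = True
```

Because the number of GT boxes differs from image to image, the per-image lists no longer share a common length once GT entries are mixed in, so any fix would need to map the results back to the original anchor count before this step.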
