Hierarchical Labels and Descriptions
keywords: few-shot object detection, hierarchical taxonomy, fine-grained classes, multimodal learning
Name | Paper | Download | Hierarchy? | Task (cls / det) | Total classes | Base classes | Novel classes | FSL works using it |
---|---|---|---|---|---|---|---|---|
HiFSOD-Bird | link | link | yes | both | 1432 | 1145 | 287 | -- |
FSOD | link | link | yes | det | 1000 | 800 | 200 | Kernel |
COCO | link | link | no, but easy to get | det | 91 | 82 | 9 | FSOD, ONCE, FSCE, CME, DCNet, GFSD, HallucFsDet, SRR, CRDR, TIP, UniT, SQMG, LVC, Sylph, FCT, Kernel, DiGeo, Meta-tuning, VAE, NIFF, DeFRCN, QA-FewDet, UP-FSOD, MPSR, MFDC, TFA, FADI, Meta-FRCN, VFA, Meta-DETR, Fast-HiFSOD |
LVIS | link | link | no | det | 977 | -- | -- | Sylph, DiGeo, TFA |
CUB-200-2011 | link | link | no, but easy to get | cls | 200 | -- | -- | DPGN, PoseNorm, DeepEMD, FRN, TDM, SetFeat, MCL, DeepBDC, VFD, RENet, embedding-propagation |
NABirds | link | link | no, but easy to get | cls | 555 | -- | -- | PoseNorm, VFD |
iNaturalist | link | link | yes | both | 5089 -> 10000 | -- | -- | FRN, TDM, MCL |
Semi-Aves | link | link | no, but easy to get | cls | 200 | -- | -- | -- |
Semi-Fungi | link | link | no, but easy to get | cls | 200 | -- | -- | -- |
Note:
- I'd recommend the DF20 (Danish Fungi 2020) dataset (Paper, GitHub, Download), which contains 1604 species / 566 genera / 190 families.
- We created an Excel spreadsheet of hierarchical scientific names for the CUB-200-2011 dataset (link).
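A hierarchy like the species / genus / family levels mentioned above can be assembled from scientific names alone, because the first word of a binomial species name is its genus. The following is a minimal sketch, not the code behind the spreadsheet: the function names `build_taxonomy` and `label_path`, the `(family, species)` record layout, and the four example birds are all assumptions made for illustration.

```python
from collections import defaultdict

# Illustrative records only (hypothetical input layout): (family, binomial species name).
EXAMPLE_RECORDS = [
    ("Corvidae", "Corvus corax"),
    ("Corvidae", "Corvus brachyrhynchos"),
    ("Corvidae", "Cyanocitta cristata"),
    ("Cardinalidae", "Cardinalis cardinalis"),
]


def build_taxonomy(records):
    """Group species under genus and family: {family: {genus: [species, ...]}}."""
    tree = defaultdict(lambda: defaultdict(list))
    for family, species in records:
        # In a binomial name, the first word is the genus.
        genus = species.split()[0]
        tree[family][genus].append(species)
    # Convert to plain dicts with sorted species lists for stable output.
    return {
        family: {genus: sorted(members) for genus, members in genera.items()}
        for family, genera in tree.items()
    }


def label_path(tree, species):
    """Return the coarse-to-fine label path (family, genus, species)."""
    for family, genera in tree.items():
        for genus, members in genera.items():
            if species in members:
                return (family, genus, species)
    raise KeyError(species)


if __name__ == "__main__":
    taxonomy = build_taxonomy(EXAMPLE_RECORDS)
    print(label_path(taxonomy, "Corvus corax"))
```

The resulting label path gives one class label per level, which is the kind of coarse-to-fine annotation a hierarchical classifier or detector would consume.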
Name | GitHub | Conference | Dataset | Details |
---|---|---|---|---|
Hierarchical Few-Shot Object Detection: Problem, Benchmark and Method | HIFSOD | ACM MM 2022 | HiFSOD-Bird | It proposed and solved the hierarchical few-shot object detection problem, which aims to detect objects with hierarchical categories in the FSOD paradigm. It also provided a benchmark dataset, HiFSOD-Bird, and the first Hi-FSOD method, HiCLPL (Hierarchical Contrastive Learning and Probabilistic Loss). |
Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector | FSOD | CVPR 2020 | FSOD, ImageNet, COCO | It proposed a few-shot object detection network. Central to the method are the Attention-RPN, the Multi-Relation Detector, and the Contrastive Training strategy. It also contributed a new dataset, FSOD. |
DPGN: Distribution Propagation Graph Network for Few-shot Learning | DPGN | CVPR 2020 | ImageNet, CUB-200-2011, CIFAR-FS | It proposed a distribution propagation graph network, which conveys both the distribution-level relations and instance-level relations in each few-shot learning task. |
Hyperbolic Visual Embedding Learning for Zero-Shot Recognition | Hyperbolic_ZSL | CVPR 2020 | ImageNet | It proposed a Hyperbolic Visual Embedding Learning Network, which learns image embeddings in hyperbolic space and can thereby preserve the hierarchical structure of semantic classes in low dimensions. |
Revisiting Pose-Normalization for Fine-Grained Few-Shot Recognition | PoseNorm | CVPR 2020 | CUB-200-2011, NABirds, FGVC-Aircraft, OID-Aircraft | The goal is a model that learns subtle, fine-grained distinctions between classes from only a few images. It uses pose-normalized representations: first localize semantic parts in each image, then describe images by characterizing the appearance of each part. |
DeepEMD: Few-Shot Image Classification With Differentiable Earth Mover's Distance and Structured Classifiers | DeepEMD | CVPR 2020 | ImageNet, FC100, CUB-200-2011 | It adopted the Earth Mover’s Distance (EMD) as a metric to compute a structural distance between dense image representations to determine image relevance. To handle k-shot classification, it proposed to learn a structured fully connected layer that can directly classify dense image representations with the EMD. |
Incremental Few-Shot Object Detection | -- | CVPR 2020 | PASCAL VOC, COCO | It considered the Incremental Few-Shot Detection problem setting and designed OpeN-ended Centre nEt (ONCE) to incrementally learn to detect novel class objects from few examples. |
FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding | FSCE | CVPR 2021 | PASCAL VOC, COCO | Observing that object proposals with different IoU scores are analogous to the intra-image augmentation used in contrastive approaches, it presented Few-Shot object detection via Contrastive Proposal Encoding. |
Beyond Max-Margin: Class Margin Equilibrium for Few-shot Object Detection | CME | CVPR 2021 | PASCAL VOC, COCO | It proposed a class margin equilibrium approach, with the aim to optimize both feature space partition and novel class reconstruction in a systematic way. CME first uses a fully connected layer to decouple localization features, then introduces class margin loss during feature learning, finally disturbs the features of novel class instances in an adversarial min-max fashion. |
Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection | DCNet | CVPR 2021 | PASCAL VOC, COCO | It proposed Dense Relation Distillation with Context-aware Aggregation. |
Generalized Few-Shot Object Detection without Forgetting | GFSD | CVPR 2021 | PASCAL VOC, COCO | Retentive R-CNN consists of Bias-Balanced RPN to debias the pretrained RPN and Re-detector to find few-shot class objects without forgetting previous knowledge. |
Few-Shot Incremental Learning with Continually Evolved Classifiers | CEC | CVPR 2021 | CIFAR-100, ImageNet, CUB-200-2011 | First, it adopted a strategy that decouples the learning of representations and classifiers, so that only the classifiers are updated in each incremental session. Second, it proposed a Continually Evolved Classifier that employs a graph model to propagate context information between classifiers for adaptation. |
Hallucination Improves Few-Shot Object Detection | HallucFsDet | CVPR 2021 | PASCAL VOC, COCO | It tried to build a better model of variation for novel classes by transferring the shared within-class variation from base classes. It introduced a hallucinator network that learns to generate training examples in the region of interest feature space. |
Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection | -- | CVPR 2021 | PASCAL VOC, COCO | Represent each class concept by a semantic embedding learned from a large corpus of text. The detector projects the image representations of objects into this embedding space. |
Few-Shot Object Detection via Classification Refinement and Distractor Retreatment | -- | CVPR 2021 | PASCAL VOC, COCO | It tried to solve classification incapability (false positives) caused by category confusion from the aspects of both architectural enhancement and hard-example mining. |
Few-Shot Classification With Feature Map Reconstruction Networks | FRN | CVPR 2021 | CUB-200-2011, FGVC-Aircraft, iNaturalist, ImageNet | The ability of the network to reconstruct a query feature map from support features of a given class predicts membership of the query in that class. |
Transformation Invariant Few-Shot Object Detection | -- | CVPR 2021 | PASCAL VOC, COCO | It proposed a Transformation Invariant Principle that can be applied to meta-learning models for boosting the detection performance on novel class objects. |
UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation | UniT | CVPR 2021 | PASCAL VOC, COCO | It proposed a semi-supervised model that is applicable to a range of supervision: from zero to a few instance-level samples per novel class. |
Adaptive Image Transformer for One-Shot Object Detection | -- | CVPR 2021 | PASCAL VOC, COCO | The main idea leverages the concept of language translation to boost metric-learning-based detection methods. It proposed the Adaptive Image Transformer module that deploys an attention-based encoder-decoder architecture. |
Accurate Few-shot Object Detection with Support-Query Mutual Guidance and Hybrid Loss | -- | CVPR 2021 | PASCAL VOC, COCO | It proposed a two-stage detector: 1. Employ a support-query mutual guidance mechanism to generate more support-relevant proposals. 2. Score and filter proposals via multi-level feature comparison based on a distance metric learnt by a hybrid loss. |
DETReg: Unsupervised Pretraining with Region Priors for Object Detection | DETReg | CVPR 2022 | PASCAL VOC, COCO, Airbus Ship | It introduced DETReg, a new self-supervised method that pretrains the entire object detection network, including the object localization and embedding components. |
Label, Verify, Correct: A Simple Few Shot Object Detection Method | LVC | CVPR 2022 | PASCAL VOC, COCO | It introduced a pseudo-labelling method to source high quality pseudo-annotations from the training set, for each new category, to increase the number of training instances and reduce class imbalance. |
Sylph: A Hypernetwork Framework for Incremental Few-shot Object Detection | Sylph | CVPR 2022 | COCO, LVIS | With a carefully designed class-conditional hypernetwork, fine-tune-free iFSD can be highly effective, especially when a large number of base categories with abundant data are available for meta-training. |
Few-Shot Object Detection with Fully Cross-Transformer | FCT | CVPR 2022 | PASCAL VOC, COCO | It proposed a Fully Cross-Transformer based model by incorporating cross-transformer into both the feature backbone and detection head. The model can improve the few-shot similarity learning between the two branches by introducing the multilevel interactions. |
Balanced and Hierarchical Relation Learning for One-Shot Object Detection | BHRL | CVPR 2022 | PASCAL VOC, COCO | Contributions are two-fold: 1. Instance-level Hierarchical Relation module is proposed. 2. Ratio-Preserving Loss can protect the learning of rare positive samples. |
Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference | PMF | CVPR 2022 | ImageNet, CIFAR-FS, CD-FSL, Meta-Dataset | It asked three questions: 1. How does pre-training on external data benefit FSL? 2. How can state-of-the-art transformer architectures be exploited? 3. How does fine-tuning mitigate domain shift? |
Generating Representative Samples for Few-Shot Classification | FSL-VAE | CVPR 2022 | ImageNet | It proposed to generate visual samples based on semantic embeddings using a conditional variational autoencoder (CVAE) model. Non-representative samples are removed from the base training set when training the CVAE model. |
Task Discrepancy Maximization for Fine-Grained Few-Shot Classification | TDM | CVPR 2022 | CUB-200-2011, FGVC-Aircraft, iNaturalist, Stanford Cars, Stanford Dogs, Oxford-IIIT Pet | It tried to localize the class-wise discriminative regions by highlighting channels that encode distinct information of the class. Task Discrepancy Maximization learns task-specific channel weights based on a Support Attention Module and a Query Attention Module. |
Kernelized Few-Shot Object Detection With Efficient Integral Aggregation | -- | CVPR 2022 | PASCAL VOC, COCO, FSOD | It designed a Kernelized Few-shot Object Detector by leveraging kernelized matrices computed over multiple proposal regions, which yield expressive non-linear representations whose model complexity is learned on the fly. |
VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning | VGSE | CVPR 2022 | AwA2, CUB-200-2011, SUN Attribute | It proposed to discover semantic embeddings containing discriminative visual properties for zero-shot learning, without requiring any human annotation. |
Semantic-Aligned Fusion Transformer for One-Shot Object Detection | -- | CVPR 2022 | PASCAL VOC, COCO | Semantic-aligned Fusion Transformer has a vertical fusion module for cross-scale semantic enhancement and a horizontal fusion module for cross-sample feature fusion. |
Robust Region Feature Synthesizer for Zero-Shot Object Detection | RRFS | CVPR 2022 | PASCAL VOC, COCO, DIOR | The object detection framework contains an Intra-class Semantic Diverging component and an Inter-class Structure Preserving component. |
Matching Feature Sets for Few-shot Image Classification | SetFeat | CVPR 2022 | ImageNet, CUB-200-2011 | It proposed to adapt existing feature extractors to produce sets of feature vectors from images. |
Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification | MCL | CVPR 2022 | ImageNet, CUB-200-2011, iNaturalist | It proposed Mutual Centralized Learning to fully affiliate two disjoint sets of dense features in a bidirectional paradigm. |
Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification | DeepBDC | CVPR 2022 | ImageNet, CUB-200-2011, FGVC-Aircraft, Stanford Cars | Deep Brownian Distance Covariance learns image representations by measuring the discrepancy between the joint characteristic function of embedded features and the product of the marginals. |
DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection | DiGeo | CVPR 2023 | PASCAL VOC, COCO, LVIS | It proposed a new training framework to learn geometry-aware features with inter-class separation and intra-class compactness. |
Meta-Tuning Loss Functions and Data Augmentation for Few-Shot Object Detection | -- | CVPR 2023 | PASCAL VOC, COCO | It focused on the role of loss functions and augmentations as the force driving the fine-tuning process, and proposed to tune their dynamics through meta-learning principles. |
Generating Features With Increased Crop-Related Diversity for Few-Shot Object Detection | -- | CVPR 2023 | PASCAL VOC, COCO | It proposed a variational autoencoder based data generation model. The main idea is to transform the latent space such that latent codes with different norms represent different crop-related variations. |
Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners | CaFo | CVPR 2023 | ImageNet, Stanford Cars, UCF101, Caltech 101, Oxford 102 Flower, SUN397, DTD, EuroSAT, FGVC-Aircraft, Oxford-IIIT Pet, Food-101 | The Cascade of Foundation models incorporates CLIP’s language-contrastive knowledge, DINO’s vision-contrastive knowledge, DALL-E’s vision-generative knowledge, and GPT-3’s language-generative knowledge. |
Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models | Cross-Modal | CVPR 2023 | ImageNet, ESC-50; Caltech 101, Oxford-IIIT Pet, Stanford Cars, Oxford 102 Flower, Food-101, FGVC-Aircraft, SUN397, DTD, EuroSAT, UCF101 | It proposed a simple cross-modal adaptation approach that learns from few-shot examples spanning different modalities, repurposing class names as additional one-shot training samples. Also constructed the first audiovisual few-shot benchmark. |
Semantic Prompt for Few-Shot Image Recognition | SemanticPrompt | CVPR 2023 | ImageNet, CIFAR-FS, FC100 | It proposed a Semantic Prompt approach for few-shot learning, and explored leveraging semantic information as prompts to tune the visual feature extraction network adaptively. |
NIFF: Alleviating Forgetting in Generalized Few-Shot Object Detection via Neural Instance Feature Forging | -- | CVPR 2023 | PASCAL VOC, COCO | Contribution: designed a standalone lightweight generator with class-wise heads to generate and replay diverse instance-level base features to the RoI head while fine-tuning on the novel data. |
Weak-Shot Object Detection Through Mutual Knowledge Transfer | -- | CVPR 2023 | PASCAL VOC, COCO, ILSVRC | By jointly optimizing the classification loss and the proposed Knowledge Transfer loss, the multiple instance learning module effectively learns to classify object proposals into novel categories in the target dataset with the transferred knowledge from base categories in the source dataset. |
Few-Shot Learning with Visual Distribution Calibration and Cross-Modal Distribution Alignment | SADA | CVPR 2023 | CIFAR, ImageNet, Caltech 101, Oxford-IIIT Pet, Food-101, STL-10, UCF101, DTD, Stanford Cars, FGVC-Aircraft | It proposed a Selective Attack module, which consists of trainable adapters that generate spatial attention maps of images to guide the attacks on class-irrelevant image areas. |
Few-shot Object Detection via Feature Reweighting | Fewshot_Detection | ICCV 2019 | PASCAL VOC, COCO | The proposed model leverages fully labeled base classes and quickly adapts to novel classes, using a meta feature learner and a reweighting module within a one-stage detection architecture. |
DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection | DeFRCN | ICCV 2021 | PASCAL VOC, COCO | It proposed Decoupled Faster R-CNN, extending Faster R-CNN by introducing Gradient Decoupled Layer for multistage decoupling and Prototypical Calibration Block for multi-task decoupling. |
Query Adaptive Few-Shot Object Detection with Heterogeneous Graph Convolutional Networks | QA-FewDet | ICCV 2021 | PASCAL VOC, COCO | It proposed a novel FSOD model using heterogeneous graph convolutional networks. Through efficient message passing among all the proposal and class nodes with three different types of edges, it obtains context-aware proposal features and query-adaptive, multiclass-enhanced prototype representations for each class. |
Universal-Prototype Enhancing for Few-Shot Object Detection | UP-FSOD | ICCV 2021 | PASCAL VOC, COCO | It developed a new few-shot object detection framework with universal prototypes, whose features generalize to novel objects. |
Meta-Baseline: Exploring Simple Meta-Learning for Few-Shot Learning | Meta-Baseline | ICCV 2021 | ImageNet | It explored a simple process: meta-learning over a whole-classification pre-trained model on its evaluation metric. |
Variational Feature Disentangling for Fine-Grained Few-Shot Classification | VFD | ICCV 2021 | CUB-200-2011, NABirds, Stanford Dogs | It proposed a feature disentanglement framework that allows us to augment features with randomly sampled intra-class variations while preserving their class-discriminative features. |
Relational Embedding for Few-Shot Classification | RENet | ICCV 2021 | ImageNet, CUB-200-2011, CIFAR-FS | The method leverages relational patterns within and between images via self-correlational representation and cross-correlational attention. |
Multi-scale Positive Sample Refinement for Few-shot Object Detection | MPSR | ECCV 2020 | PASCAL VOC, COCO | It proposed a Multi-scale Positive Sample Refinement approach to enrich object scales in FSOD and integrated it as an auxiliary branch to the popular architecture of Faster R-CNN with FPN. |
Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild | FSDetView | ECCV 2020 | PASCAL VOC, COCO; ObjectNet3D, Pascal3D+, Pix3D | It guided the network prediction with class-representative features extracted from data in different modalities: image patches for object detection, and aligned 3D models for viewpoint estimation. |
Latent Embedding Feedback and Discriminative Features for Zero-Shot Classification | TF-Vaegan | ECCV 2020 | CUB-200-2011, Oxford 102 Flower, SUN Attribute, AwA2 | It proposed to enforce semantic consistency at all stages of zero-shot learning: training, feature synthesis and classification. |
Embedding Propagation: Smoother Manifold for Few-Shot Classification | embedding-propagation | ECCV 2020 | ImageNet, CUB-200-2011 | It proposed to use embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing in few-shot classification. |
Few-Shot Video Object Detection | FSVOD | ECCV 2022 | FSVOD-500, FSYTV-40 | Contributions: 1. video dataset FSVOD-500; 2. Tube Proposal Network (TPN) to generate high-quality video tube proposals for aggregating feature representation for the target video object; 3. Temporal Matching Network (TMN+) for matching representative query tube features with better discriminative ability. |
AcroFOD: An Adaptive Method for Cross-domain Few-shot Object Detection | AcroFOD | ECCV 2022 | Cityscapes, SIM10k, ViPeD, COCO, KITTI | It proposed an adaptive method consisting of two parts: 1. An adaptive optimization strategy to select augmented data similar to target samples. 2. The multi-level domain-aware data augmentation to increase the diversity and rationality of augmented data. |
Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification | Tip-Adapter | ECCV 2022 | ImageNet, Stanford Cars, UCF101, Caltech 101, Oxford 102 Flower, SUN397, DTD, EuroSAT, FGVC-Aircraft, Oxford-IIIT Pet, Food-101 | Tip-Adapter constructs the adapter via a key-value cache model from the few-shot training set, and updates the prior knowledge encoded in CLIP by feature retrieval. |
Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark | MoFSOD | ECCV 2022 | MoFSOD | It proposed a benchmark consisting of 10 datasets from a wide range of domains to evaluate FSOD algorithms. |
Multi-faceted Distillation of Base-Novel Commonality for Few-Shot Object Detection | MFDC | ECCV 2022 | PASCAL VOC, COCO | It proposed to learn three types of class-agnostic commonalities between base and novel classes: recognition-related semantic commonalities, localization-related semantic commonalities and distribution commonalities. |
Frustratingly Simple Few-Shot Object Detection | TFA | ICML 2020 | PASCAL VOC, COCO, LVIS | Fine-tuning only the last layer of existing detectors on rare classes is crucial. Because the few samples cause high variance, it also revised the evaluation protocols by sampling multiple groups of training examples to obtain stable comparisons. |
Few-Shot Object Detection via Association and DIscrimination | FADI | NeurIPS 2021 | PASCAL VOC, COCO | It proposed a two-step fine-tuning framework which builds up a discriminative feature space for each novel class: 1. In the association step, construct a compact novel class feature space via explicitly imitating a specific base class feature space. 2. In the discrimination step, disentangle the classification branches for base and novel classes. |
Meta Faster R-CNN: Towards Accurate Few-Shot Object Detection with Attentive Feature Alignment | Meta-FRCN | AAAI 2022 | PASCAL VOC, COCO | To improve proposal generation for few-shot classes, it proposed to learn a lightweight metric-learning based prototype matching network. To improve the fine-grained few-shot proposal classification, it proposed a novel attentive feature alignment method. |
Few-Shot Object Detection via Variational Feature Aggregation | VFA | AAAI 2023 | PASCAL VOC, COCO | It proposed a meta-learning framework with two novel feature aggregation schemes. First a Class-Agnostic Aggregation method, then a Variational Feature Aggregation method. |
Meta-DETR: Image-Level Few-Shot Object Detection with Inter-Class Correlation Exploitation | Meta-DETR | TPAMI 2022 | PASCAL VOC, COCO | It proposed Meta-DETR, which is the first image-level few-shot detector, and introduces a novel inter-class correlational meta-learning strategy to capture and leverage the correlation among different classes. |
Fast Hierarchical Learning for Few-Shot Object Detection | -- | IROS 2022 | COCO | It treated few-shot detection as a hierarchical learning problem, where the novel classes are treated as the child classes of existing base classes and the background class. |
COCO, LVIS (test and val), iNaturalist 2018, CUB-200-2011 (5-way 1-shot and 5-way 5-shot), NABirds
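
The 5-way 1-shot and 5-way 5-shot settings listed above refer to episodic evaluation: each episode draws 5 classes, with 1 or 5 labeled support examples per class plus some held-out query examples. The following is a minimal sketch of such a sampler; the function name `sample_episode`, the `labels_to_items` mapping, and the default of 15 query examples per class are assumptions for illustration, not the protocol of any specific paper in the table.

```python
import random


def sample_episode(labels_to_items, n_way=5, k_shot=1, n_query=15, seed=None):
    """Sample one N-way K-shot episode.

    labels_to_items: dict mapping a class label to a list of item ids
    (hypothetical layout). Returns (support, query), each a list of
    (item, label) pairs with no item shared between the two.
    """
    rng = random.Random(seed)
    per_class = k_shot + n_query
    # Only classes with enough items can fill both support and query sets.
    eligible = [c for c, items in labels_to_items.items() if len(items) >= per_class]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with sufficient items")
    support, query = [], []
    for label in rng.sample(eligible, n_way):
        picked = rng.sample(labels_to_items[label], per_class)
        support += [(item, label) for item in picked[:k_shot]]
        query += [(item, label) for item in picked[k_shot:]]
    return support, query


if __name__ == "__main__":
    # Toy data: 10 classes with 20 item ids each.
    toy = {f"class_{c}": [f"img_{c}_{i}" for i in range(20)] for c in range(10)}
    support, query = sample_episode(toy, n_way=5, k_shot=5, seed=0)
    print(len(support), len(query))
```

Accuracy is then typically averaged over many such episodes, since a single small support set gives a noisy estimate.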