Since learning the ground truth from the labeling process is an unsupervised problem, it can be solved with different approaches. Each method models the annotation behavior in its own way and in its own setting, so each provides a solution suited to different needs.
Some notation comments (for more details on the problem notation, see the documentation):
- z corresponds to the ground truth of the data.
- e corresponds to the reliability of the annotators.
- T corresponds to the number of annotators (n_annotators).
- K corresponds to the number of classes (n_classes).
- M corresponds to the number of groups in some models (n_groups).
- W corresponds to the number of parameters of some predictive model.
- Wm corresponds to the number of parameters of the group model of Model Inference EM - Groups (the gating network of the MoE).
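
As a minimal sketch of how these quantities could be laid out in code (assuming numpy arrays and a -1 convention for missing annotations; the array names here are illustrative, not the library's API):

```python
import numpy as np

# Hypothetical toy dimensions, mirroring the notation above.
N = 100  # number of data points
T = 5    # n_annotators
K = 3    # n_classes
M = 2    # n_groups (only used by the group-based models)

# z: ground truth label of each data point (unknown, to be inferred).
z = np.zeros(N, dtype=int)

# Individual setting: one column per annotator, with -1 marking
# the data points that annotator did not label.
y_individual = np.full((N, T), -1, dtype=int)

# Global setting: per-class annotation counts per data point,
# with no record of which annotator produced each label.
y_global = np.zeros((N, K), dtype=int)

# Probabilistic confusion matrix per annotator:
# beta[t, k, j] = p(annotator t answers class j | true class k).
beta = np.full((T, K, K), 1.0 / K)

# e: reliability of each annotator (Model Inference - Reliability EM).
e = np.full(T, 0.8)
```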
Method name | Inferred variable | Predictive model | Setting | Annotator model | Other model | Learnable parameters |
---|---|---|---|---|---|---|
Label Aggregation | - | ❌ | Global | - | - | 0 |
Label Inference EM | z | ❌ | Individual dense | Probabilistic confusion matrix | Class marginals | T·K·K + K |
Label Inference EM - Global | z | ❌ | Global | - | Global probabilistic confusion matrix | K·K |
Model Inference EM | z | ✔️ | Individual dense | Probabilistic confusion matrix | - | W + T·K·K |
Model Inference EM - Groups | z | ✔️ | Individual sparse | - | Probabilistic confusion matrix per group, gating network over groups | W + M·K·K + Wm |
Model Inference EM - Groups Global | z | ✔️ | Global | - | Probabilistic confusion matrix per group, group marginals | W + M·K·K + M |
Model Inference EM - Global | z | ✔️ | Global | - | Global probabilistic confusion matrix | W + K·K |
Model Inference - Reliability EM | e | ✔️ | Individual dense | Probabilistic reliability number | - | W + T |
Model Inference BP | - | ✔️ | Individual dense (masked) | Confusion matrix weights | - | W + T·K·K |
Model Inference BP - Global | - | ✔️ | Global | - | Global confusion matrix weights | W + K·K |
- The inference of the methods with an explicit model per annotator depends on the participation of each annotator in the labeling process:
  - It requires a large number of annotations per annotator.
- An explicit model per annotator can give an inference advantage when the individual behaviors are quite different from each other.
  - However, a more complex model may overfit the behavior it is modeling.
- The methods with a predictive model can give an inference advantage when the input patterns are more complex.
- The methods without two-step inference (based on backpropagation) can take advantage of a more stable learning process (see the sketch below).
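
As an illustration of the last point, here is a minimal sketch (assuming PyTorch; all names are illustrative) of the single-optimization idea behind Model Inference BP - Global: the predictive model f(x) is composed with a learnable global confusion matrix and everything is trained jointly by backpropagation, with no separate E and M steps.

```python
import torch
import torch.nn as nn

class GlobalCrowdModel(nn.Module):
    """f(x) composed with one learnable global confusion matrix;
    both are trained jointly by backpropagation (no E/M loop)."""
    def __init__(self, n_features, n_classes):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                               nn.Linear(32, n_classes))
        # Unconstrained weights, initialized near the identity;
        # a row-wise softmax keeps the matrix stochastic.
        self.confusion = nn.Parameter(torch.eye(n_classes) * 3.0)

    def forward(self, x):
        p_z = torch.softmax(self.f(x), dim=-1)        # p(z | x)
        beta = torch.softmax(self.confusion, dim=-1)  # p(y | z), global
        return p_z @ beta                             # p(y | x)

# Toy training loop on random data (illustrative only).
x = torch.randn(64, 10)
y = torch.randint(0, 3, (64,))  # observed noisy labels
model = GlobalCrowdModel(n_features=10, n_classes=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.nll_loss(torch.log(model(x) + 1e-8), y)
    loss.backward()
    opt.step()
```

Because the confusion matrix is shared by all annotators, the parameter count stays at W + K·K regardless of n_annotators, which is what makes the global variants scalable.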
Method name | Two-step inference | Predictive model | Setting | Computational scalability | Use case |
---|---|---|---|---|---|
Label Aggregation | ❌ | ❌ | Global | All cases | High density per data |
Label Inference EM | ✔️ | ❌ | Individual dense | Not scalable with n_annotators | High density per annotator |
Label Inference EM - Global | ✔️ | ❌ | Global | Very large n_annotators | High density |
Model Inference EM | ✔️ | ✔️ | Individual dense | Not scalable with n_annotators | High density per annotator |
Model Inference EM - Groups | ✔️ | ✔️ | Individual sparse | Very large n_annotators | High density per annotator |
Model Inference EM - Groups Global | ✔️ | ✔️ | Global | Very large n_annotators | High density per data |
Model Inference EM - Global | ✔️ | ✔️ | Global | Very large n_annotators | High density |
Model Inference - Reliability EM | ✔️ | ✔️ | Individual dense | Large n_annotators | High density per annotator |
Model Inference BP | ❌ | ✔️ | Individual dense (masked) | Not scalable with n_annotators | High density per annotator |
Model Inference BP - Global | ❌ | ✔️ | Global | Very large n_annotators | High density per data |
The use case indicates the setting in which each method performs best: the closer the problem is to that setting, the better the inference. Density refers to the number of annotations per annotator, per data point, or globally.
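
Under the individual representation, the three densities can be read directly off the annotation matrix, as in this toy numpy example (assuming the -1 missing-label convention used above):

```python
import numpy as np

# Three data points, three annotators; -1 means no annotation.
y = np.array([[ 0, -1,  1],
              [ 1,  1, -1],
              [-1,  0,  0]])
observed = y >= 0

density_per_annotator = observed.sum(axis=0)  # annotations given by each annotator
density_per_data = observed.sum(axis=1)       # annotations received by each data point
global_density = observed.sum()               # total number of annotations

print(density_per_annotator)  # [2 2 2]
print(density_per_data)       # [2 2 2]
print(global_density)         # 6
```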
- The methods without a predictive model are independent of the choice of the learning model; they learn only from the labels.
  - In a second phase, these methods could learn f(x) over the inferred ground truth.
- The methods with a predictive model depend on the chosen learning model.
  - This lets them take advantage of more complex input patterns.
- The global methods can also be used in the individual setting by changing the representation from individual to global (not vice versa), as shown in the sketch after this list.
- The methods without two-step inference are independent of the inference algorithm; their learning is based on a single optimization framework.
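
As an illustration of that representation change (assuming the same -1 missing-label convention; the helper name is hypothetical), collapsing the individual matrix into per-class counts discards the annotator identities, which is why the conversion only works in one direction:

```python
import numpy as np

def individual_to_global(y, n_classes):
    """Collapse the individual annotation matrix (who said what) into
    the global representation (per-class counts per data point).
    The reverse mapping is impossible: annotator identities are lost."""
    n_samples = y.shape[0]
    r = np.zeros((n_samples, n_classes), dtype=int)
    for k in range(n_classes):
        r[:, k] = (y == k).sum(axis=1)
    return r

y = np.array([[0, -1,  1],
              [1,  1, -1]])
print(individual_to_global(y, n_classes=2))
# [[1 1]
#  [0 2]]
```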