Removing sentences that do not contain any of the query terms reduces the size from 10.6M to 1.4M. However, the scores also take a significant hit:
Experiment: `mb_5cv_pruned`

|    | MAP    | P@20   |
|----|--------|--------|
| 1S | 0.3029 | 0.4157 |
| 2S | 0.3045 | 0.4163 |
| 3S | 0.3034 | 0.4175 |
I will explore other pruning methods, but it doesn't look too promising.
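For reference, a minimal sketch of the pruning rule described above (illustrative only: the names are mine, and the actual pipeline presumably applies the same analyzer as the index, i.e. tokenization/stemming/stopwords, which this naive whitespace matching does not):

```python
def prune_sentences(query, sentences):
    """Keep only sentences that share at least one term with the query.

    Naive lowercase/whitespace matching; a real implementation would
    reuse the index's analyzer (tokenization, stemming, stopwords).
    """
    query_terms = set(query.lower().split())
    return [s for s in sentences if query_terms & set(s.lower().split())]


# Example: only the first sentence survives.
print(prune_sentences("black bear attacks",
                      ["Bear attacks are rare.", "It rained all day."]))
```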
NDCG@20 for BERT(MSMARCO, MB) on sentences of the top 1000/100 Robust04 docs:

|    | Top 1000 | Top 100 (optimized wrt NDCG@20) | Top 100 (optimized wrt MAP) |
|----|----------|---------------------------------|-----------------------------|
| 1S | 0.5239   | 0.5131                          | 0.5117                      |
| 2S | 0.5324   | 0.5206                          | 0.5200                      |
| 3S | 0.5325   | 0.5228                          | 0.5196                      |
Note that hyperparameters for the first Top 100 column are tuned to maximize NDCG@20 rather than MAP (the default, shown in the last column); I wanted to see the difference. AP is pretty bad for top 100, as expected, but NDCG@20 is reasonable considering the ~10x speedup.
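For reference, a hedged sketch of computing these metrics with pytrec_eval, mirroring the trec_eval-style output above (the file names are placeholders, not the actual files from this experiment):

```python
import pytrec_eval

# Placeholder paths; substitute the actual qrels and run files.
with open('qrels.robust04.txt') as f_qrel:
    qrel = pytrec_eval.parse_qrel(f_qrel)
with open('run.bert_msmarco_mb.txt') as f_run:
    run = pytrec_eval.parse_run(f_run)

evaluator = pytrec_eval.RelevanceEvaluator(
    qrel, {'map', 'P_20', 'ndcg_cut_20'})
per_query = evaluator.evaluate(run)

# Mean over queries, matching trec_eval's "all" summary row.
for measure in ('map', 'P_20', 'ndcg_cut_20'):
    mean = sum(q[measure] for q in per_query.values()) / len(per_query)
    print(f'{measure}\tall\t{mean:.4f}')
```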
Throw away all sentences that don't contain at least one query term? Other pruning scenarios?