lucene-inverted.msmarco-v2-doc-segmented.unicoil-noexp-0shot.20220808.4d6d2a.README.md

# msmarco-v2-doc-segmented-unicoil-noexp-0shot

Lucene impact index of the MS MARCO V2 segmented document corpus for uniCOIL (noexp) with title prepended.

This index was generated on 2022/08/08 at Anserini commit fbe35e on damiano with the following command:

```bash
nohup target/appassembler/bin/IndexCollection \
  -collection JsonVectorCollection \
  -input /scratch2/collections/msmarco/msmarco_v2_doc_segmented_unicoil_noexp_0shot_v2 \
  -index indexes/lucene-index.msmarco-v2-doc-segmented-unicoil-noexp-0shot.20220808.4d6d2a/ \
  -generator DefaultLuceneDocumentGenerator \
  -threads 18 -impact -pretokenized -optimize \
  >& logs/log.msmarco-v2-doc-segmented-unicoil-noexp-0shot.20220808.4d6d2a.txt &
```
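
For readers unfamiliar with impact indexes, the sketch below illustrates roughly what the `-impact` and `-pretokenized` options imply: each document arrives as a map from already-tokenized terms to integer weights, and a query is scored by summing query weight times document weight over shared terms. This is a simplified illustration in Python, not Anserini's implementation; the record id, tokens, weights, and the `impact_score` helper are all hypothetical, and the JSON shape (`id`, `contents`, `vector`) is an assumption about the input format.

```python
import json

# Hypothetical input record: one JSON object per line, with a "vector"
# field mapping pretokenized terms to integer impact weights.
# The id, tokens, and weights here are invented for illustration.
sample_line = json.dumps({
    "id": "example_doc#0",
    "contents": "",
    "vector": {"lucene": 12, "index": 7, "impact": 3},
})


def impact_score(query_weights, doc_vector):
    """Sum of (query weight * document weight) over terms in the query.

    Terms missing from the document contribute zero.
    """
    return sum(w * doc_vector.get(tok, 0) for tok, w in query_weights.items())


doc = json.loads(sample_line)

# Hypothetical weighted query; "missing" does not occur in the document.
query = {"lucene": 2, "impact": 1, "missing": 5}

score = impact_score(query, doc["vector"])
print(doc["id"], score)
```

Because the weights are stored in the index as is, no further text analysis is applied at indexing time, which is why the tokens must already match the encoder's vocabulary.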

In May 2024, the index was repackaged to adopt a more consistent naming scheme.