Reproduce - doc2query document expansion #2642

Merged
merged 11 commits into from
Jan 18, 2025

Conversation

b8zhong
Contributor

@b8zhong b8zhong commented Nov 30, 2024

Update Doc + Scripts for doc2query

Summary of Changes

1. Dropbox Links

  • Removed some Dropbox links (example below). They seem to have been deleted, or maybe they are private? Let me know and I'll revert them.
    wget https://www.dropbox.com/s/57g2s9vhthoewty/msmarco-passage-pred-test_topk10.tar.gz -P collections/msmarco-passage
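
To find any remaining Dropbox links in the docs, a small scan like the one below could help. This is only a sketch: the regular expression and the `find_dropbox_links` helper are illustrative assumptions, not part of Anserini, and the sample text simply reuses the link quoted above.

```python
import re

# Assumed pattern: http(s) URLs on dropbox.com, ending at whitespace or
# common closing punctuation. Adjust if the docs use other URL shapes.
DROPBOX_URL = re.compile(r"https?://(?:www\.)?dropbox\.com/[^\s)>\"']+")


def find_dropbox_links(text):
    """Return (line_number, url) pairs for every Dropbox URL in `text`."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in DROPBOX_URL.finditer(line):
            found.append((lineno, match.group(0)))
    return found


# Sample input built from the command quoted in this PR description.
sample = (
    "Download the predictions:\n"
    "wget https://www.dropbox.com/s/57g2s9vhthoewty/"
    "msmarco-passage-pred-test_topk10.tar.gz -P collections/msmarco-passage\n"
)
links = find_dropbox_links(sample)
for lineno, url in links:
    print(f"line {lineno}: {url}")
```

Running it over the text of a doc file (for example, the contents of docs/experiments-doc2query.md) would list each line that still needs attention.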
    

2. Eval Commands

  • Replaced outdated references to target/appassembler/bin/trec_eval and to IndexCollection (now invoked through bin/run.sh). For example, this command:
    target/appassembler/bin/trec_eval -c -m map -c -m recip_rank \
     tools/topics-and-qrels/qrels.car17v2.0.benchmarkY1test.txt \
     runs/run.car17v2.0.bm25.expanded-topk10.txt
    
  • Updated to:
    tools/eval/trec_eval.9.0.4/trec_eval -c -m map -m recip_rank \
     tools/topics-and-qrels/qrels.car17v2.0.benchmarkY1test.txt \
     runs/run.car17v2.0.bm25.expanded-topk10.txt
    
  • Should this be 9.0.4? https://github.com/castorini/anserini-tools has the updated 9.0.8 for trec_eval, though the results don't change.
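
One way to double-check that two trec_eval versions agree is to parse their summary output and compare the scores. The sketch below assumes the usual three-column summary lines (metric, query id `all`, value); the helper names are hypothetical and the numbers are placeholders, not results from this PR.

```python
def parse_trec_eval(output):
    """Parse trec_eval summary lines into a {metric: score} dict.

    Assumes each line has three whitespace-separated fields:
    metric name, query id, value. Only the `all` summary rows are kept.
    """
    scores = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        metric, query_id, value = fields
        if query_id != "all":
            continue
        try:
            scores[metric] = float(value)
        except ValueError:
            # Non-numeric rows (such as a run identifier) are skipped.
            continue
    return scores


def scores_match(a, b, tol=1e-4):
    """True when both runs report the same metrics within `tol`."""
    if a.keys() != b.keys():
        return False
    return all(abs(a[m] - b[m]) <= tol for m in a)


# Placeholder output standing in for two trec_eval versions.
old_output = "map                   \tall\t0.1234\nrecip_rank            \tall\t0.5678\n"
new_output = "map                   \tall\t0.1234\nrecip_rank            \tall\t0.5678\n"
print(scores_match(parse_trec_eval(old_output), parse_trec_eval(new_output)))
```

In practice the two strings would be the captured output of each binary run with the same qrels and run file.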

3. Retrieval

  • tools/scripts/msmarco/retrieve.py is defunct, so I replaced it with the pyserini.search.lucene module:
    python tools/scripts/msmarco/retrieve.py --hits 1000 \
     --index indexes/msmarco-passage/lucene-index-msmarco-expanded-topk10 \
     --queries collections/msmarco-passage/queries.dev.small.tsv \
     --output runs/run.msmarco-passage.dev.small.expanded-topk10.tsv
    
    Updated to:
    python -m pyserini.search.lucene \
      --index indexes/msmarco-passage/lucene-index-msmarco-expanded-topk10 \
      --topics collections/msmarco-passage/queries.dev.small.tsv \
      --topics-format default \
      --output runs/run.msmarco-passage.dev.small.expanded-topk10.tsv \
      --output-format msmarco \
      --bm25 --k1 0.82 --b 0.68 --hits 1000
    
  • This requires a Python environment with Pyserini, though, so I'm not sure you want it this way.
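
For readers who do keep a Python environment, the same retrieval can also be sketched through Pyserini's Python API. The helpers below are illustrative assumptions: `read_queries` expects `qid<TAB>query` lines, `msmarco_run_lines` emits `qid<TAB>docid<TAB>rank` lines to mirror the MS MARCO output format, and `make_bm25_search` wraps `LuceneSearcher` with the k1 and b values from the command above. The demo uses a stand-in search function so it runs without an index.

```python
def read_queries(tsv_text):
    """Parse `qid<TAB>query` lines into (qid, query) pairs."""
    queries = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        qid, text = line.split("\t", 1)
        queries.append((qid, text))
    return queries


def msmarco_run_lines(queries, search, hits=1000):
    """Build `qid<TAB>docid<TAB>rank` lines; `search(query, k)` returns docids."""
    lines = []
    for qid, text in queries:
        for rank, docid in enumerate(search(text, hits), start=1):
            lines.append(f"{qid}\t{docid}\t{rank}")
    return lines


def make_bm25_search(index_dir, k1=0.82, b=0.68):
    """Wrap Pyserini's LuceneSearcher; needs pyserini, Java, and a built index."""
    from pyserini.search.lucene import LuceneSearcher  # imported lazily

    searcher = LuceneSearcher(index_dir)
    searcher.set_bm25(k1, b)
    return lambda query, k: [hit.docid for hit in searcher.search(query, k=k)]


# Demo with a stand-in search function (hypothetical docids, no index needed).
def fake_search(query, k):
    return ["d2", "d1"][:k]


demo_queries = read_queries("1\tfirst query\n2\tsecond query\n")
demo_lines = msmarco_run_lines(demo_queries, fake_search, hits=2)
print("\n".join(demo_lines))
```

To run against the real index, `make_bm25_search("indexes/msmarco-passage/lucene-index-msmarco-expanded-topk10")` would replace `fake_search`, and the resulting lines would be written to the run file.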

@b8zhong b8zhong closed this Jan 18, 2025
@lintool lintool reopened this Jan 18, 2025
@lintool
Member

lintool commented Jan 18, 2025

hi @b8zhong - sorry, I do want to merge this - just haven't gotten a chance to yet...

codecov bot commented Jan 18, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 67.11%. Comparing base (b8acb5b) to head (d1c8071).

Additional details and impacted files
@@            Coverage Diff            @@
##             master    #2642   +/-   ##
=========================================
  Coverage     67.11%   67.11%           
  Complexity     1190     1190           
=========================================
  Files           182      182           
  Lines         11354    11354           
  Branches       1369     1369           
=========================================
  Hits           7620     7620           
  Misses         3230     3230           
  Partials        504      504           


@b8zhong
Contributor Author

b8zhong commented Jan 18, 2025

Np; there are still a few dead links I should probably remove.

Is every Dropbox link dead or something?

@lintool
Member

lintool commented Jan 18, 2025

Yes, the Dropbox links are defunct, please remove.

Member

@lintool lintool left a comment

Minor nit, and then I can merge.

docs/experiments-doc2query.md (resolved)
@lintool lintool merged commit 75e51e0 into castorini:master Jan 18, 2025
1 check passed