arguana index_reader.dump_documents_BM25 #2052

arthur-75 · 2024-12-23T10:56:44Z

Hello, I have a problem with the arguana dataset when I run the following (scifact and fiqa worked fine):

from pyserini.index.lucene import LuceneIndexReader
index_reader = LuceneIndexReader(index_path)
index_reader.dump_documents_BM25(index_path+'/doc_bm25.jsonl')

--> 578 for term in self.get_document_vector(docid):
    579     bm25_vector[term] = self.compute_bm25_term_weight(docid, term, analyzer=None, k1=k1, b=b)
    581 # vectors are written line by line to avoid running out of memory

File ~pyserini/index/lucene/_base.py:367, in LuceneIndexReader.get_document_vector(self, docid)
    352 def get_document_vector(self, docid: str) -> Optional[Dict[str, int]]:
    353     """Return the document vector for a ``docid``. Note that requesting the document vector of a ``docid`` that
    354     does not exist in the index will return ``None`` (as opposed to an empty dictionary); this forces the caller
    355     to handle ``None`` explicitly and guards against silent errors.
   (...)
    365         A dictionary with analyzed terms as keys and their term frequencies as values.
    366     """
--> 367     doc_vector_map = self.object.getDocumentVector(self.reader, docid)
    368     if doc_vector_map is None:
    369         return None

File jnius/jnius_export_class.pxi:876, in jnius.JavaMethod.__call__()

File jnius/jnius_export_class.pxi:1042, in jnius.JavaMethod.call_staticmethod()

File jnius/jnius_utils.pxi:79, in jnius.check_exception()

JavaException: JVM exception occurred: Document vector not stored! io.anserini.index.NotStoredException
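
For context, the exception message suggests that the arguana index was built without stored document vectors, which `dump_documents_BM25` needs because it calls `get_document_vector` for every document. The sketch below is one possible way to rebuild the index with document vectors enabled. It is only an illustration: `build_index_command` is a hypothetical helper, the paths are placeholders, and it assumes the index was built locally from a JSONL corpus with `JsonCollection` (the collection type actually used for this index is not stated above).

```python
import sys


def build_index_command(input_dir, index_dir, threads=1):
    """Return an argv list that rebuilds a Lucene index with document vectors stored.

    Hypothetical helper for illustration. JsonCollection is an assumption;
    substitute the collection class that matches your corpus format.
    """
    return [
        sys.executable, "-m", "pyserini.index.lucene",
        "--collection", "JsonCollection",          # assumed corpus format
        "--input", input_dir,                      # directory holding the corpus files
        "--index", index_dir,                      # output directory for the new index
        "--generator", "DefaultLuceneDocumentGenerator",
        "--threads", str(threads),
        "--storePositions",
        "--storeDocvectors",                       # the option the exception points to
        "--storeRaw",
    ]


if __name__ == "__main__":
    # Placeholder paths; print the command rather than running it.
    print(" ".join(build_index_command("path/to/arguana_corpus", "indexes/arguana")))
```

The returned list can be passed to `subprocess.run` or copied to a shell. Once the index has been rebuilt this way, the original three lines should be able to read the document vectors, assuming nothing else differs between this index and the scifact and fiqa ones.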

castorini locked and limited conversation to collaborators Jan 17, 2025

lintool converted this issue into discussion #2067 Jan 17, 2025
