
This issue was moved to a discussion.


arguana index_reader.dump_documents_BM25 #2052

Closed
arthur-75 opened this issue Dec 23, 2024 · 0 comments


arthur-75 commented Dec 23, 2024

Hello, I have a problem with the ArguAna dataset when I run the following (SciFact and FiQA worked well):

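```python
from pyserini.index.lucene import LuceneIndexReader

index_path = 'path/to/arguana/index'  # hypothetical placeholder: local ArguAna Lucene index
index_reader = LuceneIndexReader(index_path)
# Write every document's BM25-weighted term vector as one JSON line
index_reader.dump_documents_BM25(index_path + '/doc_bm25.jsonl')
```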

```
--> 578 for term in self.get_document_vector(docid):
    579     bm25_vector[term] = self.compute_bm25_term_weight(docid, term, analyzer=None, k1=k1, b=b)
    581 # vectors are written line by line to avoid running out of memory

File ~pyserini/index/lucene/_base.py:367, in LuceneIndexReader.get_document_vector(self, docid)
    352 def get_document_vector(self, docid: str) -> Optional[Dict[str, int]]:
    353     """Return the document vector for a ``docid``. Note that requesting the document vector of a ``docid`` that
    354     does not exist in the index will return ``None`` (as opposed to an empty dictionary); this forces the caller
    355     to handle ``None`` explicitly and guards against silent errors.
   (...)
    365         A dictionary with analyzed terms as keys and their term frequencies as values.
    366     """
--> 367     doc_vector_map = self.object.getDocumentVector(self.reader, docid)
    368     if doc_vector_map is None:
    369         return None

File jnius/jnius_export_class.pxi:876, in jnius.JavaMethod.__call__()

File jnius/jnius_export_class.pxi:1042, in jnius.JavaMethod.call_staticmethod()

File jnius/jnius_utils.pxi:79, in jnius.check_exception()

JavaException: JVM exception occurred: Document vector not stored! io.anserini.index.NotStoredException
```
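
Since the exception is raised inside the per-document loop shown in the traceback, a minimal sketch like the one below can help narrow it down: it repeats that loop (`get_document_vector`, then `compute_bm25_term_weight` for each term) but records and skips any document whose vector cannot be fetched, instead of aborting the whole dump. This is not pyserini's code and not a fix for the underlying index. The helper name `dump_bm25_skipping_missing` and its `skip_on` parameter are hypothetical; the reader methods `stats()` and `convert_internal_docid_to_collection_docid` are assumed from pyserini's `LuceneIndexReader`; the `k1=0.9`, `b=0.4` defaults and the `{"id": ..., "vector": ...}` output layout are assumptions as well.

```python
import json
from typing import Any, Dict, List, Tuple


def dump_bm25_skipping_missing(reader: Any, file_path: str, k1: float = 0.9,
                               b: float = 0.4,
                               skip_on: Tuple[type, ...] = (Exception,)) -> List[str]:
    """Hypothetical helper: dump BM25-weighted document vectors as JSONL.

    Follows the loop visible in the traceback, but a document whose vector
    cannot be fetched is recorded and skipped instead of aborting the dump.
    Returns the collection docids that were skipped.
    """
    skipped: List[str] = []
    # Assumption: internal Lucene ids run from 0 to stats()['documents'] - 1.
    num_docs = reader.stats()['documents']
    with open(file_path, 'w') as f:
        for internal_id in range(num_docs):
            docid = reader.convert_internal_docid_to_collection_docid(internal_id)
            try:
                doc_vector = reader.get_document_vector(docid)
            except skip_on:
                # e.g. jnius.JavaException wrapping io.anserini.index.NotStoredException
                doc_vector = None
            if doc_vector is None:
                # None: docid not found (per the docstring) or vector not stored (caught above)
                skipped.append(docid)
                continue
            bm25_vector: Dict[str, float] = {
                term: reader.compute_bm25_term_weight(docid, term, analyzer=None, k1=k1, b=b)
                for term in doc_vector
            }
            # Write line by line, as in the original loop, to bound memory use.
            f.write(json.dumps({'id': docid, 'vector': bm25_vector}) + '\n')
    return skipped
```

With the reader from the snippet above, a call such as `skipped = dump_bm25_skipping_missing(index_reader, index_path + '/doc_bm25.jsonl', skip_on=(JavaException,))` (after `from jnius import JavaException`) keeps unrelated Python errors visible. Whether `skipped` holds a handful of ArguAna docids or every docid suggests whether only a few documents lack stored vectors or the index was built without them.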

castorini locked and limited conversation to collaborators on Jan 17, 2025
lintool converted this issue into discussion #2067 on Jan 17, 2025

