Staging #750

Closed
wants to merge 151 commits into from

151 commits
85e9217
Dev (#537)
kartikpersistent Jul 12, 2024
3591d2e
disabled the sumbit buttom on loading
kartikpersistent Jul 12, 2024
92188a3
Deduplication tab (#566)
kartikpersistent Jul 16, 2024
40650d0
Update frontend_docs.adoc (#538)
prakriti-solankey Jul 16, 2024
5e772e3
updated langchain versions (#565)
aashipandya Jul 16, 2024
db74b75
Update the De-Duplication query
praveshkumar1988 Jul 17, 2024
6174ad1
Node relationship id type none issue (#547)
praveshkumar1988 Jul 17, 2024
320c044
added the tooltips
kartikpersistent Jul 17, 2024
ecbc871
type fix
kartikpersistent Jul 17, 2024
2055502
Unneccory import
kartikpersistent Jul 17, 2024
eb64e33
added score threshold and added some error handling (#571)
vasanthasaikalluri Jul 17, 2024
ff72786
Update requirements.txt
karanchellani Jul 17, 2024
a50708f
Tooltip and other UI fixes (#572)
kartikpersistent Jul 17, 2024
85c77dd
Graph visualization removal of dropdown & Schema popup (#575)
prakriti-solankey Jul 18, 2024
a8b7db9
connection creation in extract and CancelledError handling for sse (#…
aashipandya Jul 18, 2024
fe6e692
Update the de-duplication nodes list query
praveshkumar1988 Jul 18, 2024
b617477
Format fixes
kartikpersistent Jul 18, 2024
8263330
accessbility fixes
kartikpersistent Jul 18, 2024
8c6efba
added the name for checkbox
kartikpersistent Jul 18, 2024
0dc3e6d
reset the loading state on API failure
kartikpersistent Jul 18, 2024
2ce52b3
openai llm as default (#588)
aashipandya Jul 18, 2024
1176105
resetting the duplicate nodes state when there is no data returned fr…
kartikpersistent Jul 18, 2024
4fc89c3
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Jul 18, 2024
2af96b0
New Graph query changes (#586)
prakriti-solankey Jul 18, 2024
ae95fcd
Update the duplicate nodes query
praveshkumar1988 Jul 19, 2024
061d261
updated graph query (#590)
vasanthasaikalluri Jul 19, 2024
4cf1c5d
Added 2 API endpoint to get the vector dimesion and drop_recreate vec…
praveshkumar1988 Jul 19, 2024
c3fc9d6
Merge get_vector_dimension API with /connect API
praveshkumar1988 Jul 19, 2024
0225c74
Drop index only when it's exist
praveshkumar1988 Jul 19, 2024
7fd1921
GPT 4o mini integration (#592)
aashipandya Jul 19, 2024
1804994
549 graph visualization removal of dropdown (#601)
prakriti-solankey Jul 22, 2024
2eb0bdd
Update GenericSourceModal.tsx
kartikpersistent Jul 22, 2024
40e96fd
Data table filtering (#589)
kartikpersistent Jul 23, 2024
11e6c08
Remove connection close from final bloack and make DB parameter required
praveshkumar1988 Jul 23, 2024
6c69d20
Perfromance test
abhishekkumar-27 Jul 1, 2024
7b8b4e6
Modified Integration test
abhishekkumar-27 Jul 23, 2024
3787a66
Add nltk package through code and pypandoc. Remove page_number
praveshkumar1988 Jul 23, 2024
e548422
Merge remote-tracking branch 'refs/remotes/origin/DEV' into DEV
praveshkumar1988 Jul 23, 2024
aabb0e2
updated script
abhishekkumar-27 Jul 23, 2024
ee5fdbe
Vector dimension reset (#594)
kartikpersistent Jul 24, 2024
c58a650
moved types to types files
kartikpersistent Jul 24, 2024
b1d69fe
removed hardcoded values
abhishekkumar-27 Jul 24, 2024
5853125
added the selected state indicator for filter types
kartikpersistent Jul 24, 2024
4191e85
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Jul 24, 2024
b7fe829
added missed dependency
kartikpersistent Jul 24, 2024
51f7753
increased the table hieght
kartikpersistent Jul 24, 2024
0ef8d74
Hybrid search (#611)
vasanthasaikalluri Jul 24, 2024
443bf99
modified script for wiki page
abhishekkumar-27 Jul 24, 2024
cfa2881
fixed the vector index param on submit
kartikpersistent Jul 24, 2024
bd0ca44
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Jul 24, 2024
784a689
format fixes
kartikpersistent Jul 25, 2024
71f06e2
type fix
kartikpersistent Jul 25, 2024
f444583
fixed orphan delete loading state
kartikpersistent Jul 25, 2024
2112259
Fix the issue "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x…
praveshkumar1988 Jul 25, 2024
65ef022
gpt-4o-mini default
kartikpersistent Jul 25, 2024
4d95e58
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Jul 25, 2024
1020b7d
Issue fixed in extarct API "UnboundLocalError: local variable 'graphD…
praveshkumar1988 Jul 25, 2024
c7796ce
Youtube timestamps (#612)
aashipandya Jul 25, 2024
e0b06b6
added the lazy loading for Dialogs
kartikpersistent Jul 26, 2024
fc9ff52
moved into utils
kartikpersistent Jul 26, 2024
2ac5a09
removed data uploading through axios in dropzone used service API
kartikpersistent Jul 26, 2024
307a412
lazy loading fallback UI
kartikpersistent Jul 26, 2024
a8145f6
Update requirements.txt
karanchellani Jul 29, 2024
ff7b9bc
Add file Cypher_queries
jayanth002 Jul 30, 2024
803e6e6
updated cypher_queries file
jayanth-002 Jul 30, 2024
56a68c3
fixed json parse issue
kartikpersistent Jul 30, 2024
dd955e2
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Jul 30, 2024
181a505
Added Preload to prevent to load LLM model SentenceTransform and usin…
praveshkumar1988 Jul 30, 2024
5bce477
Rollback env_file attribute from compose.yml
praveshkumar1988 Jul 30, 2024
e6d7d91
updated Cypher Queries File
jayanth-002 Jul 30, 2024
40d326b
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
jayanth-002 Jul 30, 2024
43cef3d
updated Cypher queries file
jayanth-002 Jul 30, 2024
55a6327
added fireworks new model
kartikpersistent Jul 31, 2024
e3606f8
UI for post processing in graph enhancements with a checkbox list (#627)
kartikpersistent Aug 1, 2024
c5526d6
DO NOT MERGE - Document, chunk node labels and relation labels update…
aashipandya Aug 1, 2024
8064a81
Application responsiveness desktop laptop tablet (#624)
kartikpersistent Aug 1, 2024
fe45fd7
format fixes
kartikpersistent Aug 1, 2024
5dcf4ea
code improvement
kartikpersistent Aug 2, 2024
96cfe1d
code improvement for boolean state
kartikpersistent Aug 2, 2024
3baa7ac
icon rendering fix
kartikpersistent Aug 2, 2024
ed22682
Ignore relationship types label start with __
praveshkumar1988 Aug 2, 2024
de64cd6
schema check
prakriti-solankey Aug 2, 2024
33b9618
Update README.md
kartikpersistent Aug 2, 2024
e68f9ef
Droped the old vector index (#652)
vasanthasaikalluri Aug 2, 2024
832deca
added cypher_queries and llm chatbot files
jayanth-002 Aug 4, 2024
31915be
updated llm-chatbot-python
jayanth-002 Aug 4, 2024
1931989
added llm-chatbot-python
jayanth-002 Aug 4, 2024
2b2d2db
updated llm-chatbot-python folder
jayanth-002 Aug 4, 2024
37ee900
fixed loader issue for lazy loading in chat info dialog
kartikpersistent Aug 5, 2024
6184e97
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Aug 5, 2024
b8ea4f7
Added chatbot "hybrid " mode use case
abhishekkumar-27 Aug 5, 2024
2603c16
__ changes (#656)
prakriti-solankey Aug 5, 2024
241e35e
DiffbotGraphTransformer doesn't need an LLMGraphTransformer (#659)
jeromechoo Aug 6, 2024
e6778f2
Removed experiments/llm-chatbot-python folder from DEV branch
jayanth-002 Aug 7, 2024
862ac6f
Removed experiments/Cypher_Queries.ipynb file from DEV branch
jayanth-002 Aug 7, 2024
ea6cca3
redcued the password clear timeout
kartikpersistent Aug 7, 2024
7fdf4dc
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Aug 7, 2024
c2b4974
disabled the closed button on banner and connection dialog while API …
kartikpersistent Aug 7, 2024
97f6e38
update delete query with entities
praveshkumar1988 Aug 7, 2024
d115c89
node id check (#663)
prakriti-solankey Aug 8, 2024
1932cc4
Status source and type filtering (#664)
prakriti-solankey Aug 8, 2024
d68b4bc
added Hybrid Chat modes (#670)
vasanthasaikalluri Aug 8, 2024
f6519c5
Rename the function #657
praveshkumar1988 Aug 9, 2024
c964023
label and checkboxes placement changes (#675)
kartikpersistent Aug 9, 2024
c9ef478
Graph node filename check
prakriti-solankey Aug 9, 2024
ecd5fd5
env fixes with latest nvl libraries
kartikpersistent Aug 12, 2024
bf2c29c
format fixes
kartikpersistent Aug 12, 2024
9b87543
Remove TotalPages when save file on local (#684)
praveshkumar1988 Aug 12, 2024
a04470d
file_name reference and verify_ssl issue fixed (#683)
praveshkumar1988 Aug 12, 2024
b1ac35f
User flow changes for recreating supported vector index (#682)
kartikpersistent Aug 12, 2024
9ff92a1
Concurrent processing of files (#665)
kartikpersistent Aug 13, 2024
9b0db55
format fixes
kartikpersistent Aug 13, 2024
d392825
fixed the row selection issue
kartikpersistent Aug 14, 2024
71413e5
clearing the queue when there are no files in the db
kartikpersistent Aug 14, 2024
686f131
Update Dockerfile (#694)
jexp Aug 15, 2024
811052f
env changes for VITE (#690)
prakriti-solankey Aug 16, 2024
f85709d
removed the processing count update on error event of server side eve…
kartikpersistent Aug 16, 2024
756fa65
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Aug 16, 2024
2fe2993
removed hardcode value
kartikpersistent Aug 16, 2024
27b1d27
removed hardcoded values for resetting the processing count
kartikpersistent Aug 16, 2024
fb4685c
function definition changes
kartikpersistent Aug 16, 2024
8937291
vite prefix
kartikpersistent Aug 16, 2024
39713d1
Update docker-compose.yml (#688)
Kain-90 Aug 19, 2024
433e017
enabled the entity extraction by default
kartikpersistent Aug 20, 2024
6cf6834
Fix typo: correct 'josn_obj' to 'json_obj' (#697)
destiny966113 Aug 20, 2024
e975265
fixed model rendering fix for waiting files
kartikpersistent Aug 20, 2024
6274314
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
kartikpersistent Aug 20, 2024
47d8330
conflict solved
prakriti-solankey Aug 20, 2024
3d749e2
lint fixes
kartikpersistent Aug 20, 2024
2b77b84
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
prakriti-solankey Aug 20, 2024
b929a20
Fix typo: correct 'josn_obj' to 'json_obj' (#697)
destiny966113 Aug 20, 2024
d18528b
lint fixes
kartikpersistent Aug 20, 2024
8c8bae7
Merge branch 'DEV' of https://github.com/neo4j-labs/llm-graph-builder…
prakriti-solankey Aug 20, 2024
b72c014
lint fixes
kartikpersistent Aug 20, 2024
5885815
connection _check
prakriti-solankey Aug 20, 2024
9599c25
Merge branch 'STAGING' into DEV
kartikpersistent Aug 21, 2024
ad8f7bb
Dev (#701)
kartikpersistent Aug 21, 2024
d3f4661
Merge branch 'STAGING' of https://github.com/neo4j-labs/llm-graph-bui…
prakriti-solankey Aug 21, 2024
2b63cee
DEV to STAGING (#703)
vasanthasaikalluri Aug 21, 2024
0891e2e
Dev (#705)
prakriti-solankey Aug 22, 2024
8159e2c
default modes in staging
kartikpersistent Aug 26, 2024
b77ddd0
processing count update fix on cancel
kartikpersistent Aug 26, 2024
eeaa54b
Dev to Staging (#709)
aashipandya Aug 27, 2024
8d57dd2
Merge branch 'main' into STAGING
kartikpersistent Aug 27, 2024
4f8b666
merge fixes
kartikpersistent Aug 27, 2024
7a23400
fixed processing count update on failed condtition
kartikpersistent Aug 28, 2024
b451820
removed unsused variable
kartikpersistent Aug 28, 2024
e574299
Merge branch 'main' into STAGING
kartikpersistent Aug 28, 2024
b7355ba
DEV to STAGING (#732)
praveshkumar1988 Sep 6, 2024
758a506
Merge branch 'main' into STAGING
prakriti-solankey Sep 9, 2024
2cf49cc
modified in wiki script and weburl
abhishekkumar-27 Sep 13, 2024
2 changes: 1 addition & 1 deletion backend/Dockerfile
@@ -21,4 +21,4 @@ RUN pip install -r requirements.txt
# Copy application code
COPY . /code
# Set command
CMD ["gunicorn", "score:app", "--workers", "8","--preload","--threads", "8", "--worker-class", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000", "--timeout", "300"]
CMD ["gunicorn", "score:app", "--workers", "8","--threads", "8", "--worker-class", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000", "--timeout", "300"]
1 change: 1 addition & 0 deletions backend/Performance_test.py
@@ -94,6 +94,7 @@ def performance_main():
for _ in range(CONCURRENT_REQUESTS):
futures.append(executor.submit(post_request_chunk))

# Chatbot request futures
# Chatbot request futures
# for message in CHATBOT_MESSAGES:
# futures.append(executor.submit(chatbot_request, message))
74 changes: 45 additions & 29 deletions backend/score.py
@@ -46,7 +46,6 @@ def sick():
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
@@ -137,7 +136,8 @@ async def extract_knowledge_graph_from_file(
allowedNodes=Form(None),
allowedRelationship=Form(None),
language=Form(None),
access_token=Form(None)
access_token=Form(None),
retry_condition=Form(None)
):
"""
Calls 'extract_graph_from_file' in a new thread to create Neo4jGraph from a
@@ -161,30 +161,30 @@
merged_file_path = os.path.join(MERGED_DIR,file_name)
logging.info(f'File path:{merged_file_path}')
result = await asyncio.to_thread(
extract_graph_from_file_local_file, uri, userName, password, database, model, merged_file_path, file_name, allowedNodes, allowedRelationship)
extract_graph_from_file_local_file, uri, userName, password, database, model, merged_file_path, file_name, allowedNodes, allowedRelationship, retry_condition)

elif source_type == 's3 bucket' and source_url:
result = await asyncio.to_thread(
extract_graph_from_file_s3, uri, userName, password, database, model, source_url, aws_access_key_id, aws_secret_access_key, allowedNodes, allowedRelationship)
extract_graph_from_file_s3, uri, userName, password, database, model, source_url, aws_access_key_id, aws_secret_access_key, file_name, allowedNodes, allowedRelationship, retry_condition)

elif source_type == 'web-url':
result = await asyncio.to_thread(
extract_graph_from_web_page, uri, userName, password, database, model, source_url, allowedNodes, allowedRelationship)
extract_graph_from_web_page, uri, userName, password, database, model, source_url, file_name, allowedNodes, allowedRelationship, retry_condition)

elif source_type == 'youtube' and source_url:
result = await asyncio.to_thread(
extract_graph_from_file_youtube, uri, userName, password, database, model, source_url, allowedNodes, allowedRelationship)
extract_graph_from_file_youtube, uri, userName, password, database, model, source_url, file_name, allowedNodes, allowedRelationship, retry_condition)

elif source_type == 'Wikipedia' and wiki_query:
result = await asyncio.to_thread(
extract_graph_from_file_Wikipedia, uri, userName, password, database, model, wiki_query, max_sources, language, allowedNodes, allowedRelationship)
extract_graph_from_file_Wikipedia, uri, userName, password, database, model, wiki_query, language, file_name, allowedNodes, allowedRelationship, retry_condition)

elif source_type == 'gcs bucket' and gcs_bucket_name:
result = await asyncio.to_thread(
extract_graph_from_file_gcs, uri, userName, password, database, model, gcs_project_id, gcs_bucket_name, gcs_bucket_folder, gcs_blob_filename, access_token, allowedNodes, allowedRelationship)
extract_graph_from_file_gcs, uri, userName, password, database, model, gcs_project_id, gcs_bucket_name, gcs_bucket_folder, gcs_blob_filename, access_token, file_name, allowedNodes, allowedRelationship, retry_condition)
else:
return create_api_response('Failed',message='source_type is other than accepted source')

if result is not None:
result['db_url'] = uri
result['api_name'] = 'extract'
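The branching above routes each `source_type` to its extractor, now threading the new `file_name` and `retry_condition` arguments through every call and offloading the blocking work with `asyncio.to_thread`. A minimal sketch of that dispatch pattern follows; the handler names and the retry-condition value are illustrative stand-ins, not the repo's actual API:

```python
import asyncio

def extract_local(file_name, retry_condition=None):
    # Hypothetical stub standing in for extract_graph_from_file_local_file
    return {"fileName": file_name, "retry_condition": retry_condition, "source": "local file"}

def extract_web(file_name, retry_condition=None):
    # Hypothetical stub standing in for extract_graph_from_web_page
    return {"fileName": file_name, "retry_condition": retry_condition, "source": "web-url"}

HANDLERS = {
    "local file": extract_local,
    "web-url": extract_web,
}

async def dispatch_extract(source_type, file_name, retry_condition=None):
    handler = HANDLERS.get(source_type)
    if handler is None:
        # Mirrors the endpoint's fallback for unaccepted source types
        return {"status": "Failed", "message": "source_type is other than accepted source"}
    # Run the blocking extractor off the event loop, as the endpoint does
    return await asyncio.to_thread(handler, file_name, retry_condition)
```

A table-driven dispatch like this keeps each `elif` arm down to one entry and makes it obvious that every extractor must accept the same trailing arguments.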
@@ -433,25 +433,25 @@ async def generate():
logging.info(" SSE Client disconnected")
break
# get the current status of document node
graph = create_graph_database_connection(uri, userName, decoded_password, database)
graphDb_data_Access = graphDBdataAccess(graph)
result = graphDb_data_Access.get_current_status_document_node(file_name)
if result is not None:
status = json.dumps({'fileName':file_name,
'status':result[0]['Status'],
'processingTime':result[0]['processingTime'],
'nodeCount':result[0]['nodeCount'],
'relationshipCount':result[0]['relationshipCount'],
'model':result[0]['model'],
'total_chunks':result[0]['total_chunks'],
'total_pages':result[0]['total_pages'],
'fileSize':result[0]['fileSize'],
'processed_chunk':result[0]['processed_chunk'],
'fileSource':result[0]['fileSource']
})

else:
status = json.dumps({'fileName':file_name, 'status':'Failed'})
yield status
graph = create_graph_database_connection(uri, userName, decoded_password, database)
graphDb_data_Access = graphDBdataAccess(graph)
result = graphDb_data_Access.get_current_status_document_node(file_name)
print(f'Result of document status in SSE : {result}')
if len(result) > 0:
status = json.dumps({'fileName':file_name,
'status':result[0]['Status'],
'processingTime':result[0]['processingTime'],
'nodeCount':result[0]['nodeCount'],
'relationshipCount':result[0]['relationshipCount'],
'model':result[0]['model'],
'total_chunks':result[0]['total_chunks'],
'fileSize':result[0]['fileSize'],
'processed_chunk':result[0]['processed_chunk'],
'fileSource':result[0]['fileSource']
})
yield status
except asyncio.CancelledError:
logging.info("SSE Connection cancelled")
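The SSE loop above now guards on `len(result) > 0` instead of `result is not None`, since `get_current_status_document_node` returns a (possibly empty) row list, and the payload no longer carries `total_pages`. A small sketch of that payload-building step, with only a few representative fields:

```python
import json

def build_status_event(file_name, result):
    # result is the row list returned by get_current_status_document_node;
    # an empty list means the Document node was not found, so no event
    # payload is produced for this poll cycle (the new len(result) > 0 guard).
    if len(result) > 0:
        row = result[0]
        return json.dumps({
            "fileName": file_name,
            "status": row["Status"],
            "processed_chunk": row["processed_chunk"],
            "total_chunks": row["total_chunks"],
        })
    return None
```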

@@ -495,21 +495,21 @@ async def get_document_status(file_name, url, userName, password, database):
graph = create_graph_database_connection(uri, userName, decoded_password, database)
graphDb_data_Access = graphDBdataAccess(graph)
result = graphDb_data_Access.get_current_status_document_node(file_name)
if result is not None:
if len(result) > 0:
status = {'fileName':file_name,
'status':result[0]['Status'],
'processingTime':result[0]['processingTime'],
'nodeCount':result[0]['nodeCount'],
'relationshipCount':result[0]['relationshipCount'],
'model':result[0]['model'],
'total_chunks':result[0]['total_chunks'],
'total_pages':result[0]['total_pages'],
'fileSize':result[0]['fileSize'],
'processed_chunk':result[0]['processed_chunk'],
'fileSource':result[0]['fileSource']
}
else:
status = {'fileName':file_name, 'status':'Failed'}
print(f'Result of document status in refresh : {result}')
return create_api_response('Success',message="",file_name=status)
except Exception as e:
message=f"Unable to get the document status"
@@ -626,6 +626,22 @@ async def merge_duplicate_nodes(uri=Form(), userName=Form(), password=Form(), da
return create_api_response(job_status, message=message, error=error_message)
finally:
gc.collect()

@app.post("/retry_processing")
async def retry_processing(uri=Form(), userName=Form(), password=Form(), database=Form(), file_name=Form(), retry_condition=Form()):
try:
graph = create_graph_database_connection(uri, userName, password, database)
await asyncio.to_thread(set_status_retry, graph,file_name,retry_condition)
#set_status_retry(graph,file_name,retry_condition)
return create_api_response('Success',message=f"Status set to Reprocess for filename : {file_name}")
except Exception as e:
job_status = "Failed"
message="Unable to set status to Retry"
error_message = str(e)
logging.exception(f'{error_message}')
return create_api_response(job_status, message=message, error=error_message)
finally:
gc.collect()
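The new `/retry_processing` endpoint follows the same try/except/finally shape as the other routes: run `set_status_retry` off the event loop, return a success envelope, and fold any exception into a failure envelope. The core of that pattern can be sketched without the FastAPI plumbing; `create_api_response` here is a simplified stand-in for the repo's helper, and the retry-condition string is hypothetical:

```python
def create_api_response(status, message="", error=None):
    # Simplified stand-in for the repo's create_api_response helper
    payload = {"status": status, "message": message}
    if error is not None:
        payload["error"] = error
    return payload

def retry_processing_core(set_status_retry, graph, file_name, retry_condition):
    # Same success/failure envelope as the endpoint, minus FastAPI plumbing
    try:
        set_status_retry(graph, file_name, retry_condition)
        return create_api_response("Success", message=f"Status set to Reprocess for filename : {file_name}")
    except Exception as e:
        return create_api_response("Failed", message="Unable to set status to Retry", error=str(e))
```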

if __name__ == "__main__":
uvicorn.run(app)
1 change: 0 additions & 1 deletion backend/src/document_sources/gcs_bucket.py
@@ -122,7 +122,6 @@ def merge_file_gcs(bucket_name, original_file_name: str, folder_name_sha1_hashed
blob.upload_from_file(file_io)
# pdf_reader = PdfReader(file_io)
file_size = len(merged_file)
# total_pages = len(pdf_reader.pages)

return file_size
except Exception as e:
16 changes: 8 additions & 8 deletions backend/src/document_sources/local_file.py
@@ -56,19 +56,19 @@ def get_pages_with_page_numbers(unstructured_pages):
if page.metadata['page_number']==page_number:
page_content += page.page_content
metadata = {'source':page.metadata['source'],'page_number':page_number, 'filename':page.metadata['filename'],
'filetype':page.metadata['filetype'], 'total_pages':unstructured_pages[-1].metadata['page_number']}
'filetype':page.metadata['filetype']}

if page.metadata['page_number']>page_number:
page_number+=1
if not metadata:
metadata = {'total_pages':unstructured_pages[-1].metadata['page_number']}
pages.append(Document(page_content = page_content, metadata=metadata))
# if not metadata:
# metadata = {'total_pages':unstructured_pages[-1].metadata['page_number']}
pages.append(Document(page_content = page_content))
page_content=''

if page == unstructured_pages[-1]:
if not metadata:
metadata = {'total_pages':unstructured_pages[-1].metadata['page_number']}
pages.append(Document(page_content = page_content, metadata=metadata))
# if not metadata:
# metadata = {'total_pages':unstructured_pages[-1].metadata['page_number']}
pages.append(Document(page_content = page_content))

elif page.metadata['category']=='PageBreak' and page!=unstructured_pages[0]:
page_number+=1
@@ -80,7 +80,7 @@ def get_pages_with_page_numbers(unstructured_pages):
page_content += page.page_content
metadata_with_custom_page_number = {'source':page.metadata['source'],
'page_number':1, 'filename':page.metadata['filename'],
'filetype':page.metadata['filetype'], 'total_pages':1}
'filetype':page.metadata['filetype']}
if page == unstructured_pages[-1]:
pages.append(Document(page_content = page_content, metadata=metadata_with_custom_page_number))
return pages
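The changes in `get_pages_with_page_numbers` drop the `total_pages` metadata key while still regrouping per-element chunks into one `Document` per page. The regrouping idea can be sketched independently of the unstructured.io types; `Element` below is a minimal hypothetical stand-in, not the library's class:

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    # Minimal stand-in for an unstructured.io page element
    page_content: str
    metadata: dict = field(default_factory=dict)

def group_elements_by_page(elements):
    # Collect each element's text under its page_number, then emit one
    # combined text per page; no total_pages key is recorded anymore.
    buckets = {}
    for el in elements:
        page = el.metadata.get("page_number", 1)
        buckets.setdefault(page, []).append(el.page_content)
    return ["".join(parts) for _, parts in sorted(buckets.items())]
```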
2 changes: 1 addition & 1 deletion backend/src/document_sources/wikipedia.py
@@ -4,7 +4,7 @@

def get_documents_from_Wikipedia(wiki_query:str, language:str):
try:
pages = WikipediaLoader(query=wiki_query.strip(), lang=language, load_max_docs=1, load_all_available_meta=False).load()
pages = WikipediaLoader(query=wiki_query.strip(), lang=language, load_all_available_meta=False).load()
file_name = wiki_query.strip()
logging.info(f"Total Pages from Wikipedia = {len(pages)}")
return file_name, pages
2 changes: 1 addition & 1 deletion backend/src/entities/source_node.py
@@ -18,9 +18,9 @@ class sourceNode:
updated_at:datetime=None
processing_time:float=None
error_message:str=None
total_pages:int=None
total_chunks:int=None
language:str=None
is_cancelled:bool=None
processed_chunk:int=None
access_token:str=None
retry_condition:str=None
20 changes: 10 additions & 10 deletions backend/src/graphDB_dataAccess.py
@@ -37,14 +37,14 @@ def create_source_node(self, obj_source_node:sourceNode):
d.processingTime = $pt, d.errorMessage = $e_message, d.nodeCount= $n_count,
d.relationshipCount = $r_count, d.model= $model, d.gcsBucket=$gcs_bucket,
d.gcsBucketFolder= $gcs_bucket_folder, d.language= $language,d.gcsProjectId= $gcs_project_id,
d.is_cancelled=False, d.total_chunks=0, d.processed_chunk=0, d.total_pages=$total_pages,
d.is_cancelled=False, d.total_chunks=0, d.processed_chunk=0,
d.access_token=$access_token""",
{"fn":obj_source_node.file_name, "fs":obj_source_node.file_size, "ft":obj_source_node.file_type, "st":job_status,
"url":obj_source_node.url,
"awsacc_key_id":obj_source_node.awsAccessKeyId, "f_source":obj_source_node.file_source, "c_at":obj_source_node.created_at,
"u_at":obj_source_node.created_at, "pt":0, "e_message":'', "n_count":0, "r_count":0, "model":obj_source_node.model,
"gcs_bucket": obj_source_node.gcsBucket, "gcs_bucket_folder": obj_source_node.gcsBucketFolder,
"language":obj_source_node.language, "gcs_project_id":obj_source_node.gcsProjectId, "total_pages": obj_source_node.total_pages,
"language":obj_source_node.language, "gcs_project_id":obj_source_node.gcsProjectId,
"access_token":obj_source_node.access_token})
except Exception as e:
error_message = str(e)
@@ -71,26 +71,26 @@ def update_source_node(self, obj_source_node:sourceNode):
if obj_source_node.processing_time is not None and obj_source_node.processing_time != 0:
params['processingTime'] = round(obj_source_node.processing_time.total_seconds(),2)

if obj_source_node.node_count is not None and obj_source_node.node_count != 0:
if obj_source_node.node_count is not None :
params['nodeCount'] = obj_source_node.node_count

if obj_source_node.relationship_count is not None and obj_source_node.relationship_count != 0:
if obj_source_node.relationship_count is not None :
params['relationshipCount'] = obj_source_node.relationship_count

if obj_source_node.model is not None and obj_source_node.model != '':
params['model'] = obj_source_node.model

if obj_source_node.total_pages is not None and obj_source_node.total_pages != 0:
params['total_pages'] = obj_source_node.total_pages

if obj_source_node.total_chunks is not None and obj_source_node.total_chunks != 0:
params['total_chunks'] = obj_source_node.total_chunks

if obj_source_node.is_cancelled is not None and obj_source_node.is_cancelled != False:
if obj_source_node.is_cancelled is not None:
params['is_cancelled'] = obj_source_node.is_cancelled

if obj_source_node.processed_chunk is not None and obj_source_node.processed_chunk != 0:
if obj_source_node.processed_chunk is not None :
params['processed_chunk'] = obj_source_node.processed_chunk

if obj_source_node.retry_condition is not None :
params['retry_condition'] = obj_source_node.retry_condition

param= {"props":params}
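The hunk above relaxes the guards in `update_source_node` from `is not None and != 0` to plain `is not None`, so a count of 0 now propagates to the Document node instead of being silently skipped, and it adds the new `retry_condition` field. A sketch of that build-only-what-was-set pattern, with a representative subset of the fields:

```python
def build_update_params(node_count=None, relationship_count=None,
                        processed_chunk=None, retry_condition=None):
    # Only explicitly provided fields are written back to the Document
    # node; after this change a value of 0 is no longer skipped, so
    # counts can be reset to zero when a file is reprocessed.
    params = {}
    if node_count is not None:
        params["nodeCount"] = node_count
    if relationship_count is not None:
        params["relationshipCount"] = relationship_count
    if processed_chunk is not None:
        params["processed_chunk"] = processed_chunk
    if retry_condition is not None:
        params["retry_condition"] = retry_condition
    return {"props": params}
```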

@@ -187,7 +187,7 @@ def get_current_status_document_node(self, file_name):
query = """
MATCH(d:Document {fileName : $file_name}) RETURN d.status AS Status , d.processingTime AS processingTime,
d.nodeCount AS nodeCount, d.model as model, d.relationshipCount as relationshipCount,
d.total_pages AS total_pages, d.total_chunks AS total_chunks , d.fileSize as fileSize,
d.total_chunks AS total_chunks , d.fileSize as fileSize,
d.is_cancelled as is_cancelled, d.processed_chunk as processed_chunk, d.fileSource as fileSource
"""
param = {"file_name" : file_name}