Staging to main (#735)
* Dev (#537)

* format fixes and graph schema indication fix

* Update README.md

* added chat modes variable in env updated the readme

* spell fix

* added the chat mode in env table

* added the logos

* fixed the overflow issues

* removed the extra fix

* Fixed specific scenario: when the schema-from-text dialog closes, it should reopen the previous modal

* readme changes

* removed dev console logs

* added new retrieval query (#533)

* format fixes and tab rendering fix

* fixed the setting modal reopen issue

---------

Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: vasanthasaikalluri <[email protected]>

* disabled the submit button on loading

* Deduplication tab (#566)

* de-duplication API

* Update De-Duplicate query

* created the Deduplication tab

* added the API service

* added the removeable tags for similar nodes in deduplication tab

* Integrate Tag

* added GraphLabel

* added loader state

* added the merge service

* integrated the merge API

* Merge Query issue fixed

* Auto refresh the duplicate nodes after merging operation

* added the description for de-duplication

* reset on merging

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* Update frontend_docs.adoc (#538)

* Update frontend_docs.adoc

* doc update

* Images

* Images folder change

* Images folder change

* test image

* Update frontend_docs.adoc

* image change

* Update frontend_docs.adoc

* Update frontend_docs.adoc

* added the Graph Mode SS

* added the Query SS

* Update frontend_docs.adoc

* conflicts fix

* conflict fix

* Update frontend_docs.adoc

---------

Co-authored-by: aashipandya <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* updated langchain versions (#565)

* Update the De-Duplication query

* Node relationship id type none issue (#547)

* de-duplication API

* Update De-Duplicate query

* Issue fixed: Nodes/Relationship id and type None or blank

* added the tooltips

* type fix

* Unnecessary import

* added score threshold and added some error handling (#571)

* Update requirements.txt

* Tooltip and other UI fixes (#572)

* Staging To Main (#495)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* recent merges

* pdf deletion due to out of disk space

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed: processed chunks counted as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <[email protected]>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* file selection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: abhishekkumar-27 <[email protected]>
Co-authored-by: aashipandya <[email protected]>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, which was causing a NaN issue in the approximate processing-time notification; fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <[email protected]>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <[email protected]>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <[email protected]>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <[email protected]>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <[email protected]>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* Fixed GCS local upload issue; delete file from GCS after processing, failure, or cancellation

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* Dev (#433)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <[email protected]>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on uplaod API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon accoroding to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on basis if total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: abhishekkumar-27 <[email protected]>
Co-authored-by: aashipandya <[email protected]>

* fixed the layout issue

* Populate graph schema (#399)

* crreate new endpoint populate_graph_schema and update the query for getting lables from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condtion

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* mardown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs is retrived (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token the file state

---------

Co-authored-by: kartikpersistent <[email protected]>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <[email protected]>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the toolip for dropzone sources

---------

Co-authored-by: kartikpersistent <[email protected]>

* Fixed retrival bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <[email protected]>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrobar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <[email protected]>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Uplaod file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

---------

Co-authored-by: abhishekkumar-27 <[email protected]>
Co-authored-by: Pravesh Kumar <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: Ajay Meena <[email protected]>
Co-authored-by: Morgan Senechal <[email protected]>
Co-authored-by: karanchellani <[email protected]>

* fixed gcs status message issue

* added if check for failed count

* Null issue fixed in backend for upload API and graph_document when model name mismatches

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed; file deleted from GCS or local storage based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <[email protected]>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error -- "soffice command was not found. Please install libreoffice on your system and try again."

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* file-table

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <[email protected]>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* DEV to STAGING (#461)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <[email protected]>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on uplaod API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon accoroding to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on basis if total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: abhishekkumar-27 <[email protected]>
Co-authored-by: aashipandya <[email protected]>

* fixed the layout issue

* Populate graph schema (#399)

* crreate new endpoint populate_graph_schema and update the query for getting lables from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condtion

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* mardown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs is retrived (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token the file state

---------

Co-authored-by: kartikpersistent <[email protected]>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <[email protected]>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the toolip for dropzone sources

---------

Co-authored-by: kartikpersistent <[email protected]>

* Fixed retrival bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <[email protected]>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrobar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <[email protected]>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Uplaod file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket uplaod

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <[email protected]>

* upload all unstructuredfiles to gcs (#455)

* Mofified chunk query (#454)

* Added libre office for fixing error -- soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <[email protected]>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: aashipandya <[email protected]>
Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: Ajay Meena <[email protected]>
Co-authored-by: Morgan Senechal <[email protected]>
Co-authored-by: karanchellani <[email protected]>

* DEV to STAGING (#462)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <[email protected]>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on uplaod API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon accoroding to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on basis if total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: abhishekkumar-27 <[email protected]>
Co-authored-by: aashipandya <[email protected]>

* fixed the layout issue

* Populate graph schema (#399)

* crreate new endpoint populate_graph_schema and update the query for getting lables from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condtion

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* mardown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs is retrived (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token the file state

---------

Co-authored-by: kartikpersistent <[email protected]>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <[email protected]>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the toolip for dropzone sources

---------

Co-authored-by: kartikpersistent <[email protected]>

* Fixed retrival bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <[email protected]>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrobar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <[email protected]>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Uplaod file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket uplaod

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <[email protected]>

* upload all unstructuredfiles to gcs (#455)

* Mofified chunk query (#454)

* Added libre office for fixing error -- soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <[email protected]>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: aashipandya <[email protected]>
Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: Ajay Meena <[email protected]>
Co-authored-by: Morgan Senechal <[email protected]>
Co-authored-by: karanchellani <[email protected]>

* added upload api

* changed the dropzone error message

* Dev to staging (#466)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <[email protected]>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on uplaod API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refatctor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki langauge param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <[email protected]>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <[email protected]>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon accoroding to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on basis if total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: abhishekkumar-27 <[email protected]>
Co-authored-by: aashipandya <[email protected]>

* fixed the layout issue

* Populate graph schema (#399)

* crreate new endpoint populate_graph_schema and update the query for getting lables from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condtion

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* mardown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs is retrived (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <[email protected]>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token the file state

---------

Co-authored-by: kartikpersistent <[email protected]>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <[email protected]>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the toolip for dropzone sources

---------

Co-authored-by: kartikpersistent <[email protected]>

* Fixed retrival bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <[email protected]>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrobar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <[email protected]>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Uplaod file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <[email protected]>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket uplaod

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <[email protected]>

* upload all unstructuredfiles to gcs (#455)

* Mofified chunk query (#454)

* Added libre office for fixing error -- soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table''

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <[email protected]>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* added upload api

* changed the dropzone error message

---------

Co-authored-by: abhishekkumar-27 <[email protected]>
Co-authored-by: kartikpersistent <[email protected]>
Co-authored-by: aashipandya <[email protected]>
Co-authored-by: vasanthasaikalluri <[email protected]>
Co-authored-by: Prakriti Solankey <[email protected]>
Co-authored-by: Ajay Meena <[email protected]>
Co-auth…
19 people authored Sep 9, 2024
1 parent 1aa2842 commit 06b2c58
Showing 54 changed files with 1,409 additions and 989 deletions.
2 changes: 1 addition & 1 deletion backend/Dockerfile
@@ -21,4 +21,4 @@ RUN pip install -r requirements.txt
# Copy application code
COPY . /code
# Set command
CMD ["gunicorn", "score:app", "--workers", "8","--preload","--threads", "8", "--worker-class", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000", "--timeout", "300"]
CMD ["gunicorn", "score:app", "--workers", "8","--threads", "8", "--worker-class", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8000", "--timeout", "300"]
1 change: 1 addition & 0 deletions backend/Performance_test.py
@@ -94,6 +94,7 @@ def performance_main():
for _ in range(CONCURRENT_REQUESTS):
futures.append(executor.submit(post_request_chunk))

# Chatbot request futures
# Chatbot request futures
# for message in CHATBOT_MESSAGES:
# futures.append(executor.submit(chatbot_request, message))
74 changes: 45 additions & 29 deletions backend/score.py
@@ -46,7 +46,6 @@ def sick():
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
@@ -137,7 +136,8 @@ async def extract_knowledge_graph_from_file(
allowedNodes=Form(None),
allowedRelationship=Form(None),
language=Form(None),
access_token=Form(None)
access_token=Form(None),
retry_condition=Form(None)
):
"""
Calls 'extract_graph_from_file' in a new thread to create Neo4jGraph from a
@@ -161,30 +161,30 @@
merged_file_path = os.path.join(MERGED_DIR,file_name)
logging.info(f'File path:{merged_file_path}')
result = await asyncio.to_thread(
extract_graph_from_file_local_file, uri, userName, password, database, model, merged_file_path, file_name, allowedNodes, allowedRelationship)
extract_graph_from_file_local_file, uri, userName, password, database, model, merged_file_path, file_name, allowedNodes, allowedRelationship, retry_condition)

elif source_type == 's3 bucket' and source_url:
result = await asyncio.to_thread(
extract_graph_from_file_s3, uri, userName, password, database, model, source_url, aws_access_key_id, aws_secret_access_key, allowedNodes, allowedRelationship)
extract_graph_from_file_s3, uri, userName, password, database, model, source_url, aws_access_key_id, aws_secret_access_key, file_name, allowedNodes, allowedRelationship, retry_condition)

elif source_type == 'web-url':
result = await asyncio.to_thread(
extract_graph_from_web_page, uri, userName, password, database, model, source_url, allowedNodes, allowedRelationship)
extract_graph_from_web_page, uri, userName, password, database, model, source_url, file_name, allowedNodes, allowedRelationship, retry_condition)

elif source_type == 'youtube' and source_url:
result = await asyncio.to_thread(
extract_graph_from_file_youtube, uri, userName, password, database, model, source_url, allowedNodes, allowedRelationship)
extract_graph_from_file_youtube, uri, userName, password, database, model, source_url, file_name, allowedNodes, allowedRelationship, retry_condition)

elif source_type == 'Wikipedia' and wiki_query:
result = await asyncio.to_thread(
extract_graph_from_file_Wikipedia, uri, userName, password, database, model, wiki_query, max_sources, language, allowedNodes, allowedRelationship)
extract_graph_from_file_Wikipedia, uri, userName, password, database, model, wiki_query, language, file_name, allowedNodes, allowedRelationship, retry_condition)

elif source_type == 'gcs bucket' and gcs_bucket_name:
result = await asyncio.to_thread(
extract_graph_from_file_gcs, uri, userName, password, database, model, gcs_project_id, gcs_bucket_name, gcs_bucket_folder, gcs_blob_filename, access_token, allowedNodes, allowedRelationship)
extract_graph_from_file_gcs, uri, userName, password, database, model, gcs_project_id, gcs_bucket_name, gcs_bucket_folder, gcs_blob_filename, access_token, file_name, allowedNodes, allowedRelationship, retry_condition)
else:
return create_api_response('Failed',message='source_type is other than accepted source')

if result is not None:
result['db_url'] = uri
result['api_name'] = 'extract'
@@ -433,25 +433,25 @@ async def generate():
logging.info(" SSE Client disconnected")
break
# get the current status of document node
graph = create_graph_database_connection(uri, userName, decoded_password, database)
graphDb_data_Access = graphDBdataAccess(graph)
result = graphDb_data_Access.get_current_status_document_node(file_name)
if result is not None:
status = json.dumps({'fileName':file_name,
'status':result[0]['Status'],
'processingTime':result[0]['processingTime'],
'nodeCount':result[0]['nodeCount'],
'relationshipCount':result[0]['relationshipCount'],
'model':result[0]['model'],
'total_chunks':result[0]['total_chunks'],
'total_pages':result[0]['total_pages'],
'fileSize':result[0]['fileSize'],
'processed_chunk':result[0]['processed_chunk'],
'fileSource':result[0]['fileSource']
})

else:
status = json.dumps({'fileName':file_name, 'status':'Failed'})
yield status
graph = create_graph_database_connection(uri, userName, decoded_password, database)
graphDb_data_Access = graphDBdataAccess(graph)
result = graphDb_data_Access.get_current_status_document_node(file_name)
print(f'Result of document status in SSE : {result}')
if len(result) > 0:
status = json.dumps({'fileName':file_name,
'status':result[0]['Status'],
'processingTime':result[0]['processingTime'],
'nodeCount':result[0]['nodeCount'],
'relationshipCount':result[0]['relationshipCount'],
'model':result[0]['model'],
'total_chunks':result[0]['total_chunks'],
'fileSize':result[0]['fileSize'],
'processed_chunk':result[0]['processed_chunk'],
'fileSource':result[0]['fileSource']
})
yield status
except asyncio.CancelledError:
logging.info("SSE Connection cancelled")

@@ -495,21 +495,21 @@ async def get_document_status(file_name, url, userName, password, database):
graph = create_graph_database_connection(uri, userName, decoded_password, database)
graphDb_data_Access = graphDBdataAccess(graph)
result = graphDb_data_Access.get_current_status_document_node(file_name)
if result is not None:
if len(result) > 0:
status = {'fileName':file_name,
'status':result[0]['Status'],
'processingTime':result[0]['processingTime'],
'nodeCount':result[0]['nodeCount'],
'relationshipCount':result[0]['relationshipCount'],
'model':result[0]['model'],
'total_chunks':result[0]['total_chunks'],
'total_pages':result[0]['total_pages'],
'fileSize':result[0]['fileSize'],
'processed_chunk':result[0]['processed_chunk'],
'fileSource':result[0]['fileSource']
}
else:
status = {'fileName':file_name, 'status':'Failed'}
print(f'Result of document status in refresh : {result}')
return create_api_response('Success',message="",file_name=status)
except Exception as e:
message=f"Unable to get the document status"
@@ -626,6 +626,22 @@ async def merge_duplicate_nodes(uri=Form(), userName=Form(), password=Form(), da
return create_api_response(job_status, message=message, error=error_message)
finally:
gc.collect()

@app.post("/retry_processing")
async def retry_processing(uri=Form(), userName=Form(), password=Form(), database=Form(), file_name=Form(), retry_condition=Form()):
try:
graph = create_graph_database_connection(uri, userName, password, database)
await asyncio.to_thread(set_status_retry, graph,file_name,retry_condition)
#set_status_retry(graph,file_name,retry_condition)
return create_api_response('Success',message=f"Status set to Reprocess for filename : {file_name}")
except Exception as e:
job_status = "Failed"
message="Unable to set status to Retry"
error_message = str(e)
logging.exception(f'{error_message}')
return create_api_response(job_status, message=message, error=error_message)
finally:
gc.collect()

if __name__ == "__main__":
uvicorn.run(app)
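
For reference, a client can mark a document for reprocessing by posting form data to the new /retry_processing endpoint shown above; a minimal sketch (host, credentials, and the retry_condition value are placeholders, and the exact response shape depends on create_api_response):

```python
# Hypothetical client call for the /retry_processing endpoint.
import requests

response = requests.post(
    "http://localhost:8000/retry_processing",      # assumed backend URL
    data={
        "uri": "neo4j://localhost:7687",           # placeholder Neo4j connection details
        "userName": "neo4j",
        "password": "password",
        "database": "neo4j",
        "file_name": "example.pdf",                # document to reprocess
        "retry_condition": "start_from_beginning", # placeholder condition value
    },
)
print(response.status_code, response.json())
```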
1 change: 0 additions & 1 deletion backend/src/document_sources/gcs_bucket.py
@@ -122,7 +122,6 @@ def merge_file_gcs(bucket_name, original_file_name: str, folder_name_sha1_hashed
blob.upload_from_file(file_io)
# pdf_reader = PdfReader(file_io)
file_size = len(merged_file)
# total_pages = len(pdf_reader.pages)

return file_size
except Exception as e:
16 changes: 8 additions & 8 deletions backend/src/document_sources/local_file.py
@@ -56,19 +56,19 @@ def get_pages_with_page_numbers(unstructured_pages):
if page.metadata['page_number']==page_number:
page_content += page.page_content
metadata = {'source':page.metadata['source'],'page_number':page_number, 'filename':page.metadata['filename'],
'filetype':page.metadata['filetype'], 'total_pages':unstructured_pages[-1].metadata['page_number']}
'filetype':page.metadata['filetype']}

if page.metadata['page_number']>page_number:
page_number+=1
if not metadata:
metadata = {'total_pages':unstructured_pages[-1].metadata['page_number']}
pages.append(Document(page_content = page_content, metadata=metadata))
# if not metadata:
# metadata = {'total_pages':unstructured_pages[-1].metadata['page_number']}
pages.append(Document(page_content = page_content))
page_content=''

if page == unstructured_pages[-1]:
if not metadata:
metadata = {'total_pages':unstructured_pages[-1].metadata['page_number']}
pages.append(Document(page_content = page_content, metadata=metadata))
# if not metadata:
# metadata = {'total_pages':unstructured_pages[-1].metadata['page_number']}
pages.append(Document(page_content = page_content))

elif page.metadata['category']=='PageBreak' and page!=unstructured_pages[0]:
page_number+=1
@@ -80,7 +80,7 @@ def get_pages_with_page_numbers(unstructured_pages):
page_content += page.page_content
metadata_with_custom_page_number = {'source':page.metadata['source'],
'page_number':1, 'filename':page.metadata['filename'],
'filetype':page.metadata['filetype'], 'total_pages':1}
'filetype':page.metadata['filetype']}
if page == unstructured_pages[-1]:
pages.append(Document(page_content = page_content, metadata=metadata_with_custom_page_number))
return pages
2 changes: 1 addition & 1 deletion backend/src/document_sources/wikipedia.py
@@ -4,7 +4,7 @@

def get_documents_from_Wikipedia(wiki_query:str, language:str):
try:
pages = WikipediaLoader(query=wiki_query.strip(), lang=language, load_max_docs=1, load_all_available_meta=False).load()
pages = WikipediaLoader(query=wiki_query.strip(), lang=language, load_all_available_meta=False).load()
file_name = wiki_query.strip()
logging.info(f"Total Pages from Wikipedia = {len(pages)}")
return file_name, pages
2 changes: 1 addition & 1 deletion backend/src/entities/source_node.py
@@ -18,9 +18,9 @@ class sourceNode:
updated_at:datetime=None
processing_time:float=None
error_message:str=None
total_pages:int=None
total_chunks:int=None
language:str=None
is_cancelled:bool=None
processed_chunk:int=None
access_token:str=None
retry_condition:str=None
20 changes: 10 additions & 10 deletions backend/src/graphDB_dataAccess.py
@@ -37,14 +37,14 @@ def create_source_node(self, obj_source_node:sourceNode):
d.processingTime = $pt, d.errorMessage = $e_message, d.nodeCount= $n_count,
d.relationshipCount = $r_count, d.model= $model, d.gcsBucket=$gcs_bucket,
d.gcsBucketFolder= $gcs_bucket_folder, d.language= $language,d.gcsProjectId= $gcs_project_id,
d.is_cancelled=False, d.total_chunks=0, d.processed_chunk=0, d.total_pages=$total_pages,
d.is_cancelled=False, d.total_chunks=0, d.processed_chunk=0,
d.access_token=$access_token""",
{"fn":obj_source_node.file_name, "fs":obj_source_node.file_size, "ft":obj_source_node.file_type, "st":job_status,
"url":obj_source_node.url,
"awsacc_key_id":obj_source_node.awsAccessKeyId, "f_source":obj_source_node.file_source, "c_at":obj_source_node.created_at,
"u_at":obj_source_node.created_at, "pt":0, "e_message":'', "n_count":0, "r_count":0, "model":obj_source_node.model,
"gcs_bucket": obj_source_node.gcsBucket, "gcs_bucket_folder": obj_source_node.gcsBucketFolder,
"language":obj_source_node.language, "gcs_project_id":obj_source_node.gcsProjectId, "total_pages": obj_source_node.total_pages,
"language":obj_source_node.language, "gcs_project_id":obj_source_node.gcsProjectId,
"access_token":obj_source_node.access_token})
except Exception as e:
error_message = str(e)
@@ -71,26 +71,26 @@ def update_source_node(self, obj_source_node:sourceNode):
if obj_source_node.processing_time is not None and obj_source_node.processing_time != 0:
params['processingTime'] = round(obj_source_node.processing_time.total_seconds(),2)

if obj_source_node.node_count is not None and obj_source_node.node_count != 0:
if obj_source_node.node_count is not None :
params['nodeCount'] = obj_source_node.node_count

if obj_source_node.relationship_count is not None and obj_source_node.relationship_count != 0:
if obj_source_node.relationship_count is not None :
params['relationshipCount'] = obj_source_node.relationship_count

if obj_source_node.model is not None and obj_source_node.model != '':
params['model'] = obj_source_node.model

if obj_source_node.total_pages is not None and obj_source_node.total_pages != 0:
params['total_pages'] = obj_source_node.total_pages

if obj_source_node.total_chunks is not None and obj_source_node.total_chunks != 0:
params['total_chunks'] = obj_source_node.total_chunks

if obj_source_node.is_cancelled is not None and obj_source_node.is_cancelled != False:
if obj_source_node.is_cancelled is not None:
params['is_cancelled'] = obj_source_node.is_cancelled

if obj_source_node.processed_chunk is not None and obj_source_node.processed_chunk != 0:
if obj_source_node.processed_chunk is not None :
params['processed_chunk'] = obj_source_node.processed_chunk

if obj_source_node.retry_condition is not None :
params['retry_condition'] = obj_source_node.retry_condition

param= {"props":params}

@@ -187,7 +187,7 @@ def get_current_status_document_node(self, file_name):
query = """
MATCH(d:Document {fileName : $file_name}) RETURN d.status AS Status , d.processingTime AS processingTime,
d.nodeCount AS nodeCount, d.model as model, d.relationshipCount as relationshipCount,
d.total_pages AS total_pages, d.total_chunks AS total_chunks , d.fileSize as fileSize,
d.total_chunks AS total_chunks , d.fileSize as fileSize,
d.is_cancelled as is_cancelled, d.processed_chunk as processed_chunk, d.fileSource as fileSource
"""
param = {"file_name" : file_name}
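
The conditional parameter building above only forwards fields that were actually supplied, so unset properties on the Document node are never overwritten. A minimal sketch of how such a dynamic update can be applied; the `SET d += $props` pattern and the driver usage here are assumptions for illustration, not an excerpt from the repository:

```python
# Hypothetical sketch: apply only the supplied fields to a Document node.
from neo4j import GraphDatabase

def update_document_node(driver, file_name: str, **fields) -> None:
    # Drop None values so absent fields are left untouched on the node.
    props = {k: v for k, v in fields.items() if v is not None}
    query = (
        "MATCH (d:Document {fileName: $file_name}) "
        "SET d += $props"
    )
    with driver.session() as session:
        session.run(query, file_name=file_name, props=props)

# Example usage (connection details and values are placeholders):
# driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))
# update_document_node(driver, "example.pdf", processed_chunk=10, nodeCount=42)
```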