Enhance SearchQnA (#28)
* draft searchQnA example

* skip the aio bug, test pass 1/2

* fix dep

* add copyright

* add README

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix sync chain issues

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add SearchQnA ui part

Signed-off-by: lvliang-intel <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* frontend workflow

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix linebreak and frontend

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix preci issue

Signed-off-by: lvliang-intel <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix ci issue

Signed-off-by: lvliang-intel <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix issue

Signed-off-by: lvliang-intel <[email protected]>

* update backend endpoint

Signed-off-by: lvliang-intel <[email protected]>

---------

Signed-off-by: lvliang-intel <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: lvliang-intel <[email protected]>
3 people authored Mar 29, 2024
1 parent fa38b16 commit 0064b6e
Showing 51 changed files with 12,482 additions and 27 deletions.
9 changes: 5 additions & 4 deletions .pre-commit-config.yaml
@@ -11,7 +11,8 @@ repos:
- id: check-json
exclude: |
(?x)^(
ChatQnA/ui/tsconfig.json
ChatQnA/ui/tsconfig.json|
SearchQnA/ui/tsconfig.json
)$
- id: check-yaml
- id: debug-statements
@@ -25,7 +26,7 @@
- id: insert-license
files: |
(?x)^(
(ChatQnA|CodeGen|DocSum|VisualQnA)/.*(py|yaml|yml|sh)|
(ChatQnA|CodeGen|DocSum|SearchQnA|VisualQnA)/.*(py|yaml|yml|sh)|
)$
args:
[
@@ -37,7 +38,7 @@
- id: insert-license
files: |
(?x)^(
(ChatQnA|CodeGen|DocSum|VisualQnA)/.*(ts|js)|
(ChatQnA|CodeGen|DocSum|SearchQnA|VisualQnA)/.*(ts|js)|
)$
args:
[
@@ -50,7 +51,7 @@
- id: insert-license
files: |
(?x)^(
(ChatQnA|CodeGen|DocSum|VisualQnA)/.*(html|svelte)|
(ChatQnA|CodeGen|DocSum|SearchQnA|VisualQnA)/.*(html|svelte)|
)$
args:
[
42 changes: 39 additions & 3 deletions SearchQnA/README.md
@@ -2,7 +2,7 @@

Search Question and Answering (SearchQnA) uses a search engine (e.g., Google Search) to improve QA quality. Large language models have trouble answering questions about real-time information or specific details because they are limited to their prior training data. A search engine can make up for this shortcoming: the SearchQnA service first looks up the relevant source web pages and feeds them as context to the LLM, so the LLM can use that context to compose more precise answers.
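
As a rough sketch of how the pieces fit together (condensed from the server.py changes in this PR — the embedding model name and persist directory below are illustrative defaults, not values taken from the diff):

```python
# Minimal sketch of the SearchQnA pipeline, condensed from server.py in this PR.
import os

from langchain.chains import RetrievalQAWithSourcesChain
from langchain.retrievers.web_research import WebResearchRetriever
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
from langchain_community.llms import HuggingFaceEndpoint
from langchain_community.utilities import GoogleSearchAPIWrapper
from langchain_community.vectorstores import Chroma

llm = HuggingFaceEndpoint(endpoint_url=os.environ["TGI_ENDPOINT"])  # LLM served by TGI
search = GoogleSearchAPIWrapper()  # requires GOOGLE_API_KEY and GOOGLE_CSE_ID

# Fetched pages are embedded and indexed here so the QA step can retrieve them.
vectorstore = Chroma(
    embedding_function=HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large"),
    persist_directory="./search_qna_db",  # illustrative path
)

# Generates search queries from the question, fetches and indexes the result pages ...
retriever = WebResearchRetriever.from_llm(vectorstore=vectorstore, llm=llm, search=search)

# ... then answers from the retrieved context, citing sources.
chain = RetrievalQAWithSourcesChain.from_chain_type(llm, retriever=retriever)
print(chain({"question": "Give me some latest news?"}))
```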

## Start Service
# Start Backend Service

- Start the TGI service to deploy your LLM

@@ -17,11 +17,47 @@ bash launch_tgi_service.sh
```sh
cd GenAIExamples/SearchQnA/langchain/docker
docker build . --build-arg http_proxy=${http_proxy} --build-arg https_proxy=${https_proxy} -t intel/gen-ai-examples:searchqna-gaudi --no-cache
docker run -e TGI_ENDPOINT=<TGI ENDPOINT> -e GOOGLE_CSE_ID=<GOOGLE CSE ID> -e GOOGLE_API_KEY=<GOOGLE API KEY> -e HUGGINGFACEHUB_API_TOKEN=<HUGGINGFACE API TOKEN> -p 8085:8000 -e http_proxy=$http_proxy -e https_proxy=$https_proxy -v $PWD/qna-app:/qna-app --runtime=habana -e HABANA_VISIBE_DEVILCES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host intel/gen-ai-examples:searchqna-gaudi
docker run -e TGI_ENDPOINT=<TGI ENDPOINT> -e GOOGLE_CSE_ID=<GOOGLE CSE ID> -e GOOGLE_API_KEY=<GOOGLE API KEY> -e HUGGINGFACEHUB_API_TOKEN=<HUGGINGFACE API TOKEN> -p 8080:8000 -e http_proxy=$http_proxy -e https_proxy=$https_proxy -v $PWD/qna-app:/qna-app --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host intel/gen-ai-examples:searchqna-gaudi
```

- Test

```sh
curl http://localhost:8080/v1/rag/web_search_chat_stream -X POST -d '{"query":"Give me some latest news?"}' -H 'Content-Type: application/json'
```
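
For reference, here is a hedged sketch of a Python client for the streaming endpoint (the URL assumes the 8080:8000 port mapping from the `docker run` command above). As the `stream_generator` changes in server.py below show, the server encodes spaces as `@#$` and line breaks as `<br/>` in each SSE `data:` line and terminates the stream with `data: [DONE]`:

```python
# Hypothetical client for the SSE stream; the URL assumes the port mapping above.
import requests

url = "http://localhost:8080/v1/rag/web_search_chat_stream"

with requests.post(url, json={"query": "Give me some latest news?"}, stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank SSE separators
        token = line[len("data: "):]
        if token == "[DONE]":
            break
        # Undo the server-side escaping of spaces and line breaks
        print(token.replace("@#$", " ").replace("<br/>", "\n"), end="", flush=True)
```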

# Start Frontend GUI

Navigate to the "ui" folder and run the following commands to install Node.js and npm:

```bash
cd ui
sudo apt-get install npm && \
npm install -g n && \
n stable && \
hash -r && \
npm install -g npm@latest
```

For CentOS, please use the following commands instead:

```bash
curl -sL https://rpm.nodesource.com/setup_20.x | sudo bash -
sudo yum install -y nodejs
```

Update the `BACKEND_BASE_URL` environment variable in the `.env` file by replacing the IP address '127.0.0.1' with the actual IP address.

Run the following command to install the required dependencies:

```bash
npm install
```

Start the development server by executing the following command:

```bash
nohup npm run dev &
```

This will initiate the frontend service and launch the application.
72 changes: 52 additions & 20 deletions SearchQnA/langchain/docker/qna-app/server.py
@@ -16,6 +16,7 @@
# limitations under the License.

import os
import shutil
import sys
from queue import Queue
from threading import Thread
@@ -24,13 +25,15 @@
from fastapi.responses import StreamingResponse
from langchain.callbacks.base import BaseCallbackHandler
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.globals import set_debug
from langchain.retrievers.web_research import WebResearchRetriever
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
from langchain_community.llms import HuggingFaceEndpoint
from langchain_community.utilities import GoogleSearchAPIWrapper
from langchain_community.vectorstores import Chroma
from starlette.middleware.cors import CORSMiddleware

set_debug(True)
app = FastAPI()

app.add_middleware(
@@ -41,6 +44,9 @@
allow_headers=["*"],
)

TGI_ENDPOINT = os.getenv("TGI_ENDPOINT", "http://localhost:8080")
SHOW_INTERMEDIATE_LOG = os.getenv("SHOW_INTERMEDIATE_LOG", "True").lower() in ("true", "1")


class QueueCallbackHandler(BaseCallbackHandler):
"""A queue that holds the result answer token buffer for streaming response."""
@@ -52,13 +58,25 @@ def __init__(self, queue: Queue):
def on_llm_new_token(self, token: str, **kwargs):
sys.stdout.write(token)
sys.stdout.flush()
if self.enter_answer_phase:
if SHOW_INTERMEDIATE_LOG or self.enter_answer_phase:
self.queue.put(
{
"answer": token,
}
)

def on_llm_start(self, *args, **kwargs):
if SHOW_INTERMEDIATE_LOG:
if not self.enter_answer_phase:
msg = "The search engine begin to fetch the HTML pages with these questions:"
else:
msg = "\nGet the answer from Large Language Models:\n"
self.queue.put(
{
"answer": msg,
}
)

def on_llm_end(self, *args, **kwargs):
self.enter_answer_phase = not self.enter_answer_phase
return True
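
The handler above relies on the chain invoking the LLM twice — once inside `WebResearchRetriever` to generate search questions, and once in `RetrievalQAWithSourcesChain` to compose the final answer — and it flips `enter_answer_phase` on each `on_llm_end`. A toy, dependency-free walk-through of that toggle (assuming the constructor sets `enter_answer_phase = False`, as the logic implies; the tokens are made up):

```python
from queue import Queue

queue: Queue = Queue()
handler = QueueCallbackHandler(queue=queue)

# Phase 1: the retriever's LLM call generates search questions.
handler.on_llm_start()                # queues the "search engine" notice if SHOW_INTERMEDIATE_LOG
handler.on_llm_new_token("1. What ")  # intermediate token, queued only if SHOW_INTERMEDIATE_LOG
handler.on_llm_end()                  # flips enter_answer_phase -> True

# Phase 2: the QA LLM call streams the final answer.
handler.on_llm_start()                # queues the "answer" notice
handler.on_llm_new_token("Recent ")   # answer tokens are always queued
handler.on_llm_end()                  # flips back, ready for the next request

while not queue.empty():
    print(queue.get())                # each item is {"answer": <token or notice>}
```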
@@ -94,11 +112,13 @@ def __init__(
callbacks=[QueueCallbackHandler(queue=self.queue)],
)

# check google api key is provided
# Check that google api key is provided
if "GOOGLE_API_KEY" not in os.environ or "GOOGLE_API_KEY" not in os.environ:
raise Exception("Please make sure to set GOOGLE_API_KEY and GOOGLE_API_KEY environment variables!")

# Notice: please check or manually delete the vectordb directory if you do not previous histories
# Clear the previous search history so it does not interfere with current retrievals
if os.path.exists(vectordb_persistent_directory) and os.path.isdir(vectordb_persistent_directory):
shutil.rmtree(vectordb_persistent_directory)
self.vectorstore = Chroma(
embedding_function=HuggingFaceInstructEmbeddings(model_name=vectordb_embedding_model),
persist_directory=vectordb_persistent_directory,
@@ -109,7 +129,10 @@

# Compose the websearch retriever
self.web_search_retriever = WebResearchRetriever.from_llm(
vectorstore=self.vectorstore, llm=self.llm, search=self.search
vectorstore=self.vectorstore,
llm=self.llm,
search=self.search,
# num_search_results=3
)

# Compose the whole chain
@@ -119,14 +142,16 @@ def __init__(
)

def handle_search_chat(self, query: str):
response = self.llm_chain({"question": query})
try:
response = self.llm_chain({"question": query})
except Exception as e:
print(f"LLM chain error: {e}")
return "Internal Server Error", ""
return response["answer"], response["sources"]


tgi_endpoint = os.getenv("TGI_ENDPOINT", "http://localhost:8080")

router = SearchQuestionAnsweringAPIRouter(
entrypoint=tgi_endpoint,
entrypoint=TGI_ENDPOINT,
)


@@ -135,24 +160,28 @@ async def web_search_chat(request: Request):
params = await request.json()
print(f"[websearch - chat] POST request: /v1/rag/web_search_chat, params:{params}")
query = params["query"]
answer, sources = router.handle_search_chat(query=query)
print(f"[websearch - chat] answer: {answer}, sources: {sources}")
return {"answer": answer, "sources": sources}


@router.post("/v1/rag/web_search_chat_stream")
async def web_search_chat_stream(request: Request):
params = await request.json()
print(tgi_endpoint)
print(f"[websearch - streaming chat] POST request: /v1/rag/web_search_chat_stream, params:{params}")
query = params["query"]

def stream_callback(query):
finished = object()

def task():
_ = router.llm_chain({"question": query})
router.queue.put(finished)
try:
_ = router.llm_chain({"question": query})
router.queue.put(finished)
except Exception as e:
print(f"LLM chain error: {e}")
router.queue.put({"answer": "\nInternal Server Error\n"})
router.queue.put(finished)

t = Thread(target=task)
t.start()
@@ -166,22 +195,25 @@ def task():
continue

def stream_generator():
chat_response = ""
# FIXME need to add the sources and chat_history
for res_dict in stream_callback({"question": query}):
for res_dict in stream_callback(query=query):
text = res_dict["answer"]
chat_response += text
if text == " ":
yield "data: @#$\n\n"
continue
if text.isspace():
# if text.isspace():
# continue
if "\n" in text or "\r" in text:
text = text.replace("\n", "<br/>").replace(" ", "@#$")
yield f"data: {text}\n\n"
continue
if "\n" in text:
yield "data: <br/>\n\n"
new_text = text.replace(" ", "@#$")
yield f"data: {new_text}\n\n"
text = text.replace(" ", "@#$")
yield f"data: {text}\n\n"
chat_response = chat_response.split("</s>")[0]
print(f"[rag - chat_stream] stream response: {chat_response}")
print(f"\n\n[rag - chat_stream] stream response: {chat_response}\n\n")
yield "data: [DONE]\n\n"

return StreamingResponse(stream_generator(), media_type="text/event-stream")
10 changes: 10 additions & 0 deletions SearchQnA/ui/.editorconfig
@@ -0,0 +1,10 @@
[*]
indent_style = tab

[package.json]
indent_style = space
indent_size = 2

[*.md]
indent_style = space
indent_size = 2
1 change: 1 addition & 0 deletions SearchQnA/ui/.env
@@ -0,0 +1 @@
BACKEND_BASE_URL = 'http://xxxxx:8003/v1/rag'
13 changes: 13 additions & 0 deletions SearchQnA/ui/.eslintignore
@@ -0,0 +1,13 @@
.DS_Store
node_modules
/build
/.svelte-kit
/package
.env
.env.*
!.env.example

# Ignore files for PNPM, NPM and YARN
pnpm-lock.yaml
package-lock.json
yarn.lock
34 changes: 34 additions & 0 deletions SearchQnA/ui/.eslintrc.cjs
@@ -0,0 +1,34 @@
// Copyright (c) 2024 Intel Corporation
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

module.exports = {
root: true,
parser: "@typescript-eslint/parser",
extends: ["eslint:recommended", "plugin:@typescript-eslint/recommended", "prettier"],
plugins: ["svelte3", "@typescript-eslint", "neverthrow"],
ignorePatterns: ["*.cjs"],
overrides: [{ files: ["*.svelte"], processor: "svelte3/svelte3" }],
settings: {
"svelte3/typescript": () => require("typescript"),
},
parserOptions: {
sourceType: "module",
ecmaVersion: 2020,
},
env: {
browser: true,
es2017: true,
node: true,
},
};
13 changes: 13 additions & 0 deletions SearchQnA/ui/.prettierignore
@@ -0,0 +1,13 @@
.DS_Store
node_modules
/build
/.svelte-kit
/package
.env
.env.*
!.env.example

# Ignore files for PNPM, NPM and YARN
pnpm-lock.yaml
package-lock.json
yarn.lock
13 changes: 13 additions & 0 deletions SearchQnA/ui/.prettierrc
@@ -0,0 +1,13 @@
{
"pluginSearchDirs": [
"."
],
"overrides": [
{
"files": "*.svelte",
"options": {
"parser": "svelte"
}
}
]
}
33 changes: 33 additions & 0 deletions SearchQnA/ui/README.md
@@ -0,0 +1,33 @@
<h1 align="center" id="title"><img align="center" src="./static/favicon.png" alt="project-image" width="50" height="50">
Neural Chat</h1>

### 📸 Project Screenshots

![project-screenshot](https://imgur.com/SmhJSmC.png)
![project-screenshot](https://imgur.com/iGTDcwU.png)
![project-screenshot](https://imgur.com/cbJi5gj.png)

<h2>🧐 Features</h2>

Here are some of the project's features:

- Start a Text Chat: Initiate a text chat by typing a question or request; the dialogue can also be grounded in the contents of uploaded files.
- Upload File: Choose between uploading a local file or pasting a remote link; the chat then answers according to the uploaded knowledge base.
- Clear: Clear the record of the current dialog box without retaining its contents.
- Chat History: Chat records persist after a page refresh, making it easier for users to review the context.
- Scroll to Bottom / Top: The chat automatically scrolls to the bottom; users can also click the top icon to jump to the top of the chat history.
- End-to-End Time: Shows the time spent on the current conversation.

<h2>🛠️ Get it Running:</h2>

1. Clone the repo.

2. `cd` into this folder (SearchQnA/ui).

3. Modify the required .env variables.
```
BACKEND_BASE_URL = ''
```
4. Execute `npm install` to install the corresponding dependencies.

5. Execute `npm run dev` to start the frontend in development mode.
