Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(chat): hacking on some RAG chat, playing with TUI #61

Merged
merged 13 commits into from
Nov 21, 2024
Merged

feat(chat): hacking on some RAG chat, playing with TUI #61

merged 13 commits into from
Nov 21, 2024

Conversation

skyl
Copy link
Owner

@skyl skyl commented Nov 20, 2024

PR Type

enhancement, tests


Description

  • Implemented a new Text User Interface (TUI) for chat functionality using ratatui in Rust.
  • Added support for exclude globs in the GitCollector and configuration files.
  • Introduced new system messages for synthetic embedding and chat in Python.
  • Commented out unused or over-specified code related to summary tasks and synthetic embeddings.
  • Updated Docker Compose to include Azure endpoint environment variable.
  • Enhanced Rust CLI commands with additional functionality and debug prints.

Changes walkthrough 📝

Relevant files
Enhancement
9 files
models.py
Comment out synthetic embedding text code                               

py/packages/corpora/models.py

  • Commented out code related to synthetic embedding text.
  • Retained existing embedding functionality.
  • +3/-0     
    corpus.py
    Update system message and import in corpus router               

    py/packages/corpora/routers/corpus.py

  • Imported CHAT_SYSTEM_MESSAGE.
  • Updated system message text in chat function.
  • +2/-4     
    sync.py
    Comment out summary task generation                                           

    py/packages/corpora/tasks/sync.py

    • Commented out the call to generate_summary_task.
    +1/-1     
    llm_interface.py
    Add synthetic embedding text method                                           

    py/packages/corpora_ai/llm_interface.py

  • Added method get_synthetic_embedding_text.
  • Imported SYNTHETIC_EMBEDDING_SYSTEM_MESSAGE.
  • +22/-1   
    prompts.py
    Add new system messages for chat and embedding                     

    py/packages/corpora_ai/prompts.py

  • Added SYNTHETIC_EMBEDDING_SYSTEM_MESSAGE.
  • Added CHAT_SYSTEM_MESSAGE.
  • +15/-0   
    chat-tui.rs
    Implement chat TUI with event handling                                     

    rs/core/corpora_cli/src/commands/chat-tui.rs

  • Implemented a TUI for chat using ratatui.
  • Added event handling for user input.
  • +166/-0 
    init.rs
    Calculate and display tarball size in MB                                 

    rs/core/corpora_cli/src/commands/init.rs

    • Added calculation for tarball size in MB.
    +2/-2     
    git.rs
    Add exclude globs and debug prints in GitCollector             

    rs/core/corpora_cli/src/context/collector/git.rs

  • Added support for exclude globs in GitCollector.
  • Added debug print statements for path collection.
  • +31/-6   
    config.rs
    Support exclude globs and add debug prints in config         

    rs/core/corpora_cli/src/context/config.rs

  • Added support for exclude globs in configuration.
  • Added debug print statements for configuration loading.
  • +6/-5     
    Tests
    1 files
    test_sync.py
    Comment out summary task assertion in tests                           

    py/packages/corpora/tasks/test_sync.py

    • Commented out assertion for summary task call.
    +2/-1     
    Documentation
    2 files
    provider_loader.py
    Add comments for model specification in provider loader   

    py/packages/corpora_ai/provider_loader.py

    • Added comments for model specification.
    +3/-0     
    llm_client.py
    Add comments for runtime model specification in OpenAI client

    py/packages/corpora_ai_openai/llm_client.py

    • Added comments for runtime model specification.
    +6/-2     
    Miscellaneous
    1 files
    chat.rs
    Comment out debug print statements in chat command             

    rs/core/corpora_cli/src/commands/chat.rs

    • Commented out debug print statements.
    +4/-0     
    Configuration changes
    1 files
    docker-compose.yaml
    Add Azure endpoint to environment variables                           

    docker-compose.yaml

    • Added OPENAI_AZURE_ENDPOINT to environment variables.
    +2/-0     
    Dependencies
    1 files
    Cargo.toml
    Add new dependencies for TUI and glob handling                     

    rs/core/corpora_cli/Cargo.toml

  • Added new dependencies: crossterm, globset, ratatui, tui-textarea.
  • +4/-0     

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    Copy link

    github-actions bot commented Nov 20, 2024

    PR Reviewer Guide 🔍

    (Review updated until commit 0ed1c29)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Code Smell
    Commented-out code related to synthetic embedding is present. Consider removing it if it's not needed or adding a comment explaining why it's kept.

    Code Smell
    The function create_markdown_skin has commented-out code that seems to be related to styling. Consider removing or uncommenting it if needed.

    Logging
    Debug print statements are present in the collect_paths and collect_tarball methods. Consider using a logging framework for better control over log levels and outputs.

    Logging
    Debug print statements are present in the load_config function. Consider using a logging framework for better control over log levels and outputs.

    Copy link

    github-actions bot commented Nov 20, 2024

    PR Code Suggestions ✨

    Latest suggestions up to 0ed1c29
    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Possible issue
    Add error handling for file reading to prevent application crashes due to missing or unreadable files

    Handle potential errors when reading files for voice, purpose, and structure to
    prevent the application from panicking if the files are missing or unreadable.

    rs/core/corpora_cli/src/commands/chat-tui.rs [133-138]

    -let voice = std::fs::read_to_string(root_path.join(".corpora/VOICE.md")).unwrap_or_default();
    -let purpose = std::fs::read_to_string(root_path.join(".corpora/PURPOSE.md")).unwrap_or_default();
    -let structure = std::fs::read_to_string(root_path.join(".corpora/STRUCTURE.md")).unwrap_or_default();
    +let voice = std::fs::read_to_string(root_path.join(".corpora/VOICE.md")).unwrap_or_else(|_| String::from("Default voice"));
    +let purpose = std::fs::read_to_string(root_path.join(".corpora/PURPOSE.md")).unwrap_or_else(|_| String::from("Default purpose"));
    +let structure = std::fs::read_to_string(root_path.join(".corpora/STRUCTURE.md")).unwrap_or_else(|_| String::from("Default structure"));
    Suggestion importance[1-10]: 7

    Why: This suggestion improves the robustness of the application by adding error handling for file reading, which can prevent crashes due to missing or unreadable files. It enhances the application's reliability, making it a valuable improvement.

    7
    Handle errors in strip_prefix to prevent panics when paths do not match the expected prefix

    Ensure that the strip_prefix method handles potential errors to avoid panics when
    paths do not match the expected prefix.

    rs/core/corpora_cli/src/context/collector/git.rs [68-72]

     .filter(|path| {
    -    Self::is_text_file(path) && !self.exclude_globs.is_match(path.strip_prefix(&self.root_path).unwrap_or(path))
    +    Self::is_text_file(path) && !self.exclude_globs.is_match(path.strip_prefix(&self.root_path).unwrap_or_else(|_| path.clone()))
     })
    Suggestion importance[1-10]: 7

    Why: The suggestion to handle errors in strip_prefix is important for preventing potential panics, which can improve the stability and reliability of the code. It addresses a possible issue, making it a beneficial enhancement.

    7
    General
    Remove or properly integrate commented-out code for generating synthetic embeddings to avoid confusion

    Ensure that the commented-out code for generating synthetic embeddings is either
    removed or properly integrated, as leaving it commented can lead to confusion and
    maintenance issues.

    py/packages/corpora/models.py [60-63]

    -# better_text = llm.get_synthetic_embedding_text(text)
    -# print(f"better_text: {better_text}")
    -# vector = llm.get_embedding(better_text)
     vector = llm.get_embedding(text)
    Suggestion importance[1-10]: 5

    Why: The suggestion to remove or properly integrate commented-out code is valid as it can help reduce confusion and improve code maintainability. However, it does not address a critical issue, so the impact is moderate.

    5

    Previous suggestions

    Suggestions up to commit f9ce11b
    CategorySuggestion                                                                                                                                    Score
    General
    Improve error handling by logging errors instead of using unwrap_or_else

    Handle potential errors from process_request by logging or managing them instead of
    using unwrap_or_else, which may hide underlying issues.

    rs/core/corpora_cli/src/commands/chat-tui.rs [91]

    -let ai_response = response.unwrap_or_else(|e| format!("Error: {}", e));
    +let ai_response = match response {
    +    Ok(res) => res,
    +    Err(e) => {
    +        log::error!("Failed to process request: {}", e);
    +        format!("Error: {}", e)
    +    }
    +};
    Suggestion importance[1-10]: 9

    Why: Logging errors provides better visibility into issues and aids in debugging, making it a more robust approach than simply formatting errors, which can obscure underlying problems.

    9
    Remove debugging print statements to clean up console output

    Remove or replace the print statement used for debugging purposes to avoid
    unnecessary console output in production.

    py/packages/corpora/models.py [61]

    -print(f"better_text: {better_text}")
    +# print(f"better_text: {better_text}")
    Suggestion importance[1-10]: 7

    Why: Removing or commenting out debugging print statements is a good practice to prevent unnecessary console output in production, improving code cleanliness and performance.

    7
    Possible issue
    Ensure terminal state is restored even if an error occurs during execution

    Ensure that the terminal state is properly restored even if an error occurs by using
    a defer or similar mechanism to handle cleanup.

    rs/core/corpora_cli/src/commands/chat-tui.rs [110-112]

    -disable_raw_mode()?;
    -execute!(terminal.backend_mut(), terminal::LeaveAlternateScreen)?;
    -terminal.show_cursor()?;
    +// Use a defer mechanism to ensure cleanup
    +defer! {
    +    disable_raw_mode().unwrap();
    +    execute!(terminal.backend_mut(), terminal::LeaveAlternateScreen).unwrap();
    +    terminal.show_cursor().unwrap();
    +}
    Suggestion importance[1-10]: 8

    Why: Ensuring terminal state restoration even in the event of an error is crucial for maintaining a consistent user experience and preventing terminal corruption, making this a valuable improvement.

    8

    @@ -57,6 +57,9 @@ def get_relevant_splits(self, text: str, limit: int = 10):
    from corpora_ai.provider_loader import load_llm_provider

    llm = load_llm_provider()
    # better_text = llm.get_synthetic_embedding_text(text)
    Copy link
    Owner Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    might be worth manipulating the input before finding the nearest neighbor ... but ... it's slow and expensive ...

    @@ -28,7 +28,7 @@ def process_tarball(corpus_id: str, tarball: bytes) -> None:
    corpus_file.save()
    corpus_file.splits.all().delete()

    generate_summary_task.delay(corpus_file.id)
    # generate_summary_task.delay(corpus_file.id)
    Copy link
    Owner Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    less work ... we never really tried to use the summaries anyways ..

    @@ -87,3 +90,21 @@ def get_summary(self, text: str) -> str:
    ),
    ]
    )

    def get_synthetic_embedding_text(self, text: str) -> str:
    Copy link
    Owner Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    unused, could be removed .. but, feels like something is there ;9

    Copy link
    Owner Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    I should probably just throw this away as it's currently not used. I tried to make a chatgpt-UI-like TUI but that's not an extremely trivial endeavor so I just kept the chat command mostly as it was. But, I think this ratatui idea has some legs. If I'm wrong, I'll just rm it and clean it up later.

    @skyl skyl marked this pull request as ready for review November 21, 2024 05:52
    Copy link

    Persistent review updated to latest commit 0ed1c29

    @skyl skyl merged commit ab7b60b into main Nov 21, 2024
    3 checks passed
    @skyl skyl deleted the chat-tui branch November 21, 2024 06:14
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    1 participant