RAG AI: Seeking advice for optimising model and parameter tweaking for best results with backstage catalog #1778

Open
hexionas opened this issue Dec 27, 2024 · 1 comment

Hey guys.

I am working with the RAG AI plugin and I have successfully got it working with AWS Bedrock. I am very new to the space of LLMs, and I just wanted to ask the community if anyone has advice about tweaking the parameters to get better results when asking the AI certain things about the catalog.

Do you have any combinations of embedding model / prompt model that work well with specific parameters for embedding, chunking, etc.? The setup I currently have makes the prompt responses quite useless, with a lot of hallucination and generally incorrect information.

My main use case is to reliably retrieve information from API definitions to generate some basic code snippets, or just to get information about how some entities are related to each other. Currently, when generating embeddings, I run into a lot of throttling exceptions even for a single entity in Backstage. I am unsure how to handle that particular problem; I guess for that single entity there are just far too many embeddings being generated.

Here is a sample of the config I am currently running with:

# Roadie RAG AI configuration
ai:
  supportedSources: ['catalog', 'tech-docs']

  storage:
    pgvector:
      # (Optional) The size of the chunk to flush when storing embeddings to the DB. Defaults to 500
      chunksize: 800

  embeddings:
    chunkSize: 800

    # (Optional) The overlap between adjacent chunks of embeddings. The bigger the number, the more overlap. Defaults to 200
    chunkOverlap: 700

    bedrock:
      # (Required) Name of the Bedrock model to use to create Embeddings.
      modelName: 'amazon.titan-embed-text-v1'
      maxTokens: 1024
      maxRetries: 10

And here is how I instantiate the Bedrock model:

// Assuming the langchain community Bedrock LLM wrapper is what is in use here
import { Bedrock } from '@langchain/community/llms/bedrock';

const model = new Bedrock({
  maxTokens: 1024,
  model: 'amazon.titan-text-express-v1',
  region: 'eu-central-1',
  credentials: credProvider.sdkCredentialProvider,
});
Xantier (Contributor) commented Dec 30, 2024

In general, the AWS Titan models are not quite up to par at the moment compared to other models. I would suggest enabling and picking any other model that Bedrock provides as a starting point. To create better and more relevant embeddings, and responses based on those, I would recommend enhancing the current pipeline implemented in the plugin sources.
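
For reference, swapping the embeddings model is just a config change. A minimal sketch, assuming the same config shape as the sample above; the model ID shown is only an example of a non-Titan Bedrock embeddings model, and the generation model passed to the Bedrock constructor can be swapped the same way:

ai:
  embeddings:
    bedrock:
      # Any non-Titan embeddings model enabled in your Bedrock account, e.g. Cohere
      modelName: 'cohere.embed-english-v3'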

You'd likely want to categorize your embeddings differently and use a RetrievalRouter with an AugmentationRetriever to determine the correct embeddings to retrieve. This can help determine the actual items you want to send as context to the LLM. For API specs, for example, it makes sense to create embeddings only for API-type entities and to build retriever-selection logic based on the query the user is asking. That selection could be a small local model assisting in choosing which RetrievalRouter implementation to use, or, better yet, letting the user choose which one to use, or hardcoding it if only a single type is wanted.
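
To illustrate the routing idea, here is a rough TypeScript sketch. The interfaces and names below (EmbeddingDoc, QueryRetriever, routeRetriever) are hypothetical and only show the shape of the logic, not the plugin's actual API:

// Hypothetical retriever shape -- the real plugin interfaces may differ.
interface EmbeddingDoc {
  content: string;
  metadata: { entityKind: string };
}

interface QueryRetriever {
  retrieve(query: string): Promise<EmbeddingDoc[]>;
}

// Pick a retriever based on what the user is asking about.
// This could also be driven by a small local model or an explicit user choice.
function routeRetriever(
  query: string,
  retrievers: { apiSpecs: QueryRetriever; catalog: QueryRetriever },
): QueryRetriever {
  const looksLikeApiQuestion = /\b(api|endpoint|openapi|request|response)\b/i.test(query);
  return looksLikeApiQuestion ? retrievers.apiSpecs : retrievers.catalog;
}

// Usage: only API-kind embeddings are searched for API questions.
async function retrieveContext(
  query: string,
  retrievers: { apiSpecs: QueryRetriever; catalog: QueryRetriever },
): Promise<EmbeddingDoc[]> {
  return routeRetriever(query, retrievers).retrieve(query);
}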

Additionally, you more than likely want to post-process the embeddings you have retrieved so they match the actual content you want to feed to the LLM. This could be something like determining the correct parts of the API spec (or the full spec if needed) instead of only the cut-down snippets.
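
As a rough illustration of that post-processing step, here is a TypeScript sketch that keeps only the OpenAPI paths relevant to the query; the types and the filterSpecForQuery helper are hypothetical, not part of the plugin:

// Hypothetical post-processing step: given a full (parsed) OpenAPI spec retrieved
// as context, keep only the paths that look relevant to the query so the LLM
// receives focused, complete definitions instead of arbitrary chunks.
type OpenApiSpec = {
  paths: Record<string, unknown>;
  components?: unknown;
};

function filterSpecForQuery(spec: OpenApiSpec, query: string): OpenApiSpec {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  const relevantPaths = Object.fromEntries(
    Object.entries(spec.paths).filter(([path]) =>
      terms.some(term => path.toLowerCase().includes(term)),
    ),
  );
  // Fall back to the full spec if nothing matched, as suggested above.
  return Object.keys(relevantPaths).length > 0
    ? { ...spec, paths: relevantPaths }
    : spec;
}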
