RAG AI: Seeking advice for optimising model and parameter tweaking for best results with backstage catalog #1778

Open
hexionas opened this issue Dec 27, 2024 · 1 comment

Hey guys.

I am working with the RAG AI plugin and I have successfully got it working with AWS Bedrock. I am very new to the space of LLMs, and I just wanted to ask the community if anyone has advice about tweaking the parameters to get better results when asking the AI certain things about the catalog.

Do you have any combinations of embedding model / prompt model that work well with specific parameters for embedding, chunking, etc.? The setup I currently have makes the prompt responses quite useless, with a lot of hallucination and generally incorrect information.

My main use case is to reliably retrieve information from API definitions to generate some basic code snippets, or just to get information about how some entities are related to each other. Currently, when generating embeddings, I run into a lot of throttling exceptions even for a single entity in Backstage. I am unsure how to handle that particular problem; I guess for that single entity there are just far too many embeddings being generated.

Here is a sample of the config I am currently running with:

# Roadie RAG AI configuration
ai:
  supportedSources: ['catalog', 'tech-docs']

  storage:
    pgvector:
      # (Optional) The size of the chunk to flush when storing embeddings to the DB. Defaults to 500
      chunksize: 800

  embeddings:
    chunkSize: 800

    # (Optional) The overlap between adjacent chunks of embeddings. The bigger the number, the more overlap. Defaults to 200
    chunkOverlap: 700

    bedrock:
      # (Required) Name of the Bedrock model to use to create Embeddings.
      modelName: 'amazon.titan-embed-text-v1'
      maxTokens: 1024
      maxRetries: 10

And here is how I instantiate the Bedrock model:

// Assuming the langchain community Bedrock LLM wrapper is what is in use here
import { Bedrock } from '@langchain/community/llms/bedrock';

const model = new Bedrock({
  maxTokens: 1024,
  model: 'amazon.titan-text-express-v1',
  region: 'eu-central-1',
  credentials: credProvider.sdkCredentialProvider,
});
Xantier (Contributor) commented Dec 30, 2024

In general, the AWS Titan models are not quite up to par at the moment compared to other models. I would suggest enabling and picking any other model that Bedrock provides as a starting point. To create better and more relevant embeddings, and responses based on those, I would recommend enhancing the current pipeline implemented in the plugin sources.
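
For reference, swapping the embeddings model is just a config change. A minimal sketch, assuming the same config shape as the sample above; the model ID shown is only an example of a non-Titan Bedrock embeddings model, and the generation model passed to the Bedrock constructor can be swapped the same way:

ai:
  embeddings:
    bedrock:
      # Any non-Titan embeddings model enabled in your Bedrock account, e.g. Cohere
      modelName: 'cohere.embed-english-v3'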

You'd likely want to categorize your embeddings differently and use a RetrievalRouter with an AugmentationRetriever to determine the correct embeddings to retrieve. This can help determine the actual items you want to send as context to the LLM. For API specs, for example, it makes sense to create embeddings only for API-type entities and to build retriever-selection logic based on the query the user is asking. That selection could be a small local model assisting in choosing which RetrievalRouter implementation to use, or, better yet, letting the user choose which one to use, or hardcoding it if only a single type is wanted.
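
To illustrate the routing idea, here is a rough TypeScript sketch. The interfaces and names below (EmbeddingDoc, QueryRetriever, routeRetriever) are hypothetical and only show the shape of the logic, not the plugin's actual API:

// Hypothetical retriever shape -- the real plugin interfaces may differ.
interface EmbeddingDoc {
  content: string;
  metadata: { entityKind: string };
}

interface QueryRetriever {
  retrieve(query: string): Promise<EmbeddingDoc[]>;
}

// Pick a retriever based on what the user is asking about.
// This could also be driven by a small local model or an explicit user choice.
function routeRetriever(
  query: string,
  retrievers: { apiSpecs: QueryRetriever; catalog: QueryRetriever },
): QueryRetriever {
  const looksLikeApiQuestion = /\b(api|endpoint|openapi|request|response)\b/i.test(query);
  return looksLikeApiQuestion ? retrievers.apiSpecs : retrievers.catalog;
}

// Usage: only API-kind embeddings are searched for API questions.
async function retrieveContext(
  query: string,
  retrievers: { apiSpecs: QueryRetriever; catalog: QueryRetriever },
): Promise<EmbeddingDoc[]> {
  return routeRetriever(query, retrievers).retrieve(query);
}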

Additionally, you more than likely want to post-process the embeddings you have retrieved so they match the actual content you want to feed to the LLM. This could be something like determining the correct parts of the API spec (or the full spec if needed) instead of only the cut-down snippets.
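
As a rough illustration of that post-processing step, here is a TypeScript sketch that keeps only the OpenAPI paths relevant to the query; the types and the filterSpecForQuery helper are hypothetical, not part of the plugin:

// Hypothetical post-processing step: given a full (parsed) OpenAPI spec retrieved
// as context, keep only the paths that look relevant to the query so the LLM
// receives focused, complete definitions instead of arbitrary chunks.
type OpenApiSpec = {
  paths: Record<string, unknown>;
  components?: unknown;
};

function filterSpecForQuery(spec: OpenApiSpec, query: string): OpenApiSpec {
  const terms = query.toLowerCase().split(/\W+/).filter(Boolean);
  const relevantPaths = Object.fromEntries(
    Object.entries(spec.paths).filter(([path]) =>
      terms.some(term => path.toLowerCase().includes(term)),
    ),
  );
  // Fall back to the full spec if nothing matched, as suggested above.
  return Object.keys(relevantPaths).length > 0
    ? { ...spec, paths: relevantPaths }
    : spec;
}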
