Data Pipeline for RAG

In contrast to fine-tuning, the data retrieved for a RAG application is not used to train the model.

Instead, the model’s output is augmented by a data pipeline that retrieves the most relevant subset of data from an external knowledge base.

To optimize the retrieval process, the contents of the knowledge base are transformed into a format that can be searched efficiently.

Documents are typically indexed as vector embeddings. The processed data and embeddings are stored in specialized vector databases, which are optimized for fast similarity search and retrieval over high-dimensional vectors.
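The indexing step can be sketched as follows. This is a minimal, self-contained illustration: the toy bag-of-words `embed` function and the in-memory `index` dictionary are stand-ins for a real embedding model and vector database, which a production pipeline would use instead.

```python
# Minimal sketch of indexing documents as vector embeddings.
# The toy embedding (L2-normalized term counts over a tiny fixed
# vocabulary) is an illustrative assumption, not a real model.
import math
from collections import Counter

VOCAB = ["pipeline", "retrieval", "vector", "database", "model", "data"]

def embed(text: str) -> list[float]:
    """Toy embedding: L2-normalized term counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# The "vector database": document id -> embedding, plus the raw text.
index: dict[str, list[float]] = {}
docs: dict[str, str] = {}

def add_document(doc_id: str, text: str) -> None:
    docs[doc_id] = text
    index[doc_id] = embed(text)

add_document("d1", "a data pipeline feeds the vector database")
add_document("d2", "the model generates text without retrieval")
```

In a real system the same shape holds: each document (or chunk) is run through an embedding model once at ingestion time, and the resulting vectors are stored so that later queries never touch the raw corpus directly.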

Metadata is also attached to each indexed document in order to increase the precision and recall of retrieval.
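One common way metadata improves precision is pre-filtering: candidates are narrowed by metadata before any similarity scoring runs. The sketch below is hypothetical; the `Chunk` structure, field names, and corpus contents are illustrative assumptions, not part of any particular vector database's API.

```python
# Hypothetical sketch of metadata-aware retrieval: each chunk carries
# a metadata dict, and a filter narrows the candidate set before the
# (omitted) vector similarity search runs over what remains.
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

corpus = [
    Chunk("Q3 revenue grew 12%.", {"source": "earnings", "year": 2023}),
    Chunk("Install via pip.", {"source": "docs", "year": 2023}),
    Chunk("Q3 revenue fell 4%.", {"source": "earnings", "year": 2021}),
]

def filter_chunks(chunks, **required):
    """Keep only chunks whose metadata matches every required key/value."""
    return [c for c in chunks
            if all(c.metadata.get(k) == v for k, v in required.items())]

candidates = filter_chunks(corpus, source="earnings", year=2023)
# Only the 2023 earnings chunk survives the filter, so the similarity
# search cannot surface a stale or off-topic document.
```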

When a user submits a query, the query is embedded with the same embedding model used to index the documents; the resulting vector is then searched against the vector index to retrieve the most similar documents.

This process retrieves relevant information from the indexed data based on similarity measures, allowing the RAG system to ground its responses in the contextual information stored alongside the vectors.
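The retrieval step itself can be sketched under the same toy-embedding assumption as before: embed the query, score every stored vector by cosine similarity, and return the top-k document ids. A real system would delegate the scoring loop to the vector database's approximate nearest-neighbor search.

```python
# Sketch of query-time retrieval: embed the query with the same toy
# embedding used at indexing time, then rank documents by cosine
# similarity and return the top-k ids.
import math
from collections import Counter

VOCAB = ["pipeline", "retrieval", "vector", "database", "model", "data"]

def embed(text: str) -> list[float]:
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

index = {
    "d1": embed("a data pipeline feeds the vector database"),
    "d2": embed("the model generates text without retrieval"),
}

def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    scored = sorted(index.items(), key=lambda kv: cosine(q, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

top = retrieve("which vector database stores the data", k=1)
# The query shares "vector", "database", and "data" with d1, so d1 ranks first.
```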

Sources

  1. https://developer.nvidia.com/blog/rag-101-demystifying-retrieval-augmented-generation-pipelines/
  2. https://cameronrwolfe.substack.com/p/a-practitioners-guide-to-retrieval