Data Pipeline for RAG


The data for a RAG pipeline has to be accurate and comprehensive. In contrast to fine-tuning methods, however, the retrieval data isn’t used to train the model, which means it doesn’t have to be supplied so extensively.

To optimise the retrieval process, the documents are first pre-processed, i.e. transformed into a data format that can be searched efficiently.

This typically involves extracting text from the documents, applying metadata, tokenizing the text, and creating vectors from the tokens.
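
The four steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not a production pipeline: the names `Record`, `extract_text`, `tokenize`, `vectorize` and `preprocess` are hypothetical, the tag-stripping regex stands in for a real document extractor, and the hashed bag-of-words vector is a placeholder for the learned embedding model a real RAG system would normally use.

```python
import hashlib
import math
import re
from dataclasses import dataclass, field

DIM = 64  # illustrative vector size (assumption), not a recommendation


@dataclass
class Record:
    text: str
    metadata: dict
    tokens: list = field(default_factory=list)
    vector: list = field(default_factory=list)


def extract_text(raw: str) -> str:
    """Step 1: strip tag-like markup and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", no_tags).strip()


def tokenize(text: str) -> list:
    """Step 3: lowercase word tokens (real pipelines often use subword tokenizers)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(tokens: list, dim: int = DIM) -> list:
    """Step 4: hashed bag-of-words, L2-normalised (placeholder for an embedding model)."""
    vec = [0.0] * dim
    for tok in tokens:
        idx = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def preprocess(raw: str, source: str) -> Record:
    """Run extraction, metadata tagging (step 2), tokenizing and vectorizing."""
    text = extract_text(raw)
    tokens = tokenize(text)
    metadata = {"source": source, "n_tokens": len(tokens)}
    return Record(text=text, metadata=metadata, tokens=tokens, vector=vectorize(tokens))


record = preprocess("<p>RAG retrieves   documents.</p>", source="intro.html")
print(record.text)      # RAG retrieves documents.
print(record.tokens)    # ['rag', 'retrieves', 'documents']
print(record.metadata)  # {'source': 'intro.html', 'n_tokens': 3}
```

Because the vectors are normalised to unit length, the dot product of two vectors is their cosine similarity, which is one common way to compare a query against the stored documents at retrieval time.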