Retrieval-augmented generation (RAG) is a widely used technique that augments large language models (LLMs) and GenAI apps by providing contextual information from external sources.
This method can significantly mitigate LLMs’ well-known hallucination issues. For example, if you ask a GenAI app to write an article about sharks, a RAG approach helps ensure that the AI doesn’t make up a new type of shark or invent new “facts” about known species. RAG also allows users to draw on domain-specific or private data for content generation while maintaining data security.
How does RAG work? Everything starts with a query. There are three key steps in the RAG process: retrieval, augmentation and generation.
- Retrieval. This step identifies and retrieves content relevant to the user query by conducting a semantic search in a vector database. Vector databases store, index and retrieve vector embeddings created from external sources by a pretrained embedding model of your choice or a model that you build.
- Augmentation. After retrieving the semantically similar information from the vector database, the augmentation step combines the retrieved data with the original query along with any prompts, organizing them into instructions for the LLM to generate a response.
- Generation. This step assembles the final response, paying attention to syntax, grammar, structure, etc., using the natural language processing (NLP) capabilities of the LLM.
Choosing the Right Model and Vector Database for Your GenAI Apps
In a basic RAG system, the embedding model, the vector database and the LLM are the three most crucial building blocks. When you build a RAG framework, you need to decide early on what technologies best suit your application.
Basically, you can use any embedding model relevant to your application data to create vector embeddings, but each model has a unique way of generating vectors. This means you need to use the same model to generate the vector embeddings for both queries and datasets.
Which vector database to choose depends on the size of your data, the purpose of your application, the data requirements you need to meet and many other factors. For example, if you have a large dataset and want to build a RAG app for production, it is important to choose a vector database that can handle that scale.
Choosing the right LLM can be challenging as well. Fortunately, AWS Bedrock offers a variety of pretrained models, including embedding models and LLMs, to simplify this process. AWS Bedrock is a cloud service that provides access to these models, allowing you to select the one that best fits your application. You can use the chosen model for generating vector embeddings and as the LLM component of your RAG framework.
Integrate Zilliz Cloud With AWS Bedrock To Build a RAG Chain
This example shows you how to integrate LangChain, Zilliz Cloud (the managed version of Milvus) and AWS Bedrock. Let’s take a guided tour through the example.
There are four main steps to this integration:
- Install the required LangChain and AWS SDK for Python packages.
- Connect Zilliz Cloud to AWS Bedrock.
- Load and split documents from external sources.
- Predefine template guidelines and generate responses.
Install the Required Packages
To install the required packages, run the following script.
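Here is a plausible install command; the exact package set in the original notebook may differ slightly, but these are the LangChain, AWS and parsing packages the rest of this walkthrough relies on.

```bash
pip install --upgrade --quiet langchain langchain-community langchain-aws \
    langchain-text-splitters langchain-milvus boto3 beautifulsoup4
```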
Configure the Zilliz Cloud/AWS Bedrock Connection
Once you’ve installed everything, configure the requisite environment variables to ensure that Zilliz and Bedrock can talk to each other. On the AWS side, you’ll need the AWS region name, key ID and access key. On the Zilliz side, you’ll need the cloud Uniform Resource Identifier (URI) and API key.
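A minimal sketch of that configuration is below. The variable names and placeholder values are illustrative; substitute your own credentials from the AWS and Zilliz Cloud consoles.

```python
import os

# AWS credentials and region (placeholders; use your own values).
os.environ["AWS_REGION"] = "us-east-1"
os.environ["AWS_ACCESS_KEY_ID"] = "<your-aws-key-id>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<your-aws-access-key>"

# Zilliz Cloud connection details from the Zilliz console.
ZILLIZ_CLOUD_URI = "<your-zilliz-cloud-uri>"
ZILLIZ_CLOUD_API_KEY = "<your-zilliz-api-key>"
```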
The AWS SDK for Python (boto3) lets you create, configure and manage AWS services. Next, you’ll create a boto3 client to connect to the AWS Bedrock Runtime service.
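For example:

```python
import boto3

# Create a boto3 client for the Bedrock Runtime service.
# The region must be one where Bedrock is available.
bedrock_client = boto3.client(
    "bedrock-runtime",
    region_name=os.environ["AWS_REGION"],
)
```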
Use a ChatBedrock instance to gain access to the Bedrock models. In this example, we’ll link it to anthropic.claude-3-sonnet-20240229-v1:0.
You can select any of the other Bedrock models; whichever you choose, ChatBedrock provides the infrastructure for generating text responses with model-specific settings, such as a low temperature parameter to control response variability.
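Here is one way to construct that instance, assuming the langchain-aws package and the bedrock_client created above; the temperature value is illustrative.

```python
from langchain_aws import ChatBedrock

# Wrap the Bedrock model in a LangChain chat model. A low temperature
# keeps responses focused and reduces variability.
llm = ChatBedrock(
    client=bedrock_client,
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    model_kwargs={"temperature": 0.1},
)
```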
Load and Split Documents From External Sources
Now that everything is connected, we need to get some data from external sources. In this example, we’re pulling data from a specific web source: a blog post about AI agents.
We’ll use a WebBaseLoader instance to grab that data, passing BeautifulSoup’s SoupStrainer class to the loader so it parses only the relevant parts of the web page. We’re targeting the following classes: “post-content,” “post-title” and “post-header.”
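A sketch of the loading step follows. The URL is an assumption based on the CSS classes above; point the loader at your own source.

```python
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Load the blog post, keeping only the post title, header and body.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={
        "parse_only": bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    },
)
documents = loader.load()
```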
Once that data is loaded, we use a RecursiveCharacterTextSplitter instance to split it into smaller pieces, making it easier to work with and load into other components.
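For example, with typical chunking values (the exact sizes in the original notebook may differ):

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split the documents into overlapping chunks so each fits comfortably
# within the embedding model's input window.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(documents)
```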
Generating Responses
Now we want to use the data we loaded to generate new content while ensuring the output is accurate and mitigating AI hallucination. To that end, we instruct the AI to support its claims with hard data: the response should be specific and use statistics or numbers when possible.
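An illustrative prompt template is shown below; the exact wording in the original notebook may differ.

```python
from langchain_core.prompts import PromptTemplate

# Hypothetical template text; adjust the guidelines to your use case.
PROMPT_TEMPLATE = """
Human: Use the following pieces of context to answer the question at the
end. If you don't know the answer, just say that you don't know. The
response should be specific and use statistics or numbers when possible.

<context>
{context}
</context>

Question: {question}

Assistant:"""

prompt = PromptTemplate(
    template=PROMPT_TEMPLATE, input_variables=["context", "question"]
)
```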
Next, we initialize a Zilliz vector store containing the embeddings of the chunked documents. Having the documents as vectors is what makes it possible for RAG to do a semantic search to find and retrieve documents quickly and efficiently. The output should provide accurate, insightful, relevant and fact-based answers.
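A sketch of that step, assuming a Bedrock embedding model (the Titan model ID here is illustrative) and the langchain-milvus integration:

```python
from langchain_aws import BedrockEmbeddings
from langchain_milvus import Zilliz

# Embed the chunked documents with a Bedrock embedding model.
embeddings = BedrockEmbeddings(
    client=bedrock_client, model_id="amazon.titan-embed-text-v1"
)

# Index the embeddings in Zilliz Cloud and expose it as a retriever.
vectorstore = Zilliz.from_documents(
    documents=splits,
    embedding=embeddings,
    connection_args={"uri": ZILLIZ_CLOUD_URI, "token": ZILLIZ_CLOUD_API_KEY},
    auto_id=True,
)
retriever = vectorstore.as_retriever()
```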
To recap, here are the steps of the RAG chain (a code sketch of the assembled chain follows the list):
- First, the question is converted to a vector embedding to enable retrieval of relevant documents stored in the vector database.
- Next, these documents are processed by a retriever and formatter.
- Then the documents are passed to a prompt template to format the response structure.
- Finally, a large language model receives this structured input to generate a coherent response, which is parsed into a string format and presented to the user.
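Assembled with LangChain’s expression language, the chain might look like this (the question is just an example):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    # Join the retrieved chunks into a single context string.
    return "\n\n".join(doc.page_content for doc in docs)

# Retrieval -> formatting -> prompt -> LLM -> string output.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What is task decomposition for AI agents?"))
```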
For the full code of this example, please refer to this notebook.
RAG Use Cases
A RAG framework can enhance many different use cases spanning a variety of industries and verticals. The following list includes brief use-case descriptions. Depending on your goals, you can find or build niche LLMs for these and other use cases.
Question-Answering Systems
RAG frameworks can provide detailed and accurate answers to user questions by retrieving relevant information from a large database and generating a coherent response.
Customer Support
Automated customer support systems can use RAG to find relevant information in support documents, manuals or FAQs and generate helpful responses to customer inquiries.
Content Creation and Summarization
RAG frameworks can help create content by retrieving relevant information from various sources and generating articles, reports or summaries.
Personalized Recommendations
In recommendation systems, RAG can enhance the generation of personalized recommendations by retrieving and synthesizing information based on user preferences and past behavior.
Educational Tools
Educational platforms can use RAG to generate personalized study materials, answer student questions and provide explanations based on a vast pool of educational resources.
Legal and Medical Assistance
RAG frameworks can benefit legal and medical professionals by allowing them to retrieve and synthesize information from case law, medical literature and patient records to assist in decision-making and provide advice.
Interactive Storytelling and Gaming
RAG can be used to create dynamic and interactive storytelling experiences in games, where the system generates plot twists and dialogues based on retrieved story elements and user interactions.
Research and Development
Researchers can use RAG to gather and summarize relevant research papers, patents or technical documents, helping them stay updated with the latest developments and find connections between different pieces of information.
Virtual Assistants
Virtual assistants can use RAG to provide more accurate and contextually relevant responses by retrieving information from a knowledge base and generating appropriate replies.
Market Analysis and Business Intelligence
Businesses can use RAG to analyze market trends, competitor strategies and customer feedback by retrieving relevant data and generating insightful reports and action plans.
Code Generation and Documentation
Developers can use RAG frameworks to generate code snippets, documentation or explanations by retrieving relevant programming information from code repositories and technical documentation.
Final Thoughts
A RAG framework provides developers with a way to leverage large datasets, whether structured or unstructured, to build applications that are accurate and reliable. Pairing Zilliz Cloud with AWS Bedrock in a RAG framework gives you quick access to powerful tools. The prebuilt models in AWS Bedrock give you many options for building a wide range of GenAI applications. This getting-started tutorial is just the tip of the iceberg. To learn more about Zilliz Cloud, visit zilliz.com.