
This tutorial demonstrates how to build an intelligent application that combines Retrieval-Augmented Generation (RAG) and tool calling using Nvidia NIM and LangChain. By integrating these technologies, we create a system capable of providing real-time flight status updates and detailed baggage information.
For a background on Nvidia NIM, refer to my previous tutorials on consuming the API and self-hosting the models as containers.
RAG enhances language models by retrieving relevant documents to generate accurate, context-specific responses — especially when dealing with specialized knowledge domains. Tool calling, on the other hand, extends the model’s capabilities by allowing it to interact with external APIs, fetch real-time data, and perform functions beyond its inherent knowledge.
By combining RAG and tool calling, we bridge the gap between a model’s existing knowledge and the need for up-to-date information. Nvidia NIM offers a robust language model foundation, while LangChain provides a flexible framework for managing interactions between the language model, retrieval systems, and external tools. This synergy enables us to build advanced AI applications that are both knowledgeable and responsive to real-world data.
In this tutorial, we build an agent that retrieves (RAG) data from an airline’s website to answer questions related to baggage policy. Based on the user query, it may also invoke an API (tool calling) to get the real-time status of a flight.
Prerequisites
- Python 3.7 or higher
- API keys for NVIDIA NIM API and FlightAware AeroAPI
- Basic understanding of LLMs and RAG
Setting Up the Environment
First, let’s install and import the necessary libraries and set up our environment variables.
pip install -U langchain langchain-nvidia-ai-endpoints langchain-community faiss-cpu
import os
from datetime import datetime, timedelta
import pytz
import requests
import json

from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage
from langchain_core.runnables import RunnablePassthrough, RunnableLambda

# Replace these with your actual API keys
os.environ["NVIDIA_API_KEY"] = "your_nvidia_api_key"
AEROAPI_BASE_URL = "https://aeroapi.flightaware.com/aeroapi"
AEROAPI_KEY = "your_aero_api_key"
Initializing the Language Model
We initialize the Meta Llama 3.1 405B Instruct model, which we’ll use for generating responses.
# Initialize LLM
llm = ChatNVIDIA(model="meta/llama-3.1-405b-instruct")
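Before wiring up anything else, it’s worth a quick smoke test to confirm that the API key and model name are valid. A minimal check (the prompt here is arbitrary):

# Quick sanity check: the model should return an AIMessage with a .content string
reply = llm.invoke("In one sentence, what is tool calling?")
print(reply.content)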
Creating the Flight Status Tool
Next, we create a tool that fetches flight status information using the FlightAware AeroAPI. This tool will allow our language model to provide real-time flight updates when queried.
# Flight status tool
@tool
def get_flight_status(flight_id: str):
    """
    Returns flight information for a given flight ID.
    """
    def get_api_session():
        session = requests.Session()
        session.headers.update({"x-apikey": AEROAPI_KEY})
        return session

    def fetch_flight_data(flight_id, session):
        # Extract flight_id if it contains 'flight_id='
        if "flight_id=" in flight_id:
            flight_id = flight_id.split("flight_id=")[1]
        # Define the time range for the API query
        start_date = datetime.now().date().strftime('%Y-%m-%d')
        end_date = (datetime.now().date() + timedelta(days=1)).strftime('%Y-%m-%d')
        api_resource = f"/flights/{flight_id}?start={start_date}&end={end_date}"
        # Make the API request
        response = session.get(f"{AEROAPI_BASE_URL}{api_resource}")
        response.raise_for_status()
        flights = response.json().get('flights', [])
        if not flights:
            raise ValueError(f"No flight data found for flight ID {flight_id}.")
        return flights[0]

    def utc_to_local(utc_date_str, local_timezone_str):
        utc_datetime = datetime.strptime(utc_date_str, '%Y-%m-%dT%H:%M:%SZ').replace(tzinfo=pytz.utc)
        local_timezone = pytz.timezone(local_timezone_str)
        local_datetime = utc_datetime.astimezone(local_timezone)
        return local_datetime.strftime('%Y-%m-%d %H:%M:%S')

    # Get session and fetch flight data
    session = get_api_session()
    flight_data = fetch_flight_data(flight_id, session)

    # Determine departure and arrival keys
    dep_key = 'estimated_out' if flight_data.get('estimated_out') else 'scheduled_out'
    arr_key = 'estimated_in' if flight_data.get('estimated_in') else 'scheduled_in'

    # Build flight details
    flight_details = {
        'source': flight_data['origin']['city'],
        'destination': flight_data['destination']['city'],
        'depart_time': utc_to_local(flight_data[dep_key], flight_data['origin']['timezone']),
        'arrival_time': utc_to_local(flight_data[arr_key], flight_data['destination']['timezone']),
        'status': flight_data['status']
    }
    return (
        f"The current status of flight {flight_id} from {flight_details['source']} to {flight_details['destination']} "
        f"is {flight_details['status']} with departure time at {flight_details['depart_time']} and arrival time at "
        f"{flight_details['arrival_time']}."
    )
Here is a brief description of the functions:
- @tool decorator: registers the function as a tool accessible by the language model.
- get_api_session(): creates an HTTP session with the necessary API key for authentication.
- fetch_flight_data(): fetches flight information for the specified flight ID within a defined time range.
- utc_to_local(): converts UTC date strings to local time based on the flight’s origin and destination time zones.
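Before binding the tool to the model, you can sanity-check it in isolation. A minimal sketch, assuming a valid AeroAPI key and a flight that is active today (EK524 is just an example):

# Call the tool directly; @tool-decorated functions expose .invoke()
print(get_flight_status.invoke("EK524"))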
Binding the Tool to the Language Model
We bind our flight status tool to the LLM so that it can invoke the tool when necessary.
# LLM with tools
llm_with_tools = llm.bind_tools([get_flight_status], tool_choice="required")
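Because tool_choice="required" forces the model to emit a tool call rather than free text, you can inspect the structured call it produces before any tool actually runs. A minimal sketch:

# Inspect the structured tool call; nothing hits the AeroAPI yet
msg = llm_with_tools.invoke([HumanMessage(content="What is the flight status of EK524?")])
print(msg.tool_calls)
# Each entry is a dict with 'name', 'args' and 'id' keys,
# e.g. name='get_flight_status', args={'flight_id': 'EK524'}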
Loading and Processing Documents
We load baggage information from the Emirates website and prepare it for retrieval by splitting the text into manageable chunks.
# Document loading and processing
def load_and_process_documents(url):
    """
    Loads documents from a URL and splits them into chunks for processing.
    """
    loader = WebBaseLoader(url)
    docs = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    return text_splitter.split_documents(docs)
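Since chunk size and overlap directly affect retrieval quality, it helps to inspect what the splitter produces. A quick check, using the same URL as the main block below:

# Peek at the chunks to verify the page loaded and split as expected
docs = load_and_process_documents(
    "https://www.emirates.com/in/english/before-you-fly/baggage/cabin-baggage-rules/"
)
print(f"Loaded {len(docs)} chunks")
print(docs[0].page_content[:200])  # first 200 characters of the first chunk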
Setting Up the Vector Store
We create a vector store using Nvidia embeddings to enable efficient retrieval of relevant document sections.
# Vector store setup
def setup_vector_store(documents):
    """
    Sets up a vector store for document retrieval using embeddings.
    """
    embeddings = NVIDIAEmbeddings()
    vector_store = FAISS.from_documents(documents, embeddings)
    return vector_store.as_retriever()
The NVIDIAEmbeddings class generates embeddings for the documents using NVIDIA’s embedding model, while the FAISS.from_documents method builds a vector store for similarity search. Finally, as_retriever converts the vector store into a retriever object for querying.
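You can also query the retriever directly to see which chunks a given question pulls back. A minimal sketch, reusing the docs loaded in the previous snippet:

# Retrieve the chunks most similar to a sample question
retriever = setup_vector_store(docs)
for doc in retriever.invoke("What is the cabin baggage size?"):
    print(doc.page_content[:100])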
Implementing the Retrieval Function
We define a function that retrieves answers based on the user’s question and the context provided by the documents.
# Retrieval function
def retrieve(input_dict):
    """
    Retrieves an answer based on the question and the context from documents.
    """
    question = input_dict["question"]
    docs = retriever.invoke(question)
    context = " ".join(doc.page_content for doc in docs)
    evaluation_prompt = (
        f"Based on the following context, can you answer the question '{question}'? "
        "If yes, provide the answer. If no, respond with 'Unable to answer based on the given context.'\n\n"
        f"Context: {context}"
    )
    evaluation_messages = [HumanMessage(content=evaluation_prompt)]
    evaluation_result = llm.invoke(evaluation_messages)
    if "Unable to answer based on the given context" in evaluation_result.content:
        final_answer = use_flight_status_tool(question)
    else:
        final_answer = evaluation_result.content.strip()
    return {
        "context": context,
        "question": question,
        "answer": final_answer
    }
The function extracts the user’s question from the input dictionary, then uses the retriever to find documents relevant to it. The content of the retrieved documents is concatenated to form the context. An evaluation prompt then asks the language model whether the question can be answered from that context. If the model cannot, the function falls back to the use_flight_status_tool function; otherwise, it uses the model’s answer. Finally, it returns a dictionary containing the context, the question, and the final answer.
Using the Flight Status Tool
We define a function that leverages the flight status tool when the LLM cannot answer based on the provided context.
def use_flight_status_tool(question):
    """
    Uses the flight status tool to answer flight status related questions.
    """
    tool_messages = [HumanMessage(content=question)]
    ai_msg = llm_with_tools.invoke(tool_messages)
    if hasattr(ai_msg, 'tool_calls') and ai_msg.tool_calls:
        tool_call = ai_msg.tool_calls[0]
        try:
            tool_name = tool_call['name'].lower()
            tool_args = tool_call['args']
            # Select and invoke the appropriate tool
            selected_tool = {"get_flight_status": get_flight_status}[tool_name]
            return selected_tool.invoke(tool_args['flight_id'])
        except Exception as e:
            return f"Error retrieving flight status: {str(e)}"
    else:
        return "Unable to retrieve flight status information."
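This function can also be exercised on its own, bypassing the retrieval step entirely; for example:

# Query the tool-calling path directly (requires both API keys to be set)
print(use_flight_status_tool("What is the flight status of EK524?"))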
Building the RAG Chain
We construct a RAG chain using the LangChain Expression Language (LCEL), which combines the retrieval and generation steps to produce the final answer.
# RAG chain setup
rag_chain = (
    RunnablePassthrough()
    | RunnableLambda(retrieve)
    | (lambda x: x["answer"])
)
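Since the chain is just a passthrough into retrieve followed by a field selection, invoking it is functionally equivalent to calling the function directly:

# Equivalent to rag_chain.invoke({"question": ...}) without LCEL
answer = retrieve({"question": "What is the cabin baggage size?"})["answer"]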
Invoking the Agent
We define a function that processes user questions using the RAG Agent.
def process_question(question):
    """
    Processes a question and returns an answer.
    """
    return rag_chain.invoke({"question": question})
Finally, we put everything together and test our application with sample questions.
# Main execution
if __name__ == "__main__":
    # Load and process documents
    documents = load_and_process_documents(
        "https://www.emirates.com/in/english/before-you-fly/baggage/cabin-baggage-rules/"
    )
    # Setup vector store and retriever
    retriever = setup_vector_store(documents)

    # Example usage
    questions = [
        "What is the flight status of EK524?",
        "What is the cabin baggage size?"
    ]
    for question in questions:
        result = process_question(question)
        print(f"Question: {question}")
        print(f"Answer: {result}\n")
Conclusion
In this tutorial, we’ve built an intelligent application that combines Retrieval-Augmented Generation and tool calling using Nvidia NIM and LangChain. By integrating these technologies, we’ve created a system capable of providing both static information from documents and dynamic, real-time data from external APIs.
This approach showcases how combining RAG with tool usage can enhance the capabilities of language models, making them more versatile and practical for real-world applications. The entire code is available as a GitHub Gist.