
This tutorial demonstrates how to build an intelligent application that combines Retrieval-Augmented Generation (RAG) and tool calling using Nvidia NIM and LangChain. By integrating these technologies, we create a system capable of providing real-time flight status updates and detailed baggage information.
For a background on Nvidia NIM, refer to my previous tutorials on consuming the API and self-hosting the models as containers.
RAG enhances language models by retrieving relevant documents to generate accurate, context-specific responses — especially when dealing with specialized knowledge domains. Tool calling, on the other hand, extends the model’s capabilities by allowing it to interact with external APIs, fetch real-time data, and perform functions beyond its inherent knowledge.
By combining RAG and tool calling, we bridge the gap between a model’s existing knowledge and the need for up-to-date information. Nvidia NIM offers a robust language model foundation, while LangChain provides a flexible framework for managing interactions between the language model, retrieval systems, and external tools. This synergy enables us to build advanced AI applications that are both knowledgeable and responsive to real-world data.
In this tutorial, we build an agent that retrieves (RAG) data from an airline’s website to answer questions related to baggage policy. Based on the user query, it may also invoke an API (tool calling) to get the real-time status of a flight.
Prerequisites
- Python 3.7 or higher
- API keys for NVIDIA NIM API and FlightAware AeroAPI
- Basic understanding of LLMs and RAG
Setting Up the Environment
First, let’s install and import the necessary libraries and set up our environment variables.
pip install -U langchain langchain-nvidia-ai-endpoints langchain-community faiss-cpu
import os
from datetime import datetime, timedelta
import pytz
import requests
import json

from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage
from langchain_core.runnables import RunnablePassthrough, RunnableLambda

# Replace these with your actual API keys
os.environ["NVIDIA_API_KEY"] = "your_nvidia_api_key"
AEROAPI_BASE_URL = "https://aeroapi.flightaware.com/aeroapi"
AEROAPI_KEY = "your_aero_api_key"
Initializing the Language Model
We initialize the Meta Llama 3.1 405B Instruct model, which we’ll use for generating responses.
# Initialize LLM
llm = ChatNVIDIA(model="meta/llama-3.1-405b-instruct")
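Before wiring up anything else, it’s worth a quick smoke test to confirm that the API key and model name are valid. A minimal check (the prompt here is arbitrary):

# Quick sanity check: the model should return an AIMessage with a .content string
reply = llm.invoke("In one sentence, what is tool calling?")
print(reply.content)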
Creating the Flight Status Tool
Next, we create a tool that fetches flight status information using the FlightAware AeroAPI. This tool will allow our language model to provide real-time flight updates when queried.
# Flight status tool
@tool
def get_flight_status(flight_id: str):
    """
    Returns flight information for a given flight ID.
    """
    def get_api_session():
        session = requests.Session()
        session.headers.update({"x-apikey": AEROAPI_KEY})
        return session

    def fetch_flight_data(flight_id, session):
        # Extract flight_id if it contains 'flight_id='
        if "flight_id=" in flight_id:
            flight_id = flight_id.split("flight_id=")[1]
        # Define the time range for the API query
        start_date = datetime.now().date().strftime('%Y-%m-%d')
        end_date = (datetime.now().date() + timedelta(days=1)).strftime('%Y-%m-%d')
        api_resource = f"/flights/{flight_id}?start={start_date}&end={end_date}"
        # Make the API request
        response = session.get(f"{AEROAPI_BASE_URL}{api_resource}")
        response.raise_for_status()
        flights = response.json().get('flights', [])
        if not flights:
            raise ValueError(f"No flight data found for flight ID {flight_id}.")
        return flights[0]

    def utc_to_local(utc_date_str, local_timezone_str):
        utc_datetime = datetime.strptime(utc_date_str, '%Y-%m-%dT%H:%M:%SZ').replace(tzinfo=pytz.utc)
        local_timezone = pytz.timezone(local_timezone_str)
        local_datetime = utc_datetime.astimezone(local_timezone)
        return local_datetime.strftime('%Y-%m-%d %H:%M:%S')

    # Get session and fetch flight data
    session = get_api_session()
    flight_data = fetch_flight_data(flight_id, session)

    # Determine departure and arrival keys
    dep_key = 'estimated_out' if flight_data.get('estimated_out') else 'scheduled_out'
    arr_key = 'estimated_in' if flight_data.get('estimated_in') else 'scheduled_in'

    # Build flight details
    flight_details = {
        'source': flight_data['origin']['city'],
        'destination': flight_data['destination']['city'],
        'depart_time': utc_to_local(flight_data[dep_key], flight_data['origin']['timezone']),
        'arrival_time': utc_to_local(flight_data[arr_key], flight_data['destination']['timezone']),
        'status': flight_data['status']
    }
    return (
        f"The current status of flight {flight_id} from {flight_details['source']} to {flight_details['destination']} "
        f"is {flight_details['status']} with departure time at {flight_details['depart_time']} and arrival time at "
        f"{flight_details['arrival_time']}."
    )
Here is a brief description of the functions:
- @tool decorator: registers the function as a tool accessible by the language model.
- get_api_session(): creates an HTTP session with the necessary API key for authentication.
- fetch_flight_data(): fetches flight information for the specified flight ID within a defined time range.
- utc_to_local(): converts UTC date strings to local time based on the flight’s origin and destination time zones.
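Before binding the tool to the model, you can sanity-check it in isolation. A minimal sketch, assuming a valid AeroAPI key and a flight that is active today (EK524 is just an example):

# Call the tool directly; @tool-decorated functions expose .invoke()
print(get_flight_status.invoke("EK524"))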
Binding the Tool to the Language Model
We bind our flight status tool to the LLM so that it can invoke the tool when necessary.
# LLM with tools
llm_with_tools = llm.bind_tools([get_flight_status], tool_choice="required")
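Because tool_choice="required" forces the model to emit a tool call rather than free text, you can inspect the structured call it produces before any tool actually runs. A minimal sketch:

# Inspect the structured tool call; nothing hits the AeroAPI yet
msg = llm_with_tools.invoke([HumanMessage(content="What is the flight status of EK524?")])
print(msg.tool_calls)
# Each entry is a dict with 'name', 'args' and 'id' keys,
# e.g. name='get_flight_status', args={'flight_id': 'EK524'}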
Loading and Processing Documents
We load baggage information from the Emirates website and prepare it for retrieval by splitting the text into manageable chunks.
# Document loading and processing
def load_and_process_documents(url):
    """
    Loads documents from a URL and splits them into chunks for processing.
    """
    loader = WebBaseLoader(url)
    docs = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    return text_splitter.split_documents(docs)
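Since chunk size and overlap directly affect retrieval quality, it helps to inspect what the splitter produces. A quick check, using the same URL as the main block below:

# Peek at the chunks to verify the page loaded and split as expected
docs = load_and_process_documents(
    "https://www.emirates.com/in/english/before-you-fly/baggage/cabin-baggage-rules/"
)
print(f"Loaded {len(docs)} chunks")
print(docs[0].page_content[:200])  # first 200 characters of the first chunk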
Setting Up the Vector Store
We create a vector store using Nvidia embeddings to enable efficient retrieval of relevant document sections.
# Vector store setup
def setup_vector_store(documents):
    """
    Sets up a vector store for document retrieval using embeddings.
    """
    embeddings = NVIDIAEmbeddings()
    vector_store = FAISS.from_documents(documents, embeddings)
    return vector_store.as_retriever()
The NVIDIAEmbeddings class generates embeddings for the documents using NVIDIA’s embedding model, while the FAISS.from_documents method builds a vector store for similarity search. Finally, as_retriever converts the vector store into a retriever object for querying.
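You can also query the retriever directly to see which chunks a given question pulls back. A minimal sketch, reusing the docs loaded in the previous snippet:

# Retrieve the chunks most similar to a sample question
retriever = setup_vector_store(docs)
for doc in retriever.invoke("What is the cabin baggage size?"):
    print(doc.page_content[:100])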
Implementing the Retrieval Function
We define a function that retrieves answers based on the user’s question and the context provided by the documents.
# Retrieval function
def retrieve(input_dict):
    """
    Retrieves an answer based on the question and the context from documents.
    """
    question = input_dict["question"]
    docs = retriever.invoke(question)
    context = " ".join(doc.page_content for doc in docs)
    evaluation_prompt = (
        f"Based on the following context, can you answer the question '{question}'? "
        "If yes, provide the answer. If no, respond with 'Unable to answer based on the given context.'\n\n"
        f"Context: {context}"
    )
    evaluation_messages = [HumanMessage(content=evaluation_prompt)]
    evaluation_result = llm.invoke(evaluation_messages)
    if "Unable to answer based on the given context" in evaluation_result.content:
        final_answer = use_flight_status_tool(question)
    else:
        final_answer = evaluation_result.content.strip()
    return {
        "context": context,
        "question": question,
        "answer": final_answer
    }
The function extracts the user’s question from the input dictionary, then uses the retriever to find documents relevant to it. The content of the retrieved documents is concatenated to form the context. An evaluation prompt then asks the language model whether the question can be answered from that context. If the model cannot, the function falls back to the use_flight_status_tool function; otherwise, it uses the model’s answer. Finally, it returns a dictionary containing the context, the question, and the final answer.
Using the Flight Status Tool
We define a function that leverages the flight status tool when the LLM cannot answer based on the provided context.
def use_flight_status_tool(question):
    """
    Uses the flight status tool to answer flight status related questions.
    """
    tool_messages = [HumanMessage(content=question)]
    ai_msg = llm_with_tools.invoke(tool_messages)
    if hasattr(ai_msg, 'tool_calls') and ai_msg.tool_calls:
        tool_call = ai_msg.tool_calls[0]
        try:
            tool_name = tool_call['name'].lower()
            tool_args = tool_call['args']
            # Select and invoke the appropriate tool
            selected_tool = {"get_flight_status": get_flight_status}[tool_name]
            return selected_tool.invoke(tool_args['flight_id'])
        except Exception as e:
            return f"Error retrieving flight status: {str(e)}"
    else:
        return "Unable to retrieve flight status information."
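This function can also be exercised on its own, bypassing the retrieval step entirely; for example:

# Query the tool-calling path directly (requires both API keys to be set)
print(use_flight_status_tool("What is the flight status of EK524?"))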
Building the RAG Chain
We construct a RAG chain using the LangChain Expression Language (LCEL), which combines the retrieval and generation steps to produce the final answer.
# RAG chain setup
rag_chain = (
    RunnablePassthrough()
    | RunnableLambda(retrieve)
    | (lambda x: x["answer"])
)
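Since the chain is just a passthrough into retrieve followed by a field selection, invoking it is functionally equivalent to calling the function directly:

# Equivalent to rag_chain.invoke({"question": ...}) without LCEL
answer = retrieve({"question": "What is the cabin baggage size?"})["answer"]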
Invoking the Agent
We define a function that processes user questions using the RAG Agent.
def process_question(question):
    """
    Processes a question and returns an answer.
    """
    return rag_chain.invoke({"question": question})
Finally, we put everything together and test our application with sample questions.
# Main execution
if __name__ == "__main__":
    # Load and process documents
    documents = load_and_process_documents(
        "https://www.emirates.com/in/english/before-you-fly/baggage/cabin-baggage-rules/"
    )
    # Setup vector store and retriever
    retriever = setup_vector_store(documents)

    # Example usage
    questions = [
        "What is the flight status of EK524?",
        "What is the cabin baggage size?"
    ]
    for question in questions:
        result = process_question(question)
        print(f"Question: {question}")
        print(f"Answer: {result}\n")
Conclusion
In this tutorial, we’ve built an intelligent application that combines Retrieval-Augmented Generation and tool calling using Nvidia NIM and LangChain. By integrating these technologies, we’ve created a system capable of providing both static information from documents and dynamic, real-time data from external APIs.
This approach showcases how combining RAG with tool usage can enhance the capabilities of language models, making them more versatile and practical for real-world applications. The entire code is available as a GitHub Gist.