
How To Run an Agent on Federated Language Model Architecture


In the first part of this series, I introduced the idea of federated language models, where we take advantage of a capable cloud-based large language model (LLM) and a small language model (SLM) running at the edge.

To recap, an agent sends the user query (1) along with the available tools (2) to a cloud-based LLM to map the prompt into a set of functions and arguments (3). It then executes the functions to generate appropriate context from a database (4a). If there are no tools involved, it leverages the simple RAG mechanism to perform a semantic search in a vector database (4b). The context gathered from either of the sources is then sent (5) to an edge-based SLM to generate a factually correct response. The response (6) generated by the SLM is sent as the final answer (7) to the user query.

This tutorial focuses on the practical implementation of a federated LM architecture based on the following components:

- GPT-4o as the cloud-based LLM that maps prompts to tools
- Phi3:mini, served by Ollama on an Nvidia Jetson Orin, as the edge-based SLM for generation
- A MySQL database fronted by a Flask API server as the tools backend
- Chroma DB as the vector database for semantic search

Refer to the tutorial on setting up Ollama on Jetson Orin and implementing the RAG agent for additional context and details.

Start by cloning the Git repository https://github.com/janakiramm/federated-llm.git, which contains the scripts, data, and Jupyter Notebooks. This tutorial assumes that you have access to the OpenAI API and an Nvidia Jetson Orin device. You can also run Ollama on your local workstation and change the IP address in the code.

Step 1: Run Ollama on Jetson Orin

SSH into Jetson Orin and run the commands mentioned in the file, setup_ollama.sh.

Verify that you can connect to and access the model by running the command below on the workstation where you run Jupyter Notebook.

curl http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "phi3:mini",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'


Replace localhost with the IP address of your Jetson Orin device. If everything goes well, you should be able to get a response from the model.
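
You can run the same check from Python by pointing the OpenAI client at Ollama's OpenAI-compatible endpoint. Below is a minimal sketch; it assumes the openai Python package is installed and JETSON_IP is a placeholder for your device's address.

# Sketch: verify the edge model over Ollama's OpenAI-compatible API.
from openai import OpenAI

JETSON_IP = "192.168.1.50"  # placeholder: replace with your Jetson Orin's IP address

edge_client = OpenAI(
    base_url=f"http://{JETSON_IP}:11434/v1",
    api_key="ollama",  # Ollama ignores the key, but the client requires a value
)

response = edge_client.chat.completions.create(
    model="phi3:mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)
print(response.choices[0].message.content)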

Congratulations, your edge inference server is now ready!

Step 2: Run MySQL DB and Flask API Server

The next step is to run the API server, which exposes a set of functions that will be mapped to the prompt. To make this simple, I built a Docker Compose file that exposes the REST API endpoint for the MySQL database backend.

Switch to the api folder and run the command below:

start_api_server.sh


Check if two containers are running on your workstation with the docker ps command.

If you run the command curl "http://localhost:5000/api/sales/trends?start_date=2023-05-01&end_date=2023-05-30", you should see the response from the API.
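
The same check works from Python with the requests package, which is how the notebooks will consume the API later:

import requests

# Query the sales trends endpoint exposed by the Flask API server.
resp = requests.get(
    "http://localhost:5000/api/sales/trends",
    params={"start_date": "2023-05-01", "end_date": "2023-05-30"},
)
print(resp.json())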

Step 3: Index the PDF and Ingest the Embeddings in Chroma DB

With the API in place, it’s time to generate the embeddings for the datasheet PDF and store them in the vector database.

For this, run the Index_Datasheet.ipynb Jupyter Notebook, which is available in the notebooks folder.

A simple search retrieves the phrases that semantically match the query.
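
The notebook's exact code may differ, but a minimal sketch of the indexing and search flow, assuming the chromadb and pypdf packages and a hypothetical datasheet.pdf file, looks like this:

import chromadb
from pypdf import PdfReader

# Extract text from the datasheet PDF, one chunk per page (a simple chunking strategy).
reader = PdfReader("datasheet.pdf")  # hypothetical file name; use the PDF from the repo's data folder
chunks = []
for page in reader.pages:
    text = page.extract_text()
    if text:
        chunks.append(text)

# Store the chunks in a persistent Chroma collection; Chroma computes embeddings
# with its default embedding function unless another is supplied.
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="datasheet")
collection.add(documents=chunks, ids=[f"chunk-{i}" for i in range(len(chunks))])

# A simple semantic search over the indexed chunks.
results = collection.query(query_texts=["power consumption"], n_results=3)
print(results["documents"][0])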

Step 4: Run the Federated LM Agent

The Jupyter Notebook, Federated-LM.ipynb, has the complete code to implement the logic. Let’s understand the key sections of the code.

We will import the API client that exposes the tools to the LLM.

First, we initialize two LLMs: GPT-4o (cloud) and Phi3:mini (edge).
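
A minimal sketch of that initialization, assuming the openai package and the Ollama endpoint set up in Step 1:

from openai import OpenAI

# Cloud LLM: GPT-4o via the OpenAI API (reads OPENAI_API_KEY from the environment).
cloud_llm = OpenAI()
CLOUD_MODEL = "gpt-4o"

# Edge SLM: Phi3:mini served by Ollama on the Jetson Orin (OpenAI-compatible endpoint).
edge_llm = OpenAI(
    base_url="http://JETSON_IP:11434/v1",  # replace JETSON_IP with your device's address
    api_key="ollama",  # Ollama ignores the key, but the client requires a value
)
EDGE_MODEL = "phi3:mini"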

After creating a Python list with the signatures of the tools, we will let GPT-4o map the prompt to appropriate functions and their arguments.

For example, passing the prompt What was the top selling product in Q2 based on revenue? to GPT-4o results in the model responding with the function get_top_selling_products and the corresponding arguments. Notice that a capable model is able to translate Q2 into a date range running from April 1 to June 30. This is exactly the power we want to exploit from the cloud-based LLM.
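
The notebook's tool definitions will differ, but a sketch of how a tool signature can be described and mapped by GPT-4o, using get_top_selling_products as the example, looks like this:

# Tool signature exposed to GPT-4o for function calling (schema is illustrative).
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_top_selling_products",
            "description": "Get the top-selling products by revenue for a date range.",
            "parameters": {
                "type": "object",
                "properties": {
                    "start_date": {"type": "string", "description": "YYYY-MM-DD"},
                    "end_date": {"type": "string", "description": "YYYY-MM-DD"},
                },
                "required": ["start_date", "end_date"],
            },
        },
    }
]

# Ask the cloud LLM to map the prompt to a function and its arguments.
response = cloud_llm.chat.completions.create(
    model=CLOUD_MODEL,
    messages=[{"role": "user", "content": "What was the top selling product in Q2 based on revenue?"}],
    tools=tools,
)
tool_calls = response.choices[0].message.tool_calls  # e.g. get_top_selling_products(2023-04-01, 2023-06-30)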

Once GPT-4o suggests the tool(s), we execute them, then collect and aggregate their output to form the context.
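
A minimal sketch of that step, assuming each tool name maps to a local Python function that calls the Flask API (the endpoint path below is illustrative; the repo's routes may differ):

import json
import requests

def get_top_selling_products(start_date, end_date):
    # Hypothetical helper that calls the Flask API backed by MySQL.
    resp = requests.get(
        "http://localhost:5000/api/sales/top_products",  # illustrative endpoint path
        params={"start_date": start_date, "end_date": end_date},
    )
    return resp.json()

# Map each tool call suggested by GPT-4o to a local function, run it, and aggregate the output.
available_functions = {"get_top_selling_products": get_top_selling_products}

context_parts = []
for tool_call in tool_calls or []:
    fn = available_functions[tool_call.function.name]
    args = json.loads(tool_call.function.arguments)
    context_parts.append(str(fn(**args)))

context = "\n".join(context_parts)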

If the prompt doesn’t translate to tools, we attempt to use the retriever based on the semantic search from the vector database.

To avoid sending sensitive context to the cloud-based LLM, we leverage the model (edge_llm) at the edge for generation.
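
A minimal sketch of that generation step, wrapped as a helper and reusing edge_llm and EDGE_MODEL from the initialization above (the notebook's prompt wording will differ):

def generate_answer(query: str, context: str) -> str:
    # The sensitive context never leaves the edge: only Phi3:mini sees it during generation.
    completion = edge_llm.chat.completions.create(
        model=EDGE_MODEL,
        messages=[
            {
                "role": "system",
                "content": "Answer the question using only the provided context. "
                           "If the context is insufficient, say you don't know.",
            },
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return completion.choices[0].message.content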

Finally, we implement the agent that orchestrates the calls between the cloud-based LLM and the edge-based SLM. If the tools list is empty, it falls back to the retriever to generate the context. If neither yields any context, the agent responds with the phrase “I don’t know.”
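
Tying it together, a simplified sketch of the orchestration, reusing the objects defined in the earlier sketches; map_query_to_tools and run_tools are hypothetical helpers that wrap the function-calling and tool-execution steps shown above:

def agent(query: str) -> str:
    # 1. Ask the cloud LLM (GPT-4o) to map the query to tool calls.
    tool_calls = map_query_to_tools(query)
    if tool_calls:
        # 2a. Execute the suggested tools against the Flask API to build the context.
        context = run_tools(tool_calls)
    else:
        # 2b. No tools matched: fall back to semantic search in Chroma DB.
        results = collection.query(query_texts=[query], n_results=3)
        docs = results["documents"][0] if results["documents"] else []
        context = "\n".join(docs)

    if not context.strip():
        return "I don't know."

    # 3. Generate the final answer locally with the edge SLM.
    return generate_answer(query, context)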

Below is the response from the agent based on tools, retriever, and unknown context.

To summarize, we implemented a federated LLM approach where an agent sends the user query along with available tools to a cloud-based LLM, which maps the prompt into functions and arguments. The agent executes these functions to generate context from a database. If no tools are involved, a simple RAG mechanism is used for semantic search in a vector database. The context is then sent to an edge-based SLM to generate a factually correct response, which is provided as the final answer to the user query.
