
LAMs are the eyes and brain of tomorrow’s more actionable AI agents.

Operating a Windows program. Automating invoice reconciliation. Booking a flight and hotel.
These are just a few tasks that a new class of large language models (LLMs) could enable for AI agents. Researchers are calling this next phase of LLMs “large action models,” or LAMs.
To date, LLMs have been stateless and passive, unable to act, adapt or interact with tools on their own. But now, LAMs are set to let agents perform increasingly sophisticated actions and even navigate graphical user interfaces (GUIs).
“LAMs are a critical inflection point in the evolution of AI systems, moving from passive responders to autonomous operators,” Preetpal Singh, group managing director at the IT services company Xebia, told The New Stack. In effect, LAMs are shifting the industry from generative AI to agentic AI.
With LAMs at their core, AI agents are poised to outpace yesterday’s AI. “AI has always needed a do-engine, and LAMs are a generative AI response to that need,” Scott Willson, head of product marketing at xtype, which makes a multi-instance management platform for ServiceNow, told The New Stack.
Others agree that LAMs underpin actionable agents. “When you’re talking about LAMs, you’re really talking about agents,” Keith Pijanowski, AI solutions engineer at MinIO, an object storage system, told The New Stack. “The LAM is really the brain behind agents.”
Understanding Large Action Models
LAMs are LLMs trained on specific actions and enhanced with real connectivity to external data and systems. This makes the agents they power more robust than basic LLMs, which are limited to reasoning, retrieval and text generation.
Whereas LLMs are more general-purpose, trained on a large data corpus, LAMs are more task-oriented. “LAMs fine-tune an LLM to specifically be good at recommending actions to complete a goal,” Jason Fournier, vice president of AI initiatives at the education platform Imagine Learning, told The New Stack.
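To make the distinction concrete, here is a minimal sketch of that pattern in plain Python. The action catalog and JSON shape are invented for illustration, not taken from any shipping LAM; the point is that the model’s output is constrained to a set of known actions and validated before anything executes.

```python
import json

# Hypothetical action catalog a LAM would be fine-tuned to emit.
# Names and parameters are illustrative only.
ACTIONS = {
    "open_invoice": {"params": ["invoice_id"]},
    "match_payment": {"params": ["invoice_id", "payment_id"]},
    "flag_for_review": {"params": ["invoice_id", "reason"]},
}

def validate_action(raw: str) -> dict:
    """Parse a model response and reject anything outside the catalog."""
    action = json.loads(raw)
    spec = ACTIONS.get(action.get("name"))
    if spec is None:
        raise ValueError(f"unknown action: {action.get('name')}")
    missing = [p for p in spec["params"] if p not in action.get("args", {})]
    if missing:
        raise ValueError(f"missing params: {missing}")
    return action

# A LAM recommending the next step toward a goal would emit structured
# output like this instead of prose:
raw = '{"name": "match_payment", "args": {"invoice_id": "INV-42", "payment_id": "PAY-7"}}'
print(validate_action(raw))
```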
Examples of LAMs so far include:
- Microsoft researchers have developed a LAM that performs tasks in Office, according to The Decoder.
- Orby recently debuted a LAM for automating enterprise tasks.
- CogAgent is an open source model designed to execute tasks in GUIs.
- The University of California, Berkeley, shared Gorilla, a fine-tuned model that extends retrieval-augmented generation (RAG) with a runtime for executing LLM-generated actions.
Academic research into LAMs is ongoing, and defining them in an industry context remains challenging. While the naming isn’t standardized, many projects described as “LLMs with tool use” or “agent frameworks” likely fall under the LAM umbrella.
For instance, OpenAI recently added a “Computer Use” function to its Responses API, allowing developers to guide AI through on-screen actions like clicking or scrolling. While OpenAI doesn’t use the term LAM, this reflects a broader surge in tools enabling more actionable AI agents.
“There is a growing demand for systems that go beyond language-based assistance and move toward intelligent agents capable of performing real-world actions,” wrote Microsoft researchers in the abstract of a LAM study they released in December. Another study, updated in May, depicts “a new generation of LLM-brained GUI agents.”
How Do LAMs Advance AI Agents?
Traditionally, business automation relied on robotic process automation (RPA), which mimicked user behaviors like clicks, scrolling or copying text to automate repetitive tasks. LAMs take this further.
Rather than following hardcoded logic, LAM-powered agents gather information at runtime — even data that didn’t exist when the workflow was first defined. “It’s more like dynamic business logic,” said Pijanowski.
Willson sees LAMs as “way better” than RPA: “Unlike conventional automation that follows rigid, pre-programmed rules, LAMs can adapt to changes in user interfaces and workflows.” You can also speak to them in plain language and let them handle the implementation details.
LAMs build on RAG, which lets LLMs pull in external documents. “RAG got the industry thinking we could give LLMs more information at inference time,” said Pijanowski. “RAG was the very first agent, but it only had one tool: go to a vector database and get me little chunks of documents.”
A LAM goes further — not just retrieving information or mimicking actions, but actually solving tasks. That could mean carrying out multistep workflows, like booking a vacation, said Pijanowski.
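What that might look like in code: below is a deliberately simplified agent loop in Python. The tools are stubs, and the plan is hard-coded where a real LAM would decide each step at runtime from the goal and prior observations.

```python
# Stubbed tools standing in for real flight and hotel APIs.
def search_flights(dest: str) -> dict:
    return {"flight": f"NYC->{dest}", "price": 420}  # canned response

def book_hotel(dest: str, nights: int) -> dict:
    return {"hotel": f"Hotel {dest}", "nights": nights}  # canned response

TOOLS = {"search_flights": search_flights, "book_hotel": book_hotel}

def run_agent(goal: str) -> list:
    # Stand-in for the LAM: in practice the model would choose these
    # steps dynamically based on the goal and each tool's output.
    plan = [
        ("search_flights", {"dest": "Lisbon"}),
        ("book_hotel", {"dest": "Lisbon", "nights": 4}),
    ]
    observations = []
    for tool_name, args in plan:
        observations.append(TOOLS[tool_name](**args))  # act, then observe
    return observations

print(run_agent("Book me four nights in Lisbon"))
```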
What LAMs Could Enable for Enterprises
In an enterprise setting, Pijanowski pointed to factory management as a promising use case. A LAM could automate maintenance by monitoring equipment, analyzing images for defects, and syncing with other platforms to create alerts, trigger orders or track inventory.
With Model Context Protocol (MCP) servers, which connect AI agents to external tools, in the mix, agents are also primed for areas like cloud DevOps. For example, MinIO’s AIStor MCP server enables LAMs to autonomously manage cloud files and perform administrative tasks.
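To give a sense of how an agent reaches those tools, here is a minimal MCP server sketch using the FastMCP helper from the official MCP Python SDK. The bucket-listing tool is a made-up stand-in, not MinIO’s actual AIStor server.

```python
# Minimal MCP server sketch (pip install mcp). The tool below is
# hypothetical; a real storage server would call the storage API.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("storage-admin")

@mcp.tool()
def list_buckets() -> list[str]:
    """Return the names of storage buckets the agent may manage."""
    return ["logs", "models", "invoices"]  # stubbed data for the sketch

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio to any MCP-capable agent
```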
LAMs trained on internal actions could streamline industry-specific workflows as well. Imagine Learning, for instance, has developed a curriculum-informed AI framework to support teachers and students with AI-powered lesson planning. Fournier sees promise in automating administrative tasks like student registration, synthesizing data for educators and enhancing the learning experience.
Or, Willson said, consider marketing: “You could tell an agentic AI platform with LAM technology, ‘Launch our new product campaign for the ACME software across all our channels with our standard messaging framework.’” Capabilities like this could save time, ensure brand consistency and free teams to focus on high-level strategy.
Singh sees potential in automating tasks across finance and legal, such as analyzing transactions, reconciling invoices, reviewing contracts and handling customer support inquiries. “This type of automation can reduce operational costs while improving accuracy and speed,” he said.
In short, Willson said, “LAMs offer transformative potential for enterprise operations by automating complex workflows that currently require human intervention.”
Putting LAMs To Work
“LAMs are a key component of the agentic AI ecosystem,” said Willson. “Instead of simply kicking off a workflow, LAMs can determine the appropriate steps needed to achieve a goal.”
With agents doing the work, teams can skip building bespoke API integrations. “The power of LAMs lies in their ability to interact with existing software interfaces just as humans do, without requiring specialized integrations or APIs,” added Willson.
Still, implementing LAMs requires upfront effort and comes with unique development constraints. First, developers must train an LLM on all available actions.
Willson recommends “learning from observation,” where the model watches how humans interact with software and mimics those actions. “This learning capability allows them to improve over time without explicit reprogramming — a kind of monkey-see, monkey-do approach that traditional automation lacks.”
Next is deciding on a framework or tool to build the agent that sits on top of it, said Pijanowski. “With generative AI, it was as simple as deploying your LLM,” he said. “With agentic AI, you need a framework for looping it together and deploying it correctly into production.” He pointed to LangChain’s LangGraph as one example.
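As a rough sketch of what that framework layer looks like, here is a bare-bones LangGraph graph with two stubbed nodes. In a real agent, each node would wrap a LAM call or a tool invocation, and edges could branch on the model’s decisions.

```python
# Minimal LangGraph example (pip install langgraph). Node bodies are
# stubs; a production agent would call a model and tools here.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    goal: str
    result: str

def plan(state: State) -> dict:
    return {"result": f"planned steps for: {state['goal']}"}  # stub

def act(state: State) -> dict:
    return {"result": state["result"] + " -> executed"}  # stub

builder = StateGraph(State)
builder.add_node("plan", plan)
builder.add_node("act", act)
builder.set_entry_point("plan")
builder.add_edge("plan", "act")
builder.add_edge("act", END)

graph = builder.compile()
print(graph.invoke({"goal": "reconcile this week's invoices"}))
```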
Standard practices like continuous integration, testing, monitoring and version control still apply, said Singh. Architecture matters, too: “LAMs perform best in environments that are modular and interoperable.”
The Downsides of Using LAMs
Security remains a pressing concern for potential LAM users. “LLM security practices are still evolving to address issues such as jailbreaking models, prompt injections and prompt leaking,” said Fournier. Agentic AI expands the attack surface, so LAMs need strong guardrails — especially in sectors like education, where privacy, accuracy and bias mitigation are critical.
“Another focus will be evaluations and benchmarks to better understand what these systems are doing and how to improve them,” said Fournier. This will require new tools for monitoring and continual assessment.
LAMs also aren’t suitable for every use case, Willson noted: “Traditional RPA remains better for high-volume, unchanging processes that involve simple, repetitive tasks with stable interfaces, where the efficiency of purpose-built solutions outweighs the flexibility of LAMs.”
While LAMs excel at dynamic, multi-system workflows, Willson said, RPA is better suited to highly deterministic settings: regulated environments, legacy system integration or real-time processing with strict performance requirements.
Another hurdle is connectivity. Within an agent, the control plane (the LAM) parses requests and handles the reasoning. Underneath it, said Pijanowski, sits the tool plane, which connects to MCP servers, databases, APIs and other LLM-based agents.
LAMs will require a standard protocol to connect this control plane with external tools. And although Anthropic’s MCP is leading the pack, proposed alternatives like Google’s Agent2Agent Protocol (A2A) and Cisco’s Open Agentic Schema Framework (OASF) are close behind. The de facto protocol has yet to be crowned.
No Agentic AI Future Without LAMs
Gartner predicts that more than 33% of enterprise apps will embed agentic AI by 2028. LAMs will likely power many of those tasks, though questions remain about how they’ll handle more nuanced workflows.
Fournier, for one, is cautious about how well LAMs will manage subjective or judgment-based tasks. Others are more certain.
“LAMs are not just enhancing generative AI but extending it to deliver business value across complex, real-world environments,” said Singh. “LAMs are a clear progression in the development of agentic AI.”
For Willson, it’s fundamental: “It is a necessary component of agentic AI. I don’t see how you have agentic AI without a LAM.”