Kong co-founder and CEO Augusto Marietti foresees a day when, just as there are app stores, there will be API stores, where both developers and machines can find and use APIs.
“Today is really the AI era,” Marietti told audiences at the virtual Kong API Summit on Thursday. “We’re moving from human-based services that needed UI, to functions, to machine-based services. And machines, they talk through APIs, they don’t talk through UI.”
Navigating this increased use of AI and AI-related APIs, he added, will take smarter, AI-infused API management. Among the AI-infused updates Kong introduced this week were two of specific interest to developers: the addition of semantic intelligence to its AI Gateway version 3.8 and a new Insomnia offering called AI Runner.
What Open Source AI Gateway Offers
AI Gateway is an open source plugin for Kong API Gateway. It was introduced as a beta offering in February as a way to govern and secure generative AI. This update can make GenAI up to 20 times faster as well as more cost-effective, the company claimed in a press release.
“We need an AI Gateway because we can move both the developers to a more productive place and the organization to a place where they have control of that AI traffic,” Marco Palladino, Kong co-founder and CTO, said as he introduced the AI Gateway upgrades.
“It is not possible to scale AI consumption across every team in the organization if we do not do this. It’s like writing software on paper. It doesn’t work, it’s hard, and it does not scale.”
Semantic intelligence, he said, is the ability to understand the meaning of the prompts being sent to AI. The AI Gateway, between the AI applications that organizations build and the large language model (LLM) and vector database technologies used to power AI, now provides three new capabilities.
Semantic Caching
The first capability semantic intelligence adds is semantic caching.
“These prompts are being translated into vectors that we can store in a vector database of choice, and by doing so, we can then compute the similarity of prompts that have the same meaning but use different words,” Palladino said.
For example, it can compare the prompts “How long does it take to cook a steak?” and “How long does it take to cook a fillet?” and determine they are similar.
“These prompts have a very high similarity between each other, so semantically, they have the same meaning,” Palladino said. “But if I ask the AI Gateway another prompt that has nothing to do with cooking a steak in this example, this will have a low similarity, and the AI Gateway will fully recognize that.”
Kong can harness that intelligence to support semantic caching, he continued, which accelerates AI by understanding the prompts being sent through the gateway. Instead of making two separate LLM calls for similar prompts, it can make one call and serve the cached result for the second.
“Whenever we receive lots of AI traffic, we can now generate embeddings,” he said. “We can store them in a vector database and then we can store the content in a cache store in such a way that our AI gets a lot faster and our user experience gets a lot better.”
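The mechanism Palladino describes can be sketched in a few lines. The following is an illustrative Python sketch, not Kong’s implementation: it uses a toy bag-of-words “embedding” over a tiny hand-picked vocabulary and cosine similarity in place of a real embedding model and vector database, and the 0.6 threshold is an arbitrary choice.

```python
import math
import re


def embed(text):
    # Toy embedding: word counts over a tiny fixed vocabulary.
    # A real gateway would call an embedding model instead.
    vocab = ["cook", "steak", "fillet", "long", "politics"]
    words = re.findall(r"[a-z]+", text.lower())
    return [words.count(w) for w in vocab]


def cosine(a, b):
    # Cosine similarity between two vectors; 1.0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    """Serve a cached LLM response when a new prompt is similar enough."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached response) pairs

    def lookup(self, prompt):
        v = embed(prompt)
        for vec, response in self.entries:
            if cosine(v, vec) >= self.threshold:
                return response  # cache hit: skip the LLM call entirely
        return None  # cache miss: the caller makes a real LLM call

    def store(self, prompt, response):
        self.entries.append((embed(prompt), response))
```

In this sketch, Palladino’s steak and fillet prompts land above the similarity threshold and share one cached answer, while an unrelated prompt falls through to a real model call.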
Semantic Prompt Guard
Semantic Prompt Guard allows organizations to set up rules about prompts, such as blocking a particular topic from being discussed in an application. It could, for instance, be used to keep the AI from responding to prompts about politics, Palladino said. It allows organizations to improve content moderation and ensure that the consumption of AI is secure, he added.
“Semantic prompt guard allows us to set these rules inside of the infrastructure itself so they can be continuously updated over time without us having to update our applications or chase down the developers,” he said. “It makes them more productive because now all of this ships out of the box in the underlying AI infrastructure.”
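The rule Palladino describes, blocking prompts that are semantically close to a denied topic, can be sketched as follows. This is an illustrative sketch, not Kong’s plugin: it substitutes a toy word-overlap (Jaccard) similarity for real embedding comparison, and the topic strings and 0.2 threshold are made up for the example.

```python
import re


def similarity(a, b):
    # Toy similarity: word overlap (Jaccard index) between two strings.
    # A real semantic guard would compare embedding vectors instead.
    wa = set(re.findall(r"[a-z]+", a.lower()))
    wb = set(re.findall(r"[a-z]+", b.lower()))
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


class SemanticPromptGuard:
    """Reject prompts that are semantically close to any denied topic."""

    def __init__(self, denied_topics, threshold=0.2):
        self.denied_topics = denied_topics
        self.threshold = threshold

    def allows(self, prompt):
        # The prompt passes only if it stays below the similarity
        # threshold for every denied topic.
        return all(similarity(prompt, topic) < self.threshold
                   for topic in self.denied_topics)
```

Because the rule lives in one place rather than in each application, tightening or loosening the denied-topic list requires no application changes, which is the productivity point Palladino makes above.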
Semantic Routing
The new AI Gateway offers six different load-balancing algorithms to manage models. Five of them are familiar load-balancing strategies, including round-robin routing, weighted routing and directing traffic to the models that are least busy.
The sixth algorithm, though, supports semantic routing, which is designed to help developers as large language models proliferate, he said.
“The challenge is that as the number of AI models grows from hundreds to thousands of models, it becomes very hard for the developers to keep track of what the models are that we need to use for a specific task,” Palladino said.
“We can semantically route the request to the model that’s better fitted for that specific purpose, that has been fine-tuned for that purpose. Developers don’t have to know in advance what the model is. It semantically and magically happens at runtime on the AI Gateway itself.”
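The routing decision Palladino describes, matching a prompt against descriptions of what each model was fine-tuned for, can be sketched like this. It is an illustrative sketch, not Kong’s algorithm: the model names and descriptions are hypothetical, and a toy word-overlap similarity stands in for embedding comparison.

```python
import re


def similarity(a, b):
    # Toy word-overlap (Jaccard) similarity between two strings;
    # a real gateway would compare embedding vectors instead.
    wa = set(re.findall(r"[a-z]+", a.lower()))
    wb = set(re.findall(r"[a-z]+", b.lower()))
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def route(prompt, models):
    """Pick the model whose description best matches the prompt.

    `models` maps a model name to a short description of the tasks it
    was fine-tuned for. Both are hypothetical examples here.
    """
    return max(models, key=lambda name: similarity(prompt, models[name]))
```

The developer sends the prompt without naming a model; the gateway scores it against each model’s description at runtime and forwards the request to the best match.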
With AI Gateway, developers can use any LLM it supports out of the box, without changing their code, he said. The company also announced new support for AWS Bedrock and Google Cloud’s Vertex AI, in addition to the cloud-based LLMs AI Gateway already supports.
New Insomnia AI Runner
In addition to its expanded AI Gateway offerings, Kong introduced AI Runner, a new tool for Insomnia 10, which is a development solution for building, testing and debugging APIs.
“We took a small subset of capabilities and we made them available in a new product offering called Insomnia AI Runner,” Palladino said. “The core of AI Runner, which runs in Insomnia, is actually powered by the AI Gateway we just announced, and that AI Gateway runs on top of Konnect Dedicated Cloud Gateways.”
Konnect Dedicated Cloud Gateways is a multiregional API management platform that now offers Microsoft Azure support.
AI Runner allows developers to write code once and access multi-LLM support with just one click, the company stated. Like AI Gateway, it provides options for securing GenAI traffic, such as content moderation and protection against prompt jailbreaking and prompt injection.
“This is an offering that’s meant for developers that want to use semantic capabilities without deploying it themselves, and it’s very straightforward,” he said. “By generating an acceleration link, you can use this link inside of your AI applications to enable semantic caching out of the box, no infrastructure to run, no infrastructure to deploy.”
The post Kong: New ‘AI-Infused’ Features for API Management, Dev Tools appeared first on The New Stack.