Developer-turned-CEO Lin Qiao foresees an emerging era in AI in which language models are fine-tuned on an organization’s own specialized data. This will allow organizations to take advantage of AI’s language capabilities while using their own data sets to shape the models’ output.
Before becoming CEO of Fireworks AI, Qiao led Meta’s PyTorch efforts. Generative AI can solve hundreds of complex logic problems, she noted, but that’s not the problem enterprises and developers typically face.
“The large model is too expensive to operate, and doesn’t give you the low latency for a good product experience,” she said. “That puts pressure [on] people to go to smaller models.”
Smaller models are also a better fit for the business problems developers are trying to solve.
“They have maybe five business-specific tasks to solve,” she said. “We [are] laser focusing on those smaller open source models, how to bring [them] on par with OpenAI’s model in terms of quality or even beat them in terms of quality. At the same time, we provide much lower latency and much lower TCO (total cost of ownership) for those B2C applications and products.”
In this emerging AI era, Qiao said there are two problems developers face:
- Performing fast iterations of training using enterprise data.
- Scaling generative AI applications in production.
The company she co-founded, Fireworks AI, is “laser-focused” on handling these two problems for developers, she told The New Stack. “We offer extremely fast fine-tuning,” she added.
Fireworks AI leverages open source models. It recently raised $25 million in funding and claims 12,000 users, including Quora, Sourcegraph and the AI-powered presentation company Tome. It estimates it serves more than 25 billion tokens daily.
Latency Is Critical in AI Applications
At B2C companies like Meta, where she previously worked, Qiao learned that interactivity and low latency are absolute requirements. How quickly content is generated directly affects whether a product is viable, she said; creating a quality AI product requires using your own data and iterating on the model quickly, she added.
“All the developers at enterprises we talked to, they have their proprietary data, use our fine-tuning platform, and generate a customized model,” she said. “A one-click upload to our inference platform, and then your product can talk to your customized model directly using the content generated from your model.”
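In practice, that integration can be as thin as pointing an OpenAI-compatible client at the customized model. The following is a minimal sketch, not Fireworks’ documented API: the base URL and the fine-tuned model ID are illustrative assumptions.

```python
# Minimal sketch: a product talking to a customized model through an
# OpenAI-compatible inference endpoint. The base_url and model ID below
# are illustrative assumptions, not documented Fireworks values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    # Hypothetical ID for a model fine-tuned on proprietary data
    model="accounts/acme/models/support-assistant-ft",
    messages=[{"role": "user", "content": "Summarize this customer ticket: ..."}],
)
print(response.choices[0].message.content)
```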
Developers then must look at the product metrics, adjust the data if needed and keep the loop going to fine-tune the models.
Then, the AI application must be able to scale very quickly while delivering a low total cost of ownership, she added.
“If the cost is high, then you bleed money much faster, so it will be a disaster and you won’t have a viable business,” she said. “Both latency and TCO are important for B2C companies.”
The Cost Challenge of AI Apps
But even with a great product, generative AI applications can be more expensive to run than traditional applications, which factors into the total cost of ownership. One key difference is that generative AI applications run on GPUs rather than on heavily commoditized CPUs.
“GPUs are expensive — it’s not just the chips that are expensive. A GPU is very power-hungry. Power is expensive. Power produces heat. It cannot use air cooling, it has to use liquid cooling or [immersion] cooling where you dump the chips in oil [to] take away heat,” Qiao said. “So all the supporting infrastructure jacks up the whole infrastructure cost of GenAI.”
That cost can be an additional barrier to business viability, she added. Fireworks attempts to help companies address the TCO challenge by focusing on smaller open source models that are on par with or better than large proprietary offerings, while being more cost-effective to run.
Use Cases for Smaller Models
Many of Fireworks AI’s customers are using AI to create assistants, she said: medical assistants, legal assistants and coding assistants are popular use cases. The interactive, conversational nature of their output makes latency a particularly important challenge.
Documents are another use case she frequently sees. From images to PDFs, AI is being used to scan and search documents for product catalogs, e-commerce and even risk analysis. Tome, a customer of Fireworks AI, uses AI to build presentation slides for business users.
Without a fast response time, an AI application can become a horrible product, she added.
“That response time usually has to be half a second or one second,” she said. “It becomes a much more interesting product because it’s responsive, interactive.”
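For an interactive product, what matters most is when the first token arrives, not when the full response finishes. A quick way to check an application against that half-second budget is to time a streaming request; the sketch below reuses the same assumed endpoint and hypothetical model ID from the earlier example.

```python
# Minimal sketch: measuring time to first token on a streaming request,
# since perceived responsiveness depends on when output starts, not when
# it finishes. Endpoint and model ID are illustrative assumptions.
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

start = time.perf_counter()
stream = client.chat.completions.create(
    model="accounts/acme/models/support-assistant-ft",  # hypothetical model ID
    messages=[{"role": "user", "content": "Draft a two-line product update."}],
    stream=True,
)
for chunk in stream:
    # Skip empty chunks (e.g., the initial role-only delta) and report
    # the elapsed time as soon as real content arrives.
    if chunk.choices and chunk.choices[0].delta.content:
        print(f"Time to first token: {time.perf_counter() - start:.2f}s")
        break
```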