WebAI and MacStadium Launch an AI Inferencing Service Based on Apple Silicon

Modern Macs are a favorite among developers who want to run large language models (LLMs) locally, in part because of the unified memory architecture of Apple's system-on-a-chip platform. That's great during development (and simply fun to try out), but very few companies then deploy their models on Apple Silicon. For a while now, webAI has focused on bringing machine learning and small generative AI models to Apple devices, both phones and desktops.
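
To see why developers like this workflow, consider how little code local inference takes with Apple's open source MLX framework. To be clear, this is a generic illustration, not webAI's stack, and the model name is just one example from the mlx-community collection on Hugging Face:

```python
# Local LLM inference on an Apple Silicon Mac with Apple's open source MLX
# framework (pip install mlx-lm). A generic illustration of the local-first
# workflow, not webAI's runtime; the model is one example from the
# mlx-community collection on Hugging Face.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
print(generate(model, tokenizer,
               prompt="In one sentence, why does unified memory help local LLMs?",
               max_tokens=64))
```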

Now the company is taking this a step further thanks to a partnership with Mac hosting service MacStadium, which will allow enterprises to deploy their AI models on Apple Silicon in the MacStadium cloud. 

As webAI co-founder and CEO David Stout told me, when he founded the company in 2019, his thesis was that to really make AI meaningful, it had to live in users’ pockets. “It needs to be owned by the user, and it needs to be hyper-contextual. There was nothing really supporting that and that’s where webAI was born,” he said. Since the entire industry was still in flux — and generative AI was still a few years from going mainstream — the team ended up building its own runtime and inferencing engine. 

From the outset, webAI wasn't interested in building its own models (that's a race to zero, Stout believes) but in giving its users the tools to train, fine-tune and deploy models on their Apple Silicon hardware. Since most companies don't have server racks filled with Macs in their offices, webAI got creative: its runtime essentially lets businesses distribute the inference load across multiple machines, not unlike a render farm of old. But once they get started, many companies also start buying Macs dedicated to running the webAI runtime.
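
webAI hasn't published how its runtime works, but the render-farm analogy maps onto a familiar pattern: split the model's layers into shards, give each machine a shard, and pass activations down the line. Here's a toy simulation of that idea in plain Python; the host names are hypothetical, and a real system would move activations over the network:

```python
# Toy simulation of render-farm-style model distribution (hypothetical host
# names; webAI's actual runtime is proprietary): each "host" owns a
# contiguous slice of the model's layers and hands its activations on.
import numpy as np

rng = np.random.default_rng(0)

DIM = 64
# Stand-in "model": 12 layers, each just a weight matrix.
layers = [rng.standard_normal((DIM, DIM)) * 0.1 for _ in range(12)]

class Host:
    """One machine in the cluster, holding a shard of the model's layers."""
    def __init__(self, name: str, shard: list):
        self.name, self.shard = name, shard

    def forward(self, x: np.ndarray) -> np.ndarray:
        for w in self.shard:
            x = np.tanh(x @ w)  # toy stand-in for a transformer block
        return x

# Split the 12 layers across three machines, pipeline-style.
hosts = [Host(f"mac-{i}", layers[4 * i:4 * (i + 1)]) for i in range(3)]

x = rng.standard_normal(DIM)
for host in hosts:  # in production, this hop would be a network call
    x = host.forward(x)
print(f"output norm after the 3-host pipeline: {np.linalg.norm(x):.3f}")
```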

Racks of Macs at MacStadium.

For many companies, keeping their AI models and associated data in-house is paramount, so they want to develop their AI applications without sending data to a third party. webAI's approach lets them do that while reusing their existing hardware investments, rather than buying more expensive, power-hungry Nvidia cards.

“A lot of our partners, they’re like, ‘Wait a second, I can own this for a fairly reasonable cost. Let’s just build up our stack.’ When you have companies with 1,000 employees, why wouldn’t you be using the machines that are on your network? webAI facilitates that,” Stout said. He also noted that on a cost-per-token basis, these Mac clusters are more affordable than Nvidia GPUs.

Beyond distributing these large models across devices, webAI's service also helps optimize them. To do this, webAI uses what it calls Entropy-Weighted Quantization (EWQ).

The idea is to analyze the transformer blocks within a model and find those that can be quantized without hurting the model's overall performance. Blocks with lower entropy, that is, a more predictable information distribution, can often withstand more aggressive quantization with minimal impact on accuracy. In webAI's benchmarking, and for most LLM architectures, the technique reduces accuracy by less than 0.5% while cutting model size by up to 30%.
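
webAI hasn't shared EWQ's exact recipe beyond that description, but the general shape is easy to sketch: score each transformer block by the Shannon entropy of its weight distribution, then quantize low-entropy blocks more aggressively. In this minimal sketch, the median cutoff and the 4-bit/8-bit assignment are assumptions for demonstration, not webAI's published parameters:

```python
# Minimal sketch of entropy-weighted quantization. The median cutoff and the
# 4-bit/8-bit split are illustrative assumptions, not webAI's published values.
import numpy as np

def block_entropy(weights: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (in bits) of the histogram of a block's weights."""
    counts, _ = np.histogram(weights, bins=bins)
    probs = counts / counts.sum()
    probs = probs[probs > 0]  # drop empty bins; log2(0) is undefined
    return float(-(probs * np.log2(probs)).sum())

def assign_bit_widths(blocks: dict) -> dict:
    """Lower-entropy (more predictable) blocks get more aggressive quantization."""
    entropies = {name: block_entropy(w) for name, w in blocks.items()}
    cutoff = np.median(list(entropies.values()))  # stand-in for a tuned threshold
    return {name: (4 if h < cutoff else 8) for name, h in entropies.items()}

rng = np.random.default_rng(0)
blocks = {
    "block_uniform": rng.uniform(-1, 1, 10_000),       # flat histogram: high entropy
    "block_gauss": rng.standard_normal(10_000),        # bell curve: medium entropy
    "block_peaked": rng.standard_normal(10_000) ** 3,  # mass near zero: low entropy
    "block_sparse": rng.standard_normal(10_000) * (rng.random(10_000) < 0.05),
}
print(assign_bit_widths(blocks))
```

In a real pipeline, the cutoff would be tuned against an accuracy budget on held-out data rather than simply taken as the median.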

When it comes to moving into production, which is where many enterprises now are with their AI workloads, the partnership with MacStadium offers an alternative to other hosting providers.

“This partnership enables enterprises to deploy practical AI to solve real business problems vs. more AI hype centered on general large-scale models,” MacStadium CEO Ken Tacelli said. “The combination of unique AI-focused hardware and software enables us to deliver solutions to market at a fraction of the cost and power, and the scalability of our AI solution can go far beyond what people normally associate with a Mac. The capabilities of these devices enable everything from image recognition to complex inference and system automation.” 

Stout described this effort with MacStadium as offering a private cloud to its customers. “It’s more private than any other solution. How webAI has built its network — facilitated with MacStadium’s infrastructure — is one of the most secure systems for private processing off-site. And it’s going to be an AI-native solution, not something that we retrofitted to fit AI into the story.”

As for the hardware, the two companies aren’t sharing the details yet, but Stout noted that this will be a tiered solution. Not every workload needs to run on Mac Studios with 512GB of RAM, after all.

“What we found to be true is AI runs best on Apple Silicon, especially with our own runtime. If we were using PyTorch or TensorFlow, that might be untrue. We aren’t. We’re using our own library, and we’re bringing these models to the device. And we found that their silicon is probably some of the best for AI,” Stout said.
