WebAI and MacStadium Launch an AI Inferencing Service Based on Apple Silicon

Modern Macs are a favorite among developers who want to run large language models (LLMs) locally, in part because of the unified memory architecture of Apple's system-on-a-chip platform. That's great during development (and simply fun to try out), but very few companies then deploy their models on Apple Silicon. For a while now, webAI has focused on bringing machine learning and small generative AI models to Apple devices, both phones and desktops.
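
To see why developers like this workflow, consider how little code local inference takes with Apple's open source MLX framework. To be clear, this is a generic illustration, not webAI's stack, and the model name is just one example from the mlx-community collection on Hugging Face:

```python
# Local LLM inference on an Apple Silicon Mac with Apple's open source MLX
# framework (pip install mlx-lm). A generic illustration of the local-first
# workflow, not webAI's runtime; the model is one example from the
# mlx-community collection on Hugging Face.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
print(generate(model, tokenizer,
               prompt="In one sentence, why does unified memory help local LLMs?",
               max_tokens=64))
```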

Now the company is taking this a step further thanks to a partnership with Mac hosting service MacStadium, which will allow enterprises to deploy their AI models on Apple Silicon in the MacStadium cloud. 

As webAI co-founder and CEO David Stout told me, when he founded the company in 2019, his thesis was that to really make AI meaningful, it had to live in users’ pockets. “It needs to be owned by the user, and it needs to be hyper-contextual. There was nothing really supporting that and that’s where webAI was born,” he said. Since the entire industry was still in flux — and generative AI was still a few years from going mainstream — the team ended up building its own runtime and inferencing engine. 

From the outset, webAI wasn't interested in building its own models (that's a race to zero, Stout believes) but in giving its users the tools to train, fine-tune and deploy models on their Apple Silicon hardware. Since most companies don't have server racks filled with Macs in their offices, webAI got creative: its runtime essentially lets businesses distribute the inference load across multiple machines, not unlike a render farm of old. But once they get started, many companies also start buying Macs dedicated to running the webAI runtime.
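
webAI hasn't published how its runtime works, but the render-farm analogy maps onto a familiar pattern: split the model's layers into shards, give each machine a shard, and pass activations down the line. Here's a toy simulation of that idea in plain Python; the host names are hypothetical, and a real system would move activations over the network:

```python
# Toy simulation of render-farm-style model distribution (hypothetical host
# names; webAI's actual runtime is proprietary): each "host" owns a
# contiguous slice of the model's layers and hands its activations on.
import numpy as np

rng = np.random.default_rng(0)

DIM = 64
# Stand-in "model": 12 layers, each just a weight matrix.
layers = [rng.standard_normal((DIM, DIM)) * 0.1 for _ in range(12)]

class Host:
    """One machine in the cluster, holding a shard of the model's layers."""
    def __init__(self, name: str, shard: list):
        self.name, self.shard = name, shard

    def forward(self, x: np.ndarray) -> np.ndarray:
        for w in self.shard:
            x = np.tanh(x @ w)  # toy stand-in for a transformer block
        return x

# Split the 12 layers across three machines, pipeline-style.
hosts = [Host(f"mac-{i}", layers[4 * i:4 * (i + 1)]) for i in range(3)]

x = rng.standard_normal(DIM)
for host in hosts:  # in production, this hop would be a network call
    x = host.forward(x)
print(f"output norm after the 3-host pipeline: {np.linalg.norm(x):.3f}")
```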

Racks of Macs at MacStadium.

For many companies, keeping their AI models and associated data in-house is paramount, so they want to develop their AI applications without sending data to a third party. webAI's approach lets them do that while reusing their existing hardware investments, rather than buying more expensive, power-hungry Nvidia cards.

“A lot of our partners, they’re like, ‘Wait a second, I can own this for a fairly reasonable cost. Let’s just build up our stack.’ When you have companies with 1,000 employees, why wouldn’t you be using the machines that are on your network? webAI facilitates that,” Stout said. He also noted that on a cost-per-token basis, these Mac clusters are more affordable than Nvidia GPUs.

Beyond distributing these large models across devices, webAI's service also helps optimize them. To do this, webAI uses what it calls Entropy-Weighted Quantization (EWQ).

The idea is to analyze the transformer blocks within a model and find those that can be quantized without hurting the model's overall performance. Blocks with lower entropy, that is, a more predictable information distribution, can often withstand more aggressive quantization with minimal impact on accuracy. In webAI's benchmarking, and for most LLM architectures, the technique reduces accuracy by less than 0.5% while cutting model size by up to 30%.
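
webAI hasn't shared EWQ's exact recipe beyond that description, but the general shape is easy to sketch: score each transformer block by the Shannon entropy of its weight distribution, then quantize low-entropy blocks more aggressively. In this minimal sketch, the median cutoff and the 4-bit/8-bit assignment are assumptions for demonstration, not webAI's published parameters:

```python
# Minimal sketch of entropy-weighted quantization. The median cutoff and the
# 4-bit/8-bit split are illustrative assumptions, not webAI's published values.
import numpy as np

def block_entropy(weights: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (in bits) of the histogram of a block's weights."""
    counts, _ = np.histogram(weights, bins=bins)
    probs = counts / counts.sum()
    probs = probs[probs > 0]  # drop empty bins; log2(0) is undefined
    return float(-(probs * np.log2(probs)).sum())

def assign_bit_widths(blocks: dict) -> dict:
    """Lower-entropy (more predictable) blocks get more aggressive quantization."""
    entropies = {name: block_entropy(w) for name, w in blocks.items()}
    cutoff = np.median(list(entropies.values()))  # stand-in for a tuned threshold
    return {name: (4 if h < cutoff else 8) for name, h in entropies.items()}

rng = np.random.default_rng(0)
blocks = {
    "block_uniform": rng.uniform(-1, 1, 10_000),       # flat histogram: high entropy
    "block_gauss": rng.standard_normal(10_000),        # bell curve: medium entropy
    "block_peaked": rng.standard_normal(10_000) ** 3,  # mass near zero: low entropy
    "block_sparse": rng.standard_normal(10_000) * (rng.random(10_000) < 0.05),
}
print(assign_bit_widths(blocks))
```

In a real pipeline, the cutoff would be tuned against an accuracy budget on held-out data rather than simply taken as the median.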

When it comes to moving into production, which is where many enterprises now are with their AI workloads, the partnership with MacStadium offers an alternative to other hosting providers.

“This partnership enables enterprises to deploy practical AI to solve real business problems vs. more AI hype centered on general large-scale models,” MacStadium CEO Ken Tacelli said. “The combination of unique AI-focused hardware and software enables us to deliver solutions to market at a fraction of the cost and power, and the scalability of our AI solution can go far beyond what people normally associate with a Mac. The capabilities of these devices enable everything from image recognition to complex inference and system automation.” 

Stout described this effort with MacStadium as offering a private cloud to its customers. “It’s more private than any other solution. How webAI has built its network — facilitated with MacStadium’s infrastructure — is one of the most secure systems for private processing off-site. And it’s going to be an AI-native solution, not something that we retrofitted to fit AI into the story.”

As for the hardware, the two companies aren’t sharing the details yet, but Stout noted that this will be a tiered solution. Not every workload needs to run on Mac Studios with 512GB of RAM, after all.

“What we found to be true is AI runs best on Apple Silicon, especially with our own runtime. If we were using PyTorch or TensorFlow, that might be untrue. We aren’t. We’re using our own library, and we’re bringing these models to the device. And we found that their silicon is probably some of the best for AI,” Stout said.
