Eight Google researchers jolted AI into a new stage of evolution when they introduced transformers at a key machine learning conference seven years ago. The transformer architecture is the innovative neural network design that makes today’s large language models (LLMs), and the generative AI applications built on top of them, possible. The work stands on the shoulders of many, including AI giants like Turing Award winner Geoffrey Hinton and living legend Fei-Fei Li, the latter recognized for insisting that big data was central to unlocking AI’s power. While research among the hyperscalers and in academia remains as dynamic as ever, the other hotbed of AI model innovation today is the enterprise itself.
Enterprises across every vertical have smartly taken stock of this watershed moment in AI’s living history, seizing the day to refine LLMs in inventive new ways and create new kinds of value with them. But, as yet, that value is largely unrealized.
Right now, halfway through 2024, enterprise innovators who want to take full advantage of LLMs first have to wrap their minds around a lot of moving parts. Having the right technology under the hood, and tuning it to an enterprise’s unique needs, will help ensure generative AI applications are capable of producing reliable results and real-world value.
Datasets, Models and Tools
Data, of course, is AI’s fuel, and massive public datasets feed the power of LLMs. But those public datasets may not include the right data for what an enterprise innovator is trying to accomplish, and the hallucinations and bias they can introduce conflict with the quality control enterprises require. Lineage, traceability, explainability, reliability and security of data all matter more to enterprise users, who must handle data responsibly or risk costly lawsuits, reputational damage, customer harm, and harm to their products and solutions. That means they must determine which internal proprietary datasets should feed model customization and application development, where those datasets live and how best to clean and prepare them for model consumption.
The LLMs we’ve heard most about are considered foundation models: the ones built by OpenAI, Google, Meta and others, trained on enormous volumes of internet data, some of it high quality and some so poor it counts as misinformation. Foundation models are built for massive parallelism, adaptable to a wide variety of scenarios, and require significant guardrails. Meta’s Llama 2, “a family of pretrained and fine-tuned LLMs, ranging in scale from 7B to 70B parameters,” is a popular starting point for many enterprises. It can be fine-tuned with unique internal datasets and combined with the power of knowledge graphs, vector databases, SQL for structured data, and more. Fortunately, there is robust activity in the open source community around offering new, optimized LLMs.
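To make that concrete, here is a minimal, hedged sketch of what fine-tuning Llama 2 on an internal dataset can look like using the Hugging Face transformers, datasets, peft and trl libraries, none of which are named above. The model ID, file name and hyperparameters are illustrative assumptions, and exact trainer arguments vary by library version.

```python
# Minimal sketch: parameter-efficient fine-tuning of Llama 2 on internal data.
# Assumes access to meta-llama/Llama-2-7b-hf on the Hugging Face Hub (a gated
# model requiring license acceptance) and a JSONL file with a "text" column
# prepared from proprietary datasets.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

base_model = "meta-llama/Llama-2-7b-hf"
dataset = load_dataset("json", data_files="internal_corpus.jsonl", split="train")

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")

# LoRA trains small adapter matrices instead of all 7B base weights,
# keeping GPU memory and cost within typical enterprise budgets.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=lora_config,
    dataset_text_field="text",           # column holding the prepared text
    tokenizer=tokenizer,
    args=TrainingArguments(
        output_dir="llama2-internal-adapter",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=10,
    ),
)
trainer.train()
trainer.save_model("llama2-internal-adapter")  # saves adapter weights only
```

Because LoRA produces a small adapter rather than a new copy of the base model, the resulting artifact can be versioned, audited and governed separately, which fits the lineage and traceability concerns discussed above.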
The open source community has also become particularly helpful in offering the tools that serve as connective tissue for generative AI ecosystems. LangChain, for example, is a framework that simplifies the creation of AI-based applications, with an open source Python library specifically designed to optimize the use of LLMs. Additionally, a Linux Foundation offshoot is developing open standards for retrieval-augmented generation (RAG), which is vital for bringing enterprise data into pre-trained LLMs and reducing hallucinations. Enterprise developers can access many tools using APIs, which is a paradigm shift that’s helping to democratize AI development.
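As an illustration of the RAG pattern rather than any standard implementation, the sketch below wires a proprietary document into an LLM’s prompt using LangChain with OpenAI embeddings and a local FAISS index. The module paths, file name and model choice are assumptions that shift with LangChain’s fast-moving releases, and any embedding model or enterprise vector database could stand in.

```python
# Minimal RAG sketch with LangChain: index internal documents, then let an LLM
# answer questions grounded in the retrieved passages to reduce hallucinations.
# Requires an OPENAI_API_KEY in the environment; module paths reflect the
# langchain / langchain-community / langchain-openai split current in mid-2024.
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

# 1. Load and chunk a proprietary document (file name is illustrative).
docs = TextLoader("internal_policy_manual.txt").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# 2. Embed the chunks and store them in a local FAISS vector index.
index = FAISS.from_documents(chunks, OpenAIEmbeddings())

# 3. Wire retrieval into the LLM call so relevant chunks are injected into the prompt.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o"),
    retriever=index.as_retriever(search_kwargs={"k": 4}),
)

print(qa.invoke({"query": "What is our data retention policy for customer records?"}))
```

In production, the local index would typically give way to a managed vector database with the lineage, access control and security guarantees enterprises require.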
While some enterprises will have a pure research division that investigates new algorithms and LLMs, most will not be reinventing the wheel. Fine-tuning existing models and leveraging a growing ecosystem of tools is the fastest path to value.
Super-Compute and an Elastic Data Plane
The current AI era, and the generative AI boom in particular, is driving a spectacular rise in compute usage and in the advancement of GPU technology. This is due to the complexity and sheer number of calculations that AI training and AI inference require, though the two consume compute in different ways. It’s impossible not to mention Nvidia here: its GPUs supply about 90% of the AI chip market, and the company will likely continue to dominate with the recently announced and wildly powerful GB200 Grace Blackwell Superchip, capable of real-time, trillion-parameter inference and training.
The combination of the right datasets, fine-tuned LLMs and a robust tool ecosystem, alongside this powerful compute, is core to enabling enterprise AI innovation. But the technological backbone giving form to all of it is data infrastructure: the storage and management systems that can unify a data ecosystem. The data infrastructure that has been foundational to cloud computing is now also foundational to AI’s existence and growth.
Today’s LLMs need volume, velocity and variety of data at a rate not seen before, and that creates complexity. It’s not possible to hold the kind of data LLMs require in cache memory. High-IOPS, high-throughput storage systems that can scale to massive datasets, and to the millions of nodes LLMs can demand, are a required substratum. With superpowered GPUs capable of lightning-fast storage read times, an enterprise must have a low-latency, massively parallel system that avoids bottlenecks and is designed for this kind of rigor. For example, Hitachi Vantara’s Virtual Storage Platform One offers a new way of approaching data visibility across block, file and object storage. Different kinds of storage need to be readily available to meet different model requirements, including flash, on-premises systems and the cloud. Flash can offer denser footprints, aggregated performance, scalability and efficiency to accelerate AI model and application development while staying conscious of carbon footprint. And flash reduces power consumption, which is crucial to reaping the benefits of generative AI in a sustainable present and future.
Ultimately, data infrastructure providers can best support enterprise AI developers by putting a unified elastic data plane and easy-to-deploy appliances with generative AI building blocks, along with suitable storage and compute, in developers’ hands. A unified elastic data plane is a lean machine: it handles data extremely efficiently, places data plane nodes near wherever the data lives, provides easy access to disparate data sources, and improves control over data lineage, quality and security. Models can sit on top of these appliances and be trained continuously. This kind of approach will speed enterprise development of value-generating AI applications across domains.
Controlling Costs and Carbon Footprint
It’s crucial that these technological underpinnings of the AI era be built with cost efficiency and reduction of carbon footprint in mind. We know that training LLMs and the expansion of generative AI across industries are ramping up our carbon footprint at a time when the world desperately needs to reduce it. We know too that CIOs consistently name cost-cutting as a top priority. Pursuing a hybrid approach to data infrastructure helps ensure that enterprises have the flexibility to choose what works best for their particular requirements and what is most cost-effective to meet those needs.
Most importantly, AI innovators should be clear about what exactly they want to achieve and which model and datasets they need to achieve it, then size hardware requirements like flash, SSDs and HDDs accordingly. Depending on the workload, it may be more cost-effective to rent capacity from hyperscalers or to run on-premises machines. Generative AI needs energy-efficient, high-density storage to reduce power consumption.
Hybrid data centers with high levels of automation, an elastic data plane and optimized appliances for AI application-building will help move AI innovation forward in a socially responsible and sustainable way that still respects the bottom line. How your business approaches this could determine whether it evolves with the next phases of AI or becomes a vestige locked in the past.