Channel: Artificial Intelligence News, Analysis and Resources - The New Stack

Adopt World Models Today or Fall Behind Tomorrow


Ten years ago, deep learning papers were often not considered credible because their references did not go back more than a year. This year, we celebrated the 10th anniversary of two papers that founded generative AI in images and text. Over the last 10 years, we have learned two things: a) the power of embeddings to represent rich semantic information concisely and effectively, and b) the power of pretraining on vast amounts of data.

After attending the latest NeurIPS conference, one of the most prestigious artificial intelligence and machine learning conferences, I spent over 100 hours reviewing tutorials, conference sessions, and workshops. Here are my top enterprise AI takeaways.

From Machine Learning Models to World Models

While AI pioneers such as OpenAI, which led the development of massive language models, have exhausted the available training data, the pretraining playbook is expanding beyond the modalities of text and images. Language models are now pretrained on tables, Excel spreadsheets, and more. This marks the dusk of machine learning models and the dawn of world models. These models serve as a foundation that represents existing knowledge, and they are fine-tuned or dynamically contextualized for specific applications.

Solving Coding and Math Problems

The immediate commercial value of code generation with large language models (LLMs) justifies the large volume of papers on the topic at NeurIPS. It is remarkable to see an equal amount of research effort going into improving the mathematical theorem-proving capabilities of LLMs. Automated theorem proving goes back to the beginning of the previous century, when Hilbert asked whether there was an algorithm that could automatically prove theorems. Gödel's answer paved the way for Turing's work, which laid the foundations of computer science. Beyond the academic inspiration for this task, teaching LLMs how to prove theorems helps us understand how to build better reasoning systems. In the same way that learning math increases the intellectual capacity of students who will not necessarily become mathematicians, the math capabilities of LLMs help them improve on other tasks.

ChatGPT on a Chip and the Magic Number Eight Billion

In his keynote address, Lidong Zhou from Microsoft Research pointed out that we may soon see chips that can fit one trillion parameters. Traditional models use floating-point arithmetic, which requires expensive multipliers. BitNet requires only 1.58 bits per parameter and needs only lookup tables and adders, which take up much less surface area on a chip. Although we haven’t yet been able to build transformers with 1-bit parameters, papers continue to push in this direction with bitwise logical operations, which would increase the computation density per unit of chip area even more. In the meantime, researchers have been fine-tuning smaller models like Llama and Mistral and managing to outperform the big ones. Mistral seems more prevalent among researchers because of its permissive license. To solve more complicated tasks, a whole line of research combines and merges specialized LLMs. This technology has become more mature, and practical results were presented in a competition this year. This technique provides an affordable way for the enterprise to solve complicated tasks.
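To make the 1.58-bit figure concrete: a weight restricted to the three values -1, 0 and +1 carries log2(3) ≈ 1.58 bits, and multiplying by such a weight reduces to adding, subtracting or skipping the input. The following is a minimal sketch of that idea, not the BitNet implementation. The function names are hypothetical, and the rounding rule (scale by the mean absolute weight, round, clip) is an assumed, simplified scheme chosen for illustration.

```python
import math

# A weight with three possible values carries log2(3) bits of information.
BITS_PER_TERNARY_WEIGHT = math.log2(3)  # roughly 1.58


def quantize_ternary(row):
    """Map real-valued weights to {-1, 0, +1}.

    Assumed scheme for illustration: divide by the mean absolute value,
    round to the nearest integer, then clip to the range [-1, 1].
    """
    scale = sum(abs(w) for w in row) / len(row)
    if scale == 0:
        return [0 for _ in row]
    return [max(-1, min(1, round(w / scale))) for w in row]


def ternary_matvec(weights, x):
    """Matrix-vector product for ternary weights using only adds and subtracts."""
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1: add the input
            elif w == -1:
                acc -= xi      # -1: subtract the input
            # 0: skip the input entirely, no arithmetic needed
        out.append(acc)
    return out


if __name__ == "__main__":
    ternary_row = quantize_ternary([0.9, -0.05, -1.2, 0.4])
    print(ternary_row)
    print(ternary_matvec([[1, 0, -1], [-1, 1, 1]], [2.0, 3.0, 5.0]))
```

No multiplier appears in the inner loop, which is the property that lets hardware replace multiplier circuits with adders. The lookup-table approach mentioned in the keynote is a further optimization that this sketch does not attempt to show.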

As We Perfect Pretraining, It Is Time To Think About Agents

As “Test of Time” award winner Ilya Sutskever mentioned at the award ceremony, AI giants have exhausted internet data (the “fossil fuel” of AI). They are turning to models that can boost their power by allocating more compute at inference time. This compute will fuel reasoning agents that cooperate to solve harder problems. Hochreiter, in his keynote, supported this direction, adding that we need LLMs that can trade accuracy for inference speed and indicating that the revamped LSTM (called xLSTM) can be such an option.

From 10,000 feet, this year's NeurIPS signals the beginning of the AI industrialization era, following the paradigm of other disruptive technologies like electricity and microchips. World models are here to stay and expand. One-off inference costs are dropping quickly, reaching $30,000 for 1 trillion tokens (the size of the training data for GPT-x scale models). Now that we have fast and cheap LLMs, the new era of agent-based inference (also known as reasoning) is rising. It raises the expectation of solving more advanced tasks, along with the risk of AI becoming less predictable and controllable.
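For budgeting purposes, the quoted figure of $30,000 per 1 trillion tokens can be converted to a per-million-token rate. The snippet below is a back-of-the-envelope sketch. The helper name and the 50-million-token workload are hypothetical, and it assumes cost scales linearly with token count, which real pricing may not.

```python
# Figures quoted in the article.
COST_USD = 30_000
TOKENS = 1_000_000_000_000  # 1 trillion

# Rate per million tokens, assuming cost scales linearly with volume.
COST_PER_MILLION_TOKENS = COST_USD * 1_000_000 / TOKENS


def cost_for_tokens(num_tokens):
    """Estimated inference cost in USD for a given number of tokens."""
    return num_tokens * COST_USD / TOKENS


if __name__ == "__main__":
    print(f"${COST_PER_MILLION_TOKENS:.2f} per million tokens")
    # Hypothetical workload: 50 million tokens.
    print(f"${cost_for_tokens(50_000_000):.2f} for 50 million tokens")
```

At this rate, one million tokens costs about three cents, which is what makes multi-step, agent-style inference economically plausible.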

The post Adopt World Models Today or Fall Behind Tomorrow appeared first on The New Stack.

Use NeurIPS insights to update your AI development roadmap immediately.
