Adopt World Models Today or Fall Behind Tomorrow

Ten years ago, deep learning papers were often not considered credible because their references did not go back more than a year. This year, we celebrated the 10th anniversary of two papers that founded generative AI in images and text. Over the last 10 years, we have learned two things: a) the power of embeddings to represent rich semantic information concisely and effectively, and b) the power of pretraining on vast amounts of data.

After attending the latest NeurIPS conference, one of the most prestigious artificial intelligence and machine learning conferences, I spent over 100 hours reviewing tutorials, conference sessions, and workshops. Here are my top enterprise AI takeaways.

From Machine Learning Models to World Models

While the pioneers of massive language models, like OpenAI, have exhausted the available training data, the pretraining playbook is expanding beyond the modalities of text and images. Language models are now pre-trained on tables, Excel spreadsheets, and more. This marks the dusk of machine learning models and the dawn of world models: foundation models that represent existing knowledge and are fine-tuned or dynamically contextualized for specific applications.

Solving Coding and Math Problems

The immediate commercial value of code generation with large language models (LLMs) justifies the large volume of papers on the topic at NeurIPS. It is remarkable to see an equal amount of research effort go into improving the mathematical theorem-proving capabilities of LLMs. Automated theorem proving goes back to the beginning of the previous century, when Hilbert asked whether an algorithm could automatically prove theorems. Gödel answered the question, and his answer led Turing to lay the foundations of computer science. Beyond its academic appeal, teaching LLMs how to prove theorems helps us understand how to build better reasoning systems. In the same way that learning math increases the intellectual capacity of students who will not necessarily become mathematicians, math capabilities help LLMs improve on other tasks.
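To make the target of this research concrete, here is what a machine-checkable proof looks like in Lean 4, one of the proof assistants used for this kind of work. This is an illustrative sketch only; the theorem name `add_comm_example` is made up for this example and is not taken from any NeurIPS paper.

```lean
-- A statement plus a proof term that Lean's kernel checks mechanically.
-- `Nat.add_comm` is the standard library lemma for commutativity of addition.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A concrete arithmetic fact, verified by computation.
example : 2 + 2 = 4 := rfl
```

An LLM-based prover has to produce text like the lines after `:=`, and the proof assistant either accepts it or rejects it, which gives an unambiguous signal of whether the reasoning was correct.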

ChatGPT on a Chip and the Magic Number Eight Billion

In his keynote address, Lidong Zhou from Microsoft Research pointed out that we may soon see chips that can fit one trillion parameters. Traditional models use floating-point arithmetic, which requires expensive multipliers. BitNet requires only 1.58 bits per parameter (each weight takes one of three values, -1, 0, or 1, and log2(3) ≈ 1.58) and needs only lookup tables and adders, which take up much less surface area on a chip. Although we haven’t yet been able to build transformers with 1-bit parameters, papers continue to push in this direction using bitwise logical operations, which would increase the compute density per unit of chip area even more. In the meantime, researchers have been fine-tuning smaller models like Llama and Mistral and managing to outperform the big ones. Mistral seems more prevalent among researchers because of its permissive license. To solve more complicated tasks, a whole line of research combines and merges specialized LLMs. This technology has matured, and practical results were presented in a competition this year. It provides an affordable way for enterprises to solve complicated tasks.
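The reason ternary weights remove the need for multipliers can be shown in a few lines of Python. This is a minimal sketch, assuming a mean-absolute-value scaling scheme for the quantization step; the function names and the sample weights are hypothetical and are not taken from the keynote.

```python
import math


def ternary_quantize(weights):
    """Map each weight to -1, 0, or +1 using a mean-absolute-value scale.

    The scaling scheme is an assumption made for this sketch.
    """
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


def ternary_dot(quantized, activations):
    """Dot product with ternary weights: additions and subtractions only."""
    total = 0.0
    for q, x in zip(quantized, activations):
        if q == 1:
            total += x
        elif q == -1:
            total -= x
        # q == 0 contributes nothing, so the activation is skipped
    return total


# Three possible states per weight need log2(3) bits of information.
bits_per_weight = math.log2(3)

weights = [0.8, -0.05, -0.7, 0.9]      # hypothetical full-precision weights
activations = [1.0, 2.0, 3.0, 4.0]     # hypothetical input activations
quantized, scale = ternary_quantize(weights)
result = ternary_dot(quantized, activations)
```

The inner loop never multiplies a weight by an activation, which is why hardware for such models can rely on adders instead of floating-point multipliers. A single rescale by `scale` per output would still be needed to approximate the full-precision result.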

As We Perfect Pretraining, It Is Time To Think About Agents

Ilya Sutskever, a “Test of Time” award winner, said at the award ceremony that AI giants have exhausted internet data (the “fossil fuel” of AI). They are turning to models that can boost their power by allocating more compute at inference time. This compute will fuel reasoning agents that cooperate to solve harder problems. Hochreiter supported this direction in his keynote, adding that we need LLMs that can trade accuracy for inference speed and indicating that the revamped LSTM (called xLSTM) can be such an option.
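One simple way to spend more compute at inference time is to sample several answers and keep the most common one. The sketch below illustrates that idea with a toy stand-in for a model; `noisy_solver`, its accuracy, and the answer values are hypothetical, and this is not presented as the method described in either talk.

```python
import random
from collections import Counter


def noisy_solver(rng, correct_answer=42, accuracy=0.6):
    """Hypothetical stand-in for one sampled model answer.

    Returns the correct answer with probability `accuracy`,
    otherwise one of a few wrong answers.
    """
    if rng.random() < accuracy:
        return correct_answer
    return rng.choice([41, 43, 44])


def majority_vote(num_samples, seed=0):
    """Sample `num_samples` answers and return the most frequent one."""
    rng = random.Random(seed)
    answers = [noisy_solver(rng) for _ in range(num_samples)]
    return Counter(answers).most_common(1)[0][0]


# More samples cost more inference compute but make the vote more reliable.
answer = majority_vote(num_samples=201)
```

A single sample from this toy solver is wrong 40% of the time, while the vote over many samples is almost always right. The price is a linear increase in inference cost, which is the trade-off between accuracy and speed that the paragraph above describes.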

From 10,000 feet, this year’s NeurIPS signals the beginning of the AI industrialization era, following the paradigm of other disruptive technologies like electricity and microchips. World models are here to stay and expand. One-off inference costs are dropping quickly, reaching $30,000 for 1 trillion tokens (the size of the training data for GPT-x scale models). Now that we have fast and cheap LLMs, the new era of agent-based inference (also known as reasoning) is beginning. It raises expectations for solving more advanced tasks, but carries the risk of AI becoming less predictable and controllable.
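To put the quoted figure in more familiar units, the short calculation below converts $30,000 per 1 trillion tokens into a price per million tokens. The 50,000-token agent run is a hypothetical workload chosen only for illustration.

```python
COST_USD = 30_000               # figure quoted in the article
TOKENS = 1_000_000_000_000      # 1 trillion tokens


def cost_per_million_tokens(total_cost_usd, total_tokens):
    """Convert a bulk price into a price per one million tokens."""
    return total_cost_usd * 1_000_000 / total_tokens


def cost_of_run(tokens_used, total_cost_usd=COST_USD, total_tokens=TOKENS):
    """Cost of a single workload at the same per-token rate."""
    return tokens_used * total_cost_usd / total_tokens


per_million = cost_per_million_tokens(COST_USD, TOKENS)   # USD per 1M tokens
agent_run = cost_of_run(50_000)                           # hypothetical agent run
```

At this rate, one million tokens cost about three cents, so an agent that consumes tens of thousands of tokens per task costs a fraction of a cent per run.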

The post Adopt World Models Today or Fall Behind Tomorrow appeared first on The New Stack.

Use NeurIPS insights to update your AI development roadmap immediately.
