
How To Increase Plasticity in LLMs and AI Applications


Deep learning models, including large language models like ChatGPT, Gemini and Claude, are powerful tools trained on large bodies of knowledge. But there are limits to that knowledge: deep learning models typically have a training cut-off date, meaning they know nothing about anything that happened after that point in time. This constraint exists to balance the need for stability (the model’s ability to retain previously learned knowledge) against what is called plasticity, the model’s ability to learn and adapt to new information.

It’s essentially a trade-off: AI models that prioritize stability can’t learn new information, while researchers have observed that models that continually incorporate new data typically suffer a marked decline in performance and a loss of plasticity.

“Loss of plasticity refers to the phenomenon where AI models lose the ability to learn new things,” explained Shibhansh Dohare, a University of Alberta researcher who is also one of the authors of a recent study on mitigating loss of plasticity in AI models. “Any system that cannot learn new things is, by definition, not a continual learning system,” he said. “Continual learning is not possible without maintaining plasticity.”

“Plasticity is essential because in many applications there is always new data, and the system has to learn from the new data and adapt to the changes in its data stream.”
– Shibhansh Dohare, University of Alberta researcher

To ensure stability and accuracy, most deep learning models are specialized for a particular problem: they are trained once on a dataset and then never updated again, which can obviously pose problems.

“From an application perspective, plasticity is essential because in many applications, there is always new data, and the system has to learn from the new data and adapt to the changes in its data stream,” added Dohare. “If an AI system loses the ability to learn new things, then it becomes increasingly outdated over time.”

Techniques for Optimizing Plasticity

To minimize loss of plasticity, here are some commonly used tools and techniques that AI engineers can explore, as well as some emerging solutions. The short code sketches that follow use PyTorch purely for illustration.

Parameter Regularization

One effective technique is to ensure that the weights stay close to their initial values. L2 regularization is a popular way to do this: it adds a penalty term to the loss function (which measures the error in a model’s outputs, being small when predictions are accurate and large when they are not). The standard penalty grows with the squared magnitude of the weights, keeping them small; in the continual learning setting, the penalty can instead be measured from the initial weights, so that parameters are pulled back toward their starting values rather than drifting without bound.
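
As a minimal sketch, here is what this can look like in PyTorch (the model shape and penalty strength lam are assumptions for illustration): the penalty pulls every weight back toward a snapshot of its initial value.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

    # Snapshot the initial parameters so the penalty can pull weights back toward them.
    init_params = [p.detach().clone() for p in model.parameters()]

    def l2_init_penalty(model, init_params, lam=1e-3):
        # Penalty proportional to the squared distance from the initial weights.
        return lam * sum(((p - p0) ** 2).sum()
                         for p, p0 in zip(model.parameters(), init_params))

    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = nn.functional.mse_loss(model(x), y) + l2_init_penalty(model, init_params)
    loss.backward()  # gradients now include the pull toward the initial weights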

Shrink-and-Perturb

First proposed in a 2020 paper, this method first shrinks all the weights toward zero and then adds random noise to them. Weights are shrunk by multiplying them by a value between 0 and 1; after shrinking, they are perturbed by adding a small Gaussian-distributed value.
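
A minimal sketch of the update in PyTorch (the shrink factor and noise scale below are illustrative assumptions, not values from the paper):

    import torch

    def shrink_and_perturb(model, shrink=0.8, noise_std=0.01):
        with torch.no_grad():
            for p in model.parameters():
                p.mul_(shrink)                            # shrink toward zero (0 < shrink < 1)
                p.add_(torch.randn_like(p) * noise_std)   # add small Gaussian noise

In practice, a step like this would be applied periodically, for example between tasks or training phases, rather than on every update.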

Dropout

This technique aims to prevent units “hidden” between a neural network’s inputs and outputs from co-adapting or relying on each other for generating accurate predictions. By doing this, the model is forced to have artificial neurons that can learn good features without depending on other neurons. This is done by setting each hidden unit to zero with a small probability — in other words, adding randomness to the training process and thus making the model more robust to noise and unseen data.
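
In PyTorch, for instance, this is just a dropout layer between hidden layers (the drop probability here is an illustrative assumption):

    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(10, 64),
        nn.ReLU(),
        nn.Dropout(p=0.1),  # each hidden unit is zeroed with 10% probability
        nn.Linear(64, 1),
    )
    model.train()  # dropout is active in training mode; model.eval() disables it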

Batch Normalization

This method improves optimization and learning speed when training neural networks, and it mitigates the problem of “dead units” or “dead neurons” (neurons that consistently generate the same output, typically zero, regardless of input). It works by inserting a network layer between two hidden layers. This “batch norm” layer takes outputs from the first hidden layer in batches, then normalizes and rescales them before passing them on as input to the subsequent hidden layer.
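
A minimal PyTorch sketch (layer sizes are illustrative assumptions):

    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(10, 64),
        nn.BatchNorm1d(64),  # normalize and rescale the 64 activations across each batch
        nn.ReLU(),
        nn.Linear(64, 1),
    )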

ADAM Optimizer

Short for “Adaptive Moment Estimation,” this is an iterative optimization algorithm that minimizes the loss function during training. It is a variant of stochastic gradient descent that keeps running estimates of each gradient’s first and second moments and uses them to set a per-parameter step size, repeatedly adjusting the network’s parameters to improve performance.
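
In PyTorch, using it is a one-line swap for plain stochastic gradient descent (the model, data and learning rate here are illustrative assumptions):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    x, y = torch.randn(32, 10), torch.randn(32, 1)
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()  # Adam scales each parameter's step by its gradient moments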

Continual Backpropagation

Dubbed “continual backprop,” this recently published technique is an extension of the conventional backpropagation algorithm. It works like stochastic gradient descent, but with selective reinitialization of the network’s low-utility hidden units. According to algorithm co-author Dohare, continual backpropagation “overcomes plasticity loss in all the cases we tested,” with recent results showing that it may be one of the most effective methods for mitigating loss of plasticity, potentially allowing AI models to continue learning indefinitely. More documentation here.
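
To give a flavor of the selective-reinitialization step, here is a highly simplified PyTorch sketch. The utility score below (mean absolute activation times total outgoing weight magnitude) is a stand-in assumption, not the paper’s exact measure, and the full algorithm does considerably more bookkeeping:

    import torch

    def reinit_low_utility_units(layer_in, layer_out, acts, frac=0.01):
        # layer_in / layer_out: the Linear layers into and out of a hidden layer;
        # acts: recent activations of that hidden layer, shape (batch, hidden).
        with torch.no_grad():
            utility = acts.abs().mean(dim=0) * layer_out.weight.abs().sum(dim=0)
            k = max(1, int(frac * utility.numel()))
            idx = torch.topk(utility, k, largest=False).indices  # least useful units
            fresh = torch.empty_like(layer_in.weight)
            torch.nn.init.kaiming_uniform_(fresh)
            layer_in.weight[idx] = fresh[idx]  # give those units fresh input weights
            layer_in.bias[idx] = 0.0
            layer_out.weight[:, idx] = 0.0     # reborn units start with no influence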

Utility-Based Perturbed Gradient Descent (UPGD)

This is another recent approach that combines gradient updates with perturbations. Smaller modifications are applied to the network’s more “useful” units to protect them from catastrophic forgetting, while larger ones are applied to less useful units, reviving their plasticity.

While catastrophic forgetting and loss of plasticity are treated as separate (if complementary) issues, UPGD addresses both, says algorithm author Mohammed Elsayed, a researcher at the University of Alberta: “We took on the challenge of addressing both loss of plasticity and catastrophic forgetting using a single algorithm. We use a simple mechanism of identifying the useful parameters and protecting them from drastic changes to prevent the learning system from forgetting useful information and, in turn, address catastrophic forgetting. On the other hand, we identify the least useful parameters and change them a little bit, which improves plasticity since it might be hard to change those parameters if the system experiences loss of plasticity. We found that our approach can tackle both issues of continual learning, namely loss of plasticity and catastrophic forgetting.” The code is available here.
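
As a rough, hypothetical sketch of the idea (not the authors’ exact algorithm; the first-order utility proxy, gating function and noise scale below are all assumptions):

    import torch

    def upgd_step(params, lr=1e-3, noise_std=1e-4):
        with torch.no_grad():
            for p in params:
                if p.grad is None:
                    continue
                # First-order proxy for utility: estimated loss increase if the
                # weight were removed (an assumed stand-in for the paper's measure).
                utility = -(p.grad * p)
                gate = 1.0 - torch.sigmoid(utility)  # near 0 for useful weights
                noise = torch.randn_like(p) * noise_std
                p.add_(-lr * (p.grad + noise) * gate)  # gated, perturbed update

Useful parameters (gate near zero) are barely touched, protecting them from forgetting; less useful parameters receive larger perturbed updates, restoring their plasticity.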

Ultimately, the loss of plasticity in AI models is a vital problem to solve if machines are to continuously learn and adapt to their environments without constant and costly retraining.
