
How To Increase Plasticity in LLMs and AI Applications


Deep learning models, including large language models like ChatGPT, Gemini and Claude, are powerful tools trained on large bodies of knowledge. But there are limits to that knowledge: deep learning models typically have a training cut-off date, meaning they know nothing about anything that happened after that point in time. This constraint exists to balance the need for stability (the model’s ability to retain previously learned knowledge) against what is called plasticity, the model’s ability to learn and adapt to new information.

It’s essentially a trade-off: AI models that prioritize stability can’t learn new information, while researchers have observed that models that continually incorporate new data typically suffer a marked decline in performance and a loss of plasticity.

“Loss of plasticity refers to the phenomenon where AI models lose the ability to learn new things,” explained Shibhansh Dohare, a University of Alberta researcher who is also one of the authors of a recent study on mitigating loss of plasticity in AI models. “Any system that cannot learn new things is, by definition, not a continual learning system,” he said. “Continual learning is not possible without maintaining plasticity.”

“Plasticity is essential because in many applications there is always new data, and the system has to learn from the new data and adapt to the changes in its data stream.”
– Shibhansh Dohare, University of Alberta researcher

To ensure stability and accuracy, most deep learning models are specialized for a particular problem: they are trained once on a dataset and then never updated again, which can obviously pose problems.

“From an application perspective, plasticity is essential because in many applications, there is always new data, and the system has to learn from the new data and adapt to the changes in its data stream,” added Dohare. “If an AI system loses the ability to learn new things, then it becomes increasingly outdated over time.”

Techniques for Optimizing Plasticity

To minimize loss of plasticity, here are some commonly used tools and techniques that AI engineers can explore, as well as some emerging solutions. The short code sketches that follow use PyTorch purely for illustration.

Parameter Regularization

One effective technique is to ensure that the weights stay close to their initial values. L2 regularization is a popular way to do this: it adds a penalty term to the loss function (which measures the error in a model’s outputs, being small when predictions are accurate and large when they are not). The standard penalty grows with the squared magnitude of the weights, keeping them small; in the continual learning setting, the penalty can instead be measured from the initial weights, so that parameters are pulled back toward their starting values rather than drifting without bound.
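
As a minimal sketch, here is what this can look like in PyTorch (the model shape and penalty strength lam are assumptions for illustration): the penalty pulls every weight back toward a snapshot of its initial value.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

    # Snapshot the initial parameters so the penalty can pull weights back toward them.
    init_params = [p.detach().clone() for p in model.parameters()]

    def l2_init_penalty(model, init_params, lam=1e-3):
        # Penalty proportional to the squared distance from the initial weights.
        return lam * sum(((p - p0) ** 2).sum()
                         for p, p0 in zip(model.parameters(), init_params))

    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = nn.functional.mse_loss(model(x), y) + l2_init_penalty(model, init_params)
    loss.backward()  # gradients now include the pull toward the initial weights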

Shrink-and-Perturb

First proposed in a 2020 paper, this method first shrinks all the weights toward zero and then adds random noise to them. Weights are shrunk by multiplying them by a value between 0 and 1; after shrinking, they are perturbed by adding a small Gaussian-distributed value.
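
A minimal sketch of the update in PyTorch (the shrink factor and noise scale below are illustrative assumptions, not values from the paper):

    import torch

    def shrink_and_perturb(model, shrink=0.8, noise_std=0.01):
        with torch.no_grad():
            for p in model.parameters():
                p.mul_(shrink)                            # shrink toward zero (0 < shrink < 1)
                p.add_(torch.randn_like(p) * noise_std)   # add small Gaussian noise

In practice, a step like this would be applied periodically, for example between tasks or training phases, rather than on every update.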

Dropout

This technique aims to prevent units “hidden” between a neural network’s inputs and outputs from co-adapting or relying on each other for generating accurate predictions. By doing this, the model is forced to have artificial neurons that can learn good features without depending on other neurons. This is done by setting each hidden unit to zero with a small probability — in other words, adding randomness to the training process and thus making the model more robust to noise and unseen data.
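
In PyTorch, for instance, this is just a dropout layer between hidden layers (the drop probability here is an illustrative assumption):

    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(10, 64),
        nn.ReLU(),
        nn.Dropout(p=0.1),  # each hidden unit is zeroed with 10% probability
        nn.Linear(64, 1),
    )
    model.train()  # dropout is active in training mode; model.eval() disables it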

Batch Normalization

This method improves optimization and learning speed when training neural networks, and it mitigates the problem of “dead units” or “dead neurons” (neurons that consistently generate the same output, typically zero, regardless of input). It works by inserting a network layer between two hidden layers. This “batch norm” layer takes outputs from the first hidden layer in batches, then normalizes and rescales them before passing them on as input to the subsequent hidden layer.
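
A minimal PyTorch sketch (layer sizes are illustrative assumptions):

    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(10, 64),
        nn.BatchNorm1d(64),  # normalize and rescale the 64 activations across each batch
        nn.ReLU(),
        nn.Linear(64, 1),
    )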

ADAM Optimizer

Short for “Adaptive Moment Estimation,” this is an iterative optimization algorithm that minimizes the loss function during training. It is a variant of stochastic gradient descent that keeps running estimates of each gradient’s first and second moments and uses them to set a per-parameter step size, repeatedly adjusting the network’s parameters to improve performance.
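
In PyTorch, using it is a one-line swap for plain stochastic gradient descent (the model, data and learning rate here are illustrative assumptions):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    x, y = torch.randn(32, 10), torch.randn(32, 1)
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()  # Adam scales each parameter's step by its gradient moments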

Continual Backpropagation

Dubbed “continual backprop,” this recently published technique is an extension of the conventional backpropagation algorithm. It works like stochastic gradient descent, but with selective reinitialization of the network’s low-utility hidden units. According to algorithm co-author Dohare, continual backpropagation “overcomes plasticity loss in all the cases we tested,” with recent results showing that it may be one of the most effective methods for mitigating loss of plasticity, potentially allowing AI models to continue learning indefinitely. More documentation here.
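
To give a flavor of the selective-reinitialization step, here is a highly simplified PyTorch sketch. The utility score below (mean absolute activation times total outgoing weight magnitude) is a stand-in assumption, not the paper’s exact measure, and the full algorithm does considerably more bookkeeping:

    import torch

    def reinit_low_utility_units(layer_in, layer_out, acts, frac=0.01):
        # layer_in / layer_out: the Linear layers into and out of a hidden layer;
        # acts: recent activations of that hidden layer, shape (batch, hidden).
        with torch.no_grad():
            utility = acts.abs().mean(dim=0) * layer_out.weight.abs().sum(dim=0)
            k = max(1, int(frac * utility.numel()))
            idx = torch.topk(utility, k, largest=False).indices  # least useful units
            fresh = torch.empty_like(layer_in.weight)
            torch.nn.init.kaiming_uniform_(fresh)
            layer_in.weight[idx] = fresh[idx]  # give those units fresh input weights
            layer_in.bias[idx] = 0.0
            layer_out.weight[:, idx] = 0.0     # reborn units start with no influence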

Utility-Based Perturbed Gradient Descent (UPGD)

This is another recent approach that combines gradient updates with perturbations. Smaller modifications are applied to the network’s more “useful” units to protect them from catastrophic forgetting, while larger ones are applied to less useful units, reviving their plasticity.

While catastrophic forgetting and loss of plasticity are treated as separate (if complementary) issues, UPGD addresses both, says algorithm author Mohammed Elsayed, a researcher at the University of Alberta: “We took on the challenge of addressing both loss of plasticity and catastrophic forgetting using a single algorithm. We use a simple mechanism of identifying the useful parameters and protecting them from drastic changes to prevent the learning system from forgetting useful information and, in turn, address catastrophic forgetting. On the other hand, we identify the least useful parameters and change them a little bit, which improves plasticity since it might be hard to change those parameters if the system experiences loss of plasticity. We found that our approach can tackle both issues of continual learning, namely loss of plasticity and catastrophic forgetting.” The code is available here.
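
As a rough, hypothetical sketch of the idea (not the authors’ exact algorithm; the first-order utility proxy, gating function and noise scale below are all assumptions):

    import torch

    def upgd_step(params, lr=1e-3, noise_std=1e-4):
        with torch.no_grad():
            for p in params:
                if p.grad is None:
                    continue
                # First-order proxy for utility: estimated loss increase if the
                # weight were removed (an assumed stand-in for the paper's measure).
                utility = -(p.grad * p)
                gate = 1.0 - torch.sigmoid(utility)  # near 0 for useful weights
                noise = torch.randn_like(p) * noise_std
                p.add_(-lr * (p.grad + noise) * gate)  # gated, perturbed update

Useful parameters (gate near zero) are barely touched, protecting them from forgetting; less useful parameters receive larger perturbed updates, restoring their plasticity.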

Ultimately, the loss of plasticity in AI models is a vital problem to solve if machines are to continuously learn and adapt to their environments without constant and costly retraining.
