URL: https://thenewstack.io/techniques-for-tackling-catastrophic-forgetting-in-ai-models/

Techniques for Tackling Catastrophic Forgetting in AI Models - The New Stack


Approaches to preventing catastrophic forgetting fall into three categories: regularization, memory-based techniques, and architecture-based methods.
Oct 1st, 2024 11:44am by Kimberley Mok
Image via Unsplash+. 

Despite massive recent leaps forward in machine learning models, experts are still wrangling with the challenge of ensuring that models don’t forget previously learned knowledge, especially while they are learning something new.

This problem is known as catastrophic forgetting, or catastrophic interference. It occurs when the weights of an artificial neural network are optimized for a new task, and those updates interfere with prior knowledge that is stored in the same weights. As an AI model trains on new inputs, the statistical relationships between its internal representations can shift, mix or overlap, potentially leading to reduced performance on earlier tasks (sometimes loosely called “model drift”) or, at worst, to the model abruptly and drastically forgetting its prior training.
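
To make the mechanism concrete, here is a toy sketch that is not from the article: a model with a single weight is trained with gradient descent on a hypothetical task A (y = 2x) and then on a conflicting hypothetical task B (y = -x). Because both tasks share the same weight, fitting task B overwrites the value that solved task A. The tasks, learning rate and function names are illustrative assumptions.

```python
def mse(w, data):
    # Mean squared error of the one-weight model y_hat = w * x
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(w, data, lr=0.05, epochs=200):
    # Full-batch gradient descent on the mean squared error
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


xs = [-2.0, -1.0, 0.5, 1.0, 2.0]
task_a = [(x, 2.0 * x) for x in xs]    # hypothetical task A: y = 2x
task_b = [(x, -1.0 * x) for x in xs]   # hypothetical task B: y = -x

w = train(0.0, task_a)
loss_a_before = mse(w, task_a)
w = train(w, task_b)                   # keep training the same weight on task B
loss_a_after = mse(w, task_a)

print(f"task A loss after training on A: {loss_a_before:.4f}")
print(f"task A loss after training on B: {loss_a_after:.4f}")
```

The task A loss is near zero after the first phase and large after the second, even though nothing about task A changed: the shared weight simply moved to where task B wanted it.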

Causes of Catastrophic Forgetting

A number of factors can lead a model to ‘forget’. These include overfitting the model to new training data, limited model capacity, parameters shared across tasks, a training technique that is ill-suited to the task, and a lack of regularization.

Nevertheless, some experts point out that the exact mechanisms behind catastrophic forgetting aren’t yet well understood.

“While there are a lot of studies in the field of continual learning investigating how to address catastrophic forgetting experimentally through algorithm design, there is still a lack of understanding on what factors are important and how they affect catastrophic forgetting,” explained Sen Lin, an assistant professor in the University of Houston’s computer science department and the co-author of a recent study on the effect of catastrophic forgetting on continual learning. “Our study filled these gaps up by revealing three important factors: model over-parameterization, task similarity, and task ordering, and their impacts on learning performance.”

Tackling Catastrophic Forgetting

In general, approaches to prevent catastrophic forgetting fall into three broad categories: regularization, memory-based techniques, and architecture-based methods.

Regularization techniques constrain how much the model can change when it is trained on a new task, typically by protecting the weights (or outputs) that matter most for old tasks. These include:

  • Elastic Weight Consolidation (EWC): A technique that estimates how important each of a model’s weights is to previously learned tasks, using the Fisher information, and penalizes large changes to the most important weights, thus incentivizing the model to retain pre-existing knowledge.
  • Synaptic Intelligence (SI): This method provides an adaptive safeguard against forgetting by tracking, during training, how much each weight contributes to improving the model’s performance, and then protecting the weights that were critical to earlier tasks, thus striking a balance between old and new knowledge.
  • Learning Without Forgetting (LwF): An early and widely cited method for mitigating catastrophic forgetting, LwF is an incremental learning approach that combines knowledge distillation with fine-tuning: while learning a new task, the model is also trained to reproduce the original network’s outputs on the new task’s data, so that original knowledge is retained.
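
The EWC penalty can be sketched in a few lines. This is a toy illustration under stated assumptions, not the original paper’s code: the two-weight linear model, the two tasks, the penalty strength `lam=10.0` and the helper names are hypothetical, and the diagonal Fisher information is computed for a unit-variance Gaussian likelihood, where for a linear model it reduces to the mean of each squared input. SI and LwF are not shown.

```python
def predict(w, x):
    # Hypothetical two-weight linear model
    return w[0] * x[0] + w[1] * x[1]


def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def mse_grad(w, data):
    # Gradient of the mean squared error with respect to each weight
    g = [0.0, 0.0]
    for x, y in data:
        r = predict(w, x) - y
        for i in range(2):
            g[i] += 2 * r * x[i] / len(data)
    return g


def diag_fisher(data):
    # Diagonal Fisher information of a unit-variance Gaussian likelihood;
    # for a linear model this is the mean of x_i squared
    return [sum(x[i] ** 2 for x, _ in data) / len(data) for i in range(2)]


def train(w, data, lr=0.02, epochs=500, anchor=None, fisher=None, lam=0.0):
    w = list(w)
    for _ in range(epochs):
        g = mse_grad(w, data)
        if anchor is not None:
            # Gradient of the EWC penalty (lam / 2) * sum_i F_i * (w_i - anchor_i) ** 2
            for i in range(2):
                g[i] += lam * fisher[i] * (w[i] - anchor[i])
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


# Task A only uses the first feature: y = 2 * x1
task_a = [((1.0, 0.0), 2.0), ((-1.0, 0.0), -2.0), ((2.0, 0.0), 4.0), ((-2.0, 0.0), -4.0)]
# Task B uses both features and conflicts with A on the first weight: y = -x1 + 3 * x2
task_b = [((x1, x2), -x1 + 3 * x2) for x1, x2 in [(1, 1), (-1, 1), (1, -1), (-1, -1)]]

w_a = train([0.0, 0.0], task_a)
fisher_a = diag_fisher(task_a)   # [2.5, 0.0]: w1 matters for task A, w2 does not

w_plain = train(w_a, task_b)                                       # no protection
w_ewc = train(w_a, task_b, anchor=w_a, fisher=fisher_a, lam=10.0)  # with EWC penalty

print("task A loss, plain fine-tuning:", round(mse(w_plain, task_a), 3))
print("task A loss, with EWC:         ", round(mse(w_ewc, task_a), 3))
```

Because task A never uses the second weight, its Fisher value is zero, so the penalty leaves that weight free to fit task B while anchoring the first weight near its task A value.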

Architecture-based techniques modify the model architecture, either by “freezing” the parameters critical to old tasks while making room for new task learning, or by increasing model size when more capacity is required. These encompass methods such as:

  • Progressive Neural Networks (PNN or ProgNets): a column-based approach where a separate neural network column is trained for each task; earlier columns are frozen, and lateral connections carry what they learned into the new column, transferring knowledge from previously learned tasks instead of overwriting it.
  • Expert Gate Modules: this “network-within-a-network” approach trains a separate “expert” sub-network for each task, alongside a small auto-encoder per task that acts as a gate. After training, each expert’s parameters are “frozen”; at test time, the auto-encoder that best reconstructs an input decides which expert handles it, while a shared “backbone” or base of knowledge is retained.
  • Dynamic Expandable Networks (DEN): this technique lets the model decide how much network capacity it needs, adding new artificial neurons and connections for each new task as required and ‘pruning’ any redundant ones.
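
A heavily simplified sketch of the progressive-network idea follows, under simplifying assumptions: each column has a single tanh hidden unit, lateral connections feed the earlier columns’ hidden activations into the new column’s output, and the class and method names (`ProgressiveNet`, `add_column`, `train`) are hypothetical. The published method uses deep columns and adapter layers; this only illustrates the freeze-and-connect structure.

```python
import math


class ProgressiveNet:
    """Hypothetical toy PNN: one column per task, each with a single tanh hidden unit."""

    def __init__(self):
        self.columns = []  # one parameter dict per task

    def add_column(self):
        # Freeze all existing columns, then add a trainable column whose output
        # gets one lateral weight per earlier column
        for col in self.columns:
            col["frozen"] = True
        self.columns.append({"w": 0.5, "b": 0.0, "v": 0.5,
                             "u": [0.0] * len(self.columns), "frozen": False})

    def hidden(self, k, x):
        c = self.columns[k]
        return math.tanh(c["w"] * x + c["b"])

    def predict(self, k, x):
        # Own hidden unit plus lateral input from every earlier column's hidden unit
        c = self.columns[k]
        lateral = sum(u * self.hidden(j, x) for j, u in enumerate(c["u"]))
        return c["v"] * self.hidden(k, x) + lateral

    def loss(self, k, data):
        return sum((self.predict(k, x) - y) ** 2 for x, y in data) / len(data)

    def train(self, k, data, lr=0.05, epochs=1000):
        c = self.columns[k]
        if c["frozen"]:
            raise ValueError(f"column {k} is frozen")
        n = len(data)
        for _ in range(epochs):
            gw = gb = gv = 0.0
            gu = [0.0] * len(c["u"])
            for x, y in data:
                h = self.hidden(k, x)
                r = self.predict(k, x) - y
                dz = 2 * r * c["v"] * (1 - h * h) / n  # backprop through tanh
                gw += dz * x
                gb += dz
                gv += 2 * r * h / n
                for j in range(len(gu)):
                    gu[j] += 2 * r * self.hidden(j, x) / n
            # Gradient step on the active column only; earlier columns stay fixed
            c["w"] -= lr * gw
            c["b"] -= lr * gb
            c["v"] -= lr * gv
            c["u"] = [u - lr * g for u, g in zip(c["u"], gu)]


xs = [i / 10 for i in range(-20, 21)]
task_a = [(x, math.tanh(2 * x)) for x in xs]
task_b = [(x, -0.8 * math.tanh(2 * x)) for x in xs]

net = ProgressiveNet()
net.add_column()
net.train(0, task_a)
preds_a = [net.predict(0, x) for x, _ in task_a]

net.add_column()                          # freezes column 0
loss_b_before = net.loss(1, task_b)
net.train(1, task_b)
loss_b_after = net.loss(1, task_b)

unchanged = preds_a == [net.predict(0, x) for x, _ in task_a]
print("task A predictions unchanged:", unchanged)
print(f"task B loss: {loss_b_before:.3f} -> {loss_b_after:.3f}")
```

Task A cannot be forgotten here because its column is never updated again; the cost is that the model grows by one column per task.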

Memory-based techniques store information about old tasks in some form of memory, which the model can then “replay” while learning the current task.

  • Memory Replay: Models retain subsets of previous training data that are used to periodically retrain the model, which helps to “remind” it of past information.
  • Generative Replay: Synthetic samples that imitate previous data sets are produced by a generative model, such as a generative adversarial network (GAN), and are used to reinforce the model’s prior learning. One drawback is that generated data is typically lower in quality than the original.
  • Memory-Augmented Networks: Models are equipped with external memory modules that enhance their ability to store and retrieve prior learning, thus helping to prevent forgetting.
  • Wake-Sleep Consolidated Learning (WSCL): According to Professor Concetto Spampinato of the University of Catania’s PeRCeiVe.AI Lab, one of the authors of a recent study on WSCL, this is a biologically inspired method that “mimics the brain’s wake-sleep cycle. During the sleep phase, WSCL not only replays memories but also dreams — simulating new experiences — helping the model adapt to future tasks more effectively. This dreaming feature is unique and makes our approach more dynamic than simply storing old data.”
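
A minimal memory replay sketch follows, assuming a fixed-size buffer. The article does not say how the stored subset is chosen, so this example uses reservoir sampling, one common choice, and mixes the stored task A examples into training on task B. The one-weight model, the tasks and the names (`ReplayBuffer`, `train`) are illustrative assumptions. Generative replay would replace `buffer.items` with samples drawn from a generative model.

```python
import random


class ReplayBuffer:
    """Hypothetical fixed-capacity memory filled by reservoir sampling, so every
    example seen so far has the same chance of being kept."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)  # uniform over all examples seen so far
            if j < self.capacity:
                self.items[j] = item


def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(w, data, lr=0.1, epochs=300):
    # Full-batch gradient descent for the one-weight model y_hat = w * x
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


xs = [i / 10 for i in range(-20, 21) if i != 0]
task_a = [(x, 2.0 * x) for x in xs]    # old task: y = 2x
task_b = [(x, -1.0 * x) for x in xs]   # new, conflicting task: y = -x

w = train(0.0, task_a)
buffer = ReplayBuffer(capacity=8)
for example in task_a:                 # remember a few task A examples
    buffer.add(example)

w_plain = train(w, task_b)                   # sequential training, no replay
w_replay = train(w, task_b + buffer.items)   # replay stored examples alongside task B

print(f"task A loss without replay: {mse(w_plain, task_a):.3f}")
print(f"task A loss with replay:    {mse(w_replay, task_a):.3f}")
```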

It is also possible to go further with a hybrid approach, in which two or more of the aforementioned methods are combined to offset the limitations of any single one. For instance, variational continual learning (VCL) uses online variational inference, keeping the distribution over the model’s weights close to the one learned on earlier tasks (a regularization-style constraint related in spirit to EWC), and pairs it with a small “coreset” of stored examples that is revisited during training.
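
The general hybrid idea can be illustrated by combining two of the simpler ingredients described earlier: an EWC-style quadratic penalty and replay of a handful of stored examples. This toy is not an implementation of VCL; the one-weight model, the tasks, `lam=1.0` and the choice of which examples to store are assumptions for illustration.

```python
def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def fit(w, data, anchor=0.0, fisher=0.0, lam=0.0, lr=0.05, epochs=1000):
    # Gradient descent on mse(data) + (lam / 2) * fisher * (w - anchor) ** 2
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        grad += lam * fisher * (w - anchor)
        w -= lr * grad
    return w


xs = [i / 10 for i in range(-20, 21) if i != 0]
task_a = [(x, 2.0 * x) for x in xs]    # old task: y = 2x
task_b = [(x, -1.0 * x) for x in xs]   # new, conflicting task: y = -x

w_a = fit(0.0, task_a)
fisher_a = sum(x * x for x in xs) / len(xs)   # diagonal Fisher, unit-variance Gaussian
memory = task_a[::8]                          # a few stored task A examples

runs = {
    "plain": fit(w_a, task_b),
    "ewc": fit(w_a, task_b, anchor=w_a, fisher=fisher_a, lam=1.0),
    "replay": fit(w_a, task_b + memory),
    "hybrid": fit(w_a, task_b + memory, anchor=w_a, fisher=fisher_a, lam=1.0),
}
for name, w in runs.items():
    print(f"{name:>6}: w = {w:+.3f}, task A loss = {mse(w, task_a):.3f}")
```

In this setup the combined run keeps the weight closest to its task A value, because both the penalty and the replayed examples pull toward the old solution while task B pulls away from it.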

Despite this wide range of potential solutions, a universal solution to catastrophic forgetting has yet to be found. With AI models becoming ever larger, more complex and more versatile, catastrophic forgetting remains a crucial obstacle to overcome in the quest for continual learning.

Kimberley Mok is a tech and design reporter who covers artificial intelligence, robotics, quantum computing, tech culture and science stories for The New Stack. Trained as an architect, she is also an illustrator and multidisciplinary designer who has been passionate...