In a world where change is the only constant, why do we need continual learning for AI models?


Imagine you have a small robot designed to roam your garden and water your plants. You first invest a significant amount of time and resources into collecting data, training and testing the robot over a period of several weeks. The robot learns to navigate your garden efficiently when the ground is covered with grass and bare soil.
But after a few weeks, as flowers begin to bloom, the garden changes dramatically. A robot trained on data from a different season no longer perceives its surroundings accurately and struggles to complete its tasks. To fix this, the model needs to be updated with new examples of blooming gardens.
The first thought is to add the new examples to the training set and retrain the model from scratch. However, this is costly, and we don't want to do it every time the environment changes. Moreover, we may find that not all of the past training data is still available.
Next we might consider fine-tuning the model on the new samples, but this is risky: the model may lose some of the capabilities it previously learned. This is called catastrophic forgetting: a situation in which previously acquired knowledge and skills are lost when the model learns new information.
So is there an alternative? Yes: continual learning!
Of course, a robot watering plants in a garden is just one example of the problem; we'll look at more real-world applications later in the text.
Adaptive learning with continual learning (CL)
It is impossible to predict and prepare for every scenario a model may face in the future, so the right option is often to adaptively train the model as new samples arrive.
In CL, the model has to balance two properties: stability and plasticity. Stability is the ability of a model to retain previously learned information, while plasticity is its ability to adapt to new information when new tasks are introduced.
“(…) in a continual learning scenario, a learning model needs to incrementally build and dynamically update its internal representation as the distribution of tasks changes dynamically over its lifetime.” [2]
But how do we control stability and plasticity?
Researchers have identified several ways to build adaptive models, and the following categories have been established [3]:
1. Regularization-based approach
- In this approach, we add a regularization term to the loss that balances the influence of old and new tasks on the model.
- For example, with weight regularization, we aim to control parameter variation by adding a penalty term to the loss function, which penalizes parameter changes according to how much each parameter contributed to previous tasks.
2. Replay-based approach
- Methods from this group focus on recovering parts of the historical data to ensure that the model can solve previous tasks. One limitation of this approach is that it requires access to the historical data, which is not always possible.
- For example, with experience replay, we store and replay examples from old training data. When training on a new task, some examples from previous tasks are added to the batch, exposing the model to a mixture of old and new task types and limiting catastrophic forgetting.
3. Optimization-based approach
- Here, we manipulate the optimization procedure itself to mitigate the effects of catastrophic forgetting while still maintaining performance across all tasks.
- For example, with gradient projection, the gradients calculated for a new task are projected so that the update does not interfere with directions that were important for previous tasks.
4. Representation-based approach
- This set of methods focuses on obtaining and using robust feature representations to avoid catastrophic forgetting.
- For example, with self-supervised learning, a model can learn a robust representation of the data before it is trained on a specific task. The idea is to learn high-quality features that generalize well across the different tasks the model may encounter in the future.
5. Architecture-based approach
- While the previous methods assume a single model with a single parameter space, CL also includes many techniques that exploit the architecture of the model.
- For example, with parameter allocation, each new task is given a dedicated parameter subspace in the network during training, eliminating destructive interference between tasks. However, if the network's size is not fixed, it will grow with the number of new tasks.
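To make the weight-regularization idea concrete, here is a minimal NumPy sketch of an EWC-style quadratic penalty. The function names and the toy importance values are illustrative assumptions, not code from any of the cited papers:

```python
import numpy as np

def ewc_penalty(params, old_params, importance, lam=1.0):
    """EWC-style quadratic penalty: large `importance` values anchor a
    parameter to the value it had after training on the old task."""
    return 0.5 * lam * np.sum(importance * (params - old_params) ** 2)

def total_loss(new_task_loss, params, old_params, importance, lam=1.0):
    """Loss on the new task plus the stability penalty."""
    return new_task_loss + ewc_penalty(params, old_params, importance, lam)

old_params = np.array([1.0, -2.0, 0.5])
importance = np.array([10.0, 0.1, 0.0])  # e.g. a diagonal Fisher estimate
params     = np.array([1.2, -1.0, 3.0])

# Moving the important first parameter is penalized heavily, while the
# third parameter (importance 0) can change freely.
print(ewc_penalty(params, old_params, importance))
```

In practice the importance weights come from the old task itself (for example, an estimate of the Fisher information), so the penalty protects exactly the parameters that mattered before.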
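Experience replay can likewise be sketched in a few lines. The `ReplayBuffer` class and the 50% replay ratio below are illustrative assumptions: a small buffer keeps examples from old tasks (here via reservoir sampling, so every example seen has an equal chance of being kept) and mixes them into each new-task batch:

```python
import random

class ReplayBuffer:
    """Fixed-size store of examples from previous tasks."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # Reservoir sampling: keep each seen example with equal probability.
            i = self.rng.randrange(self.seen)
            if i < self.capacity:
                self.buffer[i] = example

    def sample(self, k):
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))

def mixed_batch(new_examples, buffer, replay_ratio=0.5):
    """Combine new-task examples with replayed old-task examples."""
    n_replay = int(len(new_examples) * replay_ratio)
    return new_examples + buffer.sample(n_replay)

buffer = ReplayBuffer(capacity=100)
for x in range(1000):                      # examples from an old task A
    buffer.add(("task_A", x))

batch = mixed_batch([("task_B", x) for x in range(10)], buffer)
print(len(batch))                          # 10 new + 5 replayed examples
```

The model then trains on `batch` instead of the new examples alone, which is what limits forgetting of task A.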
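The gradient-projection step can also be made concrete. This sketch is similar in spirit to A-GEM (the toy gradient vectors are assumptions for illustration): when the new-task gradient points against the old-task gradient, its conflicting component is removed before the update.

```python
import numpy as np

def project_gradient(g_new, g_old):
    """If the new-task gradient conflicts with the old-task gradient
    (negative dot product), remove its component along g_old so the
    update does not undo progress on the old task."""
    dot = np.dot(g_new, g_old)
    if dot >= 0:
        return g_new                       # no conflict, keep as-is
    return g_new - dot / np.dot(g_old, g_old) * g_old

g_old = np.array([1.0, 0.0])               # gradient direction of an old task
g_new = np.array([-1.0, 1.0])              # conflicts with g_old
g_proj = project_gradient(g_new, g_old)
print(g_proj)                              # conflicting component removed
```

After projection, taking a step along `g_proj` no longer increases the loss of the old task to first order, because the projected gradient has a non-negative dot product with `g_old`.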
How can we evaluate the performance of a CL model?
The basic performance of a CL model can be measured from several angles [3]:
- Overall performance: the average performance across all tasks
- Memory stability: the difference between a task's current performance and its maximum performance before further continual training
- Learning plasticity: the difference between the performance of joint training (training on all data at once) and training with CL
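These three evaluations are often computed from an accuracy matrix, where entry (i, j) is the accuracy on task j after training up to task i. The sketch below is hedged: exact definitions vary across papers, and the numbers are made up for illustration.

```python
import numpy as np

# acc[i, j]: accuracy on task j after finishing training on task i.
acc = np.array([
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.70, 0.80, 0.88],
])
joint = np.array([0.92, 0.90, 0.90])  # joint training on all data at once

# Overall performance: average accuracy over all tasks after the last one.
avg_acc = acc[-1].mean()

# Memory stability (forgetting): drop from each old task's best accuracy
# to its accuracy after the final task.
forgetting = np.mean(
    [acc[:-1, j].max() - acc[-1, j] for j in range(acc.shape[1] - 1)]
)

# Learning plasticity: gap between joint training and CL on each new task.
plasticity_gap = np.mean([joint[i] - acc[i, i] for i in range(len(joint))])

print(avg_acc, forgetting, plasticity_gap)
```

A good CL method keeps `forgetting` (stability) and `plasticity_gap` (plasticity) small at the same time, which is exactly the trade-off described above.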
So why don’t all AI researchers switch to continual learning right away?
If you have access to the past training data and are not concerned about computational cost, it may simply seem easier to retrain from scratch.
One reason is that there is still limited interpretability of what happens to a model during continual training. If training from scratch produces results as good as or better than continual training, people may prefer the simpler approach of retraining rather than spend time diagnosing performance issues with CL methods.
Furthermore, current research tends to focus on evaluating models on synthetic incremental benchmarks, which may not fully reflect real-world use cases where tasks and data distributions evolve naturally [6].
Lastly, many papers on CL focus on the cost of storage rather than computation, yet in reality storing historical data is much less costly and energy-intensive than retraining a model [4].
If there were more focus on the computational and environmental costs of retraining models, more people might become interested in improving the current state of the art in CL methods, as they would see measurable benefits. For example, retraining a recent large-scale model can require on the order of 10,000 GPU-days [4].
Why should we work on improving CL methods?
Continual learning aims to address one of the most challenging bottlenecks of current AI models: the fact that data distributions change over time. Retraining is costly and computationally intensive, making it neither an economically nor an environmentally sustainable approach. In the future, well-developed CL methods may therefore result in models that are more accessible and reusable for a larger community of people.
As summarized in [4], there is a list of applications that essentially require, or could benefit from, well-developed CL methods:
1. Model editing
- Selectively editing the error-prone parts of a model without damaging the rest of it. Continual learning techniques can help correct model errors continuously at a much lower computational cost.
2. Personalization and specialization
- A generic model may need to be customized for a specific user. Continual learning allows updating only a few parameters without introducing catastrophic forgetting into the model.
3. On-device learning
- Given the limited memory and computational resources on small devices, an efficient way to train models in real time as new data arrives, without having to start from scratch, could be useful in this field.
4. Faster retraining with warm starts
- Models need to be updated when new samples become available or when the distribution shifts significantly. Continual learning makes this process more efficient by updating only the parts affected by the new samples, instead of retraining from scratch.
5. Reinforcement Learning
- In reinforcement learning, agents interact with a non-stationary environment, so efficient continual learning methods and approaches can be useful for this use case.
Learn more
As you can see, there is still a lot of room for improvement in continual learning methods. If you're interested, here are some resources to get you started:
- Introductory course: [Continual Learning Course] Lecture #1: Introduction and Motivation, from ContinualAI on YouTube: https://youtu.be/z9DDg2CJjeE?si=j57_qLNmpRWcmXtP
- A position paper on the motivation for continual learning: Continual Learning: Applications and the Way Forward [4]
- A survey of the state of the art in continual learning: A Comprehensive Survey of Continual Learning: Theory, Method and Application [3]
If you have any questions or comments, please feel free to post in the comments section.
Cheers!
[1] Awasthi, A., & Sarawagi, S. (2019). Continual Learning with Neural Networks: A Review. In Proceedings of the ACM India Joint International Conference on Data Science and Management of Data (pp. 362–365). Association for Computing Machinery.
[2] ContinualAI Wiki, Introduction to Continual Learning. https://wiki.continualai.org/the-continualai-wiki/introduction-to-continual-learning
[3] Wang, L., Zhang, X., Su, H., & Zhu, J. (2024). A Comprehensive Survey of Continual Learning: Theory, Method and Application. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8), 5362–5383.
[4] Verwimp, E., Aljundi, R., Ben-David, S., Bethge, M., Cossu, A., Gepperth, A., Hayes, T. L., Hüllermeier, E., Kanan, C., Kudithipudi, D., Lampert, C. H., Mundt, M., Pascanu, R., Popescu, A., Tolias, A. S., van de Weijer, J., Liu, B., Lomonaco, V., Tuytelaars, T., & van de Ven, G. M. (2024). Continual Learning: Applications and the Way Forward. https://arxiv.org/abs/2311.11908
[5] Awasthi, A., & Sarawagi, S. (2019). Continual Learning with Neural Networks: A Review. In Proceedings of the ACM India Joint International Conference on Data Science and Management of Data (pp. 362–365). Association for Computing Machinery.
[6] Garg, S., Farajtabar, M., Pouransari, H., Vemulapalli, R., Mehta, S., Tuzel, O., Shankar, V., & Faghri, F. (2024). TiC-CLIP: Continual Training of CLIP Models.