
Machine learning research aims to learn representations that support strong performance on downstream tasks. A growing subfield tries to interpret the role these representations play in model behavior, or to modify them to improve alignment, interpretability, or generalization. Neuroscience similarly studies the relationship between neural representations and behavior. Both fields seek to understand or improve the computations a system performs, meaning the abstract patterns of behavior on a task, and how those computations are implemented. The relationship between representation and computation, however, is complex and far from straightforward.
Over-parameterized deep networks often generalize well despite their capacity to memorize the training data, which suggests that their architecture and gradient-based learning dynamics carry an implicit inductive bias towards simplicity. Such a bias may make simpler features easier to learn and may also shape how more complex features are represented internally. Learned representations tend to favor simple and common features, and the bias is influenced by factors such as feature prevalence and, in transformers, position in the output sequence. Studies of shortcut learning and disentangled representations highlight how these biases affect network behavior and generalization.
In this study, DeepMind researchers investigate dissociations between representation and computation by constructing datasets in which features play matched computational roles while their properties are manipulated. A variety of deep learning architectures are trained to compute multiple abstract features from their inputs. The results show systematic biases in feature representation that depend on properties such as feature complexity, learning order, and feature distribution. Simpler or earlier-learned features are represented more strongly than complex or later-learned ones. These biases are also influenced by architecture, optimizer, and training regime; transformers, for example, favor features that are decoded earlier in the output sequence.
Their approach trains a network to classify multiple features of an input, either through separate output units (e.g., in an MLP) or as an output sequence (e.g., in a Transformer). The datasets are constructed so that the features are statistically independent, and the models achieve high accuracy (>95%) on a held-out test set, confirming that they compute the features correctly. The study investigates how properties such as feature complexity, prevalence, and position in the output sequence affect feature representation. A family of training datasets is created to manipulate these properties systematically, with corresponding validation and test datasets to confirm that the models generalize as expected.
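To make the dataset idea concrete, here is a minimal sketch of how two statistically independent features of different complexity could be generated. It is an illustration only, not the construction used in the paper: the function name `make_dataset`, the choice of a single-bit "simple" feature, and the four-bit parity "complex" feature are all assumptions made for this example.

```python
import numpy as np

def make_dataset(n_samples, n_bits=8, seed=0):
    """Hypothetical dataset: binary inputs with one simple and one complex label."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(n_samples, n_bits))
    # Simple feature (assumed for illustration): the value of one input bit,
    # which is linearly decodable from the input.
    easy = x[:, 0]
    # Complex feature (assumed for illustration): parity (XOR) of four other
    # bits, which is not linearly decodable from the input.
    hard = x[:, 1:5].sum(axis=1) % 2
    return x.astype(np.float32), easy, hard

x, easy, hard = make_dataset(20000)

# Independence check: the joint frequency should be close to the product of
# the marginals, because the two labels depend on disjoint, uniform input bits.
joint = np.mean((easy == 1) & (hard == 1))
product = easy.mean() * hard.mean()
print(f"P(easy=1, hard=1) = {joint:.3f}; P(easy=1) * P(hard=1) = {product:.3f}")
```

Because both labels are computed from disjoint sets of independently sampled bits, neither label carries information about the other, so a model that gets both right must compute each one separately.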
Training different deep learning architectures to compute multiple abstract features reveals systematic biases in their feature representations. These biases depend on extraneous properties such as feature complexity, learning order, and feature distribution. Simpler or earlier-learned features are represented more strongly than complex or later-learned features, even when all of them are learned equally well. The choice of architecture, optimizer, and training regime also affects these biases. The findings help characterize the inductive biases of gradient-based representation learning, and they highlight how difficult it is to separate extraneous biases from the computationally important aspects of a representation, both for interpretability and for comparisons with brain representations.
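One simple way to quantify how strongly a feature is represented is the fraction of variance in a layer's activations that the feature explains. The sketch below uses that measure on synthetic activations; it is an assumption chosen for illustration, not necessarily the analysis used in the paper. The helper `variance_explained` and the synthetic data (a "simple" feature written with a large coefficient and a "complex" feature with a small one) are hypothetical.

```python
import numpy as np

def variance_explained(reps, feature):
    """Fraction of total activation variance explained by a discrete feature.

    Computed as between-group sum of squares over total sum of squares,
    which equals the R^2 of regressing the activations on the feature.
    """
    reps = np.asarray(reps, dtype=np.float64)
    centered = reps - reps.mean(axis=0)
    total = np.sum(centered ** 2)
    between = 0.0
    for value in np.unique(feature):
        mask = feature == value
        group_mean = centered[mask].mean(axis=0)
        between += mask.sum() * np.sum(group_mean ** 2)
    return between / total

# Synthetic stand-in for hidden activations (not real model outputs):
# both features are present, but one is written with a larger magnitude.
rng = np.random.default_rng(0)
n, d = 5000, 16
easy = rng.integers(0, 2, size=n)
hard = rng.integers(0, 2, size=n)
dir_easy = rng.normal(size=d)
dir_easy /= np.linalg.norm(dir_easy)
dir_hard = rng.normal(size=d)
dir_hard /= np.linalg.norm(dir_hard)
reps = (3.0 * np.outer(easy, dir_easy)
        + 1.0 * np.outer(hard, dir_hard)
        + 0.1 * rng.normal(size=(n, d)))

r2_easy = variance_explained(reps, easy)
r2_hard = variance_explained(reps, hard)
print(f"variance explained: easy = {r2_easy:.2f}, hard = {r2_hard:.2f}")
```

In this toy setup both features are perfectly decodable from the activations, yet one accounts for far more of the variance. That is the kind of gap between "computed correctly" and "represented strongly" that the article describes.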
In summary, the researchers trained deep learning models to compute multiple features of their inputs and found that the resulting representations are substantially biased. These biases depend on feature properties such as complexity, learning order, prevalence in the dataset, and position in the output sequence, and they may be related to the implicit inductive biases of deep learning. In practice, they make it harder to interpret learned representations and to compare them across systems in machine learning, cognitive science, and neuroscience.
Please check out the paper. All credit for this work goes to the researchers of this project.

Asjad is an Intern Consultant at Marktechpost. He is pursuing a B.Tech in Mechanical Engineering at the Indian Institute of Technology Kharagpur. Asjad is an avid advocate of machine learning and deep learning and is constantly exploring applications of machine learning in healthcare.
