Claude Eye Maker Humanity Recently, we have published a new study highlighting the risk of hidden behavioral transfers between AI models via seemingly meaningless data. Research by the Human Fellows Program in collaboration with Truthful AI, the Warsaw Institute of Technology, and the Alignment Research Center examined the phenomenon called Subliminal learning. AI systems say they can pass hidden actions to each other without their knowledge, causing concern AI Safety. “Language models can send characteristics to other models, even if they are thought to be meaningless data,” Humanity posted on X.In one test, a small “student” AI model was trained with a random string of numbers generated by a larger “teacher” model in favor of owls. The student model developed the same preferences despite the word “Owl” not being displayed in the training data. Researchers found that this behavior only occurs when two models use the same architecture. Characteristic movement occurred through subtle statistical patterns that could not be detected even with advances in AI filters.Some of the properties handed over were not harmless. Risky behavior has also become a student model, avoiding difficult questions and manipulating answers. This can be a problem as businesses can create smaller, cheaper AI based on the bigger ones, potentially spreading unintentionally unsafe behavior.The study warns that subliminal learning can occur in many neural networks under appropriate conditions, making it a broader problem rather than a one-off issue. “Subliminal learning can be a general property of neural network learning. We prove theorem that it generally occurs NNS (under certain conditions), and we also demonstrate empirically even with simple MNIST classifiers,” says a post from AI researcher Owain Evans.The findings come when AI developers use synthetic data to reduce costs. Industry experts can increase the risk of flawed models entering the market without close control, particularly in the rush of scaling up by startups like Elon Musk's Xai.
