“AI tools are cultural artifacts, not neutral software.”

As content becomes increasingly mechanized, how can we ensure that information remains reliable? EPFL Professor Andrea Cavallaro shares his views.

Few forces have the power to shape our lives as profoundly as information. In our society, publicly shared facts determine how citizens vote, how patients are treated, and how communities respond to crises… But where reliable information strengthens trust and cooperation, corrupted information can erode these bonds.

This has always been true. What has changed dramatically in recent years is the scale and speed at which information is produced, manipulated, and disseminated, a process greatly amplified by artificial intelligence. Today, AI tools generate and distribute text, images, and videos that are increasingly indistinguishable from human-generated content. And if the system doesn’t repair itself, who will?

Andrea Cavallaro, professor and director of the Multimodal Intelligent Systems Laboratory at EPFL, is developing the multimodal systems needed to detect hate speech across text, images, audio, and video. He is also one of the leaders of AlignAI, an ambitious EU-funded PhD network that will train 17 PhD candidates across six universities to embed human values into large language models.

The AlignAI project aims to convey human values to AI systems. What does that actually mean?

The idea is to provide both an intellectual framework and a practical platform for transferring personal and societal values into learning systems. How can we characterize the values and norms that people consider important? How can we define the processes by which these values are transferred into the system? These are the questions we are addressing. At the moment, AI systems are primarily designed by people with engineering backgrounds. But we are no longer dealing with systems that measure physical properties. Large language models (LLMs) interact with humans, and humans are inherently difficult to characterize. As such, AlignAI is primarily staffed by non-technical PhD students, including social scientists, cognitive psychologists, and philosophers.

Values vary widely across cultures, individuals, and contexts… how do you begin to map them?

AlignAI will test its approach across three use cases: education, mental health, and online news consumption. These are domains where the impact of LLMs is already high and the stakes of getting alignment right are very high. One of our PhD students is working across all three domains to develop a conceptualization of value. We start in Europe, which is already a very diverse region, using existing legislation as a starting point, because laws embody values that society considers important enough to be codified. We are working with judges to find the right angle.

People tend to trust software by default. How much of that confidence is warranted for an LLM?

Automation bias is a well-known phenomenon. Because something is software, we may assume it is trustworthy. However, we must remember that these tools are made by people. Someone decided what data to use, how to train the model, and how to fine-tune it to limit unsafe behavior. And what is safe or unsafe is usually determined by a team of engineers, who impose their choices and, by extension, their underlying biases. All of this is passed from the dataset properties to the learned model properties. I call this distributed authorship. The key is to engage users as active auditors, rather than just passive customers, investigating edge cases and questioning value biases. AI tools are not the neutral technological tools of the last century that can be adjusted within known operating conditions. They interact with us, and we shape their behavior through our prompts. By design, they please us to increase engagement. The dynamic is completely new.

What are the challenges with hate speech detection?

Hateful content can be hidden in a variety of formats, including video frames, on-screen text, audio, and spoken word. Meaning may only emerge when these modalities are combined. We developed a system to cross-reference all of these at the same time. But hate speech has also evolved, using coded language, sarcasm, and implicit references. It is a moving target and undermines trust in systems, institutions and democracy.

Isn’t this a bit of a political theme for a technical university?

I would call it cultural rather than political. The AI tools we use every day are cultural products. They are trained on human cultural production, drawn primarily from specific cultures, so there is a huge imbalance. They are compressed versions of digital content created over several decades. Rather than being dry, neutral software, they are containers that absorb humanity’s creations and fluently provide answers that only a few years ago were thought to be the preserve of highly educated humans. Recognizing this changes how we design them, evaluate them, and teach about them. Many of our students will end up building tools that work with humans. They need to understand what it means to co-design with the people who actually use the technology, rather than imposing a techno-solutionist perspective from above.

How does your work reach the tech giants that produce the world’s most popular LLMs?

We practice open science in our research. Our findings are easily accessible as open source, allowing developers to examine them and adopt them if they find them useful. I hope they do. But what gave me extra hope was something I didn’t expect. I was invited to speak at a very large multinational company about what it means to build rights and values into AI tools. I was struck by how much this project resonated, even with a very specialized audience that I didn’t necessarily think would be interested. In addition to AlignAI, there are other research groups around the world interested in designing tools that not only maximize engagement, but also support human flourishing.
