Biomedical researchers can use advanced RNA-sequencing techniques to measure gene activity across millions of single cells and create detailed maps of tissues, organs, and diseases. Analyzing these datasets requires a rare combination of skills: a deep understanding of biology and the ability to develop computer code that turns data into insights. What if we could equip biomedical researchers with an AI assistant that looks at their data and supports their analysis, knows a lot about biology, and is easy to talk to? This could give scientists a virtual AI-based colleague with both biological and bioinformatics expertise to support their research.
Towards this goal, researchers led by Christoph Bock, principal investigator at the CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences and professor at the Medical University of Vienna, have developed CellWhisperer. CellWhisperer is an AI method and software tool that links gene expression data with explanatory text across more than 1 million biological samples. It reduces the need to write complex computer code and provides an AI chat box for exploring complex biology in plain English. The research, published in Nature Biotechnology, shows how AI creates new ways for scientists to interact with data when studying the biological basis of disease.
From genes to text and vice versa
CellWhisperer applies multimodal deep learning to gene activity profiles paired with matching biological text descriptions, which the authors curated from public databases with the help of AI models. Combining these two data modalities makes it possible to search large datasets with text-based queries such as “Show me the immune cells from the inflamed colon of a patient with an autoimmune disease.”
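The idea of searching expression data with a text query can be illustrated with a minimal sketch. It assumes, hypothetically, that a trained model has already mapped both text and gene expression profiles to vectors in one shared embedding space. The vectors, sample names, and the functions `cosine_similarity` and `rank_samples` below are invented for illustration and are not CellWhisperer's actual code or API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_samples(query_embedding, sample_embeddings):
    """Return (sample name, score) pairs, best match first."""
    scores = {
        name: cosine_similarity(query_embedding, embedding)
        for name, embedding in sample_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical 3-dimensional embeddings of expression profiles.
# A real model would use far more dimensions and learn them from data.
sample_embeddings = {
    "colon_immune_cell": [0.9, 0.1, 0.2],
    "heart_muscle_cell": [0.1, 0.9, 0.1],
    "neuron": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a text query about immune cells in the colon.
query_embedding = [0.8, 0.2, 0.1]

ranking = rank_samples(query_embedding, sample_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the sample whose vector points in nearly the same direction as the query vector ranks first. That is the general principle behind retrieval in a shared text and expression embedding space, whatever model produces the vectors.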
CellWhisperer’s multimodal AI further integrates a large language model trained to emulate discussions between biologists and bioinformaticians during data analysis. Chatting with CellWhisperer is therefore a bit like talking to a bioinformatics colleague, one that draws on CellWhisperer’s view of the biological data and on the biological knowledge of its large language model. For example, users can ask CellWhisperer which genes are active in cells of interest and have the model comment on their potential biological effects. CellWhisperer is built into an easy-to-use web front end based on the popular CELLxGENE browser and is freely accessible online: https://cellwhisperer.bocklab.org.
“By training on experimental data from 20,000 studies over the past 20 years, CellWhisperer learned about the biological roles of genes and cells,” explains co-first author Moritz Schaefer. He is a former postdoctoral fellow in Christoph Bock’s research group at CeMM and is currently at Stanford University. “In this way, CellWhisperer is ready to analyze new single-cell RNA-seq data from many fields, making exploration of biomedical data easier and more exciting.”
A step towards becoming an AI research agent
To illustrate CellWhisperer’s potential for biological discovery, the research team applied it to single-cell RNA-seq data from human embryonic development. Using basic queries such as “heart” and “brain,” the model identified developmental time points, cell populations, and marker genes associated with human organogenesis. Many of these markers matched known developmental genes, while others pointed to previously overlooked candidates.
“CellWhisperer not only facilitates biomedical research, but also helps us understand what’s going on inside the cells we’re studying,” says co-first author Peter Peneder of the St. Anna Children’s Cancer Research Institute.
“Science is teamwork, and with CellWhisperer, we have brought an AI research assistant to our team. CellWhisperer is extremely useful for exploratory research, such as getting a first impression of a new dataset and determining where to dig deeper. CellWhisperer supports and empowers us as human scientists,” emphasizes Christoph Bock.
Reference: Schaefer M, Peneder P, Marzl D, et al. Multimodal learning enables chat-based exploration of single-cell data. Nat Biotechnol. 2025. doi: 10.1038/s41587-025-02857-9
