CHAMMI-75: Finding commonalities in millions of biological images



Scientists at the Morgridge Institute are taking the oft-used phrase “Every picture tells a story” to a new level.

The research team developed a database that combines 2.8 million cell images taken with a wide range of imaging modalities. It can be used to train machine learning models to answer questions about basic biology, functional genomics, and treatment design.


Morgridge researcher Juan Caicedo says the ultimate goal is to give researchers a more universal tool for examining cell morphology. Cellular morphology analysis looks at the size, shape, structure, and patterns of cells to better define the differences between healthy and diseased states. Machine learning tools for this kind of analysis have proven extremely powerful for studying how cells respond to treatments, among other biological applications.

The problem now, Caicedo says, is that the models in use are highly specialized. For example, a model trained to quantify images of liver cells taken with confocal microscopy may fail to recognize the same cells when they are imaged with fluorescence or electron microscopy.

Caicedo and his team aim for a “one size fits all” model.

“If you look at how artificial intelligence is being applied to microscopy today, most of these models are trained on specific types of microscopy images,” says Caicedo, also an assistant professor of biostatistics at UW-Madison. “So we started realizing that if we really wanted to make progress, we needed to create models that were more broadly applicable to scientists.”

The database, called CHAMMI-75, was released to the public in early 2026. The name stands for “Channel-Adaptive Models in Microscopy Imaging,” and the number 75 refers to the 75 different data sources of cell images included in the database.

Overall, the dataset contains images of more than 1.8 billion cells. The images are standardized with common formatting and metadata, which lets them essentially “speak the same language” even though they come from very different sources. The collection spans 14 imaging modalities, including fluorescence, confocal, cryo-electron, and other microscopy “flavors.” It also covers 16 different organisms and 15 different magnification levels.
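To make the idea of images “speaking the same language” a little more concrete, here is a minimal Python sketch of mapping each source's metadata onto one shared record. It is an illustration under stated assumptions only: the field names (`modality`, `organism`, `magnification`, `channels`), the `normalize` and `summarize` helpers, and the two example labs are hypothetical, not CHAMMI-75's actual schema or tooling.

```python
from dataclasses import dataclass

# Hypothetical shared record; these field names are illustrative assumptions,
# not the real CHAMMI-75 metadata format.
@dataclass(frozen=True)
class ImageRecord:
    source: str          # which contributing dataset the image came from
    modality: str        # e.g. "fluorescence", "confocal", "cryo-electron"
    organism: str        # e.g. "human", "zebrafish"
    magnification: str   # e.g. "20x", "63x"
    channels: tuple      # channel names, in stored order

def normalize(source: str, raw: dict, field_map: dict) -> ImageRecord:
    """Map one source's raw metadata keys onto the shared (hypothetical) schema.

    field_map translates each shared field name to that source's own key name;
    text values are trimmed and lower-cased so equivalent labels compare equal.
    """
    return ImageRecord(
        source=source,
        modality=raw[field_map["modality"]].strip().lower(),
        organism=raw[field_map["organism"]].strip().lower(),
        magnification=raw[field_map["magnification"]].strip().lower(),
        channels=tuple(raw[field_map["channels"]]),
    )

def summarize(records):
    """Count distinct sources, modalities, organisms, and magnifications."""
    return {
        "sources": len({r.source for r in records}),
        "modalities": len({r.modality for r in records}),
        "organisms": len({r.organism for r in records}),
        "magnifications": len({r.magnification for r in records}),
    }

# Two hypothetical labs that describe the same kind of information differently.
a = normalize(
    "lab_a",
    {"Microscope": "Confocal", "Species": "Human", "Objective": "63X",
     "Stains": ["DNA", "actin"]},
    {"modality": "Microscope", "organism": "Species",
     "magnification": "Objective", "channels": "Stains"},
)
b = normalize(
    "lab_b",
    {"imaging": "confocal ", "org": "zebrafish", "mag": "20x", "chan": ["GFP"]},
    {"modality": "imaging", "organism": "org",
     "magnification": "mag", "channels": "chan"},
)
print(summarize([a, b]))
# → {'sources': 2, 'modalities': 1, 'organisms': 2, 'magnifications': 2}
```

Once both labs' labels are normalized, “Confocal” and “confocal ” collapse into a single modality, which is the kind of consistency that lets images from different sources be pooled for training.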


The concept of “universal morphology” was actually inspired by the early success of large language models (LLMs), Caicedo says. About a decade ago, AI researchers began training language models on text from many languages at once. These models proved powerful enough to analyze and find similarities across virtually any language, without the need to build a new AI system for each one.

“I remember reading the first paper that did an experiment in 100 languages at the same time, and the results were amazing,” Caicedo says. “The more languages you include, the more fluent your model becomes, even in a language that doesn’t have a lot of data. Imagine a very unusual language.”

“All human languages have some things in common, so we found that languages with very few resources benefit the most from that joint modeling,” he added. “When you build a model specialized for that unusual language, you are limited to the small number of documents available in it.”

Caicedo says he’s hopeful that CHAMMI-75 will have a similar effect, and that AI could apply some of these commonalities across all cells, shedding light on rare or poorly understood diseases. After all, there are more than 200 cell types in the human body, and they share common elements such as DNA and membranes, and in most cases a nucleus and other organelles. Evidence from well-studied conditions, he says, can always inform less-understood ones.

One immediate application will be in drug development and repurposing, as a confirmatory and predictive tool that helps scientists choose the most promising direction when testing hundreds of different compounds. It could also inform smarter microscopy methods that are programmed to identify different phenotypes in real time while experiments are running.

The imaging data in CHAMMI-75 include cells of human origin, as well as cells from other mammals, bacteria, and even plants. The collection draws on genetic screening, drug screening, 3D imaging, and specialized cell biology studies. It includes publicly available data from some of the world’s largest existing cell image resources, including the Image Data Resource, the Human Protein Atlas, and the Cell Painting Consortium.

Several biological research laboratories are already applying methods from Caicedo’s lab to their own image collections. For example, Morgridge researcher Ken Poss studies heart regeneration in zebrafish, and the data helps show how heart activity changes under certain genetic conditions. A team at Johns Hopkins University is using the methods to analyze a collection of cell lines derived from patients with schizophrenia, and a team at the University of Copenhagen is using them to quantify micronucleus formation.

“We are trying to find as many examples of image-based experiments as possible to improve the quality of measurements we provide for real-world biological research,” Caicedo says. “We believe that every microscopy laboratory needs this kind of capability, there is a real need, and that our approach can help those laboratories.”
