Recent advances in diagnostic imaging, genomics, and other technologies have meant that the life sciences are awash in data. For example, if a biologist wants to study cells taken from brain tissue from an Alzheimer's patient, there could be countless properties they want to investigate, such as the type of cell, the genes they express, or their location within the tissue. But while cells can now be experimentally interrogated using many different types of measurements simultaneously, when it comes to analyzing the data, scientists can typically only work with one type of measurement at a time.
Dealing with so-called “multimodal” data requires new computational tools, and this is where Xinyi Zhang comes in.
A fourth-year PhD student at MIT, Zhang is working to bridge the gap between machine learning and biology to understand fundamental biological principles, particularly in areas where traditional methods have reached their limits. Working in the lab of Professor Caroline Uhler in MIT's Department of Electrical Engineering and Computer Science and the Institute for Data, Systems, and Society, and collaborating with researchers at the Broad Institute's Eric and Wendy Schmidt Center and elsewhere, Zhang has led multiple efforts to build computational frameworks and principles for understanding cellular regulatory mechanisms.
“All these are small steps towards the ultimate goal of understanding how cells work, how tissues and organs function, why we get sick, and why some diseases can be cured and some cannot,” Zhang says.
Zhang's free-time activities are similarly ambitious: The list of hobbies she has picked up at the lab includes sailing, skiing, ice skating, rock climbing, performing in the MIT concert choir, and flying single-engine planes. (She received her pilot's license in November 2022.)
“I guess I just like going places I've never been and doing things I've never done before,” she says in her usual understated tone.
Her supervisor, Uhler, says Zhang's quiet humility strikes a chord “in every conversation.”
“Every time, we learn something like, 'Okay, she's learning to fly,'” Uhler says. “It's just amazing. Everything she does, she does it for the right reasons. She wants to be good at what she cares about. I think that's really exciting.”
Zhang first became interested in biology as a high school student in Hangzhou, China. She liked that her biology teacher couldn't answer her questions, and that is when she started to see biology as the “most interesting” subject to study.
Her interest in biology eventually turned to bioengineering. Her parents, both middle school teachers, encouraged her to study in the US, and as an undergraduate at the University of California, Berkeley, she studied bioengineering as well as electrical engineering and computer science.
Zhang was set to start her PhD in EECS at MIT immediately after graduating in 2020, but the COVID-19 pandemic delayed her first year. Nevertheless, in December 2022, Zhang, Uhler, and two other co-authors published a paper in the journal Nature Communications.
The groundwork for the paper was laid by co-author Xiao Wang, who had previously worked with the Broad Institute to develop a spatial cell analysis method that combines multiple forms of cell imaging and gene expression measurements on the same cells, and then maps each cell's original location within the tissue sample, something that had never been done before.
This innovation had many potential applications, including enabling new ways to track the progression of various diseases, but there was no way to analyze all of the multimodal data this would generate. That's where Zhang came in, interested in designing a computational method that could do just that.
The team chose chromatin staining as their imaging technique because it is relatively inexpensive yet reveals a lot of information about the cell. The next step was to integrate the spatial analysis techniques developed by Wang, and for this Zhang began designing an autoencoder.
An autoencoder is a type of neural network that typically encodes large amounts of high-dimensional data into a smaller, lower-dimensional representation, then decodes that representation back to its original size. In this case, Zhang's autoencoder did the opposite, taking the input data and making it higher dimensional. This allowed her to combine data from different animals and remove technical variations that are not due to meaningful biological differences.
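For readers who want to see the encode-then-decode idea concretely, here is a minimal sketch in plain Python. It is not the STACI model or any code from the paper: it is a generic linear autoencoder trained on made-up four-dimensional toy data, and every name in it (`train_autoencoder`, `toy_data`, `latent_dim`, and so on) is a hypothetical choice for illustration. It trains one version with a two-dimensional code (the usual compressing case) and one with an eight-dimensional code (a code larger than the input, loosely echoing the higher-dimensional variant described above).

```python
import random


def matvec(weights, vector):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random starting weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def reconstruction_error(data, enc, dec):
    """Mean squared distance between each input and its decoded code."""
    total = 0.0
    for x in data:
        x_hat = matvec(dec, matvec(enc, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)


def train_autoencoder(data, latent_dim, epochs=500, lr=0.02, seed=0):
    """Train a linear autoencoder with per-sample gradient descent.

    enc maps an input to a code of size latent_dim;
    dec maps the code back to the input size.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    enc = random_matrix(latent_dim, dim, rng)
    dec = random_matrix(dim, latent_dim, rng)
    for _ in range(epochs):
        for x in data:
            z = matvec(enc, x)        # encode
            x_hat = matvec(dec, z)    # decode
            # Gradient of 0.5 * ||x_hat - x||^2 with respect to x_hat.
            err = [xh - xi for xh, xi in zip(x_hat, x)]
            # Backpropagate to the code before the decoder is updated.
            grad_z = [sum(dec[i][k] * err[i] for i in range(dim))
                      for k in range(latent_dim)]
            for i in range(dim):
                for k in range(latent_dim):
                    dec[i][k] -= lr * err[i] * z[k]
            for k in range(latent_dim):
                for j in range(dim):
                    enc[k][j] -= lr * grad_z[k] * x[j]
    return enc, dec


# Made-up toy data: 4 measurements that really depend on only 2 hidden factors.
toy_data = [[a, b, a + b, a - b]
            for a in (-1.0, 0.0, 1.0)
            for b in (-1.0, 0.0, 1.0)]

for latent_dim in (2, 8):  # 2 compresses the input; 8 expands it
    enc, dec = train_autoencoder(toy_data, latent_dim)
    code = matvec(enc, toy_data[0])
    print(latent_dim, len(code), reconstruction_error(toy_data, enc, dec))
```

In both runs the network is trained only to reproduce its own input; what changes is the size of the intermediate code. Real models of this kind are nonlinear and far larger, so the sketch should be read as a picture of the idea, not of the method used in the paper.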
The paper used the technique, abbreviated as STACI, to identify how cells and tissues show the progression of Alzheimer's disease when viewed with different spatial and imaging techniques. The model could also be used to analyze a range of diseases, Zhang says.
If she had infinite time and resources, her dream would be to create a perfect model of human life. Unfortunately, time and resources are limited. But her ambitions are not, and she says she wants to continue using her skills to solve “the hardest problems that we don't have the tools to answer.”
She is currently working on completing several projects, one focused on studying neurodegeneration through imaging of the frontal cortex, and another project predicting protein images from protein sequences and chromatin images.
“There are a lot of questions that remain unanswered,” she says, “and I want to pick questions that make biological sense, questions that will help us understand things we didn't know before.”