To understand dynamic processes from static snapshots, researchers have traditionally used something called “pseudotime.” When a snapshot is taken, it also captures other cells and genes of the same type that may be further along in the same process. If scientists connect the dots correctly, they can gain powerful insights into what the process looks like over time.
However, connecting these dots is difficult guesswork. It rests on the assumption that similar-looking cells are simply located at different points on the same pathway. Biology is often much more complex, with false starts, stops, bursts, and multiple chemical forces acting on each gene.
Instead of traditional pseudotime approaches, scientists are interested in an alternative approach known as “RNA velocity,” which examines the dynamics of mRNA transcription, splicing, and degradation within cells. Although promising, it is still an early technology.
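As a rough illustration of the mRNA dynamics described above, the sketch below integrates a simple two-equation rate model: unspliced mRNA u is produced at rate alpha and spliced at rate beta, and spliced mRNA s is degraded at rate gamma. The model form, the function names, and all parameter values are assumptions chosen for illustration. The article does not specify them, and this is not TopicVelo's code.

```python
def simulate_splicing(alpha, beta, gamma, t_end, dt=0.001):
    """Euler-integrate du/dt = alpha - beta*u and ds/dt = beta*u - gamma*s,
    starting from u = s = 0 (an assumed initial condition)."""
    u, s = 0.0, 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        du = alpha - beta * u        # transcription minus splicing
        ds = beta * u - gamma * s    # splicing minus degradation
        u += du * dt
        s += ds * dt
    return u, s


def velocity(u, s, beta, gamma):
    """Rate of change of spliced mRNA in this model: ds/dt = beta*u - gamma*s."""
    return beta * u - gamma * s


# Hypothetical rate constants, for illustration only.
alpha, beta, gamma = 10.0, 2.0, 1.0

# Early in the process, spliced mRNA is still accumulating (positive velocity).
u_early, s_early = simulate_splicing(alpha, beta, gamma, t_end=0.5)
print("early:", u_early, s_early, velocity(u_early, s_early, beta, gamma))

# After a long time the model settles near u = alpha/beta and s = alpha/gamma,
# where the velocity is close to zero.
u_late, s_late = simulate_splicing(alpha, beta, gamma, t_end=50.0)
print("late:", u_late, s_late, velocity(u_late, s_late, beta, gamma))
```

In this picture, a positive value of beta*u - gamma*s suggests a gene is being switched on, and a negative value suggests it is being switched off.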
To improve on the RNA velocity approach, TopicVelo employs, and gleans insights from, a much more difficult stochastic model that reflects the unavoidable randomness of biology.
“If you think about it, cells are inherently random,” said Gao, lead author of the paper. “Even if you have twins or cells that are genetically identical, they can grow up to be completely different. TopicVelo introduces the use of stochastic models. We now have a better grasp of the underlying biophysics.”
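The randomness Gao describes can be illustrated with a small stochastic simulation. The sketch below uses the Gillespie algorithm on a simple transcription, splicing, and degradation scheme. This is an assumed stand-in for illustration, not the stochastic model used in TopicVelo, and the function name and rate constants are hypothetical. Simulated "cells" with identical parameters end up with different molecule counts purely by chance.

```python
import random


def gillespie(alpha, beta, gamma, t_end, seed):
    """Simulate one cell up to time t_end and return (unspliced, spliced) counts.

    Reactions (assumed for illustration):
      transcription: nothing -> u, rate alpha
      splicing:      u -> s,       rate beta * u
      degradation:   s -> nothing, rate gamma * s
    """
    rng = random.Random(seed)
    t, u, s = 0.0, 0, 0
    while True:
        rates = (alpha, beta * u, gamma * s)
        total = sum(rates)
        t += rng.expovariate(total)      # waiting time to the next reaction
        if t > t_end:
            return u, s
        r = rng.random() * total         # pick a reaction in proportion to its rate
        if r < rates[0]:
            u += 1
        elif r < rates[0] + rates[1]:
            u -= 1
            s += 1
        else:
            s -= 1


# Five "genetically identical" cells: same rates, different random seeds.
cells = [gillespie(10.0, 2.0, 1.0, t_end=20.0, seed=k) for k in range(5)]
for k, (u, s) in enumerate(cells):
    print(f"cell {k}: unspliced={u}, spliced={s}")
```

Averaged over many simulated cells, the counts approach the values a deterministic rate model would predict, but any single cell can sit well away from that average.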
Machine learning shows the way
The researchers also realized that another assumption was limiting standard RNA velocity. “Most techniques assume that all cells essentially express the same large genetic program, but you can imagine that cells need to carry out different types of processes simultaneously, to varying degrees,” Riesenfeld said. Disentangling these processes is difficult.
Probabilistic topic modeling, a machine learning tool traditionally used to identify themes from documents, provided the University of Chicago team with a strategy.
TopicVelo groups scRNA-seq data by the processes in which those cells and genes are involved, rather than by cell or gene type. Processes are inferred from data rather than being imposed by external knowledge.
“If you look at scientific journals, they're organized around themes like 'physics,' 'chemistry,' and 'astrophysics,'” Gao said. “We applied this organizing principle to single-cell RNA-seq data. Now we can organize data by topics such as 'ribosome synthesis,' 'differentiation,' 'immune response,' and 'cell cycle.' A unique stochastic transcription model can then be fitted to each process.”
TopicVelo disentangles this tangle of processes, organizes them by topic, and then applies topic weights back to the cells, taking into account what percentage of each cell's transcriptional profile is involved in which activity.
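The idea of per-cell topic weights can be sketched in a few lines. The example below takes made-up, non-negative topic scores for three cells, converts them into proportions that sum to 1 for each cell, and then lists the cells in which a given process accounts for at least a chosen share of activity. The scores, the function names, and the 0.3 threshold are all hypothetical. The article does not describe how TopicVelo computes or thresholds its weights.

```python
def topic_proportions(scores):
    """Normalize non-negative per-cell topic scores so each cell's weights sum to 1."""
    proportions = {}
    for cell, row in scores.items():
        total = sum(row.values())
        if total <= 0:
            raise ValueError(f"cell {cell!r} has no topic signal")
        proportions[cell] = {topic: value / total for topic, value in row.items()}
    return proportions


def cells_for_topic(proportions, topic, min_weight=0.3):
    """Return, in sorted order, the cells whose weight for `topic` is at least min_weight."""
    return sorted(
        cell for cell, row in proportions.items()
        if row.get(topic, 0.0) >= min_weight
    )


# Made-up scores for illustration; a real analysis would infer these from data.
scores = {
    "cell_A": {"cell cycle": 6.0, "differentiation": 2.0, "immune response": 2.0},
    "cell_B": {"cell cycle": 1.0, "differentiation": 8.0, "immune response": 1.0},
    "cell_C": {"cell cycle": 4.0, "differentiation": 4.0, "immune response": 2.0},
}

props = topic_proportions(scores)
for topic in ("cell cycle", "differentiation", "immune response"):
    print(topic, "->", cells_for_topic(props, topic))
```

A cell such as cell_C, split evenly between two processes, is counted under both topics, which is the kind of mixed activity a single-program assumption would miss.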
“This approach helps us examine the dynamics of different processes and understand their importance in different cells,” Riesenfeld said. “It is especially helpful when there are branch points or when cells are pulled in different directions.”
The results of combining the stochastic model with the topic model are surprising. For example, TopicVelo was able to reconstruct trajectories that previously required special experimental techniques to recover. These improvements greatly expand the potential applications.
Gao compared the paper's findings to the paper itself, which is the product of many fields of research and areas of expertise.
“In PME, if you have a chemistry project, chances are you have physics or engineering students working on it,” he said. “It's never just chemistry.”
Citation: “Dissection and Integration of Bursty Transcriptional Dynamics for Complex Systems,” Gao et al., Proceedings of the National Academy of Sciences, April 26, 2024. DOI: 10.1073/pnas.2306901121
Funding: NIH.
Adapted from an article published by the Pritzker School of Molecular Engineering.
