As humans, our eyes receive two-dimensional images, which our brains convert into three-dimensional experiences. This ability allows us to recognize our position in space, judge distance, perceive depth, and enjoy visually examining all kinds of objects and events.
But trying to imagine invisible structures and higher-dimensional processes that cannot be captured by human-designed scopes is challenging for data scientists and visualization professionals who rely on machine learning and AI tools to enhance visual exploration.
“Biological processes are an example of complex high-dimensional data,” says Kevin Moon, director of USU’s Data Science and Artificial Intelligence Center and associate professor in the Department of Mathematics and Statistics. “For example, one of the datasets we use to test our AI tools is clinical data measured from multiple sclerosis patients. These datasets contain hundreds of thousands of data points about disease progression at the cellular level, as well as treatments and clinical outcomes.”
Moon is the corresponding author of the paper “Gaining Biological Insights through Supervised Data Visualization,” published online June 30 in Nature Computational Science.
Co-authors on the paper include:
- First author and USU alumnus Jake Rhodes (PhD ’22, Statistics), assistant professor at Brigham Young University.
- Adele Cutler, professor emeritus, USU Department of Mathematics and Statistics.
- Yasuhiro Chou, professor, USU Department of Biochemical Engineering.
- USU alumnus Wei Zhang (PhD ’21, Bioengineering), University of Utah researcher.
The team’s research involves the National Institutes of Health and the IVADO Visiting Scholar Program, along with additional national and international collaborators.
“In this paper, we introduce RF-PHATE, an acronym for Random Forest-Potential of Heat-diffusion for Affinity-based Trajectory Embedding,” Moon said. “It’s a mouthful, but it’s a supervised data visualization technique that allows you to explore relationships between related data in multidimensional datasets.”
To understand this, he said, it is helpful to examine the capabilities of previously developed unsupervised and supervised data visualization techniques.
“Commonly used unsupervised methods such as PHATE, t-SNE, and UMAP, as well as existing supervised methods, can help visualize the structure of big data sets,” Moon said. “However, each has some weaknesses. Some tend to overemphasize the differences between groups of data, and some do not take into account how those groups relate to each other. RF-PHATE is much better at preserving the structure of how groups of data relate to each other.”
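The general idea of a supervised, forest-based embedding can be illustrated with a rough sketch. The Python below is not RF-PHATE and is not taken from the paper. It assumes scikit-learn and NumPy are installed, uses the Iris dataset as a stand-in for real biological data, computes a classic random forest proximity (how often two samples share a leaf), and swaps in classical multidimensional scaling where RF-PHATE would apply its diffusion-based step. The function names `forest_proximity` and `classical_mds` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def forest_proximity(forest, X):
    """Fraction of trees in which each pair of samples lands in the same leaf."""
    leaves = forest.apply(X)  # shape (n_samples, n_trees): leaf index per tree
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)


def classical_mds(distance, n_components=2):
    """Classical (Torgerson) MDS: a stand-in here, not the PHATE embedding."""
    n = distance.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (distance ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues that arise when distances are not Euclidean.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))


# Stand-in data: 150 flowers, 4 measurements, 3 species labels.
X, y = load_iris(return_X_y=True)

# The labels guide the forest, so the resulting similarities are supervised.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
proximity = forest_proximity(forest, X)

# Convert similarity to dissimilarity and embed in two dimensions for plotting.
embedding = classical_mds(1.0 - proximity)
```

Plotting the two columns of `embedding`, colored by `y`, gives a picture in which samples the forest treats alike sit close together. How well relationships between groups survive depends on the embedding step, which is the part this sketch simplifies.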
In their paper, the research team demonstrates the functionality of the model and documents how RF-PHATE provides evidence for previously suspected subtypes of multiple sclerosis.
“Identifying subtypes is important because MS affects each patient differently and knowing the specific type will inform treatment decisions,” Moon said.
Additional datasets used to investigate the RF-PHATE model included plasma data from COVID-19 patients and data from antioxidant-treated lung cancer cells, but Moon notes that the model is not limited to biological data.
“RF-PHATE can be applied to many other fields, and can also be used to develop AI models that are easier to interpret, or to analyze the models themselves,” he said. “This remains a very active area of research for our group.”
Moon encourages both undergraduate and graduate students to explore this area of research, along with other interdisciplinary opportunities facilitated by the USU DSAI Center.
“We support AI for Science, an international movement that encourages the use of artificial intelligence and machine learning to accelerate research, analyze large datasets, and simulate complex systems,” he says. “Through interdisciplinary collaboration, we can develop and use AI tools to more effectively analyze scientific data and accelerate discovery.”
Reference: Rhodes JS, Aumon A, Morin S, et al. Gaining biological insights through supervised data visualization. Nature Computational Science. 2026:1-21. doi: 10.1038/s43588-026-00999-7
