Machine learning for analyzing spatial-omic data



Jeremy Goecks is an associate faculty member in the Department of Machine Learning at Moffitt Cancer Center (Florida, USA). His computational laboratory leads the development of machine learning-based models for the analysis of spatial-omics data.

We caught up with Jeremy at AACR 2024 to talk about his lab's research, get tips on best practices for collecting and analyzing spatial-omic data, and discuss how the field can advance in the future.

What did you present at AACR?

I shared two stories with the audience. The first highlighted recent publications detailing the use of spatial omics and machine learning to understand how a novel immunotherapy, a CD40 agonist, affects the tumor microenvironment in pancreatic cancer. We also investigated the extent to which these approaches could predict which individuals would respond to that immunotherapy and have longer disease-free survival, and which would not. The second story was about how to develop software that makes it easy for both experimental and computational researchers to perform this type of complex analysis.

What were the three key takeaways from your presentation?

The first is that team science is extremely important. The publication I shared with the audience was a great collaboration between myself and several immunologists at Oregon Health and Science University, including Caitlin Byrne and Lisa Cousens. Combining my data science expertise with their immunology expertise was critical to the success of the project.

The second is that there are so many different things that can be measured in these tumors, from a single-cell perspective as well as a spatial perspective, that it can be difficult to determine which features are important. Machine learning is one way to capture that information and identify which features help us understand the biology of tumors.

My lab also uses machine learning to understand how treatments change the tumor microenvironment and which features are important for predicting response. We demonstrated that it can identify which of the thousands of measurable features are important.

Third, these data science experiments are currently difficult to perform and take a long time. We need to build better software that accelerates that process and reduces the time an analysis takes, while ensuring that the analysis is robust, reproducible, and useful to the scientific community.




What are some best practice tips for making the most of spatial omics data?

One of the first things you need to do is harmonize all available data and metadata. Create a dataset in which all data are processed in the same way and carry the relevant metadata needed to determine whether anything is missing and where each data point came from: which patient, and at what time point.
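As a minimal sketch of this harmonization step, the snippet below normalizes metadata field names and flags records that are missing required provenance fields. The field names and aliases ("patient_id", "time_point", etc.) are hypothetical, not from the interview.

```python
# Hypothetical required provenance fields for each data point.
REQUIRED = {"sample_id", "patient_id", "time_point"}

# Hypothetical synonyms seen across different batches of metadata.
ALIASES = {"patient": "patient_id", "timepoint": "time_point", "id": "sample_id"}

def harmonize(record: dict) -> dict:
    """Lower-case and underscore keys, then map known aliases onto canonical names."""
    out = {}
    for key, value in record.items():
        key = key.strip().lower().replace(" ", "_")
        out[ALIASES.get(key, key)] = value
    return out

def missing_fields(record: dict) -> set:
    """Report which required provenance fields are absent from a record."""
    return REQUIRED - record.keys()

records = [
    {"ID": "S001", "Patient": "P12", "TimePoint": "baseline"},
    {"sample_id": "S002", "patient_id": "P12"},  # time point missing
]
clean = [harmonize(r) for r in records]
gaps = [missing_fields(r) for r in clean]
```

In practice the same idea scales up with a tabular library and a written metadata schema, but the core move is the same: one canonical vocabulary, checked on every record.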

From a data science perspective, best practices focus on using reproducible approaches. That means using GitHub to store your code and automated pipelines to perform analyses and generate figures. Then, when the time inevitably comes to reproduce an analysis or apply it to a new dataset, you can do so with confidence that you are repeating it accurately.

What interesting contributions has your lab made to spatial omics data analysis?

We are using machine learning to improve our ability to predict treatment responses and our understanding of the underlying biology that leads to those responses. We can interrogate the model to find out why it made certain predictions and identify which biological features it uses. This avoids the typical "black box" problem of many AI models: being able to answer a question but not being able to explain how they arrived at that answer.

I think another big contribution was that we kept the model as simple as possible. Although we use relatively simple machine learning models, we still demonstrate good performance on our datasets. If your dataset is strong, you don't always need the most complex algorithms to extract signal from it.
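To illustrate the point about simple, interrogable models (this is a toy sketch, not the lab's actual model), the example below trains a plain logistic regression by gradient descent on a synthetic cohort in which response is driven by a single feature, then ranks features by the magnitude of their learned weights. The feature names are hypothetical.

```python
import math
import random

# Hypothetical spatial-omic features for each sample.
FEATURES = ["cd8_density", "macrophage_fraction", "tumor_area"]

def predict(weights, bias, x):
    """Logistic model: probability of response for one sample."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(data, labels, lr=0.5, epochs=2000):
    """Per-sample gradient descent on the logistic loss."""
    weights, bias = [0.0] * len(FEATURES), 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            err = predict(weights, bias, x) - y  # gradient of the log-loss
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias

# Synthetic cohort: response here is driven by CD8 density alone.
random.seed(0)
data = [[random.random() for _ in FEATURES] for _ in range(200)]
labels = [1 if x[0] > 0.5 else 0 for x in data]

weights, bias = train(data, labels)
# Because the model is linear, its weights are directly interpretable:
ranked = sorted(zip(FEATURES, weights), key=lambda p: -abs(p[1]))
```

The weight ranking is exactly the kind of interrogation described above: a linear model's coefficients say which measured features drive its predictions, with no black box to open.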

What can researchers do at the data generation stage to improve downstream analysis?

I always encourage my experimental colleagues to record every element of their experiment, from the antibodies and sample IDs used to the specific microscope focus levels. All available metadata should be recorded as systematically as possible.

The second thing, which I think is relatively easy, is to process all samples in the same way. Processing the samples identically makes computational analysis much easier, and you don't have to worry about the analysis revealing technical differences rather than the biological differences you actually care about.
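A minimal sketch of "process every sample the same way": a single shared routine applied uniformly, instead of per-sample tweaks. The normalization choice here (background subtraction, then scaling to unit mean) is illustrative only, not a recommendation from the interview.

```python
def process_sample(intensities):
    """One shared routine: background-subtract, then scale to unit mean."""
    background = min(intensities)
    corrected = [v - background for v in intensities]
    mean = sum(corrected) / len(corrected)
    return [v / mean for v in corrected] if mean else corrected

samples = {
    "S001": [10.0, 20.0, 30.0],
    "S002": [100.0, 200.0, 300.0],  # same profile, different overall scale
}
# Every sample goes through the identical function, so scale differences
# introduced by acquisition do not masquerade as biology downstream.
processed = {sid: process_sample(vals) for sid, vals in samples.items()}
```

Two samples with the same underlying profile but different acquisition scales come out identical, which is the point: any differences that survive uniform processing are more likely to be biological.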

If there was one thing you could ask for to maximize the impact of spatial omics data in cancer, what would it be?

If there were one thing, sharing these datasets more widely, from raw images to downstream single-cell omics tables, would be at the top of my list. Opening up these data will allow many other analysts to tap into them and connect them with other datasets, and ultimately these larger datasets will accelerate cancer research and improve patient care.


The opinions expressed in this interview are those of the interviewee and do not necessarily reflect those of BioTechniques or Taylor & Francis Group.


