We have developed and demonstrated a novel metabolomics workflow for studying engineered microorganisms in synthetic biology applications. Our workflow combines state-of-the-art analytical instruments that generate information-rich data with novel machine learning (ML)-based algorithms tuned to process it.
As scientists at the Pacific Northwest National Laboratory (PNNL), we led this multicenter study, which was published in Nature Communications.
Addressing Complex Sample Challenges
Metabolites are small molecules produced by a large network of cellular processes and biochemical reactions in living systems. The diversity of metabolite classes and structures constitutes a significant analytical challenge in terms of detection and annotation in complex samples.
Analytical instruments that can analyze hundreds of samples faster and more accurately are important for a variety of metabolomics applications, such as the development of microorganisms that can sustainably produce desirable fuels and chemicals.
Multidimensional measurements using liquid chromatography (LC), ion mobility, and data-independent acquisition mass spectrometry (MS) improve metabolite detection by linking separations in a single analytical platform. Although the potential of metabolomics has been demonstrated previously, this kind of multidimensional, information-rich data is complex and cannot be processed with conventional tools. We therefore need algorithms and software tools that can process it and extract accurate metabolite information.
Software upgrade required for rich data
We have optimized a combination of advanced instruments for fast analysis and generated information-rich multidimensional data that can be used to resolve the complex metabolome.
On the computational side, Dr. Bilbao created a new algorithm called PeakDecoder to interpret the multidimensional data and ultimately identify individual molecules in complex mixtures. The algorithm learns to distinguish true co-elution and co-mobility directly from the raw data of the samples studied, and it calculates the error rate for metabolite identification. To train the ML models, we propose a new method for generating training examples, similar to the target-decoy strategy commonly used in proteomics. Once the model is trained, it can be used to score metabolites of interest from the library with associated false positive rates. Unlike existing methods, it can also be used with small libraries.
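To make the target-decoy idea concrete, here is a minimal sketch of how an error rate can be estimated from scored targets and decoys, as is commonly done in proteomics. It is not PeakDecoder's implementation: the function names, the example scores, and the simple ratio estimate (decoys passing a threshold divided by targets passing it) are assumptions chosen for illustration only.

```python
from typing import List, Optional


def fdr_at_threshold(target_scores: List[float],
                     decoy_scores: List[float],
                     threshold: float) -> float:
    """Estimate the error rate among targets scoring at or above `threshold`.

    Decoys are known to be wrong, so the number of decoys passing the
    threshold approximates the number of false targets passing it.
    """
    n_targets = sum(s >= threshold for s in target_scores)
    n_decoys = sum(s >= threshold for s in decoy_scores)
    if n_targets == 0:
        return 0.0  # nothing accepted, so no accepted errors
    return min(1.0, n_decoys / n_targets)


def lowest_threshold_for_fdr(target_scores: List[float],
                             decoy_scores: List[float],
                             max_fdr: float) -> Optional[float]:
    """Return the lowest target score usable as a cutoff with estimated
    error rate <= max_fdr, or None if no cutoff qualifies."""
    best = None
    # Walk candidate cutoffs from strictest to most permissive and
    # remember the most permissive one that still meets the limit.
    for t in sorted(set(target_scores), reverse=True):
        if fdr_at_threshold(target_scores, decoy_scores, t) <= max_fdr:
            best = t
    return best


if __name__ == "__main__":
    # Hypothetical model scores, not data from the study.
    targets = [0.95, 0.9, 0.8, 0.7, 0.4, 0.3]
    decoys = [0.75, 0.35, 0.2, 0.1]
    print(lowest_threshold_for_fdr(targets, decoys, max_fdr=0.1))
```

In this toy setup, accepting more low-scoring targets also lets more decoys through, which is why the estimated error rate generally rises as the cutoff is relaxed.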
The main results of this paper are:
- Optimized Fast Analysis Methods for Metabolites Using LC, Ion Mobility, and MS
- A New Algorithm Enabling Processing of Multidimensional MS Data and Estimation of Error Rates in Metabolomics
Accurate metabolomics profiling at scale
By using optimized LC conditions, this method reduces sample analysis time by a factor of 3 compared with conventional approaches. PeakDecoder enables accurate profiling of multidimensional MS measurements in large studies.
The workflow was used to study the metabolites of different microbial strains designed by Agile BioFoundry to create various bioproducts such as polymers and diesel fuel precursors. We were able to interpret 2,683 metabolite signatures across 116 microbial samples.
“This metabolomics capability offers far-reaching benefits beyond synthetic biology and across environmental and biological research.” – Dr. Kristin Burnum-Johnson, Biochemist, Agile BioFoundry TEST Task Leader.
However, we note that the current algorithm is not fully automated due to software dependencies and requires a metabolite library acquired under compatible analytical conditions for inference.
Powering PeakDecoder with AI
We are working on the next version of the algorithm, which leverages advanced artificial intelligence (AI) techniques used in other fields such as computer vision. A user-friendly, fully automated version of PeakDecoder will support other types of molecular profiling workflows, such as proteomics and lipidomics. Its performance will be evaluated on a wider variety of experimental data and with a multidimensional molecular library of AI predictions. The new version is expected to bring significant advances in multi-omics research.
“Advanced AI-based software has the potential to replace traditional MS tools that require significant human intervention.” – Dr. Aivett Bilbao, computational scientist.
Reference: Bilbao A, Muñoz N, Kim J, et al. PeakDecoder enables machine learning-based metabolite annotation and accurate profiling in multidimensional mass spectrometry measurements. Nat Commun. 2023;14(1):2461. doi:10.1038/s41467-023-37031-9
