Machine learning-based protein annotation tool predicts protein function

Snekmer is an application for building and searching protein family models and novel sequence clusters. Credit: Jason McDermott, Pacific Northwest National Laboratory

Microorganisms drive key processes for life on Earth. They influence global elemental cycles, the movement of carbon, nitrogen and other elements. They also promote plant growth and influence the development of disease. These roles are essential in every ecosystem. Although research continues to expand databases of microbial DNA sequences, sequence data alone does not provide all the biological information about proteins.

Engineering microbes for sustainable bioenergy and other bioproducts requires scientists to better understand the function of proteins and other molecules. Scientists infer a protein's function by comparing it to a reference database of proteins that have already been characterized.

However, these comparisons are difficult and do not scale to large databases. To meet this challenge, scientists have applied machine learning to build models that predict protein function. The result is Snekmer, a program that allows scientists to rapidly model protein families.

Studying the protein molecules of microorganisms helps scientists pursue new applications for genetically engineered microorganisms. Snekmer is easy to deploy in high-performance computing environments. It has also been incorporated into the DOE KBase framework as a new application that allows users to annotate genomic and metagenomic sequences.

This helps scientists better model the impact of engineered microbes, including their effects on climate and their benefits for crop health and bioproduction. Snekmer also helps scientists study microbial evolution and microbiome patterns.

The inability of current methods to predict the function of 30–50% of bacterial protein sequences is a major barrier to a deeper understanding of complex systems such as the soil microbiome. Most protocols rely on pairwise alignments, but this becomes computationally unwieldy and more difficult to interpret as databases grow.

For alignment-based protein family models, sensitivity and accuracy depend on the initial training set, and the models risk becoming obsolete as additional sequence diversity is discovered. Many bacterial proteins have no assigned function, or have been assigned only a general function based solely on taxonomic understanding.

To address this need, researchers at the Pacific Northwest National Laboratory, Baylor University, and Oregon Health and Science University developed Snekmer, a software tool that exploits the redundancy in amino acid residue characteristics to reduce sequence space and applies machine learning to short protein sequences (kmers) to generate protein family models.

Snekmer users can recode protein sequences into reduced alphabet kmer vectors, build supervised classification models trained on input protein families, or perform functional classification of proteins based on Snekmer models.
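
The recoding step described above can be illustrated with a minimal Python sketch. It is not Snekmer's own code: the two-letter alphabet (H for hydrophobic residues, P for all others), the residue grouping, and the function names recode and kmer_vector are assumptions chosen only to show the idea of turning a sequence into a reduced-alphabet kmer count vector.

```python
from collections import Counter
from itertools import product

# Hypothetical two-letter reduced alphabet (illustrative only, not Snekmer's
# actual mapping): "H" for hydrophobic residues, "P" for everything else.
HYDROPHOBIC = set("AVLIMFWC")


def recode(sequence: str) -> str:
    """Map each amino acid to its reduced-alphabet letter."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in sequence.upper())


def kmer_vector(sequence: str, k: int = 3, alphabet: str = "HP") -> list:
    """Count overlapping kmers of the recoded sequence in a fixed kmer order."""
    recoded = recode(sequence)
    # Slide a window of length k across the recoded sequence.
    counts = Counter(recoded[i:i + k] for i in range(len(recoded) - k + 1))
    # Enumerate every possible kmer so all sequences share the same vector layout.
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts[kmer] for kmer in vocabulary]
```

Under these assumptions, the sequence MKVLA recodes to HPHHH, and with k=2 its count vector over the kmers HH, HP, PH, PP is [2, 1, 1, 0]. Because a two-letter alphabet has only 2^k possible kmers instead of 20^k, the vectors stay small, which is the kind of sequence-space reduction the article describes. Vectors like these could then serve as input features for a supervised classifier trained on known protein families.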

The research is published in the journal Bioinformatics Advances.

More information:
Christine H Chang et al., Snekmer: A Scalable Pipeline for Protein Sequence Fingerprinting Based on Amino Acid Recoding, Bioinformatics Advances (2023). DOI: 10.1093/bioadv/vbad005

Courtesy of the U.S. Department of Energy

Citation: Machine learning-based protein annotation tool predicts protein function (2023, June 1), retrieved June 2, 2023 from https://phys.org/news/2023-06-machine-learning-based-protein-annotation-tool.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for informational purposes only.




