Introducing ProtST: A framework that enhances protein sequence pre-training and understanding with biomedical texts



https://arxiv.org/abs/2301.12040

Large language models can be applied to almost any domain, from natural language processing and understanding to computer vision, providing solutions across artificial intelligence. Advances in AI and machine learning have shown that these models can also be turned to biology: protein language models (PLMs) pre-trained on large protein sequence datasets have demonstrated the ability to improve predictions of protein structure and function.

Proteins are essential for biological growth, cell repair, and regeneration, and have important applications in drug discovery and healthcare. Existing PLMs, however, learn protein representations from sequences alone, capturing co-evolutionary information but not other important properties such as protein function and subcellular localization. As a result, these models cannot explicitly capture protein function.

For many proteins, textual descriptions are available that provide insight into their key functions and properties. To exploit this, the research team introduced ProtST, a framework that uses biomedical texts to improve protein sequence pre-training and understanding. The team also built a dataset, called ProtDescribe, that pairs protein sequences with text describing their functions and other properties. Built on ProtDescribe, the ProtST framework aims to inject this property information during pre-training while preserving the expressive power that conventional PLMs gain from co-evolutionary information.
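For intuition, a ProtDescribe-style entry can be pictured as a sequence paired with free-text annotations. The record below is purely illustrative; its field names and text are assumptions, not the dataset's actual schema.

```python
# A hypothetical ProtDescribe-style record pairing a protein sequence
# with free-text property descriptions (field names are illustrative,
# not the dataset's actual schema).
record = {
    "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",  # amino-acid sequence
    "description": (
        "FUNCTION: Catalyzes a hypothetical reaction. "
        "SUBCELLULAR LOCATION: Cytoplasm."
    ),
}
```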


Three pre-training tasks were designed to inject protein property information of varying granularity into the PLM while preserving its original expressiveness. The first is unimodal masked prediction, which preserves the PLM's ability to capture co-evolutionary information through masked protein modeling: certain regions of the protein sequence are masked out, and the model is trained to predict the masked residues from the surrounding context. This ensures the PLM retains its representation ability as property data is added.
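As a rough illustration, masked protein modeling can be sketched in PyTorch as follows. The encoder, head, and mask token id are placeholders for a generic protein LM, not ProtST's actual code.

```python
import torch
import torch.nn.functional as F

MASK_ID = 0        # placeholder id for the [MASK] token
MASK_PROB = 0.15   # fraction of residues to mask, BERT-style

def masked_protein_loss(encoder, lm_head, tokens):
    """Mask random residues and train the model to recover them.

    `encoder` and `lm_head` are stand-ins for a generic protein LM;
    this is an illustrative sketch, not ProtST's actual code.
    """
    # Choose random positions to mask.
    mask = torch.rand(tokens.shape) < MASK_PROB
    corrupted = tokens.masked_fill(mask, MASK_ID)

    # Predict vocabulary logits at every position.
    hidden = encoder(corrupted)   # (batch, length, d_model)
    logits = lm_head(hidden)      # (batch, length, vocab)

    # Cross-entropy only on the masked positions.
    return F.cross_entropy(logits[mask], tokens[mask])
```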

The second is multimodal representation alignment, which aligns protein sequences with their associated textual descriptions. Representations of the protein property descriptions are extracted with a biomedical language model, and by aligning protein sequence representations to these textual representations, the PLM learns to capture semantic relationships between sequences and their descriptions.
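One common way to implement such alignment is a symmetric contrastive (InfoNCE-style) objective over matched sequence-description pairs. The sketch below assumes both embeddings have already been projected into a shared space; it illustrates the general technique, not ProtST's exact formulation.

```python
import torch
import torch.nn.functional as F

def alignment_loss(seq_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over matched (sequence, description) pairs.

    `seq_emb` and `text_emb` are (batch, d) embeddings already projected
    into a shared space; an illustrative sketch of contrastive alignment.
    """
    seq_emb = F.normalize(seq_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Cosine-similarity logits between every sequence and every description.
    logits = seq_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Matched pairs sit on the diagonal; pull them together, push others apart.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2
```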

The third task, multimodal masked prediction, models fine-grained dependencies between residues in the protein sequence and words in the protein property description. A fusion module builds a joint multimodal representation of residues and words, from which masked residues and masked words are predicted. This lets the PLM capture the intricate connections between protein sequences and the textual descriptions of their properties.
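Under the assumption that the fusion module produces cross-attended states for both modalities, the task can be sketched as follows. Every name here (`fusion`, `res_head`, `word_head`) is hypothetical, not ProtST's actual API.

```python
import torch.nn.functional as F

def multimodal_mask_loss(fusion, res_head, word_head,
                         masked_seq_hidden, masked_text_hidden,
                         seq_tokens, text_tokens, seq_mask, text_mask):
    """Predict masked residues and masked words from fused states.

    `fusion` stands in for a cross-modal fusion module (e.g. cross-
    attention); all names are assumptions, not ProtST's actual code.
    """
    # Fuse residue states with word states (and vice versa).
    fused_seq, fused_text = fusion(masked_seq_hidden, masked_text_hidden)

    # Each modality's masked positions are recovered using both modalities.
    res_loss = F.cross_entropy(res_head(fused_seq)[seq_mask],
                               seq_tokens[seq_mask])
    word_loss = F.cross_entropy(word_head(fused_text)[text_mask],
                                text_tokens[text_mask])
    return res_loss + word_loss
```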

In the evaluation, the team found that ProtST's enhanced protein representations yield better performance on a range of representation learning benchmarks; on many of these tasks, PLMs trained with ProtST outperform previous models. ProtST also excelled at zero-shot protein classification: the trained model could assign proteins to functional categories, including classes never seen during training. It can likewise retrieve functional proteins from large databases without any functional annotation.
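Conceptually, such zero-shot classification works by embedding the protein and each class's textual description into the shared space and picking the most similar description, CLIP-style. The sketch below assumes generic encoder interfaces and is not ProtST's actual inference code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_classify(protein_encoder, text_encoder, sequence, label_texts):
    """Score a protein against textual class descriptions it never saw
    during training; encoders and shapes here are assumptions.
    """
    p = F.normalize(protein_encoder(sequence), dim=-1)    # (1, d)
    t = F.normalize(text_encoder(label_texts), dim=-1)    # (num_classes, d)

    # The class whose description is most similar in the shared space wins.
    return (p @ t.t()).argmax(dim=-1)
```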

In conclusion, this framework for enhancing protein sequence pre-training and understanding with biomedical texts looks promising and is a welcome addition to the progress of AI.


Check out the paper and GitHub for more details.


Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a Bachelor of Science in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
A data science enthusiast with strong analytical and critical thinking skills, she has a keen interest in learning new skills, leading groups, and managing work in an organized manner.



