De novo designed proteins hold great potential for achieving superior combinations of new functions and mechanical properties, thereby advancing biological and engineering applications. However, designing novel proteins with targeted structural properties or functions carries high experimental costs, and testing the vast number of possible amino acid sequences remains a challenge.
In a recent study published in the journal Chem, researchers used attention-based diffusion models to efficiently generate novel protein sequences with predetermined secondary structures.
About the study
In the present study, researchers describe two generative deep learning models that predict amino acid sequences and generate folded three-dimensional (3D) protein structures from secondary structure design constraints, specified either as overall content or residue by residue.
The team focused on protein mechanical properties for analysis and on the mapping between primary amino acid sequences and secondary protein structures. The models took a description of the desired secondary structure as conditioning input and generated amino acid sequences by attention-based conditional diffusion.
3D protein structures were generated using the AlphaFold and OmegaFold methods. Two models were trained using the Protein Data Bank (PDB) dataset.
Model A received the overall secondary structure content of a protein as input, whereas model B took residue-by-residue secondary structure data as input to predict the amino acid sequence, from which a 3D protein model was built. The models could generate multiple samples, which allowed the sequence to be refined further by selecting the sample that best satisfied the conditioning input or the sample with the lowest similarity to known proteins.
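The sample-selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the simplified three-class content vector (helix, strand, other), the absolute-difference error measure, and the candidate labels are all assumptions made for the example.

```python
def content_error(target, observed):
    """Sum of absolute differences between two content vectors."""
    if len(target) != len(observed):
        raise ValueError("content vectors must have the same length")
    return sum(abs(t - o) for t, o in zip(target, observed))


def pick_best(target, candidates):
    """Return the (label, content) pair whose content is closest to the target.

    `candidates` is a list of (label, observed_content) pairs, where the
    observed content would come from analyzing each folded sample.
    """
    if not candidates:
        raise ValueError("no candidates to choose from")
    return min(candidates, key=lambda c: content_error(target, c[1]))


# Hypothetical target: 60% helix, 20% strand, 20% other (assumed simplification).
target = [0.6, 0.2, 0.2]

# Hypothetical generated samples with made-up measured contents.
candidates = [
    ("sample_1", [0.90, 0.00, 0.10]),
    ("sample_2", [0.55, 0.25, 0.20]),
    ("sample_3", [0.20, 0.60, 0.20]),
]

best_label, best_content = pick_best(target, candidates)
print(best_label)
```

The same pattern would apply to selecting for novelty: replace the error function with a similarity score against known proteins and keep the candidate with the lowest value.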
The diffusion model used a U-Net convolutional neural network with interconnected transformers and convolutional layers, skip connections and attention modules to identify and subsequently remove noise at every step.
The novelty of the de novo proteins was assessed by Basic Local Alignment Search Tool (BLAST) analysis, and the generated proteins were compared with Critical Assessment of protein Structure Prediction (CASP) 14 and 15 target set proteins. The generative models constructed protein sequences from random signals under conditioning by reversing the diffusion process step by step. The Define Secondary Structure of Proteins (DSSP) code was used to assess eight classes of protein secondary structure.
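To make the idea of reversing the diffusion process step by step more concrete, here is a toy sketch of standard denoising-diffusion ancestral sampling in plain Python. It is not the model from the study: the noise schedule, the step count, and every function name are assumptions, and the trained, conditioned U-Net is replaced by an oracle noise predictor that already knows the target vector, so the loop can be checked end to end.

```python
import math
import random


def linear_betas(steps, start=1e-4, end=0.02):
    """Linear noise schedule (an assumed, commonly used choice)."""
    return [start + (end - start) * t / (steps - 1) for t in range(steps)]


def reverse_diffusion(predict_noise, length, steps=50, seed=0):
    """Start from Gaussian noise and denoise one step at a time."""
    rng = random.Random(seed)
    betas = linear_betas(steps)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    x = [rng.gauss(0.0, 1.0) for _ in range(length)]  # pure random signal
    for t in reversed(range(steps)):
        eps = predict_noise(x, t, alpha_bars[t])
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # Re-inject a small amount of noise on every step except the last.
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


def make_oracle(target):
    """Stand-in for a trained, conditioned denoiser (assumption for the demo)."""
    def predict_noise(x, t, alpha_bar):
        s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
        return [(xi - s * ti) / n for xi, ti in zip(x, target)]
    return predict_noise


target = [1.0, -1.0, 0.5, 0.0]
sample = reverse_diffusion(make_oracle(target), length=len(target))
print([round(v, 6) for v in sample])
```

In a real generative model the noise predictor is a neural network that sees only the noisy input, the step index, and the conditioning, so the output is a new sample consistent with the conditioning rather than a copy of a known target.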
For model A, the conditioning vector parameters covered α-helices, extended strands in parallel and/or antiparallel β-sheets, hydrogen-bonded turns (3, 4, or 5 turns), unstructured regions, β-bridges, 3-10 helices, π-helices, and bends.
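A content-based conditioning vector of this kind can be illustrated by counting how often each of the eight classes occurs in a per-residue DSSP string. The sketch below is an assumption about one reasonable encoding, not the authors' preprocessing: the class order, the use of "~" for unstructured residues, and the example string are all chosen for illustration.

```python
from collections import Counter

# One-letter codes for the eight classes; order and "~" symbol are assumptions.
DSSP_CLASSES = ["H", "E", "T", "~", "B", "G", "I", "S"]


def content_vector(dssp_string):
    """Return the fraction of residues assigned to each of the eight classes."""
    if not dssp_string:
        raise ValueError("empty secondary structure string")
    counts = Counter(dssp_string)
    unknown = set(counts) - set(DSSP_CLASSES)
    if unknown:
        raise ValueError(f"unknown DSSP symbols: {sorted(unknown)}")
    n = len(dssp_string)
    return [counts.get(c, 0) / n for c in DSSP_CLASSES]


# Made-up 20-residue assignment: 10 helix, 4 strand, 2 turn, 2 unstructured, 2 bend.
vec = content_vector("HHHHHHEEEE~~TTSSHHHH")
print(dict(zip(DSSP_CLASSES, vec)))
```

Because the entries are fractions of the same sequence, they sum to one, which makes design targets such as "mostly α-helical" easy to state as a single vector.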
For model B, five cases with different secondary structure distributions were considered. These included a dominant β-sheet, a long α-helix with a breaker in the middle, a small α-helix, a β-sheet flanked by two α-helical domains, and a partially disordered helical protein.
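A residue-level design such as the β-sheet flanked by two α-helical domains can be written down as a per-residue string and turned into integer tokens. This is a hedged sketch only: the segment lengths, the token numbering, the padding value, and the maximum length are hypothetical and are not taken from the study.

```python
# One-letter codes for the eight classes; order and "~" symbol are assumptions.
DSSP_CLASSES = ["H", "E", "T", "~", "B", "G", "I", "S"]
TOKEN = {c: i + 1 for i, c in enumerate(DSSP_CLASSES)}  # 0 is reserved for padding


def build_design(segments):
    """Expand (symbol, length) pairs into a per-residue design string."""
    parts = []
    for symbol, length in segments:
        if symbol not in TOKEN:
            raise ValueError(f"unknown DSSP symbol: {symbol!r}")
        if length <= 0:
            raise ValueError("segment length must be positive")
        parts.append(symbol * length)
    return "".join(parts)


def encode(design, max_len):
    """Map a design string to integer tokens, right-padded with zeros."""
    if len(design) > max_len:
        raise ValueError("design is longer than max_len")
    tokens = [TOKEN[c] for c in design]
    return tokens + [0] * (max_len - len(tokens))


# Hypothetical design: helix, short loop, strand, short loop, helix.
design = build_design([("H", 6), ("~", 2), ("E", 5), ("~", 2), ("H", 6)])
tokens = encode(design, max_len=24)
print(design)
print(tokens)
```

Describing a design as segments keeps the intent readable, while the token list is the kind of fixed-length input a sequence model would typically consume.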
Results
The diffusion models efficiently designed de novo proteins that met the secondary structure specifications, yielding previously undiscovered amino acid sequences.
The generative models provided robust results even for imperfectly specified inputs and unrealistic designs. As a result, the use of these models may be extended to generate proteins with other clinically and functionally relevant properties.
A residue-by-residue secondary structure-based model was more accurate and yielded more diverse amino acid sequences, especially for α-helical structures.
Both models reliably addressed diverse design goals and provided novel approaches for discovering superior protein materials and systems. Model A was analyzed on several distinct cases, including high β-sheet content, mixtures of α-helical and β-sheet content, pure α-helical content, significantly disordered α-helices, and completely disordered proteins.
AlphaFold and OmegaFold analyses of the predicted assembly of β-strands into higher-order fibrous structures yielded comparable results. BLAST analysis showed that some generated sequences were similar to existing amino acid sequences; novelty could be enhanced by increasing the conditioning probability during training or by adding noise to the conditioning vector.
The results of model B were in good agreement with the design goals, confirming that the generative model could design de novo proteins with specified geometry and secondary structure localization. Developing models that provide detailed atomic coordinates may further improve protein design.
For model B, BLAST analysis showed 50% to 60% similarity between existing and generated proteins. Model B generated proteins more efficiently than model A.
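For readers unfamiliar with how such a percentage arises, the sketch below computes percent identity over a pairwise alignment, that is, identical positions divided by alignment length. It is an illustration only: the article does not state which BLAST measure the 50% to 60% figure refers to, the function name is hypothetical, and the two aligned sequences are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns in which both sequences share a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


# Made-up, pre-aligned sequences; "-" marks a gap.
generated = "MKT-AYIAKQ"
existing = "MKTLGYLAKR"
identity = percent_identity(generated, existing)
print(f"{identity:.1f}% identity")
```

A lower value against the closest database hit indicates a more novel sequence, which is why the selection of low-similarity samples matters for de novo design.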
Conclusion
The current study reports two deep learning models that can predict amino acid sequences and 3D protein structures based on secondary structure design objectives. These new models are robust, reliable, and capable of generating novel protein sequences that have not yet been discovered in natural mechanisms and systems.
The models generated protein sequences with the desired secondary structure conformations. The two can be combined: model A can be used to obtain a protein sequence, while model B can be used to refine that sequence by specifying residue-level detail of the secondary structure.
The models not only attempt to satisfy the conditioning inputs but also respect the underlying constraints of physically possible secondary structures learned during training. This approach has the potential to accelerate the design of new proteins for use in medical, industrial, and other bioengineering applications.
Further studies need to include additional conditioning, investigate properties of the generated proteins beyond their structural purpose, such as biological activity, and improve the divergence of generated sequences from those of existing proteins.
Journal reference:
- Ni, B., Kaplan, D. L., & Buehler, M. J. (2023). Generative design of de novo proteins based on secondary structure constraints using attention-based diffusion models. Chem. doi:10.1016/j.chempr.2023.03.02
Citation
To cite this article in an essay, paper, or report:
Toshniwal Paharia, Pooja. (April 25, 2023). Revolutionizing protein design: AI generates novel sequences. News-Medical. Retrieved April 25, 2023, from https://www.news-medical.net/news/20230425/Revolutionizing-protein-design-AI-generates-novel-sequences.aspx.
