From machine learning to multimodal models: the AI revolution in enzyme engineering

Newswise — Enzymes are biological catalysts that drive essential cellular reactions and numerous industrial processes, from food production to pharmaceuticals. Their efficiency depends on how amino acid residues form the active site, recognize the substrate, and stabilize the transition state, often through induced-fit conformational changes. Traditional enzyme engineering relies on directed evolution, rational or semi-rational design, and de novo design. Although effective, these approaches are limited by lengthy and costly screening, reliance on prior structural knowledge, and limited reliability under real-world conditions. For all of these strategies, the vastness of protein sequence space is a fundamental bottleneck: even a handful of mutations generates far more variants than can be tested exhaustively.
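
To give a sense of scale for the sequence-space bottleneck, the short Python sketch below counts how many distinct variants differ from a parent enzyme at exactly k positions. The 300-residue length is a hypothetical value chosen for illustration, not a figure from the review, and the function name is likewise an assumption.

```python
from math import comb


def variants_with_k_substitutions(length: int, k: int, alphabet: int = 20) -> int:
    """Count sequences that differ from a parent at exactly k positions.

    Choose which k positions change (comb), then pick one of the
    (alphabet - 1) alternative residues at each changed position.
    """
    return comb(length, k) * (alphabet - 1) ** k


if __name__ == "__main__":
    length = 300  # hypothetical enzyme length, for illustration only
    for k in (1, 2, 3):
        print(f"{k} substitution(s): {variants_with_k_substitutions(length, k):,} variants")
```

Under these assumptions, single substitutions give a few thousand variants, while triple substitutions already give tens of billions, which is why exhaustive screening is impractical.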

A study (DOI: 10.1016/j.bidere.2025.100044) published in BioDesign Research on August 29, 2025 by the team of Dr. Shuaiqi Meng and Dr. Haiyang Cui from Nanjing Normal University argues that the field has entered a new phase in which foundation models and multimodal systems integrate sequence, structure, chemistry, and experimental context, enabling enzyme designs that are not only faster to obtain but also increasingly generalizable and mechanistically interpretable.

The review introduces an integrated, AI-first framework that extends enzyme engineering from single-enzyme modeling to multi-enzyme pathway design and integrates sequence, structure, reaction environment, and system-level constraints into a continuous “model-design-validation” loop. At the single-enzyme scale, the modeling strategy explicitly encodes key contextual variables (pH, temperature, solvent composition, substrate and product identity, cofactor availability) so that predictions reflect realistic biochemical settings rather than idealized conditions. Within this scope, the authors organize AI applications into three core task families: functional modeling (enzyme/non-enzyme classification, EC number prediction, GO annotation, ligand-binding-site identification, and prediction of kcat, K_M, and kcat/K_M), structural modeling (near-atomic 3D prediction of enzymes and their complexes to reveal catalytic pockets and substrate-binding features and to support inverse folding), and property modeling (thermal stability, pH tolerance, selectivity, binding affinity, robustness, and resistance-related properties that determine practical usability).

Methodologically, the framework combines condition-aware kinetic prediction, substrate- and product-centric outcome modeling (including by-product control and assessment of feedback inhibition), cofactor-specificity prediction and regeneration/recycling optimization, thermodynamics-based inverse folding, and AI-driven structure prediction for protein-ligand, protein-protein, and protein-nucleic acid complexes. These capabilities are scaled up to pathway-level modeling, which optimizes enzyme-to-enzyme coordination through metabolic network analysis covering expression balance, flux regulation, pathway design and retrosynthesis, thermodynamic and kinetic feasibility, and even spatial constraints.

The review also traces four stages in the integration of AI (classical machine learning, deep neural networks, protein language models, and emerging multimodal and agent-style workflows) and stresses that progress depends not only on algorithms but also on data infrastructure: continuously updated databases and curated machine learning datasets covering sequences, structures, dynamics, and interactions. Importantly, the integrated approach has already delivered measurable benefits. Condition-aware tools such as UniKP helped identify highly active tyrosine ammonia-lyase candidates and guided directed evolution to achieve up to 3.5-fold improvements in catalytic efficiency, while system-level modeling improved the design of coordinated multi-enzyme processes (e.g., polysaccharide degradation, lignin valorization, terpene biosynthesis) and increased confidence in predicting enzymatic degradation pathways. Taken together, these advances inform the transition from optimizing individual enzymes to controlling the biocatalytic system as a whole.
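
As a rough illustration of what "condition-aware" modeling can mean in practice, the Python sketch below appends scaled reaction conditions to a sequence embedding and computes catalytic efficiency as kcat/K_M. It is a minimal sketch under stated assumptions: the class, function names, scaling constants, and choice of condition fields are hypothetical and do not reproduce UniKP or any method described in the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionConditions:
    """Hypothetical container for the experimental context of one assay."""
    ph: float
    temperature_c: float
    cosolvent_fraction: float  # volume fraction of organic co-solvent, 0 to 1
    cofactor_present: bool


def encode_conditions(c: ReactionConditions) -> list[float]:
    """Scale each condition to roughly [0, 1] (assumed scaling, for illustration)."""
    return [
        c.ph / 14.0,
        c.temperature_c / 100.0,
        c.cosolvent_fraction,
        1.0 if c.cofactor_present else 0.0,
    ]


def build_model_input(sequence_embedding: list[float], c: ReactionConditions) -> list[float]:
    """Concatenate a precomputed sequence embedding with the condition features."""
    return list(sequence_embedding) + encode_conditions(c)


def catalytic_efficiency(kcat_per_s: float, km_mm: float) -> float:
    """Return kcat/K_M in s^-1 mM^-1; K_M must be positive."""
    if km_mm <= 0:
        raise ValueError("K_M must be positive")
    return kcat_per_s / km_mm


if __name__ == "__main__":
    conditions = ReactionConditions(ph=7.0, temperature_c=50.0,
                                    cosolvent_fraction=0.1, cofactor_present=True)
    # The two-element embedding is a placeholder for a real model's output.
    features = build_model_input([0.12, -0.40], conditions)
    print(features)
    print(catalytic_efficiency(kcat_per_s=10.0, km_mm=0.5))
```

The point of the design is that the same enzyme sequence yields different model inputs under different assay conditions, so a downstream predictor can learn condition-dependent kinetics instead of a single idealized value.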

By linking “mining-design-validation” into a data-driven loop, AI can reduce the number of experiments needed to obtain high-performance enzymes and broaden what is feasible to engineer. Near-term benefits include rapid discovery of candidate enzymes from uncharacterized sequence space, improved prediction of catalytic and stability outcomes before synthesis, and more reliable prioritization in directed evolution campaigns.

###

References

DOI

10.1016/j.bidere.2025.100044

Original source URL

https://doi.org/10.1016/j.bidere.2025.100044

Funding information

This study was supported by the National Natural Science Foundation of China (Grant No. 22008166), the National Natural Science Foundation of China Innovative Research Group Project (Grant No. 22478199), and the Jiangsu Provincial Synthetic Biology Basic Research Center (Grant No. BK20233003). It was also supported by the Jiangsu Provincial “Entrepreneurship and Innovation Plan” Education Fund (No. 164080H00250), the Research Startup Fund of Nanjing Normal University (No. 184080H201B94), and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD, No. 164320H1865).

About BioDesign Research

BioDesign Research is dedicated to the exchange of information in the interdisciplinary field of biosystems design. Its mission is to pave the way for the predictable de novo design and evaluation of engineered or re-engineered living organisms, using rational or automated methods, to address global challenges in health, agriculture, and the environment.
