A breakthrough in nucleic acid aptamer discovery has emerged from a team led by Weihong Tan, Xiaohong Fang, and Tao Bing at the Hangzhou Institute of Medical Sciences, Chinese Academy of Sciences. Their approach uses machine learning to decipher the secondary structures of nucleic acid aptamers from single-round screening data. The method largely bypasses the lengthy iterative enrichment of conventional selection and directly extracts the structural information needed to optimize high-affinity aptamers. The study, published as an open access research paper in CCS Chemistry, proposes a new paradigm for aptamer research that improves both the speed and the accuracy of aptamer identification and optimization.
Nucleic acid aptamers are short, single-stranded oligonucleotides that fold into complex three-dimensional conformations and bind a variety of target molecules with high specificity and affinity. Although SELEX (Systematic Evolution of Ligands by EXponential enrichment) is a powerful way to generate candidate aptamers, elucidating the functional secondary structures that mediate target recognition remains a challenge. Conventional structure determination methods such as electron microscopy, nuclear magnetic resonance (NMR) spectroscopy, and X-ray crystallography are resource-intensive and often fail to resolve the dynamic, heterogeneous structures characteristic of aptamer-target complexes. As a result, efforts to truncate and optimize aptamer sequences for better binding have been constrained by limited structural insight.
To address these long-standing obstacles, the team developed a machine learning framework that combines unsupervised autoencoder clustering with deep learning to analyze core sequence elements in the large pool of aptamer candidates obtained from a single screening round. Rather than relying on iterative enrichment, the approach identifies conserved sequence motifs and the secondary structural features that matter for target binding. The molecular structures underlying aptamer function can thus be inferred computationally, providing a blueprint for rational design and refinement without extensive experimental trial and error.
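The article does not say how sequences are represented before they reach the autoencoder. A common choice for neural network models of DNA is one-hot encoding, in which each position becomes four binary values (A, C, G, T). The Python sketch below assumes that representation; the function name `one_hot` and the zero-padding to a fixed length are illustrative choices, not details reported in the study.

```python
BASES = "ACGT"

def one_hot(seq, length=None):
    """Flattened one-hot encoding: 4 values per position, in A, C, G, T order.
    Sequences shorter than `length` are zero-padded; longer ones are truncated.
    (Hypothetical helper for illustration, not the authors' code.)"""
    if length is None:
        length = len(seq)
    seq = seq.upper()
    vector = []
    for i in range(length):
        base = seq[i] if i < len(seq) else None  # None gives an all-zero padding slot
        vector.extend(1 if base == b else 0 for b in BASES)
    return vector
```

For example, `one_hot("AC")` gives `[1, 0, 0, 0, 0, 1, 0, 0]`. In a typical setup, equal-length vectors like these are stacked into a matrix and fed to an autoencoder, and clustering is then performed on the compressed representation learned at its bottleneck.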
The authors first applied their method to data from a screen for aptamers targeting CD8, an important cell surface marker. Deep learning analysis of sequence families in the single-round library revealed a core sequence, ‘GTGAGGAGCTTGAAA’, that was prevalent despite a highly heterogeneous sequence background. Conventional multiple sequence alignment could not extract such short motifs from a low-homology pool, underscoring the advantage of machine learning in resolving subtle, function-related sequence patterns.
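A simple way to see what "prevalent despite a heterogeneous background" means is to count how many sequences in a pool contain each short substring (k-mer). The sketch below is a plain frequency baseline, not the deep learning analysis used in the study; the function name and the toy sequences in the usage note are invented for illustration.

```python
from collections import Counter

def top_kmers(sequences, k, n=5):
    """Return the n k-mers found in the most sequences, as (kmer, count) pairs.
    Each k-mer is counted at most once per sequence (hypothetical helper)."""
    counts = Counter()
    for seq in sequences:
        # A set removes repeats within one sequence before counting.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts.most_common(n)
```

On a toy pool such as `["AAGCTTGAAATC", "CCAGCTTGAAAG", "TTTAGCTTGAAAC", "GGGCCCTTTAAA"]`, `top_kmers(pool, 9, 1)` returns `[("AGCTTGAAA", 3)]`: the submotif is shared by three sequences even though their flanking bases differ. Counting once per sequence keeps a single repetitive sequence from inflating a motif's score.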
To validate the computationally discovered core sequence experimentally, the team synthesized a candidate library containing a key submotif (5′-AGCTTGAAA-3′) and subjected it to a selection method called RE-SILEX. All of the newly identified aptamers (more than 20,000 in total) contained the predicted core sequence, demonstrating the robustness of combining a single-round screen with machine learning analysis. This provided strong proof of concept that the approach not only identifies biologically relevant motifs but can also guide subsequent aptamer enrichment and design.
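Checking a sequenced pool for a motif amounts to asking what fraction of reads contain it. The sketch below also tolerates a small number of mismatches, on the assumption that real sequencing data contain errors; the mismatch tolerance and both function names are hypothetical and not taken from the paper.

```python
def contains_motif(seq, motif, max_mismatches=0):
    """True if any window of seq matches motif with at most max_mismatches substitutions."""
    k = len(motif)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            return True
    return False

def motif_fraction(sequences, motif, max_mismatches=0):
    """Fraction of sequences that contain the motif (0.0 for an empty pool)."""
    if not sequences:
        return 0.0
    hits = sum(contains_motif(s, motif, max_mismatches) for s in sequences)
    return hits / len(sequences)
```

With reads `["TTAGCTTGAAACG", "GGAGCTTCAAATT", "CCCCCCCCCCCC"]` and motif `"AGCTTGAAA"`, an exact search finds one hit in three reads, while allowing one mismatch also accepts the second read (`AGCTTCAAA`). How much tolerance is appropriate depends on the sequencing platform's error profile.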
The researchers then developed a machine learning-based algorithm to examine the secondary structures formed by the core sequences within the fixed region. Statistical evaluation showed that about 62.4% of these sequences form stem-loop structures important for molecular recognition, while the remainder adopt diverse conformations. Among the stem-loop-forming aptamers, the sequence “GTGA” predominates in the hairpin loop and stem region, suggesting a consensus binding motif. Quantitative analysis and base distribution profiling confirmed that the aptamers share a common secondary structure and showed how specific folding patterns confer target specificity and affinity.
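The details of the study's structure algorithm are not given here. As a much cruder illustration of what "forming a stem-loop" means at the sequence level, the sketch below searches for a perfect inverted repeat: a stem arm whose Watson-Crick complement appears a few bases downstream, enclosing a loop. The stem length and loop bounds are arbitrary assumptions; realistic analyses use thermodynamic folding tools, such as those in the ViennaRNA package, that tolerate mismatches and score free energy.

```python
# DNA Watson-Crick pairs (assumes a DNA library; an RNA version would pair A with U).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def find_hairpin(seq, stem_len=4, min_loop=3, max_loop=8):
    """Return (stem5_start, loop_start, stem3_start, end) of the first perfect
    hairpin found, or None. Only exact complementarity counts (hypothetical helper)."""
    n = len(seq)
    for i in range(n - stem_len + 1):
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop  # start of the 3' stem arm
            if j + stem_len > n:
                break  # longer loops cannot fit either
            left = seq[i:i + stem_len]
            right = seq[j:j + stem_len]
            # left[0] pairs with right[-1], left[1] with right[-2], and so on.
            if all(COMPLEMENT.get(a) == b for a, b in zip(left, reversed(right))):
                return i, i + stem_len, j, j + stem_len
    return None

def stem_loop_fraction(sequences, **kwargs):
    """Fraction of sequences containing at least one perfect hairpin."""
    if not sequences:
        return 0.0
    hits = sum(find_hairpin(s, **kwargs) is not None for s in sequences)
    return hits / len(sequences)
```

`find_hairpin("GCGCTTTTGCGC")` returns `(0, 4, 8, 12)`: a 4-bp GCGC stem enclosing a TTTT loop. `stem_loop_fraction` applies the same test across a pool, which is loosely analogous to reporting what share of sequences form a stem-loop.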
Building on these insights, the team applied rational truncation and optimization to an aptamer from the RE-SILEX pool, producing a sequence with substantially higher binding affinity. The process used the structural knowledge from machine learning to discard redundant nucleotides and retain the functionally important motifs. Such advances show how the aptamer development pipeline, from initial discovery to functional application, could be streamlined.
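Rational truncation starts from a structural hypothesis: keep the region predicted to carry the binding motif and trim the rest. The sketch below enumerates candidate truncations that retain a given core span plus a few flanking nucleotides on each side. The core coordinates would come from a structure prediction step, the flank limit is an arbitrary assumption, and the function is hypothetical; each candidate would still need experimental binding measurements, which this code does not model.

```python
def truncation_variants(seq, core_start, core_end, max_flank=3):
    """Candidate truncations of seq that keep seq[core_start:core_end]
    plus 0..max_flank extra nucleotides on each side, shortest first."""
    variants = set()
    for left in range(max_flank + 1):
        for right in range(max_flank + 1):
            start = max(0, core_start - left)
            end = min(len(seq), core_end + right)
            variants.add(seq[start:end])
    # Sort by length, then alphabetically, so the output order is deterministic.
    return sorted(variants, key=lambda v: (len(v), v))
```

For a toy sequence `"AAGCGCTTTTGCGCAA"` with a predicted hairpin core at positions 2 to 14 and `max_flank=1`, this yields four candidates, from the bare core `"GCGCTTTTGCGC"` up to `"AGCGCTTTTGCGCA"`. Testing shortest-first reflects the goal of discarding as many redundant nucleotides as binding allows.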
The generality of the single-round machine learning approach was demonstrated with a second target, fibroblast activation protein (FAP). Here the method identified a highly conserved core sequence hypothesized to form a G-quadruplex stabilized by a hairpin at the terminal region. This finding was pivotal because it showed that the approach applies to structural motifs beyond stem-loops, extending its scope to more complex, non-canonical nucleic acid topologies. Stepwise truncation and affinity optimization of the FAP aptamer further supported the broad applicability of the framework.
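A widely used heuristic for spotting potential G-quadruplex-forming sequences is a pattern search for four runs of three or more guanines separated by short loops. The regular expression below encodes that heuristic with loops of 1 to 7 nucleotides. It is an assumption for illustration, not the method the authors used to identify the FAP core sequence, and a match only flags a candidate that would still need structural confirmation.

```python
import re

# Four G-runs (at least 3 G each) separated by loops of 1-7 nucleotides.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")

def has_g4_motif(seq):
    """True if seq contains a putative intramolecular G-quadruplex motif."""
    return G4_PATTERN.search(seq.upper()) is not None
```

A telomere-like sequence such as `"ATGGGTTAGGGTTAGGGTTAGGGCA"` matches, whereas `"GGGTTAGGGTTA"` (only two G-runs) does not. The heuristic says nothing about the terminal hairpin described above, which would have to be checked separately.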
In further validation experiments on CD8, the researchers found that more than three-quarters of the sequences contained the identified core motif and shared consistent secondary structure elements. The truncated and optimized aptamer showed more than a 10-fold increase in affinity while maintaining target specificity, even in complex cellular environments. The structural characterization also enabled the design of split aptamers and opened a path toward de novo sequence generation and deep learning-based synthetic biology toolkits for designing functional nucleic acids.
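Assuming affinity is reported, as is usual for aptamers, as an equilibrium dissociation constant Kd (a lower Kd means tighter binding), a "more than 10-fold" improvement corresponds to the ratio below. The subscripts are labels introduced here, not notation from the paper.

```latex
\text{fold improvement} = \frac{K_{d,\text{parent}}}{K_{d,\text{optimized}}} > 10
```

For example, an aptamer going from a hypothetical Kd of 100 nM to 8 nM would represent a 12.5-fold improvement.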
Beyond its immediate applications, this work marks a shift in the aptamer research paradigm, from iterative enrichment and experimental structure elucidation to data-driven, computationally powered discovery. By showing that single-round libraries already contain rich structural and functional information, the study challenges the assumption that many screening cycles are necessary. Combining high-throughput sequencing with advanced machine learning thus enables rapid and accurate deciphering of the nucleic acid folding landscape that determines target binding.
The implications extend beyond aptamers to broader biomolecular and therapeutic contexts. The machine learning algorithms could be adapted to explore non-coding RNA interactions at high resolution and to model RNA-protein complexes. By enabling AI-driven virtual screening platforms for nucleic acid ligands, the approach could also accelerate drug discovery pipelines and precision diagnostics, ultimately supporting the development of next-generation nucleic acid therapeutics tailored for personalized medicine.
This research, supported by the National Natural Science Foundation of China, the “Pioneer” and “Leading Goose” research and development programs of Zhejiang Province, and the Strategic Priority Research Program of the Chinese Academy of Sciences, exemplifies the fusion of computational innovation and molecular biology. Its publication in CCS Chemistry highlights the growing role of interdisciplinary methods in advancing chemical science internationally.
As the scientific community continues to harness artificial intelligence for molecular design, the combination of machine learning and high-throughput sequencing is opening new frontiers in aptamer technology. This integration improves our understanding of structure-function relationships and enables the rational engineering of nucleic acids with tailored functions. Such advances could accelerate the translation of aptamers from bench to bedside, expanding their use in diagnostics, therapeutics, and beyond.
Research subject: Not applicable
Article title: Single-round aptamer discovery using machine learning: elucidating the structure-function principles of target binding
News publication date: January 7, 2026
Web references:
https://www.chinesechemsoc.org/journal/ccschem
http://dx.doi.org/10.31635/ccschem.025.202506736
Image credit: CCS Chemistry
Keywords: machine learning
