High-throughput screening (HTS) generates data at a scale that fundamentally shapes the analytical choices available to drug discovery teams. The choice between AI-based and statistical screening analysis has moved from an academic discussion to a practical question for laboratories that must decide how to process, triage, and interpret the millions of data points generated in a single campaign. Historically, HTS data analysis relied on well-defined statistical methods that offered predictability and ease of validation; today, machine learning (ML) approaches are increasingly integrated into workflows alongside these established techniques. Understanding the strengths and limitations of each is essential for teams seeking to maximize hit quality and minimize attrition.
The scale of modern HTS campaigns – where compound libraries of hundreds of thousands or even millions of molecules are tested against a biological target – creates both the opportunity and the necessity for more sophisticated data analysis strategies. Traditional statistical frameworks were designed to handle the variability inherent in assay systems, but they operate within explicit assumptions about data distributions and thresholds that AI-based approaches can, in some contexts, circumvent. The choice of analytical method is not binary; many contemporary screening pipelines use statistical quality control (QC) and ML-based hit prioritization in combination, reflecting the complementary nature of the two approaches. For a broader discussion of AI integration in HTS, see AI and Machine Learning in High-Throughput Screening and High-Throughput Screening: Principles, Applications and Advancements.
The statistical foundation of HTS data analysis
Traditional statistical analysis in HTS is built on a small number of robust, widely adopted metrics. The Z′-factor, introduced by Zhang et al. in 1999, remains the primary measure of assay quality in primary screens. It captures the separation between positive and negative control distributions, accounting for both the signal window and the variability of each population.1 An assay with a Z′-factor of 0.5 or greater is generally considered suitable for large-scale screening, while values between 0 and 0.5 indicate a marginal assay that usually needs optimization before compound testing begins.
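As a minimal sketch of the calculation, the Z′-factor can be computed from the control wells of a single plate as 1 − 3(σ_pos + σ_neg) / |μ_pos − μ_neg|. The control readings below are hypothetical values chosen for illustration, and the function name `z_prime` is an assumption rather than part of any particular software package.

```python
from statistics import mean, stdev

def z_prime(pos, neg):
    """Z'-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    window = abs(mean(pos) - mean(neg))  # signal window between the controls
    if window == 0:
        raise ValueError("control means are identical; Z' is undefined")
    return 1.0 - 3.0 * (stdev(pos) + stdev(neg)) / window

# Hypothetical raw signals from the control wells of one plate
pos_ctrl = [98.0, 102.0, 100.0, 101.0, 99.0]
neg_ctrl = [9.0, 11.0, 10.0, 12.0, 8.0]

zp = z_prime(pos_ctrl, neg_ctrl)
print(f"Z' = {zp:.3f}")
```

With these illustrative controls the signal window is wide relative to the spread of each population, so the plate clears the 0.5 threshold comfortably.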
Beyond assay quality, traditional statistical analysis addresses the hit identification problem through percent inhibition thresholds, median absolute deviation (MAD)-based normalization, and B-score corrections for systematic plate effects. Malo et al. demonstrated that these robust preprocessing steps, combined with replicate measurements, significantly improve the sensitivity and specificity of hit identification in primary screens.2 These methods require minimal computational infrastructure and produce outputs that are directly interpretable: a compound either exceeds the defined activity threshold or it does not.
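One way to picture MAD-based normalization is as a robust z-score: each well is centered on the plate median and scaled by the MAD, so a few strongly active wells do not inflate the scale estimate the way they would inflate a standard deviation. The sketch below assumes hypothetical percent-inhibition readings and an illustrative cutoff of 3; neither value comes from the cited studies, and B-score correction for plate effects is not shown.

```python
from statistics import median

MAD_TO_SD = 1.4826  # scales the MAD to match the SD of normally distributed data

def robust_z_scores(values):
    """Center on the plate median and scale by the MAD instead of mean and SD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [(v - med) / (MAD_TO_SD * mad) for v in values]

# Hypothetical percent-inhibition readings for ten sample wells on one plate
inhibition = [2.0, -1.0, 0.5, 3.0, -2.5, 1.0, 65.0, -0.5, 1.5, 0.0]
HIT_CUTOFF = 3.0  # illustrative threshold, not taken from the article

scores = robust_z_scores(inhibition)
hits = [i for i, z in enumerate(scores) if z >= HIT_CUTOFF]
print(hits)
```

Here only the well at index 6 is flagged, which mirrors the directly interpretable outcome described above: a compound either exceeds the threshold or it does not.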
For dose–response analysis following primary hits, nonlinear regression with the four-parameter logistic (4PL) model is the standard approach. This yields concentration–response parameters, such as the IC₅₀ and Hill coefficient, that provide mechanistic information interpretable within a pharmacological framework. The transparency of these calculations is an important regulatory consideration: statistical outputs can be audited, reproduced independently, and explained in submission documents in ways that some ML models currently cannot.
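To make the 4PL parameters concrete, the sketch below evaluates the model itself under one common parametrization, in which the response falls from an upper plateau to a lower plateau as concentration increases. The parameter names and values are assumptions for illustration; estimating them from data would normally use a nonlinear least-squares routine such as SciPy's `curve_fit`, which is not shown here.

```python
def four_pl(conc, bottom, top, ic50, hill):
    """Response at `conc` for a decreasing four-parameter logistic curve.

    bottom, top : lower and upper response plateaus
    ic50        : concentration giving the half-maximal response
    hill        : Hill coefficient (steepness of the transition)
    """
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)

# Hypothetical parameters: remaining activity in percent, concentration in µM
params = dict(bottom=0.0, top=100.0, ic50=1.0, hill=1.0)

for conc in (0.1, 1.0, 10.0):
    print(f"{conc:>5} µM -> {four_pl(conc, **params):.1f} % activity")

midpoint = four_pl(params["ic50"], **params)
```

By construction, the response at the IC₅₀ sits halfway between the two plateaus, which is what makes the fitted parameter directly interpretable.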
Machine learning approaches to HTS data analysis
ML methods applied to HTS data fall into two broad categories: those that improve hit identification within a single screen, and those that use historical screening data to predict bioactivity before or alongside experimental testing. In the first category, gradient boosting algorithms have shown strong performance for quantitative structure–activity relationship (QSAR) modeling of HTS outputs. Boldini et al. conducted a systematic comparison of gradient boosting variants across 94 endpoints, finding that ensemble methods consistently outperformed simpler regression models while retaining reasonable interpretability through feature importance scores.3
In the second category, deep learning architectures – particularly graph neural networks (GNNs) – learn representations of molecular structure directly from graph data, enabling predictions that generalize across chemical space without requiring manually engineered molecular descriptors.4 These models excel at identifying nonlinear structure–activity relationships that fixed statistical thresholds are structurally unable to capture. A landmark study by Wallach et al. demonstrated that a convolutional neural network applied across 318 drug targets successfully identified novel hits across all major therapeutic areas and protein classes, including targets without known binders or high-quality crystal structures.5
ML approaches also offer specific advantages in false-positive detection. Boldini et al. described a gradient boosting-based method, termed minimum variance sampling analysis, that simultaneously detects assay interferents and prioritizes true bioactive compounds within a single HTS dataset, requiring no prior knowledge of the interference mechanism and completing analysis in under 30 seconds per assay on low-resource hardware.6 This type of data-driven interferent detection complements, rather than replaces, the rule-based filters that have underpinned traditional QC pipelines.
Interpretability, transparency, and regulatory context
One of the most significant distinctions between traditional statistical methods and AI approaches in HTS data analysis is the ease with which results can be explained. Statistical metrics are grounded in explicitly defined mathematical relationships; the Z′-factor, for instance, is a simple function of group means and standard deviations. Decisions made on the basis of such metrics can be documented fully in validation reports, presented to regulatory bodies, and reproduced exactly given the same input data.
Deep learning models present a different transparency profile. The complex internal representations learned by GNNs and other deep architectures are not inherently human-readable, creating what is commonly described as a ‘black-box’ problem. Explainable AI (XAI) techniques, including attention mechanisms, SHAP (SHapley Additive exPlanations) values, and gradient-based saliency maps, have been developed to provide post-hoc interpretability, but the additional analytical step represents a nontrivial overhead. A 2022 review of XAI methods in biomedical data science concluded that the trade-off between model performance and interpretability remains an active area of research, with no universal solution.7
Regulatory guidance for AI-derived decisions in drug discovery is still evolving, and established statistical outputs remain the preferred documentation format in most submission contexts. This does not preclude the use of ML in hit identification; rather, it suggests that ML outputs are often best presented as supporting evidence alongside, rather than instead of, traditional statistical validation data.
Data requirements, limitations, and practical trade-offs
The performance of ML models in HTS data analysis is strongly dependent on the quality and quantity of available training data. Statistical methods, by contrast, can extract meaningful results from a single screen with well-characterized controls, making them more robust in early-stage projects where historical data are sparse. When training sets are small, imbalanced, or biased toward particular chemical scaffolds, ML models may generalize poorly or introduce systematic errors that are difficult to detect without extensive validation.
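The effect of class imbalance on model evaluation can be illustrated with a small sketch. It assumes a hypothetical screen of 1,000 compounds with a 1% hit rate and a trivial "model" that labels every compound inactive; the numbers are invented for illustration and do not come from any cited dataset.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the active class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # By convention, report 0.0 when a denominator is zero
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical screen: 10 true actives among 1,000 compounds
y_true = [1] * 10 + [0] * 990
always_inactive = [0] * 1000  # a "model" that never predicts a hit

accuracy = sum(t == p for t, p in zip(y_true, always_inactive)) / len(y_true)
precision, recall = precision_recall(y_true, always_inactive)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

The trivial model scores 99% accuracy while recovering no actives, which is one reason errors on imbalanced screening data can go undetected unless validation uses metrics that focus on the active class.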
A comprehensive review of AI/ML methodologies across the drug discovery pipeline noted that data quality, heterogeneity, and the lack of standardization across institutions remain primary obstacles to full-scale adoption of AI in HTS analysis.8 Initiatives such as the MF-PCBA dataset have been developed specifically to provide large-scale, standardized HTS data for model training and benchmarking, but access to equivalent proprietary datasets remains unequal across industry and academic settings.
Computational cost represents a further practical consideration. Standard statistical analysis of HTS data requires only common laboratory software and modest hardware, while training and deploying deep learning models typically demands GPU resources and specialized expertise. For organizations running occasional or low-volume screening campaigns, the infrastructure investment required to implement ML-based analysis may not be justified by the marginal gains over well-executed statistical approaches.
Table 1. Comparison of traditional statistical analysis and AI/ML approaches for HTS data analysis.
| Attribute | Traditional statistical analysis | AI/ML approaches |
| --- | --- | --- |
| Primary objective | Identify hits above a fixed threshold; assess assay quality | Learn nonlinear activity patterns; prioritize hits by predicted bioactivity |
| Key metrics | Z′-factor, percent inhibition, signal-to-noise ratio, IC₅₀ | Area under the ROC curve, precision–recall, RMSE, feature importance scores |
| Data volume handled | Scales well to tens of thousands of compounds with standard hardware | Scales to millions of compounds; benefits from large training sets |
| Interpretability | High; results linked directly to assay controls and defined thresholds | Variable; deep learning models may require explainability tools |
| False-positive handling | Rule-based filters and Z-score cutoffs; fixed thresholds can miss context | Learned detection of interferents without prior knowledge of interference mechanism |
| Regulatory acceptance | Well-established; widely accepted in validation frameworks | Emerging; regulatory guidance for AI-derived decisions is still evolving |
| Data requirements | Minimal; performs well on a single screen with controls | Requires historical training data; performance degrades with sparse or biased datasets |
| Computational cost | Low; standard statistical software is sufficient | Moderate to high; GPU resources and model development expertise may be required |
Hybrid workflows and the role of AI vs statistical screening in practice
The most productive analytical environments in contemporary HTS are not those that have selected a single approach, but those that have integrated statistical and AI methods at appropriate stages of the screening workflow. Statistical QC – including Z′-factor calculation, plate normalization, and threshold-based hit calling – provides the foundation that validates assay quality and ensures that downstream analyses, whether statistical or AI-based, begin from reliable primary data.
AI tools are then applied most effectively in stages where their ability to detect complex patterns adds value: secondary screening triage, interference detection, structure–activity relationship (SAR) exploration, and virtual compound prioritization. Virtual prioritization is the setting examined in the 318-target prospective evaluation described above, in which a deep learning-based virtual screening model identified novel hits even for targets lacking known binders or high-quality crystal structures; the authors present these empirical results as support for computational methods as a viable alternative to physical HTS.5
Hybrid workflows also allow teams to leverage the interpretability of statistical outputs for regulatory purposes while using ML predictions to guide experimental prioritization decisions internally. This division of function – statistical methods for auditable quality control, AI methods for pattern recognition and prioritization – reflects the current state of the field and is likely to remain the dominant paradigm while regulatory guidance for AI-derived outputs continues to develop.
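The division of function described above can be sketched as a two-stage pipeline: a statistical gate that drops plates failing the Z′-factor criterion, followed by ranking of the remaining compounds by a model score. Everything in this example is hypothetical, including the plate data, the compound identifiers, the `prioritize` helper, and the scores, which stand in for the output of whatever ML model a team might use.

```python
from statistics import mean, stdev

Z_PRIME_MIN = 0.5  # QC threshold discussed earlier in the article

def z_prime(pos, neg):
    """Z'-factor computed from positive and negative control wells."""
    return 1.0 - 3.0 * (stdev(pos) + stdev(neg)) / abs(mean(pos) - mean(neg))

def prioritize(plates, z_min=Z_PRIME_MIN):
    """Keep compounds from QC-passing plates, ranked by model score (high first)."""
    ranked = []
    for plate_id, plate in plates.items():
        if z_prime(plate["pos"], plate["neg"]) < z_min:
            continue  # statistical QC fails: do not trust scores from this plate
        ranked.extend((score, cpd, plate_id) for cpd, score in plate["scores"].items())
    return sorted(ranked, reverse=True)

# Hypothetical plates: P1 has tight controls, P2 has noisy controls
plates = {
    "P1": {"pos": [98.0, 102.0, 100.0, 101.0, 99.0],
           "neg": [9.0, 11.0, 10.0, 12.0, 8.0],
           "scores": {"cpd-B": 0.12, "cpd-A": 0.91}},
    "P2": {"pos": [100.0, 60.0, 140.0, 80.0, 120.0],
           "neg": [10.0, 50.0, -30.0, 30.0, -10.0],
           "scores": {"cpd-C": 0.95}},
}

ranked = prioritize(plates)
for score, cpd, plate_id in ranked:
    print(f"{cpd} (plate {plate_id}): predicted score {score:.2f}")
```

In this sketch the highest-scoring compound sits on a plate that fails QC and is therefore excluded, illustrating why the statistical gate runs before any model-based prioritization.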
Key considerations when selecting an analysis approach for HTS:
• Assay quality should always be assessed using established statistical metrics, including Z′-factor and signal-to-noise ratio, regardless of which downstream analysis method is used.
• ML models require training data; where historical HTS data are unavailable or limited, traditional statistical methods remain the more reliable primary analysis tool.
• Gradient boosting and GNN-based approaches can improve false-positive detection and hit prioritization, particularly in campaigns where the interference mechanisms affecting the assay are not known in advance.
• Model interpretability should be evaluated before deploying AI-derived predictions; XAI tools can help bridge the gap between high-performance models and explainable outputs.
• Regulatory documentation should retain statistical validation data as primary evidence of assay suitability, with AI outputs included as supplementary analytical information where applicable.
AI and statistical methods in HTS: an evolving analytical partnership
The question of AI vs statistical screening in HTS data analysis is best understood not as a competition but as a division of labor across the screening workflow. Traditional statistical frameworks, including the Z′-factor, MAD-based normalization, and four-parameter logistic dose–response fitting, provide the transparent, auditable quality control that underpins any reliable screening campaign. AI and ML approaches extend these capabilities into pattern recognition, compound prioritization, and interference detection at a scale and complexity that classical statistics cannot address alone.
The expanding body of comparative literature indicates that deep learning and gradient boosting models can match or exceed traditional methods in hit prioritization when sufficient training data are available, and that the two approaches are increasingly deployed together in mature screening organizations. As training datasets grow, XAI tools mature, and regulatory frameworks evolve to accommodate AI-derived evidence, the integration of machine learning into HTS data analysis workflows is expected to deepen, building on rather than replacing the statistical foundations that have supported the field for decades.
References
1. Zhang JH, Chung TDY, Oldenburg KR. A simple statistical parameter for use in evaluation and validation of high throughput screening assays. J Biomol Screen. 1999;4(2):67–73. doi: 10.1177/108705719900400206
2. Malo N, Hanley JA, Cerquozzi S, Pelletier J, Nadon R. Statistical practice in high-throughput screening data analysis. Nat Biotechnol. 2006;24(2):167–175. doi: 10.1038/nbt1186
3. Boldini D, Grisoni F, Kuhn D, Friedrich L, Sieber SA. Practical guidelines for the use of gradient boosting for molecular property prediction. J Cheminform. 2023;15(1):73. doi: 10.1186/s13321-023-00743-7
4. Duvenaud D, Maclaurin D, Aguilera-Iparraguirre J, et al. Convolutional networks on graphs for learning molecular fingerprints. Adv Neural Inf Process Syst. 2015;28:2224–2232. doi: 10.48550/arXiv.1509.09292
5. Wallach I, Bernard D, Nguyen K, et al. AI is a viable alternative to high throughput screening: a 318-target study. Sci Rep. 2024;14(1):7526. doi: 10.1038/s41598-024-54655-z
6. Boldini D, Friedrich L, Kuhn D, Sieber SA. Machine learning assisted hit prioritization for high throughput screening in drug discovery. ACS Cent Sci. 2024;10(4):823–832. doi: 10.1021/acscentsci.3c01517
7. Han H, Liu X. The challenges of explainable AI in biomedical data science. BMC Bioinformatics. 2022;22(Suppl 12):443. doi: 10.1186/s12859-021-04368-1
8. Ferreira FJN, Carneiro AS. AI-driven drug discovery: a comprehensive review. ACS Omega. 2025;10(23):23889–23903. doi: 10.1021/acsomega.5c00549
This content includes text that has been created with the assistance of generative AI and has undergone editorial review before publishing. Technology Networks’ AI policy can be found here.
