Model performance evaluation
We evaluated the machine learning architecture for track and field teaching optimization across all analytical domains and observed high predictive accuracy and good generalization. Performance was validated with stratified 5-fold cross-validation, and the reported metrics were computed on the held-out test partition (n = 62) to assess performance on unseen data. Figure 3 presents the quantitative performance characteristics across eight analytical dimensions (including convergence and confusion matrix analysis), illustrating the framework’s efficacy for pedagogical optimization.
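To make the validation scheme concrete, the following is a minimal sketch of stratified k-fold splitting using only the Python standard library. It is not the implementation used in the study: the function name, the event labels, and the sample counts are illustrative assumptions, and a production pipeline would more likely rely on an established library routine.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose class proportions mirror the full set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin so every fold gets a fair share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical labels: five events with ten samples each (illustrative only).
events = ["sprint", "hurdles", "long_jump", "high_jump", "shot_put"]
labels = [event for event in events for _ in range(10)]
splits = list(stratified_k_fold(labels, k=5))

for fold_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(fold_number, len(train_idx), len(test_idx))
```

With class sizes that are not multiples of k, this simple round-robin assignment gives the earlier folds slightly more samples; that is acceptable for a sketch but worth balancing in practice.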


Performance evaluation of machine learning components for track and field teaching optimization.
The technique classification component exhibited exceptional discriminative capability across all track and field events, with F1-scores ranging from 0.88 ± 0.02 for shot put to 0.94 ± 0.01 for long jump as illustrated in Fig. 3A. The performance variation correlates inversely with the biomechanical complexity inherent in each event, where shot put techniques demonstrate greater inter-subject variability due to anthropometric influences. Event-specific model tuning produced statistically significant improvements (p < 0.01) for technically complex events such as hurdles and high jump, suggesting the efficacy of specialized parameter configurations for advanced movement pattern recognition.
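For reference, the per-event F1-score is the harmonic mean of precision and recall, which can be written as 2TP / (2TP + FP + FN). The sketch below computes it per class from true and predicted labels; the labels are toy values chosen for illustration and do not come from the study data.

```python
def f1_per_class(y_true, y_pred):
    """Return {class: F1} using F1 = 2TP / (2TP + FP + FN) for each class."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


# Toy labels (illustrative only): one shot put sample is mistaken for long jump.
y_true = ["long_jump"] * 4 + ["shot_put"] * 4
y_pred = ["long_jump"] * 4 + ["shot_put"] * 3 + ["long_jump"]
scores = f1_per_class(y_true, y_pred)
print(scores)
```

In this toy case the single error lowers both classes: long jump loses precision (F1 = 8/9) and shot put loses recall (F1 = 6/7).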
The learning curve analysis depicted in Fig. 3B demonstrates remarkable data efficiency, with models achieving 90% of asymptotic performance with only 60% of the training data, indicating robust generalization capability even with constrained instructional examples. This characteristic is particularly valuable for real-world educational deployments where comprehensive data collection may be prohibitively resource-intensive. The model’s generalization capabilities were further validated through stratified cross-validation procedures, ensuring performance consistency across diverse student populations.
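The data-efficiency statement can be read off a learning curve as follows. This sketch assumes that the score obtained with the full training set is a reasonable stand-in for the asymptote, which is a simplification; the curve values are hypothetical and were chosen only so that the 90% level is reached at 60% of the data.

```python
def fraction_to_reach(curve, target_ratio=0.9):
    """Smallest training fraction whose score reaches target_ratio of the final score.

    `curve` is a list of (training_fraction, score) pairs. The score at the
    largest fraction is used as an estimate of asymptotic performance.
    """
    curve = sorted(curve)
    asymptote = curve[-1][1]
    for fraction, score in curve:
        if score >= target_ratio * asymptote:
            return fraction
    return None


# Hypothetical learning curve (illustrative values, not the study's data).
curve = [(0.2, 0.55), (0.4, 0.74), (0.6, 0.85), (0.8, 0.91), (1.0, 0.93)]
needed = fraction_to_reach(curve)
print(needed)
```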
Quantitative outcome prediction demonstrated exceptional accuracy (Fig. 3D) with RMSE of 0.083 m and coefficient of determination (R²) of 0.978, confirming the model’s capacity to forecast performance improvements resulting from specific instructional interventions. Predictive accuracy remained consistent across the performance spectrum without significant heteroscedasticity at extreme values. The injury risk assessment module (Fig. 3C) achieved an AUC of 0.913, with notably high sensitivity (0.92) for identifying high-risk movement patterns that could predispose students to acute injuries during technical execution.
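The metrics quoted above follow standard definitions, sketched here for clarity. The AUC is computed in its rank-based form (the probability that a randomly chosen high-risk sample receives a higher score than a randomly chosen low-risk one), and sensitivity is computed at a decision threshold. All data values and the 0.5 threshold are assumptions for illustration, not values from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (metres here)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity(labels, scores, threshold):
    """True-positive rate when scores >= threshold are flagged as high risk."""
    flagged = [s >= threshold for l, s in zip(labels, scores) if l == 1]
    return sum(flagged) / len(flagged)


# Toy data (illustrative only): jump distances in metres and risk scores.
measured = [5.0, 5.5, 6.0, 6.5]
predicted = [5.1, 5.4, 6.1, 6.4]
risk_labels = [1, 1, 1, 0, 0, 0]
risk_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

print(rmse(measured, predicted), r_squared(measured, predicted))
print(auc(risk_labels, risk_scores), sensitivity(risk_labels, risk_scores, 0.5))
```

The pairwise AUC is quadratic in the number of samples, which is fine at this scale; a sort-based formulation would be preferable for large datasets.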
Model convergence analysis (Fig. 3E) revealed efficient optimization trajectories with minimal evidence of overfitting, as indicated by the close alignment between training and validation loss curves. All model components achieved convergence stability within 150 epochs, with the technique classification module exhibiting the most rapid convergence (87 epochs). Feature importance analysis (Fig. 3F) identified knee angulation parameters as the most predictive variables (normalized importance 0.92), followed by velocity metrics (0.78) and ground reaction forces (0.65). This hierarchical importance ranking provides valuable insights for instructional prioritization, suggesting pedagogical emphasis on joint kinematics rather than power production.
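The text does not state the criterion used to declare convergence. One common choice, shown here purely as an assumption, is a patience rule: training is considered converged at the last epoch that improved the validation loss by more than a small margin, once a fixed number of subsequent epochs bring no further improvement. The `patience` and `min_delta` values and the loss trace are hypothetical.

```python
def convergence_epoch(val_losses, patience=5, min_delta=1e-3):
    """Return the 1-based epoch of the last meaningful improvement.

    An epoch counts as an improvement if it lowers the best validation loss
    by more than `min_delta`. Convergence is declared once `patience` epochs
    pass without such an improvement; returns None if that never happens.
    """
    best = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if best - loss > min_delta:
            best = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return best_epoch
    return None


# Hypothetical validation-loss trace (illustrative only).
losses = [0.50, 0.30, 0.20, 0.15, 0.149, 0.1495, 0.149, 0.1492, 0.1491]
converged_at = convergence_epoch(losses, patience=3, min_delta=0.005)
print(converged_at)
```

Under such a rule, the reported convergence epochs would depend on the chosen patience and margin, so those settings should be reported alongside the epoch counts.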
Model convergence and overfitting analysis
As illustrated in Fig. 3G, the validation loss closely tracks the training loss throughout the optimization process, with final values of 0.023 ± 0.004 and 0.027 ± 0.005 respectively, indicating minimal overfitting. Cross-validation variance remained below 0.03 across all folds, confirming model stability.
Comparison with alternative architectures
Table 7 compares the proposed model with alternative architectures. The CNN-BiLSTM architecture outperforms the alternatives, achieving a 5.6% higher F1-score than Transformer models while requiring 51% less training time.
Confusion matrix analysis
Figure 3H presents the confusion matrix for technique classification across all events. Primary misclassifications occur between biomechanically similar techniques (e.g., sprint start vs. acceleration phase, 8.2% misclassification rate), with overall accuracy of 93.7%. These patterns provide valuable insights for instructors, highlighting areas requiring additional sensor resolution or alternative feedback mechanisms.
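The two quantities quoted from the confusion matrix can be derived as in the sketch below: overall accuracy is the diagonal sum divided by the total count, and a pairwise misclassification rate is the share of one true class that was assigned to a specific other class. The class names and counts are toy values for illustration, not the study's matrix.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def misclassification_rate(matrix, true_i, pred_j):
    """Share of samples of true class `true_i` that were predicted as `pred_j`."""
    return matrix[true_i][pred_j] / sum(matrix[true_i])


# Toy data (illustrative only): one sprint start is labeled as acceleration.
classes = ["sprint_start", "acceleration", "long_jump"]
y_true = ["sprint_start"] * 10 + ["acceleration"] * 10 + ["long_jump"] * 10
y_pred = (["sprint_start"] * 9 + ["acceleration"]
          + ["acceleration"] * 10 + ["long_jump"] * 10)

matrix = confusion_matrix(y_true, y_pred, classes)
print(overall_accuracy(matrix), misclassification_rate(matrix, 0, 1))
```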
Comprehensive ablation studies removing individual model components demonstrated the synergistic nature of the ensemble architecture, with integrated performance exceeding individual component capabilities by 17.3% on average, validating the multi-module design approach underlying the pedagogical optimization framework.
Machine learning approaches for optimizing track and field instruction
The integration of machine learning algorithms with wearable sensor technology has demonstrated remarkable potential for revolutionizing track and field instruction methodologies. Our comprehensive analysis reveals that hybrid CNN-BiLSTM architectures achieve superior classification accuracy across multiple athletic events, with F1-scores consistently exceeding 0.88 across all disciplines. The ML-enhanced teaching protocols implemented in our experimental trials produced a 27.3% reduction in time-to-proficiency while simultaneously decreasing injury risk by 41.2% compared to traditional pedagogical approaches. This substantial improvement can be attributed to the precise biomechanical analysis facilitated by our multi-modal sensing framework and the adaptive optimization algorithms that continuously calibrate instructional content to individual learning trajectories. As illustrated in Fig. 4D (Feature Importance Analysis for Technical Proficiency), the feature importance analysis identified joint kinematics, particularly knee angulation parameters, as the most predictive variables for technical proficiency assessment, providing valuable insights for instructional prioritization. Our experimental validation involved 312 undergraduate participants across three academic semesters, generating 26,544 discrete technique execution instances captured through a distributed sensor network comprising 12 high-definition cameras and 24 wearable inertial measurement units. The implementation of a gradient-based reinforcement learning paradigm for content optimization enables dynamic adaptation to individual learning patterns through four methodological components.

Machine learning analysis for track and field teaching optimization.
Figure 4A (Technique Classification Performance across Track and Field Events) demonstrates the robust classification performance across various athletic disciplines, with the long jump achieving the highest F1-score (0.94). Figure 4B (Learning Curve Analysis Demonstrating Model Efficiency) reveals that our models achieve 90% of asymptotic performance with only 60% of the training data, indicating exceptional data efficiency. The injury risk assessment capabilities shown in Fig. 4C (ROC Curve for Injury Risk Assessment) achieved an impressive AUC of 0.913, enabling proactive intervention strategies before technical errors lead to potential injuries. As shown in Table 8, the experimental protocol implemented a comprehensive data collection framework across multiple athletic events. Longitudinal assessment of skill retention revealed statistically significant improvements (p < 0.01) for the experimental group compared to control subjects, with particularly pronounced benefits observed in technically complex events such as hurdles and jumping disciplines. Figure 4E (Performance Prediction Accuracy Comparison) demonstrates the exceptional predictive capability of our models (R² = 0.978), while Fig. 4F (Model Convergence Analysis for Training and Validation Sets) confirms the stability of our training approach with minimal evidence of overfitting. The biomechanical analysis framework established through this research provides a foundational methodology for quantitative evaluation of instructional effectiveness, with the integrated ensemble architecture outperforming individual component configurations by an average margin of 17.3% in predictive accuracy and educational outcome optimization.
As shown in Table 8, our experimental protocol covered six primary track and field events. This multi-modal approach integrated several sensing technologies to capture the biomechanical complexity of each discipline, with event-specific technical parameters analyzed in each case. The dataset of 26,544 discrete execution instances provided the foundation for our machine learning models. Figure 4D identifies knee angle as the most influential biomechanical parameter (normalized importance 0.92), followed by velocity metrics (0.78) and ground reaction forces (0.65). This ranking of feature importance has direct implications for instructional prioritization in track and field pedagogy. The experimental implementation of our ML-optimized teaching methodology indicates that integrating wearable technologies with analytical algorithms can substantially enhance learning efficiency while reducing injury risk. The framework establishes a data-driven approach to personalized instruction that adapts dynamically to individual learning patterns and physiological constraints.
Evaluation of system usability
Comprehensive evaluation of system practicality was conducted across multiple athletic facilities to assess implementation feasibility, user experience, and operational sustainability.

(A) User satisfaction by stakeholder group (B) System Response Time by Module (C) Learning Curve for System Adoption (D) Cost Analysis by System Component (E) Time Efficiency Comparison (F) Long-term Performance Benefits.
The intelligent teaching system demonstrated exceptional usability metrics across diverse stakeholder groups, with coaches reporting the highest satisfaction ratings (4.5/5.0) as illustrated in Fig. 5A (User Satisfaction by Stakeholder Group). System responsiveness analysis revealed differential latency characteristics across computational modules, with the machine learning analysis component exhibiting the highest processing time (205 ± 32 ms) while maintaining sub-threshold latency for real-time instructional applications as shown in Fig. 5B (System Response Time by Module). The learning curve assessment depicted in Fig. 5C (Learning Curve for System Adoption) indicates rapid proficiency acquisition, with users achieving 85% operational competency after approximately 16 training hours, and notably accelerated adoption rates observed among coaching staff compared to student users. Implementation cost analysis detailed in Fig. 5D (Cost Analysis by System Component) reveals that server infrastructure and camera systems represent the primary capital expenditure components, while software licensing and maintenance constitute the dominant recurring operational costs. As detailed in Table 9, the comparative usability analysis across implementation environments demonstrates robust system performance across diverse instructional contexts, with particularly favorable metrics observed in university athletic facilities. The time efficiency comparison presented in Fig. 5E (Time Efficiency Comparison) quantifies substantial temporal advantages of the ML-enhanced system across all pedagogical workflow components, with particularly dramatic reductions in analysis (93% decrease) and feedback delivery (92% decrease) durations. Longitudinal performance benefits illustrated in Fig. 5F (Long-term Performance Benefits) demonstrate progressive divergence between traditional and ML-enhanced instructional outcomes, with cumulative advantages becoming increasingly pronounced beyond the third academic semester. The system demonstrated exceptional functional reliability with 99.2% uptime during the 16-week experimental period, requiring minimal maintenance interventions (3.2 h/month) and demonstrating robust resilience to environmental variability including adverse weather conditions during outdoor implementations.
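To put the reliability figure in absolute terms, the short calculation below converts the uptime percentage into hours of downtime. It assumes the system was expected to be available around the clock for the full 16 weeks, which the text does not state; if availability was only required during scheduled sessions, the total would be smaller.

```python
HOURS_PER_WEEK = 7 * 24


def downtime_hours(uptime_fraction, weeks):
    """Hours of downtime implied by an uptime fraction over a period of weeks.

    Assumes continuous (24/7) expected availability, which is an assumption
    made for this sketch rather than a detail given in the study.
    """
    total_hours = weeks * HOURS_PER_WEEK
    return (1.0 - uptime_fraction) * total_hours


total = 16 * HOURS_PER_WEEK           # 2688 hours in the 16-week period
downtime = downtime_hours(0.992, 16)  # roughly 21.5 hours
print(total, round(downtime, 1))
```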
As shown in Table 9, the implementation environment significantly influences system usability metrics across multiple dimensions. The intelligent teaching system exhibits optimal performance characteristics in university athletic facilities and elite training centers, with substantially higher system integration scores and cost-benefit ratios compared to high school environments. Figure 5 collectively illustrates the comprehensive practicality assessment across multiple evaluation dimensions, demonstrating that despite initial implementation complexities, the system delivers substantial operational efficiencies and pedagogical advantages that justify the capital investment and training requirements. The quantitative usability metrics confirm that the intelligent teaching system achieves the practical feasibility necessary for wide-scale adoption across diverse educational contexts, with particular suitability for higher education and elite training environments where technical infrastructure and support resources are more readily available.
Ablation study
A comprehensive ablation study was conducted to systematically evaluate the contribution of individual components within our machine learning framework for track and field teaching optimization. Six different model configurations were tested, sequentially removing key architectural components to quantify their specific contributions to overall system performance. As illustrated in Fig. 6A (Sprinting Technique Classification), the removal of LSTM units produced the most substantial performance degradation in sprinting technique classification, reducing F1-scores from 0.94 to 0.78 (−17.0%), highlighting the critical importance of temporal sequence modeling for capturing dynamic movement patterns. Similar patterns were observed for hurdles technique classification in Fig. 6B, where LSTM removal yielded a 19.8% decrease in classification accuracy. Gradient-boosted trees proved particularly important for long jump technique classification as shown in Fig. 6C, with their removal causing a 15.1% performance reduction. The ablation analysis for performance prediction revealed that CNN architectures contribute substantially to prediction accuracy, with their removal increasing mean absolute error by 97.6% (Fig. 6D) and reducing R² values from 0.967 to 0.892 (Fig. 6E). As detailed in Table 10, the cascading effects of component removal extend beyond performance metrics to impact educational outcomes, with technique acquisition time increasing significantly when key components are removed. The computational efficiency analysis in Fig. 6F reveals that while component removal generally reduces computational latency, the performance trade-offs are disproportionately severe relative to the modest processing time savings (maximum 13.2% latency reduction). Transfer learning components demonstrate particularly significant contributions to cross-event generalization, with their removal reducing transfer performance by 35.7% while only decreasing computational demands by 3.4%. Bayesian optimization components, while computationally intensive, prove essential for personalized difficulty calibration, with their removal significantly compromising the system’s ability to maintain students within optimal challenge zones as indicated by a 29.8% reduction in time spent in ideal learning states.
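The percentage changes in the ablation results are relative to the full model. As a worked check, the sketch below reproduces the reported −17.0% for the sprinting F1-score and applies the same formula to the R² values quoted above; the helper name is ours, and only figures stated in the text are used.

```python
def relative_change_pct(full, ablated):
    """Percentage change of the ablated model's metric relative to the full model."""
    return (ablated - full) / full * 100.0


# Sprinting F1-score with and without LSTM units (values from the text).
f1_change = relative_change_pct(0.94, 0.78)

# R^2 with and without CNN layers (values from the text).
r2_change = relative_change_pct(0.967, 0.892)

print(round(f1_change, 1), round(r2_change, 1))
```

By the same formula, the R² reduction from 0.967 to 0.892 corresponds to a relative decrease of about 7.8%, which gives a sense of scale next to the 97.6% increase in mean absolute error.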
As shown in Table 10, the full model configuration demonstrates superior performance across all evaluation dimensions, with particularly substantial advantages in transfer performance and personalization quality metrics. Figure 6 provides a comprehensive visualization of key performance metrics across ablation configurations, clearly demonstrating the differential contributions of individual machine learning components to specific aspects of system performance. The convolutional layers prove essential for feature extraction from kinematic data, while LSTM units contribute substantially to temporal pattern recognition in dynamic movement sequences. The recursive structure of gradient-boosted trees facilitates complex decision boundaries for technique classification, while transfer learning mechanisms enable efficient cross-event knowledge application. The Bayesian optimization framework, while computationally intensive, provides critical uncertainty quantification for adaptive learning path generation. These findings validate our integrated ensemble architecture approach, demonstrating that the component synergies yield performance improvements that significantly exceed the capabilities of individual subsystems or simplified architectural configurations.

(A) Sprinting technique classification (B) Hurdles technique classification (C) Long jump technique classification (D) Performance prediction error (E) Performance prediction accuracy (F) Computational efficiency.
