Researchers at Drexel University and Michigan State University have demonstrated a program designed to use AI and computer vision to provide athletic form coaching, with the goal of preventing injuries and improving outcomes.
As any athlete will tell you, perfect practice makes perfect. However, maintaining good form can be difficult for those who don’t have regular access to a coach or trainer. In fact, during the COVID-19 pandemic, when many people were exercising at home, the U.S. Consumer Product Safety Commission reported a 48% increase in injuries related to home exercise. Researchers at Drexel University and Michigan State University have developed a prototype program that uses artificial intelligence and computer vision to analyze video and provide form coaching in real time, in hopes of preventing some of these injuries and extending the reach of coaches' professional guidance.
The program is designed to integrate biomechanical modeling with computer vision and vision language models to provide live, personalized feedback during exercise, along with an explanation of that guidance. This has proven difficult for most fitness coaching apps. The researchers presented their findings and a prototype, called BioCoach, at the Conference on Computer Vision and Pattern Recognition, sponsored by the Institute of Electrical and Electronics Engineers and the Computer Vision Foundation, in June.
video: BioCoach – Visual Intelligence Lab
“Many people who exercise at home using videos or apps do not receive high-quality assessments of their movements,” said Dr. Feng Liu, an assistant professor in Drexel University’s College of Engineering and Computing, who led the study. “Feedback is often too general, or simply encouraging without any real form guidance. Our goal with BioCoach is to provide timely, specific cues based on body movement that approximate the kind of guidance a knowledgeable coach would give.”
Liu’s Visual Intelligence Lab at Drexel applies advanced computer vision, machine learning, and 3D human modeling to study problems in exercise instruction, clinical gait assessment, and classroom education.
To prepare BioCoach, the team began by creating a benchmark for exercise video coaching. They started from the publicly available Qualcomm Exercise Video Dataset (QEVD), which contains hundreds of hours of exercise videos with time-stamped coaching feedback.
The original feedback consisted only of short instructional comments, such as “lower your body,” so the researchers created a new version, re-annotating it with more detailed biomechanical targets, such as “increase elbow flexion to 90 degrees at the bottom.” They also added short rationales for the guidance, such as “Increase hip/knee flexion to distribute the load.”
In total, the team added more than 2,400 annotations to the more than 200 videos used to train and test BioCoach. These annotations helped the team prepare a large language model that provides coaching and guidance to users. Additionally, because timestamps are stored in the annotated dataset, the new benchmark allows researchers to evaluate not only the guidance the system provides, but also whether the system responds in a timely manner.
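To make the idea of time-stamped annotations concrete, here is a minimal Python sketch of what one annotation record and a simple timeliness check might look like. The field names, the `is_timely` helper, the example timestamp, and the 1.0-second tolerance are all illustrative assumptions; the article does not describe the actual schema of the benchmark or how its timing metric is computed.

```python
from dataclasses import dataclass


@dataclass
class CoachingAnnotation:
    """One time-stamped piece of coaching feedback (hypothetical schema)."""
    timestamp_s: float  # when the feedback applies in the video, in seconds
    cue: str            # the biomechanical target
    rationale: str      # short explanation of why the cue matters


def is_timely(predicted_s: float, reference: CoachingAnnotation,
              tolerance_s: float = 1.0) -> bool:
    """Return True if feedback given at predicted_s lands within
    tolerance_s seconds of the reference annotation (assumed criterion)."""
    return abs(predicted_s - reference.timestamp_s) <= tolerance_s


# Example record; the cue text is quoted from the article, while the
# timestamp and rationale are made-up placeholders.
reference = CoachingAnnotation(
    timestamp_s=12.5,
    cue="Increase elbow flexion to 90 degrees at the bottom.",
    rationale="Placeholder rationale for illustration only.",
)

on_time = is_timely(13.0, reference)   # 0.5 s after the reference
too_late = is_timely(14.0, reference)  # 1.5 s after the reference
```

With these made-up values, feedback half a second after the reference counts as timely, while feedback a second and a half later does not.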
Using the improved exercise video feedback dataset, the team designed BioCoach to analyze each video through two complementary information streams so that it can give users appropriate guidance.
One stream uses 3D convolutional neural networks, deep learning programs that excel at identifying individual objects in images and videos, to capture visual appearance and movement patterns. The other estimates 3D skeletal motion and body shape, giving the program access to information about joint angles, ranges of motion, and movement stages.
With these two information streams, BioCoach has structured biomechanical data specific to each joint. Before providing feedback, the program first identifies the most relevant joints for each exercise (for example, hips, knees, and ankles for squats; shoulders, elbows, and wrists for push-ups) so that it can provide more detailed guidance.
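As a rough illustration of how joint-specific data can be derived from 3D skeletal keypoints, the Python sketch below computes the angle at a joint from three points and pairs each exercise with the joints named above. This is a generic geometric calculation, not the BioCoach implementation; the `RELEVANT_JOINTS` table, the `joint_angle_deg` function, and the keypoint coordinates are assumptions made for the example.

```python
import math

# Hypothetical lookup of the joints worth checking for each exercise,
# following the examples given in the article.
RELEVANT_JOINTS = {
    "squat": ["hip", "knee", "ankle"],
    "push-up": ["shoulder", "elbow", "wrist"],
}


def joint_angle_deg(a, b, c):
    """Angle at point b, in degrees, between the segments b->a and b->c."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.hypot(*ba) * math.hypot(*bc)
    if norm == 0:
        raise ValueError("degenerate joint: two keypoints coincide")
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Made-up keypoints in metres: upper arm and forearm at a right angle.
shoulder = (0.0, 0.3, 0.0)
elbow = (0.0, 0.0, 0.0)
wrist = (0.3, 0.0, 0.0)
elbow_angle = joint_angle_deg(shoulder, elbow, wrist)
```

Note that this returns the included angle between the two limb segments. Biomechanics conventions often report flexion as 180 degrees minus that value, so a real system would need to fix one convention and apply it consistently.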
Through this process, the program can also use body shape information and movement quality analysis to produce structured information that the language model translates into specific, biomechanics-based feedback.
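One simple way to picture handing structured information to a language model is to serialize the measurements into a short text prompt, as in the Python sketch below. The article does not say how BioCoach actually passes this information to its model (it may well use learned features rather than text), so the `build_coaching_prompt` function, its parameters, and the prompt wording are purely hypothetical.

```python
def build_coaching_prompt(exercise, phase, joint_angles_deg):
    """Format structured motion data as plain text for a language model.

    All names and the prompt layout here are illustrative assumptions.
    """
    lines = [
        f"Exercise: {exercise}",
        f"Phase: {phase}",
        "Joint angles (degrees):",
    ]
    # Sort joints by name so the prompt is deterministic.
    for joint, angle in sorted(joint_angles_deg.items()):
        lines.append(f"- {joint}: {angle:.0f}")
    lines.append("Give one specific form cue and a one-sentence rationale.")
    return "\n".join(lines)


# Made-up measurements for the bottom of a squat.
prompt = build_coaching_prompt(
    exercise="squat",
    phase="bottom",
    joint_angles_deg={"knee": 92.4, "hip": 85.0, "ankle": 70.2},
)
```

Keeping the measurements explicit in this way is what lets the resulting feedback cite a specific joint and value instead of a generic comment.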
“Our goal was to build a system that did more than examine pixels and generate general comments,” Liu said. “BioCoach exposes the model to 3D motion, joint angles, and exercise-specific constraints, so the feedback can point out specific movement issues and explain why they are important.”
After preparing the program, the team set out to test it against top competitors: video-language AI programs from NVIDIA, ByteDance, Alibaba, Salesforce, OpenAI, Shanghai Jiao Tong University, Chinese University of Hong Kong, Peking University, China’s Peng Cheng Lab, and research and development teams at the Massachusetts Institute of Technology.
They tested each program by having it view numerous exercise videos, some from the original QEVD set and some from the version re-annotated by the team. Each program’s responses were compared to the feedback in the original QEVD dataset and to the annotations added by the researchers, and scored on whether they were timely, accurate, and detailed.
When responding to videos from the original dataset, BioCoach outperformed its closest competitor, Stream-VLM (a program created by researchers at MIT and NVIDIA), on text quality and accuracy, while its timing scores were close but slightly lower.
However, when evaluated on the dataset with more specific annotations, BioCoach outperformed Stream-VLM on all metrics, with particular improvements in biomechanical accuracy and detailed, anatomy-specific feedback.
The researchers suggest that these results show that adding explicit 3D kinematics and biomechanical context can improve the quality and interpretability of real-time motor feedback without significantly reducing responsiveness.
“It was encouraging to see BioCoach perform so well against programs written by top researchers and companies in the AI field,” said Liu. “Although this is still a prototype, we can see how combining computer vision with structured biomechanical reasoning will make AI coaching systems more useful and easier to test.”
The research team plans to continue its work by extending the program to estimate joint reaction forces and muscle activation patterns from video, in order to detect subtle compensatory movements that can cause injury during exercise.
“We believe this work can ultimately support exercise and physical therapy apps that extend the expertise of human coaches and trainers between in-person sessions,” Liu said. “Future systems could allow users to receive more specific and timely feedback as they practice on their own, while staying in the loop with human experts.”
This research was supported by the National Science Foundation.
In addition to Liu, Yuyang Ji and Yixuan Shen, from Drexel, and Dr. Shengjie Zhu and Dr. Yu Kong, from Michigan State University, contributed to this research.
Read the full paper here: https://arxiv.org/abs/2603.26938
