Nearly one in five people experience benign laryngeal voice disorders, often manifesting as dysphonia and potentially indicating an underlying health problem, but new approaches promise more effective diagnosis. Mohsen Anabestani, Samira Aghadoust and colleagues at Weill Cornell Medical College and Tehran University of Medical Sciences have developed an artificial intelligence system that accurately classifies these diseases using only sustained vowel recordings. The team built a hierarchical machine learning framework. The framework consistently outperforms existing AI models by first identifying pathological sounds, then broadly classifying them, and finally distinguishing between specific structural, inflammatory, and functional conditions. This achievement demonstrates the potential of acoustic biomarkers as a scalable, non-invasive method for early detection, improved diagnostic workflows, and continuous monitoring of vocal health.
Detection and classification of speech disorders using machine learning
This study details a machine learning framework for detecting and classifying voice disorders and addresses the need for objective tools to complement traditional subjective assessments. The system uses algorithms such as convolutional neural networks, extreme learning machines, and deep learning techniques to analyze the acoustic features of speech to identify patterns that indicate medical conditions. The experiment utilized 15,132 recordings from 1,261 speakers and demonstrated the potential of machine learning to accurately detect and classify speech disorders. Certain acoustic features, such as pitch variations, have been found to be particularly important for accurate diagnosis, and this study reveals that continuous speech provides valuable information to improve model performance. Although various algorithms have shown promise, research highlights that no single approach is universally optimal and that reproducibility remains a key challenge for the field. The ultimate goal is to create tools that assist, rather than replace, clinicians in areas such as early screening, quality of life assessment, and analysis of voice changes associated with conditions such as COVID-19.
Machine learning diagnoses benign voice disorders
This study presents a new machine learning framework for classifying benign laryngeal voice disorders, which affect nearly 1 in 5 people. Scientists have developed a hierarchical system that automatically classifies eight different voice disorder types alongside healthy controls using acoustic features extracted from sustained vowels. This framework operates in three stages that mirror the clinical workflow. It begins with a screening process to distinguish between pathological and non-pathological speech, integrating convolutional neural network analysis and 21 interpretable acoustic biomarkers. Subsequent stages stratify speech into broader groups, refine the classification, and clearly improve the differentiation of structural and inflammatory disorders compared to functional states. The system consistently outperforms standard multiclass classifiers and pre-trained speech models, achieving high accuracy by combining deep spectral representations with interpretable acoustic features. This represents a significant advancement in digital health technology, providing a scalable, non-invasive tool for early screening, diagnostic triage, and continuous monitoring of vocal health.
Diagnosing laryngeal speech disorders using deep learning
This study presents a new machine learning framework for classifying benign laryngeal voice disorders that affect a large proportion of the population. The team developed a system that mimics clinical practice, first identifying pathological voices, then classifying them into broad types, and finally diagnosing specific diseases. By integrating deep learning analysis of audio recordings with established acoustic biomarkers, this framework achieves high accuracy in differentiating different conditions, particularly improving the discrimination between structural and inflammatory diseases. The system exhibits strong ability to differentiate voice disorders and achieves high diagnostic performance. This indicates that even short, sustained vowels contain substantial and reliable information for distinguishing between diverse laryngeal conditions. Importantly, this framework outperforms more general speech and speech models and highlights the value of tailoring analysis methods to the specific characteristics of clinical speech. This scalable, non-invasive approach has broad potential for early screening, clinical assessment, and continuous monitoring of vocal health, ultimately contributing to reducing the impact of dysphonia.
👉 More information
🗞 Hierarchical classification of benign laryngeal voice disorders from sustained vowels based on AI-driven acoustic voice biomarkers
🧠ArXiv: https://arxiv.org/abs/2512.24628
