This AI paper proposes a self-supervised music understanding model called MERT that achieves overall SOTA performance on 14 MIR tasks



https://arxiv.org/abs/2306.00107

Self-supervised learning is used prominently in artificial intelligence to develop intelligent systems. Transformer models such as BERT and T5 have recently gained popularity for their strong performance, exploiting the idea of self-supervision in natural language processing tasks. These models are first trained on large amounts of unlabeled data and then fine-tuned on labeled samples. Self-supervised learning has been used successfully in fields such as speech processing, computer vision, and natural language processing, but its application to music audio remains underexplored. The reason lies in the difficulty of modeling musical knowledge, such as the tonal and pitched characteristics of music.

To address this problem, the research team introduced MERT, short for “Music undERstanding model with large-scale self-supervised Training”. This acoustic model is built on the idea of using teacher models to generate pseudo-labels for masked language modeling (MLM) style pre-training. With these teachers in the loop, the student model, a BERT-style transformer encoder, learns to understand and model music audio better.
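The masked prediction idea can be illustrated with a toy example. This is a minimal sketch rather than the authors' implementation: the function names, masking probability, span length, and vocabulary size are all assumptions chosen for illustration, and random numbers stand in for real teacher labels and student outputs. It shows the core mechanic, in which the loss is computed only on frames that were hidden from the student.

```python
import math
import random


def sample_span_mask(num_frames, mask_prob, span_len, rng):
    """Mark frames as masked; each frame starts a span with probability mask_prob."""
    mask = [False] * num_frames
    for start in range(num_frames):
        if rng.random() < mask_prob:
            for t in range(start, min(start + span_len, num_frames)):
                mask[t] = True
    return mask


def cross_entropy(logits, target):
    """Cross-entropy for one frame: -log softmax(logits)[target]."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target]


def masked_prediction_loss(student_logits, teacher_labels, mask):
    """Average cross-entropy over masked frames only; unmasked frames are ignored."""
    losses = [
        cross_entropy(logits, label)
        for logits, label, is_masked in zip(student_logits, teacher_labels, mask)
        if is_masked
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical toy setup: 20 frames, 8 possible pseudo-label classes.
rng = random.Random(0)
num_frames, vocab_size = 20, 8
teacher_labels = [rng.randrange(vocab_size) for _ in range(num_frames)]
student_logits = [
    [rng.gauss(0.0, 1.0) for _ in range(vocab_size)] for _ in range(num_frames)
]
mask = sample_span_mask(num_frames, mask_prob=0.2, span_len=3, rng=rng)
loss = masked_prediction_loss(student_logits, teacher_labels, mask)
print(sum(mask), "masked frames, loss:", loss)
```

In a real system the student would see the audio with the masked frames replaced or corrupted, and the teacher labels would come from a trained teacher model rather than a random generator.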

This generalizable and affordable pre-trained acoustic music model follows the self-supervised learning paradigm used for speech, and adds a multi-task paradigm that balances acoustic and musical representation learning by using teacher models to generate pseudo-targets for sequential audio clips. To make the learned representations more robust, MERT introduces an in-batch noise mixture augmentation technique. This technique corrupts audio recordings by mixing them with random clips, challenging the model to pick up relevant meaning even from obscured contexts. The augmentation improves the model’s ability to generalize to situations where music is mixed with unrelated audio.
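A rough sketch of what mixing clips within a batch could look like is shown below. It is an illustration under stated assumptions, not the procedure from the paper: the function name `mix_in_batch_noise`, the mixing probability, and the maximum segment ratio are hypothetical, and clips are plain lists of samples rather than real audio tensors.

```python
import random


def mix_in_batch_noise(batch, mix_prob, max_ratio, rng):
    """Return a copy of the batch where some clips have a segment of another clip added.

    batch: list of equal-length lists of float samples.
    mix_prob: probability that a given clip is augmented (assumed value).
    max_ratio: largest fraction of the clip that the noise segment may cover (0 to 1).
    """
    clip_len = len(batch[0])
    mixed = [list(clip) for clip in batch]  # copy so the inputs stay untouched
    if len(batch) < 2:
        return mixed
    for i, clip in enumerate(mixed):
        if rng.random() >= mix_prob:
            continue
        # Pick a different clip from the same batch as the noise source.
        j = rng.choice([k for k in range(len(batch)) if k != i])
        seg_len = max(1, int(clip_len * rng.uniform(0.0, max_ratio)))
        src_start = rng.randrange(clip_len - seg_len + 1)
        dst_start = rng.randrange(clip_len - seg_len + 1)
        for t in range(seg_len):
            clip[dst_start + t] += batch[j][src_start + t]
    return mixed


# Toy batch of three constant "clips" so the added segments are easy to spot.
batch = [[1.0] * 8, [10.0] * 8, [100.0] * 8]
mixed = mix_in_batch_noise(batch, mix_prob=1.0, max_ratio=0.5, rng=random.Random(0))
for clip in mixed:
    print(clip)
```

Because the noise comes from other items already in the batch, no extra noise dataset is needed; the training targets would still be those of the original, unmixed clip.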


The team arrived at an effective combination of teacher models that outperformed conventional speech and audio approaches. The combination includes an acoustic teacher based on a Residual Vector Quantization Variational AutoEncoder (RVQ-VAE) and a musical teacher based on the Constant-Q Transform (CQT). The acoustic teacher uses RVQ-VAE to provide a discretized, acoustic-level summary of the music signal, capturing its acoustic characteristics. The musical teacher, built on the CQT, focuses on capturing the tonal and pitched aspects of the music. Together, these teachers guide the student model toward meaningful representations of music audio.
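To make the two teacher signals more concrete, here is a small sketch of the underlying ideas. It assumes toy, hand-written codebooks and an illustrative minimum frequency, none of which come from the paper. The first part shows residual vector quantization, where each stage quantizes whatever the previous stages left unexplained, yielding one discrete index per stage. The second part shows why the CQT suits pitch: its bin centers are spaced geometrically, so a fixed number of bins covers each octave.

```python
import math


def nearest_code(vector, codebook):
    """Index of the codebook entry closest to vector (squared Euclidean distance)."""
    def dist(code):
        return sum((v - c) ** 2 for v, c in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(vector, codebooks):
    """Residual vector quantization: each stage quantizes the remaining residual."""
    residual = list(vector)
    indices = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices, residual


def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected code vector from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def cqt_center_frequencies(f_min, n_bins, bins_per_octave):
    """Geometrically spaced CQT bin centers: f_k = f_min * 2 ** (k / bins_per_octave)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


# Hypothetical toy codebooks: a coarse stage followed by a finer stage.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]
frame = [1.2, 1.0]
indices, residual = rvq_encode(frame, codebooks)
reconstruction = rvq_decode(indices, codebooks)
print("codes:", indices, "reconstruction:", reconstruction)

# Two octaves at 12 bins per octave (one bin per semitone), starting near C1.
freqs = cqt_center_frequencies(f_min=32.70, n_bins=24, bins_per_octave=12)
print("first bin:", freqs[0], "bin one octave up:", freqs[12])
```

The per-stage indices are the kind of discrete target a student can be trained to predict, while the CQT gives a continuous, pitch-aligned target; how the two are weighted against each other is a design choice this sketch does not cover.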

The team also explored settings that address the instability of acoustic language model pre-training. With these settings tuned, they were able to scale MERT from 95M to 330M parameters, resulting in a more powerful model capable of capturing the intricate details of music audio. In evaluation, the experimental results showed that MERT generalizes across a variety of music understanding tasks. The model achieved overall SOTA scores on 14 different tasks, demonstrating strong performance and generalization ability.

In conclusion, the MERT model addresses a gap in applying self-supervised learning to music audio.




Tanya Malhotra is a final year student at the University of Petroleum & Energy Studies, Dehradun, pursuing a Bachelor's degree in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
A data science enthusiast with good analytical and critical thinking, she has a keen interest in learning new skills, leading groups, and managing work in an organized manner.



