This AI paper proposes a self-supervised music understanding model called MERT that achieves overall SOTA performance on 14 MIR tasks

Self-supervised learning is widely used in artificial intelligence to build intelligent systems. Transformer models such as BERT and T5 have recently gained popularity for their strong performance, and they rely on self-supervision for natural language processing tasks. These models are first pre-trained on large amounts of unlabeled data and then fine-tuned on labeled samples. Self-supervised learning has been applied successfully in fields such as speech processing, computer vision, and natural language processing, but its application to music audio remains largely unexplored. The main reason is the difficulty of modeling musical knowledge, such as the tonal and pitched characteristics of music.

To address this problem, a team of researchers has introduced MERT, short for “Music undERstanding model with large-scale self-supervised Training”. This acoustic model is built on the idea of using teacher models to generate pseudo-labels for the pre-training stage, in the style of masked language modeling (MLM). By integrating these teacher models, MERT helps the student model, a BERT-style transformer encoder, to better understand and model music audio.
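The masked-prediction idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' training code: the function names `sample_mask` and `masked_cross_entropy`, the masking probability, and the span length are all hypothetical, and the "student" is represented by a list of per-frame scores rather than a real transformer.

```python
import math
import random


def sample_mask(num_frames, mask_prob=0.08, span_len=5, seed=0):
    """Choose frames to hide from the student (hypothetical settings).

    Each frame starts a masked span with probability mask_prob;
    a span covers span_len consecutive frames, clipped at the end.
    """
    rng = random.Random(seed)
    masked = set()
    for start in range(num_frames):
        if rng.random() < mask_prob:
            masked.update(range(start, min(start + span_len, num_frames)))
    return masked


def masked_cross_entropy(logits, pseudo_labels, masked):
    """Average cross-entropy over the masked frames only.

    logits: per-frame lists of scores from the student, one score per code.
    pseudo_labels: per-frame integer codes produced by a teacher.
    masked: set of frame indices that were hidden from the student.
    """
    if not masked:
        return 0.0
    total = 0.0
    for t in masked:
        scores = logits[t]
        peak = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[pseudo_labels[t]]
    return total / len(masked)


# Toy usage: 20 frames, 4 possible codes, a student that is completely unsure.
frames = 20
teacher_codes = [t % 4 for t in range(frames)]
student_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(frames)]
hidden = sample_mask(frames, mask_prob=0.3)
loss = masked_cross_entropy(student_logits, teacher_codes, hidden)
```

The loss only looks at hidden frames, so the student is rewarded for inferring the teacher's labels from the surrounding context rather than for copying what it can already see.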

This generalizable and affordable pre-trained acoustic music model follows the self-supervised learning paradigm used for speech: teacher models generate pseudo-targets for sequential audio clips, and a multi-task setup balances acoustic and musical representation learning. To make the learned representations more robust, MERT introduces an in-batch noise mixture augmentation technique. It distorts audio recordings by mixing them with random clips, which forces the model to pick up the relevant content even when it is obscured. This improves the model’s ability to generalize to situations where music is mixed with unrelated audio.
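A rough sketch of what mixing clips within a batch could look like is given here. It assumes clips are equal-length lists of float samples; the function name `in_batch_mixup`, the mixing probability, and the gain are hypothetical choices for illustration and are not taken from the paper.

```python
import random


def in_batch_mixup(batch, mix_prob=0.5, noise_gain=0.5, seed=0):
    """Return a new batch in which some clips have another clip overlaid.

    batch: list of equal-length lists of float audio samples.
    mix_prob: chance that a given clip is distorted (hypothetical value).
    noise_gain: scale applied to the overlaid clip (hypothetical value).
    """
    rng = random.Random(seed)
    mixed = []
    for i, clip in enumerate(batch):
        out = list(clip)  # copy so the input batch is left untouched
        if len(batch) > 1 and rng.random() < mix_prob:
            # Pick a different clip from the same batch to act as noise.
            j = rng.choice([k for k in range(len(batch)) if k != i])
            noise = batch[j]
            # Overlay a segment of random length at a random position.
            seg_len = rng.randint(1, len(clip))
            start = rng.randint(0, len(clip) - seg_len)
            for t in range(start, start + seg_len):
                out[t] += noise_gain * noise[t]
        mixed.append(out)
    return mixed


# Toy usage: two four-sample "clips".
clean = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
noisy = in_batch_mixup(clean, mix_prob=1.0)
```

Drawing the noise from the same batch keeps the augmentation cheap, since no extra audio has to be loaded.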

The team arrived at a highly effective combination of teacher models that outperforms conventional speech and audio approaches. The combination consists of an acoustic teacher based on a Residual Vector Quantization-Variational AutoEncoder (RVQ-VAE) and a music teacher based on the Constant-Q Transform (CQT). The acoustic teacher uses the RVQ-VAE to provide a discretized, acoustic-level summary of the music signal, capturing its acoustic characteristics. The music teacher, based on the CQT, focuses on the tonal and pitched aspects of the music. Together, these teachers guide the student model toward meaningful representations of music audio.
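To make the two teacher signals more concrete, here is a minimal sketch of the building blocks behind them. It shows residual vector quantization with tiny hand-written codebooks (a real RVQ-VAE learns its codebooks and wraps them in an encoder and decoder, which is omitted here) and the geometrically spaced center frequencies that define a constant-Q transform. The function names and all numeric values are assumptions chosen for illustration.

```python
def nearest_code(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(code, vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def rvq_encode(vec, codebooks):
    """Quantize vec in stages: each codebook encodes what the previous ones missed.

    Returns one discrete code per stage and the final leftover residual.
    """
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest_code(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes, residual


def cqt_center_frequencies(f_min, n_bins, bins_per_octave=12):
    """Center frequencies of constant-Q bins: f_k = f_min * 2 ** (k / bins_per_octave)."""
    return [f_min * 2 ** (k / bins_per_octave) for k in range(n_bins)]


# Toy usage with two hand-written codebooks (coarse, then fine).
toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25]],
]
codes, leftover = rvq_encode([1.2, 0.8], toy_codebooks)
freqs = cqt_center_frequencies(f_min=32.7, n_bins=24)
```

With 12 bins per octave, each constant-Q bin lines up with one semitone, which is why a CQT-based target is a natural fit for pitch and harmony, while the sequence of discrete codes gives a compact description of the overall sound.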

The team also explored settings that address instability in acoustic language model pre-training. By optimizing these settings, they were able to scale MERT from 95M to 330M parameters, yielding a more powerful model that captures intricate details of music audio. In evaluation, the experimental results showed that MERT generalizes well across a variety of music understanding tasks. The model achieved overall SOTA scores across 14 different tasks, demonstrating strong performance and generalization ability.

In conclusion, the MERT model addresses a gap in applying self-supervised learning to music audio.

Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a bachelor’s degree in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with strong analytical and critical thinking skills and a keen interest in learning new skills, leading groups, and managing work in an organized manner.