New AI research introduces the Recognize Anything Model (RAM), a robust base model for image tagging

For natural language processing (NLP) tasks, large language models (LLMs) trained on large online datasets perform exceptionally well. The Segment Anything Model (SAM) showed excellent zero-shot localization capabilities in computer vision (CV) by scaling up the data.

Unfortunately, SAM cannot generate semantic labels, a task as fundamental as localization. Multi-label image recognition, also called image tagging, aims to recognize multiple labels for a single image. Image tagging is an important and practical computer vision problem because a single image typically carries several kinds of labels, such as objects, scenes, attributes, and activities.

Two main factors hold image tagging back:

Collecting a wide range of high-quality data is difficult. Efficient data annotation engines that can semi-automatically or automatically annotate a large number of photos across various categories are still lacking, as are standardized, comprehensive labeling systems.
Open-vocabulary, powerful models are scarce. Few models combine an efficient, flexible design with the ability to leverage large-scale weakly supervised data.


The Recognize Anything Model (RAM), a robust base model for image tagging, has just been introduced by researchers at OPPO Research Institute, International Digital Economy Academy (IDEA), and AI2 Robotics. RAM is designed to overcome problems such as poor labeling systems, inadequate datasets, inefficient data engines, and architectural constraints.

The researchers start by establishing a universal, unified label system. They draw on academic datasets (classification, detection, segmentation) and commercial tagging tools (Google, Microsoft, Apple) to build it. By merging all available public tags with common text-based tags, they obtain 6,449 labels that collectively cover most use cases. The researchers say the remaining open-vocabulary labels can be recognized through open-set recognition.
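To make the idea of merging tag sources concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the source names, the synonym table, and the `min_sources` parameter are all hypothetical, chosen only to illustrate how tags from several sources might be normalized and merged into one vocabulary.

```python
from collections import Counter


def normalize(tag: str) -> str:
    """Lowercase and collapse whitespace so 'Traffic  Light' == 'traffic light'."""
    return " ".join(tag.lower().split())


def build_label_system(sources, synonyms=None, min_sources=1):
    """Merge tag lists from several sources into one sorted vocabulary.

    sources: dict mapping a source name to an iterable of raw tags.
    synonyms: optional dict mapping a normalized tag to its canonical form
              (a hypothetical hand-written table, not part of RAM).
    min_sources: keep a tag only if at least this many sources contain it.
    """
    synonyms = synonyms or {}
    counts = Counter()
    for tags in sources.values():
        # A set ensures each source votes at most once per canonical tag.
        canonical = {synonyms.get(normalize(t), normalize(t)) for t in tags}
        counts.update(canonical)
    return sorted(t for t, n in counts.items() if n >= min_sources)


# Toy inputs standing in for academic datasets and a commercial tagger.
sources = {
    "classification_dataset": ["Dog", "cat", "Traffic Light"],
    "detection_dataset": ["dog", "person", "traffic  light"],
    "commercial_tagger": ["puppy", "Person", "beach"],
}
labels = build_label_system(sources, synonyms={"puppy": "dog"})
common = build_label_system(sources, synonyms={"puppy": "dog"}, min_sources=2)
print(labels)
print(common)
```

Raising `min_sources` keeps only tags that several sources agree on, which is one plausible way to favor common labels over rare ones.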

Automatically annotating large numbers of photos with a label system is a daunting task. The proposed approach to image tagging is inspired by previous work in this area, which uses large-scale public image-text pairs to train robust visual models. To leverage this large volume of image-text data for tagging, the team used automated text semantic parsing to extract image tags. With this method, a large set of image tags can be obtained from image-text pairs without resorting to manual annotation.
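As a rough illustration of turning captions into tags, the sketch below matches caption phrases against a fixed label vocabulary. This is plain phrase matching, a much simpler stand-in for the semantic parsing the team describes; the function name `extract_tags`, the toy vocabulary, and the caption are assumptions made for the example.

```python
import re


def extract_tags(caption, vocabulary, max_words=3):
    """Return the vocabulary labels that appear in a caption.

    Longer phrases are matched first, so 'traffic light' wins over 'light'.
    No stemming is done, so plural forms such as 'dogs' will not match 'dog'.
    """
    words = re.findall(r"[a-z]+", caption.lower())
    found, used = set(), set()
    for n in range(max_words, 0, -1):  # longest phrases first
        for i in range(len(words) - n + 1):
            span = range(i, i + n)
            phrase = " ".join(words[i:i + n])
            # Skip words already consumed by a longer matching phrase.
            if phrase in vocabulary and not used.intersection(span):
                found.add(phrase)
                used.update(span)
    return sorted(found)


vocabulary = {"dog", "traffic light", "light", "beach", "cat"}
caption = "A dog waits at a traffic light near the beach."
tags = extract_tags(caption, vocabulary)
print(tags)
```

A real pipeline would also need to handle plurals, synonyms, and negation, which this sketch deliberately leaves out.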

Image-text pairs collected from the internet tend to be inaccurate because of random noise. To improve annotation accuracy, the team built a data tagging engine. To address missing labels, the team uses existing models to generate additional tags. To address mislabeled regions, the engine locates the specific image regions that correspond to each label. A region clustering technique then finds and eliminates outliers within the same category. Finally, labels whose predictions are inconsistent are removed to obtain more accurate annotations.
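The outlier-removal step can be sketched as follows. The article does not specify the clustering method, so this example swaps in a simpler technique: it measures each region embedding's distance to the category centroid and drops regions that are unusually far away, using the median absolute deviation (MAD) as the yardstick. The function name, the threshold, and the 2-D toy embeddings are all assumptions.

```python
import math
from statistics import median


def remove_outliers(embeddings, threshold=3.0):
    """Return indices of region embeddings that lie close to the centroid.

    A region is dropped when its distance to the centroid exceeds the
    median distance by more than `threshold` MADs (a one-sided test).
    """
    dim = len(embeddings[0])
    centroid = [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]
    dists = [math.dist(e, centroid) for e in embeddings]
    med = median(dists)
    # Fall back to a tiny value so identical distances do not divide by zero.
    mad = median(abs(d - med) for d in dists) or 1e-9
    return [i for i, d in enumerate(dists) if (d - med) / mad <= threshold]


# Toy 2-D embeddings of regions that were all tagged with the same label.
# The last one sits far from the rest and is likely mislabeled.
regions = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [5.0, 5.0]]
kept = remove_outliers(regions)
print(kept)
```

Real region embeddings would come from a visual encoder and have hundreds of dimensions, but the filtering logic stays the same.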

RAM generalizes to new classes by adding semantic context to label queries. This architecture strengthens RAM's recognition ability on any visual dataset, demonstrating its versatility. RAM introduces a new paradigm in image tagging by showing that a general model trained on noisy, annotation-free data can outperform fully supervised models. RAM requires only free, publicly available datasets without manual annotations. The most powerful version of RAM needs only 3 days of training on 8 A100 GPUs.
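One way to picture how semantic label representations allow new classes is to treat each label as an embedding vector and tag an image with every label whose vector is similar enough to the image's. The sketch below does this with cosine similarity. It is an illustration only, not RAM's architecture: the three-dimensional vectors are hand-made, and the threshold of 0.6 is arbitrary.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def tag_image(image_vec, label_vecs, threshold=0.6):
    """Return every label whose embedding is similar enough to the image."""
    return sorted(
        label for label, vec in label_vecs.items()
        if cosine(image_vec, vec) >= threshold
    )


# Hand-made toy embeddings; a real system would get these from encoders.
label_vecs = {
    "dog": [1.0, 0.0, 0.0],
    "beach": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}
image_vec = [0.7, 0.7, 0.1]
before = tag_image(image_vec, label_vecs)

# Adding a new class only requires an embedding for it, not retraining.
label_vecs["sand"] = [0.1, 0.9, 0.2]
after = tag_image(image_vec, label_vecs)
print(before, after)
```

Because labels live in the same space as image features, the label set can grow after training, which is the property that makes open-vocabulary tagging possible in this kind of design.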

According to the team, RAM still has room for improvement. This includes running multiple iterations of the data engine, increasing the backbone parameters to improve model capacity, and expanding the training dataset beyond 14 million photos to better cover different domains.

Check out the Paper, Project, and GitHub. Don’t forget to join our 23,000+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the article above or if we missed anything, feel free to email us at Asif@marktechpost.com


Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her bachelor’s degree at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast and has a keen interest in the range of applications of artificial intelligence in various fields. She is passionate about exploring new advances in technology and their practical applications.

