Meta, the parent company of Facebook and Instagram, has demonstrated a new artificial intelligence (AI) model that can “cut out” any object in any image or video with a single click. Amazingly, the company has also open-sourced it.
The AI, called the “Segment Anything Model” or SAM, accepts various input prompts telling it what to segment and responds in real time. As Gizmodo points out, there is already a wealth of AI-powered clipping and object-replacement tools on the market. Content-Aware Fill in Adobe Photoshop and Apple’s ability to “lift” subjects out of photos and drop them into texts are two notable examples, but what Meta is showing here is a bit different, and striking.
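Meta released the model as an open-source Python package, `segment_anything`, alongside the web demo. As a rough sketch of what a single-click prompt looks like in that API (the image path, click coordinates, and local checkpoint filename below are placeholders, not values from the article):

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load a pretrained SAM checkpoint (the ViT-H weights are linked from
# Meta's GitHub repository; the local filename here is an assumption).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SAM expects an RGB uint8 image.
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One foreground click at pixel (x=500, y=375) is the entire prompt.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),  # 1 = foreground point, 0 = background
    multimask_output=True,       # return several candidate masks
)
best_mask = masks[np.argmax(scores)]  # keep the highest-scoring mask
```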
Once an image has been processed, the AI does a very good job of isolating the main objects in it. For example, the live demo lets users ask SAM to highlight everything it perceives as an individual object, to illustrate what the technology is doing. Larger scenes don’t always resolve fine detail, such as each individual person in a wide cityscape, but most objects are relatively easy to select.
Once subjects reach a certain size, the AI is smart enough to pick them out even when they aren’t perfectly in focus.
The demo includes a set of sample images, but users can also upload their own to try with the system. A photo of the Tarantula Nebula from the James Webb Space Telescope, for example, proved challenging: it is very busy and full of indistinct points of light. On the flip side, an uploaded picture of two bikers from an EOS R6 II review produced far more impressive results.
Notably, processing took only a few seconds, which adds to how impressive the technology is.
“SAM’s advanced capabilities are the result of its training on millions of images and masks collected through the use of a model-in-the-loop ‘data engine.’ Researchers used SAM and its data to interactively annotate images and update the model. This cycle was repeated many times over to improve both the model and the dataset,” Meta explains.
“After annotating enough masks with SAM’s help, we were able to leverage SAM’s sophisticated ambiguity-aware design to annotate new images fully automatically. To do this, we present SAM with a grid of points on an image and ask SAM to segment everything at each point. Our final dataset includes more than 1.1 billion segmentation masks collected on about 11 million licensed and privacy-preserving images.”
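The grid-of-points procedure Meta describes maps onto the automatic mask generator that ships with the open-source package. A minimal sketch, reusing the `sam` model and `image` from the earlier snippet (the grid density shown is illustrative):

```python
from segment_anything import SamAutomaticMaskGenerator

# Samples a regular grid of point prompts across the image and keeps
# every resulting mask, mirroring the fully automatic annotation step
# Meta describes (points_per_side controls the grid density).
mask_generator = SamAutomaticMaskGenerator(sam, points_per_side=32)
masks = mask_generator.generate(image)  # list of dicts, one per mask

for m in masks[:3]:
    # Each entry carries the binary mask plus metadata such as its
    # pixel area and a predicted quality (IoU) score.
    print(m["area"], m["predicted_iou"], m["segmentation"].shape)
```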
Meta notes that SAM can output multiple masks when a prompt is ambiguous, which is especially impressive given that the base model was built simply to demonstrate the technology. Meta says the system is designed to be “promptable” so that it can take input from a variety of sources, from where a user wearing a VR headset is looking, to text descriptions.
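That ambiguity-aware behavior is visible directly in the point-prompt API: with `multimask_output` enabled, a single click returns up to three nested candidates (roughly a part, the object, and the surrounding whole), each with its own quality score. Continuing from the predictor sketch above, with the same placeholder click:

```python
# A deliberately ambiguous click (e.g. on a person's shirt) yields
# several nested masks rather than forcing a single interpretation.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
for i, (mask, score) in enumerate(zip(masks, scores)):
    print(f"candidate {i}: {int(mask.sum())} px, predicted quality {score:.3f}")
```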
The full paper describing the technology is published on Meta’s AI website. But perhaps the most exciting part, given the company behind it, is that the software is open source: the code is available on GitHub, and the full dataset powering SAM can also be downloaded from Meta.