Meta AI said in a statement that the model links not only text, images/video and audio, but also depth (3D), thermal (infrared) and inertial measurement unit (IMU) data capturing motion and position.
ImageBind gives machines a holistic understanding that connects objects in a photo with how they sound, their 3D shape, how warm or cold they are, and how they move.
The company says that, as described in its research paper, ImageBind can outperform previous specialist models trained individually for a single modality. More importantly, it lets machines analyze different forms of information together, helping advance AI.
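The cross-modal linking described above is typically achieved by embedding every modality into one shared vector space, so that, for example, an audio clip and an image of the same scene land near each other. Below is a minimal sketch of retrieval in such a space; the vectors are made-up toy values standing in for real model outputs, not actual ImageBind embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    # Standard cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for image embeddings in a shared multimodal space.
image_embeddings = {
    "rainforest.jpg": np.array([0.9, 0.1, 0.0]),
    "market.jpg":     np.array([0.1, 0.9, 0.0]),
    "office.jpg":     np.array([0.0, 0.1, 0.9]),
}

# Toy embedding of a rainforest audio clip; in a joint space it
# should sit closest to the rainforest image.
audio_embedding = np.array([0.8, 0.2, 0.1])

def retrieve(query, catalog):
    # Return the catalog item whose embedding is most similar to the query.
    return max(catalog, key=lambda name: cosine_similarity(query, catalog[name]))

print(retrieve(audio_embedding, image_embeddings))  # → rainforest.jpg
```

Because all modalities share one space, the same `retrieve` function works regardless of whether the query came from audio, text, depth or motion data; that is the practical payoff of a joint embedding.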
“For example, Meta’s Make-A-Scene can use ImageBind to create images from audio, such as creating images based on the sounds of a rainforest or a busy market,” the company added.
“ImageBind is part of Meta’s effort to create multimodal AI systems that learn from all the types of data around them,” the company said.
