Tencent has released a new artificial intelligence (AI) model called InstantMesh that can generate 3D objects from a single still photo. The new model builds on the earlier Instant3D framework, combining a multi-view diffusion model with a sparse-view reconstruction model based on the Large Reconstruction Model (LRM) architecture. Tencent has also open-sourced InstantMesh and provided a preview app that lets enthusiasts test its functionality and generate and export 3D renderings.
The company has published a preprint of the research paper on arXiv. Notably, arXiv does not conduct peer review, so it is difficult to say whether the model's claims have been independently evaluated. However, the company has already made the model available as open source on Hugging Face, so developers can test its efficiency themselves. For enthusiasts, there is also a demo app where you can upload a photo and watch it turn into a 3D rendering. When Gadgets 360 tested the platform, renders were created within 10 seconds, as the company claims. However, the rendering quality felt quite low. A user on X (formerly known as Twitter) posted a video of the AI model in use. You can see the results below.
🤯Tencent's InstantMesh is insane – super fast image conversion with high quality output
⬇️ Link below – Generate a 3D model from one image for free in 30 seconds 🔥🔥 pic.twitter.com/Dft4xF3vQm
— Victor M (@victormustar) April 15, 2024
As for the technology behind the model, the company combines two components: a multi-view diffusion model and a reconstruction model based on the LRM architecture. The former takes a single image as input and generates views of the object from angles that are not visible in that image. The LRM-based model then uses those views to construct an object that can be orbited and viewed in a 3D environment.
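The two-stage flow described above can be sketched roughly as follows. This is only an illustration of how the stages hand data to each other, not Tencent's code: the names `View`, `Mesh`, `generate_views`, `reconstruct_mesh`, and `image_to_3d` are hypothetical, the default of six views is an arbitrary assumption, and both stages are stand-in stubs rather than real models.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class View:
    azimuth_deg: float  # camera angle around the object, in degrees
    pixels: list        # placeholder for image data


@dataclass
class Mesh:
    source_views: int   # how many views the reconstruction consumed


def generate_views(image: list, n_views: int = 6) -> List[View]:
    """Stage 1 stand-in (hypothetical): a real multi-view diffusion model
    would synthesize the unseen sides; this stub only copies the input
    and assigns evenly spaced camera angles."""
    step = 360.0 / n_views
    return [View(azimuth_deg=i * step, pixels=list(image)) for i in range(n_views)]


def reconstruct_mesh(views: List[View]) -> Mesh:
    """Stage 2 stand-in (hypothetical): a real sparse-view reconstruction
    model would fuse the views into geometry; this stub only records
    how many views it was given."""
    if len(views) < 2:
        raise ValueError("sparse-view reconstruction needs more than one view")
    return Mesh(source_views=len(views))


def image_to_3d(image: list, n_views: int = 6) -> Mesh:
    # Single image -> several generated views -> one 3D object.
    return reconstruct_mesh(generate_views(image, n_views))


mesh = image_to_3d([0, 128, 255])
print(mesh.source_views)
```

The point of the split is that the reconstruction stage never sees the original photo alone; it always works from the set of views the first stage produced.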
According to Tencent, InstantMesh solves the Janus problem in 3D generation. The Janus problem arises when a model has to "imagine" the unseen sides of a reference object and ends up producing multiple canonical views of the object rather than a single coherent 3D object. The company says it solved this problem using a novel view generator fine-tuned from Stable Diffusion.
The research paper also shares benchmark scores against various existing models, including Stability AI's recently launched Stable Video 3D (SV3D). Based on the scores, InstantMesh performed better than SV3D on the Google Scanned Objects (GSO) and OmniObject3D (Omni3D) orbiting views. On some parameters of the Omni3D benchmark (equivalent to output resolution), SV3D scored better, but Tencent said this was intentional. "We believe that perceptual quality is more important than fidelity because a 'true novel perspective' should be unknown and there are multiple possibilities given a single image as a reference," the company said.
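To see why a pixel-fidelity score can disagree with perceived quality, consider a minimal sketch using PSNR, a common fidelity metric. The article does not name the metrics in the benchmark, so PSNR is an assumption here, and the tiny four-pixel "images" are made-up values for illustration only.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(candidate):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Made-up pixel values for one unseen side of an object.
ground_truth = [10, 200, 30, 220]
blurry_guess = [60, 150, 80, 170]          # washed out, hedges toward the average
sharp_but_different = [200, 10, 220, 30]   # crisp, but a different plausible back side

print(psnr(ground_truth, blurry_guess) > psnr(ground_truth, sharp_but_different))
```

In this toy case the washed-out guess scores higher than the crisp alternative, simply because it sits closer to the one reference image pixel by pixel. That is the kind of trade-off the company's statement points to: when the hidden side of an object has several plausible appearances, matching one reference exactly is not the only sign of a good result.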
