OpenAI Releases Shap·E: A Conditional Generative Model for 3D Assets



https://arxiv.org/pdf/2305.02463.pdf

Over the past few months, generative AI has grown steadily in popularity. From organizations to individual AI researchers, everyone is discovering its enormous potential for creating original content. The introduction of large language models (LLMs) has made many tasks more convenient. Models like DALL-E, developed by OpenAI, which let users create realistic images from text prompts, are already used by over a million users. This text-to-image model generates high-quality images from input text descriptions.

OpenAI has now released a new project for 3D generation. This conditional generative model, called Shap·E, is designed to generate 3D assets. Unlike traditional models that produce a single output representation, Shap·E generates the parameters of implicit functions. These functions can be rendered as textured meshes or as neural radiance fields (NeRFs), which makes the generated 3D assets versatile and realistic.

To train Shap·E, the researchers first trained an encoder. The encoder takes 3D assets as input and maps them to the parameters of an implicit function. This mapping lets the model learn the underlying representation of the 3D assets. A conditional diffusion model was then trained on the encoder's outputs. The conditional diffusion model learns the distribution of implicit function parameters conditioned on the input data, and generates diverse and complex 3D assets by sampling from that learned distribution. The diffusion model was trained on a large dataset of 3D assets paired with their corresponding text descriptions.
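To make the second training stage more concrete, here is a minimal sketch of the forward (noising) step that diffusion training is built on, applied to a latent vector. It is an illustration under stated assumptions, not Shap·E's actual code: the linear noise schedule, the 8-dimensional `latent`, and the helper names `linear_betas`, `alpha_bars`, and `noise_latent` are all hypothetical. In the real system the latent would be the encoder's output, and a network would be trained to undo the noise given the text or image condition.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed, commonly used default)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative product of (1 - beta_t): signal variance kept at step t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def noise_latent(latent, t, abar, rng):
    """Sample x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    signal = math.sqrt(abar[t])
    sigma = math.sqrt(1.0 - abar[t])
    eps = [rng.gauss(0.0, 1.0) for _ in latent]
    noisy = [signal * x + sigma * e for x, e in zip(latent, eps)]
    return noisy, eps


rng = random.Random(0)
# Hypothetical stand-in for the implicit-function parameters from the encoder.
latent = [rng.uniform(-1.0, 1.0) for _ in range(8)]
abar = alpha_bars(linear_betas(1000))

noisy_early, _ = noise_latent(latent, 10, abar, rng)   # still close to latent
noisy_late, _ = noise_latent(latent, 999, abar, rng)   # almost pure noise

print("signal scale at t=10: ", math.sqrt(abar[10]))
print("signal scale at t=999:", math.sqrt(abar[999]))
```

During training, a denoising network would see `noisy_early` or `noisy_late` together with the conditioning text and learn to recover the clean latent (or the noise `eps`); sampling then runs this process in reverse, starting from random noise.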


Shap·E relies on implicit neural representations (INRs) of 3D assets. An INR encodes a 3D asset by mapping 3D coordinates to location-specific information such as density and color. This provides a versatile and flexible framework for capturing the detailed geometric properties of 3D assets. The two types of INR discussed by the team are:

  1. Neural Radiance Field (NeRF) – A NeRF represents a 3D scene by mapping coordinates and viewing directions to densities and RGB colors. It can be rendered from any viewpoint, which enables realistic, high-fidelity scene rendering, and it can be trained to match ground-truth renderings.
  2. DMTet and its extension GET3D – These INRs represent textured 3D meshes by mapping coordinates to colors, signed distances, and vertex offsets. These quantities make it possible to build 3D triangle meshes in a differentiable way.
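As a rough illustration of the first item, the sketch below renders one ray through an implicit field using the standard NeRF volume-rendering quadrature: each sample contributes its color weighted by its own opacity and by the light that has survived up to that point. The `toy_field` function is a hypothetical stand-in for a learned network (a hard-coded, semi-opaque red sphere that ignores the viewing direction), and `render_ray` and its parameters are names chosen for this example, not part of Shap·E.

```python
import math


def toy_field(point, direction):
    """Hypothetical stand-in for a learned INR: (position, direction) -> (density, rgb).

    Models a semi-opaque red sphere of radius 0.5 centered at the origin.
    A real NeRF would use the direction for view-dependent color; this toy ignores it.
    """
    x, y, z = point
    inside = math.sqrt(x * x + y * y + z * z) < 0.5
    density = 20.0 if inside else 0.0
    return density, (1.0, 0.0, 0.0)


def render_ray(origin, direction, near, far, num_samples, field=toy_field):
    """Composite color along one ray with the NeRF quadrature rule."""
    delta = (far - near) / num_samples          # spacing between samples
    transmittance = 1.0                         # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for i in range(num_samples):
        t = near + (i + 0.5) * delta            # midpoint of the i-th segment
        point = tuple(o + t * d for o, d in zip(origin, direction))
        density, color = field(point, direction)
        alpha = 1.0 - math.exp(-density * delta)  # opacity of this segment
        weight = transmittance * alpha
        for c in range(3):
            pixel[c] += weight * color[c]
        transmittance *= 1.0 - alpha
    return tuple(pixel), 1.0 - transmittance    # (rgb, accumulated opacity)


# One ray through the center of the sphere and one that misses it entirely.
hit_color, hit_opacity = render_ray((0.0, 0.0, -2.0), (0.0, 0.0, 1.0), 1.0, 3.0, 64)
miss_color, miss_opacity = render_ray((2.0, 0.0, -2.0), (0.0, 0.0, 1.0), 1.0, 3.0, 64)

print("hit: ", hit_color, hit_opacity)
print("miss:", miss_color, miss_opacity)
```

Because every step is a smooth function of the densities and colors, the same computation can be differentiated with respect to the field's parameters, which is what allows a NeRF to be fitted to ground-truth renderings. The mesh-based representation in the second item plays a similar role, but queries signed distances and vertex offsets instead of densities.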

The team shared some examples of Shap·E results, including 3D outputs for text prompts such as a bowl of food, a penguin, a voxelized dog, a campfire, and an avocado-shaped chair. The results show that models trained with Shap·E perform well and can generate high-quality outputs in just seconds. For evaluation, the team compared Shap·E with Point·E, another generative model, which produces an explicit representation in the form of point clouds. In this comparison, Shap·E converged faster and achieved comparable or better sample quality, despite modeling a higher-dimensional, multi-representation output space.

In conclusion, Shap·E is an effective and efficient generative model for 3D assets. It looks promising and is a significant contribution to generative AI.




Tanya Malhotra is a final-year student at the University of Petroleum & Energy Studies, Dehradun, pursuing a bachelor's degree in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
A data science enthusiast with strong analytical and critical thinking skills, she has a keen interest in learning new skills, leading groups, and managing work in an organized manner.


