Creating Detailed 3D Models from Images: How AI Frameworks Will Change the Game

Three-dimensional (3D) modeling has become important in many fields, such as architecture and engineering. A 3D model is a computer-generated object or environment that can be manipulated, animated, and rendered from different perspectives to provide a realistic visual representation of the physical world. Creating 3D models can be time-consuming and costly, especially for complex objects. However, recent advances in computer vision and machine learning have made it possible to generate 3D models and scenes from a single input image.

3D scene generation involves using artificial intelligence algorithms to learn the underlying structure and geometric properties of an object or environment from a single image. This process usually consists of two stages. The first stage extracts the object’s shape and structure, and the second stage generates the object’s texture and appearance.

In recent years, this technology has gained a lot of attention in the research community. A classical approach to 3D scene generation involves learning the features and properties of a scene viewed in two dimensions. In contrast, the new approach takes advantage of differentiable rendering, which makes it possible to compute the gradient of the rendered image with respect to the input geometry parameters.
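The core idea behind differentiable rendering can be illustrated with a deliberately tiny sketch. The one-parameter "renderer" below (a 1-D Gaussian blob whose center plays the role of a geometry parameter) is purely illustrative and not the rendering model used by the framework; it only shows why a gradient of the image with respect to geometry lets you optimize geometry directly from pixels.

```python
import math

def render(center, width=5.0, n_pixels=32):
    """Toy differentiable 'renderer': a 1-D Gaussian blob.

    The blob's center is the single geometry parameter. Each pixel's
    intensity is a smooth function of `center`, so the rendered image
    is differentiable with respect to the geometry.
    """
    return [math.exp(-((x - center) ** 2) / (2 * width ** 2))
            for x in range(n_pixels)]

def loss_and_grad(center, target, width=5.0):
    """Pixel-wise squared error and its analytic derivative w.r.t. center."""
    img = render(center, width)
    loss = sum((p - t) ** 2 for p, t in zip(img, target))
    # d/dc exp(-(x - c)^2 / (2 w^2)) = exp(...) * (x - c) / w^2
    w2 = width ** 2
    grad = sum(2 * (p - t) * p * (x - center) / w2
               for x, (p, t) in enumerate(zip(img, target)))
    return loss, grad

# Gradient descent recovers the geometry that produced the target image.
target = render(20.0)
c = 10.0
for _ in range(200):
    loss, g = loss_and_grad(c, target)
    c -= 2.0 * g
```

In a real differentiable renderer the geometry has thousands of parameters and the gradient comes from automatic differentiation, but the optimization loop has exactly this shape.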


However, these techniques are often developed for a specific category of objects and produce 3D scenes with limited variance, such as minor changes in terrain representation.

New approaches for 3D scene generation have been proposed to address this limitation.

The goal is to create natural scenes whose unique characteristics arise from the interdependence between the shape and appearance of their constituent elements. The idiosyncratic nature of these features makes it difficult for a model to learn general shape properties.

In such cases, an exemplar-based paradigm is used: a well-chosen exemplar model is manipulated to build a richer target model. For this technique to be effective, the exemplar model should share characteristics with the target model.

However, because sample scenes vary widely in their specific characteristics, it is difficult to craft ad-hoc designs for every scene type.

The proposed approach therefore utilizes patch-based algorithms, which were in use long before deep learning. The pipeline is shown in the following diagram.

Specifically, a multi-scale generative patch-based framework is employed, using a generative patch nearest-neighbor (GPNN) module to maximize bidirectional visual similarity between the input and the output.
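A single patch match-and-blend step of this kind can be sketched as follows. This is a minimal 1-D illustration of the general patch nearest-neighbor idea, not the paper's GPNN module: every output patch is replaced by its closest exemplar patch, and overlapping votes are averaged.

```python
def extract_patches(signal, size):
    """All overlapping patches of a 1-D signal."""
    return [signal[i:i + size] for i in range(len(signal) - size + 1)]

def nearest_patch(query, patches):
    """Index of the exemplar patch with the smallest L2 distance."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return min(range(len(patches)), key=lambda i: dist(patches[i], query))

def patch_match_and_blend(output, exemplar, size=3):
    """One GPNN-style iteration: match every output patch to its nearest
    exemplar patch, then blend the matches by averaging overlapping votes."""
    ex_patches = extract_patches(exemplar, size)
    votes = [[] for _ in output]
    for i, q in enumerate(extract_patches(output, size)):
        match = ex_patches[nearest_patch(q, ex_patches)]
        for j, v in enumerate(match):
            votes[i + j].append(v)
    return [sum(v) / len(v) for v in votes]
```

Iterating this step at each pyramid level pulls the output ever closer to being composed entirely of exemplar patches, which is what "maximizing bidirectional similarity" amounts to in practice.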

This approach represents the input scene with a grid-based radiance field, the Plenoxel, known for its impressive visual quality. Its regular structure and simplicity make it well suited to patch-based algorithms, but several key designs are required. Specifically, the exemplar pyramid is constructed through coarse-to-fine training of Plenoxels on images of the input scene, rather than by simply downsampling a pre-trained high-resolution model. Furthermore, the high-dimensional, unbounded, and noisy features of the Plenoxel-based exemplar at each level are transformed into well-defined, compact geometric and appearance features, improving robustness and efficiency in subsequent patch matching.
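To make the second point concrete: a Plenoxel grid stores per-voxel density plus spherical-harmonic (SH) color coefficients, which are unbounded and noisy. The sketch below shows one plausible way such raw features could be squashed into bounded descriptors; the exact transformation in the paper may differ, and the field names (`density`, `sh_dc`) are assumptions for illustration.

```python
import math

def compact_features(voxels):
    """Map raw Plenoxel-style voxel features (density + SH coefficients)
    to compact, bounded descriptors for patch matching: a soft occupancy
    in [0, 1] for geometry, and the SH DC term (mean color) for appearance.

    Illustrative sketch only, not the paper's exact transformation.
    """
    SH_C0 = 0.28209479177387814  # DC basis constant of spherical harmonics
    out = []
    for v in voxels:
        # Unbounded density -> soft occupancy in [0, 1].
        occupancy = 1.0 - math.exp(-max(v["density"], 0.0))
        # View-independent mean color from the SH DC coefficients
        # (decoding convention commonly used with Plenoxel-style grids).
        mean_rgb = [SH_C0 * c + 0.5 for c in v["sh_dc"]]
        out.append({"occ": occupancy, "rgb": mean_rgb})
    return out
```

Bounded, low-dimensional descriptors like these keep patch distances well behaved, which is exactly the robustness and efficiency benefit the text describes.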

Furthermore, this work employs different representations during synthesis within the generative nearest-neighbor module. Patch matching and blending operate at each level on intermediate value-based scenes, which are gradually synthesized and finally transformed into an equivalent coordinate-based scene.
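The distinction between the two representations can be sketched in miniature. In this illustrative toy (not the paper's implementation), a value-based scene stores blended feature values directly, while a coordinate-based scene stores, per cell, a pointer into the exemplar and reads its content from the exemplar at the end:

```python
def to_coordinate_based(value_scene, exemplar):
    """Convert a value-based synthesis into a coordinate-based one:
    for each output cell, store the exemplar coordinate whose value is
    closest, rather than the blended value itself."""
    return [min(range(len(exemplar)), key=lambda j: abs(exemplar[j] - v))
            for v in value_scene]

def realize(coords, exemplar):
    """Materialize the coordinate-based scene by exemplar lookup."""
    return [exemplar[j] for j in coords]
```

The appeal of the coordinate-based form is that the final scene is assembled purely from exemplar content (here, exact exemplar values), so it inherits the exemplar's fidelity rather than the blurring introduced by blending.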

Finally, running patch-based algorithms on voxels can be computationally demanding. The pipeline therefore utilizes an exact-to-approximate patch nearest-neighbor field (NNF) module that keeps the search space manageable while making minimal compromises to the optimality of the visual summary.
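The exact-to-approximate trade-off can be sketched as follows. This toy (queries and exemplar entries are plain numbers standing in for patches) is an illustration of the general idea, not the paper's module: an exhaustive NNF is exact but O(N·M), while a restricted search around an initial guess, such as the match found at a coarser pyramid level, shrinks the search space at a small cost in optimality.

```python
def exact_nnf(queries, exemplar_patches, dist):
    """Exhaustive nearest-neighbor field: optimal but O(N * M)."""
    return [min(range(len(exemplar_patches)),
                key=lambda j: dist(q, exemplar_patches[j]))
            for q in queries]

def approximate_nnf(queries, exemplar_patches, guesses, radius, dist):
    """Restricted NNF: search only a window around an initial guess
    (e.g. the match found at a coarser pyramid level), trading a small
    loss of optimality for a much smaller search space."""
    nnf = []
    for q, g in zip(queries, guesses):
        lo, hi = max(0, g - radius), min(len(exemplar_patches), g + radius + 1)
        nnf.append(min(range(lo, hi),
                       key=lambda j: dist(q, exemplar_patches[j])))
    return nnf
```

With a good initial guess, the restricted search usually lands on the same match as the exhaustive one while examining only a constant-size window per query.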

The results obtained by this model on several sample images are reported below.

This is an overview of a new AI framework that enables high-variance image-to-3D scene generation. If you’re interested, read more about this technique at the link below.


Please check out the paper and project. Don’t forget to join our 21,000+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the article above or if we missed anything, feel free to email us at Asif@marktechpost.com.


Daniele Lorenzi received his master’s degree in ICT for Internet and Multimedia Engineering from the University of Padua, Italy, in 2021. He is a Ph.D. candidate at the Institute of Information Technology (ITEC) at the Alpen-Adria-Universität (AAU) Klagenfurt. He currently works in the Christian Doppler Laboratory ATHENA, and his research interests include adaptive video streaming, immersive media, machine learning, and QoS/QoE evaluation.



