Neuralangelo: NVIDIA Unveils AI Model for Lifelike 3D Reconstruction from 2D Video Clips

NVIDIA Research has unveiled Neuralangelo, an AI model that uses neural networks for 3D reconstruction. The model turns 2D video clips into detailed 3D structures, generating lifelike virtual replicas of buildings, sculptures, and other real-world objects.

Like Michelangelo sculpting lifelike visions from blocks of marble, Neuralangelo generates 3D structures with intricate detail and texture. Creative professionals can import these 3D objects into design applications and edit them further for use in art, video game development, robotics, and industrial digital twins.

What sets Neuralangelo apart from previous methods is its ability to translate the textures of complex materials, such as roof shingles, glass panes, and smooth marble, from 2D videos into high-fidelity 3D assets. That fidelity makes it easier for developers and creative professionals to quickly turn smartphone footage into usable virtual objects.

Ming-Yu Liu, senior director of research and co-author of the paper, commented, "The 3D reconstruction capabilities Neuralangelo offers will be a huge benefit to creators, helping them recreate the real world in the digital realm. This tool will eventually enable developers to import detailed objects, whether they are small statues or massive buildings, into virtual environments for video games or industrial digital twins."

In a demonstration, NVIDIA researchers showed Neuralangelo recreating objects ranging from Michelangelo's iconic statue of David to a commonplace flatbed truck. The model can also reconstruct the interiors and exteriors of buildings; the demonstration included a detailed 3D model of the park at NVIDIA's Bay Area campus.

Earlier AI models for 3D scene reconstruction struggled to accurately capture repetitive texture patterns, homogeneous colors, and strong color variations. Neuralangelo addresses these limitations by adopting instant neural graphics primitives, the technology behind NVIDIA Instant NeRF.
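To give a rough sense of what instant neural graphics primitives involve, the sketch below shows a simplified multi-resolution hash encoding in plain Python: a 3D point is looked up in several grids of increasing resolution, and the interpolated features from each level are concatenated. This is an illustration only, not Neuralangelo's or Instant NeRF's actual code. The function names, table sizes, base resolution, and growth factor are all assumptions chosen for readability, and in a real system the tables would be trained rather than filled with random values.

```python
import math
import random

# Per-axis primes used by the spatial hash in the Instant NGP paper (the first is 1).
PRIMES = (1, 2654435761, 805459861)


def hash_vertex(ix, iy, iz, table_size):
    """Spatial hash of an integer grid vertex: XOR of coordinate * prime, mod table size."""
    h = (ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])
    return h % table_size


def make_tables(num_levels, table_size, feat_dim, seed=0):
    """One small feature table per resolution level (hypothetical sizes, random init)."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1e-4, 1e-4) for _ in range(feat_dim)] for _ in range(table_size)]
        for _ in range(num_levels)
    ]


def encode(point, tables, base_res=4, growth=2.0):
    """Encode a 3D point in [0, 1]^3 by concatenating interpolated features per level."""
    features = []
    for level, table in enumerate(tables):
        res = int(math.floor(base_res * growth ** level))
        # Scale the point to this level's grid, then split into cell index and offset.
        scaled = [c * res for c in point]
        base = [min(int(math.floor(s)), res - 1) for s in scaled]
        frac = [s - b for s, b in zip(scaled, base)]
        level_feat = [0.0] * len(table[0])
        # Trilinear interpolation over the 8 corners of the enclosing cell.
        for dx in (0, 1):
            for dy in (0, 1):
                for dz in (0, 1):
                    weight = (
                        (frac[0] if dx else 1.0 - frac[0])
                        * (frac[1] if dy else 1.0 - frac[1])
                        * (frac[2] if dz else 1.0 - frac[2])
                    )
                    idx = hash_vertex(base[0] + dx, base[1] + dy, base[2] + dz, len(table))
                    for k, value in enumerate(table[idx]):
                        level_feat[k] += weight * value
        features.extend(level_feat)
    return features
```

With four levels of two features each, encode((0.3, 0.5, 0.7), make_tables(4, 64, 2)) returns a list of eight values. Coarse levels capture the overall shape while fine levels carry small-scale detail, which is the general idea behind using several resolutions at once.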

The process begins with a 2D video of an object or scene filmed from various angles. The model selects several frames that capture different viewpoints, much as an artist considers a subject from multiple sides to get a sense of its depth, size, and shape. Once the camera position of each frame is determined, Neuralangelo's AI creates a rough 3D representation of the scene, like a sculptor starting to chisel the basic shape of a subject. The model then optimizes the render to sharpen the details, just as a sculptor carefully carves stone to mimic the texture of fabric or the human figure.
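As a small illustration of the first step, the sketch below picks evenly spaced frames from a clip so that the selected frames span the whole camera path. The article does not say how Neuralangelo chooses its frames, so even spacing is purely an assumption, and select_frame_indices is a hypothetical helper name. Estimating the camera position of each selected frame is a separate step that is not shown here.

```python
def select_frame_indices(num_frames, num_keyframes):
    """Pick evenly spaced frame indices so the kept frames span the whole clip.

    Assumption: even spacing in time stands in for "different viewpoints";
    a real pipeline may use a different selection rule.
    """
    if num_frames <= 0 or num_keyframes <= 0:
        return []
    if num_keyframes >= num_frames:
        return list(range(num_frames))
    if num_keyframes == 1:
        return [0]
    # Spread the keyframes from the first frame to the last one.
    step = (num_frames - 1) / (num_keyframes - 1)
    return [round(i * step) for i in range(num_keyframes)]


print(select_frame_indices(10, 4))  # prints [0, 3, 6, 9]
```

For a clip filmed while walking around an object, evenly spaced frames tend to cover the subject from many sides, which is what the later reconstruction steps rely on.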

The result is a 3D object or large-scale scene that can be used in virtual reality applications, digital twins, and robotics development.

NVIDIA Research will showcase Neuralangelo and nearly 30 other projects at the Conference on Computer Vision and Pattern Recognition (CVPR), taking place June 18 to 22 in Vancouver. The papers span topics including pose estimation, 3D reconstruction, and video generation.

Among the projects is DiffCollage, a diffusion method for creating large-scale content, including long landscape-orientation images, 360-degree panoramas, and looped-motion images. By treating smaller images as sections of a larger visual, much like assembling a collage, DiffCollage lets diffusion models generate cohesive large content without being trained on images of the same scale.
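The idea of treating smaller images as sections of a larger one can be pictured with a simple layout helper. The sketch below only computes overlapping tile positions across a wide canvas; it is not the DiffCollage algorithm, and it says nothing about how the diffusion model keeps neighboring tiles consistent. The function name tile_windows and the example sizes are assumptions made for illustration.

```python
def tile_windows(total_width, tile_width, overlap):
    """Return (start, end) column ranges of overlapping tiles covering total_width."""
    if not 0 <= overlap < tile_width:
        raise ValueError("overlap must be non-negative and smaller than tile_width")
    if total_width <= tile_width:
        return [(0, total_width)]
    stride = tile_width - overlap
    starts = list(range(0, total_width - tile_width, stride))
    starts.append(total_width - tile_width)  # last tile sits flush with the right edge
    return [(s, s + tile_width) for s in starts]


print(tile_windows(1024, 512, 256))  # prints [(0, 512), (256, 768), (512, 1024)]
```

Each tile is the size a model was trained on, and the shared columns between neighbors are where adjacent sections have to agree for the final image to look cohesive.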