We’ve heard for some time that machine learning (ML) could completely disrupt the technology space, and the ways it has already been applied are impressive, innovative and creative. At Geo Week this year, we showcased a few machine learning projects that learned how to detect cracks in concrete, differentiate species of trees, or tell the difference between a curb and a driveway.
It is nearly impossible to fully imagine all the possibilities unlocked by applications of machine learning - and its infinitely complex neural network cousins - which is why I was both surprised and absolutely not surprised to see NVIDIA showing off its tools in the AI/ML space during GTC earlier this month.
But tucked somewhat quietly among the supercomputer processing chips and other flashy announcements was a piece of research that amounts to a stunning revelation for the photogrammetry community.
The new model - called “Instant NeRF” - is a neural rendering technique that can learn a high-resolution 3D scene in seconds, and can render images from that scene in even less time.
So, what does that actually mean?
It means, in practice, that you can now take a few high-resolution photographs and use them to recreate a three-dimensional scene. Nearly instantly.
If you consider the process of photogrammetry - carefully taking hundreds of photos and using compute-intensive software to mesh them together - an instantaneous version sounds more like magic than reality.
Bringing AI into the picture certainly has sped things up. Early NeRF models rendered crisp scenes without artifacts in a few minutes, but still took hours to train.
But the new Instant NeRF cuts rendering time by orders of magnitude. In a demonstration video published by NVIDIA, the technique was applied to photos of a woman holding a camera. With only a handful of images, the scene was fully rendered in moments, in stunning clarity.
It works by essentially deconstructing the light within an image - like a reverse ray tracing. By using AI to approximate how light behaves in the real world, the model can infer the position of objects without depth sensors or other hardware, creating a near-perfect facsimile of a scene.
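To make that idea a little more concrete, here is a minimal sketch of the volume rendering at the heart of NeRF-style models: a learned function maps any 3D point and viewing direction to a color and a density, and a pixel’s color is recovered by accumulating those values along the camera ray. The radiance_field function below is a toy stand-in (a hard-coded red sphere), not NVIDIA’s actual network.

```python
# A minimal sketch of NeRF-style "reverse ray tracing": a function maps
# a 3D point and a viewing direction to (color, density), and a pixel's
# color is alpha-composited from samples along the camera ray.
import numpy as np

def radiance_field(points, view_dir):
    """Toy stand-in for the trained network: a reddish sphere of radius 1."""
    dist = np.linalg.norm(points, axis=-1)
    density = np.where(dist < 1.0, 10.0, 0.0)          # opaque inside the sphere
    rgb = np.tile([0.9, 0.2, 0.2], (len(points), 1))   # constant red color
    return rgb, density

def render_ray(origin, direction, near=0.0, far=4.0, n_samples=64):
    """Classic volume rendering: march along the ray, query the field,
    and composite the sampled colors front to back."""
    t = np.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction
    rgb, density = radiance_field(points, direction)

    delta = (far - near) / n_samples               # spacing between samples
    alpha = 1.0 - np.exp(-density * delta)         # opacity of each segment
    # Transmittance: how much light survives up to each sample
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(axis=0)    # final pixel color

# Render one pixel: a ray cast from the camera toward the scene
color = render_ray(origin=np.array([0.0, 0.0, -3.0]),
                   direction=np.array([0.0, 0.0, 1.0]))
print(color)  # ~[0.9, 0.2, 0.2] where the ray hits the sphere
```

In a real NeRF, the hard-coded sphere is replaced by a trained neural network, and Instant NeRF’s speedup comes largely from making that network extremely fast to query and train.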
While this tool is not yet available to the public and is currently in a research phase, it opens up immense possibilities for photogrammetry software and processes.
From NVIDIA’s blog:
Collecting data to feed a NeRF is a bit like being a red carpet photographer trying to capture a celebrity’s outfit from every angle — the neural network requires images taken from multiple positions around the scene, as well as the camera position of each of those shots.
From there, a NeRF essentially fills in the blanks, training a small neural network to reconstruct the scene by predicting the color of light radiating in any direction, from any point in 3D space. The technique can even work around occlusions — when objects seen in some images are blocked by obstructions such as pillars in other images.
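In other words, every pixel of every photo, paired with that photo’s known camera pose, becomes one supervised training example: a ray through the scene and the color the network should predict for it. The sketch below shows that bookkeeping; names like CameraPose and ray_for_pixel are illustrative, not taken from NVIDIA’s code.

```python
# A sketch of how multi-view capture becomes NeRF training data: each
# pixel, back-projected through its photo's camera pose, yields one
# (ray, true color) supervision pair for the network.
import numpy as np
from dataclasses import dataclass

@dataclass
class CameraPose:
    position: np.ndarray   # camera center in world space, shape (3,)
    rotation: np.ndarray   # world-from-camera rotation, shape (3, 3)
    focal: float           # focal length in pixels

def ray_for_pixel(pose, u, v, width, height):
    """Back-project pixel (u, v) into a world-space ray, pinhole model."""
    d_cam = np.array([(u - width / 2) / pose.focal,
                      (v - height / 2) / pose.focal,
                      1.0])
    d_world = pose.rotation @ d_cam
    return pose.position, d_world / np.linalg.norm(d_world)

def training_pairs(images, poses):
    """Yield (ray_origin, ray_dir, true_rgb) pairs the network must match."""
    for img, pose in zip(images, poses):
        h, w, _ = img.shape
        for v in range(h):
            for u in range(w):
                origin, direction = ray_for_pixel(pose, u, v, w, h)
                yield origin, direction, img[v, u]

# Training then repeats: render each ray with the current network,
# compare against true_rgb, and nudge the weights to shrink the error.
```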
David Luebke, vice president for graphics research at NVIDIA, emphasized how game-changing this tool could become.
“Instant NeRF could be as important to 3D as digital cameras and JPEG compression have been to 2D photography — vastly increasing the speed, ease and reach of 3D capture and sharing.”
NVIDIA’s announcement included several potential use cases: training robots and self-driving cars to understand the size and shape of real-world objects in real time from captured imagery, or rapidly generating digital representations of real environments that creators can modify and build on. But the technique could be applied in even more places where photogrammetry would be too cumbersome or slow - for example, search and rescue planning, disaster response, or training in real-world locations.
Brace yourself, photogrammetrists. Change is on the way.