August 14, 2024

Answering Your Questions About NeRFs and Gaussian Splats

Getting back to questions left unanswered in a recent Geo Week News webinar.

In the middle of last month, Geo Week News hosted a fascinating webinar in which three experts dove into everything relating to NeRFs and Gaussian splatting. Each of these 3D rendering techniques is relatively new, allowing users to create 3D models of real-world assets far more easily than with traditional methodologies like photogrammetry. The webinar, entitled NeRFs and Gaussian Splatting: The Future of 3D, featured presentations from three panelists, including Jonathan Stephens and Yoshi Sato.

These presentations were then followed by a question-and-answer segment featuring questions from me as well as from the audience. However, because the webinar attracted a tremendous amount of interest, we were only able to get to a few of the questions, leaving many unanswered. Fortunately, Stephens and Sato were willing to go back and answer some of those we could not get to in the actual webinar.

Below, you can find the answers to these questions. Additionally, we posted an article with some of the key takeaways from the presentations and subsequent conversation. And of course, make sure you grab your free registration for an on-demand copy of the webinar recording below.

Register Here

Without further ado, here are the answers from Stephens and Sato, labeled JS and YS.

What are the use cases for Gaussian Splats beyond visualizations, or future use cases you are thinking of?

(JS) - Gaussian splatting and radiance fields in general fall into a category of research called novel view synthesis. Visualization of a scene is the intended output. However, machine vision applications are utilizing this technology to train autonomous systems. For example, NVIDIA announced NeRF support in its Omniverse platform to enable rapid world building for robotics simulation. A robotics team could recreate a warehouse or urban streetscape as a NeRF and add meshes, collision, layers, etc. to simulate real-world scenarios for Sim2Real transfer.

What does “model clean up” look like for Gaussian Splats? Is this something that can be done in the web editor?

(YS) - One common issue with 3D Gaussian training is "floaters," i.e., noisy off-surface 3D Gaussians generated by estimation errors. Gauzilla Pro allows you to select and delete these floaters manually (in a future version I'm planning to implement automatic floater deletion).

(JS) - SuperSplat is another popular open-source, browser-based Gaussian splat editor. Just drag your Gaussian splat PLY file into a browser running SuperSplat and start editing. Primarily you would be deleting floating artifacts or removing areas of the scene you do not wish to preserve - for example, if you created a splat of a sculpture and do not want to keep the background beyond the object of focus.
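
Neither presenter shared code for this, but as a rough illustration of what automated floater removal could look like, here is a minimal Python sketch. It assumes the standard 3DGS PLY layout (x/y/z positions plus a pre-sigmoid "opacity" field), and the thresholds are illustrative, not values from the panelists:

```python
import numpy as np
from plyfile import PlyData, PlyElement

def remove_floaters(in_path, out_path, min_opacity=0.1, max_sigma=3.0):
    ply = PlyData.read(in_path)
    v = ply["vertex"].data
    xyz = np.stack([v["x"], v["y"], v["z"]], axis=1)

    # 3DGS PLY files store opacity as a logit; map it back to [0, 1].
    opacity = 1.0 / (1.0 + np.exp(-v["opacity"]))

    # Keep Gaussians that are reasonably opaque and not wildly off-surface
    # (within max_sigma robust deviations of the scene's median center).
    center = np.median(xyz, axis=0)
    dist = np.linalg.norm(xyz - center, axis=1)
    keep = (opacity > min_opacity) & (dist / (np.median(dist) + 1e-8) < max_sigma)

    PlyData([PlyElement.describe(v[keep], "vertex")]).write(out_path)

remove_floaters("scene.ply", "scene_clean.ply")
```

A global distance cutoff like this is crude - it can clip legitimate scene periphery - which is part of why the editors above favor manual selection.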

How do you manage to process large scenes using Gaussian Splatting, considering the substantial resources required? Do you split the scenes and run them in parallel on multiple GPUs, or do you utilize more powerful machines?

(JS) - A few examples:

  • Nerfstudio (the most popular open-source Gaussian splatting software) just added multi-GPU support.
  • Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets - a recently published project that trains a large scene in several chunks and uses techniques to seamlessly blend the sub-models together.

(YS) - There are some academic research projects that develop multi-GPU parallel 3D Gaussian training solutions for large-scale scenes (e.g., Hierarchical 3D Gaussians, Grendel).
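
To make the chunking idea concrete, here is a toy sketch of the divide-and-conquer approach those projects take: partition the scene by camera position into overlapping ground-plane tiles, then train each tile's sub-model on its own GPU. The train_chunk function is hypothetical; the real projects add careful blending between sub-models:

```python
import numpy as np

def split_into_chunks(camera_positions, grid=(2, 2), overlap=0.1):
    """Assign cameras to overlapping cells of a 2D ground-plane grid."""
    xy = camera_positions[:, :2]                  # ignore height
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    cell = (hi - lo) / np.array(grid)
    chunks = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            cmin = lo + cell * np.array([i, j]) - overlap * cell
            cmax = lo + cell * np.array([i + 1, j + 1]) + overlap * cell
            mask = np.all((xy >= cmin) & (xy <= cmax), axis=1)
            chunks.append(np.flatnonzero(mask))   # camera indices per tile
    return chunks

# Hypothetical usage: one sub-model per GPU, blended together afterward.
# for gpu_id, cams in enumerate(split_into_chunks(positions)):
#     train_chunk(cams, device=f"cuda:{gpu_id}")
```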

Were the visualizations in the presentations from drone videos or images?

(JS) - I used a mix of ground- and drone-based examples. The ground-based captures were all made with an iPhone. The drone example I had of a telecom tower was captured using photos from a Mavic 3E flown in an ascending spiral pattern with the camera pitched down 35 degrees.

Is a stereo camera required?

(YS) - No, you can use essentially any monocular RGB video camera with sufficient resolution and quality (ideally 4K). Make sure you do not change the zoom factor or exposure while capturing the video.
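
In practice, the monocular video is usually converted into still frames before SfM. A common (if unofficial) way to do this is with ffmpeg; the sampling rate and file paths below are illustrative:

```python
import subprocess
from pathlib import Path

def extract_frames(video="capture.mp4", out_dir="frames", fps=2):
    """Sample `fps` frames per second from the video as high-quality JPEGs."""
    Path(out_dir).mkdir(exist_ok=True)
    subprocess.run([
        "ffmpeg", "-i", video,
        "-vf", f"fps={fps}",        # sampling rate, in frames per second
        "-qscale:v", "2",           # near-lossless JPEG quality
        f"{out_dir}/frame_%05d.jpg",
    ], check=True)

extract_frames()
```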

Can you use a large number of high-resolution images to create better point clouds and combine them with a few lower resolution images or video to create Gaussian Splats?

(JS) - Excellent question. You are generating a sparse point cloud only as a starting point to optimize the Gaussian splat training. The "quality" of the point cloud has little effect on the final quality of the Gaussian splats; what matters more is the accuracy of the camera poses. I would suggest using lower-resolution images for the sparse point cloud and camera pose estimation (SfM), then swapping in the higher-resolution versions for the Gaussian splat training.
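
Here is a minimal sketch of that suggestion, assuming a COLMAP-style folder layout (the folder names and half-resolution factor are ours): run SfM on downscaled copies that keep the original filenames, then point the splat trainer at the full-resolution originals.

```python
from pathlib import Path
from PIL import Image

def make_sfm_copies(src="images_full", dst="images_sfm", scale=0.5):
    Path(dst).mkdir(exist_ok=True)
    for p in sorted(Path(src).glob("*.jpg")):
        im = Image.open(p)
        # Same filenames, lower resolution: the camera poses recovered by
        # SfM on images_sfm/ still correspond to the images in images_full/.
        im.resize((int(im.width * scale), int(im.height * scale))).save(Path(dst) / p.name)

make_sfm_copies()
# 1) Run COLMAP (or RealityCapture) on images_sfm/ to get poses.
# 2) Train the splat against images_full/ - note the camera intrinsics must
#    be rescaled to match the full resolution; check your trainer's docs.
```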

How essential are sequential images to successful Gaussian Splat processing?

(YS) - Because of 3DGS's dependence on COLMAP (SfM), it is highly recommended to capture a continuous-trajectory video rather than discrete cut-scenes: SfM requires overlapping feature points in the 2D images (e.g., SIFT, SURF, ORB) to construct point clouds and estimate camera poses.

(JS) - I agree with Yoshi. Sequential images are not mandatory; however, they benefit the SfM component of the Gaussian splat process. There is now support for other SfM software, such as RealityCapture, to get your camera poses and point cloud, and these tools are better suited to handling non-sequential images. If you know how to correctly capture images non-sequentially, you can still achieve a high-quality end result.
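
One way to sanity-check a capture before handing it to SfM is to count feature matches between consecutive frames, using the same kind of features (e.g., ORB) Sato mentions above. This is a rough heuristic of our own, with an illustrative threshold:

```python
import glob
import cv2

def check_overlap(frame_paths, min_matches=50):
    orb = cv2.ORB_create(nfeatures=2000)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    prev_desc = None
    for i, path in enumerate(frame_paths):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = orb.detectAndCompute(img, None)
        if prev_desc is not None and desc is not None:
            matches = matcher.match(prev_desc, desc)
            if len(matches) < min_matches:
                print(f"Weak overlap between frames {i - 1} and {i}: {len(matches)} matches")
        prev_desc = desc

check_overlap(sorted(glob.glob("frames/*.jpg")))
```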

Are you able to use shape files like .glb or .svg with Gaussian Splatting?

(JS) - No, not today.

What are the differences between NeRFs and Gaussian Splats?

(JS) - Both are methods for achieving novel view synthesis - generating new views of a scene starting from a sparse set of input images. For example, I could have 300 images of a building and be able to view the building from 10,000 new locations with near-photorealistic quality. What separates the two methods is the process by which they derive novel views: a NeRF encodes the scene implicitly in a neural network that is queried by ray marching, while Gaussian splatting represents the scene as explicit 3D Gaussians that are rasterized directly.

Why are Gaussian Splats faster than other methods?

(YS) - Read my interview with RadianceFields.com :)

What is the preferred frames-per-second for 4K video collection?

(YS) - Whatever frame rate works with a fast shutter speed. The priority is avoiding motion blur, which results in noisy/erroneous 3D Gaussians.
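
Acting on that advice usually means discarding blurry frames before training. A quick heuristic (our illustration, not a panel recommendation) is the variance-of-the-Laplacian sharpness test; the threshold is scene-dependent:

```python
import glob
import cv2

def is_sharp(path, threshold=100.0):
    """Higher Laplacian variance means more high-frequency detail, i.e. less blur."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return cv2.Laplacian(img, cv2.CV_64F).var() > threshold

sharp_frames = [p for p in sorted(glob.glob("frames/*.jpg")) if is_sharp(p)]
```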

What is the level of detail limited by? For example, in a large-scale construction site model, would it be possible to capture and represent a $1 coin in a 3D model?

(YS) - The density of the 3D Gaussian cloud is not uniform across a scene. Due to the nature of how 3D Gaussians are trained, density is higher at locations that are captured in many of the input images. A method called Level of Gaussians applies LoD (level of detail) to 3D Gaussians in large-scale scenes.

If the photographs are geo-referenced and accurate to 2cm, can the resultant dataset be accurate to 2cm?

(YS) - Yes, as long as the initial point cloud prior for the 3D Gaussian training is accurate in scale. This is how the XGRIDS LiDAR 3DGS solutions work (they generate the point cloud from LiDAR measurements). In other words, you need to generate a similarly accurate point cloud from the photographs.
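
For readers without a LiDAR-initialized pipeline, one standard way to impose metric scale is to align the SfM reconstruction to geo-referenced control points with a similarity transform (the Umeyama method). This sketch is our illustration, not a step from either panelist's workflow; src and dst are corresponding Nx3 point arrays:

```python
import numpy as np

def umeyama(src, dst):
    """Return scale s, rotation R, translation t with dst ≈ s * R @ src + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    A, B = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(B.T @ A / len(src))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # guard against reflections
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / A.var(axis=0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# s, R, t = umeyama(sfm_control_points, surveyed_control_points)
# Apply the same s, R, t to the whole SfM point cloud before splat training.
```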

For COLMAP alternatives, do you prefer Reality Capture or Metashape, and why?

(YS) - RealityCapture is free to use (or at least free to start with). Metashape is developed by a Russian company, so depending on your business's jurisdiction, there might be geopolitical/country risk.

Do the photos need to be the same as the photos that created the point cloud? I am envisioning a workflow wherein I perform a typical mapping mission, which results in our survey-quality point cloud, and capture a video after the survey that can be down-res’d with a suitable number of frames extracted to use with that point cloud to create a Gaussian Splat as an additional deliverable. Would that work?

(YS) - No, a point cloud is used only as the geometric prior for the 3D Gaussian training. As long as the area covered by the point cloud and the area covered by the photos do not deviate too much, it should work.

★★★

We'd like to extend our gratitude to both Jonathan and Yoshi for taking the time to answer these questions we were not able to get to during the original webinar. Remember, you can still sign up for a free, on-demand recording of the conversation by following the link below.

Register Here
