LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS

Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang, “LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS”, NeurIPS, 2024.

Quick Summary

Point-based 3D Gaussian Splatting (3D-GS) [1] improved the performance of novel view synthesis and enabled real-time rendering, which Neural Radiance Fields (NeRF) could not achieve. However, it comes with very high storage costs and slower rendering speeds due to over-parameterization from dense Gaussians. This paper addresses the speed and storage issues of 3D-GS with a compact, less redundant representation. The authors achieve a 15\(\times\) reduction in storage on average while maintaining visual fidelity and boosting rendering speed to 237 FPS.

Ideas, Approach and Results

The main ideas of the paper are threefold:

  • Gaussian Pruning and Recovery - removes redundant Gaussians that have minimal impact on quality, followed by a recovery step for smoothness, thus reducing storage while preserving visual quality. The authors note that [1] uses Gaussian Densification to increase the Gaussian count, improving coverage of fine scene details and enhancing reconstruction quality, but at a significant storage cost. Drawing inspiration from neural network pruning, they observe that a naive criterion such as opacity alone might remove essential scene details. Instead, they propose computing a global significance score for each Gaussian based on how many times it is hit across all training views, aggregated with other factors such as its opacity contribution, volume, and transmittance. The volume factor is normalized by the 90% largest value and clipped between 0 and 1.

  • Spherical Harmonic (SH) Distillation - reduces the complexity of the spherical harmonic coefficients while preserving essential lighting and appearance details, improving compression efficiency. Here, the authors note that in [1] \(\sim\)80% of a Gaussian Splat representation is taken up by the SH coefficients, which are responsible for the “shininess” and specular reflections of the scene. They apply a knowledge distillation technique to tackle this, where the teacher uses the full-degree SHs and the student uses lower-degree SHs, trained with a mean-squared L2 loss between the pixel intensities rendered by the two.

  • Vector Quantization - The authors also use Vector Quantization to store the Gaussians at a lower bit-width, again boosting compression efficiency. Even here they balance compression against quality, using the precomputed significance scores from the previous step to apply quantization only to the SH coefficients of the least significant Gaussians.
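The pruning criterion from the first bullet can be sketched as follows. This is a rough illustration, not the paper's implementation: the exact way the factors are combined, the \(\beta\) exponent on the volume term, and all names here are assumptions.

```python
import numpy as np

def global_significance(hit_counts, opacity, volumes, transmittance, beta=0.1):
    """Sketch of a per-Gaussian global significance score.

    hit_counts:    how many training-view rays hit each Gaussian
    opacity:       per-Gaussian opacity contribution
    volumes:       per-Gaussian ellipsoid volume
    transmittance: per-Gaussian aggregated transmittance
    beta:          hypothetical exponent weighting the volume term
    """
    # Normalize the volume factor by the 90% largest value and clip to
    # [0, 1], as described in the text.
    v_norm = np.clip(volumes / np.percentile(volumes, 90), 0.0, 1.0)
    # Aggregate the factors; this particular product is an assumption.
    return hit_counts * opacity * transmittance * (v_norm ** beta)

def prune_mask(scores, keep_ratio=0.34):
    """Keep the top keep_ratio fraction of Gaussians by significance."""
    k = max(1, int(len(scores) * keep_ratio))
    thresh = np.partition(scores, -k)[-k]
    return scores >= thresh
```

The mask would then be used to drop low-significance Gaussians before the recovery (fine-tuning) step.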

The paper reports PSNR and SSIM scores comparable to [1] on the Mip-NeRF 360 and Tanks and Temples datasets, with a significant reduction in storage and improvement in speed.

Comparison, Strengths and Weaknesses

The three main techniques used in the paper are pruning redundant Gaussians, distilling the SHs and quantization to achieve efficient compression while maintaining visual quality.

The authors show that a naive criterion for Gaussian pruning does not work well and thus propose the global significance calculation. However, the calculation involves multiple factors and hyperparameters (such as \(\beta\) and the “90%” volume-normalization threshold) which might be difficult to tune for other scenes.

The SH distillation step aims at reducing storage via a teacher-student model while preserving the harmonics responsible for “shininess” and reflections.
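As a minimal sketch of this step (the actual rendering pipeline is omitted, and the array layout is an assumption; degree-3 SHs give \((3+1)^2 = 16\) coefficients per color channel):

```python
import numpy as np

def truncate_sh(coeffs, degree):
    """Keep only the first (degree + 1) ** 2 SH coefficients per channel.

    coeffs: (N, 16, 3) array of per-Gaussian degree-3 SH coefficients
    (an assumed layout for illustration).
    """
    return coeffs[:, : (degree + 1) ** 2, :]

def sh_distillation_loss(teacher_pixels, student_pixels):
    """Mean-squared L2 loss between pixel intensities rendered with the
    full-degree SHs (teacher) and the lower-degree SHs (student)."""
    return np.mean((teacher_pixels - student_pixels) ** 2)
```

Truncating from degree 3 to degree 2 already shrinks the SH block from 16 to 9 coefficients per channel, which is where most of the storage saving in this step comes from.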

The quantization step involves K-means clustering, which again requires a hyperparameter K (the number of clusters, or codebook size) to be tuned.
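A toy K-means codebook illustrates the idea behind this quantization step: each attribute vector is replaced by a small integer code plus a shared codebook, so K directly trades storage for fidelity. This is a simplified stand-in, not the paper's implementation.

```python
import numpy as np

def kmeans_codebook(vectors, k, iters=10, seed=0):
    """Toy K-means vector quantization.

    vectors: (N, D) float array of attribute vectors to quantize.
    Returns (codebook, codes): a (k, D) codebook and per-vector
    integer code assignments.
    """
    rng = np.random.default_rng(seed)
    # Initialize the codebook with k distinct input vectors.
    codebook = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest code.
        dists = np.linalg.norm(vectors[:, None] - codebook[None], axis=-1)
        codes = dists.argmin(axis=1)
        # Move each code to the mean of its assigned vectors.
        for j in range(k):
            if (codes == j).any():
                codebook[j] = vectors[codes == j].mean(axis=0)
    return codebook, codes
```

Storing `codes` (a few bits each) plus a small codebook in place of full-precision vectors is what yields the bit-width reduction described above.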

Questions/Issues

The paper achieves very strong results on the datasets mentioned and significantly outperforms [1] on storage, and consequently on FPS, enabling real-time novel view synthesis. However, I am curious whether hyperparameter tuning is required for new scenes and how the hyperparameters mentioned above could be tuned efficiently in real time. Also, the volume normalization parameter (90%) seems arbitrary, and I wonder whether there is a principled way to choose it for new datasets.

References

[1] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, George Drettakis, “3D Gaussian Splatting for Real-Time Radiance Field Rendering”, ACM Transactions on Graphics, July 2023.