In this paper, we propose DIP-GS, a Deep Image Prior (DIP)-based 3D Gaussian Splatting (3DGS) representation. By using the DIP prior, which exploits internal image structure and patterns, in a coarse-to-fine manner, DIP-GS can operate in scenarios where vanilla 3DGS fails, such as sparse-view recovery. Note that our approach does not use any pre-trained models, such as generative models or depth-estimation networks, but relies only on the input frames. Among such methods, DIP-GS obtains competitive state-of-the-art (SOTA) results on various sparse-view reconstruction tasks, demonstrating its capabilities.
DIP-GS general scheme: the method first runs vanilla 3DGS to obtain initial Gaussians; DIP fitting and post-processing are then applied sequentially.
DIP-GS components at a given noise level. (a) First, the means network \(f_{\theta_{\mu}}^{\mu}\) is initialized by minimizing the point-cloud Chamfer Distance between its output means \(\mu\), mapped from the noise \(\tilde{z}\), and the initial Gaussian means \(\mu_{init}\). (b) Second, the scales network \(f_{\theta_{s}}^{s}\) is initialized by fitting the output scale channel \(s\), mapped from the noise \(\tilde{z}\), to the estimated scale guess \(s_{est}\). (c) Next, the DIP optimization: \(f_{\theta}\) maps \(\tilde{z}\) to the Gaussian features, and \(\theta\) is learned by minimizing the render loss alongside other regularizations. (d) The post-processing stage, where the Gaussians are initialized from the output of the DIP \(f_{\theta}\) trained in the previous stage. At each step, the method chooses a frame either from the sparse input views or from one of the target views.
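Stage (a) relies on a point-cloud Chamfer Distance between the DIP output means \(\mu\) and the initial Gaussian means \(\mu_{init}\). A minimal sketch of this loss (symmetric nearest-neighbor distance, written here in NumPy; the function name and the brute-force pairwise computation are illustrative, not the paper's implementation):

```python
import numpy as np

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point clouds p (N,3) and q (M,3).

    For each point, find its nearest neighbor in the other cloud and
    average those distances in both directions. Brute-force O(N*M)
    pairwise distances; illustrative only (practical pipelines would use
    KD-trees or batched GPU ops).
    """
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (N, M) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# Identical clouds give zero distance; a uniform shift shows up in both directions.
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(chamfer_distance(pts, pts))                     # 0.0
print(chamfer_distance(pts, pts + [0.0, 0.0, 0.5]))   # 1.0 (0.5 in each direction)
```

Minimizing this quantity over the means-network weights \(\theta_{\mu}\) pulls the DIP output toward the initial Gaussian means without requiring a point-to-point correspondence.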
Figure: LLFF qualitative results. Columns: 3DGS | DNGaussian | FreeNeRF | DIP-GS | GT. (Comparison images not preserved in extraction.)
Figure: Blender qualitative results. Columns: 3DGS | DIP-GS | GT. (Comparison images not preserved in extraction.)
Figure: DTU qualitative results. Columns: 3DGS | DNGaussian | FreeNeRF | DIP-GS | GT. (Comparison images not preserved in extraction.)