COLMAP-Free 3D Gaussian Splatting

Our COLMAP-Free 3D Gaussian Splatting approach synthesizes photo-realistic novel views efficiently, with reduced training time and real-time rendering, while eliminating the dependency on COLMAP preprocessing.

While neural rendering has led to impressive advances in scene reconstruction and novel view synthesis, it relies heavily on accurately pre-computed camera poses. To relax this constraint, multiple efforts have been made to train Neural Radiance Fields (NeRFs) without pre-processed camera poses. However, the implicit representation used by NeRFs makes it harder to optimize the 3D structure and the camera poses at the same time. In contrast, the recently proposed 3D Gaussian Splatting offers new opportunities thanks to its explicit point cloud representation. This paper leverages both the explicit geometric representation and the continuity of the input video stream to perform novel view synthesis without any SfM preprocessing. We process the input frames sequentially and progressively grow the set of 3D Gaussians by taking one input frame at a time, without the need to pre-compute the camera poses. Our method significantly improves over previous approaches in view synthesis and camera pose estimation under large motion changes.
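The frame-by-frame processing described above can be illustrated with a minimal sketch. This is not the released implementation: `reconstruct_sequentially`, `estimate_relative_pose`, and `add_points` are hypothetical names introduced here, and the sketch assumes that each step yields a 4x4 relative transform between consecutive cameras and that poses are stored as world-to-camera matrices.

```python
import numpy as np

def reconstruct_sequentially(frames, estimate_relative_pose, add_points):
    """Process frames one at a time, chaining poses and growing a point set.

    estimate_relative_pose(prev, cur) -> 4x4 transform mapping coordinates in
        the previous camera to coordinates in the current camera (assumption).
    add_points(points, frame, pose) -> updated (N, 3) array of points
        (hypothetical stand-in for growing the set of 3D Gaussians).
    """
    poses = [np.eye(4)]  # the first camera defines the world frame
    points = add_points(np.empty((0, 3)), frames[0], poses[0])
    for prev, cur in zip(frames, frames[1:]):
        rel = estimate_relative_pose(prev, cur)
        # world -> current camera = (previous -> current) @ (world -> previous)
        poses.append(rel @ poses[-1])
        points = add_points(points, cur, poses[-1])
    return poses, points
```

The two callables are passed in so the loop structure stays independent of how the relative pose is actually optimized or how new primitives are initialized.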

BibTeX

Due to some technical issues, you may not be able to find our paper via Google Scholar. Please use the following BibTeX entry to cite our paper.


@InProceedings{Fu_2024_CVPR,
    author    = {Fu, Yang and Liu, Sifei and Kulkarni, Amey and Kautz, Jan and Efros, Alexei A. and Wang, Xiaolong},
    title     = {COLMAP-Free 3D Gaussian Splatting},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {20796-20805}
}

We show novel views rendered along a Bézier curve fitted to the estimated camera poses.
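One way to fit such a curve is a least-squares fit of the control points to the camera centers. The sketch below is an illustration under stated assumptions, not the code used for the videos: it fits positions only (camera rotations would need separate interpolation), uses a cubic curve by default, and assigns curve parameters by chord length. The function names `bernstein_matrix`, `fit_bezier`, and `sample_bezier` are introduced here for illustration.

```python
import numpy as np
from math import comb

def bernstein_matrix(t, degree):
    """Bernstein basis values, shape (len(t), degree + 1)."""
    t = np.asarray(t, dtype=float)
    return np.stack(
        [comb(degree, k) * t**k * (1 - t) ** (degree - k) for k in range(degree + 1)],
        axis=1,
    )

def fit_bezier(positions, degree=3):
    """Least-squares control points for a Bezier curve through (N, 3) positions.

    Assumes the positions are ordered in time and not all identical.
    """
    positions = np.asarray(positions, dtype=float)
    # Chord-length parameterization: t grows with distance travelled.
    seg = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)])
    t /= t[-1]
    basis = bernstein_matrix(t, degree)
    ctrl, *_ = np.linalg.lstsq(basis, positions, rcond=None)
    return ctrl

def sample_bezier(ctrl, n):
    """Evaluate the curve at n evenly spaced parameter values in [0, 1]."""
    t = np.linspace(0.0, 1.0, n)
    return bernstein_matrix(t, len(ctrl) - 1) @ ctrl
```

Sampling the fitted curve densely gives a smooth camera path that stays close to the estimated trajectory without reproducing its frame-to-frame jitter.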

We show synthesized images at the test views.

We visualise the estimated pose trajectories against ground truth trajectories.
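Because poses estimated without SfM live in an arbitrary coordinate frame and scale, comparing them with ground truth generally requires aligning the two trajectories first. The sketch below shows one common choice, a similarity (Umeyama) alignment of camera centers followed by an RMSE of the residuals. The page does not state which alignment was used for the visualisation, so treat this as an assumption; `umeyama_align` and `ate_rmse` are names introduced here.

```python
import numpy as np

def umeyama_align(src, dst):
    """Scale s, rotation R, translation t minimizing ||s * R @ src_i + t - dst_i||."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, S, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0
    R = U @ D @ Vt
    var_s = (xs**2).sum() / len(src)
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t

def ate_rmse(est, gt):
    """RMSE between ground-truth centers and similarity-aligned estimates."""
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    s, R, t = umeyama_align(est, gt)
    aligned = s * (est @ R.T) + t
    return float(np.sqrt(((aligned - gt) ** 2).sum(axis=1).mean()))
```

After alignment, plotting `aligned` against `gt` gives the kind of overlay described above, and `ate_rmse` summarizes the remaining discrepancy as a single number.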

Recently, OpenAI released Sora, a text-to-video generative model that creates rich and realistic videos from text instructions. We ran our COLMAP-Free 3DGS reconstruction pipeline on a few of the released Sora videos. Here are videos of our reconstructions and a live demo of the resulting 3D Gaussian Splatting models. These encouraging results demonstrate the potential of generating unlimited 3D data from a few lines of text. We thank Ge Yang for developing the visualization toolkit VUER.
