Our goal is to reconstruct scenes with stochastic, incoherent motion, such as leaves moving in the wind, which are particularly challenging because small objects with similar appearance move independently. Previous dynamic 3D Gaussian Splatting solutions represent motion either implicitly with neural networks, achieving good quality but lower framerates, or explicitly with a function, often with higher training times and lower quality. To overcome these limitations, we propose an explicit method that introduces adaptive space-time densification and smoother optimization. We present a new densification approach based on error moments that guide primitive splitting, and we adaptively refine the number of keyframes based on the variance of the error. We observe that dynamic reconstruction from monocular video is hard for standard optimization pipelines; to counter this, we introduce a visibility-weighted Adam variant that improves results by accounting for primitive visibility. Finally, to handle the hard case of independent motion of similar-looking objects, we introduce an image-driven as-rigid-as-possible regularization. Our method achieves higher quality than previous explicit solutions and renders at significantly higher framerates.
We augment the 3D Gaussian Splatting representation with per-primitive splines that represent motion. Each Gaussian carries a keyframe list that is queried at a given time to yield a displacement and a rotation. We use the pixel error of the renderings to guide our spatio-temporal densification, which adaptively refines both the Gaussians and their keyframes.
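As a concrete illustration (not our actual implementation), the sketch below shows how such a per-primitive track might be queried. It assumes a hypothetical layout of sorted keyframe times with associated displacements and unit quaternions; plain linear interpolation stands in for the spline we actually use.

```python
import torch

def query_keyframes(times, disps, rots, t):
    """Evaluate one Gaussian's motion track at time t (illustrative sketch).

    Assumed layout: `times` is a sorted (K,) tensor of keyframe times,
    `disps` a (K, 3) tensor of displacements, `rots` a (K, 4) tensor of
    unit quaternions. Linear interpolation stands in for the spline.
    """
    if len(times) == 1:                    # single keyframe: constant track
        return disps[0], rots[0]
    # Locate the keyframe segment containing t.
    i = torch.searchsorted(times, torch.as_tensor(t)).clamp(1, len(times) - 1)
    t0, t1 = times[i - 1], times[i]
    w = min(max(float((t - t0) / (t1 - t0)), 0.0), 1.0)

    disp = (1 - w) * disps[i - 1] + w * disps[i]   # lerp displacement
    q0, q1 = rots[i - 1], rots[i]
    if (q0 * q1).sum() < 0:                        # shorter arc (quaternion double cover)
        q1 = -q1
    rot = (1 - w) * q0 + w * q1                    # nlerp rotation
    return disp, rot / rot.norm()                  # renormalize quaternion
```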
Standard gradient-based densification is hard to adapt to our space-time setting. Instead, we introduce a splitting strategy guided by the reconstruction error. We formulate the first and second moments of the 2D pixel error for each Gaussian and combine them across views into a 3D estimate. This allows us to place new Gaussians precisely where they are needed, rather than randomly, leading to faster convergence and better quality.
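To make the moment computation concrete, here is a minimal sketch of the per-view accumulation for a single Gaussian, with hypothetical input names. Lifting the per-view centroids into the 3D split position (e.g., by unprojection along camera rays) is omitted and may differ from what we actually do.

```python
import torch

def error_moments_2d(err, px, weights):
    """First and second moments of per-pixel error for one Gaussian
    in one view (illustrative sketch).

    Assumed inputs: `err` (N,) absolute pixel errors inside the
    Gaussian's screen-space footprint, `px` (N, 2) the corresponding
    pixel coordinates, `weights` (N,) the Gaussian's rasterization
    weights at those pixels.
    """
    w = weights * err                          # error mass per pixel
    m0 = w.sum().clamp_min(1e-12)              # zeroth moment (total error)
    mu = (w[:, None] * px).sum(0) / m0         # first moment: error centroid
    d = px - mu
    # Second moment: 2x2 covariance of the error distribution,
    # indicating the spread and orientation of the residual.
    cov = (w[:, None, None] * d[:, :, None] * d[:, None, :]).sum(0) / m0
    return m0, mu, cov
```

The centroid suggests where a new primitive should be placed, while the covariance indicates how the residual is spread, which is the information the splitting decision can exploit.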
Gaussians are initialized with a single keyframe each. We adaptively add keyframes over time based on the variance of the temporal error, assigning finer temporal granularity to parts of the scene with complex or fast-changing motion while keeping simple regions compact.
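A sketch of one way this refinement decision could look, with hypothetical names and a toy threshold; the actual criterion we use may differ:

```python
import torch

def refine_keyframes(times, err_t, t_samples, var_thresh=1e-3):
    """Decide where to insert a keyframe for one Gaussian (sketch).

    Assumed inputs: `times` (K,) current keyframe times, `err_t` (T,)
    the Gaussian's accumulated error at sampled times `t_samples` (T,).
    """
    if err_t.var() < var_thresh:
        return None                           # motion is simple enough; stay compact
    t_new = t_samples[err_t.argmax()]         # worst-fit time gets a keyframe
    # Skip if it (nearly) coincides with an existing keyframe.
    if (times - t_new).abs().min() < 1e-3:
        return None
    return float(t_new)
```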
Dynamic reconstruction from monocular video is challenging because at each training iteration many observations are invalid due to temporal changes. We introduce a visibility-weighted Adam variant that accounts for how often each primitive is observed, producing more stable and accurate optimization.
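For illustration, below is one plausible visibility-aware Adam variant: only observed primitives update their moments, and each primitive's bias correction uses its own visibility count, so rarely-seen primitives are not corrupted by zero gradients. This is an assumption-laden sketch, not the exact weighting scheme of the paper.

```python
import torch

def visibility_adam_step(param, grad, state, visible,
                         lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One step of a hypothetical visibility-aware Adam variant.

    `param`, `grad`: (P, D) tensors, one row per primitive. `state`:
    dict with per-primitive moments `m`, `v` (P, D) and per-primitive
    step counts `t` (P,). `visible`: (P,) bool mask of primitives
    observed this iteration.
    """
    b1, b2 = betas
    m, v, t = state["m"], state["v"], state["t"]

    # Invisible primitives keep their moments frozen instead of
    # decaying them with zero gradients.
    t[visible] += 1
    m[visible] = b1 * m[visible] + (1 - b1) * grad[visible]
    v[visible] = b2 * v[visible] + (1 - b2) * grad[visible] ** 2

    # Bias correction uses each primitive's own visibility count.
    tc = t[visible].unsqueeze(-1)
    m_hat = m[visible] / (1 - b1 ** tc)
    v_hat = v[visible] / (1 - b2 ** tc)
    param.data[visible] -= lr * m_hat / (v_hat.sqrt() + eps)
```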
To handle the hard case of independent motion among similar-looking objects (e.g., leaves), we introduce an image-driven as-rigid-as-possible (ARAP) regularization that constrains local motion consistency without requiring user interaction or external priors such as depth or optical flow.
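A minimal isometric sketch of such a regularizer, assuming precomputed neighbor indices and per-edge weights; in the full method the weights are image-driven and the energy accounts for local rotations, which this simplified distance-preservation term does not:

```python
import torch

def arap_loss(pos_t, pos_ref, nn_idx, w):
    """Weighted as-rigid-as-possible penalty (isometry sketch).

    Assumed inputs: `pos_t` (P, 3) Gaussian centers at time t,
    `pos_ref` (P, 3) centers in a reference frame, `nn_idx` (P, K)
    k-nearest-neighbor indices built in the reference frame, and
    `w` (P, K) per-edge weights (image-driven in the full method).
    """
    d_ref = (pos_ref[:, None] - pos_ref[nn_idx]).norm(dim=-1)   # (P, K) rest lengths
    d_t = (pos_t[:, None] - pos_t[nn_idx]).norm(dim=-1)         # (P, K) current lengths
    return (w * (d_t - d_ref) ** 2).mean()                      # penalize stretching
```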
@article{tzathas2026adaptive,
  title     = {Adaptive Spatio-Temporal {3D} Gaussian Splatting for Scenes with Oscillatory Motion},
  author    = {Tzathas, Petros and Hu, Jeffrey and Meuleman, Andr\'{e}as and Cordonnier, Guillaume and Drettakis, George},
  journal   = {Computer Graphics Forum},
  volume    = {45},
  number    = {2},
  year      = {2026},
  publisher = {Eurographics - The European Association for Computer Graphics and John Wiley \& Sons Ltd.}
}
This work was funded by the European Union, European Research Council (ERC) Advanced Grant NERPHYS, number 101141721 https://project.inria.fr/nerphys. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them. Experiments were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr). The authors would also like to thank Adobe and NVIDIA for software and hardware donations.