We propose Cloth-Splatting, a method to estimate 3D states of cloth from RGB supervision by combining 3D Gaussian Splatting (GS) with an action-conditioned dynamics model.
The key idea of our method is to represent the 3D state of the cloth as a mesh and create a differentiable mapping between the cloth state space and the observation space using GS.
This is achieved by populating the mesh faces with 3D Gaussians and expressing their positions relative to the mesh vertices.
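This attachment can be sketched as follows. The snippet is a minimal illustration, not the authors' implementation: it assumes each face carries a fixed set of barycentric coordinates, so every Gaussian's 3D position is a convex combination of its face's vertices and automatically follows the mesh as the vertices move. The function name and array shapes are our own choices.

```python
import numpy as np

def gaussian_positions(vertices, faces, barycentric):
    """Hypothetical sketch: place Gaussians on mesh faces.
    vertices:    (V, 3) vertex positions
    faces:       (F, 3) vertex indices per triangular face
    barycentric: (F, K, 3) barycentric coords of K Gaussians per face
    returns:     (F, K, 3) 3D positions of the Gaussians
    """
    tri = vertices[faces]  # (F, 3, 3) corner positions of each triangle
    # Each Gaussian position is a barycentric-weighted sum of the corners,
    # so moving the vertices moves the Gaussians with the mesh.
    return np.einsum('fkc,fcd->fkd', barycentric, tri)

# Toy mesh: one triangle with a single Gaussian at its centroid.
V = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
F = np.array([[0, 1, 2]])
B = np.full((1, 1, 3), 1.0 / 3.0)
print(gaussian_positions(V, F, B))  # -> [[[1/3, 1/3, 0]]]
```

Expressing positions relative to the vertices is what makes the mapping differentiable: gradients of a rendering loss with respect to Gaussian positions flow back to the mesh vertices through the fixed barycentric weights.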
With this differentiable mapping in place, we estimate the 3D state of the cloth using a prediction-update framework akin to Bayesian filtering.
Starting with a previous state estimate and a known robotic action, Cloth-Splatting predicts the next state using a learned dynamics model of the cloth (left, yellow).
This prediction is then updated using RGB observations (right, green): the rendering loss provided by GS refines the state estimate from visual cues such as texture and geometry.
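The prediction-update loop can be sketched as follows. This is a toy illustration under stated assumptions: the dynamics model is stood in by a simple displacement function, and a quadratic loss toward an observed target stands in for the GS photometric rendering loss, whose gradient the real method obtains by backpropagating through differentiable rendering. All function names here are hypothetical.

```python
import numpy as np

def predict(state, action, dynamics):
    """Prediction step: a learned dynamics model proposes the next state."""
    return dynamics(state, action)

def update(state, target, lr=0.5, steps=10):
    """Update step: refine the prediction by gradient descent on a loss.
    A quadratic loss ||state - target||^2 stands in for the GS rendering
    loss; its gradient is 2 * (state - target)."""
    for _ in range(steps):
        grad = 2.0 * (state - target)
        state = state - lr * grad
    return state

# Toy example: state = mesh vertex positions, action = a displacement.
dynamics = lambda s, a: s + a            # stand-in for a learned model
state = np.zeros((4, 3))                 # 4 mesh vertices at the origin
action = np.array([0.1, 0.0, 0.0])
observed = np.full((4, 3), 0.12)         # stand-in observation target

pred = predict(state, action, dynamics)  # yellow branch: dynamics rollout
refined = update(pred, observed)         # green branch: rendering refinement
```

The split mirrors the figure: the dynamics model supplies a physically plausible prior on how the cloth moves under the action, and the visual update corrects that prior wherever the rendered Gaussians disagree with the RGB observations.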