We propose Cloth-Splatting, a method to estimate 3D states of cloth from RGB supervision by combining 3D Gaussian Splatting (GS) with an action-conditioned dynamics model.
The key idea of our method is to represent the 3D state of the cloth as a mesh and create a differentiable mapping between the cloth state space and the observation space using GS.
This is achieved by populating the mesh faces with 3D Gaussians and expressing their positions relative to the mesh vertices.
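This attachment can be sketched as follows. The snippet is a minimal illustration, not the authors' implementation: it assumes each face carries a fixed set of barycentric coordinates, so every Gaussian's 3D position is a convex combination of its face's vertices and automatically follows the mesh as the vertices move. The function name and array shapes are our own choices.

```python
import numpy as np

def gaussian_positions(vertices, faces, barycentric):
    """Hypothetical sketch: place Gaussians on mesh faces.
    vertices:    (V, 3) vertex positions
    faces:       (F, 3) vertex indices per triangular face
    barycentric: (F, K, 3) barycentric coords of K Gaussians per face
    returns:     (F, K, 3) 3D positions of the Gaussians
    """
    tri = vertices[faces]  # (F, 3, 3) corner positions of each triangle
    # Each Gaussian position is a barycentric-weighted sum of the corners,
    # so moving the vertices moves the Gaussians with the mesh.
    return np.einsum('fkc,fcd->fkd', barycentric, tri)

# Toy mesh: one triangle with a single Gaussian at its centroid.
V = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
F = np.array([[0, 1, 2]])
B = np.full((1, 1, 3), 1.0 / 3.0)
print(gaussian_positions(V, F, B))  # -> [[[1/3, 1/3, 0]]]
```

Expressing positions relative to the vertices is what makes the mapping differentiable: gradients of a rendering loss with respect to Gaussian positions flow back to the mesh vertices through the fixed barycentric weights.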
With this differentiable mapping in place, we estimate the 3D state of the cloth using a prediction-update framework akin to Bayesian filtering.
Starting with a previous state estimate and a known robotic action, Cloth-Splatting predicts the next state using a learned dynamics model of the cloth (left, yellow).
This prediction is then updated using RGB observations (right, green): the rendering loss provided by GS refines the state estimate from visual cues such as texture and geometry.
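The prediction-update loop can be sketched as follows. This is a toy illustration under stated assumptions: the dynamics model is stood in by a simple displacement function, and a quadratic loss toward an observed target stands in for the GS photometric rendering loss, whose gradient the real method obtains by backpropagating through differentiable rendering. All function names here are hypothetical.

```python
import numpy as np

def predict(state, action, dynamics):
    """Prediction step: a learned dynamics model proposes the next state."""
    return dynamics(state, action)

def update(state, target, lr=0.5, steps=10):
    """Update step: refine the prediction by gradient descent on a loss.
    A quadratic loss ||state - target||^2 stands in for the GS rendering
    loss; its gradient is 2 * (state - target)."""
    for _ in range(steps):
        grad = 2.0 * (state - target)
        state = state - lr * grad
    return state

# Toy example: state = mesh vertex positions, action = a displacement.
dynamics = lambda s, a: s + a            # stand-in for a learned model
state = np.zeros((4, 3))                 # 4 mesh vertices at the origin
action = np.array([0.1, 0.0, 0.0])
observed = np.full((4, 3), 0.12)         # stand-in observation target

pred = predict(state, action, dynamics)  # yellow branch: dynamics rollout
refined = update(pred, observed)         # green branch: rendering refinement
```

The split mirrors the figure: the dynamics model supplies a physically plausible prior on how the cloth moves under the action, and the visual update corrects that prior wherever the rendered Gaussians disagree with the RGB observations.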