Implicit Neural-Representation Learning for Elastic Deformable-Object Manipulations

Korea Advanced Institute of Science and Technology
RSS 2025


[Video: Experiment with INR-DOM]

Abstract

We aim to solve the problem of manipulating deformable objects, particularly elastic bands, in real-world scenarios. Deformable-object manipulation (DOM) requires a policy that operates over a large state space owing to the virtually unlimited degrees of freedom (DoF) of deformable objects. Moreover, their dense but partial observations (e.g., images or point clouds) increase sampling complexity and uncertainty in policy learning. To address these challenges, we propose a novel implicit neural-representation (INR) learning method for elastic DOM, called INR-DOM. Our method learns consistent state representations of partially observable elastic objects by reconstructing their complete, implicit surface, represented as a signed distance function. Furthermore, we perform exploratory representation fine-tuning through reinforcement learning (RL), which enables RL algorithms to learn exploitable representations while efficiently acquiring a DOM policy. We conduct quantitative and qualitative analyses in three simulated environments, as well as real-world manipulation studies with a Franka Emika Panda arm.
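For readers unfamiliar with implicit surface representations, the sketch below illustrates what a signed-distance-function (SDF) network is: an MLP that maps a 3-D query point to its signed distance from the object surface, whose zero level set defines the surface. This is a minimal, self-contained PyTorch sketch; the `SDFNetwork` name and all layer sizes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SDFNetwork(nn.Module):
    """MLP mapping a 3-D query point to its signed distance from the surface.
    (Hypothetical sketch; layer sizes are illustrative.)"""
    def __init__(self, hidden_dim: int = 128, num_layers: int = 4):
        super().__init__()
        layers, in_dim = [], 3  # input: an (x, y, z) query point
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        layers.append(nn.Linear(hidden_dim, 1))  # output: one signed distance
        self.mlp = nn.Sequential(*layers)

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) query points -> (N, 1) signed distances.
        # Convention: negative inside the object, positive outside;
        # the zero level set {x : f(x) = 0} is the object's surface.
        return self.mlp(xyz)

sdf = SDFNetwork()
queries = torch.rand(1024, 3)           # random 3-D query points
distances = sdf(queries)                # (1024, 1) signed distances
near_surface = distances.abs() < 1e-3   # queries close to the implicit surface
```

Because the surface is encoded by a continuous function rather than a fixed set of points, it can be queried at arbitrary resolution, which is what allows a complete surface to be recovered from a partial observation.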

Overall Architecture

An overview of the INR-DOM framework, which trains the occlusion-robust state-representation encoder $\Phi_\phi$, parameterized by $\phi$, for deformable objects (DOs), together with the manipulation policy $\pi$. Training proceeds in two stages: 1) The first stage pre-trains a PointNet-based partial-to-complete variational autoencoder $(\Phi_\phi, \Psi)$ that embeds a partial point cloud $\mathbf{p}$ of a target DO into a latent embedding $\mathbf{z}$ and recovers the parameters $\theta$ of an implicit signed-distance-field (SDF) network $\Omega_\theta$. This stage predicts full geometries using two reconstruction losses, $\mathcal{L}_{\text{SDF}}$ and $\mathcal{L}_{\text{skel}}$, along with three regularization losses: $\mathcal{L}_{\text{KL}}$, $\mathcal{L}_{\text{weight}}$, and $\mathcal{L}_{\text{cns}}$. 2) The second stage then improves the task-relevant representation power of the encoder $\Phi_\phi$ by jointly optimizing reinforcement learning (blue) with the loss $\mathcal{L}_{\text{RL}}$ and contrastive learning (red) with the loss $\mathcal{L}_{\text{infoNCE}}$.
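To make the two-stage pipeline concrete, the sketch below shows one way the pieces could fit together in PyTorch: a PointNet-style encoder standing in for $\Phi_\phi$, a hypernetwork decoder standing in for $\Psi$ that emits the weights $\theta$ of a tiny SDF network $\Omega_\theta$, a stage-1 loss combining $\mathcal{L}_{\text{SDF}}$, $\mathcal{L}_{\text{KL}}$, and $\mathcal{L}_{\text{weight}}$ (the skeleton and consistency terms $\mathcal{L}_{\text{skel}}$ and $\mathcal{L}_{\text{cns}}$ are omitted), and an in-batch InfoNCE loss for stage 2. All class names, layer sizes, and loss weights are illustrative assumptions; this is a sketch of the described training scheme, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointEncoder(nn.Module):
    """PointNet-style encoder (standing in for Phi_phi): maps a partial
    point cloud to the mean and log-variance of a latent Gaussian."""
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                       nn.Linear(64, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)

    def forward(self, pcd):                           # pcd: (B, N, 3)
        feat = self.point_mlp(pcd).max(dim=1).values  # permutation-invariant pooling
        return self.mu(feat), self.logvar(feat)

class HyperDecoder(nn.Module):
    """Hypernetwork (standing in for Psi): maps a latent z to the flattened
    parameters theta of a tiny one-hidden-layer SDF network Omega_theta."""
    def __init__(self, latent_dim: int = 64, theta_dim: int = 81):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, theta_dim))

    def forward(self, z):
        return self.net(z)

def sdf_forward(theta, xyz):
    """Evaluate the SDF network whose weights are supplied by theta.
    theta: (B, 81), unpacked into a 3-16-1 MLP; xyz: (B, Q, 3) queries."""
    B = theta.shape[0]
    W1 = theta[:, :48].view(B, 3, 16)
    b1 = theta[:, 48:64].view(B, 1, 16)
    W2 = theta[:, 64:80].view(B, 16, 1)
    b2 = theta[:, 80:81].view(B, 1, 1)
    h = torch.relu(xyz @ W1 + b1)
    return h @ W2 + b2                    # (B, Q, 1) signed distances

def stage1_loss(encoder, decoder, partial_pcd, query_xyz, gt_sdf,
                kl_weight=1e-4, reg_weight=1e-5):
    """Stage-1 objective: L_SDF plus KL and weight regularizers.
    (L_skel and L_cns are omitted in this sketch; weights are assumptions.)"""
    mu, logvar = encoder(partial_pcd)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    theta = decoder(z)
    loss_sdf = F.l1_loss(sdf_forward(theta, query_xyz), gt_sdf)        # L_SDF
    loss_kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()    # L_KL
    loss_w = theta.pow(2).mean()                                       # L_weight
    return loss_sdf + kl_weight * loss_kl + reg_weight * loss_w

def info_nce_loss(z_a, z_b, tau=0.1):
    """Stage-2 contrastive term (L_infoNCE): matched rows of z_a and z_b
    are positive pairs; all other rows in the batch act as negatives."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.T / tau               # (B, B) similarity matrix
    labels = torch.arange(z_a.shape[0])      # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Dummy usage with random tensors:
enc, dec = PointEncoder(), HyperDecoder()
pcd = torch.rand(4, 512, 3)         # batch of partial point clouds
xyz = torch.rand(4, 256, 3)         # SDF query points
gt = torch.rand(4, 256, 1) - 0.5    # placeholder ground-truth distances
loss = stage1_loss(enc, dec, pcd, xyz, gt)
```

The hypernetwork framing, decoding SDF weights rather than a fixed-resolution point cloud, is what lets the latent embedding $\mathbf{z}$ represent a complete, continuous surface recovered from a partial observation; in stage 2, that same $\mathbf{z}$ is the input to the policy $\pi$ and is refined jointly by $\mathcal{L}_{\text{RL}}$ and $\mathcal{L}_{\text{infoNCE}}$.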

Results

Quantitative results

Fig. 1. Comparison of point-cloud reconstruction performance for both seen and unseen types of partially observable rubber bands.
Fig. 2. Distribution of $2\cdot 10^4$ embeddings from the random-manipulation dataset used in pre-training.
Fig. 3. Comparison of accumulated reward curves during training between INR-DOM and baseline models.
Table 1. Comparison of task success rates $[\%]$ across three simulated environments, based on the evaluation of 100 trials per environment.

Qualitative results

Fig. 4. Comparison of occlusion-robust reconstruction performance between INR-DOM and Point2Vec. (Top) Point-cloud inputs of partially observable elastic bands. (Middle) Point clouds reconstructed by Point2Vec. (Bottom) SDF-based meshes reconstructed by INR-DOM.

[Video: real-world demonstration]