Implicit Neural-Representation Learning for Elastic Deformable-Object Manipulations

Korea Advanced Institute of Science and Technology
RSS 2025

*Indicates the corresponding author

Abstract

We aim to solve the problem of manipulating deformable objects, particularly elastic bands, in real-world scenarios. Deformable object manipulation (DOM) requires a policy that operates over a large state space due to the virtually unlimited degrees of freedom (DoF) of deformable objects. Further, their dense but partial observations (e.g., images or point clouds) increase the sampling complexity and uncertainty of policy learning. To address these challenges, we propose a novel implicit neural-representation (INR) learning framework for elastic DOMs, called INR-DOM. Our method learns consistent state representations of partially observable elastic objects by reconstructing a complete implicit surface represented as a signed distance function. Furthermore, we perform exploratory representation fine-tuning through reinforcement learning (RL), which enables RL algorithms to effectively learn exploitable representations while efficiently obtaining a DOM policy. We conduct quantitative and qualitative analyses in three simulated environments and real-world manipulation studies with a Franka Emika Panda arm.

Overall Architecture

inr-dom overall architecture
  • INR-DOM trains an occlusion-robust state-representation encoder for deformable objects together with a manipulation policy.
  • The first stage pre-trains a PointNet-based partial-to-complete variational autoencoder via implicit surface reconstruction to learn geometric features.
  • The second stage jointly optimizes the policy and the pre-trained encoder with reinforcement learning to enhance task-relevant representational power (blue), and updates the encoder with contrastive learning to capture episode correlations (red).
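The first-stage interface described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the layer sizes, single-layer "networks", and random weights are all placeholders, and only the data flow (permutation-invariant encoding of a partial cloud into a latent code, then latent-conditioned SDF queries) mirrors the described design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper's actual sizes are not given here.
D_FEAT, D_LATENT = 64, 32

# PointNet-style encoder: a shared per-point map followed by max-pooling,
# so the output is invariant to point ordering and cloud size.
W_enc = rng.standard_normal((3, D_FEAT)) * 0.1

def encode(partial_cloud):
    """Map a partial point cloud (N, 3) to a global latent code (D_LATENT,)."""
    feats = np.maximum(partial_cloud @ W_enc, 0.0)  # per-point ReLU features
    global_feat = feats.max(axis=0)                 # symmetric max-pool over points
    return global_feat[:D_LATENT]                   # stand-in projection to the latent

# SDF decoder: conditioned on the latent code, map a 3D query point to a
# signed distance; the zero level set defines the reconstructed surface.
W_dec = rng.standard_normal((D_LATENT + 3, 1)) * 0.1

def sdf(latent, query):
    """Signed distance at a query point, conditioned on the shape latent."""
    x = np.concatenate([latent, query])
    return (x @ W_dec).item()

cloud = rng.standard_normal((500, 3))  # stand-in for one partial observation
z = encode(cloud)
d = sdf(z, np.zeros(3))
print(z.shape, np.isfinite(d))
```

Because the encoder pools symmetrically over points, reordering or subsampling the observed cloud leaves the latent code stable, which is what makes a partial-to-complete objective learnable.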

Results

Pre-training results

inr-dom result 0

Comparison of reconstruction performance for elastic bands

inr-dom result 1

(Top) Partial point-cloud inputs; (Middle) reconstructions by Point2Vec; (Bottom) SDF-based meshes from INR-DOM.

  • INR-DOM achieves better reconstruction performance than other point-cloud completion methods.
  • The implicit surface representation enables our encoder to capture the continuous deformation of stretchable surfaces.
  • INR-DOM robustly represents surfaces even in occluded and intertwined regions.
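To make the "implicit surface as an SDF" idea concrete, here is an analytic signed distance function for a torus, used purely as a stand-in for an elastic band. In INR-DOM the distances come from a learned, latent-conditioned decoder rather than a formula; this sketch only shows the interface and the sign convention.

```python
import numpy as np

def torus_sdf(p, R=1.0, r=0.25):
    """Signed distance from points p (..., 3) to a torus in the xy-plane,
    with major radius R and tube radius r. Negative inside, zero on the
    surface, positive outside."""
    p = np.asarray(p, dtype=float)
    ring = np.linalg.norm(p[..., :2], axis=-1) - R  # in-plane distance to the center ring
    q = np.stack([ring, p[..., 2]], axis=-1)
    return np.linalg.norm(q, axis=-1) - r

# The zero level set is the surface; the sign separates inside from outside.
print(torus_sdf([1.25, 0.0, 0.0]))  # on the tube surface -> 0.0
print(torus_sdf([1.00, 0.0, 0.0]))  # tube center         -> -0.25
print(torus_sdf([0.00, 0.0, 0.0]))  # hole center         -> 0.75
```

Because the SDF is defined at every point in space, a stretching band changes the function smoothly, whereas a discrete point cloud changes sample by sample; this continuity is what the bullet above refers to.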

Fine-tuning results

inr-dom result 0

Distribution of embeddings from 10 disentanglement episodes

inr-dom result 1

Distribution of 20,000 embeddings from the random-manipulation dataset used in pre-training.

  • Time-series contrastive learning drives our encoder to cluster latent vectors of similar deformations.
  • INR-DOM learns task-relevant representations and robustly resolves visual ambiguity.
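The clustering effect above can be sketched with an InfoNCE-style contrastive loss. This NumPy version is illustrative only; the paper's exact loss, temperature, and positive-pair construction are not specified here. The assumed setup: embeddings of nearby timesteps from the same episode act as positives, and the rest of the batch acts as negatives.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss over a batch (B, D): each anchor should be most similar
    to its own positive among all positives in the batch."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                    # (B, B) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))         # -log p(correct pair)

# Toy check: correctly paired embeddings give a much lower loss than
# deliberately mismatched ones.
emb = np.eye(4)
matched = info_nce(emb, emb)
mismatched = info_nce(emb, np.roll(emb, 1, axis=0))
print(matched < mismatched)
```

Minimizing this loss pulls embeddings of temporally adjacent (hence similarly deformed) states together and pushes unrelated states apart, which is one standard way to obtain the clustered latent structure shown in the embedding plots.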

Simulation results

Sealing


Installation


Disentanglement


Reward curves

Comparison of accumulated reward curves during training between INR-DOM and baseline models.

Task success rates

Comparison of task success rates [%] across three simulated environments, based on 100 trials per environment.

Real-world demonstration

Sealing
Installation
Our method vs. an image-based method on the disentanglement task under visual ambiguity.
  • We successfully transfer INR-DOM to real-world tasks.
  • INR-DOM successfully distinguishes deformed states, whereas image-based methods fail due to visual ambiguities at intersections.