Implicit Neural-Representation Learning for Elastic Deformable-Object Manipulations

Korea Advanced Institute of Science and Technology
RSS 2025


[Video: Experiment with INR-DOM]

Abstract

We aim to solve the problem of manipulating deformable objects, particularly elastic bands, in real-world scenarios. Deformable-object manipulation (DOM) requires a policy that operates over a large state space owing to the virtually unlimited degrees of freedom (DoF) of deformable objects. Moreover, their dense but partial observations (e.g., images or point clouds) increase sampling complexity and uncertainty in policy learning. To address these challenges, we propose a novel implicit neural-representation (INR) learning method for elastic DOM, called INR-DOM. Our method learns consistent state representations of partially observable elastic objects by reconstructing their complete, implicit surface, represented as a signed distance function. Furthermore, we perform exploratory representation fine-tuning through reinforcement learning (RL), which enables RL algorithms to learn exploitable representations while efficiently acquiring a DOM policy. We conduct quantitative and qualitative analyses in three simulated environments, as well as real-world manipulation studies with a Franka Emika Panda arm.
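For readers unfamiliar with implicit surface representations, the sketch below illustrates what a signed-distance-function (SDF) network is: an MLP that maps a 3-D query point to its signed distance from the object surface, whose zero level set defines the surface. This is a minimal, self-contained PyTorch sketch; the `SDFNetwork` name and all layer sizes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SDFNetwork(nn.Module):
    """MLP mapping a 3-D query point to its signed distance from the surface.
    (Hypothetical sketch; layer sizes are illustrative.)"""
    def __init__(self, hidden_dim: int = 128, num_layers: int = 4):
        super().__init__()
        layers, in_dim = [], 3  # input: an (x, y, z) query point
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        layers.append(nn.Linear(hidden_dim, 1))  # output: one signed distance
        self.mlp = nn.Sequential(*layers)

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) query points -> (N, 1) signed distances.
        # Convention: negative inside the object, positive outside;
        # the zero level set {x : f(x) = 0} is the object's surface.
        return self.mlp(xyz)

sdf = SDFNetwork()
queries = torch.rand(1024, 3)           # random 3-D query points
distances = sdf(queries)                # (1024, 1) signed distances
near_surface = distances.abs() < 1e-3   # queries close to the implicit surface
```

Because the surface is encoded by a continuous function rather than a fixed set of points, it can be queried at arbitrary resolution, which is what allows a complete surface to be recovered from a partial observation.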

Overall Architecture

An overview of the INR-DOM framework, which trains the occlusion-robust state-representation encoder $\Phi_\phi$, parameterized by $\phi$, for deformable objects (DOs), together with the manipulation policy $\pi$. Training proceeds in two stages: 1) The first stage pre-trains a PointNet-based partial-to-complete variational autoencoder $(\Phi_\phi, \Psi)$ that embeds a partial point cloud $\mathbf{p}$ of a target DO into a latent embedding $\mathbf{z}$ and recovers the parameters $\theta$ of an implicit signed-distance-field (SDF) network $\Omega_\theta$. This stage predicts full geometries using two reconstruction losses, $\mathcal{L}_{\text{SDF}}$ and $\mathcal{L}_{\text{skel}}$, along with three regularization losses: $\mathcal{L}_{\text{KL}}$, $\mathcal{L}_{\text{weight}}$, and $\mathcal{L}_{\text{cns}}$. 2) The second stage then improves the task-relevant representation power of the encoder $\Phi_\phi$ by jointly optimizing reinforcement learning (blue) with the loss $\mathcal{L}_{\text{RL}}$ and contrastive learning (red) with the loss $\mathcal{L}_{\text{infoNCE}}$.
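To make the two-stage pipeline concrete, the sketch below shows one way the pieces could fit together in PyTorch: a PointNet-style encoder standing in for $\Phi_\phi$, a hypernetwork decoder standing in for $\Psi$ that emits the weights $\theta$ of a tiny SDF network $\Omega_\theta$, a stage-1 loss combining $\mathcal{L}_{\text{SDF}}$, $\mathcal{L}_{\text{KL}}$, and $\mathcal{L}_{\text{weight}}$ (the skeleton and consistency terms $\mathcal{L}_{\text{skel}}$ and $\mathcal{L}_{\text{cns}}$ are omitted), and an in-batch InfoNCE loss for stage 2. All class names, layer sizes, and loss weights are illustrative assumptions; this is a sketch of the described training scheme, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointEncoder(nn.Module):
    """PointNet-style encoder (standing in for Phi_phi): maps a partial
    point cloud to the mean and log-variance of a latent Gaussian."""
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                       nn.Linear(64, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)

    def forward(self, pcd):                           # pcd: (B, N, 3)
        feat = self.point_mlp(pcd).max(dim=1).values  # permutation-invariant pooling
        return self.mu(feat), self.logvar(feat)

class HyperDecoder(nn.Module):
    """Hypernetwork (standing in for Psi): maps a latent z to the flattened
    parameters theta of a tiny one-hidden-layer SDF network Omega_theta."""
    def __init__(self, latent_dim: int = 64, theta_dim: int = 81):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, theta_dim))

    def forward(self, z):
        return self.net(z)

def sdf_forward(theta, xyz):
    """Evaluate the SDF network whose weights are supplied by theta.
    theta: (B, 81), unpacked into a 3-16-1 MLP; xyz: (B, Q, 3) queries."""
    B = theta.shape[0]
    W1 = theta[:, :48].view(B, 3, 16)
    b1 = theta[:, 48:64].view(B, 1, 16)
    W2 = theta[:, 64:80].view(B, 16, 1)
    b2 = theta[:, 80:81].view(B, 1, 1)
    h = torch.relu(xyz @ W1 + b1)
    return h @ W2 + b2                    # (B, Q, 1) signed distances

def stage1_loss(encoder, decoder, partial_pcd, query_xyz, gt_sdf,
                kl_weight=1e-4, reg_weight=1e-5):
    """Stage-1 objective: L_SDF plus KL and weight regularizers.
    (L_skel and L_cns are omitted in this sketch; weights are assumptions.)"""
    mu, logvar = encoder(partial_pcd)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    theta = decoder(z)
    loss_sdf = F.l1_loss(sdf_forward(theta, query_xyz), gt_sdf)        # L_SDF
    loss_kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()    # L_KL
    loss_w = theta.pow(2).mean()                                       # L_weight
    return loss_sdf + kl_weight * loss_kl + reg_weight * loss_w

def info_nce_loss(z_a, z_b, tau=0.1):
    """Stage-2 contrastive term (L_infoNCE): matched rows of z_a and z_b
    are positive pairs; all other rows in the batch act as negatives."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.T / tau               # (B, B) similarity matrix
    labels = torch.arange(z_a.shape[0])      # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Dummy usage with random tensors:
enc, dec = PointEncoder(), HyperDecoder()
pcd = torch.rand(4, 512, 3)         # batch of partial point clouds
xyz = torch.rand(4, 256, 3)         # SDF query points
gt = torch.rand(4, 256, 1) - 0.5    # placeholder ground-truth distances
loss = stage1_loss(enc, dec, pcd, xyz, gt)
```

The hypernetwork framing, decoding SDF weights rather than a fixed-resolution point cloud, is what lets the latent embedding $\mathbf{z}$ represent a complete, continuous surface recovered from a partial observation; in stage 2, that same $\mathbf{z}$ is the input to the policy $\pi$ and is refined jointly by $\mathcal{L}_{\text{RL}}$ and $\mathcal{L}_{\text{infoNCE}}$.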

Results

Quantitative results

Fig. 1. Comparison of point-cloud reconstruction performance for both seen and unseen types of partially observable rubber bands.
Fig. 2. Distribution of $2\cdot 10^4$ embeddings from the random-manipulation dataset used in pre-training.
Fig. 3. Comparison of accumulated reward curves during training between INR-DOM and baseline models.
Table 1. Comparison of task success rates $[\%]$ across three simulated environments, based on the evaluation of 100 trials per environment.

Qualitative results

Fig. 4. Comparison of occlusion-robust reconstruction performance between INR-DOM and Point2Vec. (Top) Point-cloud inputs of partially observable elastic bands. (Middle) Point clouds reconstructed by Point2Vec. (Bottom) SDF-based meshes reconstructed by INR-DOM.

[Video: real-world demonstration]