VibeAct: Vibration to Actions for Contact-Rich Reactive Robot Dexterity

Anonymous

System Overview

VibeAct System Overview

VibeAct connects real vibrotactile sensing to simulation-based policy learning through an explicit intermediate representation of contact and slip. A tactile estimator infers this representation from microphone signals, and the policy learns to act on the same representation in simulation.

Abstract

Dexterous manipulation depends on contact events that are fast, local, and often visually occluded. Piezoelectric microphones offer a compact and high-bandwidth way to sense these interactions, but the resulting vibro-acoustic signals are difficult to simulate faithfully enough for end-to-end sim-to-real policy learning on dexterous robot hands. We propose VibeAct, a framework that bridges real vibrotactile sensing and simulation-based reinforcement learning through a shared physical representation of contact and slip. In the real world, we embed piezoelectric microphones into a dexterous robot hand and collect vibro-acoustic data through teleoperation, then replay the recordings in a calibrated digital clone to automatically label per-finger contact and slip. A tactile estimator learns to predict contact and slip from real microphone waveforms, while manipulation policies are trained in simulation on the same representation computed directly from simulated contacts. This decoupling lets policies exploit rapid tactile feedback without simulating raw audio. Across five contact-rich tasks spanning regrasping, in-hand reorientation, and insertion, VibeAct consistently outperforms a proprioception-and-point-cloud baseline in simulation, with the largest gains on tasks requiring sustained reactive control, where the continuous slip-magnitude channel proves the most informative observation. The learned policies transfer to a physical dexterous hand-arm platform, improving success rates on deployed tasks.

Contributions

  • A sim-to-real framework for vibrotactile dexterity based on a shared contact-and-slip representation that bridges real vibrotactile sensing and simulation-based policy learning.
  • A digital-clone data labeling pipeline that automatically generates per-finger contact and slip supervision from real-world demonstrations.
  • An empirical study showing that this representation serves as an effective tactile observation for reinforcement learning across contact-rich dexterous manipulation tasks.

Method

VibeAct Model Architecture

VibeAct first trains a tactile estimator on real-world data. The estimator predicts contact and slip from vibro-acoustic signals using four independent per-finger subnetworks. VibeAct then trains reinforcement learning policies entirely in simulation on this representation together with point-cloud and proprioceptive observations, and deploys the resulting policies directly in the real world.

Tactile Estimator

For each finger, the tactile estimator maps raw vibro-acoustic signals to three physically grounded quantities: contact onset, slip presence, and slip magnitude. To train the estimator, we teleoperate the robot to interact with objects in the real world while recording vibro-acoustic signals from the fingertips. We then replay the recorded motions in a calibrated digital clone and use the simulator’s contact solver to generate contact and slip labels for supervision.
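The labeling step can be illustrated with a minimal sketch. The source does not say how the contact solver's output is converted into labels, so the record fields (`normal_force`, `tangential_speed`), the thresholds, and the function name `label_finger` below are all hypothetical; the sketch only shows one plausible way to derive the three per-finger quantities from replayed simulator steps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContactSample:
    """One replayed simulator step for one finger (hypothetical fields)."""
    normal_force: float      # N, as reported by the contact solver
    tangential_speed: float  # mm/s, relative sliding speed at the contact


@dataclass
class TactileLabel:
    contact_onset: bool    # True only on the step where contact begins
    slip_presence: bool    # True while the finger slides on the object
    slip_magnitude: float  # mm/s, zero when there is no slip


def label_finger(samples: List[ContactSample],
                 force_threshold: float = 0.05,
                 slip_threshold: float = 1.0) -> List[TactileLabel]:
    """Turn a finger's contact trace into per-step labels.

    Both thresholds are assumed values chosen for illustration only.
    """
    labels = []
    was_in_contact = False
    for s in samples:
        in_contact = s.normal_force > force_threshold
        # Slip is only meaningful while the finger touches the object.
        slipping = in_contact and s.tangential_speed > slip_threshold
        labels.append(TactileLabel(
            contact_onset=in_contact and not was_in_contact,
            slip_presence=slipping,
            slip_magnitude=s.tangential_speed if slipping else 0.0,
        ))
        was_in_contact = in_contact
    return labels
```

Because the labels depend only on simulator state, the same function could in principle produce the tactile observation during policy training, which is the property the shared representation relies on.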

Training Strategy

Step 1: Pretraining on a large dataset with stationary objects, to learn general acoustic patterns.
Step 2: Fine-tuning on a small dataset with free-moving objects, to adapt to more realistic manipulation scenarios.
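The two steps above amount to a warm start: the fine-tuning stage begins from the pretrained weights rather than from scratch. The sketch below illustrates only that schedule. It substitutes a tiny logistic-regression model for the actual estimator network, and the learning rates, epoch counts, and function names are assumptions not taken from the source.

```python
import math
from typing import List, Tuple

Example = Tuple[List[float], int]  # (feature vector, binary label)


def sgd_epochs(weights: List[float], data: List[Example],
               lr: float, epochs: int) -> List[float]:
    """Run plain SGD on a logistic-regression stand-in and return new weights."""
    w = list(weights)  # copy so the caller's weights are left untouched
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi  # gradient of the log loss
    return w


def pretrain_then_finetune(stationary: List[Example],
                           free_moving: List[Example],
                           n_features: int):
    """Stage 1 on the large stationary-object set, stage 2 on the small set."""
    initial = [0.0] * n_features
    pretrained = sgd_epochs(initial, stationary, lr=0.1, epochs=20)
    # Assumed design choice: a smaller learning rate and fewer epochs, so the
    # small free-moving dataset adapts the model without overwriting stage 1.
    finetuned = sgd_epochs(pretrained, free_moving, lr=0.01, epochs=5)
    return pretrained, finetuned
```

Whether the real pipeline lowers the learning rate or freezes any layers during fine-tuning is not stated in the source.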

Estimator Performance

During a teleoperated demonstration, the VibeAct estimator produces contact and slip predictions that closely track the ground-truth labels from the digital clone.

The VibeAct tactile estimator achieves an F1 score of 0.60 for contact onset and 0.91 for slip presence, and a mean absolute error (MAE) of 4.74 mm/s for slip magnitude.
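For reference, the two metric types reported above can be computed as in the sketch below. These are the standard definitions of F1 and MAE; how the source matches predicted onsets to labeled onsets in time (for example, with a tolerance window) is not stated, so the per-step comparison here is an assumption.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for binary labels, comparing predictions step by step."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # With no positives in either sequence, F1 is undefined; return 0.0.
    return 2 * tp / denom if denom else 0.0


def mean_absolute_error(y_true: Sequence[float],
                        y_pred: Sequence[float]) -> float:
    """MAE, in the units of the inputs (mm/s for slip magnitude)."""
    if len(y_true) == 0:
        raise ValueError("need at least one sample")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Contact onset is a sparse event compared with slip presence, so a strict per-step comparison penalizes predictions that are off by even a single step.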

Policy Learning

We train reinforcement learning policies entirely in simulation using point-cloud, proprioceptive, and tactile observations. During training, the tactile channel comes from the same contact solver used to generate labels for training the tactile estimator, which helps bridge the sim-to-real gap. For real-world deployment, we replace the simulator-generated tactile observations with outputs from the tactile estimator while keeping the policy unchanged. This allows policies trained entirely in simulation to transfer directly to the real world.
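The swap described above can be sketched as two interchangeable tactile sources behind one interface. All class and function names here are hypothetical, and the flat 12-value layout (four fingers times three channels) is an assumption based on the four per-finger subnetworks and three predicted quantities mentioned earlier.

```python
from typing import Callable, List, Sequence

NUM_FINGERS = 4          # one per estimator subnetwork
CHANNELS_PER_FINGER = 3  # contact onset, slip presence, slip magnitude


class SimTactileSource:
    """Training-time source: reads the representation from the contact solver."""

    def __init__(self, read_contact_solver: Callable[[], List[float]]):
        self._read = read_contact_solver

    def read(self) -> List[float]:
        return self._read()


class EstimatorTactileSource:
    """Deployment-time source: runs the estimator on microphone waveforms."""

    def __init__(self,
                 read_microphones: Callable[[], List[List[float]]],
                 estimator: Callable[[List[List[float]]], List[float]]):
        self._read_microphones = read_microphones
        self._estimator = estimator

    def read(self) -> List[float]:
        return self._estimator(self._read_microphones())


def build_observation(point_cloud: Sequence[float],
                      proprioception: Sequence[float],
                      tactile: Sequence[float]) -> List[float]:
    """Concatenate the three observation groups into one flat vector."""
    expected = NUM_FINGERS * CHANNELS_PER_FINGER
    if len(tactile) != expected:
        raise ValueError(f"expected {expected} tactile values, got {len(tactile)}")
    return list(point_cloud) + list(proprioception) + list(tactile)


def policy_step(policy, tactile_source, point_cloud, proprioception):
    """The policy is identical in both settings; only the source differs."""
    obs = build_observation(point_cloud, proprioception, tactile_source.read())
    return policy(obs)
```

Because both sources return vectors with the same layout, the policy's input shape does not change between simulation and the real robot.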

We evaluate VibeAct across multiple contact-rich dexterous manipulation tasks, including in-hand repositioning, in-hand reorientation, peg insertion, and nut rotation.

Experiments in Simulation

Experiments in the Real World

Evaluation

Across both simulation and real-world experiments, VibeAct consistently outperforms baselines without tactile observations. These results suggest that explicit representations of contact and slip, inferred from vibrotactile sensing, provide useful information for contact-rich dexterous manipulation.