VibeAct connects real vibrotactile sensing to simulation-based policy learning through an explicit intermediate representation of contact and slip. A tactile estimator infers this representation from microphone signals, and the policy learns to act on the same representation in simulation.
VibeAct first trains a tactile estimator on real-world data. The estimator infers contact and slip from vibro-acoustic signals using four independent per-finger subnetworks. VibeAct then trains reinforcement learning policies entirely in simulation using this representation together with point-cloud and proprioceptive observations, and deploys the resulting policies directly in the real world.
For each finger, the tactile estimator maps raw vibro-acoustic signals to three physically grounded quantities: contact onset, slip presence, and slip magnitude. To train the estimator, we teleoperate the robot to interact with objects in the real world while recording vibro-acoustic signals from the fingertips. We then replay the recorded motions in a calibrated digital clone and use the simulator’s contact solver to generate contact and slip labels for supervision.
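The labeling step described above can be illustrated with a minimal sketch. It assumes the simulator replay exposes, for one fingertip, a per-timestep normal force and tangential sliding speed. The function name `label_finger`, the `TactileLabel` fields, and both thresholds are hypothetical and chosen only for illustration; they are not taken from VibeAct.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TactileLabel:
    contact_onset: bool    # True only on the step where contact begins
    slip_present: bool     # True while the fingertip slides on the object
    slip_magnitude: float  # sliding speed in mm/s, 0.0 when not slipping


def label_finger(
    normal_force: Sequence[float],      # N, one value per timestep (assumed input)
    tangential_speed: Sequence[float],  # mm/s, one value per timestep (assumed input)
    force_eps: float = 0.05,            # hypothetical contact threshold
    slip_eps: float = 1.0,              # hypothetical slip threshold
) -> List[TactileLabel]:
    """Turn simulated contact quantities into per-timestep labels for one finger."""
    if len(normal_force) != len(tangential_speed):
        raise ValueError("force and speed traces must have the same length")
    labels = []
    prev_in_contact = False
    for force, speed in zip(normal_force, tangential_speed):
        in_contact = force > force_eps
        slipping = in_contact and speed > slip_eps
        labels.append(
            TactileLabel(
                contact_onset=in_contact and not prev_in_contact,
                slip_present=slipping,
                slip_magnitude=speed if slipping else 0.0,
            )
        )
        prev_in_contact = in_contact
    return labels
```

Each label would be paired with the vibro-acoustic window recorded at the same timestep, giving one supervised example per finger per step.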
The VibeAct tactile estimator achieves an F1 score of 0.60 for contact onset and 0.91 for slip presence, and a mean absolute error (MAE) of 4.74 mm/s for slip magnitude.
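For reference, the two metric types reported above can be computed as in the following sketch. It uses the standard per-sample definitions of F1 and MAE; whether VibeAct scores contact onset per sample or with a temporal tolerance window is not stated here, so the per-sample matching is an assumption.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0  # no true positives: define F1 as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference, in the units of the inputs (mm/s for slip)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, `f1_score([1, 1, 0, 0], [1, 0, 1, 0])` has one true positive, one false positive, and one false negative, so precision and recall are both 0.5.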
We train reinforcement learning policies entirely in simulation using point-cloud, proprioceptive, and tactile observations. During training, the tactile channel comes from the same contact solver that generated the labels for the tactile estimator, so the policy and the estimator share one definition of contact and slip, which helps bridge the sim-to-real gap. For real-world deployment, we replace the simulator-generated tactile observations with outputs from the tactile estimator while keeping the policy unchanged. This allows policies trained entirely in simulation to transfer directly to the real world.
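The swap between simulated and estimated tactile input can be sketched as an observation builder with a pluggable tactile source. This is an illustrative sketch, not the VibeAct code: `build_observation`, `sim_tactile`, `estimator_tactile`, the flat-list observation layout, and the placeholder values are all hypothetical. Only the four fingers and the three per-finger quantities come from the description above.

```python
from typing import Callable, List, Sequence

NUM_FINGERS = 4   # four per-finger subnetworks, as described above
TACTILE_DIM = 3   # contact onset, slip presence, slip magnitude

# A tactile source returns one [onset, slip, magnitude] triple per finger.
TactileSource = Callable[[], List[List[float]]]


def build_observation(
    point_cloud_features: Sequence[float],
    proprioception: Sequence[float],
    tactile_source: TactileSource,
) -> List[float]:
    """Concatenate the three observation groups into one flat policy input."""
    tactile = tactile_source()
    if len(tactile) != NUM_FINGERS or any(len(f) != TACTILE_DIM for f in tactile):
        raise ValueError("tactile source must return 4 fingers x 3 values")
    flat_tactile = [value for finger in tactile for value in finger]
    return list(point_cloud_features) + list(proprioception) + flat_tactile


def sim_tactile() -> List[List[float]]:
    # Placeholder for values read from the simulator's contact solver.
    return [[1.0, 0.0, 0.0] for _ in range(NUM_FINGERS)]


def estimator_tactile() -> List[List[float]]:
    # Placeholder for values predicted from real microphone signals.
    return [[0.9, 0.1, 0.4] for _ in range(NUM_FINGERS)]


train_obs = build_observation([0.1, 0.2], [0.3], sim_tactile)
deploy_obs = build_observation([0.1, 0.2], [0.3], estimator_tactile)
```

Because both sources produce the same layout, the policy input has the same shape in training and deployment, and only the origin of the tactile values changes.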
We evaluate VibeAct across multiple contact-rich dexterous manipulation tasks, including in-hand repositioning, in-hand reorientation, peg insertion, and nut rotation.
Across both simulation and real-world experiments, VibeAct consistently outperforms baselines without tactile observations. These results suggest that explicit representations of contact and slip, inferred from vibrotactile sensing, provide useful information for contact-rich dexterous manipulation.