VibeCheck: Using Active Acoustic Tactile Sensing
for Contact-Rich Manipulation
The acoustic response of an object can reveal a lot about its global state, for example, its material properties or the extrinsic contacts it is making with the world. In this work, we build an active acoustic sensing gripper equipped with two piezoelectric fingers: one for generating signals, the other for receiving them. By sending an acoustic vibration from one finger to the other through an object, we gain insight into the object's acoustic properties and contact state. We use this system to classify objects, estimate grasping position, estimate the poses of internal structures, and classify the types of extrinsic contacts an object is making with the environment. Using our contact type classification model, we tackle a standard long-horizon manipulation problem: peg insertion. We use a simple simulated transition model, based on the measured performance of our sensor, to train an imitation learning policy that is robust to imperfect predictions from the classifier. Finally, we demonstrate the policy on a UR5 robot with active acoustic sensing as the only feedback.
Paper
Latest version: arXiv
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025
Supplementary Video
Team
CLUE Lab, Columbia University
BibTeX
@inproceedings{zhang2025vibecheck,
  title={VibeCheck: Using Active Acoustic Tactile Sensing for Contact-Rich Manipulation},
  author={Zhang, Kaidi and Kim, Do-Gon and Chang, Eric T. and Liang, Hua-Hsuan and He, Zhanpeng and Lampo, Kathryn and Wu, Philippe and Kymissis, Ioannis and Ciocarlie, Matei},
  booktitle={Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2025}
}
Hardware Design
We build on previous work to recreate our hardware platform. The sensor housing ① is a 3D-printed Formlabs Clear V4 resin mount that aligns the piezoelectric disk ②, which is backed by high-density foam tape to ensure even pressure distribution and strong coupling. A Sorbothane pad ③ provides elastomeric isolation, suppressing unwanted vibrations from the gripper motor and the UR5 robot, while polyethylene strips ④ on the housing fingertips prevent slippage during excitation. On the speaker side ⑤, a Teensy-controlled audio adapter shield and an inverting amplifier circuit drive the piezoelectric transducer to generate the swept-sine excitation. On the receiver side ⑥, an identical piezoelectric transducer captures the vibration response and feeds the signal directly into the Teensy’s ADC at 44.1 kHz.
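For reference, here is a minimal offline sketch in Python of the swept-sine drive signal; the real system generates it in Teensy firmware, and the sweep parameters (20 Hz to 20 kHz over 1 s at 44.1 kHz) are taken from the Data Collection Methods section below.

import numpy as np
from scipy.signal import chirp

FS = 44_100                  # sample rate shared by the drive signal and ADC (Hz)
DURATION = 1.0               # sweep length (s)
F0, F1 = 20.0, 20_000.0      # sweep band (Hz)

t = np.arange(int(FS * DURATION)) / FS
# Linear frequency sweep; gain is applied by the inverting amplifier stage.
drive = chirp(t, f0=F0, f1=F1, t1=DURATION, method="linear")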
Task Experiments
1) Object Classification
Goal: Identify object type (material and hollow vs. solid structure) from a single grasp.
Result: 100% accuracy in-distribution across 9 classes, with generalization to new surfaces and orientations. The best trade-off used kernel PCA with 5 principal components (≈91% of variance), which gave the best out-of-distribution performance; even 3 principal components achieved 100% accuracy in-distribution.
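A minimal sketch of such a pipeline in scikit-learn; the page specifies kernel PCA with 5 components, while the RBF kernel, feature standardization, and SVM classifier are our assumptions.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import KernelPCA
from sklearn.svm import SVC

# X: (n_grasps, n_features) spectral features; y: labels for the 9 classes.
object_clf = make_pipeline(
    StandardScaler(),
    KernelPCA(n_components=5, kernel="rbf"),  # 5 PCs per the result above
    SVC(),
)
# object_clf.fit(X_train, y_train); object_clf.score(X_test, y_test)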
2) Grasping Position Classification
Goal: Estimate where along a rod the grasp occurs (edge / quarter / center).
Result: 100% accuracy in-distribution and strong performance under out-of-distribution conditions. Using ~10 principal components (≈90% of variance) balanced generalization against overfitting.
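A sketch of picking the embedding size by cumulative explained variance, matching the ~10 PCs / ≈90% variance trade-off above; linear PCA is assumed here for the variance computation.

import numpy as np
from sklearn.decomposition import PCA

def n_components_for_variance(X, target=0.90):
    # Fit a full PCA, then take the smallest count of leading components
    # whose cumulative explained variance reaches the target fraction.
    cumvar = np.cumsum(PCA().fit(X).explained_variance_ratio_)
    return int(np.searchsorted(cumvar, target) + 1)

# n = n_components_for_variance(X_train)  # ~10 for this task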
3) Pose Estimation from Internal Structure
Goal: Estimate an object’s orientation from transmitted acoustic signals.
Result: Regression achieved ~20° RMSE overall, with smaller errors at most angles. Five principal components (≈87% of variance) gave the best trade-off.
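A sketch of the corresponding regression evaluation; the ridge regressor is an assumption, and the error is RMSE in degrees.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge

pose_reg = make_pipeline(StandardScaler(), PCA(n_components=5), Ridge())
# pose_reg.fit(X_train, theta_train)                      # angles in degrees
# rmse = np.sqrt(np.mean((pose_reg.predict(X_test) - theta_test) ** 2))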
4) Contact Type Classification
Goal: Predict the extrinsic contact state of a rod (diagonal / line / in-hole).
Result: The classifier achieved 95% in-distribution accuracy; the most difficult cases occurred near the diagonal–line boundary. High-dimensional embeddings (~500 principal components) produced the strongest results.
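A sketch of the high-dimensional variant, with a confusion matrix to surface the diagonal-vs-line errors; the SVM classifier is an assumption.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

contact_clf = make_pipeline(StandardScaler(), PCA(n_components=500), SVC())
# contact_clf.fit(X_train, y_train)
# cm = confusion_matrix(y_test, contact_clf.predict(X_test),
#                       labels=["diagonal", "line", "in-hole"])
# Off-diagonal mass between "diagonal" and "line" marks the hard cases.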
Data Collection Methods
We use active acoustic sensing with a linear swept-sine excitation from 20 Hz to 20 kHz over 1 s, sampled at 44.1 kHz. Within each grasp, we collect 5 sweeps before regrasping; a sketch of how the recorded sweeps can be processed follows the task list below.
Figure: data collection setups for each task.
1) Object Classification
2) Grasping Position Classification
3) Pose Estimation from Internal Structure
4) Contact Type Classification (rod orientations: θx = 45°, θz = 45°; θx = 45°, 45° ≤ θz ≤ 90°)
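A minimal sketch of one plausible way to turn the 5 sweeps recorded within a grasp into a feature vector; the paper's exact feature extraction is not specified on this page.

import numpy as np

FS = 44_100  # samples per 1 s sweep

def sweep_features(sweeps):
    # sweeps: (5, FS) array of received sweeps from one grasp.
    mean_sweep = sweeps.mean(axis=0)            # average out per-sweep noise
    spectrum = np.abs(np.fft.rfft(mean_sweep))  # magnitude frequency response
    return np.log1p(spectrum)                   # compress dynamic range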
Task Learning
The contact type classifier was used as the sole feedback signal for a long-horizon peg insertion task. We built a simple simulator with access to ground-truth contact labels and trained an imitation learning policy that incorporates observation history to handle classifier uncertainty. The learned policy achieved a 95% success rate in simulation and transferred to the real robot: in UR5 rollouts, it achieved success rates of 90% in-distribution and 60% out-of-distribution.
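A sketch of the idea behind the simulated transition model: ground-truth contact labels are corrupted with a confusion matrix matched to the real classifier, so the policy learns to act under imperfect predictions. The matrix values and history length here are illustrative, not the measured ones.

import numpy as np

LABELS = ["diagonal", "line", "in-hole"]
# Row i: probability of each predicted label when the true contact type is i.
CONFUSION = np.array([
    [0.95, 0.04, 0.01],
    [0.05, 0.94, 0.01],
    [0.01, 0.01, 0.98],
])
rng = np.random.default_rng(0)

def noisy_observation(true_label):
    # Sample a simulated classifier prediction for the ground-truth label.
    return rng.choice(len(LABELS), p=CONFUSION[true_label])

# The policy conditions on a short history of these noisy observations,
# e.g. the last k one-hot predictions stacked into a single input vector.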
Sensing Robustness
We evaluate robustness by playing background music at 75 dB, under which the contact type classifier maintains 87% accuracy, indicating resilience to ambient acoustic and vibrational disturbances.
Acknowledgements
We thank Pedro Piacenza, Trey Smith, Brian Coltin, Peter Ballentine, and Ye Zhang for insightful discussions and support. We thank Hardware as Policy and UMI for website inspiration. This work was supported by a NASA Space Technology Graduate Research Opportunity.
Contact
If you have any questions, please feel free to contact Eric Chang or Zhanpeng He.