To complete real-world tasks in home environments, workplaces and public spaces, robots should be able to effectively grasp and manipulate a wide range of objects. In recent years, developers have created various machine learning–based models designed to enable skilled object manipulation in robots.
While some of these models achieved good results, they typically need to be pre-trained on large amounts of data to perform well. The datasets used to train these models consist primarily of visual data, such as annotated images and video footage captured using cameras, but some approaches also draw on other sensory inputs, such as tactile information.
Researchers at Carnegie Mellon University and Olin College of Engineering recently explored the possibility of using contact microphones in place of conventional tactile sensors, enabling the use of audio data to train machine learning models for robot manipulation. Their paper, posted to the preprint server arXiv, could open new opportunities for large-scale multi-sensory pre-training of these models.
“Although pre-training on a large amount of data is beneficial for robot learning, current paradigms only perform large-scale pretraining for visual representations, whereas representations for other modalities are trained from scratch,” Jared Mejia, Victoria Dean and their colleagues wrote in the paper.
“In contrast to the abundance of visual data, it is unclear what relevant internet-scale data may be used for pretraining other modalities such as tactile sensing. Such pretraining becomes increasingly important in the low-data regimes common in robotics applications. We address this gap by using contact microphones as an alternative tactile sensor.”
As part of their recent study, Mejia, Dean and their collaborators pre-trained a self-supervised machine learning model on audio-visual representations from the AudioSet dataset, which contains more than 2 million 10-second clips of sounds and music collected from the internet. Their pre-training relies on audio-visual instance discrimination (AVID), a contrastive method that learns to tell which audio and video segments belong to the same clip and which do not.
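In rough terms, AVID-style pretraining treats the audio and video from the same clip as a matched pair and all other pairings in a batch as mismatches. The snippet below is a minimal sketch of that idea, assuming simplified placeholder PyTorch encoders and a standard cross-modal contrastive (InfoNCE) objective; the architectures, tensor shapes and hyperparameters are illustrative assumptions, not those used in the paper.

```python
# Minimal sketch of AVID-style audio-visual instance discrimination.
# Encoders and shapes are placeholders, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioEncoder(nn.Module):
    """Placeholder: maps an audio spectrogram to a unit-norm embedding."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, dim),
        )
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class VideoEncoder(nn.Module):
    """Placeholder: maps a short stack of video frames to a unit-norm embedding."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, dim),
        )
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def avid_loss(audio_emb, video_emb, temperature=0.07):
    """Cross-modal InfoNCE: audio/video embeddings from the same clip
    are positives; every other pairing in the batch is a negative."""
    logits = audio_emb @ video_emb.t() / temperature
    targets = torch.arange(audio_emb.size(0))
    # Symmetric loss: audio-to-video and video-to-audio directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# One illustrative training step on random stand-in tensors.
audio_enc, video_enc = AudioEncoder(), VideoEncoder()
spectrograms = torch.randn(8, 1, 64, 64)   # batch of audio spectrograms
frames = torch.randn(8, 3, 4, 64, 64)      # batch of short video clips
loss = avid_loss(audio_enc(spectrograms), video_enc(frames))
loss.backward()
```

The key property of this objective is that it needs no labels: the natural co-occurrence of sound and image in internet video supplies the supervision, which is what makes internet-scale datasets such as AudioSet usable for pretraining.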
The researchers assessed their approach in a series of tests in which a robot had to complete real-world manipulation tasks from at most 60 demonstrations per task. Their findings were highly promising: their model outperformed robot manipulation policies that rely on visual data alone, particularly in cases where objects and locations differed markedly from those seen in the training data.
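The article says only that each policy learns from at most 60 demonstrations. A common recipe in that low-data setting is behavior cloning on top of frozen pretrained encoders, which the hedged sketch below illustrates, reusing the placeholder AudioEncoder and VideoEncoder classes from the previous snippet; the action space and all training details here are assumptions for illustration, not the paper's exact setup.

```python
# Hedged sketch: behavior cloning a manipulation policy on top of the
# pretrained (and here frozen) audio and video encoders defined above.
# Action dimensionality and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, audio_enc, video_enc, emb_dim=128, action_dim=7):
        super().__init__()
        self.audio_enc, self.video_enc = audio_enc, video_enc
        # Freeze the pretrained encoders; only the policy head is trained,
        # so the small demonstration set cannot wash out the representations.
        for p in self.audio_enc.parameters():
            p.requires_grad = False
        for p in self.video_enc.parameters():
            p.requires_grad = False
        self.head = nn.Sequential(
            nn.Linear(2 * emb_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),  # e.g., end-effector deltas + gripper
        )

    def forward(self, spectrogram, frames):
        feats = torch.cat([self.audio_enc(spectrogram),
                           self.video_enc(frames)], dim=-1)
        return self.head(feats)

policy = Policy(audio_enc, video_enc)
optimizer = torch.optim.Adam(policy.head.parameters(), lr=1e-4)

# One behavior-cloning step on stand-in demonstration data.
spec = torch.randn(8, 1, 64, 64)
frames = torch.randn(8, 3, 4, 64, 64)
expert_actions = torch.randn(8, 7)
optimizer.zero_grad()
loss = nn.functional.mse_loss(policy(spec, frames), expert_actions)
loss.backward()
optimizer.step()
```

Whether the paper's method freezes or fine-tunes the pretrained encoders is a detail this sketch does not assert; the point is simply that pretrained audio-visual features can stand in for representations that would otherwise have to be learned from scratch from a few dozen demonstrations.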
“Our key insight is that contact microphones capture inherently audio-based information, allowing us to leverage large-scale audio-visual pretraining to obtain representations that boost the performance of robotic manipulation,” Mejia, Dean and their colleagues wrote. “To the best of our knowledge, our method is the first approach leveraging large-scale multisensory pre-training for robot manipulation.”
In the future, the study by Mejia, Dean and their colleagues could open a new avenue toward skilled robot manipulation based on pre-trained multimodal machine learning models. Their proposed approach could soon be improved further and tested on a broader range of real-world manipulation tasks.
“Future work could investigate which properties of pre-training datasets are most conducive to learning audio-visual representations for manipulation policies,” Mejia, Dean and their colleagues wrote. “Further, a promising direction would be to equip end-effectors with visuo-tactile sensors and contact microphones with pre-trained audio representations to determine how to best leverage both for equipping robotic agents with a richer understanding of their environment.”
More information:
Jared Mejia et al, Hearing Touch: Audio-Visual Pretraining for Contact-Rich Manipulation, arXiv (2024). DOI: 10.48550/arxiv.2405.08576