Device-directed speech detection (DDSD) is the binary classification task of distinguishing between queries directed at a voice assistant and side conversations or background speech. State-of-the-art DDSD systems use verbal cues (for instance, acoustic, textual, and/or automatic speech recognition (ASR) features) to classify speech as device-directed or otherwise, and often have to contend with one or more of these modalities being unavailable when deployed in real-world settings. In this paper, we investigate fusion schemes for DDSD systems that can be made more robust to missing modalities. Concurrently, we study the use of non-verbal cues, specifically prosody features, in addition to verbal cues for DDSD. We present different approaches to combining scores and embeddings from prosody with the corresponding verbal cues, finding that prosody improves DDSD performance by up to 8.5% in terms of false acceptance rate (FA) at a fixed operating point via non-linear intermediate fusion, while our use of modality dropout techniques improves the performance of these models by 7.4% in terms of FA when evaluated with missing modalities at inference time.
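To make the two ideas above concrete, the sketch below shows one way to implement non-linear intermediate fusion of prosody and verbal embeddings together with modality dropout during training. All layer sizes, the embedding dimensions, the dropout probability, and the class name are illustrative assumptions; this is a minimal sketch, not a reproduction of the architecture described in the paper.

```python
import torch
import torch.nn as nn


class IntermediateFusionDDSD(nn.Module):
    """Minimal sketch: non-linear intermediate fusion for DDSD.

    Acoustic, text/ASR, and prosody embeddings are concatenated and
    passed through a small MLP that outputs a single device-directed
    logit. Dimensions are assumptions for illustration only.
    """

    def __init__(self, acoustic_dim=256, text_dim=128, prosody_dim=64, hidden=128):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(acoustic_dim + text_dim + prosody_dim, hidden),
            nn.ReLU(),  # non-linearity makes this "non-linear" fusion
            nn.Linear(hidden, 1),  # logit: device-directed vs. not
        )

    def forward(self, acoustic, text, prosody, p_drop=0.25):
        embs = [acoustic, text, prosody]
        if self.training:
            # Modality dropout: randomly replace an entire modality
            # embedding with zeros during training, so the model learns
            # to cope with that modality being missing at inference.
            embs = [
                torch.zeros_like(e) if torch.rand(()) < p_drop else e
                for e in embs
            ]
        fused = torch.cat(embs, dim=-1)
        return self.fusion(fused)


# Usage with random stand-in features (batch of 4 utterances):
model = IntermediateFusionDDSD()
model.train()
logit = model(torch.randn(4, 256), torch.randn(4, 128), torch.randn(4, 64))

# At inference, a missing modality can simply be passed as zeros:
model.eval()
logit = model(torch.randn(4, 256), torch.zeros(4, 128), torch.randn(4, 64))
```

In this sketch each modality is dropped independently, so in rare cases all three could be zeroed in one step; a practical variant might re-sample or always keep at least one modality.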