*=Equal Contributors
Current machine learning models for vision are often highly specialized and limited to a single modality and task. In contrast, recent large language models exhibit a wide range of capabilities, hinting at a possibility for similarly versatile models in computer vision. In this paper, we take a step in this direction and propose a multimodal training scheme called 4M. It consists of training a single unified Transformer encoder-decoder using a masked modeling objective across a wide range of input/output modalities – including text, images, geometric, and semantic modalities, as well as neural network feature maps. 4M achieves scalability by unifying the representation space of all modalities through mapping them into discrete tokens and performing multimodal masked modeling on a small randomized subset of tokens.
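To make the training scheme concrete, the following is a minimal sketch of the token-subset sampling step described above: every modality is represented as discrete tokens, and a small random subset is drawn as encoder input while a disjoint subset serves as decoder targets. The modality names, token IDs, and function name are illustrative assumptions; the actual method relies on learned tokenizers and a Transformer encoder-decoder.

```python
import random

def sample_masked_modeling_inputs(modality_tokens, n_input, n_target, rng):
    """Pool all modalities' discrete tokens, then draw a small random
    subset as encoder inputs and a disjoint subset as decoder targets.
    (Hypothetical helper; names and token IDs are illustrative.)"""
    all_tokens = [
        (modality, position, token)
        for modality, tokens in modality_tokens.items()
        for position, token in enumerate(tokens)
    ]
    rng.shuffle(all_tokens)
    encoder_inputs = all_tokens[:n_input]
    decoder_targets = all_tokens[n_input:n_input + n_target]
    return encoder_inputs, decoder_targets

# Toy example: a few discrete tokens per modality (IDs are made up).
tokens = {
    "rgb":     [12, 7, 99, 3, 41, 8],
    "depth":   [5, 5, 61, 2],
    "caption": [101, 17, 56],
}
enc, dec = sample_masked_modeling_inputs(
    tokens, n_input=4, n_target=3, rng=random.Random(0)
)
# The encoder sees only `enc`; the decoder is trained to predict the
# tokens in `dec`. Because both subsets are small and fixed in size,
# the training cost stays bounded as more modalities are added.
```

Sampling a fixed-size subset rather than all tokens is what the abstract refers to as the key to scalability: the compute per training example does not grow with the number of modalities.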
4M leads to models that exhibit several key capabilities: (1) they can perform a diverse set of vision tasks out of the box, (2) they excel when fine-tuned for unseen downstream tasks or new input modalities, and (3) they can function as a generative model that can be conditioned on arbitrary modalities, enabling a wide variety of expressive multimodal editing capabilities with remarkable flexibility.
Through experimental analyses, we demonstrate the potential of 4M for training versatile and scalable foundation models for vision tasks, setting the stage for further exploration in multimodal learning for vision and other domains.