This paper was accepted at the UniReps Workshop at NeurIPS 2023.
The landscape of publicly available vision foundation models (VFMs), such as CLIP and Segment Anything Model (SAM), is expanding rapidly. VFMs are endowed with distinct capabilities stemming from their pre-training objectives. For instance, CLIP excels in semantic understanding, while SAM specializes in spatial understanding for segmentation. In this work, we introduce a simple recipe to efficiently merge VFMs into a unified model that absorbs their expertise. Our method integrates techniques of multi-task learning, continual learning, and distillation. Further, it demands significantly less computational cost compared to traditional multi-task training from scratch, and it only requires a small fraction of the pre-training datasets that were originally used to train the individual models. By applying our method to SAM and CLIP, we obtain SAM-CLIP: a unified model that combines the capabilities of SAM and CLIP into a single vision transformer. Compared with deploying SAM and CLIP independently, our merged model, SAM-CLIP, reduces storage and compute costs for inference, making it well-suited for edge device applications. We show that SAM-CLIP not only retains the foundational strengths of SAM and CLIP, but also introduces synergistic functionalities, notably in zero-shot semantic segmentation, where SAM-CLIP establishes new state-of-the-art results on 5 benchmarks. It outperforms previous models that are specifically designed for this task by a large margin, including +6.8% and +5.9% mean IoU improvement on Pascal-VOC and COCO-Stuff datasets, respectively.
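To make the merging recipe concrete, the following is a minimal, hypothetical sketch of a multi-task distillation loop in which a single shared backbone with two lightweight heads is trained to match frozen SAM and CLIP teachers on small subsets of their respective pre-training data. The module names, loss choices, and weighting below are illustrative assumptions, not the paper's exact implementation.

```python
# Hypothetical sketch of merging two teachers into one backbone via distillation.
# All names and loss choices are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MergedBackboneWithHeads(nn.Module):
    """Shared ViT backbone with one head per teacher (SAM-style and CLIP-style)."""

    def __init__(self, backbone: nn.Module, seg_head: nn.Module, clip_head: nn.Module):
        super().__init__()
        self.backbone = backbone    # e.g. initialized from SAM's image encoder (assumption)
        self.seg_head = seg_head    # reproduces SAM-style spatial features
        self.clip_head = clip_head  # reproduces CLIP-style image embeddings

    def forward(self, images: torch.Tensor):
        feats = self.backbone(images)
        return self.seg_head(feats), self.clip_head(feats)


def distillation_step(model, sam_teacher, clip_teacher, seg_images, clip_images, optimizer):
    """One multi-task distillation step: each head matches its frozen teacher on its own data."""
    with torch.no_grad():
        sam_target = sam_teacher(seg_images)     # frozen SAM spatial features
        clip_target = clip_teacher(clip_images)  # frozen CLIP image embeddings

    seg_pred, _ = model(seg_images)
    _, clip_pred = model(clip_images)

    # Cosine-similarity distillation losses; equal weighting is an illustrative choice.
    seg_loss = 1 - F.cosine_similarity(seg_pred.flatten(1), sam_target.flatten(1)).mean()
    clip_loss = 1 - F.cosine_similarity(clip_pred, clip_target).mean()
    loss = seg_loss + clip_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because only small rehearsal-style subsets of the original pre-training data are used and the teachers stay frozen, such a loop costs far less than multi-task training from scratch; the continual-learning aspect (preserving the backbone's original capability while absorbing the second teacher) would be handled by the relative weighting of the two losses in this sketch.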