*=Equal Contributors
Preserving training dynamics across batch sizes is an important tool for practical machine learning, as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule; for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important machine learning tool is the model EMA, a functional copy of a target model whose parameters move towards those of its target model according to an Exponential Moving Average (EMA) at a rate parameterized by a momentum hyperparameter. This model EMA can improve the robustness and generalization of supervised learning, stabilize pseudo-labeling, and provide a learning signal for Self-Supervised Learning (SSL). Prior works have not considered the optimization of the model EMA when performing scaling, leading to different training dynamics across batch sizes and lower model performance. In this work, we provide a scaling rule for optimization in the presence of a model EMA and demonstrate the rule's validity across a range of architectures, optimizers, and data modalities. We also show the rule's validity where the model EMA contributes to the optimization of the target model, enabling us to train EMA-based pseudo-labeling and SSL methods at both small and large batch sizes. For SSL, we enable training of BYOL up to batch size 24,576 without sacrificing performance, a 6× wall-clock time reduction under idealized hardware settings.
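The two ingredients the abstract describes, an SGD step followed by an EMA update of a copy of the model, and a scaling rule applied when the batch size grows, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the choice of exponentiating the momentum by the batch-size factor `kappa` are assumptions for the sketch (the linear learning-rate scaling is the standard SGD rule the abstract cites).

```python
def sgd_with_ema(theta, zeta, grad, lr, rho):
    """One SGD step on the target parameters `theta`, followed by an
    EMA update moving the copy `zeta` towards `theta` at a rate set
    by the momentum `rho` (larger rho = slower tracking)."""
    theta = theta - lr * grad
    zeta = rho * zeta + (1 - rho) * theta
    return theta, zeta

def scale_hyperparams(lr, rho, kappa):
    """Hypothetical joint scaling when the batch size grows by a
    factor `kappa`: the learning rate scales linearly (the standard
    SGD rule), and here the EMA momentum is exponentiated by kappa,
    one natural way to preserve the EMA's decay per sample seen."""
    return lr * kappa, rho ** kappa
```

For example, with `kappa = 2`, a learning rate of 0.1 becomes 0.2, while a momentum of 0.9 becomes 0.81, so the EMA forgets twice as fast per step, matching the fact that each step now consumes twice as many samples.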