Despite their remarkable achievements, modern Large Language Models (LLMs) face exorbitant computational and memory footprints. Recently, several works have shown significant success in training-free and data-free compression (pruning and quantization) of LLMs, achieving 50-60% sparsity and reducing the bit-width down to 3 or 4 bits per weight, with negligible perplexity degradation over the uncompressed baseline. As recent research efforts focus on developing increasingly sophisticated compression methods, our work takes a step back and re-evaluates the effectiveness of existing SoTA compression methods, which rely on a fairly simple and widely questioned metric, perplexity (even for dense LLMs). We introduce the Knowledge-Intensive Compressed LLM BenchmarK (LLM-KICK), a collection of carefully curated tasks to redefine the evaluation protocol for compressed LLMs, which have significant alignment with their dense counterparts, and for which perplexity fails to capture subtle changes in their true capabilities. LLM-KICK unveils many favorable merits and unfortunate plights of current SoTA compression methods: all pruning methods suffer significant performance degradation, sometimes at trivial sparsity ratios (e.g., 25-30%), and fail for N:M sparsity on knowledge-intensive tasks; current quantization methods are more successful than pruning; yet, pruned LLMs even at 50% sparsity are robust in-context retrieval and summarization systems; among others. LLM-KICK is designed to holistically assess compressed LLMs' ability for language understanding, reasoning, generation, in-context retrieval, in-context summarization, etc. We hope our study can foster the development of better LLM compression methods.