This paper has been accepted at the Data Problems for Foundation Models workshop at ICLR 2024.
Large language models are trained on massive scrapes of the web, which are often unstructured, noisy, and poorly phrased. Current scaling laws show that learning from such data requires an abundance of both compute and data, which grows with the size of the model being trained. This is infeasible both because of the large compute costs and duration associated with pre-training, and the impending scarcity of high-quality data on the web. In this work, we propose Web Rephrase Augmented Pre-training (WRAP), which uses an off-the-shelf instruction-tuned model prompted to paraphrase documents on the web in specific styles such as "like Wikipedia" or in "question-answer format" to jointly pre-train LLMs on real and synthetic rephrases. First, we show that using WRAP on the C4 dataset, which is naturally noisy, speeds up pre-training by ~3x. At the same pre-training compute budget, it improves perplexity by more than 10% on average across different subsets of the Pile, and improves zero-shot question-answering accuracy across 13 tasks by more than 2%. Second, we investigate the impact of the rephrasing style on the performance of the model, offering insights into how the composition of the training data can affect the performance of LLMs in OOD settings. Our gains are attributed to the fact that rephrased synthetic data (i) contains style diversity that closely reflects downstream evaluation styles, and (ii) has higher "quality" than web-scraped data.
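To make the rephrasing step concrete, the sketch below shows one way an off-the-shelf instruction-tuned model could be prompted to paraphrase a web document "like Wikipedia" or in question-answer format, and how the real and synthetic text could be interleaved for joint pre-training. The model name, prompt wording, and 1:1 mixing ratio are illustrative assumptions, not the exact configuration used in WRAP.

```python
# Minimal sketch of a WRAP-style rephrasing pipeline (illustrative only).
# The rephraser model, prompt templates, and mixing ratio are assumptions.
from transformers import pipeline

# Any off-the-shelf instruction-tuned model can act as the rephraser;
# this checkpoint is a stand-in, not the one used in the paper.
rephraser = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

STYLE_PROMPTS = {
    "wikipedia": "Rephrase the following text in a highly articulate style, like Wikipedia:\n\n{doc}",
    "qa": "Convert the following text into a question-answer format:\n\n{doc}",
}

def rephrase(doc: str, style: str = "wikipedia", max_new_tokens: int = 512) -> str:
    """Paraphrase one web document in the requested style."""
    prompt = STYLE_PROMPTS[style].format(doc=doc)
    out = rephraser(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    # Strip the prompt prefix so only the synthetic rephrase remains.
    return out[0]["generated_text"][len(prompt):].strip()

def build_training_mix(real_docs, style: str = "wikipedia"):
    """Yield each real document alongside its synthetic rephrase (1:1 mix),
    so the LM is jointly pre-trained on real and rephrased text."""
    for doc in real_docs:
        yield doc                    # real web text
        yield rephrase(doc, style)   # synthetic rephrase
```

In practice the rephrases would be generated once offline and stored, so the extra cost is a single pass of the instruction-tuned model over the corpus rather than per-training-step generation.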