Language models can explain neurons in language models

Although the vast majority of our explanations score poorly, we believe we can now use ML techniques to further improve our ability to produce explanations. For example, we found we were able to improve scores by:

Iterating on explanations. We can increase scores by asking GPT-4 to come up with possible counterexamples, then revising explanations in light of their activations.

Using larger models to give explanations. The average score goes up as the explainer model’s capabilities increase. However, even GPT-4 gives worse explanations than humans, suggesting room for improvement.

Changing the architecture of the explained model. Training models with different activation functions improved explanation scores.
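The first item, iterating on explanations, can be pictured as a small loop. The sketch below is only an illustration under stated assumptions: every name in it is hypothetical, the four callables stand in for model calls and activation lookups that are not specified here, and the rule of keeping a revision only when it scores higher is a choice made for this example, not a description of the actual procedure.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins for model calls; none of these names come from the source.
Score = Callable[[str], float]                         # explanation -> score
Counterexamples = Callable[[str], List[str]]           # explanation -> candidate texts
Activation = Callable[[str], float]                    # text -> real neuron activation
Revise = Callable[[str, List[Tuple[str, float]]], str]  # (explanation, evidence) -> new explanation


def refine_explanation(
    explanation: str,
    score: Score,
    propose_counterexamples: Counterexamples,
    real_activation: Activation,
    revise: Revise,
    rounds: int = 3,
) -> Tuple[str, float]:
    """Return the best-scoring explanation seen across a few revision rounds."""
    best, best_score = explanation, score(explanation)
    for _ in range(rounds):
        # Ask for texts that might break the current best explanation.
        texts = propose_counterexamples(best)
        # Record how the neuron actually responds to each text.
        evidence = [(text, real_activation(text)) for text in texts]
        # Revise in light of that evidence and re-score.
        candidate = revise(best, evidence)
        candidate_score = score(candidate)
        # Assumption: accept a revision only if it scores strictly higher.
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

Passing the model calls in as arguments keeps the loop independent of any particular API, so it can be exercised with simple stub functions before a real explainer model is attached.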

We are open-sourcing our datasets and visualization tools for GPT-4-written explanations of all 307,200 neurons in GPT-2, as well as code for explanation and scoring using publicly available models on the OpenAI API. We hope the research community will develop new techniques for generating higher-scoring explanations and better tools for exploring GPT-2 using explanations.

We found over 1,000 neurons with explanations that scored at least 0.8, meaning that according to GPT-4 they account for most of the neuron’s top-activating behavior. Most of these well-explained neurons are not very interesting. However, we also found many interesting neurons that GPT-4 didn’t understand. We hope that as explanations improve we may be able to rapidly uncover interesting qualitative understanding of model computations.
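To make the 0.8 threshold concrete, here is a minimal sketch of scoring and filtering. This section does not define how a score is computed, so the sketch assumes it is the Pearson correlation between activations simulated from an explanation and the neuron's real activations. The function names and the dictionary layout are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def correlation_score(simulated: Sequence[float], real: Sequence[float]) -> float:
    """Pearson correlation between simulated and real activations (assumed metric)."""
    if len(simulated) != len(real) or len(real) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_s = sum(simulated) / len(simulated)
    mean_r = sum(real) / len(real)
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(simulated, real))
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    var_r = sum((r - mean_r) ** 2 for r in real)
    if var_s == 0 or var_r == 0:
        # A constant sequence carries no signal; treat it as unexplained.
        return 0.0
    return cov / sqrt(var_s * var_r)


def well_explained(scores: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Return neuron ids whose score is at least the threshold, best first."""
    kept = [neuron for neuron, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda neuron: scores[neuron], reverse=True)
```

Given a mapping from hypothetical neuron ids to scores, `well_explained(scores)` returns the ids at or above 0.8, which mirrors the filter described above.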
