Unveiling the Inner Workings of LLMs: A Singular Value Perspective | by Louis Owen

Now, let’s jump into the real deal of this article: analyzing the (Q, K, V, O) matrices of the Llama-3-8B-Instruct model through their singular values!

The Code

Let’s first import all the packages needed for this analysis.

import transformers
import torch
import numpy as np
from transformers import AutoConfig, LlamaModel
from safetensors import safe_open
import os
import matplotlib.pyplot as plt

Then, let’s download the model and save it into our local /tmp directory.

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
!huggingface-cli download {MODEL_ID} --quiet --local-dir /tmp/{MODEL_ID}

If you’re GPU-rich, the following code might not be relevant for you. However, if you’re GPU-poor like me, it will be really useful for loading only specific layers of the Llama-3-8B model.

def load_specific_layers_safetensors(model, model_name, layer_to_load):
    state_dict = {}
    files = [f for f in os.listdir(model_name) if f.endswith('.safetensors')]
    for file in files:
        filepath = os.path.join(model_name, file)
        with safe_open(filepath, framework="pt") as f:
            for key in f.keys():
                if f"layers.{layer_to_load}." in key:
                    # Rename the key so the weights land in the single layer of our 1-layer model
                    new_key = key.replace(f"model.layers.{layer_to_load}.", 'layers.0.')
                    state_dict[new_key] = f.get_tensor(key)

    missing_keys, unexpected_keys = model.load_state_dict(state_dict, strict=False)
    if missing_keys:
        print(f"Missing keys: {missing_keys}")
    if unexpected_keys:
        print(f"Unexpected keys: {unexpected_keys}")
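To make the key renaming in the loader concrete, here is a minimal sketch of the same idea on plain strings. The helper name remap_layer_key is hypothetical and the key names are illustrative checkpoint-style names; it also uses a prefix check instead of the substring check in the function above, which is a small variation rather than the article's exact logic.

```python
def remap_layer_key(key, layer_to_load):
    """Rename a checkpoint key to slot 0, or return None for other layers.

    Hypothetical helper for illustration only; not part of any library.
    """
    prefix = f"model.layers.{layer_to_load}."
    if not key.startswith(prefix):
        return None
    return "layers.0." + key[len(prefix):]


# Illustrative checkpoint-style key names.
keys = [
    "model.layers.1.self_attn.q_proj.weight",
    "model.layers.10.self_attn.q_proj.weight",
    "model.embed_tokens.weight",
]

# Only the layer-1 key is kept, and it is renamed to layer slot 0.
remapped = {k: remap_layer_key(k, 1) for k in keys}
for original, new in remapped.items():
    print(original, "->", new)
```

The trailing dot in the prefix is what keeps layer 1 from also matching layer 10.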

We load only one layer at a time because the free tier of Google Colab’s GPU does not have enough memory to load Llama-3-8B, even in fp16 precision. Furthermore, this analysis requires us to work in fp32 precision because of how np.linalg.svd is built (NumPy’s linalg routines do not accept float16 arrays). Next, we can define the main function to get the singular values for a given matrix_type, layer_number, and head_number.

def get_singular_values(model_path, matrix_type, layer_number, head_number):
    """
    Computes the singular values of the specified matrix in the Llama-3 model.

    Parameters:
    model_path (str): Path to the model
    matrix_type (str): Type of matrix ('q', 'k', 'v', 'o')
    layer_number (int): Layer number (0 to 31)
    head_number (int): Head number (0 to 31)

    Returns:
    list: Singular values, in descending order
    """
    assert matrix_type in ['q', 'k', 'v', 'o'], "Invalid matrix type"
    assert 0 <= layer_number < 32, "Invalid layer number"
    assert 0 <= head_number < 32, "Invalid head number"

    # Load the model only for that specific layer since we have limited RAM even after using fp16
    config = AutoConfig.from_pretrained(model_path)
    config.num_hidden_layers = 1
    model = LlamaModel(config)
    load_specific_layers_safetensors(model, model_path, layer_number)

    # Access the specified layer
    # Always index 0 since we have loaded only the specific layer
    layer = model.layers[0]

    # Determine the size of each head
    num_heads = layer.self_attn.num_heads
    head_dim = layer.self_attn.head_dim

    # Access the specified matrix
    weight_matrix = getattr(layer.self_attn, f"{matrix_type}_proj").weight.detach().numpy()
    if matrix_type in ['q', 'o']:
        start = head_number * head_dim
        end = (head_number + 1) * head_dim
    else:  # 'k', 'v' matrices
        # Adjust the head_number based on num_key_value_heads
        # This is done since llama3-8b uses Grouped Query Attention
        num_key_value_groups = num_heads // config.num_key_value_heads
        head_number_kv = head_number // num_key_value_groups
        start = head_number_kv * head_dim
        end = (head_number_kv + 1) * head_dim

    # Extract the weights for the specified head
    if matrix_type in ['q', 'k', 'v']:
        weight_matrix = weight_matrix[start:end, :]
    else:  # 'o' matrix
        weight_matrix = weight_matrix[:, start:end]

    # Compute singular values
    singular_values = np.linalg.svd(weight_matrix, compute_uv=False)

    del model, config

    return list(singular_values)
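The head-index adjustment for the K and V matrices can be easier to follow with numbers. Below is a small sketch of that arithmetic on its own, assuming 32 query heads, 8 key/value heads, and a head dimension of 128 (my understanding of the Llama-3-8B configuration; the function above reads these from the loaded config instead). The helper names are hypothetical.

```python
def kv_head_for_query_head(head_number, num_heads=32, num_key_value_heads=8):
    """Map a query-head index to the K/V head it shares under Grouped Query Attention.

    Default sizes are assumptions for illustration, not read from a real config.
    """
    num_key_value_groups = num_heads // num_key_value_heads  # query heads per K/V head
    return head_number // num_key_value_groups


def slice_range(head_index, head_dim=128):
    """Return the (start, end) slice bounds for a head of size head_dim (assumed 128)."""
    return head_index * head_dim, (head_index + 1) * head_dim


for head in (0, 3, 4, 31):
    kv_head = kv_head_for_query_head(head)
    print(f"query head {head} -> kv head {kv_head}, rows {slice_range(kv_head)}")
```

Under these assumptions, four consecutive query heads share one K/V head, so query heads 0 to 3 all return the same singular values for 'k' and 'v'.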

It’s worth noting that we can extract the weights for the specified head from the Q, K, and V matrices by row-wise slicing because of how they are implemented by HuggingFace.
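The row-wise slicing claim can be checked numerically on a toy example. The sketch below uses made-up sizes and random weights, and assumes the usual layout where the projection output is reshaped into (num_heads, head_dim) along its last axis; it is an illustration, not the HuggingFace code itself.

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens, d_model, num_heads, head_dim = 3, 16, 4, 4  # toy sizes (assumed)

x = rng.standard_normal((num_tokens, d_model))
# PyTorch-style weight layout: (d_out, d_in)
W_q = rng.standard_normal((num_heads * head_dim, d_model))

# Full projection, then split the output features into heads.
q_all = (x @ W_q.T).reshape(num_tokens, num_heads, head_dim)

# Projection using only the rows that belong to one head.
head = 2
W_q_head = W_q[head * head_dim:(head + 1) * head_dim, :]
q_head = x @ W_q_head.T

rows_match = np.allclose(q_all[:, head, :], q_head)
print("row slice reproduces the head's queries:", rows_match)
```

Because each output feature is produced by exactly one row of the weight, the rows for a head form a self-contained (head_dim, d_model) matrix whose singular values can be studied on their own.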

Q, K, V Matrices Implementation in HuggingFace. Note that in PyTorch the matrix dimension will be in (d_out, d_in). Source: Image by Author.

As for the O matrix, we can do column-wise slicing to extract the weights for the specified head thanks to linear algebra! Details can be seen in the following figure.

Reasoning on why we can extract the specified head from the O weight matrix by column-wise slicing. Source: Image by Author.
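Since the figure is not reproduced here, the same reasoning can be sketched numerically: multiplying the concatenated head outputs by the O weight equals the sum of each head's output multiplied by its own block of columns. The sizes and random values below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens, d_model, num_heads, head_dim = 3, 16, 4, 4  # toy sizes (assumed)

# Per-head attention outputs: one (num_tokens, head_dim) array per head.
heads = rng.standard_normal((num_heads, num_tokens, head_dim))
# PyTorch-style weight layout: (d_out, d_in)
W_o = rng.standard_normal((d_model, num_heads * head_dim))

# Standard path: concatenate the heads, then apply the output projection.
concat = np.concatenate(list(heads), axis=-1)  # (num_tokens, num_heads * head_dim)
out_full = concat @ W_o.T

# Per-head path: each head only ever touches its own block of columns of W_o.
out_sum = sum(
    heads[h] @ W_o[:, h * head_dim:(h + 1) * head_dim].T
    for h in range(num_heads)
)

columns_match = np.allclose(out_full, out_sum)
print("column blocks reproduce the output projection:", columns_match)
```

This is why the (d_model, head_dim) column block is the natural per-head slice of the O matrix.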

Unveiling the Inner Workings of LLMs: A Singular Value Perspective | by Louis Owen | Jun, 2024
