Now, let's jump into the real deal of this article: analyzing the (Q, K, V, O) matrices of the Llama-3-8B-Instruct model via their singular values!
The Code
Let's first import all the necessary packages for this analysis.
import transformers
import torch
import numpy as np
from transformers import AutoConfig, LlamaModel
from safetensors import safe_open
import os
import matplotlib.pyplot as plt
Then, let's download the model and save it into our local /tmp directory.
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
!huggingface-cli download {MODEL_ID} --quiet --local-dir /tmp/{MODEL_ID}
If you're GPU-rich, the following code might not be relevant for you. However, if you're GPU-poor like me, it will be really useful for loading only specific layers of the Llama-3-8B model.
def load_specific_layers_safetensors(model, model_name, layer_to_load):
    state_dict = {}
    files = [f for f in os.listdir(model_name) if f.endswith('.safetensors')]
    for file in files:
        filepath = os.path.join(model_name, file)
        with safe_open(filepath, framework="pt") as f:
            for key in f.keys():
                # Keep only the tensors that belong to the requested layer
                if f"layers.{layer_to_load}." in key:
                    # Rename to layer 0, since the target model has a single layer
                    new_key = key.replace(f"model.layers.{layer_to_load}.", 'layers.0.')
                    state_dict[new_key] = f.get_tensor(key)

    missing_keys, unexpected_keys = model.load_state_dict(state_dict, strict=False)
    if missing_keys:
        print(f"Missing keys: {missing_keys}")
    if unexpected_keys:
        print(f"Unexpected keys: {unexpected_keys}")
The reason we do this is that the free tier of Google Colab GPU does not have enough memory to load Llama-3-8B, even with fp16 precision. Furthermore, this analysis requires us to work in fp32 precision because of how np.linalg.svd is built. Next, we can define the main function to get the singular values for a given matrix_type, layer_number, and head_number.
def get_singular_values(model_path, matrix_type, layer_number, head_number):
    """
    Computes the singular values of the specified matrix in the Llama-3 model.

    Parameters:
    model_path (str): Path to the model
    matrix_type (str): Type of matrix ('q', 'k', 'v', 'o')
    layer_number (int): Layer number (0 to 31)
    head_number (int): Head number (0 to 31)

    Returns:
    list: Singular values of the selected head's weight matrix
    """
    assert matrix_type in ['q', 'k', 'v', 'o'], "Invalid matrix type"
    assert 0 <= layer_number < 32, "Invalid layer number"
    assert 0 <= head_number < 32, "Invalid head number"

    # Load the model only for that specific layer since we have limited RAM even after using fp16
    config = AutoConfig.from_pretrained(model_path)
    config.num_hidden_layers = 1
    model = LlamaModel(config)
    load_specific_layers_safetensors(model, model_path, layer_number)

    # Access the specified layer
    # Always index 0 since we have loaded only the specific layer
    layer = model.layers[0]

    # Determine the size of each head
    num_heads = layer.self_attn.num_heads
    head_dim = layer.self_attn.head_dim

    # Access the specified matrix
    weight_matrix = getattr(layer.self_attn, f"{matrix_type}_proj").weight.detach().numpy()
    if matrix_type in ['q', 'o']:
        start = head_number * head_dim
        end = (head_number + 1) * head_dim
    else:  # 'k', 'v' matrices
        # Adjust the head_number based on num_key_value_heads
        # This is done since llama3-8b uses Grouped Query Attention
        num_key_value_groups = num_heads // config.num_key_value_heads
        head_number_kv = head_number // num_key_value_groups
        start = head_number_kv * head_dim
        end = (head_number_kv + 1) * head_dim

    # Extract the weights for the specified head
    if matrix_type in ['q', 'k', 'v']:
        weight_matrix = weight_matrix[start:end, :]
    else:  # 'o' matrix
        weight_matrix = weight_matrix[:, start:end]

    # Compute singular values
    singular_values = np.linalg.svd(weight_matrix, compute_uv=False)

    del model, config

    return list(singular_values)
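To make the Grouped Query Attention index arithmetic above easier to follow, here is a minimal sketch on toy weights. The helper names kv_head_index and head_rows are hypothetical and not part of the original code, and the toy sizes (8 query heads, 2 key/value heads) are assumptions chosen to keep the arrays small; Llama-3-8B itself uses 32 query heads and 8 key/value heads.

```python
import numpy as np


def kv_head_index(head_number: int, num_heads: int, num_kv_heads: int) -> int:
    """Map a query-head index to the key/value head it shares under GQA."""
    group_size = num_heads // num_kv_heads  # query heads per key/value head
    return head_number // group_size


def head_rows(weight: np.ndarray, head_index: int, head_dim: int) -> np.ndarray:
    """Slice the block of rows that belongs to one head."""
    return weight[head_index * head_dim:(head_index + 1) * head_dim, :]


# Toy sizes (assumed for illustration only)
num_heads, num_kv_heads, head_dim = 8, 2, 4
hidden = num_heads * head_dim

rng = np.random.default_rng(0)
# Weights in (out_features, in_features) layout, cast to fp32 for the SVD
q_weight = rng.standard_normal((num_heads * head_dim, hidden)).astype(np.float32)
k_weight = rng.standard_normal((num_kv_heads * head_dim, hidden)).astype(np.float32)

# Query head 5 has its own rows in Q, but shares key/value head 1 in K
q_slice = head_rows(q_weight, 5, head_dim)
k_slice = head_rows(k_weight, kv_head_index(5, num_heads, num_kv_heads), head_dim)

# One singular value per row of the (head_dim, hidden) slice
singular_values = np.linalg.svd(k_slice, compute_uv=False)
```

One consequence of this mapping is that query heads in the same group point at the same K and V slice, so their K and V singular values are identical.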
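It's worth noting that we can extract the weights for a specific head from the Q, K, and V matrices by row-wise slicing because of how these projections are implemented by HuggingFace: each one is a linear layer whose weight is stored with shape (out_features, in_features), so every head owns a contiguous block of rows.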
As for the O matrix, we slice column-wise instead to extract the weights for a specific head, thanks to linear algebra! The input to the O projection is the concatenation of all head outputs, so each head corresponds to a block of columns. Details can be seen in the following figure.
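The slicing argument can also be checked numerically. The sketch below uses random toy matrices (all sizes and variable names are assumptions for illustration, not values from the model) and the (out_features, in_features) weight layout used by PyTorch linear layers. It verifies that a row block of the Q weight produces one head's query vectors, and that a column block of the O weight gives that head's contribution to the output.

```python
import numpy as np

rng = np.random.default_rng(0)
num_heads, head_dim, seq_len = 4, 3, 5  # toy sizes, assumed
hidden = num_heads * head_dim

x = rng.standard_normal((seq_len, hidden))   # hidden states
w_q = rng.standard_normal((hidden, hidden))  # q_proj weight, (out, in)
w_o = rng.standard_normal((hidden, hidden))  # o_proj weight, (out, in)

h = 2  # head to inspect
rows = slice(h * head_dim, (h + 1) * head_dim)

# Row-wise slicing: head h's queries come from rows [h*d, (h+1)*d) of w_q
q_full = x @ w_q.T
q_head = x @ w_q[rows, :].T
rows_match = np.allclose(q_full[:, rows], q_head)

# Column-wise slicing: the o_proj input is the concatenation of head outputs,
# so the full projection equals the sum of per-head column-block projections
head_outputs = [rng.standard_normal((seq_len, head_dim)) for _ in range(num_heads)]
concat = np.concatenate(head_outputs, axis=-1)  # (seq_len, hidden)
o_full = concat @ w_o.T
o_per_head = sum(
    head_outputs[i] @ w_o[:, i * head_dim:(i + 1) * head_dim].T
    for i in range(num_heads)
)
cols_match = np.allclose(o_full, o_per_head)
```

Both checks should come out true, which is why the per-head O matrix analysed here is the column block w_o[:, start:end] rather than a row block.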