The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring 2× fewer pre-training tokens.
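The layer-wise scaling idea can be sketched as follows: rather than giving every transformer layer the same number of attention heads and the same feed-forward width, the per-layer dimensions are interpolated across the depth of the network. The function below is a minimal illustration under assumed interpolation ranges (`alpha`, `beta`); the names and default values are illustrative, not OpenELM's actual configuration.

```python
def layer_wise_config(num_layers, d_model, head_dim,
                      alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    """Return per-layer (num_heads, ffn_dim) pairs.

    alpha: (min, max) scale on the attention-head count.
    beta:  (min, max) multiplier on the FFN hidden width.
    Both are interpolated linearly from the first layer to the last,
    so early layers are narrower and later layers wider.
    """
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        a = alpha[0] + t * (alpha[1] - alpha[0])  # head-count scale
        b = beta[0] + t * (beta[1] - beta[0])     # FFN-width multiplier
        num_heads = max(1, int(a * d_model / head_dim))
        ffn_dim = int(b * d_model)
        configs.append((num_heads, ffn_dim))
    return configs

# Example: a toy 4-layer model with d_model=512 and 64-dim heads.
for i, (h, f) in enumerate(layer_wise_config(4, 512, 64)):
    print(f"layer {i}: heads={h}, ffn_dim={f}")
```

Under this scheme the total parameter budget is spread non-uniformly across depth, which is the mechanism the accuracy claim above attributes the gains to.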
Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to the MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors.