Understanding GPT-Neo, GPT-J, GLM, OPT, BLOOM, and more…
Research on language modeling has a long history that dates back to models like GPT and GPT-2, and even to RNN-based methods (e.g., ULMFiT) that predate modern, transformer-based language models. Despite this long history, however, language models have only become popular relatively recently. The first surge in popularity came with the proposal of GPT-3 [1], which showed that impressive few-shot learning performance could be achieved across many tasks via a combination of self-supervised pre-training and in-context learning; see below.
After this, the recognition garnered by GPT-3 led to the proposal of a swath of large language models (LLMs). Shortly after, research on language model alignment produced even more impressive models like InstructGPT [19] and, most notably, its sister model ChatGPT. The striking performance of these models sparked a flood of interest in language modeling and generative AI.
Despite being incredibly powerful, many early advances in LLM research share one common property: they are closed source. When language models first began to gain widespread recognition, many of the most powerful LLMs were accessible only via paid APIs (e.g., the OpenAI API), and the ability to research and develop such models was restricted to select individuals or labs. This approach differs markedly from typical AI research practices, where openness and idea sharing are usually encouraged to promote progress.
“This restricted access has limited researchers’ ability to understand how and why these large language models work, hindering progress on efforts to improve their robustness and mitigate known issues such as bias and toxicity.” (from [4])
This overview. Despite the initial emphasis on proprietary technology, the LLM research community slowly began to create open-source variants of popular language models like GPT-3. Although the first open-source language models lagged behind the best proprietary models, they laid the foundation for…