DeepMind researchers have introduced a new compute-optimal model known as Chinchilla. The model has 70 billion parameters, a quarter the size of Gopher, but was trained on four times more data using the same compute budget. Compared with other large language models such as Gopher (280B), GPT-3 (175B), Jurassic-1 (178B) and MT-NLG (530B), Chinchilla delivered impressive performance on downstream evaluation tasks while requiring less compute for fine-tuning and inference, making it easier to use in practical applications. Chinchilla achieved an average accuracy of 67.5% on the MMLU benchmark, more than 7 percentage points above Gopher's score. The prevailing trend in training these large language models has been to increase model size without a corresponding increase in the number of training tokens; MT-NLG 530B, for instance, is over three times larger than GPT-3's 175 billion parameters.
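
To make the compute-budget tradeoff concrete, the sketch below uses the common approximation that training compute is roughly 6 × parameters × tokens. The token counts (about 300 billion for Gopher and 1.4 trillion for Chinchilla) are approximate figures from the Chinchilla paper, not values stated in the passage above, so the result should be read as an order-of-magnitude illustration rather than an exact accounting.

```python
# Rough sanity check of the "same compute budget" claim using the common
# approximation C ~= 6 * N * D (training FLOPs ~= 6 x parameters x tokens).
# Token counts are approximate figures reported for each model and are
# assumptions of this sketch, not numbers from the text above.

def approx_train_flops(params: float, tokens: float) -> float:
    """Approximate training compute in FLOPs via C ~= 6 * N * D."""
    return 6 * params * tokens

gopher = approx_train_flops(params=280e9, tokens=300e9)      # Gopher: 280B params, ~300B tokens
chinchilla = approx_train_flops(params=70e9, tokens=1.4e12)  # Chinchilla: 70B params, ~1.4T tokens

print(f"Gopher:     ~{gopher:.2e} FLOPs")      # ~5.0e+23
print(f"Chinchilla: ~{chinchilla:.2e} FLOPs")  # ~5.9e+23, the same order of magnitude
```

Under this approximation, shrinking the model by 4× while scaling the data by roughly 4× keeps the training compute in the same ballpark, which is the tradeoff Chinchilla exploits.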