GPT-J is an open-source, 6-billion-parameter language model trained on the Pile dataset. It was built with Mesh Transformer JAX, and the weights can be downloaded from EleutherAI. GPT-J-6B performs nearly as well as OpenAI's 6.7-billion-parameter GPT-3 model (Curie) on a range of zero-shot downstream tasks, including LAMBADA (perplexity and accuracy), Winogrande, HellaSwag, and PIQA. Compared to other public models of similar size, such as GPT-2 1.5B and Megatron-2.5B, it performs significantly better across all of these evaluations, though it required more training FLOPs to reach that accuracy. It has even been reported to outperform much larger models such as GPT-3 13B and 175B on some tasks, notably code generation, making it a cost-effective alternative for many applications.
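To make the two LAMBADA-style metrics mentioned above concrete, here is a minimal sketch of how they are typically computed: perplexity as the exponential of the average negative log-probability a model assigns to each token, and accuracy as the fraction of passages whose final word is predicted exactly. This is a toy illustration with made-up numbers, not EleutherAI's actual evaluation harness; the function names and example values are assumptions for demonstration.

```python
import math

def perplexity(token_logprobs):
    # Perplexity is the exponential of the average negative
    # natural-log probability the model assigns to each token.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def last_word_accuracy(predictions, targets):
    # LAMBADA-style accuracy: fraction of passages where the model's
    # predicted final word exactly matches the reference word.
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

# Toy log-probabilities for a 4-token continuation (hypothetical values).
logprobs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
print(perplexity(logprobs))  # ≈ 2.828, i.e. 2**1.5

# Toy final-word predictions vs. references (hypothetical values).
preds = ["table", "river", "night"]
golds = ["table", "stone", "night"]
print(last_word_accuracy(preds, golds))  # 2 of 3 correct
```

Lower perplexity and higher accuracy are better; these are the two numbers reported for GPT-J-6B versus Curie on LAMBADA.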