Project CodeNet by IBM

IBM Research’s Project CodeNet is a large dataset designed to teach AI how to code. It contains 14 million code samples, written as solutions to 4,000 coding problems, which together make up over 500 million lines of code in more than 55 programming languages. Those languages range from modern ones such as C++, Java, Python and Go to legacy ones like COBOL, Pascal and FORTRAN. By providing an extensive benchmarking and experimentation platform, Project CodeNet has the potential to revolutionize AI for Code. The scale of the dataset, combined with its diversity, makes it analogous to ImageNet, the large dataset for computer vision research that has had great impact on that field.

