Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The size and diversity of this dataset improve robustness to accents, background noise, and technical language. The dataset also enables transcription in multiple languages, as well as translation from those languages into English. We are open-sourcing Whisper’s models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing.