Benefits

The WIT dataset provides a comprehensive resource for the research community to advance our understanding of multilingual, multimodal models. With its large size and broad language coverage, WIT can help researchers build better visio-linguistic models across many languages by providing more data to train on. Furthermore, because images are language agnostic, they provide valuable evidence when text is unavailable or unreliable (e.g., due to low-quality translations). This makes it possible to learn cross-lingual image embeddings, which could be used for downstream tasks such as zero-shot learning and transfer learning across languages.