VIdeO-and-Language INference (VIOLIN) Dataset
Created by Liu et al. in 2020, the VIdeO-and-Language INference (VIOLIN) dataset contains 95,322 video-hypothesis pairs from 15,887 video clips, spanning over 582 hours of video drawn from YouTube and TV shows. Annotators wrote inference statements (hypotheses) about the content of each clip, and the task is to judge whether a hypothesis is entailed by its video clip. The hypotheses are in English. The data for the 15,887 clips is distributed in JSON and H5 file formats.
Dataset Sources
Here you can download the VIdeO-and-Language INference (VIOLIN) dataset in JSON and H5 formats.
Download the VIdeO-and-Language INference (VIOLIN) dataset JSON and H5 files
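As a rough illustration of working with the JSON annotations once downloaded, the sketch below counts video-hypothesis pairs and distinct clips. The record layout and the field names (`clip_id`, `hypothesis`, `label`) are assumptions made for this example and are not taken from the actual VIOLIN files, so check the real schema before adapting it. The small inline sample stands in for a downloaded file.

```python
import json
from collections import Counter

# Hypothetical annotation layout: one record per video-hypothesis pair.
# The field names below are illustrative assumptions, not the real VIOLIN schema.
SAMPLE_JSON = """
[
  {"clip_id": "clip_0001", "hypothesis": "A man hands a woman a cup.", "label": true},
  {"clip_id": "clip_0001", "hypothesis": "A man throws a cup away.", "label": false},
  {"clip_id": "clip_0002", "hypothesis": "Two people argue in a kitchen.", "label": true}
]
"""


def summarize_pairs(records):
    """Return (number of pairs, number of distinct clips, counts per label)."""
    clips = {record["clip_id"] for record in records}
    labels = Counter(bool(record["label"]) for record in records)
    return len(records), len(clips), labels


records = json.loads(SAMPLE_JSON)
n_pairs, n_clips, label_counts = summarize_pairs(records)
print(n_pairs, n_clips, dict(label_counts))

# Sanity check on the figures quoted in the description:
# 95,322 pairs over 15,887 clips works out to 6 hypotheses per clip.
pairs_per_clip = 95_322 / 15_887
print(pairs_per_clip)
```

To run the same summary on a real file, replace `json.loads(SAMPLE_JSON)` with `json.load(open(path))` and adjust the field names to match the downloaded annotations.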
Fine-tune with VIdeO-and-Language INference (VIOLIN) dataset
Metatext is a powerful no-code tool for training, tuning, and integrating custom NLP models.
Paper
Read the full original VIdeO-and-Language INference (VIOLIN) paper.