Broken Video Detection Dataset

1Fudan University, Shanghai, China.
2Shenzhen University, Shenzhen, China.
3The Chinese University of Hong Kong, Hong Kong, China.

*Corresponding Author

The left-hand video is automatically generated by AI to simulate a damaged scene. The right-hand video overlays mask annotations, produced according to the video corruption standard proposed in this paper, highlighting the corrupted regions detected in the left video. This masked representation helps a model learn the characteristics of corrupted areas and improves detection accuracy.
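
As a rough illustration, the masked view on the right could be reproduced for visualization with a few lines of OpenCV. The sketch below assumes a per-frame binary mask where non-zero pixels mark corrupted regions; the file names and function name are hypothetical, not part of the dataset's tooling.

```python
import cv2


def overlay_corruption_mask(frame, mask, color=(0, 0, 255), alpha=0.5):
    """Blend a binary corruption mask (H, W) onto a BGR frame (H, W, 3).

    Pixels where mask > 0 are tinted with `color`; the rest stay unchanged.
    """
    overlay = frame.copy()
    overlay[mask > 0] = color  # paint corrupted regions with a solid color
    return cv2.addWeighted(overlay, alpha, frame, 1 - alpha, 0)


# Hypothetical usage on a single frame/mask pair:
frame = cv2.imread("frame_0001.png")                       # generated video frame
mask = cv2.imread("mask_0001.png", cv2.IMREAD_GRAYSCALE)   # annotated corruption mask
cv2.imwrite("frame_0001_overlay.png", overlay_corruption_mask(frame, mask))
```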

Overview

Video destruction segmentation focuses on identifying and segmenting visually corrupted regions across time in a video. Unlike traditional video segmentation tasks that target foreground objects or language-referred regions, this task introduces a novel objective: segmenting structural defects and anomalies caused by generation artifacts, degradation, or damage. Each AI-generated video clip is paired with a corresponding mask-annotated version based on our proposed standards of destruction. This benchmark supports research on robust segmentation under imperfect video conditions, with applications in video restoration, quality assessment, and generative model evaluation.
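
Since each generated clip is paired with a mask-annotated counterpart, a training pipeline can load the two as aligned (clip, mask) tensors. The sketch below is a minimal PyTorch Dataset under an assumed directory layout (videos/ and masks/ holding files with matching names); it is only an illustration, not the dataset's official loader.

```python
import os

import cv2
import numpy as np
import torch
from torch.utils.data import Dataset


class BrokenVideoDataset(Dataset):
    """Loads (video clip, per-frame corruption mask) pairs.

    Assumed layout (hypothetical):
        root/videos/<clip_id>.mp4   # AI-generated clip
        root/masks/<clip_id>.mp4    # same length; white = corrupted region
    """

    def __init__(self, root, num_frames=16):
        self.video_dir = os.path.join(root, "videos")
        self.mask_dir = os.path.join(root, "masks")
        self.ids = sorted(os.path.splitext(f)[0] for f in os.listdir(self.video_dir))
        self.num_frames = num_frames

    def _read_frames(self, path, grayscale=False):
        # Read up to num_frames frames; assumes every clip has at least one frame.
        cap = cv2.VideoCapture(path)
        frames = []
        while len(frames) < self.num_frames:
            ok, frame = cap.read()
            if not ok:
                break
            if grayscale:
                frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            frames.append(frame)
        cap.release()
        return np.stack(frames)

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        clip_id = self.ids[idx]
        video = self._read_frames(os.path.join(self.video_dir, clip_id + ".mp4"))
        mask = self._read_frames(os.path.join(self.mask_dir, clip_id + ".mp4"),
                                 grayscale=True)
        # (T, H, W, C) -> (C, T, H, W), scaled to [0, 1]
        video = torch.from_numpy(video).permute(3, 0, 1, 2).float() / 255.0
        # Threshold the grayscale mask into a binary (T, H, W) target
        mask = torch.from_numpy(mask > 127).float()
        return video, mask
```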

Dataset Statistics

Our dataset includes AI-generated videos collected from the following major video generation models: Luma 1.6, Luma, CogVideoX, EasyAnimate V4, Kling, Qingying, Kling 1.5, Vidu, MiniMax, OpenSora 1.2, Gen-3, and Tongyi. These diverse sources ensure a comprehensive representation of contemporary AI video synthesis capabilities. Video resolutions span a wide range, from low (e.g., 416×624) to high (e.g., 1760×1152), covering common formats such as 720p and 768p as well as square resolutions like 1024×1024; a preprocessing sketch for these mixed resolutions follows the list below. After careful filtering and manual annotation, we curated a high-quality dataset consisting of:

  • 3,141 high-resolution videos
  • 336,000 high-quality manual annotations
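
Because resolutions range from 416×624 up to 1760×1152, models trained on this data typically need a resolution-normalization step. One option, sketched below, is an aspect-preserving letterbox resize applied identically to a frame and its mask, with nearest-neighbour interpolation for the mask so annotations stay binary. The function name and target size here are illustrative assumptions, not part of the dataset release.

```python
import cv2
import numpy as np


def letterbox(frame, mask, size=512):
    """Resize a frame (H, W, 3) and its mask (H, W) onto a size x size canvas,
    preserving aspect ratio and padding the borders with zeros."""
    h, w = frame.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # cv2.resize takes (width, height); use INTER_NEAREST to keep the mask binary
    frame = cv2.resize(frame, (nw, nh), interpolation=cv2.INTER_AREA)
    mask = cv2.resize(mask, (nw, nh), interpolation=cv2.INTER_NEAREST)
    top, left = (size - nh) // 2, (size - nw) // 2
    out_frame = np.zeros((size, size, 3), dtype=frame.dtype)
    out_mask = np.zeros((size, size), dtype=mask.dtype)
    out_frame[top:top + nh, left:left + nw] = frame
    out_mask[top:top + nh, left:left + nw] = mask
    return out_frame, out_mask
```
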
Dataset Illustration

BibTeX