Video destruction segmentation focuses on identifying and segmenting visually corrupted regions across time in a video. Unlike traditional video segmentation tasks, which target foreground objects or language-referred regions, this task introduces a novel objective: segmenting structural defects and anomalies caused by generation artifacts, degradation, or damage. Each AI-generated video clip is paired with a corresponding mask-annotated version produced under our proposed standards of destruction. This benchmark supports research on robust segmentation under imperfect video conditions, with applications in video restoration, quality assessment, and generative model evaluation.
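
As an illustration, here is a minimal sketch of how a clip and its paired destruction mask might be consumed for evaluation. The file layout (`clips/0001.mp4`, `masks/0001_mask.mp4`) and the convention that masks are stored as binary videos are assumptions for this example, not part of the dataset specification:

```python
import cv2
import numpy as np

# Assumed layout: each clip "<id>.mp4" has a paired mask video
# "<id>_mask.mp4" whose frames mark destroyed regions in white.
CLIP_PATH = "clips/0001.mp4"       # hypothetical path
MASK_PATH = "masks/0001_mask.mp4"  # hypothetical path

def iter_frame_pairs(clip_path, mask_path):
    """Yield (frame, binary_mask) pairs, one per timestep."""
    clip = cv2.VideoCapture(clip_path)
    mask = cv2.VideoCapture(mask_path)
    while True:
        ok_f, frame = clip.read()
        ok_m, mask_frame = mask.read()
        if not (ok_f and ok_m):
            break
        # Threshold the mask frame to a {0, 1} map of destroyed pixels.
        gray = cv2.cvtColor(mask_frame, cv2.COLOR_BGR2GRAY)
        yield frame, (gray > 127).astype(np.uint8)
    clip.release()
    mask.release()

for frame, destroyed in iter_frame_pairs(CLIP_PATH, MASK_PATH):
    # Fraction of pixels annotated as destroyed in this frame.
    print(f"destroyed-pixel ratio: {destroyed.mean():.3f}")
```

A per-frame binary map like this is also the natural input for region-level metrics (e.g., IoU against a model's predicted destruction mask).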
Our dataset includes AI-generated videos collected from the following major video generation models: Luma, Luma 1.6, CogVideoX, EasyAnimate V4, Kling, Kling 1.5, Qingying, Vidu, MiniMax, OpenSora 1.2, Gen-3, and Tongyi. These diverse sources ensure comprehensive coverage of contemporary AI video synthesis capabilities. Video resolutions span a wide range, from low (e.g., 416×624) to high (e.g., 1760×1152), covering common formats such as 720p and 768p as well as square resolutions like 1024×1024. After careful filtering and manual annotation, we curated a high-quality dataset consisting of: