The competition dataset consists of aerial video sequences captured from UAVs, drawn from several established tracking benchmarks and supplemented with additional custom-annotated footage. Each sequence follows a single target object filmed from a UAV platform under realistic, challenging conditions.
Each sequence has the following properties:
- A video file at its native frame rate
- Frame-by-frame bounding-box annotations (provided for the training set)
- Exactly one target object
- Realistic UAV camera motion, including translation, rotation, and altitude changes
- Challenging tracking conditions such as occlusion, fast motion, scale variation, background clutter, and low resolution
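To make the per-sequence structure concrete, here is a minimal Python sketch of one way to represent a sequence in code. The class name `TrackingSequence`, its fields, and the `(x, y, width, height)` pixel box format are hypothetical assumptions for illustration only; the actual annotation format is defined by the competition's own files.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical box format (assumption): (x, y, width, height) in pixels, top-left origin.
Box = Tuple[float, float, float, float]


@dataclass
class TrackingSequence:
    """One UAV sequence: a single video with exactly one tracked target."""
    name: str
    video_path: str
    fps: float                          # native frame rate of the source video
    num_frames: int
    boxes: Optional[List[Box]] = None   # one box per frame; None when unannotated

    def __post_init__(self) -> None:
        # Annotated sequences must have exactly one box per frame.
        if self.boxes is not None and len(self.boxes) != self.num_frames:
            raise ValueError(
                f"{self.name}: expected {self.num_frames} boxes, got {len(self.boxes)}"
            )

    @property
    def is_annotated(self) -> bool:
        return self.boxes is not None

    def box_at(self, frame_index: int) -> Box:
        """Return the target box for one frame (annotated sequences only)."""
        if self.boxes is None:
            raise ValueError(f"{self.name} has no annotations")
        return self.boxes[frame_index]

    def duration_seconds(self) -> float:
        # Duration derived from frame count and native frame rate.
        return self.num_frames / self.fps
```

For example, a three-frame annotated sequence at 30 fps returns its per-frame boxes through `box_at`, while a sequence constructed without `boxes` reports `is_annotated` as `False`. Validating the box count at construction time catches misaligned annotation files early.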