An overview of recent action recognition datasets and their detection classes
- Action: Atomic, low-level movements such as standing up, sitting down, walking, or talking.
- Activity/event: A higher-level occurrence than an action, such as dining, playing, or dancing.
- Trimmed video: A short video clip cut to span an event/action/activity of interest.
- Untrimmed video: A video clip of arbitrary length that may contain segments with no event/action/activity of interest.
- Localization: Locating an instance of an event/action/activity within a video, either spatially (where in the frame) or temporally (when it starts and ends).
- Spatial localization: Locating the region/area of an instance of action/activity within a video
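The sketch below makes the trimmed/untrimmed and localization terms concrete. It is a minimal illustration under stated assumptions: the `Instance` record, the `trim` helper, the seconds-based annotation format, and the example labels and timings are all hypothetical and are not taken from any particular dataset's annotation schema. Each annotated instance carries a temporal extent (temporal localization) and an optional bounding box (spatial localization). `trim` cuts an untrimmed video into one trimmed clip per instance.

```python
from dataclasses import dataclass
from math import ceil
from typing import List, Optional, Tuple

# Hypothetical annotation record for one localized action/activity instance.
@dataclass
class Instance:
    label: str                # e.g. "sitting down" (action) or "dining" (activity)
    start_s: float            # temporal localization: start time in seconds
    end_s: float              # temporal localization: end time in seconds
    # Spatial localization as (x1, y1, x2, y2) in pixels, if annotated.
    box: Optional[Tuple[float, float, float, float]] = None

def trim(num_frames: int, fps: float,
         instances: List[Instance]) -> List[Tuple[str, range]]:
    """Cut an untrimmed video of num_frames frames at fps into trimmed clips.

    Returns one (label, frame index range) pair per instance; ranges are
    clipped to the video length and empty ones are dropped.
    """
    clips = []
    for inst in instances:
        first = max(0, int(inst.start_s * fps))            # floor for t >= 0
        last = min(num_frames, ceil(inst.end_s * fps))     # exclusive end
        if first < last:
            clips.append((inst.label, range(first, last)))
    return clips

# Hypothetical 10 s untrimmed video at 30 fps. The spans 2.0-4.5 s and
# 6.0-9.5 s contain no activity of interest.
annotations = [
    Instance("walking", 0.0, 2.0),
    Instance("sitting down", 4.5, 6.0, box=(120.0, 40.0, 310.0, 460.0)),
    Instance("talking", 9.5, 12.0),  # annotation runs past the video end
]

if __name__ == "__main__":
    for label, frames in trim(300, 30.0, annotations):
        print(label, frames.start, frames.stop)
```

Representing clips as frame-index ranges rather than decoded frames keeps the sketch independent of any video library. A real pipeline would pass these ranges to its decoder.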