@yaythomas
Last active November 28, 2021 16:12
pypyr video processing (untested) rough structural example

automating your video processing

[preamble]

what we want to achieve

TODO:

  1. download the video
  2. convert to different formats via ffmpeg
  3. create thumbnails
  4. use a trained ML model for scene classification
  5. make sub clips based on classification
  6. ML training?

pypyr for task automation

pypyr is a task-runner that lets you automate task sequences that may call different applications, APIs and bits of scripts, without you having to write code to do so. You don't have to code the error-prone, boring, repetitive bits that are common when you automate your processes. pypyr takes care of these for you: input argument parsing, working with configuration files, error handling, automatic retries on failure and running something repeatedly in a loop.

You automate your workflow by creating steps in a pipeline. The term "pipeline" refers to the idea that this is a series of sequential steps, executing one after the other.

The pipeline is simply a yaml file that is friendly for human creation & consumption.

Configure your Dev Environment

pypyr is a Python application. It requires Python >=3.6.

You probably want to install it to a virtual environment, although this is not mandatory. If you want to create a virtual environment, this is the quickest way to do so. Open your terminal and do the following:

$ cd path/to/your/dev/folder/
$ python3 -m venv .env
$ . .env/bin/activate

[TODO: other deps like ffmpeg?]

Install pypyr

From your terminal, install pypyr like this:

$ pip install pypyr

Note: if you are working in a virtual environment, it needs to be active when you run pip install. If you followed the previous step it will be active already.

You can verify that pypyr installed correctly by running one of pypyr's built-in pipelines. It's about pipes, on account of being a pipeline runner:

$ pypyr magritte
Ceci n'est pas une pipe

Specifying your input configuration in a yaml file

We want to automate processing for more than one video and there are different parameters we want to set for each video. Instead of error-prone typing of long input sequences at the terminal, an easy way of specifying this is in a yaml input configuration file.

This will also allow us to create different inputs for different batches of videos with different settings for each.

In your code or text editor, create a file named input.yaml in your dev directory:

# ./input.yaml
videos:
  - name: video1.mp4
    url: https://arburl/notreal1
    do_thumbnails: True
  - name: video2.mp4
    url: https://arburl/notreal2
    do_thumbnails: False
  - name: video3.mp4
    url: https://arburl/notreal3
    do_thumbnails: True

The exact format of this yaml is arbitrary - we get to decide what we want the structure to look like and which fields we want to include. It's up to us to create a pipeline that understands whatever input we create here.
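
Because the structure is ours to define, it can help to check an input file's shape before running anything against it. The following is a minimal sketch of such a check, written against a plain Python dict with the same structure as input.yaml. The helper validate_videos and the REQUIRED_FIELDS table are hypothetical names made up for this illustration; they are not part of pypyr.

```python
# Hypothetical helper, not part of pypyr: check that each video entry
# carries the three fields the pipeline in this article expects.
REQUIRED_FIELDS = {"name": str, "url": str, "do_thumbnails": bool}


def validate_videos(config):
    """Return a list of problems found; an empty list means the shape looks fine."""
    videos = config.get("videos")
    if not isinstance(videos, list):
        return ["'videos' must be a list"]
    problems = []
    for index, video in enumerate(videos):
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in video:
                problems.append(f"videos[{index}] is missing '{field}'")
            elif not isinstance(video[field], expected_type):
                problems.append(
                    f"videos[{index}].{field} should be {expected_type.__name__}")
    return problems


# Same structure as input.yaml, but the second entry forgets do_thumbnails.
config = {
    "videos": [
        {"name": "video1.mp4", "url": "https://arburl/notreal1", "do_thumbnails": True},
        {"name": "video2.mp4", "url": "https://arburl/notreal2"},
    ]
}

print(validate_videos(config))
```

If you add or rename fields in your own input file, the table of required fields is the only part that would need to change.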

Creating your automation pipeline

In your favorite code editor, create video-process.yaml in your dev directory.

This is your pypyr pipeline. The entire pipeline will look like this:

# ./video-process.yaml
# run me like this:
# $ pypyr video-process ./input.yaml
context_parser: pypyr.parser.yamlfile
steps:
  - name: pypyr.steps.call
    description: --> loop through videos to process
    foreach: '{videos}'
    in:
      call: process_video
  - name: pypyr.steps.echo
    in:
      echoMe: --> done

process_video:
  - name: pypyr.steps.contextsetf
    in:
      contextSetf:
        current_video: '{i}'
  - name: pypyr.steps.echo
    in:
      echoMe: --> processing {current_video[name]} from {current_video[url]}
  - name: pypyr.steps.cmd
    comment: download video. retries up to 3X if download fails.
    description: --> downloading video
    retry:
      max: 3
    in:
      cmd: echo curl {current_video[url]} -o {current_video[name]}
  - name: pypyr.steps.cmd
    comment: convert to different formats [origin_file_name]-output.[ext]
    description: --> convert video to output formats
    foreach: ['flac', 'mkv', 'webm']
    in:
      cmd: echo ffmpeg -i {current_video[name]} {current_video[name]}-output.{i}
  - name: pypyr.steps.cmd
    description: --> generate thumbnails
    comment: do thumbnails, output name {current_video[name]}-out[n].png, where n is a counter.
             only run if do_thumbnails is True for this video.
    run: '{current_video[do_thumbnails]}'
    in:
      cmd: echo ffmpeg -i {current_video[name]} -vf fps=1/60 {current_video[name]}-out%d.png
  # TODO: whatever ML provider/api/cli you're using here

parsing the input configuration

The first thing we want to do is parse the input configuration file we created earlier. pypyr uses a context parser to parse inputs and put them into the pypyr context. The pypyr context is a dictionary that is in scope for the entire duration of the pipeline. You use the context to persist and pass values between steps.

In our case, pypyr has a ready-made parser that will read & parse our input yaml configuration file without having to write code.

This is what the context_parser: pypyr.parser.yamlfile line at the top of the pipeline does. It tells pypyr to use the built-in yamlfile parser, which treats your cli input argument as a path to a yaml file and reads that file into context. This lets you run the pipeline like this:

$ pypyr video-process ./path-to-input-config-here.yaml

This way you can dynamically specify different input configuration files each time you run your pipeline.

pipeline entry point

pypyr looks for the steps: group as entry-point to the pipeline.

What we want to do is run a sequence of steps over every video we have specified in our input configuration file. We can group the steps we want to run for each video together under the process_video key.

We then want to loop over all of our videos from the input configuration and run the entire process_video sequence for each item in the input configuration.

We call the process_video group by using the built-in pypyr call step.

- name: pypyr.steps.call
  description: --> loop through videos to process
  foreach: '{videos}'
  in:
    call: process_video

The foreach decorator makes this loop easy: it tells pypyr to run the call step, and therefore the process_video group, once for each video in the input.

Let's look at the foreach instruction in detail:

foreach: '{videos}'

pypyr treats anything in between curly braces as a formatting substitution expression. So here, we are telling pypyr to look for videos in the pypyr context. If you check our input.yaml file, you'll see we have a list of videos under the videos key. Remember that all of this will be in the pypyr context because we used the pypyr.parser.yamlfile context parser to load the yaml file path we specify from the cli into context.

pypyr will log the text in the description field to the cli output when we run the pipeline. This is not mandatory, but it is a handy way of seeing your pipeline's progress as it runs.

process each video

So that is how we are calling the process_video group of steps for each video in our input. Now let's look at the sequence of steps we run for each video:

process_video:
  - name: pypyr.steps.contextsetf
    in:
      contextSetf:
        current_video: '{i}'
  - name: pypyr.steps.echo
    in:
      echoMe: --> processing {current_video[name]} from {current_video[url]}
  - name: pypyr.steps.cmd
    comment: download video. retries up to 3X if download fails.
    description: --> downloading video
    retry:
      max: 3
    in:
      cmd: echo curl {current_video[url]} -o {current_video[name]}
  - name: pypyr.steps.cmd
    comment: convert to different formats [origin_file_name]-output.[ext]
    description: --> convert video to output formats
    foreach: ['flac', 'mkv', 'webm']
    in:
      cmd: echo ffmpeg -i {current_video[name]} {current_video[name]}-output.{i}
  - name: pypyr.steps.cmd
    description: --> generate thumbnails
    comment: do thumbnails, output name {current_video[name]}-out[n].png, where n is a counter.
             only run if do_thumbnails is True for this video.
    run: '{current_video[do_thumbnails]}'
    in:
      cmd: echo ffmpeg -i {current_video[name]} -vf fps=1/60 {current_video[name]}-out%d.png

working with the current item

In the first step, we use the built-in contextsetf step to set a new context item with some formatting. (The f at the end carries the same meaning as in printf in many programming languages.)

Remember that we are calling this entire step-group from a foreach loop. When we are in a foreach loop, {i} represents the current iterator. We could just use {i[name]} or {i[url]} to refer to fields of the current video throughout this step-group, but for the sake of clarity, let's create a new context item called current_video and assign to it the value of i, which is the current item in the list we are iterating over.
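
To make the control flow concrete, here is a rough Python analogy of what foreach combined with call amounts to. This is only a sketch of the idea, not how pypyr is implemented; the functions run_steps and process_video are made up for illustration and the context is modelled as a plain dict.

```python
# Rough analogy of `foreach: '{videos}'` combined with `call: process_video`.
# NOT pypyr's implementation; function names are hypothetical.
def process_video(context):
    # Equivalent of the contextsetf step: copy the current item to a clearer name.
    context["current_video"] = context["i"]
    return f"processing {context['current_video']['name']}"


def run_steps(context):
    messages = []
    for item in context["videos"]:
        context["i"] = item  # foreach exposes the current item as i
        messages.append(process_video(context))
    return messages


context = {
    "videos": [
        {"name": "video1.mp4", "url": "https://arburl/notreal1"},
        {"name": "video2.mp4", "url": "https://arburl/notreal2"},
    ]
}

print(run_steps(context))
```

The point of the analogy is that i changes on every iteration, while current_video gives the steps inside the group a stable, descriptive name to work with.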

friendly progress indicator

In the next step we use echo to output a friendly status message to the console as the pipeline runs:

- name: pypyr.steps.echo
  in:
    echoMe: --> processing {current_video[name]} from {current_video[url]}

Notice in the substitution expression we are accessing the name and url values from our input yaml file.
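
The substitution expressions look much like the replacement fields of Python's own str.format. Assuming that resemblance holds for simple lookups like these, you can get a feel for what the echo step will print with plain Python. The context dict here is a hand-written stand-in for the pypyr context, not something pypyr produced.

```python
# Stand-in for the pypyr context at the point the echo step runs.
context = {
    "current_video": {"name": "video1.mp4", "url": "https://arburl/notreal1"}
}

# Same text as the echoMe value in the pipeline.
template = "--> processing {current_video[name]} from {current_video[url]}"

# str.format resolves {current_video[name]} as context["current_video"]["name"].
message = template.format(**context)
print(message)
```

Note that the key inside the square brackets is written without quotes, both in str.format and in the pipeline yaml.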

downloading the video

You can execute any program available in your current PATH using the built-in pypyr.steps.cmd step.

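First, we use curl to download the video from the url in our input configuration. The -o flag specifies the file name curl saves the download as. Note that the example command is prefixed with echo, so as written it only prints the curl command instead of running it. Remove the echo once you are ready to download for real.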

- name: pypyr.steps.cmd
  comment: download video. retries up to 3X if download fails.
  description: --> downloading video
  retry:
    max: 3
  in:
    cmd: echo curl {current_video[url]} -o {current_video[name]}

TODO: arbitrary example curl - depending on video source might need to worry about authentication, custom http headers etc.

Because this is the internet and connectivity isn't guaranteed, if anything goes wrong with the download, we tell pypyr to retry it up to 3 times, using the automatic retry decorator:

retry:
  max: 3
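
If you were writing this retry behaviour by hand, it might look something like the sketch below. It assumes that max counts the total number of attempts, including the first one. The helper run_with_retry and the simulated flaky_download are hypothetical and exist only for this illustration; pypyr does this work for you.

```python
def run_with_retry(action, max_attempts=3):
    """Call action() until it succeeds or max_attempts have been used."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as error:
            if attempt == max_attempts:
                raise  # out of attempts: let the last error propagate
            print(f"attempt {attempt} failed ({error}), retrying")


# Stand-in for a flaky download: fails twice, then succeeds.
calls = {"count": 0}


def flaky_download():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "video1.mp4"


result = run_with_retry(flaky_download, max_attempts=3)
print(result)
```

This is the sort of repetitive error-handling code the retry decorator replaces with two lines of yaml.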

converting the video to different output formats

Now that we have downloaded the video, we want to convert it to different output formats.

- name: pypyr.steps.cmd
  comment: convert to different formats [origin_file_name]-output.[ext]
  description: --> convert video to output formats
  foreach: ['flac', 'mkv', 'webm']
  in:
    cmd: echo ffmpeg -i {current_video[name]} {current_video[name]}-output.{i}

TODO: arbitrary untested example ffmpeg command. substitute with actual cmd & tested switches.

Rather than writing out each output conversion step by hand, we use a foreach loop to execute the cmd for each item in the list. Notice that we use substitution expressions to fill in the argument values we pass to the ffmpeg command. Here the iterator i refers to flac, mkv or webm, depending on where in the loop we are.
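
As a sketch of what this loop expands to, the following Python builds the three command lines without running them. The ffmpeg arguments are copied from the untested example in the pipeline, so treat them as placeholders. The function build_convert_commands is a hypothetical name used only here.

```python
import shlex


def build_convert_commands(source_name, formats):
    """Return one ffmpeg argument list per output format."""
    return [
        ["ffmpeg", "-i", source_name, f"{source_name}-output.{ext}"]
        for ext in formats
    ]


commands = build_convert_commands("video1.mp4", ["flac", "mkv", "webm"])

# Print each command the way it would appear on a shell command line.
for args in commands:
    print(" ".join(shlex.quote(arg) for arg in args))
```

Each printed line corresponds to one iteration of the foreach loop, with the format name taking the place of {i}.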

generate thumbnails

After all the conversions complete, the next step is to generate thumbnails from our original source video:

- name: pypyr.steps.cmd
  description: --> generate thumbnails
  comment: do thumbnails, output name {current_video[name]}-out[n].png, where n is a counter.
           only run if do_thumbnails is True for this video.
  run: '{current_video[do_thumbnails]}'
  in:
    cmd: echo ffmpeg -i {current_video[name]} -vf fps=1/60 {current_video[name]}-out%d.png

TODO: arbitrary untested example ffmpeg command. substitute with actual cmd & tested switches.

Here we use pypyr's conditional run decorator so that this step only generates thumbnails if the do_thumbnails field is boolean True for the current video. If do_thumbnails is False, pypyr will NOT run this step. Instead, it outputs the description to the console with an indicator to show you that it is skipping the step:

(skipping): --> generate thumbnails
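
In plain Python terms, the run decorator behaves like an if around the step. The sketch below mirrors that idea using the do_thumbnails values from input.yaml. The functions make_thumbnails and run_thumbnail_step are hypothetical, and the skip message simply copies the format of the output line quoted above.

```python
def make_thumbnails(video):
    # Placeholder for the real thumbnail command.
    return f"generated thumbnails for {video['name']}"


def run_thumbnail_step(video):
    """Run the step only when do_thumbnails is True; otherwise report the skip."""
    if not video["do_thumbnails"]:
        return "(skipping): --> generate thumbnails"
    return make_thumbnails(video)


videos = [
    {"name": "video1.mp4", "do_thumbnails": True},
    {"name": "video2.mp4", "do_thumbnails": False},
]

for video in videos:
    print(run_thumbnail_step(video))
```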

running the pipeline from the cli

To run this pipeline, from the console, do:

$ pypyr video-process ./input.yaml

This tells pypyr to look for the video-process.yaml file in the current directory. pypyr automatically appends the .yaml for you, so you don't have to type it out.

If you want to process a different set of videos, you can create a new input configuration file for them and run it the same way:

$ pypyr video-process ./another-input-file.yaml

This way you can re-use your pipeline without having to touch the functional pipeline declaration itself.

running the pipeline from the api

You can also run your pipeline from the pypyr Python api.

Rather than use a yaml config file as an input, we can directly inject a standard Python dictionary into the pipeline. The exact same pipeline can merrily use this dictionary as long as it keeps to the same structure as the yaml config file we made earlier.

import pypyr.pipelinerunner

# prepare a dict to initialize context
input_dict = {
    'videos': [
        {'name': 'video1.mp4',
         'url': 'https://arburl/notreal1',
         'do_thumbnails': True},
        {'name': 'video2.mp4',
         'url': 'https://arburl/notreal2',
         'do_thumbnails': False},
        {'name': 'video3.mp4',
         'url': 'https://arburl/notreal3',
         'do_thumbnails': True},
    ]
}

context_out = pypyr.pipelinerunner.main_with_context(
    pipeline_name='video-process',
    dict_in=input_dict)

See attached video-process.py for full sample.
