Detailed media editing procedure for my thesis
My thesis involves the use of multiple GoPro action cameras, as well as an external microphone, to observe work being done from multiple angles and perspectives. This file outlines how I process all the media that I collect.
GoPro Hero 4 Silver action cameras
Three refurbished GoPro Hero 4 Silver action cameras are used. Each writes to a 64GB microSD card, which I reformatted to the exFAT file system. The cameras split long recordings into chapters of roughly 4GB each (a holdover from the 4GB file size limit of FAT32), so I have to concatenate the split mp4 files using ffmpeg. GoPro action cameras also record notoriously poor quality audio, and attempts to clean the audio may produce varying results depending on the circumstances under which you are recording.
SONY ICD-UX560 audio recorder
I use a SONY ICD-UX560 audio recorder as an additional source of audio, to fall back on in case cleaning the GoPro audio fails or is insufficient. It comes with 4GB of built-in storage, which I supplement with an additional 32GB microSD card that I just happened to have on hand. There are also a few sensitivity settings for targeting different breadths and depths of sound (though none of them are specifically meant for recording outdoors). I set it to record in the 16-bit 44.1 kHz Linear PCM wav format, which is much higher quality than the default mp3 format. The files that this format produces are bigger, which means that you may have to transfer them over to your computer more regularly.
Further reading regarding the hardware:
I use three high-capacity (4-5TB) external hard drives to store and organise the media that I collect. I dedicate one drive as a dynamic 'processing' drive, which serves as a working directory for intensive media editing, such as splitting, concatenating or otherwise re-arranging media in sensible ways. The two other drives are kept as relatively stable havens. One of these is used as a more active working directory that I continuously access throughout my analysis, whereas the other is used as a more stable backup. The processing drive also serves as a secondary backup, but a less reliable one, since mistakes happen from time to time and the likelihood of data loss is higher in such a dynamic computing environment. I keep the drives' directory structures identical to reduce the challenge of re-pointing my QDA software at the media files, in case I need to revert to one of the backups.
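Since the drives are supposed to stay structurally identical, it is worth verifying the mirrors after each sync. A minimal sketch using `diff` (the function name and the mount points in the usage example are placeholders, not my actual drive names):

```shell
# check_mirror DIR1 DIR2 — report whether two directory trees are identical.
check_mirror() {
  # -r walks both trees recursively; -q reports only which files differ
  if diff -rq "$1" "$2" > /dev/null 2>&1; then
    echo "identical"
  else
    echo "diverged"
  fi
}

# e.g. check_mirror /Volumes/working /Volumes/backup
```

A quiet "identical" is a cheap reassurance before reverting QDA software to a backup drive.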
I make an extensive effort to use only open source software. FFmpeg (https://www.ffmpeg.org/) is an extremely versatile video editing library, and serves as my primary tool. I also use Audacity (https://www.audacityteam.org/) to clean and arrange audio.
Concatenating original video records
The cameras split long recordings into 4GB chapters (a limit carried over from the FAT32 file system the cameras are designed around), so I have to concatenate the automatically-split mp4 files using ffmpeg. See the official supporting documentation for concatenating video files at https://trac.ffmpeg.org/wiki/Concatenate.
This method requires some preparation. You need to create a text file, which we can simply name mylist.txt, listing the files to be concatenated in the order in which they are to be merged. The contents of mylist.txt should look something like this:

file 'file1.mp4'
file 'file2.mp4'
file 'file3.mp4'
...
It is important to point out that the proper order is not necessarily the way that your finder window orders them. Though I recommend that you check the files manually, the GoPro camera names files using a predictable scheme. The first file in a series is named something like GOPR0323.MP4, the second is GP010323.MP4, the third is GP020323.MP4, with the two-digit chapter number (the 3rd and 4th characters) increasing with every file that is part of the same series. A new series of files starting with GOPR0324.MP4 will be initiated after recording is stopped and restarted.
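Because the chapter naming is predictable, mylist.txt can be generated rather than typed by hand. A sketch, assuming the chapters of one series sit in the current directory; the helper name `gopro_list` is my own invention, and the result should still be checked manually before concatenating:

```shell
# gopro_list SERIES — write mylist.txt for one recording series.
# The first chapter is GOPR<series>.MP4; continuations are GP01<series>.MP4,
# GP02<series>.MP4, ..., which a lexicographic glob already orders correctly.
gopro_list() {
  series="$1"
  printf "file 'GOPR%s.MP4'\n" "$series" > mylist.txt
  for f in GP[0-9][0-9]"$series".MP4; do
    [ -e "$f" ] || continue   # no continuation chapters: glob stayed literal
    printf "file '%s'\n" "$f" >> mylist.txt
  done
}

# e.g. gopro_list 0323
# then: ffmpeg -f concat -safe 0 -i mylist.txt -c copy output.mp4
```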
After creating your list of videos to be concatenated, run the following command:
ffmpeg -f concat -safe 0 -i mylist.txt -c copy output.mp4
This reads the list, identifies the location of each video file based on the path included in mylist.txt, and combines them into output.mp4. The output video file is a bit smaller than the combined size of the input video files, presumably because the duplicated container metadata of the inputs is merged, but since the streams are copied rather than re-encoded, the quality of the output is indistinguishable from the inputs.
Unifying videos using a split screen effect
ffmpeg can also be used to combine the multiple perspectives into a single view. Let's say we have 3 videos, each of variable length and with different start times:
- input1.mp4 starts first
- input2.mp4 starts second
- input3.mp4 starts third
The first command merges the first two videos and produces out.mp4. Note that input2.mp4 is supplied twice: the first copy is trimmed and blacked out to create the delay segment, and the second copy is the footage that plays after it, while input1.mp4 fills the right-hand pane and provides the audio:
ffmpeg -i input2.mp4 -i input1.mp4 -i input2.mp4 -filter_complex "[0:v]trim=0:13.0,drawbox=c=black:t=fill[delay];[delay][2:v]concat[left];[left][1:v]hstack[v]" -map "[v]" -map 1:a -c:a copy out.mp4
As a second stage, out.mp4 is then merged with input3.mp4 to produce final.mp4. Again, input3.mp4 appears twice, once to build its delay segment and once as the delayed footage, while out.mp4 sits on the right and supplies the audio:
ffmpeg -i input3.mp4 -i out.mp4 -i input3.mp4 -filter_complex "[0:v]trim=0:21.8,drawbox=c=black:t=fill[delay];[delay][2:v]concat[left];[left][1:v]hstack[v]" -map "[v]" -map 1:a -c:a copy final.mp4
The timecodes after trim= need to be modified to represent the times in input1.mp4 and out.mp4 at which input2.mp4 and input3.mp4 should cut in, respectively.
The result is a series of 3 horizontally aligned videos. The one on the right starts first, and the ones in the middle and on the left start at the specified times. There are no borders between the videos.
Appending black frames
Since the videos used as inputs for the split-screen effect have variable lengths, the ones that finish before the end of the combined video need to be padded with black frames for the remaining amount of time. Moreover, in cases where a camera was turned off and then on again, black frames have to be inserted between the two recorded segments, with a length corresponding to the span of the gap in the record. So we need to identify points of overlap between the segments, and determine at which points each segment begins and ends. This allows us to calculate the timespans of the gaps to be filled. Here is a sample of the notes I take when figuring this out:
2/output1 is broken
1/output1 begins at 0:00:00
3/output1 begins when 1/output1 reaches 0:05:06.85 = 306.85
2/output2 begins when 1/output1 reaches 0:06:13.35 = 373.35
1/output2 begins when 2/output2 reaches 0:04:33.65 = 274.1
break
1/output1 length: 0:06:59 = 419
1/output2 length: 0:10:05 = 605
3/output1 length: 0:17:32 = 1052
2/output2 length: 0:19:54 = 1194
1/output1 ends: 0:06:59 = 419
1/output2 ends: 605 + 647.45 = 1252.45
3/output1 ends: 1052 + 306.85 = 1358.85
2/output2 ends: 1194 + 373.35 = 1567.35
Black1 begins: 419
Black1 ends: 274.1 + 373.35 = 647.45
Black1 length: 647.45 - 419 = 228.45
Black2 begins: 1252.45
Black2 ends: 1567.35
Black2 length: 1567.35 - 1252.45 = 314.9
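The black-frame durations are just end-minus-begin differences, so the arithmetic in these notes can be double-checked mechanically. A quick awk check of the two gaps, using the figures from the sample notes:

```shell
# Black1: camera 1's first segment ends at 419 s; its second segment
# begins at 274.1 + 373.35 = 647.45 s on the master timeline.
awk 'BEGIN { printf "Black1 length: %.2f\n", (274.1 + 373.35) - 419 }'
# Black2: camera 1's second segment ends at 1252.45 s; the last running
# camera (2/output2) ends at 1567.35 s.
awk 'BEGIN { printf "Black2 length: %.2f\n", 1567.35 - 1252.45 }'
```

These print 228.45 and 314.90, matching the Black1 and Black2 lengths calculated by hand.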
The following example demonstrates how to use -f lavfi to generate black frames and concatenate them onto the end of an original video segment:
ffmpeg -i /Volumes/[raw]/CASE12017/June16/1/output1.mp4 -f lavfi -i color=s=1280x720:d=228.45 -filter_complex "[0:v][1:v]concat[v]" -map "[v]" -map 0:a -af apad -shortest out1.mp4
Change the value following d= to the number of seconds that need to be filled with black frames, and make sure the frame size following s= matches the original video. You may also want to add an r= (frame rate) parameter to the color source so it matches the original footage.
If the black frames are to be sandwiched between two recorded segments, simply concatenate the output of the prior command with the second segment:
ffmpeg -i /Volumes/[concat]/case1june162017y/out1.mp4 -i /Volumes/[raw]/CASE12017/June16/1/output2.mp4 -filter_complex "[0:v:0][0:a:0][1:v:0][1:a:0]concat=n=2:v=1:a=1[v][a]" -map "[v]" -map "[a]" out2.mp4
Creating blank segments with just text
I like to include a 5 second title at the start of each video to indicate what I am about to watch. Similarly, in cases where all cameras are off, I like to include a brief explanation that indicates why there is a gap in the footage. I also like to include a moment at the end that indicates that the video is over.
The following example demonstrates how to create blank segments with overlaid text:
ffmpeg -f lavfi -i color=c=black:s=3840x720:d=5 -vf \
"drawtext=fontfile=/Volumes/[concat]/open-sans/OpenSans-Bold.ttf:fontsize=70:fontcolor=white:x=(w-text_w)/2:y=(h-text_h-text_h)/2:text='9:30 AM break', \
drawtext=fontfile=/Volumes/[concat]/open-sans/OpenSans-Bold.ttf:fontsize=70:fontcolor=white:x=(w-text_w)/2:y=(h+text_h)/2:text='All cameras turned off'" \
/Volumes/[concat]/930-break.mp4
Change the font file, size, colour and the text itself to whatever you want. The expression following y= sets each line's vertical position. The value following d= specifies the clip's duration in seconds. Ensure that the frame size following s= matches your desired result. See https://ffmpeg.org/ffmpeg-utils.html#Color for a list of supported colour names.
After each tripartite video is compiled and the clips with overlaid text have been generated, it's time to concatenate all of these pieces together. We will use a different method that re-encodes the video streams, however, because the videos may have different properties. This method is slower than the stream copy used to concatenate the original 4GB chapters, but results in a more stable output.
ffmpeg -i /Volumes/[concat]/case1june162017y/case1june162017title.mp4 -i /Volumes/[concat]/case1june162017y/out6.mp4 -i /Volumes/[concat]/case1june162017y/break.mp4 -i /Volumes/[concat]/case1june162017y/out12.mp4 -i /Volumes/[concat]/case1june162017y/break.mp4 -i /Volumes/[concat]/case1june162017y/out15.mp4 -i /Volumes/[concat]/case1june162017y/end.mp4 -filter_complex "[0:v:0] [1:v:0] [2:v:0] [3:v:0] [4:v:0] [5:v:0] [6:v:0] concat=n=7:v=1[v]" -map "[v]" out16.mp4
Each segment is included as an input file, in the order in which you want them to appear in the output file. Set the value following concat=n= to the number of inputs that will be processed, and include one [i:v:0] specifier per input. Remember, the first input is denoted by 0 instead of 1.
Additionally, the above example excludes audio streams from the output file, which will be added through a subsequent process.
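Writing out seven stream specifiers by hand is error-prone, so the filtergraph string can be generated from the input count instead. A sketch (the helper name `concat_filter` is my own, not part of ffmpeg):

```shell
# concat_filter N — print the filter_complex string for an N-input,
# video-only concat: "[0:v:0] [1:v:0] ... concat=n=N:v=1[v]"
concat_filter() {
  n="$1"; filter=""
  i=0
  while [ "$i" -lt "$n" ]; do
    filter="$filter[$i:v:0] "   # one specifier per input, space-separated
    i=$((i + 1))
  done
  printf '%sconcat=n=%d:v=1[v]\n' "$filter" "$n"
}

# e.g.:
# ffmpeg -i title.mp4 -i out6.mp4 -filter_complex "$(concat_filter 2)" -map "[v]" out.mp4
```

Generating the string this way keeps the specifier count and the n= value in sync automatically.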
A unified audio track now has to be generated, and then mapped onto the video stream that was just created. We will do this by extracting the audio streams from the video segments, and then lining up the tracks according to the timeline originally devised to overlap the videos.
To extract the audio streams from the videos:
ffmpeg -i /Volumes/[raw]/CASE12017/June16/1/output1.mp4 -vn -acodec copy 1-output1.m4a
ffmpeg -i /Volumes/[raw]/CASE12017/June16/3/output1.mp4 -vn -acodec copy 3-output1.m4a
ffmpeg -i /Volumes/[raw]/CASE12017/June16/2/output2.mp4 -vn -acodec copy 2-output2.m4a
ffmpeg -i /Volumes/[raw]/CASE12017/June16/1/output2.mp4 -vn -acodec copy 1-output2.m4a
We also need to generate a 5 second blank audio track to fill in the gaps over the blank videos:
ffmpeg -f lavfi -t 5 -i anullsrc 5sec-silence.m4a
Now we'll use the Time Shift tool in Audacity to move tracks along the visualised timeline. Import each m4a file one by one, and align them based on the intersections calculated in your notes. Some tracks will overlap with others, and there may be a slight echo in the final unified track. Don't forget to include the 5 second silent tracks at the appropriate times.
After selecting all of the channels, select 'Export Audio' from the File menu and save the track as an m4a file to the working directory. You will be notified that this will unify all tracks into a single file, which is what we want.
Now map the merged audio track onto the video with no audio and give the output file a useful name.
ffmpeg -i /Volumes/[concat]/case1june162017y/out16.mp4 -i case1june162017.m4a -codec copy -shortest CASE1-June162017.mp4
Here are some tidbits that I've picked up along the way.
To simply cut a video:
ffmpeg -i input.mp4 -ss 00:01:00 -to 00:02:00 -c copy output.mp4
-i input.mp4: specifies the input file.
-ss 00:01:00: the time at which your trimmed video will start.
-to 00:02:00: the time at which your trimmed video will end. Placed after -i, as here, both options refer to positions on the input's original timeline.
-c copy: trims via stream copy, which is very fast because it does not re-encode the video, but the cuts will land on the nearest keyframes.
The timing format is hours:minutes:seconds, with optional decimals (e.g. 00:01:30.5).
Removing -c copy will re-encode, which is much slower but allows frame-accurate cuts.
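Since ffmpeg accepts times either as hours:minutes:seconds or as plain seconds, converting between the two (as in the alignment notes earlier) is a one-liner. A sketch (the helper name is my own):

```shell
# hms_to_seconds H:MM:SS[.ms] — print the given timestamp as seconds.
hms_to_seconds() {
  # split on ':' and accumulate: hours*3600 + minutes*60 + seconds
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

hms_to_seconds 0:05:06.85   # 306.85
hms_to_seconds 0:17:32      # 1052
```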
Cleaning noisy audio in Audacity
Open file.mp4 in Audacity. Select a clip that will be used as a sample of the waveform you want to reduce. Ideally, this clip should be consistent and representative of a single source of background noise that you want to reduce throughout the entire video. This is not all that reliable for audio recorded outdoors, especially in windy settings, where the noise is typically inconsistent.
After selecting the clip, go to Effect >> Noise Reduction and select 'Get Noise Profile', then press 'OK'.
Close the noise reduction menu and select the entire track using the keyboard shortcut Command + A. Then go back to the noise reduction window (Effect >> Noise Reduction) to apply the filter based on the noise profile you just captured.
Export the modified audio as a wav file (File >> Export >> Export as WAV), and save it to the working directory.
Use ffmpeg to replace the dirty audio track with the clean one:
ffmpeg -i dirty.mp4 -i clean.wav -c:v copy -map 0:v:0 -map 1:a:0 clean.mp4
-map 0:v:0 maps the first video stream (index 0) of the first input (index 0) to the output.
-map 1:a:0 maps the first audio stream (index 0) of the second input (index 1) to the output.
If the audio is longer than the video, you will want to add -shortest before the output file name.
Not specifying an audio codec will automatically select a working one. You can specify one by adding, for example, -c:a libvorbis after the -map options.
See https://superuser.com/questions/1137612/ffmpeg-replace-audio-in-video for some more details regarding these additional options.
Do everything within a single working directory. If you mess up and need to restart, and if you have space on the drive, create an additional working directory and work in that one without deleting the original files. Similarly, keep an additional set of notes for each attempt, and keep them tidy. Comparing the results from previous attempts can help clarify what exactly went wrong, and can help you better understand how FFmpeg works if you're learning as you go like I am.
If you have access to a computer that can be run independently and continuously, it can be used to automate various tasks. Plan out each command in advance and paste them into a text file, in the order in which they are to be completed, separating each one with a semicolon. Once all commands have been written and pasted into a single block, copy the entire block, paste it into the terminal prompt, and hit enter. When one command finishes, the next one will begin. Incomplete or incompatible commands will be skipped over, so make sure to write them out properly. Simple errors or typos may cause hiccups, so it may be necessary to supervise the automation from time to time to ensure that things are going smoothly. If the processes are being run remotely, simply SSHing into the remote computer and listing the files in the working directory with ls -lh may suffice to examine whether the block of commands was halted for any reason or whether certain commands were skipped over.
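One caveat with semicolon-separated batches: `;` runs the next command even if the previous one failed, which is exactly what lets the batch skip over a broken command, whereas joining commands with `&&` halts the chain at the first error. A minimal illustration:

```shell
# With ';' the next command runs even after a failure:
false ; echo "ran anyway"
# With '&&' the next command is skipped after a failure
# (the '|| true' only keeps this demo from tripping 'set -e'):
{ false && echo "never printed"; } || true
echo "done"
```

Use `;` when you want the batch to push through overnight regardless of individual failures, and `&&` when later commands depend on earlier outputs.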
Processing video is both time intensive and computationally expensive. Be patient, and let commands run their course. It might also be a good idea to cut long videos into smaller clips to test things out as a dry run.
Run as few video editing or playback processes as you can in order to process files quicker. Quit VLC, close your web browser and refrain from using graphics-intensive applications, particularly applications built using Electron (see https://josephg.com/blog/electron-is-flash-for-the-desktop/). Free up as much CPU capacity as you can manage.