Research relating to: app.shortcut.com/wistia-pde/story/2262/spike-audio-layering

Spike: Audio layering

Research Objectives

Because the render pipeline supports layering audio, we need to research what it will take to support this in the UI, like adding a background track and adjusting the volume on both the main track and the background track.

Engineering Approach

  • Figure out what payload the Render Pipeline contract expects
    • Answer: RP expects an audio_mix type with a video input and multiple audio inputs; an example payload is below.
  • Test hardcoding the input of a background track to supply to RP, and make sure you're getting the expected output.
    • Demo of the background track on the client side here.
  • Stretch goal: build a rough prototype that would allow the user to manually dial in the right volume for the background and main track at the clip level
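
For the stretch goal, here's a minimal sketch of how a UI volume control could be translated into the volume node shape RP expects (see the contract example under Notes). buildVolumeNode and its parameters are assumptions for illustration, not RP API:

// HYPOTHETICAL SKETCH for the stretch goal: map a user-chosen volume
// level onto RP's { type: 'volume', transitions, input } shape.

const buildVolumeNode = (input, level, durationUs) => ({
  type: 'volume',
  start: 0,
  // One flat transition holds the track at the chosen level for the
  // whole clip; a fade would use different curve start/end values.
  transitions: [
    {
      time: { start: 0, end: durationUs },
      curve: { start: level, end: level },
    },
  ],
  input,
});

// e.g. hold a background track at 30% volume for the whole mix:
// buildVolumeNode({ type: 'audio', uri: backgroundTrackUri }, 0.3, 183059000)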

Notes

  • Initial approach: I think it would be helpful to organize the video’s input on the same level as the audio inputs in the Edit Tree; that way, on the FE we can filter out the background audio and display it below the timeline clip.
    • Result: this is really hard to implement on the render pipeline. Allowing different types for the main input makes it ambiguous what the actual inputs are: if the type is video, RP would need to look for input; if the type is concat, RP would need to look for inputs. That would basically re-implement a bunch of logic that already happens when parsing the whole editTree, and would create a lot of unneeded complexity. (See the sketch after this list.)
  • Next approach: supply RP the overlay prop and, on the FE, filter out the nodes that have this prop and the audio type.
    • Example hardcoded JSON below. The results of this approach:
      • concatNode: two video clips of the same video file are concat’d together.
      • newTree: where we add the overlay prop and the URL of the audio.
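
Before the hardcoded examples, a sketch of why the initial approach was rejected. This shape was never implemented; it's only here to make the ambiguity concrete:

// HYPOTHETICAL (rejected): with the main input on the same level as the
// audio inputs, RP would have to branch on each node's type just to find
// the media: a 'video' node keeps it under `input`, while a 'concat' node
// keeps it under `inputs`. That dispatch already happens when RP parses
// the whole editTree, so repeating it here duplicates logic.

const rejectedInputs = [
  { type: 'video', input: { uri: 'main-video.bin' } }, // media under `input`
  { type: 'concat', inputs: [/* clip nodes */] },      // media under `inputs`
  { type: 'audio', uri: 'background-track.bin' },
];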
// RENDER PIPELINE'S CODE EXPECTATIONS
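// Reading the tree below: the main video's audio is silent for the first
// 3s (curve 0.0 -> 0.0), then ramps from 0.0 to 1.0 between 3s and 6s,
// while the background track holds at 1.0 and then ducks to 0.1 over the
// same window. The time values look like microseconds (3000000 = 3s);
// that unit is inferred from the durations, not stated in the contract.
// Each audioInputs entry also carries its own start offset within the mix.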

const tree = {
 type: 'audio_mix',
 videoInput: {
   type: 'volume',
   start: 0,
   transitions: [
     {
       time: {
         start: 0,
         end: 3000000
       },
       curve: {
         start: 0.0,
         end: 0.0
       }
     },
     {
       time: {
         start: 3000000,
         end: 6000000
       },
       curve: {
         start: 0.0,
         end: 1.0
       }
     }
   ],
   input: {
     type: 'concat',
     inputs: [
       {
         type: 'clip',
         range: {
           start: 0,
           end: 67469860,
         },
         input: {
           type: 'video',
           duration: 183059000,
           uri: 'https://embed-fastly.wistia.st/deliveries/690a36d29613bbadb70d00d24a943c854284957e.bin',
           preprocessStrategy: 'disable',
         },
       },
       {
         type: 'clip',
         range: {
           start: 69966870,
           end: 183059000,
         },
         input: {
           type: 'video',
            duration: 183059000,
           uri: 'https://embed-fastly.wistia.st/deliveries/690a36d29613bbadb70d00d24a943c854284957e.bin',
           preprocessStrategy: 'disable',
         }
       }
     ]
   }
 },
 audioInputs: [
   {
     input: {
       type: 'volume',
       input: {
         type: 'audio',
         uri: 'https://embed-ssl.wistia.com/deliveries/9fbcd194ca4461bea6adcc38b5e20bc9bc752912.bin',
       },
       transitions: [
         {
           time: {
             start: 0,
             end: 3000000
           },
           curve: {
             start: 1.0,
             end: 1.0
           }
         },
         {
           time: {
             start: 3000000,
             end: 6000000
           },
           curve: {
             start: 1.0,
             end: 0.1
           }
         }
       ]
     },
     start: 0
   }
 ]
}

// DIFFERENT APPROACH FOR AUDIO OVERLAYS FOR THE PREVIEW ENGINE (MAY NOT FIT WITH RP)

const concatNode = {
   type: 'concat',
   inputs: [
     {
       type: 'clip',
       range: { start: 55147229, end: 119644632 },
       input: {
         type: 'video',
         duration: 183059000,
         uri: 'https://embed-fastly.wistia.st/deliveries/690a36d29613bbadb70d00d24a943c854284957e.bin',
         preprocessStrategy: 'segment',
         mediaData: mediaData,
       },
     },
   ],
 };

 const newTree = {
   type: 'audio_mix',
   start: 0,
   audioInputs: [{
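      // Per the Notes above, this is where the overlay prop marking the
      // node as background audio would be added (the exact prop name and
      // shape aren't shown in this example).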
     type: 'audio',
     uri: 'https://embed-ssl.wistia.com/deliveries/9fbcd194ca4461bea6adcc38b5e20bc9bc752912.bin',
   }],
   videoInput: concatNode,
 };

 playerHandle.setEditTree(newTree);
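
A minimal sketch of the FE filtering described in the Notes, assuming the overlay prop lands as a boolean flag on audio nodes (the prop name and both helpers below are assumptions, not part of the RP contract):

// HYPOTHETICAL SKETCH: split an audio_mix tree's audioInputs into
// background overlays (to display below the timeline clip) and the rest.
// Assumes background audio nodes carry `overlay: true`.

const isBackgroundAudio = (node) =>
  node.type === 'audio' && node.overlay === true;

const splitAudioInputs = (tree) => ({
  backgroundTracks: tree.audioInputs.filter(isBackgroundAudio),
  otherInputs: tree.audioInputs.filter((node) => !isBackgroundAudio(node)),
});

// e.g. const { backgroundTracks } = splitAudioInputs(newTree);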

Questions

  • On the client side, how do I combine the example audio mix with the example clips? On the server side, practically speaking, how can we take both video and audio as inputs to type=audio_mix? And on the client side, how do we preview the mixes?
    • All answered here by Max.
  • Should preprocessing support pure "audio" types?
  • How can we ensure that there is a visual difference between the main timeline clip and the background audio?
    • Answer: check with Pin-Bo.

Conclusion [WIP]

For volume gains and testing on the RP side, we'll use RP's mix_audio operation and update it so that it also selects a video stream. Any further work on gains will involve minor changes to the render pipeline and the preview ops.
