
@canibanoglu
Last active May 16, 2019 14:54

What is this about?

Even though most of us have a general idea, I still think a problem description is a good thing.

We want to achieve gapless playback in our apps. This is especially important for classical music (at least I think so) because there are works that genuinely have no pause between movements. Currently, we take a best-effort approach to a gapless playback experience: we preload the audio, but we're not doing anything to ensure that playback is actually gapless.

This is where the Media Source Extensions (MSE) API comes in. You can read more about the MSE API here, but in short it gives developers much finer control over what gets played by the <audio> tag and lets us manipulate the audio stream directly. An important implication is that we can keep a single audio stream and append the data for the next track to it. As far as the <audio> tag is concerned, nothing has changed; it is still playing the same audio stream.
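To make that concrete, here is a minimal sketch of the idea (not our implementation; the URLs, the mime type and the file layout are placeholders):

// Minimal MSE sketch: one MediaSource, one SourceBuffer, and the audio data
// of two consecutive tracks appended to the same stream.
const audio = document.querySelector('audio') as HTMLAudioElement;
const mediaSource = new MediaSource();
audio.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', async () => {
  const sourceBuffer = mediaSource.addSourceBuffer('audio/mp4; codecs="mp4a.40.2"');

  // appendBuffer is asynchronous, so wait for updateend before appending more.
  const append = (data: ArrayBuffer) =>
    new Promise<void>((resolve) => {
      sourceBuffer.addEventListener('updateend', () => resolve(), { once: true });
      sourceBuffer.appendBuffer(data);
    });

  // Append both movements back to back; the <audio> tag keeps playing what it
  // sees as a single continuous stream.
  for (const url of ['/audio/movement-1.mp4', '/audio/movement-2.mp4']) {
    await append(await (await fetch(url)).arrayBuffer());
  }
});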

We would like to leverage this API to have gapless playback in our app.

If you want to join the (very scarce) conversation, you can join #gapless-playback and #gapless-webtop channels on Slack.

The branch name is research/gapless.

What are we researching?

As with all good new things, there are limitations on what we can do with this API. For example, the files we serve have to be encoded in a specific way so that playback works in all browsers. All client teams have to check the different options so we can decide on an encoding scheme that supports both lossless and lossy content consumption. (There is a Google spreadsheet to keep track of playback support across clients for the different formats; you can check it out here.) We need to find a set of encoding presets that will cover all clients. Moritz told me that aac-mp4-both and flac-mp4-ffmpeg from the spreadsheet form the lowest common denominator we're aiming for. FLAC audio in MP4 containers is apparently a very new thing to do (I didn't know this when I took this over from Vlad), and we need to research under what conditions we can make the MSE API work with it.

This is a low-level API, and if you opt to use it, you have to implement all the functionality yourself, functionality we don't normally think much about, like going to the next or previous track, buffering content, etc. We need to research what is already out there (player implementations leveraging the MSE API) and whether anything satisfies our requirements. If nothing meets our criteria, we need to research whether it is possible to modify the available open source players in a way that would make them acceptable in our app. And as a last resort, we need to weigh the option of writing a new player ourselves.

What have I done so far?

After spending a couple of days wrapping my head around all of this (which I'm still not sure I achieved), I tried to answer the first question: can we get aac-mp4-both and flac-mp4-ffmpeg to play on all major browsers? Then I started researching different player implementations and tried to get them to work with something similar to our setup.

Encoding issues

When I took this over from Vlad, he had already done most of what you can see on the branch. There was a working file for FLAC in an MP4 container (src/assets/audio/flac.mp4 on the branch). The next step was to see if this implementation would work with the test files generated by Moritz (you can find links to these files here).

... aaaaaaand it didn't work. The files have to be prepared such that they are compliant with the ISO BMFF Byte Stream Format. The funny thing is that Moritz assured me the initial files were already in that format (the files have since changed, more on this later).

Investigating this further (this is how I found out about chrome://media-internals), I learned that the MSE API is rather picky about encoding parameters. Here's a screenshot of the error that was happening with those files.

(screenshot: chrome://media-internals error for the original files)

The encoding was done with the following commands (thanks to Moritz):

ffmpeg -i ${input_file} -nostats -vn -ar 44100 -f flac -map_metadata -1 -c:a flac -acodec copy -f mp4 -strict -2 tmp

ffmpeg -nostats -vn -i tmp -c copy -map 0 -movflags +faststart -strict -2 out/${series}/flac-mp4-ffmpeg/${series}-flac-mp4-${bn}.mp4

(If you want to play along, check out the How do I play along? section.)

I checked with Vlad and he sent me the parameters he used to prepare the flac.mp4 file:

ffmpeg -i data/input/example.flac -acodec copy -vn -y -strict -2 -movflags frag_keyframe+empty_moov+default_base_moof data/output/example-flac-in-mp4.mp4

The difference is in the movflags: the frag_keyframe+empty_moov+default_base_moof flags are essential. Once we re-encoded the files (and updated them on S3), we checked again and it all worked!

Right now you can find these files under src/assets/audio/originals.

FLAC in Safari doesn't work, so I had to find a way to get the aac-mp4-both files to work there. At first this didn't work out either, but the solution turned out to be using the correct codec name for the files (which is mp4a.40.2, for the record). This link, coupled with ffprobe, was really useful for finding the correct codec string.
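As a side note, a quick way to probe what a browser's MSE implementation accepts is MediaSource.isTypeSupported; the two codec strings below are the ones I believe correspond to aac-mp4-both and flac-mp4-ffmpeg:

// Probe MSE support for our two candidate formats in the current browser.
const candidates = [
  'audio/mp4; codecs="mp4a.40.2"', // AAC-LC in MP4
  'audio/mp4; codecs="flac"',      // FLAC in MP4 (not supported in Safari)
];

for (const type of candidates) {
  console.log(type, MediaSource.isTypeSupported(type));
}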

Player implementation research

This is where I spent the bulk of my time this week. I thought we'd just go the custom player implementation route, and I started looking into other players to see what they were doing for buffering. A couple of hours was enough to convince me that writing our own player should be a last resort (however epic it sounds, there is a part of me rooting for everything else to fail, stupid as that is). Realistically, coming up with a full-featured player, well built and tested enough to put in front of users, is not a small undertaking, and in my opinion it should be avoided if possible. On a side note, I have amassed a lot of links about different approaches; if you are interested, I would be happy to share. I am still planning to take on a custom player as a personal project in the future.

I then started to evaluate different player implementations. There are two major players that work with the MSE API, dash.js and Shaka Player (that is a seriously idiotic name). They are both very well built, with support for a lot of functionality (much of it pretty much unneeded by us).

The first problem with these players is that they are full-fledged players, capable of adaptive bitrate and video playback, none of which we need. Using them would bring a lot of unnecessary fluff unless we decide to fork them and remove what we don't need.

The second problem is how we serve the content. All of the audio content we want to play is served from a CDN. For chunked requests, we rely on the Range header and get the correct partial response in return. If you are playing along, you can see that the requests made by the player on /gapless all have a Range header. So in effect our endpoints serve a single file, and chunking is done with the Range header. Both players mentioned above support playback using manifest files (MPD). These files describe how a player can access the content it's trying to play: how the data is chunked and how the player should try to get those chunks. I generated MPD files, played around with several configurations, and managed to get dash.js to play audio.
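For illustration, this is roughly what a single chunked request against the CDN looks like when done by hand (the URL and byte offsets are invented):

// Fetch one chunk of a single file using the Range header.
async function fetchChunk(url: string, start: number, end: number): Promise<ArrayBuffer> {
  const response = await fetch(url, {
    headers: { Range: `bytes=${start}-${end}` },
  });
  // A server honouring the Range header answers with 206 Partial Content.
  if (response.status !== 206) {
    throw new Error(`Expected a partial response, got ${response.status}`);
  }
  return response.arrayBuffer();
}

// e.g. the first 64 KiB of a track:
const firstChunk = await fetchChunk('/audio/track.mp4', 0, 64 * 1024 - 1);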

This is where things get hairy. I feel like Shaka is the more complete player, and I have already found some content about achieving gapless playback with it. The thing is, it doesn't support the way we will be serving the content.

What exactly are you talking about?

MPD files are just XML files that describe, among other things, how the media is segmented. They have a section called Representation (open an .mpd file in a text editor to check it all out), where you describe what the content looks like, how it can be accessed, and what the data means when accessed that way. For buffering, content is downloaded in chunks, or Segments as they are called in MPD files. There are three different ways of referencing Segments: SegmentList, SegmentBase and SegmentTemplate (more info here).

Long story short, we need to use SegmentList with mediaRange attributes so that playback works correctly with our CDN setup. The bad news is that the Shaka developers have already said they don't and won't support this (see here).
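For context, this is roughly the shape of manifest we would need; the structure follows the DASH spec, but the file name, byte offsets and durations here are invented:

<!-- A Representation whose Segments are byte ranges into a single file,
     which is what matches our Range-header CDN setup. -->
<Representation id="audio" mimeType="audio/mp4" codecs="flac" bandwidth="900000">
  <SegmentList timescale="44100" duration="441000">
    <Initialization sourceURL="track.mp4" range="0-861"/>
    <SegmentURL media="track.mp4" mediaRange="862-131071"/>
    <SegmentURL media="track.mp4" mediaRange="131072-262143"/>
  </SegmentList>
</Representation>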

This is where I am right now. The following parts will outline what I plan on doing from here.

Find out if gapless playback is easily achievable in dash.js

This is something that is still not clear to me. I have found content suggesting it is possible with Shaka by changing the manifest file on the fly and appending new Periods, but I don't know if the same is possible with dash.js. To be frank, I haven't yet checked whether this would even work with Shaka.

The thing is, this would be the most favorable outcome for us, as it would involve the least work on our part (no, I'm not lazy!). We would still need to write some custom code to manipulate the manifest in a clever way to ensure gapless playback.
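To make "appending new Periods" concrete, the manifest manipulation would amount to extending the MPD with one Period per track, something like this (structure only; the ids, durations and nested elements are invented):

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period id="movement-1" duration="PT4M12S">
    <AdaptationSet contentType="audio">
      <!-- Representation / SegmentList for the first movement -->
    </AdaptationSet>
  </Period>
  <!-- Appended on the fly when the next track is queued. -->
  <Period id="movement-2" duration="PT6M3S">
    <AdaptationSet contentType="audio">
      <!-- Representation / SegmentList for the second movement -->
    </AdaptationSet>
  </Period>
</MPD>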

Shaka deep dive

Failing the above, I will look into modifying Shaka's SegmentParser module in a way that would be compatible with this type of MPD file. I have looked into this a bit, but I am not comfortable enough with the whole specification to claim that I understood what the hell was happening.

What happens if you don't use SegmentList with Shaka?

Well, the kind of MPD files Shaka supports make the player request separate endpoints. If your file has 10 chunks, the player will make requests to endpoints like ...1.m4s, ...2.m4s and so on. This requires separate files, and while that is very easy to achieve with a tool like MP4Box, I'm not sure it is acceptable for us, as it would result in duplicate content just for webtop. This should be cleared up with Moritz.

Alternative scenarios that I want to try

Before going full commando and embarking on a custom player journey, I would still like to see whether I can somehow trick the player into thinking it's making requests to separate files. So far I have two ideas. The one I favor is a service worker that intercepts the requests made by Shaka and transforms them into the Range-header requests our CDN supports; the other is to write a gateway on the server side to make the same transformation.
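Here is a rough sketch of the service worker idea; the segment URL pattern and the fixed segment size are made up, and a real version would need a proper segment index:

// Intercept the per-segment requests a player like Shaka would make
// (e.g. .../track-3.m4s) and rewrite them into a Range request against the
// single file our CDN actually serves.
declare const self: ServiceWorkerGlobalScope;

const SEGMENT_SIZE = 256 * 1024; // hypothetical fixed chunk size in bytes

self.addEventListener('fetch', (event: FetchEvent) => {
  const match = event.request.url.match(/\/track-(\d+)\.m4s$/);
  if (!match) return; // not a segment request, let it through untouched

  const index = Number(match[1]) - 1;
  const start = index * SEGMENT_SIZE;
  const end = start + SEGMENT_SIZE - 1;

  event.respondWith(
    fetch('/audio/track.mp4', {
      headers: { Range: `bytes=${start}-${end}` },
    })
  );
});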

If these fail and we still can't come up with a way of getting an open source player to work for us, then that small part in me will rejoice in happiness and we will go ahead with the custom player implementation.

How do I play along?

Coming soon!
