@arielpontes
Last active November 10, 2015 23:44
WebVTT notes

WebVTT is a W3C standard for displaying timed text in HTML5. Its specification is currently (as of May 2015) in draft stage, so not all features are implemented by the major browsers, especially when it comes to positioning. Since there are no recent summaries of WebVTT positioning online and the specification cannot be used by any sane person for quick reference, I'll make a short summary here.

#WebVTT Positioning

Let's start with an example. This file contains the main positioning parameters supported by WebVTT:

WEBVTT

00:00:01.000 --> 00:00:05.000 position:75% align:middle
These captions test some features of the WebVTT format.

00:00:06.000 --> 00:00:10.000 line:5%
This cue is positioned at the top of the video

00:00:11.000 --> 00:00:15.000 position:5% align:start
This cue is positioned at the left side of the video.

00:00:16.000 --> 00:00:20.000 position:99% align:end
And this one at the right side.

00:00:21.000 --> 00:00:25.000 size:33%
This cue is only a third of the width of the video, hence the multiple line breaks.

*Source: jwplayer's website
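To make the structure of a cue's timing line concrete, here is a minimal Python sketch that splits one into start time, end time and positioning settings. The names `parse_timing_line` and `to_seconds` are hypothetical helpers made up for this note, not part of any library, and the regular expression is a simplification: it assumes full `HH:MM:SS.mmm` timestamps, as in the sample above.

```python
import re

# Matches "HH:MM:SS.mmm --> HH:MM:SS.mmm" followed by optional settings.
# Simplification: real WebVTT also allows the hours field to be omitted.
TIMING_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{3})(.*)$"
)


def to_seconds(hours, minutes, seconds, millis):
    """Convert the four timestamp fields (as strings) to seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_timing_line(line):
    """Return (start, end, settings) for a cue timing line."""
    match = TIMING_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a cue timing line: {line!r}")
    groups = match.groups()
    start = to_seconds(*groups[0:4])
    end = to_seconds(*groups[4:8])
    # Settings are space-separated "name:value" pairs; values stay as strings.
    settings = dict(item.split(":", 1) for item in groups[8].split())
    return start, end, settings
```

For the first cue of the sample, `parse_timing_line("00:00:01.000 --> 00:00:05.000 position:75% align:middle")` gives the two times in seconds plus a dict with the `position` and `align` settings.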

Some terms

  • Cue: an individual caption that has its own timing
  • Cue box: the box (rectangle) surrounding the actual text that will appear on the screen for that cue

Positioning attributes

All these features are supported consistently in Firefox, Chrome, Safari and Opera.

###align:[start|left|middle|right|end] default: middle

Whether the text is aligned to the start, left, middle (the default), end or right. This is related to the direction of writing in different languages. For English and most Western languages, start and left will be the same. For Hebrew or Arabic, for example, start will instead be the same as right.
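The start/end logic can be sketched in a few lines of Python. This is only an illustration of the rule described above: `resolve_align` and its `direction` argument ("ltr" or "rtl") are names I made up, and treating `end` as the mirror image of `start` is my reading of the rule rather than something I tested in browsers.

```python
VALID_ALIGNS = {"start", "left", "middle", "right", "end"}


def resolve_align(align, direction="ltr"):
    """Map start/end to a physical side; left, middle and right pass through."""
    if align not in VALID_ALIGNS:
        raise ValueError(f"unknown align value: {align!r}")
    if align == "start":
        # Start of the line: left for left-to-right text, right for right-to-left.
        return "left" if direction == "ltr" else "right"
    if align == "end":
        return "right" if direction == "ltr" else "left"
    return align
```

So `resolve_align("start")` is "left" for English, while `resolve_align("start", "rtl")` is "right" for Hebrew or Arabic.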

###position:[N%][,start | ,middle | ,end] default: depends on align (read list below)

(see specs)

*Safari, however, defaults to 50% no matter what the align is.

Depending on the alignment, the position will mean different things. If the computed alignment is:

  • start/left: then N determines how far the left side of the cue box is from the left side of the video, as a percentage of the video width. For example, if the position is "5%", the cue box starts 5% of the way across the video (as in the 3rd cue of the sample .vtt file).
    • The default position in this case is 0%
  • middle: then N will determine not where the box starts, but where the middle of the box is.
    • The default position in this case is 50%
  • end/right: then, as expected, N will determine where the right side of the cue box will be.
    • The default position in this case is 100%
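The list above boils down to a small calculation. Here is a rough Python sketch of it, assuming the alignment has already been resolved to "left", "middle" or "right". The function name `cue_box_edges` is hypothetical, all values are percentages of the video width, and the sketch does not model whatever a browser does when the box would overflow the video.

```python
# Default position for each computed alignment, as listed above.
DEFAULT_POSITION = {"left": 0.0, "middle": 50.0, "right": 100.0}


def cue_box_edges(align, position=None, size=100.0):
    """Return the (left, right) edges of the cue box in percent of video width."""
    if align not in DEFAULT_POSITION:
        raise ValueError(f"align must be left, middle or right, got {align!r}")
    if position is None:
        position = DEFAULT_POSITION[align]
    if align == "left":
        left = position  # position marks the left edge of the box
    elif align == "middle":
        left = position - size / 2  # position marks the middle of the box
    else:
        left = position - size  # position marks the right edge of the box
    return left, left + size
```

For example, `cue_box_edges("middle", 62.5, 75.0)` describes a box that runs from 25% to 100% of the video width.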

The ,alignment parameter was added more recently to the specs and is only supported by Firefox at the moment. It overrides the computed alignment, so you can for example position the cue box relative to its left side while the text is still aligned to the right within the box.

This cue, for example:

00:00:01.000 --> 00:00:05.000 position:62.5% size:75%
Some text

In Firefox, it can be written as:

00:00:01.000 --> 00:00:05.000 position:25%,start size:75%
Some text

Although the new syntax is more intuitive (you see right away that the cue box stretches from 25% to 100%, while with the old syntax you have to do the math), it doesn't express any positioning that cannot also be expressed with the old syntax.
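"Doing the math" for the old syntax looks roughly like this in Python. The function `old_syntax_position` is a made-up helper, and it assumes left-to-right text, so that start means left and end means right.

```python
# Where along the box (0 = left edge, 1 = right edge) each keyword points.
# Assumes left-to-right text, so start == left and end == right.
FRACTION = {"start": 0.0, "left": 0.0, "middle": 0.5, "end": 1.0, "right": 1.0}


def old_syntax_position(position, anchor, size, text_align="middle"):
    """Translate position:N%,anchor into the plain position:N% for the same box."""
    # First find the left edge of the box described by the new syntax.
    left = position - FRACTION[anchor] * size
    # Then express that box through the point the old syntax measures from,
    # which depends on the text alignment.
    return left + FRACTION[text_align] * size
```

With the cue above, `old_syntax_position(25.0, "start", 75.0)` returns 62.5, the value used in the old-syntax version.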

###line:[N|N%]

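N defines how many lines to skip (from the top) before showing the caption. N% defines what percentage of the height of the video to skip before showing the caption. When line is omitted, the caption is placed at the bottom of the video (the specification calls this default auto), which in practice behaves much like 100%.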
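As a quick illustration of the two forms, here is a small Python sketch that turns a line value into a pixel offset from the top of the video. Both `line_offset_px` and the `line_height` argument are assumptions of mine: the real line height is decided by the browser, and the sketch only covers the two forms described here.

```python
def line_offset_px(line, video_height, line_height):
    """Distance in pixels from the top of the video to the caption."""
    if line.endswith("%"):
        # Percentage form: a fraction of the video height.
        return float(line[:-1]) * video_height / 100
    # Plain number form: that many text lines are skipped.
    return int(line) * line_height
```

On a video 360 pixels high with 20-pixel lines, `line:5%` and `line:2` land 18 and 40 pixels from the top respectively.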

###size:[N%] default: 100%

N% determines the width of the cue box as a percentage of the full width of the video element.

Regions

This feature is not supported by any of the browsers I used to test it (Firefox, Chrome, Safari and Opera).

From the official documentation:

This example shows two regions containing rollup captions for two different speakers. Fred's cues scroll up in a region in the left half of the video, Bill's cues scroll up in a region on the right half of the video. Fred's first cue disappears at 12.5sec even though it is defined until 20sec because its region is limited to 3 lines and at 12.5sec a fourth cue appears:

WEBVTT
Region: id=fred width=40% lines=3 regionanchor=0%,100% viewportanchor=10%,90% scroll=up
Region: id=bill width=40% lines=3 regionanchor=100%,100% viewportanchor=90%,90% scroll=up

00:00:00.000 --> 00:00:20.000 region:fred align:left
<v Fred>Hi, my name is Fred

00:00:02.500 --> 00:00:22.500 region:bill align:right
<v Bill>Hi, I'm Bill

00:00:05.000 --> 00:00:25.000 region:fred align:left
<v Fred>Would you like to get a coffee?

00:00:07.500 --> 00:00:27.500 region:bill align:right
<v Bill>Sure! I've only had one today.

00:00:10.000 --> 00:00:30.000 region:fred align:left
<v Fred>This is my fourth!

00:00:12.500 --> 00:00:32.500 region:fred align:left
<v Fred>OK, let's go.

Note that regions are only defined for horizontal cues.
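The rollup behavior the documentation describes can be simulated in a few lines of Python. This is only a sketch of the idea, not of any browser's implementation: `visible_until` is a hypothetical helper, and it assumes every cue takes up exactly one line of its region.

```python
def visible_until(cues, lines):
    """Return (start, actual_end) per cue of one region, in seconds.

    cues is a list of (start, end) pairs for a single region. A cue is
    pushed out when the cue `lines` positions after it appears, assuming
    each cue occupies exactly one line.
    """
    ordered = sorted(cues)
    result = []
    for index, (start, end) in enumerate(ordered):
        pusher = index + lines  # index of the cue that scrolls this one out
        if pusher < len(ordered):
            end = min(end, ordered[pusher][0])
        result.append((start, end))
    return result


# Fred's four cues from the example above, in a region with lines=3.
fred = [(0.0, 20.0), (5.0, 25.0), (10.0, 30.0), (12.5, 32.5)]
fred_visible = visible_until(fred, lines=3)
```

Here the first entry of `fred_visible` ends at 12.5 instead of 20, which matches the description: the fourth cue pushes the first one out of the three-line region.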

Removing the Region lines made no difference in Chrome and Opera. It did in Safari, but the behavior looks nothing like what the documentation describes: Fred's and Bill's cues both appear on the left side of the video. The two streams start at different heights, and at some point the upper ones start appearing behind the older ones that started lower. Firefox didn't show captions at all.
