WebVTT is a W3C standard for displaying timed text in HTML5. Its specification is currently (as of May 2015) in draft stage and therefore not all features are implemented by major players, especially when it comes to positioning. Since there are no recent summaries of WebVTT positioning online and the specification cannot be used by any sane person for quick reference, I'll make a short summary here.
#WebVTT Positioning
Let's start with an example. This file contains the main positioning parameters supported by WebVTT:
WEBVTT

00:00:01.000 --> 00:00:05.000 position:75% align:middle
These captions test some features of the WebVTT formats

00:00:06.000 --> 00:00:10.000 line:5%
This cue is positioned at the top of the video

00:00:11.000 --> 00:00:15.000 position:5% align:start
This cue is positioned at the left side of the video.

00:00:16.000 --> 00:00:20.000 position:99% align:end
And this one at the right side.

00:00:21.000 --> 00:00:25.000 size:33%
This cue is only a third of the width of the video, hence the multiple line breaks.
*Source: jwplayer's website
- Cue: an individual caption that has its own timing
- Cue box: the box (rectangle) surrounding the actual text that will appear on the screen for that cue
All these features are supported consistently in Firefox, Chrome, Safari and Opera.
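To make the structure of a cue's timing line concrete, here is a minimal Python sketch that splits such a line into its start time, end time and settings. The function name `parse_cue_line` is my own, and the sketch assumes well-formed input with no validation; it is not how any browser actually parses WebVTT.

```python
def parse_cue_line(line: str) -> dict:
    """Split a WebVTT timing line into start, end and a dict of cue settings."""
    # Everything before "-->" is the start time, everything after is
    # the end time followed by space-separated name:value settings.
    timing, _, rest = line.partition("-->")
    parts = rest.split()
    settings = {}
    for token in parts[1:]:
        # Split on the first colon only, so "position:75%" -> ("position", "75%").
        name, _, value = token.partition(":")
        settings[name] = value
    return {"start": timing.strip(), "end": parts[0], "settings": settings}


cue = parse_cue_line("00:00:01.000 --> 00:00:05.000 position:75% align:middle")
print(cue["settings"])  # {'position': '75%', 'align': 'middle'}
```

A cue without settings simply yields an empty dictionary, which is when the defaults described in the sections that follow apply.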
###align:[start|left|middle|right|end] default: middle
Controls whether the text is aligned to the start, left, middle (the default), right or end. The meaning of start and end depends on the writing direction of the language. For English and most Western languages, start is the same as left. For right-to-left languages such as Hebrew or Arabic, start is instead the same as right.
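As a rough illustration of how start and end resolve to a physical side, here is a small Python sketch. The function `resolve_alignment` and its `direction` parameter ("ltr" or "rtl") are hypothetical names of mine, and I am assuming that end simply mirrors start.

```python
def resolve_alignment(align: str = "middle", direction: str = "ltr") -> str:
    """Map start/end to a physical side; left, middle and right pass through."""
    if align not in {"start", "left", "middle", "right", "end"}:
        raise ValueError(f"unknown align value: {align!r}")
    if align == "start":
        # Text starts on the left in left-to-right scripts, on the right otherwise.
        return "left" if direction == "ltr" else "right"
    if align == "end":
        # Assumption: end is the mirror image of start.
        return "right" if direction == "ltr" else "left"
    return align


print(resolve_alignment("start", "ltr"))  # left
print(resolve_alignment("start", "rtl"))  # right
```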
###position:[N%][,start | ,middle | ,end] default: depends on align (read list below)
*Safari, however, defaults to 50% no matter what the align is.
Depending on the alignment, the position will mean different things. If the computed alignment is:
- start/left: then N will determine how far the left side of the cue box will be from the left side of the video, as a percentage of the video width. For example, if the position is "5%", the caption will start 5% of the way across the video (like the 3rd cue in the sample .vtt file). The default position in this case is 0%.
- middle: then N will determine not where the box starts, but where the middle of the box is. The default position in this case is 50%.
- end/right: then, as expected, N will determine where the right side of the cue box will be. The default position in this case is 100%.
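The three cases above boil down to a bit of arithmetic, which can be sketched in Python as follows. The function `cue_box_edges` is a hypothetical helper of mine: it takes the computed alignment (already resolved to left, middle or right), and it does not clamp the box to the video, which a real renderer would presumably do.

```python
# Default position for each computed alignment, as listed above.
DEFAULT_POSITION = {"left": 0.0, "middle": 50.0, "right": 100.0}


def cue_box_edges(align="middle", position=None, size=100.0):
    """Return the (left, right) edges of the cue box, in % of the video width."""
    if position is None:
        position = DEFAULT_POSITION[align]
    if align == "left":
        left = position               # position marks the left edge
    elif align == "middle":
        left = position - size / 2    # position marks the centre
    elif align == "right":
        left = position - size        # position marks the right edge
    else:
        raise ValueError(f"unknown computed alignment: {align!r}")
    return left, left + size


print(cue_box_edges("left", 5.0, 33.0))     # (5.0, 38.0)
print(cue_box_edges("middle", 62.5, 75.0))  # (25.0, 100.0)
```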
The ,alignment parameter was added more recently to the specs and is only supported by Firefox at the moment. It overrides the computed alignment, so you can for example position the cue box relative to its left side while the text is still aligned to the right within the box.
This cue, for example:
00:00:01.000 --> 00:00:05.000 position:62.5% size:75%
Some text
can be written in Firefox as:
00:00:01.000 --> 00:00:05.000 position:25%,start size:75%
Some text
Although the new syntax is more intuitive (you see right away that the cue box stretches from 25% to 100%, while with the old syntax you have to do the math), it doesn't express any positioning that cannot also be expressed with the old syntax.
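The "math" needed to go from the new syntax back to the old one can be sketched like this in Python. The helper `to_middle_position` is a hypothetical name of mine, and it assumes the old-syntax cue uses the default middle alignment, as in the example above.

```python
def to_middle_position(position: float, anchor: str, size: float) -> float:
    """Convert 'position:N%,anchor' into the equivalent middle-anchored position."""
    if anchor == "start":
        return position + size / 2   # left edge -> centre
    if anchor == "middle":
        return position              # already the centre
    if anchor == "end":
        return position - size / 2   # right edge -> centre
    raise ValueError(f"unknown anchor: {anchor!r}")


# position:25%,start size:75% is the same box as position:62.5% size:75%
print(to_middle_position(25.0, "start", 75.0))  # 62.5
```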
###line:[N|N%]
N is a line number: it defines how many lines to skip from the top of the video before showing the caption. N% defines what percentage of the height of the video should be skipped before showing the caption. Defaults to 100%.
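Here is a small Python sketch of the two forms described above. The function `line_offset_px` is a hypothetical helper, and the `line_height` parameter is an assumption of mine, since the real height of a line is decided by the renderer. It only covers non-negative values.

```python
def line_offset_px(line: str, video_height: float, line_height: float) -> float:
    """Distance in pixels from the top of the video to the caption."""
    if line.endswith("%"):
        # Percentage form: skip that share of the video height.
        return video_height * float(line[:-1]) / 100
    # Line-number form: skip N lines of text (line_height is assumed).
    return int(line) * line_height


print(line_offset_px("5%", 360, 20))  # 18.0
print(line_offset_px("2", 360, 20))   # 40
```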
###size:[N%]
N% determines the width of the cue box as a percentage of the full width of the video element. Defaults to 100%.
###Regions
Regions are not supported by any of the browsers I used to test them (Firefox, Chrome, Safari and Opera).
From the official documentation:
This example shows two regions containing rollup captions for two different speakers. Fred's cues scroll up in a region in the left half of the video, Bill's cues scroll up in a region on the right half of the video. Fred's first cue disappears at 12.5sec even though it is defined until 20sec because its region is limited to 3 lines and at 12.5sec a fourth cue appears:
WEBVTT
Region: id=fred width=40% lines=3 regionanchor=0%,100% viewportanchor=10%,90% scroll=up
Region: id=bill width=40% lines=3 regionanchor=100%,100% viewportanchor=90%,90% scroll=up

00:00:00.000 --> 00:00:20.000 region:fred align:left
<v Fred>Hi, my name is Fred

00:00:02.500 --> 00:00:22.500 region:bill align:right
<v Bill>Hi, I'm Bill

00:00:05.000 --> 00:00:25.000 region:fred align:left
<v Fred>Would you like to get a coffee?

00:00:07.500 --> 00:00:27.500 region:bill align:right
<v Bill>Sure! I've only had one today.

00:00:10.000 --> 00:00:30.000 region:fred align:left
<v Fred>This is my fourth!

00:00:12.500 --> 00:00:32.500 region:fred align:left
<v Fred>OK, let's go.
Note that regions are only defined for horizontal cues.
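To check the claim that Fred's first cue disappears at 12.5sec, the rollup behavior can be approximated with a few lines of Python. This is a simplified model of my own, not the algorithm from the specification: the function `rollup_visible_until` is hypothetical, and it assumes every cue occupies exactly one line and that a cue is pushed out as soon as `max_lines` newer cues have started in the same region.

```python
def rollup_visible_until(cues, max_lines):
    """Return the effective end time of each cue in one region.

    cues: list of (start, end) tuples in seconds, sorted by start time.
    """
    result = []
    for i, (start, end) in enumerate(cues):
        pusher = i + max_lines  # index of the cue that pushes this one out
        if pusher < len(cues):
            end = min(end, cues[pusher][0])
        result.append(end)
    return result


# Fred's four cues from the example above, in a region with lines=3.
fred = [(0.0, 20.0), (5.0, 25.0), (10.0, 30.0), (12.5, 32.5)]
print(rollup_visible_until(fred, max_lines=3))  # [12.5, 25.0, 30.0, 32.5]
```

Only the first cue is cut short, at the moment the fourth one starts, which matches the description in the documentation.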
Removing the Region lines made no difference in Chrome and Opera. It did in Safari, but the behavior doesn't look at all like what's described in the documentation: Fred's and Bill's cues both appear on the left side of the video. The streams start at different heights, and after a certain point the cues on top start appearing behind the older ones that started lower. Firefox didn't show captions at all.