@amishshah
Last active June 12, 2023 21:16
A guide to the Ogg container format for demuxing Opus audio

You start with a stream or buffer of binary data. An Ogg stream is a sequence of "Pages", so the start of your data is a Page, which has a header followed by body data.

Header

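You need to read the following fields from the header (see https://xiph.org/ogg/doc/framing.html for more detail). Byte offsets are counted from the start of the page, and all multi-byte fields are little-endian.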

  • capture_pattern - bytes 0 to 3, must read OggS
  • stream_structure_version - byte 4, must be 0
  • header_type_flag - byte 5, a bitflag that tells you metadata about the page: 0x01 means the first packet on this page is continued from the previous page, 0x02 means this is the first page of the stream, 0x04 means this is the last page of the stream
  • absolute granule position - bytes 6 to 13, not needed unless you want seeking
  • stream serial number - bytes 14 to 17, a serial number given to each stream contained in the Ogg file. This is important for playing Ogg files that contain other streams, such as video or cover art. You need to identify and only bother parsing the Opus stream.
  • page sequence no - bytes 18 to 21, the page number
  • checksum - bytes 22 to 25, a checksum (you can ignore this)
  • page_segments - byte 26, the number of segments in the lacing table
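
The field layout above can be read in one call with Python's struct module. This is a minimal sketch rather than a reference implementation: the helper name parse_page_header and the dictionary keys are made up for illustration, and it assumes the buffer starts exactly on a page boundary.

```python
import struct

# Bits of header_type_flag, as described in the Ogg framing document.
FLAG_CONTINUED = 0x01   # first packet on this page continues from the previous page
FLAG_FIRST_PAGE = 0x02  # first page of the stream
FLAG_LAST_PAGE = 0x04   # last page of the stream

# "<" means little-endian with no padding: 4 + 1 + 1 + 8 + 4 + 4 + 4 + 1 = 27 bytes.
PAGE_HEADER = struct.Struct("<4sBBqIIIB")


def parse_page_header(data: bytes) -> dict:
    """Parse the fixed 27-byte part of a page header (hypothetical helper)."""
    if len(data) < PAGE_HEADER.size:
        raise ValueError("need at least 27 bytes for a page header")
    (capture, version, flags, granule, serial,
     sequence, checksum, page_segments) = PAGE_HEADER.unpack_from(data, 0)
    if capture != b"OggS":
        raise ValueError("capture_pattern is not OggS")
    if version != 0:
        raise ValueError("stream_structure_version must be 0")
    return {
        "continued": bool(flags & FLAG_CONTINUED),
        "first_page": bool(flags & FLAG_FIRST_PAGE),
        "last_page": bool(flags & FLAG_LAST_PAGE),
        "granule_position": granule,
        "serial": serial,
        "sequence": sequence,
        "checksum": checksum,  # read but not verified in this sketch
        "page_segments": page_segments,
    }
```

The checksum is returned untouched here, matching the guide's advice that it can be ignored for simple demuxing.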

Now you have to calculate the size of each packet. The page body is divided into segments of at most 255 bytes each, and one or more consecutive segments make up a packet of data, which can be an Opus packet or Opus metadata.

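To do this, we use the lacing_table. The lacing_table starts at byte 27 and is page_segments bytes long, i.e. a page_segments value of 5 means the lacing_table is 5 bytes long and occupies bytes 27 to 31. Each byte in the table is the size of one segment (0 to 255), and the page body starts immediately after the table.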
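
As a rough illustration of those offsets, the sketch below slices the lacing table out of a page and uses it to work out where the page ends. The names read_lacing_table and page_size are hypothetical, and the code assumes `page` begins at the capture pattern.

```python
HEADER_SIZE = 27  # fixed part of the header; page_segments is its last byte (offset 26)


def read_lacing_table(page: bytes) -> bytes:
    """Return the lacing table: page_segments bytes starting at offset 27."""
    page_segments = page[26]
    table = page[HEADER_SIZE:HEADER_SIZE + page_segments]
    if len(table) != page_segments:
        raise ValueError("page is truncated inside the lacing table")
    return table


def page_size(page: bytes) -> int:
    """Total page size: fixed header + lacing table + body (sum of all lacing values)."""
    table = read_lacing_table(page)
    return HEADER_SIZE + len(table) + sum(table)
```

Knowing the total page size is what lets you jump to the next page's OggS capture pattern.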

To calculate the size of each packet (and therefore the number of packets), we use this algorithm:


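i = 0  # the position in the lacing table
packetLengths = []
while i < len(table):
  packetLength = table[i]
  # a lacing value of 255 means the packet continues in the next segment
  while table[i] == 255 and i + 1 < len(table):
    i += 1
    packetLength += table[i]
  # if the last lacing value on the page is 255, this packet is
  # incomplete and continues on the next page
  packetLengths.append(packetLength)
  i += 1  # move on to the first segment of the next packet

totalBodyLength = sum(packetLengths)
numberOfPackets = len(packetLengths)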

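To read the first packet, we read the first packetLengths[0] bytes after the lacing table; the second packet is the next packetLengths[1] bytes after the first packet, and so on. If the last lacing value on a page is 255, the final packet is incomplete and its remaining bytes arrive at the start of the next page, which has the continued-packet bit (0x01) set in its header_type_flag.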
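
Putting the lacing table and the body together, one possible way to slice a page into packets is sketched below. The function names are illustrative only, packet_lengths is a for-loop variant of the algorithm above, and joining a packet that spills over onto the next page is deliberately left out.

```python
def packet_lengths(table: bytes) -> list:
    """Sum lacing values into packet lengths; a value below 255 ends a packet."""
    lengths = []
    length = 0
    for value in table:
        length += value
        if value < 255:
            lengths.append(length)
            length = 0
    if table and table[-1] == 255:
        # Last packet is incomplete: it continues on the next page.
        lengths.append(length)
    return lengths


def split_packets(page: bytes) -> list:
    """Cut the page body into packets (hypothetical helper, single page only)."""
    page_segments = page[26]
    table = page[27:27 + page_segments]
    offset = 27 + page_segments  # the body starts right after the lacing table
    packets = []
    for length in packet_lengths(table):
        packets.append(page[offset:offset + length])
        offset += length
    return packets
```

For example, a lacing table of 255, 1, 2 describes two packets: one of 256 bytes (255 + 1) and one of 2 bytes.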

Now we need to start looking through the data to see if there are Opus Packets to pick out.

We take the first 8 bytes of each packet as header. If header == 'OpusHead' then the bitstream this page belongs to (identified by the stream serial number in the page header above) is an Opus stream, and we should only bother deciphering future pages if their stream serial number is the same as this one. If header == 'OpusTags', we're looking at metadata for the file, such as artist and title. We can skip this. The last case is where the header is equal to neither of these; in that case it's an actual Opus packet we can use!
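
The three cases can be sketched as a small filter that remembers which stream serial number carried OpusHead. The class and function names here are invented for illustration, and the sketch assumes it is handed complete packets together with the serial number of the page they came from.

```python
def classify_packet(packet: bytes) -> str:
    """Label a packet by its first 8 bytes."""
    magic = packet[:8]
    if magic == b"OpusHead":
        return "head"
    if magic == b"OpusTags":
        return "tags"
    return "audio"


class OpusStreamFilter:
    """Keeps only Opus audio packets from the first stream that sends OpusHead."""

    def __init__(self):
        self.opus_serial = None  # stream serial number of the Opus stream, once known

    def handle(self, serial: int, packet: bytes):
        """Return the packet if it is usable Opus audio, otherwise None."""
        kind = classify_packet(packet)
        if kind == "head":
            if self.opus_serial is None:
                self.opus_serial = serial
            return None
        if serial != self.opus_serial:
            return None  # belongs to another stream (video, cover art, ...)
        if kind == "tags":
            return None  # metadata such as artist and title; skipped
        return packet
```

Packets seen before any OpusHead are dropped, since at that point there is no way to tell which stream they belong to.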

0ql commented Mar 18, 2022

Great Explanation, but if I'm not mistaken there is a missing i++ after packetLengths.push(packetLength).