
@zevaverbach
Last active March 8, 2019 03:50
Spec for Universal Transcript

Universal Transcript Format

This is my first crack at spec'ing out the fields in the wordObjects that make up a "universal transcript" of human speech, whether it is produced by a machine, a human, or both.

Fields in a wordObject

word

This is the word the transcriber believes was spoken.

speakerID

This is a string or integer identifier of who is speaking. It can also be undefined.

start

This is when in the recording the word begins, expressed as a decimal/float. Two decimal places should be sufficient for most applications. For individual words, only a machine will produce reliable values for this field.

end

This is when in the recording the word ends, expressed the same way as start. As with start, only a machine will produce a reliable value for this.
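A minimal sketch of how a producer might tidy these two fields before writing a wordObject. It assumes times are plain floats (the spec does not name a unit) and that a missing value, e.g. from a human-only transcript, is stored as None; `normalize_timing` is a hypothetical helper, not part of the format.

```python
from typing import Optional, Tuple


def normalize_timing(
    start: Optional[float], end: Optional[float]
) -> Tuple[Optional[float], Optional[float]]:
    """Round start/end to two decimal places and check end is not before start.

    None stands for "no reliable value" (an assumption; the spec does not
    say how a missing time is represented).
    """
    start = None if start is None else round(start, 2)
    end = None if end is None else round(end, 2)
    if start is not None and end is not None and end < start:
        raise ValueError(f"end ({end}) is before start ({start})")
    return start, end


print(normalize_timing(1.2345, 1.5612))  # (1.23, 1.56)
print(normalize_timing(None, None))      # (None, None)
```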

confidence

This is how confident the transcriber is in the word value, expressed as a number from 0 to 1 with two decimal places. In most cases this should be set to 1 for a human transcriber, though a human could use brackets or a similar convention to mark a word or phrase they're less than 100% confident in, which could then be parsed to a value below 1 here.
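One way the bracket idea could be parsed, sketched under loud assumptions: the human wraps uncertain words or phrases in square brackets, every bracketed word gets a fixed confidence of 0.5, and punctuation has already been separated from the words. The bracket syntax, the 0.5 value, and the `words_with_confidence` name are all hypothetical; the spec only says such a convention could be parsed to something less than 1.

```python
from typing import Any, Dict, List

UNCERTAIN_CONFIDENCE = 0.5  # hypothetical confidence for bracketed words


def words_with_confidence(text: str) -> List[Dict[str, Any]]:
    """Split a human-typed line into wordObject stubs with a confidence value.

    Hypothetical convention: words inside [square brackets], including
    multi-word phrases, are uncertain. Punctuation is assumed to have been
    separated out already.
    """
    result = []
    in_brackets = False
    for token in text.split():
        opens = token.startswith("[")
        closes = token.endswith("]")
        uncertain = in_brackets or opens
        if opens and not closes:
            in_brackets = True  # the bracketed phrase continues onto later tokens
        if closes:
            in_brackets = False
        result.append({
            "word": token.strip("[]"),
            "confidence": UNCERTAIN_CONFIDENCE if uncertain else 1,
        })
    return result


print(words_with_confidence("I [think so] yes"))
```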

alwaysCapitalized

This helper field keeps capitalization accurate when a UI-rendered transcript is edited. For example, a proper noun (or perhaps the word "I") might change from being the first word in a sentence to being the second. By checking for an alwaysCapitalized value of true, the rendering code can keep that word capitalized even though it no longer begins the sentence.
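A minimal sketch of how rendering code might consult this field, assuming the renderer already knows whether a word opens a sentence and that any other word is shown with its first letter lowercased. `display_word` is a hypothetical helper, not part of the spec.

```python
from typing import Any, Dict


def display_word(word_obj: Dict[str, Any], starts_sentence: bool) -> str:
    """Return the word as it should appear at its current position.

    Assumption: outside sentence-initial position, the first letter is
    lowercased unless the wordObject has alwaysCapitalized set to true.
    """
    word = word_obj["word"]
    if not word:
        return word
    if starts_sentence or word_obj.get("alwaysCapitalized", False):
        return word[0].upper() + word[1:]
    return word[0].lower() + word[1:]


# Both words were sentence-initial, but an edit moved them to second place.
print(display_word({"word": "The", "alwaysCapitalized": False}, starts_sentence=False))   # the
print(display_word({"word": "Paris", "alwaysCapitalized": True}, starts_sentence=False))  # Paris
```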

puncAfter

An array containing any punctuation that should be appended to a given word. An array is used because more than one punctuation character may follow a word, e.g. "'Hi.'" (a period followed by a closing quote). Keeping punctuation in this field rather than in its own wordObject makes sense because punctuation generally has no start, end, or confidence value. The tradeoff is that re-rendering the surrounding words becomes a little more complex: for example, if a period is changed to a comma, the next word in the transcript must be re-rendered (to un-capitalize it if appropriate).
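The period-to-comma case can be sketched as a renderer that re-derives capitalization from puncAfter on every pass, so changing one word's punctuation automatically updates the next word. This assumes ".", "?" and "!" are the only sentence-ending marks and that words are joined by single spaces; `render` and `SENTENCE_END` are hypothetical names.

```python
from typing import Any, Dict, List

SENTENCE_END = {".", "?", "!"}  # assumption: these marks end a sentence


def render(words: List[Dict[str, Any]]) -> str:
    """Render wordObjects to text, deriving capitalization from puncAfter."""
    pieces = []
    starts_sentence = True  # the first word opens a sentence
    for w in words:
        text = w["word"]
        if text:
            if starts_sentence or w.get("alwaysCapitalized", False):
                text = text[0].upper() + text[1:]
            else:
                text = text[0].lower() + text[1:]
        before = "".join(w.get("puncBefore", []))
        after = w.get("puncAfter", [])
        pieces.append(before + text + "".join(after))
        # The next word opens a sentence if any trailing mark ends one,
        # e.g. [".", "'"] after a quoted sentence.
        starts_sentence = any(p in SENTENCE_END for p in after)
    return " ".join(pieces)


words = [
    {"word": "Hi", "puncAfter": ["."]},
    {"word": "There", "puncAfter": ["!"]},
]
print(render(words))  # Hi. There!
words[0]["puncAfter"] = [","]  # change the period to a comma
print(render(words))  # Hi, there!
```

Because capitalization is computed at render time, the stored word values never need rewriting when punctuation changes; the cost is that any edit to puncAfter means re-rendering at least the following word.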

puncBefore

An array containing any punctuation that should be prepended to a given word, such as an opening quotation mark. See puncAfter for applications and tradeoffs.

Example

{"transcript": [
    {
      "word": "Hi",
      "speakerID": 1,
      "start": 1.23,
      "end": 1.56,
      "confidence": 0.98,
      "alwaysCapitalized": false,
      "puncAfter": [],
      "puncBefore": []
    },
    {
      "word": "there",
      "speakerID": 1,
      "start": 1.56,
      "end": 1.85,
      "confidence": 1,
      "alwaysCapitalized": false,
      "puncAfter": ["."],
      "puncBefore": []
    },
    ...
]}
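A minimal sketch of a consumer that loads a transcript in this format with Python's standard `json` module and checks each wordObject against the field descriptions above. The spec does not say which fields are required, so this sketch assumes only `word` is mandatory and every other field may be absent; the individual checks and the `validate_word` name are illustrative, not part of the format. The sample is the two-word example without the `...` placeholder (which is not valid JSON), and with the confidence written as `0.98`, since JSON numbers need a leading digit.

```python
import json
from typing import Any, Dict, List

# The two-word example transcript, without the "..." placeholder.
SAMPLE = """
{"transcript": [
  {"word": "Hi", "speakerID": 1, "start": 1.23, "end": 1.56,
   "confidence": 0.98, "alwaysCapitalized": false,
   "puncAfter": [], "puncBefore": []},
  {"word": "there", "speakerID": 1, "start": 1.56, "end": 1.85,
   "confidence": 1, "alwaysCapitalized": false,
   "puncAfter": ["."], "puncBefore": []}
]}
"""


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_word(w: Dict[str, Any]) -> List[str]:
    """Return the problems found in one wordObject (an empty list if none)."""
    problems = []
    if not isinstance(w.get("word"), str):
        problems.append("word must be a string")
    speaker = w.get("speakerID")  # absent or null counts as "undefined"
    if speaker is not None and (isinstance(speaker, bool) or not isinstance(speaker, (str, int))):
        problems.append("speakerID must be a string, an integer, or undefined")
    conf = w.get("confidence")
    if conf is not None and not (_is_number(conf) and 0 <= conf <= 1):
        problems.append("confidence must be a number from 0 to 1")
    if not isinstance(w.get("alwaysCapitalized", False), bool):
        problems.append("alwaysCapitalized must be true or false")
    start, end = w.get("start"), w.get("end")
    for key, value in (("start", start), ("end", end)):
        if value is not None and not _is_number(value):
            problems.append(f"{key} must be a number")
    if _is_number(start) and _is_number(end) and end < start:
        problems.append("end must not be before start")
    for key in ("puncBefore", "puncAfter"):
        value = w.get(key, [])
        if not (isinstance(value, list) and all(isinstance(p, str) for p in value)):
            problems.append(f"{key} must be a list of strings")
    return problems


transcript = json.loads(SAMPLE)["transcript"]
for i, word_obj in enumerate(transcript):
    print(i, validate_word(word_obj) or "ok")
```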