@joyrexus
Created May 9, 2014 16:08
Transcript validation notes

Resources for creating a simple web interface for validating tabular transcript data.

I still see a lot of transcription and coding done in Excel, with metadata stored in an info worksheet and the transcription and coding in a separate transcript worksheet. Each row is tied to a timestamped utterance or speech act, so there are columns for the timestamp and the utterance, along with additional columns for whatever annotation is needed (e.g., coding for syntax, semantics, gesture, or context).
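
To make that layout concrete, here is a minimal sketch of how such a worksheet might look once it is read into memory as a grid of cells. The `TranscriptRow` shape, the `toTranscriptRows` helper, and the column names `timestamp` and `utterance` are assumptions for illustration; real transcripts may label their columns differently.

```typescript
// Hypothetical row shape: one timestamped utterance plus free-form annotation columns.
interface TranscriptRow {
  timestamp: string; // e.g. "00:01:23"
  utterance: string;
  annotations: Record<string, string>; // e.g. { syntax: "NP", gesture: "point" }
}

// Turn a grid (header row first, then one row per utterance) into row objects.
// Column names are matched case-insensitively; anything other than the two
// assumed core columns is treated as an annotation column.
function toTranscriptRows(grid: string[][]): TranscriptRow[] {
  const header = grid[0] ?? [];
  const body = grid.slice(1);
  const columns = header.map((h) => h.trim().toLowerCase());

  return body.map((cells) => {
    const row: TranscriptRow = { timestamp: "", utterance: "", annotations: {} };
    columns.forEach((column, i) => {
      if (column === "") return; // skip unnamed columns
      const value = (cells[i] ?? "").trim();
      if (column === "timestamp") row.timestamp = value;
      else if (column === "utterance") row.utterance = value;
      else row.annotations[column] = value;
    });
    return row;
  });
}
```

Keeping the annotation columns in a separate map means the same code handles sheets coded for syntax, gesture, or anything else without changes.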

Anyway, we want to avoid parsing Excel files ourselves. Let Google Drive / Sheets do this for us.

With a published sheet, we can retrieve a JSON feed from the Spreadsheets Data API.
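
As a rough sketch of what consuming that feed could look like: the legacy list feed wrapped each cell as `gsx$<column>: { $t: "value" }` inside `feed.entry`. The helper below flattens that into plain row objects. The `rowsFromListFeed` name and the sample types are assumptions for illustration, and this feed format belongs to the older Spreadsheets Data API, which Google has since retired in favor of the Sheets API v4, so treat it as a picture of the idea rather than something to deploy.

```typescript
// Shape of the legacy list feed, reduced to the parts used here.
interface ListFeed {
  feed: { entry?: Array<Record<string, unknown>> };
}

// Flatten each feed entry into { columnName: cellText }, keeping only the
// "gsx$"-prefixed keys, which carry the worksheet's column values.
function rowsFromListFeed(data: ListFeed): Array<Record<string, string>> {
  const entries = data.feed.entry ?? [];
  return entries.map((entry) => {
    const row: Record<string, string> = {};
    for (const key of Object.keys(entry)) {
      const cell = entry[key];
      if (key.indexOf("gsx$") === 0 && typeof cell === "object" && cell !== null) {
        const text = (cell as { $t?: unknown }).$t;
        row[key.slice(4)] = text === undefined ? "" : String(text);
      }
    }
    return row;
  });
}
```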

However, we may want to avoid publishing our transcripts. In that case, we can instead use a Node client with built-in authentication to access the relevant worksheet data for validation.

So, what we want is a simple single-page web app that ...

  • allows a user to upload an Excel file
  • uses the Drive API to convert this file to a Google Sheet
  • uses the Drive or Spreadsheets API (or a browserified Node client) to retrieve the parsed sheet data as JSON
  • tries to validate the returned sheet data
  • provides feedback to the user if it finds any invalid data
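
The last two steps might look something like the sketch below. The rules here (an `hh:mm:ss` timestamp, a non-empty utterance, timestamps that never go backwards) and the names `validateTranscript`, `RowIssue`, and `toSeconds` are assumptions chosen for illustration; the actual checks depend on the coding scheme in use.

```typescript
interface RowIssue {
  row: number; // spreadsheet row number, counting the header as row 1
  column: string;
  message: string;
}

// Assumed timestamp format: h:mm:ss or hh:mm:ss, optionally with fractional seconds.
const TIMESTAMP_PATTERN = /^\d{1,2}:[0-5]\d:[0-5]\d(\.\d+)?$/;

function toSeconds(timestamp: string): number {
  const parts = timestamp.split(":").map(Number);
  return (parts[0] ?? 0) * 3600 + (parts[1] ?? 0) * 60 + (parts[2] ?? 0);
}

// Collect every problem found instead of stopping at the first one,
// so the user gets a full report in a single pass.
function validateTranscript(rows: Array<Record<string, string>>): RowIssue[] {
  const issues: RowIssue[] = [];
  let previousSeconds = -1;

  rows.forEach((row, index) => {
    const rowNumber = index + 2; // +1 for the header row, +1 for 1-based numbering
    const timestamp = (row["timestamp"] ?? "").trim();
    const utterance = (row["utterance"] ?? "").trim();

    if (!TIMESTAMP_PATTERN.test(timestamp)) {
      issues.push({ row: rowNumber, column: "timestamp", message: `expected hh:mm:ss, got "${timestamp}"` });
    } else {
      const seconds = toSeconds(timestamp);
      if (seconds < previousSeconds) {
        issues.push({ row: rowNumber, column: "timestamp", message: "timestamp is earlier than the previous row" });
      }
      previousSeconds = seconds;
    }

    if (utterance === "") {
      issues.push({ row: rowNumber, column: "utterance", message: "utterance is empty" });
    }
  });

  return issues;
}
```

Reporting the spreadsheet row number rather than the array index lets the user jump straight to the offending row in the original file.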

File uploads are straightforward with the File API.
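
For instance, a small filter could reject anything that does not look like an Excel file before it is sent anywhere. This is only a sketch: `pickExcelFiles` and `FileLike` are hypothetical names, and the check relies on the standard `.xls` / `.xlsx` MIME types with a fallback to the file extension, since browsers sometimes report an empty `type`.

```typescript
// Minimal structural view of a browser File: only the fields used here.
interface FileLike {
  name: string;
  type: string;
}

const EXCEL_MIME_TYPES = [
  "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", // .xlsx
  "application/vnd.ms-excel", // .xls
];

// Keep only the files whose MIME type or extension suggests an Excel workbook.
function pickExcelFiles<T extends FileLike>(files: ArrayLike<T>): T[] {
  return Array.from(files).filter(
    (file) => EXCEL_MIME_TYPES.indexOf(file.type) !== -1 || /\.xlsx?$/i.test(file.name)
  );
}
```

In the browser, this could be called with the `files` property of an `<input type="file">` element inside its `change` handler (once `files` is non-null), since a `FileList` is array-like and each `File` carries `name` and `type`.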

Keep it simple!
