@practicalparticipation
Last active August 29, 2015 13:56

Intro

Notes on a number of Open Refine expressions for working with transcripts from the Internet Governance Forum.

ToDo:

Get a working regexp to extract speaker names.

Explore outputting data to formats for import into SayIt.

Step 1: Import a transcript

Paste the transcript from the clipboard. This should create a row for each paragraph of the transcript.

Fetch data from Open Calais

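Based on Paul Bradshaw's expression here. Requires an Open Calais API key. Use it as the expression for "Add column by fetching URLs" on the column holding the transcript text.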

"http://api.opencalais.com/enlighten/rest/?licenseID=___API_KEY_HERE___&content=" + escape(value,'url') + "&paramsXML=%3Cc%3Aparams%20xmlns%3Ac%3D%22http%3A%2F%2Fs.opencalais.com%2F1%2Fpred%2F%22%20xmlns%3Ardf%3D%22http%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%22%3E%20%20%3Cc%3AprocessingDirectives%20c%3AcontentType%3D%22TEXT%2FRAW%22%20c%3AoutputFormat%3D%22Application%2FJSON%22%20%20%3E%20%20%3C%2Fc%3AprocessingDirectives%3E%20%20%3Cc%3AuserDirectives%3E%20%20%3C%2Fc%3AuserDirectives%3E%20%20%3Cc%3AexternalMetadata%3E%20%20%3C%2Fc%3AexternalMetadata%3E%20%20%3C%2Fc%3Aparams%3E"
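To make the long URL above easier to read, here is a rough Python sketch that rebuilds it from the decoded paramsXML. It is an illustration only, not part of the Refine workflow: build_calais_url and PARAMS_XML are names invented here, urllib.parse.quote stands in for GREL's escape(value,'url') (the two may differ in details such as how spaces are encoded), and the endpoint is simply copied from the expression above.

```python
from urllib.parse import quote

ENDPOINT = "http://api.opencalais.com/enlighten/rest/"

# Decoded form of the paramsXML value in the expression above
# (padding whitespace between elements left out for readability)
PARAMS_XML = (
    '<c:params xmlns:c="http://s.opencalais.com/1/pred/" '
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<c:processingDirectives c:contentType="TEXT/RAW" '
    'c:outputFormat="Application/JSON"></c:processingDirectives>'
    '<c:userDirectives></c:userDirectives>'
    '<c:externalMetadata></c:externalMetadata>'
    '</c:params>'
)

def build_calais_url(text, api_key):
    """Hypothetical helper: percent-encode each part and join into one URL."""
    return (
        ENDPOINT
        + "?licenseID=" + quote(api_key, safe="")
        + "&content=" + quote(text, safe="")
        + "&paramsXML=" + quote(PARAMS_XML, safe="")
    )

# The text argument plays the role of `value` (one transcript paragraph)
url = build_calais_url("Internet governance & human rights", "___API_KEY_HERE___")
```

The processing directives ask for raw text in and JSON out, which is why the later expressions can use parseJson on the response.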

Parse Open Calais JSON returns

parseJson("[" + value.replace(/\"http\:\/\/d\.opencalais\.com\/[A-Za-z0-9\/-]+\"\:/,"").replace(/\"doc\":/,"")[1,-1] + "]")
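What the expression above does, as far as I can tell: the response is one big object keyed by "doc" and by entity URIs, so it strips those keys, drops the outer braces, and wraps the remainder in square brackets to get an array that is easier to loop over. A minimal Python sketch of the same idea follows. The function name calais_to_list and the sample response are made up for illustration, and the sample is far smaller than a real response.

```python
import json
import re

# Same pattern as the GREL regex: object keys that are Calais URIs
URI_KEY = re.compile(r'"http://d\.opencalais\.com/[A-Za-z0-9/-]+":')

def calais_to_list(raw):
    """Turn the keyed Calais object into a plain list of its values."""
    # Remove every URI key and the "doc" key, leaving only the values
    without_keys = URI_KEY.sub("", raw).replace('"doc":', "")
    # Drop the outer { } and wrap what is left in [ ]
    return json.loads("[" + without_keys[1:-1] + "]")

# Made-up, heavily trimmed response (illustration only)
sample = (
    '{"doc":{"info":{"docId":"example"}},'
    '"http://d.opencalais.com/pershash-1/abc123":'
    '{"_typeGroup":"entities","_type":"Person","name":"Jane Doe"},'
    '"http://d.opencalais.com/dochash-1/def456/cat/1":'
    '{"_typeGroup":"topics","categoryName":"Technology_Internet"}}'
)

items = calais_to_list(sample)  # a list of three dicts
```

Note that this relies on the response starting with "{" and ending with "}" with no surrounding whitespace, the same assumption the [1,-1] slice makes in the GREL version.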

Fetch all the responses of a given type

forEach(filter(parseJson("[" + value.replace(/\"http\:\/\/d\.opencalais\.com\/[A-Za-z0-9\/-]+\"\:/,"").replace(/\"doc\":/,"")[1,-1] + "]"),v, v['_typeGroup']=='entities'),x,x)

Here _typeGroup can be any of the type groups listed here. To be more specific, filter on _type instead, using a type from the same list.
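The filter step can be sketched in Python as below, working on an already-parsed list of items. The helper name filter_items and the sample items are invented for illustration; only the field names _typeGroup and _type come from the expressions in these notes.

```python
def filter_items(items, field, wanted):
    """Keep the parsed Calais items whose `field` equals `wanted`."""
    # .get() means items without the field (e.g. the doc info) are skipped
    return [item for item in items if item.get(field) == wanted]

# Made-up parsed items (illustration only)
items = [
    {"info": {"docId": "example"}},
    {"_typeGroup": "entities", "_type": "Person", "name": "Jane Doe"},
    {"_typeGroup": "entities", "_type": "Organization", "name": "Example Org"},
    {"_typeGroup": "topics", "categoryName": "Technology_Internet"},
]

entities = filter_items(items, "_typeGroup", "entities")  # broad: every entity
people = filter_items(items, "_type", "Person")           # narrow: one entity type
```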

Extract organisation data

forEach(filter(parseJson("[" + value.replace(/\"http\:\/\/d\.opencalais\.com\/[A-Za-z0-9\/-]+\"\:/,"").replace(/\"doc\":/,"")[1,-1] + "]"),v, v['_type']=='Organization'),x,x['name']).join("|")

Extract person data

forEach(filter(parseJson("[" + value.replace(/\"http\:\/\/d\.opencalais\.com\/[A-Za-z0-9\/-]+\"\:/,"").replace(/\"doc\":/,"")[1,-1] + "]"),v, v['_type']=='Person'),x,x['name']).join("|")

Extract topics

forEach(filter(parseJson("[" + value.replace(/\"http\:\/\/d\.opencalais\.com\/[A-Za-z0-9\/-]+\"\:/,"").replace(/\"doc\":/,"")[1,-1] + "]"),v, v['_typeGroup']=='topics'),x,x['categoryName']).join("|")
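The three extraction expressions above share one shape: filter on a field, pull out one value per match, and join the values with "|". A small Python sketch of that pattern is below. The name pipe_join and the sample items are hypothetical, and skipping items that lack the output field is a choice made here rather than something the GREL expressions are known to do.

```python
def pipe_join(items, match_field, match_value, output_field):
    """Filter items on one field, then join another field's values with '|'."""
    values = [
        item[output_field]
        for item in items
        if item.get(match_field) == match_value and output_field in item
    ]
    return "|".join(values)

# Made-up parsed items (illustration only)
items = [
    {"_typeGroup": "entities", "_type": "Organization", "name": "Example Org A"},
    {"_typeGroup": "entities", "_type": "Person", "name": "Jane Doe"},
    {"_typeGroup": "entities", "_type": "Organization", "name": "Example Org B"},
    {"_typeGroup": "topics", "categoryName": "Technology_Internet"},
]

organisations = pipe_join(items, "_type", "Organization", "name")
people = pipe_join(items, "_type", "Person", "name")
topics = pipe_join(items, "_typeGroup", "topics", "categoryName")
```

Keeping a single separator such as "|" means the resulting column can later be broken back into one value per row with Refine's "Split multi-valued cells".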