Steve Cassidy stevecassidy

@stevecassidy
stevecassidy / index.html
Last active August 29, 2015 14:02
Wordcloud
<!DOCTYPE html>
<meta charset="utf-8">
<body>
<script src="http://d3js.org/d3.v3.min.js"></script>
<script src="http://www.jasondavies.com/wordcloud/d3.layout.cloud.js"></script>
<script>
var fill = d3.scale.category20();
d3.layout.cloud().size([300, 300])
.words([
library(wrassp)
library(alveo)
library(emuR)
config <- read_config()
client <- RestClient(server_uri=config$base_url)
# find some items from the Mitchell & Delbridge collection
items <- client$search_metadata('collection_name:mitcheldelbridge AND speech_style:scripted AND sex:f AND identifier:*s1 AND uid:*9')
# create an item list
result <- client$create_item_list(items, 'md_sample')
stevecassidy / gist:3a92523eadd55839db9c
Last active August 29, 2015 14:07
An example of using the Alveo API in an IPython notebook
{
"metadata": {
"name": "",
"signature": "sha256:67a2f144df80487bd4ac3999393dcbe9d0f6b23d173d679a265f5822f590f418"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"metadata": {
"name": "",
"signature": "sha256:80bd36aff028e7424151365df098be881a56c96559ace346474b97f274e6387d"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
stevecassidy / gist:55d3fecb6f80d0dee4d3
Created December 2, 2014 02:21
SPARQL query for the Austalk dataset to find certain speakers and words
PREFIX dc:<http://purl.org/dc/terms/>
PREFIX austalk:<http://ns.austalk.edu.au/>
PREFIX olac:<http://www.language-archives.org/OLAC/1.1/>
PREFIX ausnc:<http://ns.ausnc.org.au/schemas/ausnc_md_model/>
PREFIX foaf:<http://xmlns.com/foaf/0.1/>
PREFIX dbpedia:<http://dbpedia.org/ontology/>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX geo:<http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX iso639schema:<http://downlode.org/rdf/iso-639/schema#>
stevecassidy / index.py
Last active August 29, 2015 14:21
Code for handling a Trove data dump consisting of many lines containing JSON data.
__author__ = 'steve'
import json
import os
import pickle
import gzip
class TroveIndex:
    """An index over a Trove data set"""
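The preview above is cut off before the index logic, so the following is only a minimal sketch of one way to index a file of JSON lines, not the gist's actual implementation. It assumes each line is a single JSON object carrying a unique key; the field name `id` and the helper names `build_offset_index` and `fetch_record` are hypothetical.

```python
import json


def build_offset_index(path, key="id"):
    """Map each record's key to the byte offset where its line starts.

    Assumes one JSON object per line; `key` is a hypothetical field name.
    """
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break  # end of file
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line.decode("utf-8"))
            index[record[key]] = offset
    return index


def fetch_record(path, index, record_id):
    """Seek straight to a record's line and decode only that line."""
    with open(path, "rb") as handle:
        handle.seek(index[record_id])
        return json.loads(handle.readline().decode("utf-8"))
```

Storing offsets rather than parsed records keeps the index small, so it could be pickled and reused without re-reading the whole dump.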
stevecassidy / simple_dataset.py
Last active October 21, 2015 10:13
A Galaxy tool to debug creating datasets
import argparse
import sys
import os
def parser():
    parser = argparse.ArgumentParser(description="Generates some sample documents in a dataset")
    parser.add_argument('--output_path', required=True, action="store", type=str, help="Path to output file")
    return parser.parse_args()
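The preview stops after the argument parser, so how the tool goes on to write its documents is not shown. As a rough sketch only, the parsed `--output_path` could be used as below. The names `parse_args` and `write_sample_document` and the sample text are hypothetical, and nothing here reflects how Galaxy itself registers datasets.

```python
import argparse


def parse_args(argv=None):
    """Same option as the parser above; argv is injectable for testing."""
    arg_parser = argparse.ArgumentParser(
        description="Generates some sample documents in a dataset")
    arg_parser.add_argument('--output_path', required=True, action="store",
                            type=str, help="Path to output file")
    return arg_parser.parse_args(argv)


def write_sample_document(output_path, text="sample document\n"):
    """Write one placeholder document to the requested path (hypothetical helper)."""
    with open(output_path, "w", encoding="utf-8") as out:
        out.write(text)
    return output_path
```

Passing `argv` explicitly makes the parser easy to exercise outside Galaxy, for example `parse_args(["--output_path", "out.txt"])`.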
stevecassidy / children.tpl
Last active April 22, 2016 07:52
Sample files to reproduce an issue with emuR. children.tpl is the original legacy template. children_DBconfig.json is the generated new config. mod_children_DBconfig.json is a modified version that fixes some loading issues.
! template file for children's speech
level Set
level Word Set
level Syllable Word
level Phoneme Syllable
level Phonetic Phoneme many-to-many
level Target Phonetic
label Set Speaker
import sys,os,random
import pyalveo
import time
from glob import glob
import csv
# disable insecure HTTPS warnings from the staging servers
import requests
requests.packages.urllib3.disable_warnings()
stevecassidy / ssd2wav.py
Last active August 17, 2016 11:44
Convert SSD files (from the old Emu system) to WAV format and adjust the label files by subtracting the SSD file's start time from every label time. A useful utility for converting old Emu databases.
import sys
import os
import wave
import array
import shutil
def ssff_rewrite(ssfffile, outdir):
    """Copy an SSFF format file but reset the
    start time to 0.0"""
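The description mentions two steps: writing samples out as WAV and subtracting the SSD start time from every label time. The sketch below illustrates both under stated assumptions and is not the gist's code. It assumes the samples have already been read as 16-bit mono integers and that labels are available as `(time, text)` pairs; parsing the SSD header and the Emu label file format is left out, and the names `write_wav` and `shift_label_times` are hypothetical.

```python
import array
import sys
import wave


def write_wav(path, samples, sample_rate):
    """Write 16-bit mono PCM samples to a WAV file (assumed sample format)."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()  # WAV data is little-endian
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 2 bytes = 16 bits per sample
        out.setframerate(sample_rate)
        out.writeframes(data.tobytes())


def shift_label_times(labels, start_time):
    """Subtract the file's start time from each (time, text) label pair."""
    return [(round(time - start_time, 6), text) for time, text in labels]
```

Because a WAV file has no notion of a start offset, shifting the labels keeps them aligned with audio that now begins at 0.0.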