Daniel interrogator

## fline.py
#!/usr/bin/env python

"""
Utility to generate a first-line support message for a user who has submitted a roundup issue.

Intended to stay a single file, to discourage feature creep in a small utility.

The only extension I'd really consider is automatically posting this message to roundup,
but I don't think that's a good idea, as you may need to generate a couple of messages before
an appropriate one comes out.

## process-palette.json
{
  "patterns": {
    "P1": {
      "expression": "(path):(line)"
    },
    "P2": {
      "expression": "(path)\\s+(line)",
      "path": "(?:\\/[\\w\\.\\-]+)+"
    }
  },
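A rough sketch of how patterns like these could be turned into working regular expressions, written in Python for illustration. The placeholder expansion shown here is an assumption, not something the config above documents: `compile_pattern` is a hypothetical helper, and the `\d+` default for `(line)` is a guess, since only P2's `path` override appears in the source.

```python
import re

# Assumed default for the (line) placeholder; the config above does not define one.
LINE_REGEX = r"\d+"
# The "path" value from pattern P2, after JSON unescaping.
P2_PATH_REGEX = r"(?:\/[\w\.\-]+)+"


def compile_pattern(expression, path_regex, line_regex=LINE_REGEX):
    """Expand the (path) and (line) placeholders into named regex groups."""
    expanded = expression.replace("(path)", "(?P<path>" + path_regex + ")")
    expanded = expanded.replace("(line)", "(?P<line>" + line_regex + ")")
    return re.compile(expanded)


# P2's expression: a path, some whitespace, then a line number.
p2 = compile_pattern(r"(path)\s+(line)", P2_PATH_REGEX)
match = p2.search("error in /usr/lib/python3/site.py 42")
if match:
    print(match.group("path"), match.group("line"))
```

Under these assumptions, P2 picks out an absolute path followed by whitespace and a line number, which is the shape of many compiler and linter messages.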

## thod
#!/usr/bin/env python3

# the thing above is called a shebang. it tells your shell what program to use
# to run this script. in this case, it says, this is python3. this makes it possible
# to run the script by typing `thod ...`, rather than `python3 thod ...`

# the thing below is a module docstring. it's where you describe what the script
# is and how it works. it shows up if you do `thod --help`

"""

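The docstring only shows up under `--help` if the script passes it to its argument parser. The listing is cut off before that part, so the following is a minimal sketch assuming `argparse`. The docstring text, the `name` argument, and the `build_parser` and `main` helpers are all hypothetical.

```python
#!/usr/bin/env python3
"""
Hypothetical description: greet someone by name.
"""
import argparse


def build_parser():
    """Build a parser whose --help text starts with the module docstring."""
    parser = argparse.ArgumentParser(
        prog="thod",
        description=__doc__,
        # Keep the docstring's line breaks instead of re-wrapping them.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("name", help="hypothetical positional argument")
    return parser


def main(argv=None):
    """Parse argv (or sys.argv when argv is None) and build a greeting."""
    args = build_parser().parse_args(argv)
    return "hello, " + args.name


# Passing a list stands in for real command-line arguments.
print(main(["world"]))
```

Running the real script as `thod --help` would print the usage line followed by the docstring, provided the file is executable and on your `PATH`.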
## download.py
#!/usr/bin/env python3

"""
Script to make a plain text corpus of PTSD narratives,
with a little bit of metadata.
"""

import os
import time
import requests

## tundra-api.sh
# query a conll file
CONLLU2_FILE="/Users/danielmcdonald/Downloads/test.conllu"
QUERY="[pos=/V.*/]"
LANGUAGE="german"
API="https://weblicht.sfs.uni-tuebingen.de/tundra-beta/api/query/visres"
curl -X POST -F "file=@$CONLLU2_FILE" -F "query=$QUERY" -F "lang=$LANGUAGE" "$API" > api-test.json

# query a treebank
ID="UD_French"
QUERY="[pos=/V.*/]"

## README.md

      
interrogator / README.md (5 files)
Created January 18, 2017 12:16
nsubj

README is empty

  
## blog.md

      
interrogator / blog.md (1 file)
Last active November 8, 2016 06:46
daniel's blog post

    Halfway through my PhD candidature in linguistics at Melbourne Uni, I was introduced by Fiona to the ResPlat family. One of their aims, I was told, was to train researchers across the university in emerging tools and methods for doing better, more reproducible research. A specific target of this agenda was the Humanities and Social Sciences, who, let's admit, sometimes lag behind a little when it comes to engagement with digital tools and methods.

![ResPlat Family](http://67.media.tumblr.com/ede2ddf22557269fd92dd13c4b344c53/tumblr_inline_nk9gcyW6pE1ssbz72.jpg)

My thesis was about corpus linguistics—that is, using computers to locate patterns in large collections of written text. Because of this, Fiona asked me if I could come on board and help out, teaching Python to researchers around the university, but with extra focus on those from the humanities. A key issue among corpus linguists, however, is that many don't really know how to code. A more common w

  
## dendo.py
%matplotlib notebook
import seaborn as sns
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram

# the pdist metric can be 'braycurtis', 'canberra', 'chebyshev', 'cityblock', 'correlation',
# 'cosine', 'dice', 'euclidean', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching',
# 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath',
# 'sqeuclidean', 'yule'
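The listing stops before these imports are used. As a hedged sketch of how they usually fit together, the example below runs `pdist`, `linkage` and `dendrogram` on made-up data. The array, the `"euclidean"` metric and the `"average"` linkage method are illustrative choices, not taken from the original script.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical data: 6 observations with 4 features each.
rng = np.random.default_rng(0)
data = rng.random((6, 4))

# Condensed pairwise distances; swap "euclidean" for another metric name.
distances = pdist(data, metric="euclidean")

# Agglomerative clustering on the condensed distance matrix.
tree = linkage(distances, method="average")

# no_plot=True returns the dendrogram layout without drawing anything.
layout = dendrogram(tree, no_plot=True)
print(layout["ivl"])  # leaf labels in the order they would be drawn
```

Dropping `no_plot=True` in a notebook draws the tree with matplotlib instead of just returning its layout.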

## sample.py
from corpkit import *
corpus = Corpus('corpus_name')
langmod = corpus.make_language_model('modelname')

# score string
langmod.score('Check similarity for this text to each corpus')

# score file
corpus_file = corpus.subcorpora[5].files[1]
langmod.score(corpus_file)

## sorter.py
# a list of ints
l = [4, 2, 6, 7, 8, 1, 50, 23, 13, 55, 12, 3]

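# one line to sort them! (a selection sort; the start argument to
# index() stops equal values in the sorted prefix from being picked again)
[l.insert(ind, l.pop(l.index(min(l[ind:]), ind))) for ind in range(len(l))]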

# see?
print(l)

# elaborated code
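
The elaborated version is cut off in this listing. As a stand-in, here is a minimal sketch of the same idea written out as a plain selection sort. The function name `selection_sort` and its structure are assumptions, not the author's code.

```python
def selection_sort(values):
    """Return a sorted copy of values using selection sort."""
    result = list(values)
    for ind in range(len(result)):
        # Find the position of the smallest item in the unsorted tail.
        smallest = min(range(ind, len(result)), key=result.__getitem__)
        # Move that item to the front of the unsorted tail.
        result.insert(ind, result.pop(smallest))
    return result


print(selection_sort([4, 2, 6, 7, 8, 1, 50, 23, 13, 55, 12, 3]))
```

Unlike the one-liner, this returns a new list rather than sorting in place, and it searches by position, so repeated values are handled correctly.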