@mtholder
mtholder / grep_ot_code
Last active August 29, 2015 14:01
If you have checked out the main Open Tree of Life repos into a directory whose path is stored in the OPEN_TREE_REPO_ROOT environment variable, this script runs grep on the files in those repos that appear to be Java, Python, or shell scripts.
#!/bin/sh
for repo in api.opentreeoflife.org deployed-systems gcmdr opentree ot-base oti ott peyotl phylografter taxomachine reference-taxonomy treemachine ;
do
    for suffix in .java .py .sh
    do
        # "$@" is quoted so that grep patterns containing spaces stay intact
        find "$OPEN_TREE_REPO_ROOT/$repo" -name "*$suffix" -exec grep -H "$@" {} \; | sed -e "s+$OPEN_TREE_REPO_ROOT/++"
    done
done
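For readers who prefer Python, the same idea (walk each repo, keep files with a known suffix, report matching lines with paths relative to the root) can be sketched as below. This is an illustrative sketch, not part of the gist: the function name `grep_repos` is hypothetical, and it uses Python's `re` syntax rather than grep's.

```python
import os
import re

SUFFIXES = ('.java', '.py', '.sh')


def grep_repos(root, repos, pattern, suffixes=SUFFIXES):
    """Yield (relative_path, line_number, line) for each matching line.

    Hypothetical helper: `root` plays the role of OPEN_TREE_REPO_ROOT.
    Repos that are not checked out are silently skipped by os.walk.
    """
    regex = re.compile(pattern)
    for repo in repos:
        for dirpath, _dirnames, filenames in os.walk(os.path.join(root, repo)):
            for name in sorted(filenames):
                if not name.endswith(suffixes):
                    continue
                path = os.path.join(dirpath, name)
                # errors='replace' keeps the scan going on odd encodings
                with open(path, encoding='utf-8', errors='replace') as handle:
                    for lineno, line in enumerate(handle, 1):
                        if regex.search(line):
                            yield (os.path.relpath(path, root),
                                   lineno,
                                   line.rstrip('\n'))
```

A caller would typically pass `os.environ['OPEN_TREE_REPO_ROOT']` as `root` and print each tuple.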
mtholder / fetch_phylografter_study.py
Created May 31, 2014 02:11
If you have peyotl installed and configured, this downloads studies from phylografter and stores each as NexSON in a file named <studyId>.json. *WARNING*: this will overwrite <studyId>.json!
#!/usr/bin/env python
import sys
from peyotl.api.phylografter import Phylografter
pg = Phylografter()
for study_id in sys.argv[1:]:
    pg.fetch_study(study_id, study_id + '.json')
mtholder / annotations.xsd
Last active August 29, 2015 14:02
Hacky version of nexml/xsd/meta/annotations.xsd that allows any element inside a LiteralMeta.
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.nexml.org/2009"
xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns="http://www.nexml.org/2009"
xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:sawsdl="http://www.w3.org/ns/sawsdl"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xhtml="http://www.w3.org/1999/xhtml/datatypes/" elementFormDefault="qualified">
<!-- <xs:annotation>
<xs:documentation>
This module defines annotations that can be attached to
mtholder / force_oti_reindexing_of_non_namespaced_ids.py
Created June 12, 2014 04:31
force_oti_reindexing_of_non_namespaced_ids.py
#!/usr/bin/env python
import sys
from peyotl.api import APIWrapper
aw = APIWrapper(phylesystem_api_kwargs={'get_from':'api'})
pa = aw.phylesystem_api
o = aw.oti
for line in open(sys.argv[1], 'rU'):
    ls = line.strip()
    if '_' in ls:
        continue
mtholder / ottContextTNRS.py
Created June 18, 2014 16:34
Query taxomachine using contextQueryForNames. If a name is preceded by an argument of the form -cblah, then blah is used as the contextName.
#!/usr/bin/env python
import sys
from peyotl.api.taxomachine import Taxomachine
taxomachine = Taxomachine()
from peyotl.nexson_syntax import write_as_json
context_name = 'All life'
for n in sys.argv[1:]:
    if n.startswith('-c'):
        context_name = n[2:]
    else:
mtholder / ottautocomplete.py
Created June 18, 2014 16:35
Query taxomachine using autocompleteBoxQuery. If a name is preceded by an argument of the form -cblah, then blah is used as the contextName.
#!/usr/bin/env python
import sys
from peyotl.api.taxomachine import Taxomachine
taxomachine = Taxomachine()
from peyotl.nexson_syntax import write_as_json
context_name = 'All life'
for n in sys.argv[1:]:
    if n.startswith('-c'):
        context_name = n[2:]
    else:
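The -c convention shared by the two taxomachine scripts can be isolated from the network calls, which are cut off in the previews here. The sketch below only shows how each name ends up paired with the context in force when it is read. The function name `pair_names_with_context` is hypothetical and not part of either gist or of peyotl.

```python
def pair_names_with_context(args, default_context='All life'):
    """Return (name, context_name) pairs for a list of command-line args.

    An argument such as '-cAnimals' is not a name; it switches the
    context used for every name that follows it.
    """
    context_name = default_context
    pairs = []
    for arg in args:
        if arg.startswith('-c'):
            context_name = arg[2:]
        else:
            pairs.append((arg, context_name))
    return pairs
```

Under this reading, a name given before any -c argument is queried in the default 'All life' context, and a -c argument stays in effect until the next one.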
mtholder / newick_label.py
Created September 19, 2014 08:44
Print a word as it should appear if it is a node label (including a tip label) in a Newick tree.
#!/usr/bin/env python
import sys
import os
import re
_SCRIPT_NAME = os.path.split(sys.argv[0])[-1]
_FORBIDDEN = re.compile(r'[^- 0-9a-zA-Z`~@#$%^&*()_+={}|\\\[\]:;"\'<,>.?/]')
_NEEDS_QU_PUNC_STR = r'[\[\]():,;]'
_NEEDS_QUOTES_PATTERN = re.compile(r'(\s|' + _NEEDS_QU_PUNC_STR + ')')
_NEEDS_QUOTES_PUNC_PATTERN = re.compile(_NEEDS_QU_PUNC_STR)
_SINGLE_QUOTE = "'"
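The rest of the script is cut off in this preview, so the following is only a minimal sketch of the usual Newick quoting rule, not the author's implementation. It assumes that a label containing whitespace, any of the punctuation characters in the pattern above, or a single quote is wrapped in single quotes, with embedded single quotes doubled. The name `quote_newick_label` is hypothetical, and the sketch does not treat underscores specially.

```python
import re

# Whitespace, Newick structural punctuation, or a single quote
_NEEDS_QUOTES = re.compile(r"[\s\[\]():,;']")


def quote_newick_label(label):
    """Return `label` in a form that is safe to use as a Newick node label."""
    if _NEEDS_QUOTES.search(label):
        # Inside a quoted label, a literal single quote is written twice
        return "'" + label.replace("'", "''") + "'"
    return label
```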
mtholder / ott_names_to_newick.py
Last active August 29, 2015 14:06
Script to convert OTT names or uniqnames to Newick labels
#!/usr/bin/env python
# vim: set fileencoding=utf-8 :
import sys
import os
import re
u'''
Unicode char lists found by running against ott2.8
Using the names column in taxonomy.tsv required allowing the
following unicode characters:
æ denoted: u'\xe6'
mtholder / check_ott_for_new_char.sh
Created September 19, 2014 11:20
Check a version of OTT for new characters (punctuation and unicode) in labels.
#!/bin/bash
if ! which ott_names_to_newick.py >/dev/null 2>&1
then
    echo 'Download ott_names_to_newick.py from https://gist.github.com/mtholder/ac58ab1b3c6a962b9bdc and put it on your PATH'
    exit 1
fi
echo 'grabbing names from taxonomy'
set -x
awk 'BEGIN { FS = "\t\\|\t" } ; {print $3 }' taxonomy.tsv >names.txt
awk 'BEGIN { FS = "\t\\|\t" } ; {print $6 }' taxonomy.tsv >uniqnames.txt
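The two awk commands split each row of taxonomy.tsv on a tab, pipe, tab separator and print the third and sixth fields. A rough Python equivalent of that field extraction is sketched below. The helper name `column` is hypothetical, and the sketch mirrors awk's behavior of printing an empty string when a row has too few fields.

```python
def column(lines, field_number, sep='\t|\t'):
    """Yield the 1-based `field_number`-th field of each line, like awk's $N."""
    for line in lines:
        fields = line.rstrip('\n').split(sep)
        if field_number <= len(fields):
            yield fields[field_number - 1]
        else:
            # awk prints an empty string for a missing field
            yield ''
```

For example, `column(open('taxonomy.tsv', encoding='utf-8'), 3)` would iterate over the same values that the first awk command writes to names.txt.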
mtholder / chars_in_file.py
Created November 28, 2014 12:48
chars_in_file.py takes a list of filepaths (to UTF-8 encoded files) and prints a list of the characters encountered.
#!/usr/bin/env python
import codecs
import sys
chars = set()
for filepath in sys.argv[1:]:
    with codecs.open(filepath, 'r', encoding='utf-8') as fo:
        for line in fo:
            chars.update(iter(line))
c = list(chars)
c.sort()
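The preview stops before the printing step. As a hedged sketch, the collection logic can be packaged as a reusable function that returns the sorted characters rather than printing them. The name `chars_in_files` is hypothetical, and `io.open` is used here in place of `codecs.open` so the sketch behaves the same on Python 2 and 3.

```python
import io


def chars_in_files(filepaths):
    """Return a sorted list of the distinct characters in UTF-8 files."""
    chars = set()
    for filepath in filepaths:
        with io.open(filepath, 'r', encoding='utf-8') as handle:
            for line in handle:
                # A decoded line is iterable character by character
                chars.update(line)
    return sorted(chars)
```

The result is ordered by code point, so ASCII punctuation and letters come before accented characters such as æ.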