@newgene
newgene / mygene_use_https.py
Created February 19, 2019 21:39
mygene client to access API via https
In [1]: import mygene
In [2]: mg = mygene.MyGeneInfo()
# by default, it uses http:
In [3]: mg.url
Out[3]: 'http://mygene.info/v3'
# switch to use https
In [4]: mg.url = 'https://mygene.info/v3'
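Instead of hardcoding the full URL, the same switch can be done by rewriting only the scheme of whatever `mg.url` currently holds. This is a minimal sketch using the standard library; `force_https` is a hypothetical helper name, not part of the mygene client.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url):
    """Return `url` with its scheme replaced by https (hypothetical helper)."""
    parts = urlsplit(url)
    # SplitResult is a namedtuple, so _replace swaps a single field.
    return urlunsplit(parts._replace(scheme="https"))


# Assumed usage with the client object from the session above:
#   mg.url = force_https(mg.url)
```

Host and path are left untouched, so the helper keeps working if the default API version in the URL changes.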
@newgene
newgene / shell_aliases_flake8.sh
Last active August 29, 2015 14:26
My handy shell aliases for checking all changed Python files under a git or mercurial repo before commit.
# Just a short-hand to type less
alias f8='flake8'
# run "hgf" anywhere in a mercurial repo, it will check all changed *.py with flake8
alias hgf='tmp_cwd=$(pwd); cd "$(hg root)"; hg status -nmd | grep "\.py$" | xargs flake8; cd "$tmp_cwd"; unset tmp_cwd'
# run "gf" anywhere in a git repo, it will check all changed *.py with flake8
alias gf='tmp_cwd=$(pwd); cd "$(git rev-parse --show-toplevel)"; git diff --name-only | grep "\.py$" | xargs flake8; cd "$tmp_cwd"; unset tmp_cwd'
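For anyone who prefers a script over an alias, the git variant can be sketched in Python roughly as follows. The function names `changed_python_files` and `flake8_changed` are hypothetical, and the sketch assumes `git` and `flake8` are on the `PATH`; it runs the same `git rev-parse --show-toplevel` and `git diff --name-only` commands as the alias.

```python
import subprocess


def changed_python_files(names):
    """Keep only paths that end in .py, like the grep step in the alias."""
    return [name for name in names if name.endswith(".py")]


def flake8_changed(repo_dir="."):
    """Run flake8 on changed *.py files; returns flake8's exit code."""
    # Locate the repository root, as `git rev-parse --show-toplevel` does.
    root = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout.strip()
    # List files with unstaged changes, relative to the root.
    diff = subprocess.run(
        ["git", "diff", "--name-only"],
        cwd=root, check=True, capture_output=True, text=True,
    ).stdout
    files = changed_python_files(diff.splitlines())
    if not files:
        return 0  # nothing to check
    return subprocess.run(["flake8", *files], cwd=root).returncode


# Assumed usage from anywhere inside a git working tree:
#   exit_code = flake8_changed()
```

Unlike the alias, this never changes the shell's working directory, so there is nothing to restore afterwards.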
@newgene
newgene / metadata_mygene_info.json
Created September 17, 2014 16:49
An example of "/metadata" output from MyGene.info
{
"app_revision": "193:7417080ffb37",
"available_fields": [
"accession",
"alias",
"biocarta",
"chr",
"end",
"ensemblgene",
"ensemblprotein",
@newgene
newgene / gene_object_mygene_info.json
Created September 17, 2014 16:37
A full example of gene objects stored in MyGene.info.
{
"_id": "1017",
"_timestamp": "2014-08-25T00:00:00",
"accession": {
"genomic": [
"ABBA01008397",
"AC025162",
"AC034102",
"AC_000144",
"AF512553",
@newgene
newgene / merged_variant_json_doc
Last active August 29, 2015 14:01
This is an example of merged JSON documents for variant annotations (actual data may not be accurate)
{
"_id": "15:g.33905410A>G",
"mutdb": {
"chromEnd": 33905410,
"dbsnp_id": "rs2229116",
"allele2": "G",
"uniprot_id": "VAR_011405",
"allele1": "A",
"mutpred_score": 0.384,
"cosmic_id": null,
@newgene
newgene / mygene_gene_object_mapping.json
Created April 7, 2014 17:31
mygene_gene_object_mapping
{
"properties": {
"AnimalQTLdb": {
"type": "string",
"index": "no",
"include_in_all": false
},
"FLYBASE": {
"type": "string",
"index_name": "flybase",
@newgene
newgene / 1_readme.md
Last active August 29, 2015 13:57
Clonify_contest.md

Readme for this contest

General context of this contest

The human immune system produces a vast variety of antibodies in response to external stimuli. Next-generation sequencing technology allows researchers to obtain the sequences of all antibodies from a single person. Clustering these antibody sequences helps us understand how an antibody is produced. However, a single sample can contain on the order of 1 million antibody sequences, and clustering at that scale poses a major computational challenge.

Current algorithm

The current algorithm for clustering antibody sequences computes a pairwise distance matrix and then performs hierarchical clustering to group the sequences into clusters. It is implemented in Python in the provided clonify_contest.py script.
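To make the two steps concrete, here is a minimal pure-Python sketch of the general approach. It is not the code in clonify_contest.py: the distance metric (Levenshtein edit distance), the linkage rule (single linkage), the cut-off threshold, and the toy sequences are all assumptions chosen for illustration.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance, computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def pairwise_distances(seqs):
    """Step 1: full symmetric n x n distance matrix (O(n^2) time and memory)."""
    n = len(seqs)
    dist = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = edit_distance(seqs[i], seqs[j])
    return dist


def single_linkage(dist, threshold):
    """Step 2: merge the two closest clusters until none are within threshold."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            d = min(dist[i][j] for i in clusters[x] for j in clusters[y])
            if best is None or d < best[0]:
                best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y
    return clusters


# Toy input (made-up sequences, assumed threshold of 1 edit).
seqs = ["CARDYW", "CARDFW", "CTRGGGW", "CTRGGAW"]
clusters = single_linkage(pairwise_distances(seqs), threshold=1)
```

The distance matrix alone needs n² entries, which for 1 million sequences is on the order of 10¹² values; this is the scaling problem the contest is about.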
