@newgene
newgene / 1_readme.md
Last active August 29, 2015 13:57
Clonify_contest.md

Readme for this contest

General context of this contest

The human immune system produces a vast variety of antibodies to respond to external stimuli. Next-generation sequencing technology allows researchers to obtain antibody sequences from a single person at very large scale. Clustering these antibody sequences helps us understand how antibodies are produced. However, a single sample can contain on the order of one million antibody sequences, and clustering data at that scale poses a major computational challenge.
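For a sense of scale: any method that compares every pair of sequences, as the pairwise distance matrix in the current algorithm does, has to handle n(n-1)/2 pairs. A back-of-the-envelope estimate for n = 10^6, assuming each distance is stored once (upper triangle only) as an 8-byte floating-point number:

```latex
\begin{aligned}
\binom{n}{2} &= \frac{n(n-1)}{2} = \frac{10^{6}\,(10^{6}-1)}{2} = 499{,}999{,}500{,}000 \approx 5 \times 10^{11} \text{ pairs},\\
5 \times 10^{11} \text{ pairs} \times 8 \text{ bytes} &\approx 4 \times 10^{12} \text{ bytes} \approx 4 \text{ TB}.
\end{aligned}
```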

Current algorithm

The current algorithm for clustering antibody sequences first computes a pairwise distance matrix between all sequences and then performs hierarchical clustering on it to group the sequences into clusters. It is implemented in Python in the provided clonify_contest.py script.
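A minimal, pure-Python sketch of the two steps described above: a pairwise distance matrix, then agglomerative hierarchical clustering. The distance (Levenshtein edit distance normalised by the longer sequence), the average-linkage rule, the `cutoff` value and the toy sequences are all assumptions for illustration; they are not taken from clonify_contest.py, whose distance function and linkage may differ.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def pairwise_distances(seqs):
    """Full symmetric n x n distance matrix: this is the O(n^2) step."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # assumption: normalise by the longer sequence so distances lie in [0, 1]
        d = edit_distance(seqs[i], seqs[j]) / max(len(seqs[i]), len(seqs[j]))
        dist[i][j] = dist[j][i] = d
    return dist


def average_linkage_clusters(dist, cutoff):
    """Repeatedly merge the two closest clusters (average pairwise distance)
    until no pair of clusters is closer than `cutoff`."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), best = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > cutoff:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


# Made-up, CDR3-like toy sequences (hypothetical data).
seqs = ["CARDYW", "CARDFW", "CTTGHYW", "CTTGHFW"]
print(average_linkage_clusters(pairwise_distances(seqs), cutoff=0.3))
# [[0, 1], [2, 3]]
```

The naive merge loop is cubic in the number of clusters and the full matrix is quadratic in memory, which is exactly why this approach does not scale to a million sequences.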

newgene / mygene_gene_object_mapping.json
Created April 7, 2014 17:31
mygene_gene_object_mapping
{
"properties": {
"AnimalQTLdb": {
"type": "string",
"index": "no",
"include_in_all": false
},
"FLYBASE": {
"type": "string",
"index_name": "flybase",
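The mapping above is an Elasticsearch-style field mapping: each entry under `properties` configures one field of the gene documents, and `"index": "no"` marks a field that is stored but not searchable. As a small illustration, the sketch below lists such fields. It uses only the entries visible in the truncated preview (the `FLYBASE` entry is cut off, so only its visible keys appear), and `unindexed_fields` is a hypothetical helper name.

```python
import json

# Fragment of the mapping shown above; FLYBASE is truncated in the
# original preview, so only its visible keys are reproduced here.
MAPPING_FRAGMENT = json.loads("""
{
  "properties": {
    "AnimalQTLdb": {"type": "string", "index": "no", "include_in_all": false},
    "FLYBASE": {"type": "string", "index_name": "flybase"}
  }
}
""")


def unindexed_fields(mapping):
    """Return the field names whose mapping sets "index": "no"."""
    return sorted(
        name for name, spec in mapping.get("properties", {}).items()
        if spec.get("index") == "no"
    )


print(unindexed_fields(MAPPING_FRAGMENT))  # ['AnimalQTLdb']
```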
newgene / merged_variant_json_doc
Last active August 29, 2015 14:01
This is an example of a merged JSON document for variant annotations (the actual data may not be accurate)
{
"_id": "15:g.33905410A>G",
"mutdb": {
"chromEnd": 33905410,
"dbsnp_id": "rs2229116",
"allele2": "G",
"uniprot_id": "VAR_011405",
"allele1": "A",
"mutpred_score": 0.384,
"cosmic_id": null,
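The `_id` uses an HGVS-style genomic notation, `<chromosome>:g.<position><ref>><alt>`, and in this example its parts agree with the `mutdb` fields shown (`chromEnd`, `allele1`, `allele2`). A minimal parser sketch, assuming only simple single-nucleotide substitutions of this form (insertions, deletions and other HGVS forms are not handled); `parse_snv_id` is a hypothetical helper name.

```python
import re

# optional "chr" prefix, chromosome, "g." genomic position, ref>alt base
SNV_ID = re.compile(
    r"^(?:chr)?(?P<chrom>[0-9A-Za-z]+):g\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_snv_id(variant_id):
    """Split an id like '15:g.33905410A>G' into (chrom, pos, ref, alt)."""
    m = SNV_ID.match(variant_id)
    if m is None:
        raise ValueError(f"not a simple SNV id: {variant_id!r}")
    return m["chrom"], int(m["pos"]), m["ref"], m["alt"]


print(parse_snv_id("15:g.33905410A>G"))  # ('15', 33905410, 'A', 'G')
```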
newgene / gene_object_mygene_info.json
Created September 17, 2014 16:37
A full example of gene objects stored in MyGene.info.
{
"_id": "1017",
"_timestamp": "2014-08-25T00:00:00",
"accession": {
"genomic": [
"ABBA01008397",
"AC025162",
"AC034102",
"AC_000144",
"AF512553",
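Gene objects are nested JSON, so a value such as the genomic accessions is reached by a path through the document (`accession`, then `genomic`). A minimal sketch of resolving dotted paths like `accession.genomic` in a parsed gene object; the helper `get_field` is hypothetical, and the sample reproduces only the visible start of the truncated example above.

```python
from functools import reduce

# Visible start of the gene object shown above (the real list is longer).
gene = {
    "_id": "1017",
    "_timestamp": "2014-08-25T00:00:00",
    "accession": {
        "genomic": ["ABBA01008397", "AC025162", "AC034102", "AC_000144", "AF512553"],
    },
}


def get_field(doc, path, default=None):
    """Follow a dotted path such as 'accession.genomic' through nested dicts."""
    try:
        return reduce(lambda node, key: node[key], path.split("."), doc)
    except (KeyError, TypeError):
        # missing key, or tried to index into a list/string with a name
        return default


print(get_field(gene, "accession.genomic")[:2])  # ['ABBA01008397', 'AC025162']
print(get_field(gene, "accession.rna"))          # None
```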
newgene / metadata_mygene_info.json
Created September 17, 2014 16:49
An example of "/metadata" output from MyGene.info
{
"app_revision": "193:7417080ffb37",
"available_fields": [
"accession",
"alias",
"biocarta",
"chr",
"end",
"ensemblgene",
"ensemblprotein",
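The `available_fields` list in the `/metadata` output can be used to catch misspelled field names before building a query. A minimal sketch, assuming the metadata has already been fetched and parsed into a dict; only the fields visible in the truncated example are reproduced, and `unknown_fields` is a hypothetical helper.

```python
# Parsed fragment of the /metadata output above (the real list is longer).
metadata = {
    "app_revision": "193:7417080ffb37",
    "available_fields": [
        "accession", "alias", "biocarta", "chr", "end",
        "ensemblgene", "ensemblprotein",
    ],
}


def unknown_fields(requested, metadata):
    """Return the requested field names not listed in available_fields."""
    available = set(metadata.get("available_fields", []))
    return [f for f in requested if f not in available]


print(unknown_fields(["alias", "chr", "not_a_field"], metadata))  # ['not_a_field']
```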
newgene / shell_aliases_flake8.sh
Last active August 29, 2015 14:26
My handy shell aliases for checking all changed Python files under a git or mercurial repo before commit.
# Just a shorthand to type less
alias f8='flake8'
# run "hgf" anywhere in a mercurial repo: checks all changed *.py with flake8 in a subshell (cwd is untouched); skips flake8 if no *.py changed
alias hgf='(cd "$(hg root)" && hgs -nmd | grep "\.py$" | xargs -r flake8)'
# run "gf" anywhere in a git repo: checks all changed *.py with flake8 in a subshell (cwd is untouched); skips flake8 if no *.py changed
alias gf='(cd "$(git rev-parse --show-toplevel)" && git diff --name-only | grep "\.py$" | xargs -r flake8)'
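Where shell aliases are inconvenient, the `gf` alias can be sketched in Python with `subprocess`. This is an illustrative equivalent, not part of the original gist; unlike the alias, it also drops paths that no longer exist on disk (e.g. deleted files), and the function names are hypothetical.

```python
import os
import subprocess


def python_files(names_output, root):
    """Keep the *.py paths from `git diff --name-only` output that still exist."""
    paths = [line.strip() for line in names_output.splitlines()]
    return [p for p in paths
            if p.endswith(".py") and os.path.exists(os.path.join(root, p))]


def flake8_changed(cwd="."):
    """Run flake8 on the changed, existing *.py files of the git repo at cwd."""
    root = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        cwd=cwd, capture_output=True, text=True, check=True,
    ).stdout.strip()
    # git prints paths relative to the repo root, so run from there
    names = subprocess.run(
        ["git", "diff", "--name-only"],
        cwd=root, capture_output=True, text=True, check=True,
    ).stdout
    files = python_files(names, root)
    if not files:
        return 0  # nothing to check; don't let flake8 lint the whole repo
    return subprocess.run(["flake8", *files], cwd=root).returncode
```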
newgene / mygene_use_https.py
Created February 19, 2019 21:39
mygene client to access API via https
In [1]: import mygene
In [2]: mg = mygene.MyGeneInfo()
# by default, it uses http:
In [3]: mg.url
Out[3]: 'http://mygene.info/v3'
# switch to use https
In [4]: mg.url = 'https://mygene.info/v3'
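Rather than retyping the full URL, the switch can be made by rewriting only the scheme of whatever `mg.url` currently holds. A small sketch using the standard library's `urllib.parse`; `to_https` is a hypothetical helper, not part of the mygene package.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url):
    """Return url with its scheme replaced by https (host and path unchanged)."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(scheme="https"))


print(to_https("http://mygene.info/v3"))  # https://mygene.info/v3
```

In the session above this would be used as `mg.url = to_https(mg.url)`.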