Last active
October 2, 2018 15:38
Analytical derivation of the prior XSwap probability of a hetnet edge
metaedge | abbreviation | n_edges | n_connected_source_nodes | n_connected_target_nodes | n_source_wedges | n_target_wedges | n_wedges | n_valid_xswaps
---|---|---|---|---|---|---|---|---
Anatomy–downregulates–Gene | AdG | 102240 | 36 | 15097 | 173440264 | 493897 | 173934161 | 5052523519
Anatomy–expresses–Gene | AeG | 526407 | 241 | 18094 | 2290279787 | 10749138 | 2301028925 | 136250872696
Anatomy–upregulates–Gene | AuG | 97848 | 36 | 15929 | 149352969 | 359661 | 149712630 | 4637353998
Compound–binds–Gene | CbG | 11571 | 1389 | 1689 | 104024 | 476540 | 580564 | 66357671
Compound–causes–Side Effect | CcSE | 138944 | 1071 | 5701 | 16998055 | 16764774 | 33762829 | 9618885267
Compound–downregulates–Gene | CdG | 21102 | 734 | 2880 | 1683615 | 291789 | 1975404 | 220661247
Compound–palliates–Disease | CpD | 390 | 221 | 50 | 326 | 2857 | 3183 | 72672
Compound–resembles–Compound | CrC | 12972 | 1281 | 1281 | 120047 | 120047 | 240094 | 83889812
Compound–treats–Disease | CtD | 755 | 387 | 77 | 1420 | 8070 | 9490 | 275145
Compound–upregulates–Gene | CuG | 18756 | 703 | 3247 | 1489248 | 183634 | 1672882 | 174211508
Disease–associates–Gene | DaG | 12623 | 134 | 5392 | 1581400 | 35550 | 1616950 | 78046803
Disease–downregulates–Gene | DdG | 7623 | 44 | 5745 | 884813 | 2498 | 887311 | 28163942
Disease–localizes–Anatomy | DlA | 3602 | 133 | 398 | 72644 | 19982 | 92626 | 6392775
Disease–presents–Symptom | DpS | 3357 | 133 | 415 | 64389 | 23430 | 87819 | 5545227
Disease–resembles–Disease | DrD | 1086 | 129 | 129 | 6531 | 6531 | 13062 | 576093
Disease–upregulates–Gene | DuG | 7731 | 44 | 5630 | 909467 | 3031 | 912498 | 28967817
Gene–covaries–Gene | GcG | 123380 | 12453 | 12453 | 3702610 | 3702610 | 7405220 | 7603845290
Gene–interacts–Gene | GiG | 294328 | 15165 | 15165 | 55412782 | 55412782 | 110825564 | 43203513064
Gene–participates–Biological Process | GpBP | 559504 | 14772 | 11381 | 30123408 | 90103196 | 120226604 | 156401856652
Gene–participates–Cellular Component | GpCC | 73566 | 10580 | 1391 | 456449 | 12531792 | 12988241 | 2692953154
Gene–participates–Molecular Function | GpMF | 97222 | 13063 | 2884 | 591373 | 15444939 | 16036312 | 4709973719
Gene–participates–Pathway | GpPW | 84372 | 8979 | 1822 | 1337526 | 11008146 | 12345672 | 3546929334
Gene→regulates→Gene | Gr>G | 265672 | 4634 | 7048 | 15056962 | 37821797 | 52878759 | 35237794197
Pharmacologic Class–includes–Compound | PCiC | 1029 | 345 | 724 | 3156 | 603 | 3759 | 525147
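
The last column can be cross-checked from the others: in the notebook below, `n_valid_xswaps` is computed as the number of edge pairs, C(n_edges, 2), minus `n_wedges`. The following is a minimal sketch of that check using only the standard library (Python 3.8+ for `math.comb`). The helper name `n_valid_xswaps` and the `rows` dictionary are illustrative assumptions; the numbers are copied from the CpD, CtD, and PCiC rows above.

```python
import math


def n_valid_xswaps(n_edges, n_wedges):
    """Hypothetical helper: pairs of edges minus pairs that share a node."""
    return math.comb(n_edges, 2) - n_wedges


# abbreviation -> (n_edges, n_source_wedges, n_target_wedges, n_valid_xswaps),
# values copied from the table above.
rows = {
    "CpD": (390, 326, 2857, 72672),
    "CtD": (755, 1420, 8070, 275145),
    "PCiC": (1029, 3156, 603, 525147),
}

for abbrev, (n_edges, n_source, n_target, reported) in rows.items():
    computed = n_valid_xswaps(n_edges, n_source + n_target)
    print(abbrev, computed, computed == reported)
```

For these three rows the recomputed value agrees with the reported column.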
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Stats for computing prior probability of a hetnet edge based on XSwap permutation\n",
    "\n",
    "https://github.com/greenelab/hetmech/issues/134"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import collections\n",
    "\n",
    "import numpy\n",
    "import pandas\n",
    "import scipy.special\n",
    "\n",
    "import hetmech.hetmat"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_wedge_number(degrees):\n",
    "    \"\"\"\n",
    "    Compute total number of wedges for a list of nodes with the specified degrees.\n",
    "    \"\"\"\n",
    "    return sum(scipy.special.comb(degree, 2, exact=True) for degree in degrees)\n",
    "\n",
    "\n",
    "def get_xswap_metaedge_df(hetmat):\n",
    "    \"\"\"\n",
    "    Get a dataframe with metaedge summary information. Includes\n",
    "    statistics for analytically computing the probability of an edge\n",
    "    existing in XSwap permutations. See\n",
    "    https://github.com/greenelab/hetmech/issues/134#issuecomment-425933781.\n",
    "\n",
    "    HetMat analog to\n",
    "    https://github.com/dhimmel/hetio-fork/blob/ae2d0bce46e7137ae8812e99ce0e8301f8b7fa53/hetio/stats.py#L94-L105\n",
    "    \"\"\"\n",
    "    assert isinstance(hetmat, hetmech.hetmat.HetMat)\n",
    "    rows = list()\n",
    "    metaedges = list(hetmat.metagraph.get_edges(exclude_inverts=True))\n",
    "    for metaedge in metaedges:\n",
    "        # Metaedge information\n",
    "        row = collections.OrderedDict()\n",
    "        row['metaedge'] = metaedge.get_unicode_str()\n",
    "        row['abbreviation'] = metaedge.get_abbrev()\n",
    "        # Metaedge edges\n",
    "        source_ids, target_ids, matrix = hetmat.metaedge_to_adjacency_matrix(metaedge)\n",
    "        row['n_edges'] = matrix.sum()\n",
    "        # Number of connected source and target nodes\n",
    "        source_degrees = numpy.array(matrix.sum(axis=1).flat)\n",
    "        target_degrees = numpy.array(matrix.sum(axis=0).flat)\n",
    "        row['n_connected_source_nodes'] = sum(source_degrees > 0)\n",
    "        row['n_connected_target_nodes'] = sum(target_degrees > 0)\n",
    "        # XSwap prior probability statistics (https://git.io/fxkcp)\n",
    "        row['n_source_wedges'] = get_wedge_number(source_degrees)\n",
    "        row['n_target_wedges'] = get_wedge_number(target_degrees)\n",
    "        row['n_wedges'] = row['n_source_wedges'] + row['n_target_wedges']\n",
    "        row['n_valid_xswaps'] = scipy.special.comb(row['n_edges'], 2, exact=True) - row['n_wedges']\n",
    "        rows.append(row)\n",
    "    metaedge_df = pandas.DataFrame(rows).sort_values('metaedge')\n",
    "    return metaedge_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Execution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read Hetionet v1.0\n",
    "hetmat = hetmech.hetmat.HetMat('../../data/hetionet-v1.0.hetmat/')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>metaedge</th>\n",
       "      <th>abbreviation</th>\n",
       "      <th>n_edges</th>\n",
       "      <th>n_connected_source_nodes</th>\n",
       "      <th>n_connected_target_nodes</th>\n",
       "      <th>n_source_wedges</th>\n",
       "      <th>n_target_wedges</th>\n",
       "      <th>n_wedges</th>\n",
       "      <th>n_valid_xswaps</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Anatomy–downregulates–Gene</td>\n",
       "      <td>AdG</td>\n",
       "      <td>102240</td>\n",
       "      <td>36</td>\n",
       "      <td>15097</td>\n",
       "      <td>173440264</td>\n",
       "      <td>493897</td>\n",
       "      <td>173934161</td>\n",
       "      <td>5052523519</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Anatomy–expresses–Gene</td>\n",
       "      <td>AeG</td>\n",
       "      <td>526407</td>\n",
       "      <td>241</td>\n",
       "      <td>18094</td>\n",
       "      <td>2290279787</td>\n",
       "      <td>10749138</td>\n",
       "      <td>2301028925</td>\n",
       "      <td>136250872696</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Anatomy–upregulates–Gene</td>\n",
       "      <td>AuG</td>\n",
       "      <td>97848</td>\n",
       "      <td>36</td>\n",
       "      <td>15929</td>\n",
       "      <td>149352969</td>\n",
       "      <td>359661</td>\n",
       "      <td>149712630</td>\n",
       "      <td>4637353998</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                     metaedge abbreviation n_edges n_connected_source_nodes  \\\n",
       "0  Anatomy–downregulates–Gene          AdG  102240                       36   \n",
       "1      Anatomy–expresses–Gene          AeG  526407                      241   \n",
       "2    Anatomy–upregulates–Gene          AuG   97848                       36   \n",
       "\n",
       "  n_connected_target_nodes n_source_wedges n_target_wedges    n_wedges  \\\n",
       "0                    15097       173440264          493897   173934161   \n",
       "1                    18094      2290279787        10749138  2301028925   \n",
       "2                    15929       149352969          359661   149712630   \n",
       "\n",
       "  n_valid_xswaps  \n",
       "0     5052523519  \n",
       "1   136250872696  \n",
       "2     4637353998  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "metaedge_df = get_xswap_metaedge_df(hetmat)\n",
    "metaedge_df.head(3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write as TSV\n",
    "metaedge_df.to_csv('hetionet-v1.0-metaedge-xswap-stats.tsv', sep='\\t', index=False)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:hetmech]",
   "language": "python",
   "name": "conda-env-hetmech-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
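
To make the wedge statistics in the notebook concrete, here is a small standalone sketch on a toy adjacency matrix. It does not use hetmech, numpy, or Hetionet data: the matrix, the variable names, and the helper `count_wedges` are all assumptions made for illustration. It follows the notebook's definitions, where a wedge is a pair of edges sharing a node and `n_valid_xswaps` is the number of edge pairs that share neither a source nor a target. Like the notebook, it does not check whether a swap would recreate an existing edge.

```python
import itertools
import math

# Toy adjacency matrix (hypothetical): rows are source nodes, columns are target nodes.
matrix = [
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]


def count_wedges(degrees):
    """Number of edge pairs that meet at a shared node, summed over nodes."""
    return sum(math.comb(degree, 2) for degree in degrees)


source_degrees = [sum(row) for row in matrix]
target_degrees = [sum(column) for column in zip(*matrix)]
n_edges = sum(source_degrees)
n_wedges = count_wedges(source_degrees) + count_wedges(target_degrees)
n_valid = math.comb(n_edges, 2) - n_wedges

# Brute-force cross-check: enumerate edge pairs with distinct sources and distinct targets.
edges = [(i, j) for i, row in enumerate(matrix) for j, value in enumerate(row) if value]
n_valid_brute = sum(
    1
    for (s1, t1), (s2, t2) in itertools.combinations(edges, 2)
    if s1 != s2 and t1 != t2
)

print(n_edges, n_wedges, n_valid, n_valid_brute)
```

The closed-form count and the brute-force enumeration agree on this toy input, which is the identity the `n_valid_xswaps` column relies on.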
Updated figure from this notebook that includes two metaedges:
Includes Disease-associates-Gene data from greenelab/snorkeling#67.
Here is the output figure, estimated-xswap-priors.png.