Created
March 16, 2021 12:39
-
-
Save gtsambos/3165c24aebb50d2ca4b743b25673c413 to your computer and use it in GitHub Desktop.
A quick demonstration of link_ancestors
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import tskit\n", | |
"\n", | |
"import numpy as np\n", | |
"import msprime # Only used to generate sample datasets for this notebook.\n", | |
"from IPython.display import SVG # Only used to plot trees in this notebook." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Setting up you simulations to work with link_ancestors\n", | |
"\n", | |
"I'll generate a simple example: a demographic history involvin\n", | |
" - Samples from an admixed population\n", | |
" - Generated by admixture of 2 other populations x years ago\n", | |
"\n", | |
"See\n", | |
"https://msprime.readthedocs.io/en/latest/tutorial.html#migrations\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"pop0 = msprime.PopulationConfiguration(sample_size=3, initial_size = 500)\n", | |
"pop1 = msprime.PopulationConfiguration(sample_size=3, initial_size = 500)\n", | |
"\n", | |
"M = np.array([\n", | |
"[0, 0.10],\n", | |
"[0.08, 0]])\n", | |
"\n", | |
"ts = msprime.simulate(\n", | |
" population_configurations = [pop0, pop1],\n", | |
" demographic_events = [msprime.CensusEvent(time=1000)],\n", | |
" migration_matrix = M,\n", | |
" length = 1000,\n", | |
" random_seed = 31,\n", | |
" recombination_rate = 1e-7)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"colour_map = {0:\"red\", 1:\"blue\"}\n", | |
"node_colours = {u.id: colour_map[u.population] for u in ts.nodes()}\n", | |
"# for tree in ts.trees():\n", | |
"# print(\"Tree on interval:\", tree.interval)\n", | |
"# # The code below will only work in a Jupyter notebook with SVG output enabled.\n", | |
"# display(SVG(tree.draw(node_colours=node_colours, height=250)))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# How to use link_ancestors to get local ancestry\n", | |
"\n", | |
"## 1. Choose the 'ancestors' that you are interested in.\n", | |
"\n", | |
"In tree sequence land, this means picking the 'nodes' of the tree sequence that belong to the ancestors of interest. Typically, this will be all of the nodes that exist at a particular generation back in time, or all nodes that belonged to some particular ancestral population of interest.\n", | |
"\n", | |
"Suppose you wanted to find local ancestry wrt to all ancestors who were alive 1000 generations ago. You'd select the appropriate nodes like this:\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[10 11 12]\n" | |
] | |
} | |
], | |
"source": [ | |
"ancestors_1000_gens = np.where(ts.tables.nodes.time == 1000)[0]\n", | |
"\n", | |
"print(ancestors_1000_gens)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Suppose you want to find local ancestry from all ancestors at time 1000 belonging to population 1. You'd find the nodes like this:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[11 12]\n" | |
] | |
} | |
], | |
"source": [ | |
"all_ancestors = np.where(ts.tables.nodes.time == 1000)\n", | |
"all_pop1 = np.where(ts.tables.nodes.population == 1)\n", | |
"ancestors_pop1 = np.intersect1d(all_ancestors, all_pop1)\n", | |
"\n", | |
"print(ancestors_pop1)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 2. Apply link_ancestors using the set of ancestors you just chose." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"id\tleft\t\tright\t\tparent\tchild\tmetadata\n", | |
"0\t0.00000000\t25.49455408\t10\t0\t\n", | |
"1\t0.00000000\t1000.00000000\t10\t1\t\n", | |
"2\t0.00000000\t1000.00000000\t10\t4\t\n", | |
"3\t25.49455408\t1000.00000000\t11\t0\t\n", | |
"4\t0.00000000\t1000.00000000\t11\t3\t\n", | |
"5\t0.00000000\t1000.00000000\t11\t5\t\n", | |
"6\t0.00000000\t1000.00000000\t12\t2\t\n" | |
] | |
} | |
], | |
"source": [ | |
"ancestrytable_1000gens = ts.tables.link_ancestors(\n", | |
" samples=ts.samples(), ancestors=ancestors_1000_gens)\n", | |
"\n", | |
"print(ancestrytable_1000gens)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Note that every number in the 'parent' table above is one of the nodes in `ancestors_1000_gens`. This is essentially what `link_ancestors` does -- it shows you directly which of your samples descend from which ancestors." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"# ancestrytable_pop1 = ts.tables.link_ancestors(\n", | |
"# samples=ts.samples(), ancestors=ancestors_pop1)\n", | |
"\n", | |
"# print(ancestrytable_pop1)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 3. Replace the parent nodes with their population label." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"nodeTable = ts.tables.nodes" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"def get_population_id(node, ts):\n", | |
" return ts.tables.nodes.population[node]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" left right population child\n", | |
"0 0.000000 25.494554 0 0\n", | |
"1 0.000000 1000.000000 0 1\n", | |
"2 0.000000 1000.000000 0 4\n", | |
"3 25.494554 1000.000000 1 0\n", | |
"4 0.000000 1000.000000 1 3\n", | |
"5 0.000000 1000.000000 1 5\n", | |
"6 0.000000 1000.000000 1 2\n" | |
] | |
} | |
], | |
"source": [ | |
"import pandas as pd\n", | |
"\n", | |
"ancestry_table = pd.DataFrame(\n", | |
" data = {\n", | |
" 'left': ancestrytable_1000gens.left,\n", | |
" 'right': ancestrytable_1000gens.right,\n", | |
" 'population' : [get_population_id(u, ts) for u in ancestrytable_1000gens.parent],\n", | |
" 'child' : ancestrytable_1000gens.child\n", | |
" }\n", | |
")\n", | |
"\n", | |
"print(ancestry_table)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": {}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.7.4" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment