gtsambos/link_ancestors_demo.ipynb

## link_ancestors_demo.ipynb
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tskit\n",
    "\n",
    "import numpy as np\n",
    "import msprime # Only used to generate sample datasets for this notebook.\n",
    "from IPython.display import SVG # Only used to plot trees in this notebook."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setting up you simulations to work with link_ancestors\n",
    "\n",
    "I'll generate a simple example: a demographic history involvin\n",
    " - Samples from an admixed population\n",
    " - Generated by admixture of 2 other populations x years ago\n",
    "\n",
    "See\n",
    "https://msprime.readthedocs.io/en/latest/tutorial.html#migrations\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "pop0 = msprime.PopulationConfiguration(sample_size=3, initial_size = 500)\n",
    "pop1 = msprime.PopulationConfiguration(sample_size=3, initial_size = 500)\n",
    "\n",
    "M = np.array([\n",
    "[0, 0.10],\n",
    "[0.08, 0]])\n",
    "\n",
    "ts = msprime.simulate(\n",
    "        population_configurations = [pop0, pop1],\n",
    "        demographic_events = [msprime.CensusEvent(time=1000)],\n",
    "        migration_matrix = M,\n",
    "        length = 1000,\n",
    "        random_seed = 31,\n",
    "        recombination_rate = 1e-7)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "colour_map = {0:\"red\", 1:\"blue\"}\n",
    "node_colours = {u.id: colour_map[u.population] for u in ts.nodes()}\n",
    "# for tree in ts.trees():\n",
    "#     print(\"Tree on interval:\", tree.interval)\n",
    "#     # The code below will only work in a Jupyter notebook with SVG output enabled.\n",
    "#     display(SVG(tree.draw(node_colours=node_colours, height=250)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to use link_ancestors to get local ancestry\n",
    "\n",
    "## 1. Choose the 'ancestors' that you are interested in.\n",
    "\n",
    "In tree sequence land, this means picking the 'nodes' of the tree sequence that belong to the ancestors of interest. Typically, this will be all of the nodes that exist at a particular generation back in time, or all nodes that belonged to some particular ancestral population of interest.\n",
    "\n",
    "Suppose you wanted to find local ancestry wrt to all ancestors who were alive 1000 generations ago. You'd select the appropriate nodes like this:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[10 11 12]\n"
     ]
    }
   ],
   "source": [
    "ancestors_1000_gens = np.where(ts.tables.nodes.time == 1000)[0]\n",
    "\n",
    "print(ancestors_1000_gens)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose you want to find local ancestry from all ancestors at time 1000 belonging to population 1. You'd find the nodes like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[11 12]\n"
     ]
    }
   ],
   "source": [
    "all_ancestors = np.where(ts.tables.nodes.time == 1000)\n",
    "all_pop1 = np.where(ts.tables.nodes.population == 1)\n",
    "ancestors_pop1 = np.intersect1d(all_ancestors, all_pop1)\n",
    "\n",
    "print(ancestors_pop1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Apply link_ancestors using the set of ancestors you just chose."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "id\tleft\t\tright\t\tparent\tchild\tmetadata\n",
      "0\t0.00000000\t25.49455408\t10\t0\t\n",
      "1\t0.00000000\t1000.00000000\t10\t1\t\n",
      "2\t0.00000000\t1000.00000000\t10\t4\t\n",
      "3\t25.49455408\t1000.00000000\t11\t0\t\n",
      "4\t0.00000000\t1000.00000000\t11\t3\t\n",
      "5\t0.00000000\t1000.00000000\t11\t5\t\n",
      "6\t0.00000000\t1000.00000000\t12\t2\t\n"
     ]
    }
   ],
   "source": [
    "ancestrytable_1000gens = ts.tables.link_ancestors(\n",
    "    samples=ts.samples(), ancestors=ancestors_1000_gens)\n",
    "\n",
    "print(ancestrytable_1000gens)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that every number in the 'parent' table above is one of the nodes in `ancestors_1000_gens`. This is essentially what `link_ancestors` does -- it shows you directly which of your samples descend from which ancestors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ancestrytable_pop1 = ts.tables.link_ancestors(\n",
    "#     samples=ts.samples(), ancestors=ancestors_pop1)\n",
    "\n",
    "# print(ancestrytable_pop1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Replace the parent nodes with their population label."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "nodeTable = ts.tables.nodes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_population_id(node, ts):\n",
    "    return ts.tables.nodes.population[node]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "        left        right  population  child\n",
      "0   0.000000    25.494554           0      0\n",
      "1   0.000000  1000.000000           0      1\n",
      "2   0.000000  1000.000000           0      4\n",
      "3  25.494554  1000.000000           1      0\n",
      "4   0.000000  1000.000000           1      3\n",
      "5   0.000000  1000.000000           1      5\n",
      "6   0.000000  1000.000000           1      2\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "ancestry_table = pd.DataFrame(\n",
    "    data = {\n",
    "        'left': ancestrytable_1000gens.left,\n",
    "        'right': ancestrytable_1000gens.right,\n",
    "        'population' : [get_population_id(u, ts) for u in ancestrytable_1000gens.parent],\n",
    "        'child' : ancestrytable_1000gens.child\n",
    "    }\n",
    ")\n",
    "\n",
    "print(ancestry_table)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
	{
	"cells": [
	{
	"cell_type": "code",
	"execution_count": 1,
	"metadata": {},
	"outputs": [],
	"source": [
	"import tskit\n",
	"\n",
	"import numpy as np\n",
	"import msprime # Only used to generate sample datasets for this notebook.\n",
	"from IPython.display import SVG # Only used to plot trees in this notebook."
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Setting up you simulations to work with link_ancestors\n",
	"\n",
	"I'll generate a simple example: a demographic history involvin\n",
	" - Samples from an admixed population\n",
	" - Generated by admixture of 2 other populations x years ago\n",
	"\n",
	"See\n",
	"https://msprime.readthedocs.io/en/latest/tutorial.html#migrations\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 2,
	"metadata": {},
	"outputs": [],
	"source": [
	"pop0 = msprime.PopulationConfiguration(sample_size=3, initial_size = 500)\n",
	"pop1 = msprime.PopulationConfiguration(sample_size=3, initial_size = 500)\n",
	"\n",
	"M = np.array([\n",
	"[0, 0.10],\n",
	"[0.08, 0]])\n",
	"\n",
	"ts = msprime.simulate(\n",
	" population_configurations = [pop0, pop1],\n",
	" demographic_events = [msprime.CensusEvent(time=1000)],\n",
	" migration_matrix = M,\n",
	" length = 1000,\n",
	" random_seed = 31,\n",
	" recombination_rate = 1e-7)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 3,
	"metadata": {},
	"outputs": [],
	"source": [
	"colour_map = {0:\"red\", 1:\"blue\"}\n",
	"node_colours = {u.id: colour_map[u.population] for u in ts.nodes()}\n",
	"# for tree in ts.trees():\n",
	"# print(\"Tree on interval:\", tree.interval)\n",
	"# # The code below will only work in a Jupyter notebook with SVG output enabled.\n",
	"# display(SVG(tree.draw(node_colours=node_colours, height=250)))"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# How to use link_ancestors to get local ancestry\n",
	"\n",
	"## 1. Choose the 'ancestors' that you are interested in.\n",
	"\n",
	"In tree sequence land, this means picking the 'nodes' of the tree sequence that belong to the ancestors of interest. Typically, this will be all of the nodes that exist at a particular generation back in time, or all nodes that belonged to some particular ancestral population of interest.\n",
	"\n",
	"Suppose you wanted to find local ancestry wrt to all ancestors who were alive 1000 generations ago. You'd select the appropriate nodes like this:\n"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 4,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"[10 11 12]\n"
	]
	}
	],
	"source": [
	"ancestors_1000_gens = np.where(ts.tables.nodes.time == 1000)[0]\n",
	"\n",
	"print(ancestors_1000_gens)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Suppose you want to find local ancestry from all ancestors at time 1000 belonging to population 1. You'd find the nodes like this:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 5,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"[11 12]\n"
	]
	}
	],
	"source": [
	"all_ancestors = np.where(ts.tables.nodes.time == 1000)\n",
	"all_pop1 = np.where(ts.tables.nodes.population == 1)\n",
	"ancestors_pop1 = np.intersect1d(all_ancestors, all_pop1)\n",
	"\n",
	"print(ancestors_pop1)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## 2. Apply link_ancestors using the set of ancestors you just chose."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 6,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"id\tleft\t\tright\t\tparent\tchild\tmetadata\n",
	"0\t0.00000000\t25.49455408\t10\t0\t\n",
	"1\t0.00000000\t1000.00000000\t10\t1\t\n",
	"2\t0.00000000\t1000.00000000\t10\t4\t\n",
	"3\t25.49455408\t1000.00000000\t11\t0\t\n",
	"4\t0.00000000\t1000.00000000\t11\t3\t\n",
	"5\t0.00000000\t1000.00000000\t11\t5\t\n",
	"6\t0.00000000\t1000.00000000\t12\t2\t\n"
	]
	}
	],
	"source": [
	"ancestrytable_1000gens = ts.tables.link_ancestors(\n",
	" samples=ts.samples(), ancestors=ancestors_1000_gens)\n",
	"\n",
	"print(ancestrytable_1000gens)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Note that every number in the 'parent' table above is one of the nodes in `ancestors_1000_gens`. This is essentially what `link_ancestors` does -- it shows you directly which of your samples descend from which ancestors."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 7,
	"metadata": {},
	"outputs": [],
	"source": [
	"# ancestrytable_pop1 = ts.tables.link_ancestors(\n",
	"# samples=ts.samples(), ancestors=ancestors_pop1)\n",
	"\n",
	"# print(ancestrytable_pop1)"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## 3. Replace the parent nodes with their population label."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 8,
	"metadata": {},
	"outputs": [],
	"source": [
	"nodeTable = ts.tables.nodes"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 9,
	"metadata": {},
	"outputs": [],
	"source": [
	"def get_population_id(node, ts):\n",
	" return ts.tables.nodes.population[node]"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 10,
	"metadata": {},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	" left right population child\n",
	"0 0.000000 25.494554 0 0\n",
	"1 0.000000 1000.000000 0 1\n",
	"2 0.000000 1000.000000 0 4\n",
	"3 25.494554 1000.000000 1 0\n",
	"4 0.000000 1000.000000 1 3\n",
	"5 0.000000 1000.000000 1 5\n",
	"6 0.000000 1000.000000 1 2\n"
	]
	}
	],
	"source": [
	"import pandas as pd\n",
	"\n",
	"ancestry_table = pd.DataFrame(\n",
	" data = {\n",
	" 'left': ancestrytable_1000gens.left,\n",
	" 'right': ancestrytable_1000gens.right,\n",
	" 'population' : [get_population_id(u, ts) for u in ancestrytable_1000gens.parent],\n",
	" 'child' : ancestrytable_1000gens.child\n",
	" }\n",
	")\n",
	"\n",
	"print(ancestry_table)"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {},
	"outputs": [],
	"source": []
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.7.4"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}