Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save ljyanesm/f09bd722f0ad877fb7a11f1512544862 to your computer and use it in GitHub Desktop.
Save ljyanesm/f09bd722f0ad877fb7a11f1512544862 to your computer and use it in GitHub Desktop.
Set Cover Problem - Minimum set selection for maximal weighted coverage
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import shutil\nimport sys\nimport os.path\nimport textwrap # For pretty printing missing genes\nfrom random import random # Generate random numbers (Costs)\n\n# pyomo is used as the model generator for the solver\nif not shutil.which(\"pyomo\"):\n !pip install --user -q pyomo\n assert(shutil.which(\"pyomo\"))\n\n# cbc/glpk are installed if missing, these are oss solvers\nif not (shutil.which(\"cbc\") or os.path.isfile(\"cbc\")):\n if \"google.colab\" in sys.modules:\n !apt-get install --user -y -qq coinor-cbc\n else:\n if sys.platform == \"darwin\":\n try:\n !brew tap coin-or-tools/coinor\n !brew install coin-or-tools/coinor/cbc pkg-config\n !brew install glpk\n except:\n pass\n else:\n try:\n !conda install -c conda-forge coincbc glpk\n except:\n pass\n\nassert(shutil.which(\"cbc\") or os.path.isfile(\"cbc\"))\nassert(shutil.which(\"glpsol\") or os.path.isfile(\"glpsol\"))\n\nl80_wrap=textwrap.TextWrapper(width = 80)\n\nfrom pyomo.environ import *\nimport pandas as pd # For manipulating the input data",
"execution_count": 165,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "# Background - Minimum sample selection for maximal gene coverage\n\nThe prototypical set cover problem deals with identifying the smallest sub-collection of sets whose union represents the universe.\n\nWe can represent as a set covering problem, selecting the minimum set of libraries required to cover all genes in a genome.\n\nLet $G = \\{g_1, g_2, \\dots, g_n\\}$ be all the known genes of a genome, and let $L = \\{l_1, l_2, \\dots, l_m\\}$ be a collection of subsets $l_j \\subseteq G$, where $\\bigcup l_j = G$. Each library $l_j$ covers at least one gene of $G$ and has an associated cost $c_j > 0$. The objective is to find a subcollection of libraries $X \\subseteq L$ that covers all genes in $G$ at a minimal cost.\n\nA set of possible costs per library can be calculated using:\n\n$$c_i = numReads(l_i)$$\n\n$$c_i = numGenes(l_i)$$\n\n$$c_i = Quality(l_i)$$\n\nThe problem objective is to:\n\nMinimise\n\n$$cost = \\sum_{i\\ \\in\\ Libraries} \\alpha*numReads_i*x_i$$\n\nor\n\n$$cost = \\sum_{i\\ \\in\\ Libraries} \\beta*numGenes_i*x_i$$\n\nor\n\n$$cost = \\sum_{i\\ \\in\\ Libraries} \\frac{\\gamma}{Quality_i}*x_i$$\n\nor any linear combination of the different costs scaled by $\\alpha, \\beta, \\gamma$\n\nsubject to: \n\n\\begin{align*}\n\\sum_{i\\ \\in\\ Libraries} A_{i,j} * x_i >=& minLibraries \\quad \\forall j \\in Genes\\\\\nx_i &\\quad \\in \\{0,1\\} \\\\\n\\end{align*}\n\nwhere:\n\n$A^{GenesxLibraries}$ is a zero-one matrix where $a_{ij} = 1$ if gene $i$ is covered by library $j$ and $a_{ij} = 0$ otherwise.\n\n$X = \\{x_1,x_2, \\dots, x_m\\}$ where $x_i = 1$ if library $i$ (with cost $c_i > 0$) is part of the solution and $x_i = 0$ otherwise.\n\n$minLibraries$ is the minimum number of libraries that are required to consider a gene covered.\n\nThe data will be gathered from a single CSV table containing an expression matrix with libraries in it's columns and genes in rows\n\nExample:\n\n| | | | | | | | | \n|-----------|------------|------------|------------|------------|------------|------------|------------| \n| Gene | SRX2988238 | GSM3408860 | GSM2751096 | GSM1514875 | GSM3490689 | GSM2751088 | GSM1514876 | \n| AT1G11920 | 22.2 | 0.0 | 0.2 | 3.5 | 0.4 | 0.1 | 4.5 | \n| AT5G09760 | 35.7 | 23.3 | 40.7 | 31.7 | 22.8 | 41.8 | 34.4 | \n| AT3G53430 | 35.4 | 55.0 | 57.0 | 51.3 | 58.9 | 48.9 | 47.4 | \n| AT3G29033 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | \n| AT5G18940 | 33.0 | 27.9 | 26.0 | 32.9 | 34.6 | 30.8 | 32.1 | \n| AT3G03340 | 26.7 | 36.1 | 28.6 | 37.6 | 45.5 | 29.5 | 32.0 | \n| AT3G05020 | 45.9 | 27.1 | 54.8 | 66.0 | 109.0 | 51.1 | 57.6 | \n| AT2G01210 | 20.6 | 1.6 | 10.9 | 13.7 | 1.0 | 13.6 | 13.7 | \n| AT2G30990 | 26.3 | 30.6 | 19.4 | 15.9 | 42.1 | 24.9 | 14.8 | \n| AT1G11112 | 1.5 | 0.2 | 2.4 | 7.7 | 0.9 | 3.8 | 4.2 | \n\n\n\nThis matrix will be then transformed to a binary version using a FPKM threshold and posterior to that, filtered to only include genes which exists in at least one library (so that we can find a solution to our ILP).\n\n| | | | | | | | | \n|-----------|------------|------------|------------|------------|------------|------------|------------| \n| Gene | SRX2988238 | GSM3408860 | GSM2751096 | GSM1514875 | GSM3490689 | GSM2751088 | GSM1514876 | \n| AT1G11920 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | \n| AT5G09760 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | \n| AT3G53430 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | \n| AT3G29033 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | \n| AT5G18940 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | \n| AT3G03340 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | \n| AT3G05020 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | \n| AT2G01210 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | \n| AT2G30990 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | \n| AT1G11112 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | \n\n\n"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Import the data\ndata = pd.read_csv(\"./GENANNO-477.csv\", index_col=0)\n# To try different groups of genes/libraries, sampling of the matrix can be enabled here:\ndata=data.sample(20) # Samples X number of genes\ndata=data.sample(8,axis=1) # Samples X number of libraries\ndata",
"execution_count": 166,
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>GSM1514869</th>\n <th>SRX2988238</th>\n <th>GSM3433627</th>\n <th>GSM1514872</th>\n <th>GSM2751096</th>\n <th>GSM3490689</th>\n <th>GSM2751088</th>\n <th>GSM3433656</th>\n </tr>\n <tr>\n <th>Gene</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>AT1G16680</th>\n <td>28.2</td>\n <td>27.1</td>\n <td>20.6</td>\n <td>28.3</td>\n <td>20.7</td>\n <td>4.7</td>\n <td>21.4</td>\n <td>24.9</td>\n </tr>\n <tr>\n <th>ATCG01270</th>\n <td>0.8</td>\n <td>3.3</td>\n <td>0.3</td>\n <td>1.3</td>\n <td>102.8</td>\n <td>4.4</td>\n <td>90.9</td>\n <td>0.6</td>\n </tr>\n <tr>\n <th>AT2G45170</th>\n <td>53.3</td>\n <td>42.8</td>\n <td>53.8</td>\n <td>55.7</td>\n <td>52.5</td>\n <td>139.8</td>\n <td>46.1</td>\n <td>83.5</td>\n </tr>\n <tr>\n <th>AT5G66950</th>\n <td>35.1</td>\n <td>34.9</td>\n <td>22.5</td>\n <td>24.7</td>\n <td>39.5</td>\n <td>16.3</td>\n <td>39.8</td>\n <td>20.8</td>\n </tr>\n <tr>\n <th>AT1G71090</th>\n <td>38.1</td>\n <td>36.0</td>\n <td>33.8</td>\n <td>32.6</td>\n <td>42.2</td>\n <td>23.1</td>\n <td>0.0</td>\n <td>44.8</td>\n </tr>\n <tr>\n <th>AT1G61165</th>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.6</td>\n <td>3.4</td>\n <td>0.6</td>\n <td>0.0</td>\n <td>0.9</td>\n <td>1.6</td>\n </tr>\n <tr>\n <th>AT5G40480</th>\n <td>29.3</td>\n <td>29.3</td>\n <td>10.8</td>\n <td>28.2</td>\n <td>25.6</td>\n <td>1.7</td>\n <td>27.6</td>\n <td>10.8</td>\n </tr>\n <tr>\n <th>AT2G22270</th>\n <td>27.7</td>\n <td>25.9</td>\n <td>16.7</td>\n <td>21.8</td>\n <td>16.9</td>\n <td>20.4</td>\n <td>19.7</td>\n <td>18.4</td>\n </tr>\n <tr>\n <th>AT3G17900</th>\n <td>40.3</td>\n <td>36.9</td>\n <td>41.9</td>\n <td>35.2</td>\n <td>33.7</td>\n <td>19.3</td>\n <td>36.7</td>\n <td>41.6</td>\n </tr>\n <tr>\n <th>AT1G06920</th>\n <td>0.0</td>\n <td>0.5</td>\n <td>0.4</td>\n <td>0.0</td>\n <td>0.1</td>\n <td>0.4</td>\n <td>0.2</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>AT2G47844</th>\n <td>1.1</td>\n <td>1.6</td>\n <td>0.4</td>\n <td>1.1</td>\n <td>2.1</td>\n <td>8.8</td>\n <td>5.0</td>\n <td>0.2</td>\n </tr>\n <tr>\n <th>AT5G60780</th>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.1</td>\n <td>3.0</td>\n <td>0.3</td>\n <td>1.6</td>\n <td>1.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>AT5G44470</th>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>AT3G06630</th>\n <td>10.6</td>\n <td>9.5</td>\n <td>5.8</td>\n <td>3.3</td>\n <td>3.4</td>\n <td>2.3</td>\n <td>4.3</td>\n <td>2.2</td>\n </tr>\n <tr>\n <th>AT5G60770</th>\n <td>0.2</td>\n <td>29.4</td>\n <td>0.4</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.2</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>AT1G27320</th>\n <td>44.8</td>\n <td>36.2</td>\n <td>35.7</td>\n <td>46.5</td>\n <td>49.3</td>\n <td>23.6</td>\n <td>46.4</td>\n <td>53.8</td>\n </tr>\n <tr>\n <th>AT1G56710</th>\n <td>0.0</td>\n <td>0.7</td>\n <td>0.6</td>\n <td>3.1</td>\n <td>5.8</td>\n <td>6.1</td>\n <td>8.6</td>\n <td>0.4</td>\n </tr>\n <tr>\n <th>AT1G79950</th>\n <td>25.3</td>\n <td>23.6</td>\n <td>15.9</td>\n <td>24.2</td>\n <td>11.0</td>\n <td>5.9</td>\n <td>11.8</td>\n <td>19.3</td>\n </tr>\n <tr>\n <th>AT1G64455</th>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>AT1G60989</th>\n <td>43.7</td>\n <td>31.4</td>\n <td>20.7</td>\n <td>30.4</td>\n <td>39.5</td>\n <td>41.8</td>\n <td>36.1</td>\n <td>38.0</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " GSM1514869 SRX2988238 GSM3433627 GSM1514872 GSM2751096 \\\nGene \nAT1G16680 28.2 27.1 20.6 28.3 20.7 \nATCG01270 0.8 3.3 0.3 1.3 102.8 \nAT2G45170 53.3 42.8 53.8 55.7 52.5 \nAT5G66950 35.1 34.9 22.5 24.7 39.5 \nAT1G71090 38.1 36.0 33.8 32.6 42.2 \nAT1G61165 0.0 0.0 0.6 3.4 0.6 \nAT5G40480 29.3 29.3 10.8 28.2 25.6 \nAT2G22270 27.7 25.9 16.7 21.8 16.9 \nAT3G17900 40.3 36.9 41.9 35.2 33.7 \nAT1G06920 0.0 0.5 0.4 0.0 0.1 \nAT2G47844 1.1 1.6 0.4 1.1 2.1 \nAT5G60780 0.0 0.0 0.1 3.0 0.3 \nAT5G44470 0.0 0.0 0.0 0.0 0.0 \nAT3G06630 10.6 9.5 5.8 3.3 3.4 \nAT5G60770 0.2 29.4 0.4 0.0 0.0 \nAT1G27320 44.8 36.2 35.7 46.5 49.3 \nAT1G56710 0.0 0.7 0.6 3.1 5.8 \nAT1G79950 25.3 23.6 15.9 24.2 11.0 \nAT1G64455 0.0 0.0 0.0 0.0 0.0 \nAT1G60989 43.7 31.4 20.7 30.4 39.5 \n\n GSM3490689 GSM2751088 GSM3433656 \nGene \nAT1G16680 4.7 21.4 24.9 \nATCG01270 4.4 90.9 0.6 \nAT2G45170 139.8 46.1 83.5 \nAT5G66950 16.3 39.8 20.8 \nAT1G71090 23.1 0.0 44.8 \nAT1G61165 0.0 0.9 1.6 \nAT5G40480 1.7 27.6 10.8 \nAT2G22270 20.4 19.7 18.4 \nAT3G17900 19.3 36.7 41.6 \nAT1G06920 0.4 0.2 0.0 \nAT2G47844 8.8 5.0 0.2 \nAT5G60780 1.6 1.0 0.0 \nAT5G44470 0.0 0.0 0.0 \nAT3G06630 2.3 4.3 2.2 \nAT5G60770 0.0 0.2 0.0 \nAT1G27320 23.6 46.4 53.8 \nAT1G56710 6.1 8.6 0.4 \nAT1G79950 5.9 11.8 19.3 \nAT1G64455 0.0 0.0 0.0 \nAT1G60989 41.8 36.1 38.0 "
},
"execution_count": 166,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Apply minimum FPKM transformation to binary\nminFPKM=10\ndata[data<minFPKM]=0\ndata[data>=minFPKM]=1\ndata=data.astype('int')\ndata",
"execution_count": 167,
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>GSM1514869</th>\n <th>SRX2988238</th>\n <th>GSM3433627</th>\n <th>GSM1514872</th>\n <th>GSM2751096</th>\n <th>GSM3490689</th>\n <th>GSM2751088</th>\n <th>GSM3433656</th>\n </tr>\n <tr>\n <th>Gene</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>AT1G16680</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>ATCG01270</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n </tr>\n <tr>\n <th>AT2G45170</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT5G66950</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT1G71090</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT1G61165</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>AT5G40480</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT2G22270</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT3G17900</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT1G06920</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>AT2G47844</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>AT5G60780</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>AT5G44470</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>AT3G06630</th>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>AT5G60770</th>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>AT1G27320</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT1G56710</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>AT1G79950</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT1G64455</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>AT1G60989</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " GSM1514869 SRX2988238 GSM3433627 GSM1514872 GSM2751096 \\\nGene \nAT1G16680 1 1 1 1 1 \nATCG01270 0 0 0 0 1 \nAT2G45170 1 1 1 1 1 \nAT5G66950 1 1 1 1 1 \nAT1G71090 1 1 1 1 1 \nAT1G61165 0 0 0 0 0 \nAT5G40480 1 1 1 1 1 \nAT2G22270 1 1 1 1 1 \nAT3G17900 1 1 1 1 1 \nAT1G06920 0 0 0 0 0 \nAT2G47844 0 0 0 0 0 \nAT5G60780 0 0 0 0 0 \nAT5G44470 0 0 0 0 0 \nAT3G06630 1 0 0 0 0 \nAT5G60770 0 1 0 0 0 \nAT1G27320 1 1 1 1 1 \nAT1G56710 0 0 0 0 0 \nAT1G79950 1 1 1 1 1 \nAT1G64455 0 0 0 0 0 \nAT1G60989 1 1 1 1 1 \n\n GSM3490689 GSM2751088 GSM3433656 \nGene \nAT1G16680 0 1 1 \nATCG01270 0 1 0 \nAT2G45170 1 1 1 \nAT5G66950 1 1 1 \nAT1G71090 1 0 1 \nAT1G61165 0 0 0 \nAT5G40480 0 1 1 \nAT2G22270 1 1 1 \nAT3G17900 1 1 1 \nAT1G06920 0 0 0 \nAT2G47844 0 0 0 \nAT5G60780 0 0 0 \nAT5G44470 0 0 0 \nAT3G06630 0 0 0 \nAT5G60770 0 0 0 \nAT1G27320 1 1 1 \nAT1G56710 0 0 0 \nAT1G79950 0 1 1 \nAT1G64455 0 0 0 \nAT1G60989 1 1 1 "
},
"execution_count": 167,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Filter rows based on presence in all libraries\n# removes genes which are not expressed at the minFPKM level in more than minLibraries libraries\n# this ensures the solver can find a valid solution to the problem\n\nminLibraries=3\noriginal_number_of_genes = data.shape[0]\noriginal_GENES=list(data.index.values)\ndata=data[data.sum(axis=1)>=minLibraries]\ndata",
"execution_count": 168,
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>GSM1514869</th>\n <th>SRX2988238</th>\n <th>GSM3433627</th>\n <th>GSM1514872</th>\n <th>GSM2751096</th>\n <th>GSM3490689</th>\n <th>GSM2751088</th>\n <th>GSM3433656</th>\n </tr>\n <tr>\n <th>Gene</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>AT1G16680</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT2G45170</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT5G66950</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT1G71090</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT5G40480</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT2G22270</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT3G17900</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT1G27320</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT1G79950</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT1G60989</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " GSM1514869 SRX2988238 GSM3433627 GSM1514872 GSM2751096 \\\nGene \nAT1G16680 1 1 1 1 1 \nAT2G45170 1 1 1 1 1 \nAT5G66950 1 1 1 1 1 \nAT1G71090 1 1 1 1 1 \nAT5G40480 1 1 1 1 1 \nAT2G22270 1 1 1 1 1 \nAT3G17900 1 1 1 1 1 \nAT1G27320 1 1 1 1 1 \nAT1G79950 1 1 1 1 1 \nAT1G60989 1 1 1 1 1 \n\n GSM3490689 GSM2751088 GSM3433656 \nGene \nAT1G16680 0 1 1 \nAT2G45170 1 1 1 \nAT5G66950 1 1 1 \nAT1G71090 1 0 1 \nAT5G40480 0 1 1 \nAT2G22270 1 1 1 \nAT3G17900 1 1 1 \nAT1G27320 1 1 1 \nAT1G79950 0 1 1 \nAT1G60989 1 1 1 "
},
"execution_count": 168,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "Libraries = list(data.columns.values)\nGenes = list(data.index.values)\nCosts={}\nfor l in Libraries:\n Costs[l]=random()\nLibraries,Costs,Genes",
"execution_count": 219,
"outputs": [
{
"data": {
"text/plain": "(['GSM2751092',\n 'GSM1514876',\n 'GSM3490690',\n 'SRX2988238',\n 'GSM1514873',\n 'GSM3408860',\n 'GSM2787948',\n 'GSM3433656'],\n {'GSM2751092': 0.16240731671566533,\n 'GSM1514876': 0.0950497270559909,\n 'GSM3490690': 0.1564233302286947,\n 'SRX2988238': 0.43734970193126554,\n 'GSM1514873': 0.008268477176007316,\n 'GSM3408860': 0.6374430020225389,\n 'GSM2787948': 0.005660267519729145,\n 'GSM3433656': 0.15381972225713147},\n ['AT1G74680',\n 'AT5G53840',\n 'AT2G42470',\n 'AT1G49310',\n 'AT3G10950',\n 'AT5G39170',\n 'AT5G08110',\n 'AT4G08500',\n 'AT3G50280',\n 'AT5G05720',\n 'AT5G24270',\n 'AT1G09260',\n 'AT1G12930',\n 'AT1G57610',\n 'AT1G24430',\n 'AT5G66880',\n 'AT4G11150',\n 'AT2G01220',\n 'AT1G50420',\n 'AT3G42630'])"
},
"execution_count": 219,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "",
"execution_count": 170,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "model = ConcreteModel()\nmodel.genes = Set(initialize=Genes)\nmodel.libraries = Set(initialize=Libraries)\nmodel.x = Var(model.libraries, domain=Boolean)\n\n# Generate the matrix in tuple form as input for the model\nA = data.stack().to_dict()\nmodel.a = Param(model.genes,model.libraries, initialize=A)\n\nmodel.Cost = Objective(\n expr = sum([Costs[l]*model.x[l] for l in Libraries]),\n sense = minimize\n)\n\nmodel.constraints = ConstraintList()\nfor g in Genes:\n model.constraints.add(sum([model.a[g,l]*model.x[l] for l in Libraries]) >= minLibraries)",
"execution_count": 171,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "results = SolverFactory('glpk').solve(model)",
"execution_count": 172,
"outputs": []
},
{
"metadata": {
"scrolled": false,
"trusted": true
},
"cell_type": "code",
"source": "print(\"The libraries required to cover all the genes\", minLibraries,\"times based on a FPKM value of\", minFPKM,\"are:\\n\")\nrequired_libraries=0\n\nfor l in Libraries:\n if model.x[l]() > 0:\n print(l)\n required_libraries+=1\n\nprint()\nprint(\"The following represents a list of genes and the samples from the final solution supporting them\\n\")\nfor g in Genes:\n total = sum([data[l][g]*model.x[l]() for l in Libraries])\n print(g, [l for l in Libraries if model.x[l]()>0])\nprint(\"\\nThese represent\", required_libraries,\"out of\",len(Libraries),\"original libraries\\n\")\n\nsogenes=set(original_GENES)\nsgenes=set(Genes)\nmissing_genes = sogenes-sgenes\nprint(l80_wrap.fill(' '.join([\"From the original\", str(len(sogenes)),\"genes, there are\",str(len(missing_genes)),\"not present in at least\",str(minLibraries),\"of the samples at an FPKM level >=\",str(minFPKM),\"these are:\"])))\nprint()\nprint(l80_wrap.fill(' '.join([m for m in missing_genes])))",
"execution_count": 173,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "The libraries required to cover all the genes 3 times based on a FPKM value of 10 are:\n\nGSM1514869\nSRX2988238\nGSM2751096\n\nThe following represents a list of genes and the samples from the final solution supporting them\n\nAT1G16680 ['GSM1514869', 'SRX2988238', 'GSM2751096']\nAT2G45170 ['GSM1514869', 'SRX2988238', 'GSM2751096']\nAT5G66950 ['GSM1514869', 'SRX2988238', 'GSM2751096']\nAT1G71090 ['GSM1514869', 'SRX2988238', 'GSM2751096']\nAT5G40480 ['GSM1514869', 'SRX2988238', 'GSM2751096']\nAT2G22270 ['GSM1514869', 'SRX2988238', 'GSM2751096']\nAT3G17900 ['GSM1514869', 'SRX2988238', 'GSM2751096']\nAT1G27320 ['GSM1514869', 'SRX2988238', 'GSM2751096']\nAT1G79950 ['GSM1514869', 'SRX2988238', 'GSM2751096']\nAT1G60989 ['GSM1514869', 'SRX2988238', 'GSM2751096']\n\nThese represent 3 out of 8 original libraries\n\nFrom the original 20 genes, there are 10 not present in at least 3 of the\nsamples at an FPKM level >= 10 these are:\n\nAT1G06920 AT5G60780 AT5G60770 AT1G56710 AT1G64455 AT2G47844 AT3G06630 AT1G61165\nAT5G44470 ATCG01270\n"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "# Background - Maximising genes formulation\n\nThe previous problem could be also intepreted as a maximisation problem where the objective is maximising the number of covered genes using a library based budget.\n\nIn this case the model can be expressed using the previous definitions for $L$, $G$, $X$ and $Y \\subseteq L\\times G$:\n\n\nIn this case costs are associated to each $g_{i,j}$ according to the importance $w_i$ of the presence of this gene when using $l_i$ in the solution.\n\nThe objective is to maximise:\n\n$$\n\\sum_{g \\subseteq G} w_i(g_j) \\, y_{i,j}\n$$\n\nsubject to:\n\n\\begin{align*}\n&\\sum_{l \\subseteq L} c(l_i) \\, x_i \\leq Budget\\\\\n&\\sum_{g_j \\subseteq L_i} x_i \\geq y_{i,j}\\\\\n&y_i \\in \\{0,1\\}\\\\\n\\end{align*}\n\n"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Import the data\ndata = pd.read_csv(\"./GENANNO-477.csv\", index_col=0)\n# To try different groups of genes/libraries, sampling of the matrix can be enabled here:\ndata=data.sample(20) # Samples X number of genes\ndata=data.sample(9,axis=1) # Samples X number of libraries\ndata",
"execution_count": 414,
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>GSM1514869</th>\n <th>GSM2751088</th>\n <th>GSM1514872</th>\n <th>GSM2751096</th>\n <th>GSM1514873</th>\n <th>GSM3433656</th>\n <th>GSM1514876</th>\n <th>GSM3433627</th>\n <th>SRX2988238</th>\n </tr>\n <tr>\n <th>Gene</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>AT4G26020</th>\n <td>1.7</td>\n <td>1.7</td>\n <td>1.6</td>\n <td>0.6</td>\n <td>0.5</td>\n <td>0.0</td>\n <td>3.6</td>\n <td>0.9</td>\n <td>2.0</td>\n </tr>\n <tr>\n <th>AT4G13490</th>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>AT4G19810</th>\n <td>22.6</td>\n <td>25.7</td>\n <td>10.0</td>\n <td>24.8</td>\n <td>12.2</td>\n <td>10.1</td>\n <td>6.3</td>\n <td>8.2</td>\n <td>29.4</td>\n </tr>\n <tr>\n <th>AT3G44405</th>\n <td>2.0</td>\n <td>3.0</td>\n <td>0.0</td>\n <td>3.1</td>\n <td>0.0</td>\n <td>1.0</td>\n <td>4.7</td>\n <td>0.5</td>\n <td>6.5</td>\n </tr>\n <tr>\n <th>AT1G32760</th>\n <td>14.1</td>\n <td>9.6</td>\n <td>11.1</td>\n <td>6.6</td>\n <td>11.1</td>\n <td>16.5</td>\n <td>16.7</td>\n <td>10.3</td>\n <td>20.2</td>\n </tr>\n <tr>\n <th>AT3G43480</th>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>AT5G58980</th>\n <td>23.4</td>\n <td>17.0</td>\n <td>18.6</td>\n <td>16.1</td>\n <td>17.2</td>\n <td>11.5</td>\n <td>19.2</td>\n <td>8.4</td>\n <td>20.1</td>\n </tr>\n <tr>\n <th>AT3G22690</th>\n <td>5.4</td>\n <td>3.5</td>\n <td>5.1</td>\n <td>3.5</td>\n <td>3.0</td>\n <td>1.9</td>\n <td>5.6</td>\n <td>3.4</td>\n <td>5.9</td>\n </tr>\n <tr>\n <th>AT1G23900</th>\n <td>39.7</td>\n <td>33.5</td>\n <td>31.4</td>\n <td>33.6</td>\n <td>32.2</td>\n <td>32.3</td>\n <td>29.5</td>\n <td>35.2</td>\n <td>35.9</td>\n </tr>\n <tr>\n <th>AT2G44270</th>\n <td>31.8</td>\n <td>21.2</td>\n <td>21.8</td>\n <td>18.6</td>\n <td>22.1</td>\n <td>27.8</td>\n <td>25.6</td>\n <td>21.1</td>\n <td>33.3</td>\n </tr>\n <tr>\n <th>AT4G35250</th>\n <td>15.1</td>\n <td>57.3</td>\n <td>62.3</td>\n <td>62.9</td>\n <td>61.8</td>\n <td>63.9</td>\n <td>41.7</td>\n <td>68.8</td>\n <td>21.4</td>\n </tr>\n <tr>\n <th>AT1G59540</th>\n <td>16.5</td>\n <td>14.4</td>\n <td>17.1</td>\n <td>9.8</td>\n <td>17.2</td>\n <td>3.7</td>\n <td>22.6</td>\n <td>21.6</td>\n <td>19.1</td>\n </tr>\n <tr>\n <th>AT1G52905</th>\n <td>14.6</td>\n <td>18.6</td>\n <td>14.3</td>\n <td>16.1</td>\n <td>14.0</td>\n <td>7.6</td>\n <td>20.0</td>\n <td>7.8</td>\n <td>23.6</td>\n </tr>\n <tr>\n <th>AT5G47380</th>\n <td>0.2</td>\n <td>11.8</td>\n <td>23.1</td>\n <td>10.9</td>\n <td>19.5</td>\n <td>16.7</td>\n <td>14.2</td>\n <td>13.4</td>\n <td>0.3</td>\n </tr>\n <tr>\n <th>AT1G11460</th>\n <td>9.1</td>\n <td>1.6</td>\n <td>0.4</td>\n <td>2.4</td>\n <td>1.3</td>\n <td>1.5</td>\n <td>0.9</td>\n <td>0.3</td>\n <td>25.5</td>\n </tr>\n <tr>\n <th>AT4G31400</th>\n <td>8.1</td>\n <td>11.5</td>\n <td>7.5</td>\n <td>10.2</td>\n <td>8.2</td>\n <td>2.5</td>\n <td>10.7</td>\n <td>7.8</td>\n <td>9.8</td>\n </tr>\n <tr>\n <th>AT4G22120</th>\n <td>30.9</td>\n <td>34.3</td>\n <td>24.4</td>\n <td>31.0</td>\n <td>22.7</td>\n <td>22.1</td>\n <td>33.2</td>\n <td>25.9</td>\n <td>35.1</td>\n </tr>\n <tr>\n <th>AT1G02305</th>\n <td>65.1</td>\n <td>51.7</td>\n <td>54.2</td>\n <td>57.1</td>\n <td>53.8</td>\n <td>84.5</td>\n <td>53.6</td>\n <td>56.7</td>\n <td>45.3</td>\n </tr>\n <tr>\n <th>AT4G26490</th>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>AT3G61180</th>\n <td>39.7</td>\n <td>33.1</td>\n <td>40.4</td>\n <td>32.4</td>\n <td>41.2</td>\n <td>50.5</td>\n <td>33.4</td>\n <td>34.4</td>\n <td>34.8</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " GSM1514869 GSM2751088 GSM1514872 GSM2751096 GSM1514873 \\\nGene \nAT4G26020 1.7 1.7 1.6 0.6 0.5 \nAT4G13490 0.0 0.0 0.0 0.0 0.0 \nAT4G19810 22.6 25.7 10.0 24.8 12.2 \nAT3G44405 2.0 3.0 0.0 3.1 0.0 \nAT1G32760 14.1 9.6 11.1 6.6 11.1 \nAT3G43480 0.0 0.0 0.0 0.0 0.0 \nAT5G58980 23.4 17.0 18.6 16.1 17.2 \nAT3G22690 5.4 3.5 5.1 3.5 3.0 \nAT1G23900 39.7 33.5 31.4 33.6 32.2 \nAT2G44270 31.8 21.2 21.8 18.6 22.1 \nAT4G35250 15.1 57.3 62.3 62.9 61.8 \nAT1G59540 16.5 14.4 17.1 9.8 17.2 \nAT1G52905 14.6 18.6 14.3 16.1 14.0 \nAT5G47380 0.2 11.8 23.1 10.9 19.5 \nAT1G11460 9.1 1.6 0.4 2.4 1.3 \nAT4G31400 8.1 11.5 7.5 10.2 8.2 \nAT4G22120 30.9 34.3 24.4 31.0 22.7 \nAT1G02305 65.1 51.7 54.2 57.1 53.8 \nAT4G26490 0.0 0.0 0.0 0.0 0.0 \nAT3G61180 39.7 33.1 40.4 32.4 41.2 \n\n GSM3433656 GSM1514876 GSM3433627 SRX2988238 \nGene \nAT4G26020 0.0 3.6 0.9 2.0 \nAT4G13490 0.0 0.0 0.0 0.0 \nAT4G19810 10.1 6.3 8.2 29.4 \nAT3G44405 1.0 4.7 0.5 6.5 \nAT1G32760 16.5 16.7 10.3 20.2 \nAT3G43480 0.0 0.0 0.0 0.0 \nAT5G58980 11.5 19.2 8.4 20.1 \nAT3G22690 1.9 5.6 3.4 5.9 \nAT1G23900 32.3 29.5 35.2 35.9 \nAT2G44270 27.8 25.6 21.1 33.3 \nAT4G35250 63.9 41.7 68.8 21.4 \nAT1G59540 3.7 22.6 21.6 19.1 \nAT1G52905 7.6 20.0 7.8 23.6 \nAT5G47380 16.7 14.2 13.4 0.3 \nAT1G11460 1.5 0.9 0.3 25.5 \nAT4G31400 2.5 10.7 7.8 9.8 \nAT4G22120 22.1 33.2 25.9 35.1 \nAT1G02305 84.5 53.6 56.7 45.3 \nAT4G26490 0.0 0.0 0.0 0.0 \nAT3G61180 50.5 33.4 34.4 34.8 "
},
"execution_count": 414,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Apply minimum FPKM transformation to binary\nminFPKM=5\ndata[data<minFPKM]=0\ndata[data>=minFPKM]=1\ndata=data.astype('int')\ndataT=data.transpose()\ndata",
"execution_count": 415,
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>GSM1514869</th>\n <th>GSM2751088</th>\n <th>GSM1514872</th>\n <th>GSM2751096</th>\n <th>GSM1514873</th>\n <th>GSM3433656</th>\n <th>GSM1514876</th>\n <th>GSM3433627</th>\n <th>SRX2988238</th>\n </tr>\n <tr>\n <th>Gene</th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n <th></th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>AT4G26020</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>AT4G13490</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>AT4G19810</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT3G44405</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT1G32760</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT3G43480</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>AT5G58980</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT3G22690</th>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT1G23900</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT2G44270</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT4G35250</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT1G59540</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT1G52905</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT5G47380</th>\n <td>0</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>0</td>\n </tr>\n <tr>\n <th>AT1G11460</th>\n <td>1</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT4G31400</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>0</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT4G22120</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT1G02305</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n <tr>\n <th>AT4G26490</th>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n <td>0</td>\n </tr>\n <tr>\n <th>AT3G61180</th>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n <td>1</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " GSM1514869 GSM2751088 GSM1514872 GSM2751096 GSM1514873 \\\nGene \nAT4G26020 0 0 0 0 0 \nAT4G13490 0 0 0 0 0 \nAT4G19810 1 1 1 1 1 \nAT3G44405 0 0 0 0 0 \nAT1G32760 1 1 1 1 1 \nAT3G43480 0 0 0 0 0 \nAT5G58980 1 1 1 1 1 \nAT3G22690 1 0 1 0 0 \nAT1G23900 1 1 1 1 1 \nAT2G44270 1 1 1 1 1 \nAT4G35250 1 1 1 1 1 \nAT1G59540 1 1 1 1 1 \nAT1G52905 1 1 1 1 1 \nAT5G47380 0 1 1 1 1 \nAT1G11460 1 0 0 0 0 \nAT4G31400 1 1 1 1 1 \nAT4G22120 1 1 1 1 1 \nAT1G02305 1 1 1 1 1 \nAT4G26490 0 0 0 0 0 \nAT3G61180 1 1 1 1 1 \n\n GSM3433656 GSM1514876 GSM3433627 SRX2988238 \nGene \nAT4G26020 0 0 0 0 \nAT4G13490 0 0 0 0 \nAT4G19810 1 1 1 1 \nAT3G44405 0 0 0 1 \nAT1G32760 1 1 1 1 \nAT3G43480 0 0 0 0 \nAT5G58980 1 1 1 1 \nAT3G22690 0 1 0 1 \nAT1G23900 1 1 1 1 \nAT2G44270 1 1 1 1 \nAT4G35250 1 1 1 1 \nAT1G59540 0 1 1 1 \nAT1G52905 1 1 1 1 \nAT5G47380 1 1 1 0 \nAT1G11460 0 0 0 1 \nAT4G31400 0 1 1 1 \nAT4G22120 1 1 1 1 \nAT1G02305 1 1 1 1 \nAT4G26490 0 0 0 0 \nAT3G61180 1 1 1 1 "
},
"execution_count": 415,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "Libraries = list(data.columns.values)\n\nGenes = list(data.index.values)\n\nCosts={}\nfor l in Libraries:\n Costs[l]=1\n\nWeights={}\nfor g in Genes:\n Weights[g]=1\n \nBudget = 2",
"execution_count": 518,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "model = ConcreteModel()\nmodel.genes = Set(initialize=Genes)\nmodel.libraries = Set(initialize=Libraries)\nmodel.x = Var(model.libraries, domain=Boolean)\nmodel.y = Var(model.libraries,model.genes, domain=Boolean)\n\n# Generate the matrix in tuple form as input for the model\nA = data.stack().to_dict()\nmodel.a = Param(model.genes,model.libraries, initialize=A)\n\nmodel.Cost = Objective(\n expr = sum([model.y[l,g] for l in Libraries for g in Genes]),\n sense = maximize\n)\n\nmodel.constraints = ConstraintList()\nmodel.constraints.add(\n sum([model.x[l]*Costs[l] for l in Libraries]) <= Budget\n) \n\nfor g in Genes:\n for l in dataT[dataT[g]>0].index.to_list():\n model.constraints.add(model.x[l] >= model.y[l,g])",
"execution_count": 519,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "results = SolverFactory('glpk').solve(model)\nresults.write()",
"execution_count": 520,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "# ==========================================================\n# = Solver Results =\n# ==========================================================\n# ----------------------------------------------------------\n# Problem Information\n# ----------------------------------------------------------\nProblem: \n- Name: unknown\n Lower bound: 89.0\n Upper bound: 89.0\n Number of objectives: 1\n Number of constraints: 122\n Number of variables: 190\n Number of nonzeros: 250\n Sense: maximize\n# ----------------------------------------------------------\n# Solver Information\n# ----------------------------------------------------------\nSolver: \n- Status: ok\n Termination condition: optimal\n Statistics: \n Branch and bound: \n Number of bounded subproblems: 1\n Number of created subproblems: 1\n Error rc: 0\n Time: 0.014689922332763672\n# ----------------------------------------------------------\n# Solution Information\n# ----------------------------------------------------------\nSolution: \n- number of solutions: 0\n number of solutions displayed: 0\n"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "print(\"Genes covered (and total FPKM value) after filtering with an FPKM value lower than\", minFPKM,\"and using a maximum budget of\", Budget,\"libraries:\\n\")\n\nprint(\"Genes\\t\\tTotal FPKM\")\nfor g in Genes:\n gene_used=0\n for l in dataT[dataT[g]>0].index.to_list():\n if model.y[l,g]() > 0:\n gene_used+=1\n if gene_used>0:\n print(g, gene_used, sep='\\t')\n\nprint()\nprint(\"Libraries in the optimal solution\")\nfor l in Libraries:\n if model.x[l]()>0:\n print(l)",
"execution_count": 522,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "Genes covered (and total FPKM value) after filtering with an FPKM value lower than 5 and using a maximum budget of 2 libraries:\n\nGenes\t\tTotal FPKM\nAT4G19810\t2\nAT3G44405\t1\nAT1G32760\t2\nAT5G58980\t2\nAT3G22690\t2\nAT1G23900\t2\nAT2G44270\t2\nAT4G35250\t2\nAT1G59540\t2\nAT1G52905\t2\nAT1G11460\t2\nAT4G31400\t2\nAT4G22120\t2\nAT1G02305\t2\nAT3G61180\t2\n\nLibraries in the optimal solution\nGSM1514869\nSRX2988238\n"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "",
"execution_count": null,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.7.7",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"gist": {
"id": "",
"data": {
"description": "Set Cover Problem - Minimum set selection for maximal weighted coverage",
"public": true
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment