Skip to content

Instantly share code, notes, and snippets.

@petermchale
Last active November 20, 2023 00:07
Show Gist options
  • Save petermchale/437a996f70d7bbbd036d835f2f92858e to your computer and use it in GitHub Desktop.
Save petermchale/437a996f70d7bbbd036d835f2f92858e to your computer and use it in GitHub Desktop.
How to organize the tables of a database, and how to query the database using python (sqlite3)
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Schema \n",
"\n",
"| Projects Table | \n",
"|----------|\n",
"| ProjectID (Primary Key) |\n",
"| ProjectName |\n",
"| Description |\n",
"| StartDate |\n",
"| EndDate |\n",
"\n",
"<br> <!-- Add vertical space here -->\n",
"\n",
"| Subjects Table |\n",
"|----------|\n",
"| SubjectID (Primary Key) |\n",
"| ProjectID (Foreign Key referencing Projects.ProjectID) |\n",
"| Age | \n",
"| Sex |\n",
"| Condition (Disease or Healthy) | \n",
"| Treatment (Treatment for the disease or None if healthy) | \n",
"| Response (Response to the treatment or None if healthy) | \n",
"\n",
"<br> <!-- Add vertical space here -->\n",
"\n",
"| Samples Table |\n",
"|----------|\n",
"| SampleID (Primary Key) |\n",
"| SubjectID (Foreign Key referencing Subjects.SubjectID) | \n",
"| SampleName | \n",
"| SampleType (“PBMC” or “Tumor”) | \n",
"| CollectionDate | \n",
"| DaysSinceTreatmentStarted | \n",
"\n",
"<br> <!-- Add vertical space here -->\n",
"\n",
"| Populations Table | \n",
"|----------|\n",
"| PopulationID (Primary Key) | \n",
"| PopulationName | \n",
"\n",
"<br> <!-- Add vertical space here -->\n",
"\n",
"| CellCounts Table | \n",
"|----------|\n",
"| CellCountID (Primary Key) | \n",
"| SampleID (Foreign Key referencing Samples.SampleID) | \n",
"| PopulationID (Foreign Key referencing Populations.PopulationID) | \n",
"| Count | \n",
"\n",
"<br> <!-- Add vertical space here -->\n",
"\n",
"This schema allows one to scale-up in the following ways:\n",
"1. One can easily create hundreds of projects.\n",
"2. Each project can have easily have tens of subjects.\n",
"3. Each subject can have multiple samples.\n",
"4. One can easily add additional cell population types.\n",
"\n",
"This schema supports various types of analytics. For example, to analyze responders versus non-responders, \n",
"you could use Foreign Keys to join the Subjects, Samples, and CellCounts tables,\n",
"create a “cell-count.csv” file, and then proceed to find the cell\n",
"populations that are most predictive of treatment response.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Queries \n",
"\n",
"Based on the schema above, please write a query to summarize the number of subjects available for each condition:\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[('healthy', 4), ('lung', 3), ('melanoma', 6)]\n"
]
}
],
"source": [
"import sqlite3\n",
"\n",
"conn = sqlite3.connect(':memory:')\n",
"c = conn.cursor()\n",
"c.execute('''\n",
" CREATE TABLE Subjects (\n",
" SubjectID TEXT PRIMARY KEY,\n",
" Condition TEXT\n",
" )\n",
"''')\n",
"# mock data: \n",
"subjects = [\n",
" ('sbj1', 'melanoma'),\n",
" ('sbj2', 'healthy'),\n",
" ('sbj3', 'melanoma'),\n",
" ('sbj4', 'lung'),\n",
" ('sbj5', 'healthy'),\n",
" ('sbj6', 'healthy'),\n",
" ('sbj7', 'healthy'),\n",
" ('sbj8', 'melanoma'),\n",
" ('sbj9', 'melanoma'),\n",
" ('sbj10', 'lung'),\n",
" ('sbj11', 'lung'),\n",
" ('sbj12', 'melanoma'),\n",
" ('sbj13', 'melanoma')\n",
"]\n",
"c.executemany('INSERT INTO Subjects VALUES (?,?)', subjects)\n",
"conn.commit()\n",
"c.execute('''\n",
" SELECT Condition, COUNT(*) as NumberOfSubjects\n",
" FROM Subjects\n",
" GROUP BY Condition\n",
"''')\n",
"print(c.fetchall())\n",
"conn.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Please write a query that returns all melanoma PBMC samples at baseline (`time_from_treatment_start` is 0) from patients who have treatment tr1.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[('sample1',), ('sample4',), ('sample6',), ('sample8',), ('sample10',), ('sample12',)]\n"
]
}
],
"source": [
"import sqlite3\n",
"\n",
"conn = sqlite3.connect(':memory:')\n",
"c = conn.cursor()\n",
"\n",
"c.execute('''\n",
" CREATE TABLE Subjects (\n",
" SubjectID TEXT PRIMARY KEY,\n",
" Condition TEXT,\n",
" Treatment TEXT\n",
" )\n",
"''')\n",
"# mock data: \n",
"subjects = [\n",
" ('sbj1', 'melanoma', 'tr1'),\n",
" ('sbj2', 'healthy', 'None'),\n",
" ('sbj3', 'melanoma', 'tr1'),\n",
" ('sbj4', 'lung', 'tr2'),\n",
" ('sbj5', 'healthy', 'None'),\n",
" ('sbj6', 'healthy', 'None'),\n",
" ('sbj7', 'healthy', 'None'),\n",
" ('sbj8', 'melanoma', 'tr1'),\n",
" ('sbj9', 'melanoma', 'tr1'),\n",
" ('sbj10', 'lung', 'tr2'),\n",
" ('sbj11', 'lung', 'tr2'),\n",
" ('sbj12', 'melanoma', 'tr1'),\n",
" ('sbj13', 'melanoma', 'tr1')\n",
"]\n",
"c.executemany('INSERT INTO Subjects VALUES (?,?,?)', subjects)\n",
"\n",
"c.execute('''\n",
" CREATE TABLE Samples (\n",
" SampleID TEXT PRIMARY KEY,\n",
" SubjectID TEXT,\n",
" SampleType TEXT,\n",
" DaysSinceTreatmentStarted INTEGER\n",
" )\n",
"''')\n",
"# mock data:\n",
"samples = [\n",
" ('sample1', 'sbj1', 'PBMC', 0),\n",
" ('sample2', 'sbj1', 'PBMC', 7),\n",
" ('sample3', 'sbj1', 'tumor', 0),\n",
" ('sample4', 'sbj3', 'PBMC', 0),\n",
" ('sample5', 'sbj3', 'tumor', 7),\n",
" ('sample6', 'sbj8', 'PBMC', 0),\n",
" ('sample7', 'sbj8', 'tumor', 14),\n",
" ('sample8', 'sbj9', 'PBMC', 0),\n",
" ('sample9', 'sbj9', 'tumor', 21),\n",
" ('sample10', 'sbj12', 'PBMC', 0),\n",
" ('sample11', 'sbj12', 'tumor', 28),\n",
" ('sample12', 'sbj13', 'PBMC', 0),\n",
" ('sample13', 'sbj13', 'tumor', 35),\n",
"]\n",
"c.executemany('INSERT INTO Samples VALUES (?,?,?,?)', samples)\n",
"\n",
"conn.commit()\n",
"\n",
"c.execute('''\n",
" SELECT s.SampleID\n",
" FROM Subjects sub\n",
" JOIN Samples s ON sub.SubjectID = s.SubjectID\n",
" WHERE sub.Condition = 'melanoma'\n",
" AND s.SampleType = 'PBMC'\n",
" AND s.DaysSinceTreatmentStarted = 0\n",
" AND sub.Treatment = 'tr1'\n",
"''')\n",
"print(c.fetchall())\n",
"conn.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Please write queries to provide these following further breakdowns for the samples identified by the previous query: \n",
"\n",
"* How many samples from each project? \n",
"* How many responders/non-responders?\n",
"* How many males, females?"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[('female', 2), ('male', 4)]\n",
"[('responder', 6)]\n",
"[('prj1', 2), ('prj2', 2), ('prj3', 2)]\n"
]
}
],
"source": [
"conn = sqlite3.connect(':memory:')\n",
"c = conn.cursor()\n",
"\n",
"c.execute('''\n",
" CREATE TABLE Subjects (\n",
" SubjectID TEXT PRIMARY KEY,\n",
" ProjectID TEXT,\n",
" Condition TEXT,\n",
" Treatment TEXT,\n",
" Response TEXT,\n",
" Sex TEXT,\n",
" FOREIGN KEY(ProjectID) REFERENCES Projects(ProjectID)\n",
" )\n",
"''')\n",
"subjects = [\n",
" ('sbj1', 'prj1', 'melanoma', 'tr1', 'responder', 'male'),\n",
" ('sbj2', 'prj1', 'healthy', 'None', 'non-responder', 'female'),\n",
" ('sbj3', 'prj2', 'melanoma', 'tr1', 'responder', 'male'),\n",
" ('sbj4', 'prj2', 'lung', 'tr2', 'non-responder', 'female'),\n",
" ('sbj5', 'prj3', 'healthy', 'None', 'non-responder', 'male'),\n",
" ('sbj6', 'prj3', 'healthy', 'None', 'non-responder', 'female'),\n",
" ('sbj7', 'prj1', 'healthy', 'None', 'non-responder', 'male'),\n",
" ('sbj8', 'prj2', 'melanoma', 'tr1', 'responder', 'female'),\n",
" ('sbj9', 'prj3', 'melanoma', 'tr1', 'responder', 'male'),\n",
" ('sbj10', 'prj1', 'lung', 'tr2', 'non-responder', 'female'),\n",
" ('sbj11', 'prj2', 'lung', 'tr2', 'non-responder', 'male'),\n",
" ('sbj12', 'prj3', 'melanoma', 'tr1', 'responder', 'female'),\n",
" ('sbj13', 'prj1', 'melanoma', 'tr1', 'responder', 'male')\n",
"]\n",
"c.executemany('INSERT INTO Subjects VALUES (?,?,?,?,?,?)', subjects)\n",
"\n",
"c.execute('''\n",
" CREATE TABLE Samples (\n",
" SampleID TEXT PRIMARY KEY,\n",
" SubjectID TEXT,\n",
" SampleType TEXT,\n",
" DaysSinceTreatmentStarted INTEGER,\n",
" FOREIGN KEY(SubjectID) REFERENCES Subjects(SubjectID)\n",
" )\n",
"''')\n",
"samples = [\n",
" ('sample1', 'sbj1', 'PBMC', 0),\n",
" ('sample2', 'sbj1', 'PBMC', 7),\n",
" ('sample3', 'sbj1', 'tumor', 0),\n",
" ('sample4', 'sbj3', 'PBMC', 0),\n",
" ('sample5', 'sbj3', 'tumor', 7),\n",
" ('sample6', 'sbj8', 'PBMC', 0),\n",
" ('sample7', 'sbj8', 'tumor', 14),\n",
" ('sample8', 'sbj9', 'PBMC', 0),\n",
" ('sample9', 'sbj9', 'tumor', 21),\n",
" ('sample10', 'sbj12', 'PBMC', 0),\n",
" ('sample11', 'sbj12', 'tumor', 28),\n",
" ('sample12', 'sbj13', 'PBMC', 0),\n",
" ('sample13', 'sbj13', 'tumor', 35),\n",
"]\n",
"c.executemany('INSERT INTO Samples VALUES (?,?,?,?)', samples)\n",
"\n",
"conn.commit()\n",
"\n",
"# Use Common Table Expression (CTE)\n",
"\n",
"# Define the base query\n",
"base_query = \"\"\"\n",
" SELECT sub.Sex, sub.Response, sub.ProjectID, s.SampleID\n",
" FROM Subjects sub\n",
" JOIN Samples s ON sub.SubjectID = s.SubjectID\n",
" WHERE sub.Condition = 'melanoma'\n",
" AND s.SampleType = 'PBMC'\n",
" AND s.DaysSinceTreatmentStarted = 0\n",
" AND sub.Treatment = 'tr1'\n",
"\"\"\"\n",
"\n",
"# Execute the first query\n",
"c.execute(f\"\"\"\n",
"WITH base_query AS ({base_query})\n",
"SELECT Sex, COUNT(*) as NumberOfSamples\n",
"FROM base_query\n",
"GROUP BY Sex;\n",
"\"\"\")\n",
"print(c.fetchall())\n",
"\n",
"# Execute the second query\n",
"c.execute(f\"\"\"\n",
"WITH base_query AS ({base_query})\n",
"SELECT Response, COUNT(*) as NumberOfSamples\n",
"FROM base_query\n",
"GROUP BY Response;\n",
"\"\"\")\n",
"print(c.fetchall())\n",
"\n",
"# Execute the third query\n",
"c.execute(f\"\"\"\n",
"WITH base_query AS ({base_query})\n",
"SELECT ProjectID, COUNT(*) as NumberOfSamples\n",
"FROM base_query\n",
"GROUP BY ProjectID;\n",
"\"\"\")\n",
"print(c.fetchall())\n",
"\n",
"# Close the connection\n",
"conn.close()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "teiko-bio",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment