Last active
March 30, 2018 14:53
Instructions for the 1st exercise
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Some instructions for the exercise" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's download the dataset:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"!wget -O gwas.tsv \"https://www.ebi.ac.uk/gwas/api/search/downloads/full\"" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Open the file:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"f = open('gwas.tsv')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Read the first line:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"first_line = f.readline()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"first_line also contains the trailing newline ('\\n'). We can remove it:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"first_line = first_line.replace('\\n', '')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Split it on tabs:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"header = first_line.split('\\t')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"['DATE ADDED TO CATALOG', 'PUBMEDID', 'FIRST AUTHOR', 'DATE', 'JOURNAL', 'LINK', 'STUDY', 'DISEASE/TRAIT', 'INITIAL SAMPLE SIZE', 'REPLICATION SAMPLE SIZE', 'REGION', 'CHR_ID', 'CHR_POS', 'REPORTED GENE(S)', 'MAPPED_GENE', 'UPSTREAM_GENE_ID', 'DOWNSTREAM_GENE_ID', 'SNP_GENE_IDS', 'UPSTREAM_GENE_DISTANCE', 'DOWNSTREAM_GENE_DISTANCE', 'STRONGEST SNP-RISK ALLELE', 'SNPS', 'MERGED', 'SNP_ID_CURRENT', 'CONTEXT', 'INTERGENIC', 'RISK ALLELE FREQUENCY', 'P-VALUE', 'PVALUE_MLOG', 'P-VALUE (TEXT)', 'OR or BETA', '95% CI (TEXT)', 'PLATFORM [SNPS PASSING QC]', 'CNV']\n" | |
] | |
} | |
], | |
"source": [ | |
"print (header)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now let's store all the remaining lines in a list:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"lines = f.readlines()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"For each of these lines, strip the newline and split on tabs:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"content = [x.replace('\\n', '').split('\\t') for x in lines]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"How many entries does content have?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"64239" | |
] | |
}, | |
"execution_count": 9, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"len(content)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Now we can close the file:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"f.close()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The whole content of the file is now in the lists header and content. What is the index of 'CHR_ID'?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"11" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"header.index('CHR_ID')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's extract the CHR_ID from every row of content:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"chromosomes = [x[11] for x in content]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's check the values of the first 10:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"['1', '13', '15', '1', '3', '15', '15', '8', '11', '18']" | |
] | |
}, | |
"execution_count": 13, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"chromosomes[:10]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Which distinct values are there?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{'',\n", | |
" '1',\n", | |
" '1 x 1',\n", | |
" '1 x 10',\n", | |
" '1 x 13',\n", | |
" '1 x 14',\n", | |
" '1 x 16',\n", | |
" '1 x 17',\n", | |
" '1 x 19',\n", | |
" '1 x 3',\n", | |
" '1 x 6',\n", | |
" '1 x 7',\n", | |
" '1 x 9',\n", | |
" '10',\n", | |
" '10 x 11',\n", | |
" '10 x 12',\n", | |
" '10 x 14',\n", | |
" '10 x 19',\n", | |
" '10 x 21',\n", | |
" '10 x 22',\n", | |
" '10 x 8',\n", | |
" '10;10;10',\n", | |
" '10;10;10;10',\n", | |
" '10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10;10',\n", | |
" '11',\n", | |
" '11 x 4',\n", | |
" '11;11;11',\n", | |
" '11;11;11;11',\n", | |
" '12',\n", | |
" '12 x 12',\n", | |
" '12 x 15',\n", | |
" '12 x 16',\n", | |
" '12 x 17',\n", | |
" '12 x 20',\n", | |
" '12 x 22',\n", | |
" '12 x 8',\n", | |
" '12;12',\n", | |
" '12;12;12',\n", | |
" '13',\n", | |
" '13 x 16',\n", | |
" '13 x 18',\n", | |
" '13 x 2',\n", | |
" '13 x 5',\n", | |
" '13 x 8',\n", | |
" '14',\n", | |
" '14 x 11',\n", | |
" '14 x 21',\n", | |
" '14 x 3',\n", | |
" '14;14;14;14;14;14',\n", | |
" '15',\n", | |
" '15 x 11',\n", | |
" '15 x 8',\n", | |
" '15;15',\n", | |
" '16',\n", | |
" '16 x 7',\n", | |
" '16;16;16',\n", | |
" '16;16;16;16;16;16',\n", | |
" '17',\n", | |
" '17;17',\n", | |
" '17;17;17;17;17;17;17',\n", | |
" '17;17;17;17;17;17;17;17;17;17;17;17',\n", | |
" '18',\n", | |
" '18 x 22',\n", | |
" '18 x 3',\n", | |
" '18 x X',\n", | |
" '19',\n", | |
" '19;19;19;19',\n", | |
" '19;19;19;19;19;19;19',\n", | |
" '1;1',\n", | |
" '1;1;1',\n", | |
" '1;1;1;1',\n", | |
" '1;1;1;1;1',\n", | |
" '1;1;1;1;1;1',\n", | |
" '1;1;1;1;1;1;1',\n", | |
" '1;1;1;1;1;1;1;1',\n", | |
" '1;1;1;1;1;1;1;1;1',\n", | |
" '1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1',\n", | |
" '2',\n", | |
" '2 x 11',\n", | |
" '2 x 12',\n", | |
" '2 x 13',\n", | |
" '2 x 15',\n", | |
" '2 x 17',\n", | |
" '2 x 2',\n", | |
" '2 x 20',\n", | |
" '2 x 3',\n", | |
" '2 x 5',\n", | |
" '2 x 6',\n", | |
" '2 x 9',\n", | |
" '20',\n", | |
" '20 x 19',\n", | |
" '20 x 20',\n", | |
" '20;20',\n", | |
" '20;20;20;20',\n", | |
" '21',\n", | |
" '22',\n", | |
" '22 x 11',\n", | |
" '22 x 4',\n", | |
" '22 x 8',\n", | |
" '22;22;22;22',\n", | |
" '2;1;2;2;2;2;2;2;2;2;2;2;2;2',\n", | |
" '2;2',\n", | |
" '2;2;2',\n", | |
" '2;2;2;2',\n", | |
" '2;2;2;2;2;2;2;2;2;2;2;2',\n", | |
" '2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2;2',\n", | |
" '3',\n", | |
" '3 x 10',\n", | |
" '3 x 11',\n", | |
" '3 x 12',\n", | |
" '3 x 15',\n", | |
" '3 x 18',\n", | |
" '3 x 2',\n", | |
" '3 x 20',\n", | |
" '3 x 22',\n", | |
" '3 x 3',\n", | |
" '3 x 4',\n", | |
" '3 x 5',\n", | |
" '3 x 7',\n", | |
" '3 x 9',\n", | |
" '3;3',\n", | |
" '3;3;3;3',\n", | |
" '4',\n", | |
" '4 x 11',\n", | |
" '4 x 12',\n", | |
" '4 x 18',\n", | |
" '4 x 19',\n", | |
" '4 x 20',\n", | |
" '4 x 22',\n", | |
" '4 x 4',\n", | |
" '4 x 6',\n", | |
" '4 x 8',\n", | |
" '4;4',\n", | |
" '4;4;4;4',\n", | |
" '4;4;4;4;4',\n", | |
" '5',\n", | |
" '5 x 10',\n", | |
" '5 x 11',\n", | |
" '5 x 13',\n", | |
" '5 x 14',\n", | |
" '5 x 15',\n", | |
" '5 x 16',\n", | |
" '5 x 17',\n", | |
" '5 x 19',\n", | |
" '5 x 21',\n", | |
" '5 x 3',\n", | |
" '5 x 5',\n", | |
" '5 x 6',\n", | |
" '5 x 7',\n", | |
" '5 x 8',\n", | |
" '5;5',\n", | |
" '6',\n", | |
" '6 x 1',\n", | |
" '6 x 12',\n", | |
" '6 x 16',\n", | |
" '6 x 17',\n", | |
" '6 x 6',\n", | |
" '6 x 7',\n", | |
" '6 x 8',\n", | |
" '6 x 9',\n", | |
" '6;6',\n", | |
" '6;6;6',\n", | |
" '6;6;6;6',\n", | |
" '6;6;6;6;6',\n", | |
" '6;6;6;6;6;6',\n", | |
" '6;6;6;6;6;6;6;6;6;6;6;6;6;6;6;6;6;6;6;6;6;6;6;6;6;6;6',\n", | |
" '7',\n", | |
" '7 x 1',\n", | |
" '7 x 10',\n", | |
" '7 x 15',\n", | |
" '7 x 16',\n", | |
" '7 x 17',\n", | |
" '7 x 20',\n", | |
" '7 x 8',\n", | |
" '7 x 9',\n", | |
" '7;7',\n", | |
" '7;7;7;7',\n", | |
" '8',\n", | |
" '8 x 10',\n", | |
" '8 x 11',\n", | |
" '8 x 15',\n", | |
" '8 x 18',\n", | |
" '8 x 8',\n", | |
" '8 x 9',\n", | |
" '8;8',\n", | |
" '8;8;8',\n", | |
" '8;8;8;8;8;8;8;8;8;8;8;8;8;8',\n", | |
" '9',\n", | |
" '9 x 10',\n", | |
" '9 x 15',\n", | |
" '9 x 3',\n", | |
" '9 x 4',\n", | |
" '9 x 8',\n", | |
" '9 x 9',\n", | |
" '9;9',\n", | |
" '9;9;9;9',\n", | |
" 'X',\n", | |
" 'Y'}" | |
] | |
}, | |
"execution_count": 14, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"set(chromosomes)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Note that there are various values besides the standard chromosome names. Let's discard them!" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y']\n" | |
] | |
} | |
], | |
"source": [ | |
"accepted = [str(x) for x in range(1,23)] + ['X', 'Y']\n", | |
"print (accepted)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"chromosomes = [x for x in chromosomes if x in accepted]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's count how many times each one occurs.\n", | |
"\n", | |
"First approach: a custom dictionary" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{'1': 5529,\n", | |
" '10': 2755,\n", | |
" '11': 3348,\n", | |
" '12': 2912,\n", | |
" '13': 1405,\n", | |
" '14': 1640,\n", | |
" '15': 2534,\n", | |
" '16': 2318,\n", | |
" '17': 2130,\n", | |
" '18': 1278,\n", | |
" '19': 2177,\n", | |
" '2': 5039,\n", | |
" '20': 1440,\n", | |
" '21': 550,\n", | |
" '22': 1091,\n", | |
" '3': 4122,\n", | |
" '4': 3554,\n", | |
" '5': 3485,\n", | |
" '6': 6068,\n", | |
" '7': 3035,\n", | |
" '8': 2791,\n", | |
" '9': 2512,\n", | |
" 'X': 372,\n", | |
" 'Y': 2}" | |
] | |
}, | |
"execution_count": 17, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"counts = {}\n", | |
"for c in chromosomes:\n", | |
" counts[c] = counts.get(c, 0) + 1\n", | |
"counts" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Second approach: a dictionary comprehension:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{'1': 5529,\n", | |
" '10': 2755,\n", | |
" '11': 3348,\n", | |
" '12': 2912,\n", | |
" '13': 1405,\n", | |
" '14': 1640,\n", | |
" '15': 2534,\n", | |
" '16': 2318,\n", | |
" '17': 2130,\n", | |
" '18': 1278,\n", | |
" '19': 2177,\n", | |
" '2': 5039,\n", | |
" '20': 1440,\n", | |
" '21': 550,\n", | |
" '22': 1091,\n", | |
" '3': 4122,\n", | |
" '4': 3554,\n", | |
" '5': 3485,\n", | |
" '6': 6068,\n", | |
" '7': 3035,\n", | |
" '8': 2791,\n", | |
" '9': 2512,\n", | |
" 'X': 372,\n", | |
" 'Y': 2}" | |
] | |
}, | |
"execution_count": 18, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"{x:sum([1 for y in chromosomes if y==x ]) for x in set(chromosomes)}" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Third approach: use the Counter class:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Counter({'6': 6068, '1': 5529, '2': 5039, '3': 4122, '4': 3554, '5': 3485, '11': 3348, '7': 3035, '12': 2912, '8': 2791, '10': 2755, '15': 2534, '9': 2512, '16': 2318, '19': 2177, '17': 2130, '14': 1640, '20': 1440, '13': 1405, '18': 1278, '22': 1091, '21': 550, 'X': 372, 'Y': 2})\n" | |
] | |
} | |
], | |
"source": [ | |
"from collections import Counter\n", | |
"counts = Counter(chromosomes)\n", | |
"print (counts)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"What is the smallest p-value for chromosome 5?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"5e-274" | |
] | |
}, | |
"execution_count": 20, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"chromosome_index = header.index('CHR_ID')\n", | |
"pvalue_index = header.index('P-VALUE')\n", | |
"min([float(x[pvalue_index]) for x in content if x[chromosome_index] == '5'])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Which first author has the most publications for chromosome 10?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(224, 'Astle WJ')" | |
] | |
}, | |
"execution_count": 21, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"first_author_index = header.index('FIRST AUTHOR')\n", | |
"max((v,k) for k,v in Counter([x[first_author_index] for x in content if x[chromosome_index] == '10']).items())" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"How many entries contain the word \"cancer\" in \"STUDY\"?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"4108" | |
] | |
}, | |
"execution_count": 22, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"study_index = header.index('STUDY')\n", | |
"sum(1 for x in content if 'cancer' in x[study_index])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Which chromosome has the most cancer studies published in Nature?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(85, '5')" | |
] | |
}, | |
"execution_count": 23, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"journal_index = header.index('JOURNAL')\n", | |
"all_chromosomes = [x[chromosome_index] for x in content \n", | |
" if 'cancer' in x[study_index] and 'nature' in x[journal_index].lower()]\n", | |
"max([(v,k) for k,v in Counter(all_chromosomes).items()])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Is there any chromosome with no publication in Nature that has cancer in its title?" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 24, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{'Y'}" | |
] | |
}, | |
"execution_count": 24, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"set(accepted) - set(all_chromosomes)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's build a dictionary that holds, for each chromosome, the 3 authors with the most publications in Nature:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 25, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{'1': [(131, 'Shungin D'), (67, 'Michailidou K'), (63, 'Locke AE')],\n", | |
" '10': [(64, 'Michailidou K'), (18, 'Locke AE'), (17, 'Okada Y')],\n", | |
" '11': [(41, 'Locke AE'), (35, 'Michailidou K'), (32, 'Shungin D')],\n", | |
" '12': [(66, 'Shungin D'), (33, 'Locke AE'), (12, 'Teslovich TM')],\n", | |
" '13': [(20, 'Locke AE'), (12, 'Shungin D'), (10, 'Michailidou K')],\n", | |
" '14': [(28, 'Locke AE'), (22, 'Michailidou K'), (19, 'Shungin D')],\n", | |
" '15': [(33, 'Shungin D'), (16, 'Locke AE'), (11, 'Lango Allen H')],\n", | |
" '16': [(52, 'Shungin D'), (37, 'Locke AE'), (21, 'Michailidou K')],\n", | |
" '17': [(36, 'Shungin D'), (29, 'Michailidou K'), (21, 'Locke AE')],\n", | |
" '18': [(28, 'Shungin D'), (23, 'Locke AE'), (19, 'Michailidou K')],\n", | |
" '19': [(39, 'Shungin D'), (28, 'Locke AE'), (21, 'Michailidou K')],\n", | |
" '2': [(76, 'Shungin D'), (70, 'Locke AE'), (50, 'Michailidou K')],\n", | |
" '20': [(37, 'Shungin D'), (11, 'Locke AE'), (10, 'Michailidou K')],\n", | |
" '21': [(8, 'Michailidou K'), (7, 'Locke AE'), (5, 'Okada Y')],\n", | |
" '22': [(37, 'Michailidou K'), (14, 'Shungin D'), (6, 'Okada Y')],\n", | |
" '3': [(93, 'Shungin D'), (54, 'Locke AE'), (39, 'Michailidou K')],\n", | |
" '4': [(39, 'Shungin D'), (26, 'Michailidou K'), (20, 'Locke AE')],\n", | |
" '5': [(83, 'Michailidou K'), (55, 'Shungin D'), (20, 'Locke AE')],\n", | |
" '6': [(126, 'Shungin D'), (68, 'Michailidou K'), (45, 'Locke AE')],\n", | |
" '7': [(54, 'Shungin D'), (33, 'Michailidou K'), (24, 'Locke AE')],\n", | |
" '8': [(40, 'Michailidou K'), (27, 'Locke AE'), (26, 'Shungin D')],\n", | |
" '9': [(26, 'Michailidou K'), (24, 'Locke AE'), (19, 'Shungin D')],\n", | |
" 'X': [(3, 'Ripke S'), (2, 'Okada Y'), (2, 'Michailidou K')],\n", | |
" 'Y': []}" | |
] | |
}, | |
"execution_count": 25, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"results = { chromosome: \n", | |
" sorted(\n", | |
" [(times, author) for author, times in \n", | |
" Counter(\n", | |
" [y[first_author_index] for y in content \n", | |
" if y[chromosome_index]==chromosome and 'nature' in y[journal_index].lower()]\n", | |
" ).items()\n", | |
" ], \n", | |
" reverse=True)[:3] \n", | |
" for chromosome in accepted\n", | |
" }\n", | |
"\n", | |
"results" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Let's try to display the results more nicely:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<style>\n", | |
" .dataframe thead tr:only-child th {\n", | |
" text-align: right;\n", | |
" }\n", | |
"\n", | |
" .dataframe thead th {\n", | |
" text-align: left;\n", | |
" }\n", | |
"\n", | |
" .dataframe tbody tr th {\n", | |
" vertical-align: top;\n", | |
" }\n", | |
"</style>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>1st</th>\n", | |
" <th>2nd</th>\n", | |
" <th>3rd</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>chromosome</th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" <th></th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>Shungin D</td>\n", | |
" <td>Michailidou K</td>\n", | |
" <td>Locke AE</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>Shungin D</td>\n", | |
" <td>Locke AE</td>\n", | |
" <td>Michailidou K</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>Shungin D</td>\n", | |
" <td>Locke AE</td>\n", | |
" <td>Michailidou K</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>Shungin D</td>\n", | |
" <td>Michailidou K</td>\n", | |
" <td>Locke AE</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>Michailidou K</td>\n", | |
" <td>Shungin D</td>\n", | |
" <td>Locke AE</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>Shungin D</td>\n", | |
" <td>Michailidou K</td>\n", | |
" <td>Locke AE</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>Shungin D</td>\n", | |
" <td>Michailidou K</td>\n", | |
" <td>Locke AE</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>Michailidou K</td>\n", | |
" <td>Locke AE</td>\n", | |
" <td>Shungin D</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>Michailidou K</td>\n", | |
" <td>Locke AE</td>\n", | |
" <td>Shungin D</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10</th>\n", | |
" <td>Michailidou K</td>\n", | |
" <td>Locke AE</td>\n", | |
" <td>Okada Y</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>11</th>\n", | |
" <td>Locke AE</td>\n", | |
" <td>Michailidou K</td>\n", | |
" <td>Shungin D</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>12</th>\n", | |
" <td>Shungin D</td>\n", | |
" <td>Locke AE</td>\n", | |
" <td>Teslovich TM</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>13</th>\n", | |
" <td>Locke AE</td>\n", | |
" <td>Shungin D</td>\n", | |
" <td>Michailidou K</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14</th>\n", | |
" <td>Locke AE</td>\n", | |
" <td>Michailidou K</td>\n", | |
" <td>Shungin D</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>15</th>\n", | |
" <td>Shungin D</td>\n", | |
" <td>Locke AE</td>\n", | |
" <td>Lango Allen H</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>16</th>\n", | |
" <td>Shungin D</td>\n", | |
" <td>Locke AE</td>\n", | |
" <td>Michailidou K</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>17</th>\n", | |
" <td>Shungin D</td>\n", | |
" <td>Michailidou K</td>\n", | |
" <td>Locke AE</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>18</th>\n", | |
" <td>Shungin D</td>\n", | |
" <td>Locke AE</td>\n", | |
" <td>Michailidou K</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>19</th>\n", | |
" <td>Shungin D</td>\n", | |
" <td>Locke AE</td>\n", | |
" <td>Michailidou K</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>20</th>\n", | |
" <td>Shungin D</td>\n", | |
" <td>Locke AE</td>\n", | |
" <td>Michailidou K</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>21</th>\n", | |
" <td>Michailidou K</td>\n", | |
" <td>Locke AE</td>\n", | |
" <td>Okada Y</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>22</th>\n", | |
" <td>Michailidou K</td>\n", | |
" <td>Shungin D</td>\n", | |
" <td>Okada Y</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>X</th>\n", | |
" <td>Ripke S</td>\n", | |
" <td>Okada Y</td>\n", | |
" <td>Michailidou K</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" 1st 2nd 3rd\n", | |
"chromosome \n", | |
"1 Shungin D Michailidou K Locke AE\n", | |
"2 Shungin D Locke AE Michailidou K\n", | |
"3 Shungin D Locke AE Michailidou K\n", | |
"4 Shungin D Michailidou K Locke AE\n", | |
"5 Michailidou K Shungin D Locke AE\n", | |
"6 Shungin D Michailidou K Locke AE\n", | |
"7 Shungin D Michailidou K Locke AE\n", | |
"8 Michailidou K Locke AE Shungin D\n", | |
"9 Michailidou K Locke AE Shungin D\n", | |
"10 Michailidou K Locke AE Okada Y\n", | |
"11 Locke AE Michailidou K Shungin D\n", | |
"12 Shungin D Locke AE Teslovich TM\n", | |
"13 Locke AE Shungin D Michailidou K\n", | |
"14 Locke AE Michailidou K Shungin D\n", | |
"15 Shungin D Locke AE Lango Allen H\n", | |
"16 Shungin D Locke AE Michailidou K\n", | |
"17 Shungin D Michailidou K Locke AE\n", | |
"18 Shungin D Locke AE Michailidou K\n", | |
"19 Shungin D Locke AE Michailidou K\n", | |
"20 Shungin D Locke AE Michailidou K\n", | |
"21 Michailidou K Locke AE Okada Y\n", | |
"22 Michailidou K Shungin D Okada Y\n", | |
"X Ripke S Okada Y Michailidou K" | |
] | |
}, | |
"execution_count": 26, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"import pandas as pd\n", | |
"\n", | |
"items_2 = sorted([x for x in results.items() if x[1]], key=lambda x : accepted.index(x[0]))\n", | |
"results_2 = {\n", | |
" 'chromosome': [x[0] for x in items_2],\n", | |
" '1st': [x[1][0][1] for x in items_2],\n", | |
" '2nd': [x[1][1][1] for x in items_2],\n", | |
" '3rd': [x[1][2][1] for x in items_2],\n", | |
"}\n", | |
"\n", | |
"df = pd.DataFrame.from_dict(results_2)\n", | |
"df = df.set_index('chromosome')\n", | |
"df\n" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python [Root]", | |
"language": "python", | |
"name": "Python [Root]" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.5.3" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
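
The notebook reads the file with a manual `open` / `readline` / `split('\t')` / `close` sequence and then counts chromosomes with `Counter`. As a rough alternative sketch, the same steps could be folded into one function built on the standard `csv` module. The function name `count_chromosomes` and the tiny in-memory sample are hypothetical and only serve the illustration; the column name `CHR_ID` and the list of accepted chromosome names come from the notebook.

```python
import csv
import io
from collections import Counter

# Standard chromosome names, as in the notebook's `accepted` list.
ACCEPTED = [str(n) for n in range(1, 23)] + ['X', 'Y']


def count_chromosomes(handle):
    """Count rows per chromosome, keeping only the standard names.

    `handle` is any open text file or file-like object holding
    tab-separated data whose first line is the header.
    """
    reader = csv.reader(handle, delimiter='\t', quoting=csv.QUOTE_NONE)
    header = next(reader)                  # first line holds the column names
    chr_index = header.index('CHR_ID')     # look the column up by name, not by a fixed 11
    return Counter(
        row[chr_index]
        for row in reader
        if len(row) > chr_index and row[chr_index] in ACCEPTED
    )


# Made-up sample rows (NOT real GWAS catalog data), just to exercise the function.
sample = io.StringIO(
    "PUBMEDID\tCHR_ID\tP-VALUE\n"
    "1\t5\t1e-8\n"
    "2\t5 x 7\t2e-9\n"
    "3\tX\t3e-10\n"
    "4\t5\t4e-12\n"
)
counts = count_chromosomes(sample)
print(counts.most_common())
```

With the downloaded file, the call would presumably look like `with open('gwas.tsv') as f: counts = count_chromosomes(f)`; the `with` block closes the file even if an error occurs, so no explicit `f.close()` is needed.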