df['chromosome'].unique()
array(['1', '2', '3', '4', '5', '6', '7', '8',
       '9', '10', '11', '12', '13', '14', '15',
       '16', '17', '18', '19', '20', '21', '22',
       'X', 'MT'], dtype=object)
df['chromosome'] = df['chromosome'].apply(lambda x:
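The `apply` call above is cut off in this listing, so its lambda body is not known. As a separate, hedged illustration of one common way to make labels like `'1'` through `'22'`, `'X'`, and `'MT'` sort in a natural order, the sketch below uses an ordered categorical. The sample frame and the `chrom_order` list are assumptions made for this example and are not taken from the original code.

```python
import pandas as pd

# Hypothetical sample; the labels mirror the unique() output shown above.
df = pd.DataFrame({'chromosome': ['MT', '10', 'X', '2', '1']})

# Assumed ordering: autosomes 1-22, then X, then MT.
chrom_order = [str(n) for n in range(1, 23)] + ['X', 'MT']
df['chromosome'] = pd.Categorical(df['chromosome'],
                                  categories=chrom_order, ordered=True)

# Sorting now follows chrom_order rather than plain string order.
sorted_labels = df.sort_values('chromosome')['chromosome'].astype(str).tolist()
print(sorted_labels)
```

Plain string sorting would place `'10'` before `'2'`; the ordered categorical avoids that without converting `'X'` and `'MT'` to numbers.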
# Read the data into a pandas DataFrame and do some EDA
df = pd.DataFrame(data)
df.head()
df.isna().any()
rsid          False
chromosome    False
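One caveat about the `isna()` check: the sample rows elsewhere in this listing include a genotype of `--`, which pandas treats as an ordinary string rather than a missing value. The sketch below assumes `--` marks a no-call and shows one way to make such rows visible to `isna()`. The `sample` frame is a hypothetical stand-in built from those rows.

```python
import numpy as np
import pandas as pd

# Hypothetical rows in the same layout as the file shown in this listing.
sample = pd.DataFrame({
    'rsid': ['rs548049170', 'rs13328684', 'rs9283150'],
    'chromosome': ['1', '1', '1'],
    'position': [69869, 74792, 565508],
    'genotype': ['TT', '--', 'AA'],
})

# isna() finds nothing here because '--' is just a string.
before = int(sample['genotype'].isna().sum())

# Assumption: '--' means "no call"; convert it to NaN so pandas sees it as missing.
sample['genotype'] = sample['genotype'].replace('--', np.nan)
after = int(sample['genotype'].isna().sum())

print(before, after)
```

Whether no-calls should be dropped or kept as NaN depends on the analysis; the conversion only makes them countable.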
data = pd.read_csv('genome.txt', sep='\t', dtype={'rsid':'str', 'chromosome':'object', 'position':'int', 'genotype':'str'}, comment='#')
print(data)
          rsid chromosome  position genotype
0  rs548049170          1     69869       TT
1   rs13328684          1     74792       --
2    rs9283150          1    565508       AA
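To show what the `comment='#'` and `dtype` arguments do, here is a small self-contained sketch that parses an in-memory stand-in for `genome.txt`. The text in `raw`, including its `#` comment line and uncommented header row, is an assumption about the file layout and is not copied from the real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for genome.txt: a '#' comment line, a header row,
# then tab-separated data rows.
raw = (
    "# example comment line, skipped by comment='#'\n"
    "rsid\tchromosome\tposition\tgenotype\n"
    "rs548049170\t1\t69869\tTT\n"
    "rs13328684\t1\t74792\t--\n"
    "rs9283150\t1\t565508\tAA\n"
)

data = pd.read_csv(
    io.StringIO(raw),
    sep='\t',
    comment='#',  # drop lines that start with '#'
    dtype={'rsid': 'str', 'chromosome': 'object',
           'position': 'int', 'genotype': 'str'},
)

print(data.dtypes)
print(data)
```

Reading `chromosome` as `object` keeps `'1'` as a string, so it shares one dtype with non-numeric labels such as `'X'` and `'MT'`.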
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
import seaborn as sns

sns.set_style('darkgrid')
# set_palette applies the palette; color_palette alone only returns it
sns.set_palette('Spectral')