Information on data files for the Genetic Links to Health project
Data analysis was performed in Python 3, primarily in Jupyter notebooks.
Datasets are not stored on GitHub due to file size limitations.
The Full Analysis notebook documents how the datasets were merged and thresholded.
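
As a rough illustration of what merging and thresholding can look like, here is a minimal standard-library sketch. It is not the code from the Full Analysis notebook: the field names (`gene`, `disease`, `n_studies`), the sample records, and the helpers `merge_sources` and `apply_threshold` are all hypothetical, and it assumes the threshold is a minimum number of supporting studies.

```python
from collections import defaultdict

# Hypothetical records from two sources; field names and values are
# illustrative only and do not come from the project's datasets.
source_a = [
    {"gene": "GENE1", "disease": "disease X", "n_studies": 3},
    {"gene": "GENE2", "disease": "disease Y", "n_studies": 1},
]
source_b = [
    {"gene": "GENE1", "disease": "disease X", "n_studies": 2},
    {"gene": "GENE3", "disease": "disease Z", "n_studies": 1},
]


def merge_sources(*sources):
    """Sum study counts per (gene, disease) pair across all sources."""
    totals = defaultdict(int)
    for records in sources:
        for rec in records:
            totals[(rec["gene"], rec["disease"])] += rec["n_studies"]
    return dict(totals)


def apply_threshold(totals, min_studies):
    """Keep only pairs supported by at least `min_studies` studies."""
    return {pair: n for pair, n in totals.items() if n >= min_studies}


merged = merge_sources(source_a, source_b)
kept = apply_threshold(merged, min_studies=2)
print(kept)
```

With these sample records, only the pair reported by both sources clears the threshold. The real notebook may merge on different keys or threshold on a different quantity.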
Web scraping to collect NIH disease categories and description links is detailed in genediseasedata-original.
Visualization work is detailed in genediseasedata-visualize.
See also the full data repository, the website repository, and other projects.
Planned new features include:
- continuous updating of the database to include the most recent results from the literature
- factoring in contradictory results
- time series analysis to identify trends in gene-disease links
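
One possible shape for the planned time series analysis is to count reports of a gene-disease link per publication year and fit a straight line to those counts. The sketch below is only an assumption about how this could be done: the record layout, the sample data, and the helpers `yearly_counts` and `trend_slope` are hypothetical.

```python
from collections import Counter

# Hypothetical (gene, disease, publication year) records, for illustration.
reports = [
    ("GENE1", "disease X", 2015),
    ("GENE1", "disease X", 2016),
    ("GENE1", "disease X", 2016),
    ("GENE1", "disease X", 2017),
    ("GENE1", "disease X", 2017),
    ("GENE1", "disease X", 2017),
    ("GENE2", "disease Y", 2016),
]


def yearly_counts(records, gene, disease):
    """Return sorted (year, count) pairs for one gene-disease link.

    Years with no reports are omitted rather than counted as zero.
    """
    counts = Counter(y for g, d, y in records if g == gene and d == disease)
    return sorted(counts.items())


def trend_slope(points):
    """Ordinary least-squares slope of count against year."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two distinct years to fit a trend")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


counts = yearly_counts(reports, "GENE1", "disease X")
slope = trend_slope(counts)
print(counts, slope)
```

A positive slope suggests the link is being reported more often over time. A fuller analysis would also need to account for overall publication volume and for contradictory results.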