Information on data files for the Genetic Links to Health project
Data analysis was performed in Python 3, primarily in Jupyter notebooks.
Datasets are not stored on GitHub due to file size limitations.
The Full Analysis notebook contains information on the merging and thresholding of data sets.
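
As a rough illustration of what merging and thresholding can look like, here is a minimal sketch using pandas. It is not taken from the Full Analysis notebook: the column names (`gene`, `disease`, `score`, `category`), the placeholder values, and the `min_score` cutoff are all assumptions made for this example.

```python
import pandas as pd

# Hypothetical gene-disease associations; names and scores are placeholders.
associations = pd.DataFrame({
    "gene": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    "disease": ["disease_w", "disease_x", "disease_y", "disease_z"],
    "score": [0.92, 0.85, 0.40, 0.97],
})

# Hypothetical disease-to-category lookup (for example, from a scraped table).
categories = pd.DataFrame({
    "disease": ["disease_w", "disease_x", "disease_y", "disease_z"],
    "category": ["category_1", "category_1", "category_2", "category_3"],
})


def merge_and_threshold(assoc, cats, min_score=0.5):
    """Attach a category to each association, then keep rows scoring >= min_score."""
    merged = assoc.merge(cats, on="disease", how="left")
    return merged[merged["score"] >= min_score].reset_index(drop=True)


linked = merge_and_threshold(associations, categories)
print(linked)
```

A left merge keeps every association even when no category is found, so missing categories show up as `NaN` rather than silently dropping rows.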
Web scraping to collect NIH disease categories and description links is detailed in genediseasedata-original.
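
The general idea of collecting category names and their description links can be sketched with the standard library alone. This is only an illustration, not the code in genediseasedata-original: the HTML snippet, the link paths, and the `CategoryLinkParser` class are hypothetical, and the real pages will have a different structure.

```python
from html.parser import HTMLParser


class CategoryLinkParser(HTMLParser):
    """Collect (link text, href) pairs from every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only record text while inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


# Hypothetical markup standing in for a downloaded category page.
SAMPLE_HTML = """
<ul>
  <li><a href="/category/example-a">Example category A</a></li>
  <li><a href="/category/example-b">Example category B</a></li>
</ul>
"""


def extract_links(html):
    parser = CategoryLinkParser()
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
for name, href in links:
    print(name, href)
```

Parsing a saved HTML string keeps the example runnable offline; in practice the page would first be downloaded and then passed to `extract_links`.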
Visualization work is detailed in genediseasedata-visualize.
The project is still under development.
New features planned include:
- continuous updating of the database to include the most recent results from the literature,
- factoring in contradictory results,
- time series analysis to identify trends in gene-disease links.
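
One possible way to approach the last two planned features is sketched below. It is an assumption about how they might be implemented, not existing project code: each record is a hypothetical `(year, supports)` pair for a single gene-disease link, contradictory reports count as -1, and the trend is a plain least-squares slope over the yearly net support.

```python
from collections import defaultdict


def yearly_net_support(records):
    """Sum +1 for each supporting report and -1 for each contradicting one, per year."""
    totals = defaultdict(int)
    for year, supports in records:
        totals[year] += 1 if supports else -1
    return dict(sorted(totals.items()))


def trend_slope(series):
    """Least-squares slope of net support against year (0.0 if fewer than 2 years)."""
    years = list(series)
    values = list(series.values())
    n = len(years)
    if n < 2:
        return 0.0
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


# Hypothetical reports for one gene-disease link: (publication year, supports link?)
records = [
    (2015, True),
    (2016, True), (2016, False),
    (2017, True), (2017, True),
    (2018, True), (2018, True), (2018, True),
]

series = yearly_net_support(records)
slope = trend_slope(series)
print(series, slope)
```

A positive slope suggests that support for the link is growing over time, while a slope near zero or below would flag links whose evidence is stalling or being contradicted.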