@satwik77
Last active October 3, 2017 18:50
My Contribution in GSoC 2016

Google Summer of Code Project

Organization: The Monarch Initiative

Name: Satwik Bhattamishra

Project Title: PhenoPacket Scraper

Description

Phenotypic data is key to advancing health in biomedicine. However, most phenotypic data is not in structured form, but is instead encoded as free text in various websites and journal publications. A phenopacket is a standardized structured text file about the genes and the observable biological features (phenotypes) of an individual organism or group of organisms. This project aims to create phenopackets from semi-structured phenotype data.

Phenopacket Scraper is a tool that extracts information from life sciences websites, analyzes it, and generates a phenopacket from the extracted information together with the correct external ontology references. It includes a multi-level command line interface, a REST API, and a webapp. This project aims to extend the utility of a common phenotype exchange format so as to improve collaboration and analysis among biological researchers.

The command line interface and the API are written purely in Python. I implemented the CLI with the cliff framework to allow multi-level commands. It takes a URL or a file as input, scrapes the required data from the website, and generates a phenopacket. The scraping is implemented with the BeautifulSoup library, and phenopacket generation mostly uses phenopacket-python and scigraph-services. The REST API is built with Django REST framework; its purpose and core implementation are similar to those of the command line interface. The webapp is implemented with Django and uses the phenopacket-scraper-api to produce its results.
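The scrape, annotate, and generate flow described above can be pictured with a small sketch. This is not the project's code: to stay dependency-free it swaps in the standard library's `html.parser` where the tool uses BeautifulSoup, it replaces the scigraph-services lookup with a hypothetical hard-coded term table (`TOY_TERMS`), and the output dictionary is a simplified stand-in rather than the real phenopacket schema. All function names here are illustrative assumptions.

```python
import json
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Step 1 (scrape): reduce an HTML page to its visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical lookup table standing in for an ontology annotation service.
TOY_TERMS = {
    "seizures": "HP:0001250",
    "microcephaly": "HP:0000252",
}


def annotate(text, terms=TOY_TERMS):
    """Step 2 (annotate): find known phenotype labels in the text."""
    lowered = text.lower()
    return [(label, term_id)
            for label, term_id in sorted(terms.items())
            if label in lowered]


def build_packet(source, annotations):
    """Step 3 (generate): assemble a simplified, phenopacket-like record."""
    return {
        "source": source,
        "phenotypes": [{"id": term_id, "label": label}
                       for label, term_id in annotations],
    }


if __name__ == "__main__":
    page = "<html><body><p>The patient presented with seizures.</p></body></html>"
    packet = build_packet("http://example.org/case", annotate(extract_text(page)))
    print(json.dumps(packet, indent=2))
```

Keeping the three steps as separate functions mirrors the way the description above separates scraping from phenopacket generation, so the same core could sit behind a CLI command or an API view.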

Setup and usage guidelines are explained in the README of each GitHub repository.

GitHub Repositories

My Contributions
