Skip to content

Instantly share code, notes, and snippets.

@felipealbrecht
Last active June 17, 2018 09:02
Show Gist options
  • Save felipealbrecht/b12bf3aa7c6146f8890dfc1ce50584e1 to your computer and use it in GitHub Desktop.
Save felipealbrecht/b12bf3aa7c6146f8890dfc1ce50584e1 to your computer and use it in GitHub Desktop.
Count how many times a motif appears in the region sequence
#Our objective is to extract the average DNA methylation (beta value) in all regulatory elements defined in the BLUEPRINT regulatory build (modified from~\cite{Zerbino2015}) across all BLUEPRINT samples. After downloading these data from DeepBlue we use the package gplots to create a heatmap showing the most variable regions (rows) across samples (columns). Moreover, we cluster samples by their pairwise Spearman correlation coefficients (Figure~\ref{Fig:1}).
#In Listing 1 we demonstrate how DeepBlueR can be used to quickly and efficiently aggregate a large data set on the DeepBlue server in preparation for downstream analysis in R. We omit the R code for generating the heatmap here. It can be found in the supplemental material. To retrieve the required data, we first select all bisulfite sequencing experiments from the BLUEPRINT project (lines $5$-$9$). Our selection comprises $206$ data files and $47$ distinct cell types after filtering for the correct file type (lines $9$-$11$). For each file, we select the column that contains the beta value (line $12$). Next, we select the regions of the regulatory build (lines $14$-$16$), which we use in the next command (lines $18$-$21$) to request a data matrix in which the average beta values are computed for each regulatory element and experiment. Finally, we download the resulting data matrix in lines $23$-$24$.\\
import xmlrpclib
import time
url = "http://deepblue.mpi-inf.mpg.de/xmlrpc"
user_key = "anonymous_key"
server = xmlrpclib.Server(url, allow_none=True)
(status, query_id) = server.select_experiments ( "DG-75_c01.ERX297417.H3K27ac.bwa.GRCh38.20150527.bed", None, None, None, user_key )
fmt = "CHROMOSOME,START,END,@LENGTH,@COUNT.MOTIF(C),@COUNT.MOTIF(G),@COUNT.MOTIF(GC),@SEQUENCE"
(status, request_id) = server.get_regions(query_id, fmt, user_key)
(status, info) = server.info(request_id, user_key)
request_status = info[0]["state"]
while request_status != "done" and request_status != "failed":
time.sleep(1)
(status, info) = server.info(request_id, user_key)
request_status = info[0]["state"]
(status, regions) = server.get_request_data(request_id, user_key)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment