Fengyuan Hu boboppie

  • AstraZeneca
  • Cambridge, UK
@boboppie
boboppie / remove_duplicate_sequence.shx
Created September 27, 2018 13:20 — forked from ShaiberAlon/remove_duplicate_sequence.shx
Short bash script for removing duplicate sequences from a fasta file (keeping one copy of each unique sequence)
#!/bin/bash
# remove_duplicate_sequence.shx is a short bash script that removes duplicate copies of sequences from an input fasta file and saves the result in an output fasta file.
# the bash script was based on Pierre Lindenbaum's script: https://www.biostars.org/p/3003/#3008
# input:
# -f | --file : fasta file
# -o | --output : output file after removing all duplicate sequences
#
while [ "$1" != "" ]; do
case $1 in
-f | --file ) shift
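
The preview above is cut off before the deduplication step. As a rough illustration of the idea in the description (keep one copy of each unique sequence), here is a minimal Python sketch. It is not the gist's awk/bash implementation: the function names `read_fasta` and `dedupe_fasta` are hypothetical, and it assumes that sequences are compared by exact string match and that the first header seen for each sequence is the one kept.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_fasta(handle):
    """Return records in input order, keeping the first record per sequence.

    Assumption: duplicates are detected by exact string match.
    """
    seen = set()
    unique = []
    for header, seq in read_fasta(handle):
        if seq not in seen:
            seen.add(seq)
            unique.append((header, seq))
    return unique


# Small in-memory example: records "a" and "b" carry the same sequence.
example = io.StringIO(">a\nACGT\n>b\nACGT\n>c\nTTGA\nCC\n")
records = dedupe_fasta(example)
for header, seq in records:
    print(f">{header}\n{seq}")
```

Holding the seen sequences in a set keeps the pass linear in the number of records, at the cost of storing every unique sequence in memory.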
boboppie / beanplots.R
Created October 21, 2016 10:15 — forked from yannabraham/beanplots.R
ggplot2 BeanPlots
## reproduce the figures from http://www.jstatsoft.org/v28/c01/paper using ggplot2
library(ggplot2)
## parameters
set.seed(2710)
## Figure 1
d <- rnorm(50)
boboppie / coord_map.py
Created July 6, 2016 11:09 — forked from lennax/coord_map.py
biopython coordinate mapper example
# Copyright 2012-2014 Lenna X. Peterson
# arklenna@gmail.com
# The first step to using the mapper is to get the exons from a GenBank or similar file.
# The mapper will accept exons as a sequence of pairs, a SeqRecord with a CDS feature, or a CDS SeqFeature.
# The file used in this example is located in the Tests directory of the Biopython source code.
from Bio.SeqUtils.Mapper import CoordinateMapper
from Bio import SeqIO
boboppie / readBAM.R
Created April 26, 2016 10:35 — forked from SamBuckberry/readBAM.R
Import a bam file into R
# install the Rsamtools package if necessary
source("http://bioconductor.org/biocLite.R")
biocLite("Rsamtools")
# load the library
library(Rsamtools)
# specify the bam file you want to import
bamFile <- "test.bam"
boboppie / plot_aligned_series.R
Created February 16, 2016 17:20 — forked from tomhopper/plot_aligned_series.R
Align multiple ggplot2 graphs with a common x axis and different y axes, each with different y-axis labels.
#' When plotting multiple data series that share a common x axis but different y axes,
#' we can just plot each graph separately. This suffers from the drawback that the shared axis will typically
#' not align across graphs due to different plot margins.
#' One easy solution is to reshape2::melt() the data and use ggplot2's facet_grid() mapping. However, there is
#' no way to label individual y axes.
#' facet_grid() and facet_wrap() were designed to plot small multiples, where both x- and y-axis ranges are
#' shared across all plots in the facetting. While the facet_ calls allow us to use different scales with
#' the \code{scales = "free"} argument, they should not be used this way.
#' A more robust approach is to use grid.draw() from the grid package, together with rbind() and ggplotGrob(), to create a grid of
#' individual plots where the plot axes are properly aligned within the grid.
boboppie / bamfilter_oneliners.md
Created January 28, 2016 09:49 — forked from davfre/bamfilter_oneliners.md
SAM and BAM filtering oneliners
#!/usr/local/bin/perl
=head1 NAME
run_cnvnator_on_assembly.pl
=head1 SYNOPSIS
run_cnvnator_on_assembly.pl input_fasta input_bam output outputdir path_to_cnvnator windowsize
where input_fasta is the input fasta file,
boboppie / vioplot2
Last active August 29, 2015 14:13 — forked from mbjoseph/vioplot2.R
vioplot2 <- function (x, ..., range = 1.5, h = NULL, ylim = NULL, names = NULL,
horizontal = FALSE, col = "magenta", border = "black", lty = 1,
lwd = 1, rectCol = "black", colMed = "white", pchMed = 19,
at, add = FALSE, wex = 1, drawRect = TRUE, side="both")
{
datas <- list(x, ...)
n <- length(datas)
if (missing(at))
at <- 1:n
upper <- vector(mode = "numeric", length = n)
myData=c(1,1,1,1,1,1,1,1,1,1,1,0,0,0)
# Try 10,000 different parameter values
tryn = 1e4
Theta = sort(runif(tryn))
pTheta = 1/tryn
z = sum( myData==1 )
N = length( myData )
# Likelihood function
pDataGivenTheta = Theta^z * (1-Theta)^(N-z)
pData = sum( pDataGivenTheta * pTheta )
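
The R preview stops once the evidence `pData` is computed. The natural next step in a grid approximation is to apply Bayes' rule at each grid point, which the following Python sketch illustrates. It is an assumption about where the snippet is heading, not the author's code. It also swaps the random `runif` grid for an evenly spaced midpoint grid so the result is reproducible, and the names `posterior` and `posterior_mean` are introduced here for illustration.

```python
# Observed Bernoulli data: 11 successes and 3 failures, as in the R snippet.
my_data = [1] * 11 + [0] * 3

# Assumption: an evenly spaced midpoint grid replaces sort(runif(tryn)).
n_grid = 10_000
theta = [(i + 0.5) / n_grid for i in range(n_grid)]
p_theta = 1.0 / n_grid  # uniform prior mass on each grid point

z = sum(1 for x in my_data if x == 1)  # number of successes
n = len(my_data)                       # number of trials

# Likelihood of the data at each candidate theta.
p_data_given_theta = [t ** z * (1 - t) ** (n - z) for t in theta]

# Evidence: likelihood summed over the prior.
p_data = sum(lik * p_theta for lik in p_data_given_theta)

# Bayes' rule at each grid point gives the posterior mass.
posterior = [lik * p_theta / p_data for lik in p_data_given_theta]

# Posterior mean of theta under the grid approximation.
posterior_mean = sum(t * p for t, p in zip(theta, posterior))
print(round(posterior_mean, 3))
```

With a uniform prior the exact posterior is Beta(z + 1, N - z + 1), whose mean is (z + 1) / (N + 2) = 12 / 16 = 0.75, which gives a simple check on the grid result.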