Fengyuan Hu boboppie

  • AstraZeneca
  • Cambridge, UK
perl -pe 's/>(.*)/>$1\t/g; s/\n//g; s/>/\n>/g' pep.fa | grep -v '^$' | cut -c 2- | sort | uniq >pep.tsv
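The one-liner linearizes each FASTA record into a single `header<TAB>sequence` line, strips the leading `>`, then sorts and drops exact duplicate lines. A minimal Python sketch of the same transformation, assuming a plain uncompressed FASTA file; the function names are hypothetical, and it also tolerates Windows line endings, which the one-liner does not:

```python
import io
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_tsv_lines(lines: Iterable[str]) -> List[str]:
    """Return sorted, de-duplicated 'header<TAB>sequence' lines (like sort | uniq)."""
    return sorted({f"{h}\t{s}" for h, s in read_fasta(lines)})


if __name__ == "__main__":
    demo = io.StringIO(">p2\nMKV\nLL\n>p1\nMAA\n>p2\nMKVLL\n")
    for row in fasta_to_tsv_lines(demo):
        print(row)
```

As with `sort | uniq`, only records whose header and sequence are both identical collapse into one line. Python orders strings by code point, which matches `LC_ALL=C sort` rather than a locale-aware sort.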
#!/usr/bin/perl
# Copyright (c) 2011 Erik Aronesty (erik@q32.com)
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
boboppie / remove_duplicate_sequence.shx
Created September 27, 2018 13:20 — forked from ShaiberAlon/remove_duplicate_sequence.shx
Short bash script for removing duplicate sequences from a FASTA file (keeping one copy of each unique sequence)
#!/bin/bash
# remove_duplicate_sequence.shx is a short bash script that removes multiple copies of sequences from an input FASTA file and saves the result to an output FASTA file.
# The script is based on Pierre Lindenbaum's script: https://www.biostars.org/p/3003/#3008
# input:
# -f | --file : fasta file
# -o | --output : output file after removing all the duplicate sequences
#
while [ "$1" != "" ]; do
case $1 in
-f | --file ) shift
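The preview stops at argument parsing, so the de-duplication step itself is not shown. A minimal Python sketch of "keep one copy of each unique sequence", assuming the first occurrence wins and that sequences must match exactly (case-sensitive); which copy the original script keeps is not visible here, and all names are hypothetical:

```python
import io
from typing import Iterable, Iterator, List, TextIO, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def dedupe_by_sequence(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first record seen for each distinct sequence."""
    seen = set()
    kept = []
    for header, seq in records:
        if seq not in seen:
            seen.add(seq)
            kept.append((header, seq))
    return kept


def write_fasta(records: Iterable[Tuple[str, str]], out: TextIO, width: int = 60) -> None:
    """Write records as FASTA, wrapping sequences at `width` characters."""
    for header, seq in records:
        out.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    src = io.StringIO(">a\nACGT\n>b\nAC\nGT\n>c\nTTTT\n")
    out = io.StringIO()
    write_fasta(dedupe_by_sequence(read_fasta(src)), out)
    print(out.getvalue(), end="")
```

For real files, pass an open input file and an output file opened for writing in place of the `StringIO` objects; the `-f`/`-o` options of the bash script would map onto those two handles.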
boboppie / beanplots.R
Created October 21, 2016 10:15 — forked from yannabraham/beanplots.R
ggplot2 BeanPlots
## reproduce the figures from http://www.jstatsoft.org/v28/c01/paper using ggplot2
library(ggplot2)
## parameters
set.seed(2710)
## Figure 1
d <- rnorm(50)
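Each "bean" in the linked paper is a density estimate of one group's data drawn mirrored around a vertical axis, with a short line marking each observation. A minimal Python sketch of the outline only, assuming a Gaussian kernel and the normal-reference bandwidth rule h = 1.06 · sd · n^(-1/5) (chosen here for simplicity, not necessarily what the beanplot package or the ggplot2 gist uses); the names and the `max_half_width` scaling are hypothetical:

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def normal_reference_bandwidth(data: Sequence[float]) -> float:
    """Rule-of-thumb bandwidth h = 1.06 * sd * n^(-1/5); assumes roughly normal data."""
    return 1.06 * statistics.stdev(data) * len(data) ** (-0.2)


def gaussian_kde(data: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Return f(x) = 1/(n*h) * sum_i phi((x - x_i)/h) with a standard normal kernel phi."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


def bean_outline(data: Sequence[float], n_points: int = 100,
                 max_half_width: float = 0.4) -> List[Tuple[float, float]]:
    """Sample the density on a grid; each (y, w) is mirrored to -w and +w to draw the bean."""
    h = normal_reference_bandwidth(data)
    f = gaussian_kde(data, h)
    lo, hi = min(data) - 3 * h, max(data) + 3 * h
    ys = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
    dens = [f(y) for y in ys]
    scale = max_half_width / max(dens)  # widest point of the bean gets max_half_width
    return [(y, d * scale) for y, d in zip(ys, dens)]


if __name__ == "__main__":
    # Same seed as the R code, but Python's generator yields different draws than rnorm().
    random.seed(2710)
    d = [random.gauss(0.0, 1.0) for _ in range(50)]
    for y, w in bean_outline(d, n_points=15):
        bar = "#" * round(w * 50)
        print(f"{y:6.2f} {bar:>20}|{bar}")
```

The demo prints a crude mirrored text bean; a plotting library would instead fill the polygon between the left and right half-widths.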
#!/usr/bin/Rscript
# Author: Fengyuan Hu
# Search ORFs in the given transcript sequences and convert local coordinates to genomic coordinates in BED format
# Input:
# Transcriptome (protein coding + lincRNAs) in Fasta format
# Transcriptome (protein coding + lincRNAs) annotation in GTF format
# Make sure the annotation is consistent with the sequences, or extract the sequences from the genome?
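A minimal Python sketch of the ORF search step, assuming an ORF runs from ATG to the first in-frame stop codon (TAA, TAG, TGA) on the transcript's forward strand, scanned in all three frames, and keeping only the first ATG after each stop (the longest ORF per stop). The cutoff `min_codons` and all names are hypothetical. Coordinates are 0-based and half-open, as in BED, but still local to the transcript; projecting them onto the genome needs the exon structure from the GTF.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int, int]]:
    """Return sorted (start, end, frame) tuples for forward-strand ATG..stop ORFs.

    start/end are 0-based half-open transcript coordinates; end includes the stop codon.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA input too
    orfs = []
    for frame in range(3):
        start = None
        # i + 3 <= len(seq), so every slice is a full codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG after this stop
    return sorted(orfs)


if __name__ == "__main__":
    transcript = "CCATGAAATAGGG"
    # "transcript" stands in for the sequence ID in a BED-like line
    for start, end, frame in find_orfs(transcript, min_codons=2):
        print(f"transcript\t{start}\t{end}\tframe{frame}")
```

ORFs that run off the end of the transcript without a stop codon are ignored here; whether to report them is a design decision the sketch does not settle.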
boboppie / coord_map.py
Created July 6, 2016 11:09 — forked from lennax/coord_map.py
biopython coordinate mapper example
# Copyright 2012-2014 Lenna X. Peterson
# arklenna@gmail.com
# The first step to using the mapper is to get the exons from a GenBank or similar file.
# The mapper will accept exons as a sequence of pairs, a SeqRecord with a CDS feature, or a CDS SeqFeature.
# The file used in this example is located in the Tests directory of the Biopython source code.
from Bio.SeqUtils.Mapper import CoordinateMapper
from Bio import SeqIO
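At its core, a coordinate mapper walks the exon list to convert CDS positions to genomic positions and back. The sketch below is not the `CoordinateMapper` API; it is a hypothetical pure-Python illustration, assuming exons are 0-based half-open `(start, end)` genomic pairs and a strand of 1 or -1:

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


class ExonMapper:
    """Hypothetical helper mapping between 0-based CDS and genomic coordinates."""

    def __init__(self, exons: Sequence[Tuple[int, int]], strand: int = 1) -> None:
        if strand not in (1, -1):
            raise ValueError("strand must be 1 or -1")
        self.strand = strand
        # Exons in transcription order: genomic order on +, reversed on -.
        ordered = sorted(exons)
        self.ordered: List[Tuple[int, int]] = ordered if strand == 1 else ordered[::-1]
        self.offsets: List[int] = []  # CDS position at which each ordered exon begins
        total = 0
        for start, end in self.ordered:
            self.offsets.append(total)
            total += end - start
        self.length = total

    def c2g(self, c: int) -> int:
        """CDS position -> genomic position."""
        if not 0 <= c < self.length:
            raise IndexError(f"CDS position {c} outside 0..{self.length - 1}")
        i = bisect_right(self.offsets, c) - 1  # last exon starting at or before c
        start, end = self.ordered[i]
        k = c - self.offsets[i]
        return start + k if self.strand == 1 else end - 1 - k

    def g2c(self, g: int) -> int:
        """Genomic position -> CDS position; raises ValueError for intronic positions."""
        for (start, end), offset in zip(self.ordered, self.offsets):
            if start <= g < end:
                return offset + (g - start if self.strand == 1 else end - 1 - g)
        raise ValueError(f"genomic position {g} is not in an exon")


if __name__ == "__main__":
    m = ExonMapper([(10, 15), (20, 23)], strand=-1)
    print([m.c2g(c) for c in range(m.length)])
```

Protein positions follow from CDS positions by integer division (`c // 3`), which is one reason mappers work in CDS space first.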
boboppie / readBAM.R
Created April 26, 2016 10:35 — forked from SamBuckberry/readBAM.R
Import a bam file into R
# install the Rsamtools package if necessary (biocLite() has been replaced by BiocManager)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Rsamtools")
# load the library
library(Rsamtools)
# specify the bam file you want to import
bamFile <- "test.bam"
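Decoded to text (for example with `samtools view test.bam`), every BAM alignment is a tab-separated SAM record with 11 mandatory fields. A minimal Python sketch of turning such lines into records, assuming SAM text input rather than binary BAM (reading BAM directly in Python would normally need a library such as pysam); field names follow the SAM specification, and the function names are hypothetical:

```python
import re
from typing import Dict, Iterable, List

SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_sam_line(line: str) -> Dict[str, object]:
    """Split one SAM alignment line into its 11 mandatory fields plus optional tags."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line needs at least 11 tab-separated fields")
    rec: Dict[str, object] = {
        name: int(value) if name in INT_FIELDS else value
        for name, value in zip(SAM_FIELDS, cols)
    }
    rec["TAGS"] = cols[11:]  # e.g. ["NM:i:2"]
    return rec


def reference_span(cigar: str) -> int:
    """Reference bases covered by an alignment: M, D, N, = and X consume the reference."""
    if cigar == "*":
        return 0
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def read_sam(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse alignment lines, skipping '@' header lines and blank lines."""
    return [parse_sam_line(l) for l in lines if l.strip() and not l.startswith("@")]


if __name__ == "__main__":
    demo = ["@HD\tVN:1.6\n",
            "r1\t0\tchr1\t100\t60\t5M2D3M\t*\t0\t0\tACGTAACG\tIIIIIIII\tNM:i:2\n"]
    for rec in read_sam(demo):
        # POS is 1-based, so the last reference base is POS + span - 1
        end = int(rec["POS"]) + reference_span(str(rec["CIGAR"])) - 1
        print(rec["QNAME"], rec["RNAME"], rec["POS"], end)
```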
boboppie / plot_aligned_series.R
Created February 16, 2016 17:20 — forked from tomhopper/plot_aligned_series.R
Align multiple ggplot2 graphs with a common x axis and different y axes, each with different y-axis labels.
#' When plotting multiple data series that share a common x axis but different y axes,
#' we can just plot each graph separately. This suffers from the drawback that the shared axis will typically
#' not align across graphs due to different plot margins.
#' One easy solution is to reshape2::melt() the data and use ggplot2's facet_grid() mapping. However, there is
#' no way to label individual y axes.
#' facet_grid() and facet_wrap() were designed to plot small multiples, where both x- and y-axis ranges are
#' shared across all plots in the faceting. While the facet_ calls allow us to use different scales with
#' the \code{scales = "free"} argument, they should not be used this way.
#' A more robust approach is to use ggplotGrob(), rbind() and the grid package's grid.draw() to create a grid of
#' individual plots where the plot axes are properly aligned within the grid.
boboppie / bamfilter_oneliners.md
Created January 28, 2016 09:49 — forked from davfre/bamfilter_oneliners.md
SAM and BAM filtering oneliners
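Many SAM/BAM filtering one-liners come down to FLAG bit tests plus a MAPQ cutoff, which is what `samtools view -f` (require all given bits), `-F` (exclude any given bits) and `-q` (minimum MAPQ) do. A minimal Python sketch of that logic on SAM text lines, assuming the 11 mandatory fields; the bit labels are short names for the twelve FLAG bits defined in the SAM specification, and the function names are hypothetical:

```python
from typing import Iterable, Iterator, List

FLAG_BITS = {
    "PAIRED": 0x1, "PROPER_PAIR": 0x2, "UNMAP": 0x4, "MUNMAP": 0x8,
    "REVERSE": 0x10, "MREVERSE": 0x20, "READ1": 0x40, "READ2": 0x80,
    "SECONDARY": 0x100, "QCFAIL": 0x200, "DUP": 0x400, "SUPPLEMENTARY": 0x800,
}


def decode_flag(flag: int) -> List[str]:
    """List the labels of all bits set in a FLAG value."""
    return [name for name, bit in FLAG_BITS.items() if flag & bit]


def keep_record(line: str, require: int = 0, exclude: int = 0, min_mapq: int = 0) -> bool:
    """Apply -f / -F / -q style tests to one SAM text line."""
    if line.startswith("@"):
        return True  # keep header lines, as `samtools view -h` would
    cols = line.split("\t")
    flag, mapq = int(cols[1]), int(cols[4])
    return (flag & require) == require and (flag & exclude) == 0 and mapq >= min_mapq


def filter_sam(lines: Iterable[str], require: int = 0, exclude: int = 0,
               min_mapq: int = 0) -> Iterator[str]:
    """Lazily yield the lines that pass keep_record()."""
    return (l for l in lines if keep_record(l, require, exclude, min_mapq))


if __name__ == "__main__":
    sam = ["@HD\tVN:1.6\n",
           "a\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\t*\n",
           "b\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*\n",
           "c\t16\tchr1\t5\t10\t4M\t*\t0\t0\tACGT\t*\n"]
    # Roughly `samtools view -h -F 0x404 -q 30`: mapped, not duplicate, MAPQ >= 30
    for line in filter_sam(sam, exclude=FLAG_BITS["UNMAP"] | FLAG_BITS["DUP"], min_mapq=30):
        print(line, end="")
```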