James Taylor jxtx

## bioc2019.md

      
Last active: June 26, 2019 15:46

Day 1: 25 June 2019
BioC2019: Where Software and Biology Connect (Martin)


Martin providing some "brief logistics"

Inference after prediction (Jeffrey "John" Leek)

aka "What do we do after we have machine learned everything"


## bioc2018.md

      
Last active: August 22, 2018 14:30

#bioC 2018 Conference Notes

Conference info: https://bioc2018.bioconductor.org/
My first Bioconductor meeting, and I'm not a BioC or R expert so these notes are probably going to be naïve!
Contents


Developer Day

⚡️talks II
BoFs


⚡️ III


## using-hifive-re.md

      
Last active: December 4, 2017 19:16

Using HiFive on restriction digest bulk Hi-C

Working through running HiFive on a Hi-C dataset.
First, a note on memory and performance: bin size influences everything. Starting
with a bin size of 40 kb, loading data in hg38 seems to stay under ~16 GB. At fend-level
resolution, memory requirements approach ~32 GB and running time increases several-fold.
Dealing with restriction fragment details

HiFive stores a fend file with information on the locations of restriction
fragments in the genome. We need to get the locations of the RE sites into a BED
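
As a rough illustration of the step described above (locating RE sites and writing them out as BED intervals), here is a minimal sketch. It is not taken from the original notes or from HiFive itself: the function names, the toy sequence, and the choice of the HindIII motif `AAGCTT` are all assumptions made for the example, and the exact BED layout HiFive expects should be checked against its documentation.

```python
import re
from typing import Dict, Iterator


def find_re_sites(seq: str, motif: str) -> Iterator[int]:
    """Yield the 0-based start of every occurrence of motif in seq.

    A zero-width lookahead is used so overlapping occurrences are also found.
    """
    pattern = re.compile("(?=" + re.escape(motif.upper()) + ")")
    for match in pattern.finditer(seq.upper()):
        yield match.start()


def sites_to_bed(chroms: Dict[str, str], motif: str) -> Iterator[str]:
    """Yield tab-separated BED lines (chrom, start, end) for each motif site.

    BED coordinates are 0-based and half-open, so end = start + len(motif).
    """
    for name, seq in chroms.items():
        for start in find_re_sites(seq, motif):
            yield f"{name}\t{start}\t{start + len(motif)}"


if __name__ == "__main__":
    # Hypothetical toy chromosome; a real run would read sequences from FASTA.
    toy = {"chrT": "GGAAGCTTCCAAGCTTAAGCTT"}
    for line in sites_to_bed(toy, "AAGCTT"):
        print(line)
```

For a whole genome the same idea applies one chromosome at a time, so that only one sequence needs to be held in memory.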

  
## GLBio_3D.md

      
Last active: May 17, 2017 22:05

Notes for 3D genome track at GLBio 2017

Keles -- Statistical methods for profiling long-range chromatin interactions from repetitive regions of the genome


Multi-mapping reads (multi-reads) are typically thrown out in many HTS analyses, including Hi-C

Assays predominantly rely on short reads (50-150 bp), so multi-reads are common
Using ChIP-seq as an example, incorporating multi-reads finds peaks in regions where "uni-reads" do not
e.g. Perm-seq using DHS + ChIP-seq data and multi-reads. 27.3% more peaks compared to the ENCODE uniform processing pipeline


How to combine this with Hi-C data?

Hi-C read processing

Typical pipelines discard singletons, multi-mapping ends, low-mapping-quality reads, and unaligned reads
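
The discard categories listed above can be sketched as a small read-pair classifier. This is only an illustration of the kind of filtering the talk describes, not any specific pipeline: the `EndAlignment` fields, the `MIN_MAPQ` threshold of 30, and the order in which the categories are checked are all assumptions.

```python
from dataclasses import dataclass

MIN_MAPQ = 30  # assumed cutoff; real pipelines choose their own threshold


@dataclass
class EndAlignment:
    """Hypothetical summary of how one end of a Hi-C read pair aligned."""
    n_hits: int  # number of reported alignments (0 means unaligned)
    mapq: int    # mapping quality of the best alignment


def classify_pair(end1: EndAlignment, end2: EndAlignment) -> str:
    """Return 'valid' or the reason a typical pipeline would discard the pair."""
    ends = (end1, end2)
    n_unaligned = sum(1 for e in ends if e.n_hits == 0)
    if n_unaligned == 2:
        return "unaligned"
    if n_unaligned == 1:
        return "singleton"
    if any(e.n_hits > 1 for e in ends):
        return "multi_mapping"
    if any(e.mapq < MIN_MAPQ for e in ends):
        return "low_mapq"
    return "valid"


if __name__ == "__main__":
    pair = (EndAlignment(n_hits=1, mapq=42), EndAlignment(n_hits=3, mapq=1))
    print(classify_pair(*pair))
```

Only pairs classified as `valid` survive in such a scheme; the `multi_mapping` category holds the reads the talk proposes to rescue.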


Evaluation of the impact of this using IMR90 and Plasmodium datasets


## WhyGalaxy.md

      
Last active: March 28, 2017 15:12

Why is it called Galaxy

Once upon a time there was the Genome ALignment and Annotation database, or GALA, which allowed for analysis of genomic elements alongside comparative genomic information. However, this tool supported only a few analyses. What-would-be-Galaxy was born from the idea of being able to easily take any existing analysis tool and quickly integrate it into this platform. But what should we call this next direction? Bob Harris suggested the use of X/Y to represent this "next dimension" of analysis. GALA + XY ⟶ GALAXY ⟶ Galaxy.
Or at least this is how I remember it.
#usegalaxy

  
## Standalone linuxbrew on SL5.5
# Mostly based on this:
#  https://github.com/Homebrew/linuxbrew/wiki/Standalone-Installation
# But I started with nothing (no ruby, no gcc)

# Ruby and GCC will go here
mkdir bootstrap

# Get GCC 4.4 and install under bootstrap
# We also need libstdc++ when we get to building gcc-4.9 because somebody decided it was a good idea to start writing GCC in C++
wget http://ftp1.scientificlinux.org/linux/scientific/55/x86_64/SL/gcc44-4.4.0-6.el5.x86_64.rpm

## scrape_gs.js
/**
 * usage: node scrape_gs.js USERKEY
 *
 * Determine h-index for papers published AFTER each year found in a Google
 * scholar profile. The USERKEY is found in your Google scholar citations
 * page url.
 */

var request = require('request');
var cheerio = require('cheerio');
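
The calculation described in the header comment above can be sketched independently of the scraping. The following is a minimal Python illustration, not the gist's own code: the function names and the `(year, citations)` input shape are hypothetical, and "AFTER each year" is assumed to mean strictly later than that year.

```python
from typing import Dict, Iterable, List, Tuple


def h_index(citations: Iterable[int]) -> int:
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def h_index_after_each_year(papers: List[Tuple[int, int]]) -> Dict[int, int]:
    """Map each publication year to the h-index of papers published after it.

    papers is a list of (year, citation_count) tuples. "After" is taken to
    mean strictly later than the year, which is an assumption.
    """
    years = sorted({year for year, _ in papers})
    return {
        year: h_index(c for y, c in papers if y > year)
        for year in years
    }


if __name__ == "__main__":
    # Made-up citation counts, purely for illustration.
    papers = [(2010, 50), (2012, 10), (2012, 3), (2015, 2), (2015, 1)]
    print(h_index_after_each_year(papers))
```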

## blast.c
/*
* BLAST - Search two DNA sequences for locally maximal segment pairs. The basic
* command syntax is
*
* 	BLAST sequence1 sequence2
*
* where sequence1 and sequence2 name files containing DNA sequences.  Lines
* at the beginnings of the files that don't start with 'A', 'C', 'T' or 'G'
* are discarded.  Thus a typical sequence file might begin:
*
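
The header-skipping rule described in the comment above (leading lines that do not start with 'A', 'C', 'T' or 'G' are discarded) can be sketched in Python as follows. This is an illustration only and not a port of the C source: the function name is hypothetical, and joining the remaining lines with whitespace stripped is an assumption about how the sequence body is handled.

```python
from typing import Iterable

BASES = ("A", "C", "G", "T")


def read_dna(lines: Iterable[str]) -> str:
    """Return the sequence, skipping leading lines that do not start with a base.

    Only lines at the beginning are skipped; once the first line starting
    with A, C, G or T is seen, every later line is kept.
    """
    pieces = []
    in_header = True
    for line in lines:
        if in_header:
            if line[:1] not in BASES:
                continue  # still in the header, discard this line
            in_header = False
        pieces.append(line.strip())
    return "".join(pieces)


if __name__ == "__main__":
    example = ["> some description\n", "\n", "ACGT\n", "GGCC\n"]
    print(read_dna(example))
```

One consequence of a rule like this is that a header line which happens to begin with one of the four letters would be read as sequence.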