John Blischak 2014-05-14
Multiple users have observed that submitting jobs via Snakemake requires much more memory than is necessary to run the command (e.g. mailing list post, [Bitbucket issue][issue]).
| # Copies each file to a new name prefixed with its directory name (originals are kept). | |
| # e.g. sample_name_1/s_1_sequence.txt.gz -> sample_name_1_s_1_sequence.txt.gz | |
| import glob | |
| import shutil | |
| filepaths = glob.glob("sample_name_*/s_*_sequence.txt.gz") | |
| new_names = [x.replace("/", "_") for x in filepaths] | |
| for old_path, new_path in zip(filepaths, new_names): | |
|     shutil.copyfile(old_path, new_path) |
| #!/bin/bash | |
| for VERSION in 3.3.6 3.4.0 3.4.1 3.4.2 3.4.3 | |
| do | |
| bash install-python.sh "$VERSION" | |
| bash install-pysam.sh "$VERSION" | |
| done |
| #!/bin/bash | |
| # Install local copy of R from source. | |
| # To run: | |
| # bash build-r.sh >& log.txt | |
| # Version of R to install (must be part of the R 3.x series) | |
| VERSION=3.3.2 | |
| # Directory to install R. If not already present, creates bin, lib and share |
| #!/usr/bin/env conda-execute | |
| # Helper script to identify conda-forge R packages that have more recent | |
| # versions released on CRAN. | |
| # | |
| # Usage: | |
| # | |
| # conda execute conda-forge-cran.R | |
| # | |
| # or |
| # Example Snakemake pipeline | |
| # | |
| # This example snakefile performs the following steps: | |
| # | |
| # * Downloads an Excel file | |
| # * Converts the file to CSV format | |
| # * Plots a result | |
| # * Creates a summary report | |
| # | |
| # LICENSE: CC0. Do what you want with the code, but it has no guarantees. |
[kallisto][] is a new method for processing RNA-seq data. By pseudoaligning reads to a transcriptome instead of aligning them to a genome, it makes the quantification step much faster. The computational speedup will be huge for projects with many samples or organisms with large genomes, but I was curious how much time [kallisto][] would save on a small RNA-seq project for an organism with a smaller genome. To perform this comparison, I downloaded six FASTQ files from a recent yeast RNA-seq study on GEO. I chose [Subread][subread] as the comparison method because it performs read alignment but is optimized for quickly obtaining gene counts (it soft clips reads instead of trying to map exact exon-exon boundaries).
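
A comparison like this comes down to timing each tool on the same input. The following is only a minimal sketch of one way to do that in Python, not the method used in the original analysis. The function names `time_command` and `compare` are hypothetical, and the two commands are placeholders that run anywhere; they would need to be replaced with the actual kallisto and Subread invocations.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return time.perf_counter() - start


def compare(commands):
    """Time each labelled command once and return {label: seconds}."""
    return {label: time_command(cmd) for label, cmd in commands.items()}


if __name__ == "__main__":
    # Placeholder commands (assumptions, not real tool invocations);
    # substitute the real commands for an actual comparison.
    timings = compare({
        "method_a": [sys.executable, "-c", "pass"],
        "method_b": [sys.executable, "-c", "import time; time.sleep(0.1)"],
    })
    for label, seconds in timings.items():
        print(f"{label}: {seconds:.2f} s")
```

A single run per tool gives only a rough estimate; repeating each command and comparing the medians would make the result less sensitive to disk caching and other load on the machine.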
| #!/usr/bin/env Rscript | |
| # Do CRAN packages that depend on the stats package use a copyleft license? | |
| # https://twitter.com/cimentadaj/status/1154420408508043264 | |
| Sys.Date() | |
| ## [1] "2019-07-26" | |
| library(stringr) |
| calc3 <- function(sets) | |
| { | |
| sets <- check_sets(sets) | |
| set_lengths <- vapply(sets, length, 0) | |
| set_order <- order(set_lengths) | |
| sets <- sets[set_order] | |
| set_lengths <- set_lengths[set_order] | |
| n_sets <- length(sets) | |
| set_names <- names(sets) |
| Additional_repositories | |
| Author | |
| Authors@R | |
| Biarch | |
| BugReports | |
| BuildVignettes | |
| Built | |
| ByteCompile | |
| Classification/ACM | |
| Classification/ACM-2012 |