
iracooke / submodulewiki.md
Last active August 29, 2025 08:17
Wiki as a submodule

From within the parent repository, run:

        git submodule add git@bitbucket.org:iracooke/transcriptomes.git/wiki wiki
        git commit -m "Adding wiki as submodule"
        git push

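Changes to the wiki and to the parent repository require separate git commit commands. The wiki is its own repository, so commit (and push) inside wiki/ first, then commit the updated submodule pointer in the parent.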
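The two-step commit can be scripted. This is a minimal sketch, assuming the submodule is checked out at `wiki` as above. `wiki_commit_commands` and `run_all` are hypothetical helpers. The first returns the git invocations in order: commit and push inside the submodule, then record its new commit in the parent.

```python
import subprocess

def wiki_commit_commands(message, submodule="wiki"):
    """Return the git commands (as argv lists) for committing a wiki change.

    Hypothetical helper: the submodule is committed and pushed first, then the
    parent commits the updated submodule pointer and pushes.
    """
    return [
        # 1. Commit and push inside the submodule (the wiki repository).
        ["git", "-C", submodule, "commit", "-am", message],
        ["git", "-C", submodule, "push"],
        # 2. Record the submodule's new commit in the parent repository.
        ["git", "add", submodule],
        ["git", "commit", "-m", f"Update {submodule} submodule"],
        ["git", "push"],
    ]

def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Run `run_all(wiki_commit_commands("Fix typo on home page"))` from the parent repository's root. A fresh submodule checkout is usually on a detached HEAD, so check out a branch inside wiki/ before pushing. Also note that `commit -am` only stages files git already tracks, so new pages need a `git -C wiki add` first.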

iracooke / ignore-osxquarantine.diff
Last active August 29, 2015 14:17
Rsync patches
This patch is for rsync running in a sandboxed environment on OS X.
It ignores the com.apple.quarantine attribute, which code running in
a sandbox cannot modify but which the system will invariably set
on files rsync creates.

To use this patch, run these commands for a successful build:

    patch -p1 <patches/ignore-osxquarantine.diff
    ./configure                               (optional if already run)
    make
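The core idea of the patch is to skip one extended attribute when transferring xattrs. A Python sketch of that filter follows. It is an illustration only, not rsync's code, and the function name and its list-of-names interface are hypothetical.

```python
QUARANTINE_ATTR = "com.apple.quarantine"

def xattrs_to_copy(names):
    """Return the extended-attribute names to transfer, minus the quarantine flag.

    A sandboxed process cannot modify com.apple.quarantine, and the system sets
    it on newly created files anyway, so attempting to sync it would only fail.
    """
    return [name for name in names if name != QUARANTINE_ATTR]
```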
iracooke / readme.md
Last active August 29, 2015 14:24
MSConvert Cheat Sheet

# MSConvert Cheat Sheet

Initial conversion from RAW, with spectrum titles in a TPP-compatible format:

	msconvert *.raw --filter "peakPicking true 1-" --filter "titleMaker <RunId>.<ScanNumber>.<ScanNumber>.<ChargeState>" -z
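The `titleMaker` template above produces titles of the form `<RunId>.<ScanNumber>.<ScanNumber>.<ChargeState>`, which is TPP's run.startScan.endScan.charge convention. Below is a sketch of building and parsing such titles. The helper names and the example values in the test are hypothetical.

```python
def tpp_title(run_id, scan, charge):
    """Build a TPP-style spectrum title: run.startScan.endScan.charge.

    The template repeats <ScanNumber> because each spectrum comes from a
    single scan, so the start and end scan are the same.
    """
    return f"{run_id}.{scan}.{scan}.{charge}"

def parse_tpp_title(title):
    """Split a TPP-style title into (run_id, start, end, charge).

    rsplit from the right keeps any dots inside the run ID intact.
    """
    run_id, start, end, charge = title.rsplit(".", 3)
    return run_id, int(start), int(end), int(charge)
```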

Select scans within a time range

iracooke / setOperations.R
Created August 18, 2015 01:39
R Cheat Sheet
setA <- c("A","B","C","D")
setB <- c("A","B","E","F")
# Everything in either set (union)
union(setA,setB)    # "A" "B" "C" "D" "E" "F"
# Items in setB that are not in setA
setdiff(setB,setA)  # "E" "F"
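For comparison, here are the same two operations in Python. Built-in `set` objects do not preserve order, so this sketch uses `dict.fromkeys` to keep R's first-seen ordering. The helper names `r_union` and `r_setdiff` are hypothetical.

```python
def r_union(a, b):
    """Like R's union(): unique elements of a followed by b, in first-seen order."""
    return list(dict.fromkeys([*a, *b]))

def r_setdiff(a, b):
    """Like R's setdiff(): unique elements of a that do not appear in b."""
    exclude = set(b)
    return [x for x in dict.fromkeys(a) if x not in exclude]

setA = ["A", "B", "C", "D"]
setB = ["A", "B", "E", "F"]
print(r_union(setA, setB))    # ['A', 'B', 'C', 'D', 'E', 'F']
print(r_setdiff(setB, setA))  # ['E', 'F']
```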
iracooke / README.md
Last active September 20, 2015 23:53
Create a combined transdecoder + 6frame database

Creating a protein database from 6-frame and transdecoder sequences

Analyses in Galaxy

  1. Run the TranscriptomePGMakeDatabase workflow. Its inputs are a Trinity assembly, predicted proteins from TransDecoder, GFF3 coordinates corresponding to the TransDecoder predictions, and the cRAP database of contaminants.
  2. Ensure that the known_novel_crap_decoy.fasta output from the above workflow is loaded into Mascot as a search database.
  3. Use the outputs from TranscriptomePGMakeDatabase to run a Transcriptome PG workflow. Base it on the standard Transcriptome PG workflow, modified to include a Mascot search against the database for your specific organism.
  4. Download the observed_peptides.gff3 file produced by the previous step.
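The `known_novel_crap_decoy.fasta` output combines target sequences with decoys for false discovery rate estimation. How the workflow builds its decoys is not described here. The sketch below assumes the common reversed-sequence approach and a hypothetical `decoy_` ID prefix, and it works on in-memory (id, sequence) pairs.

```python
def combine_with_decoys(*databases, prefix="decoy_"):
    """Concatenate protein databases and append one reversed decoy per entry.

    Each database is an iterable of (id, sequence) pairs. Duplicate IDs across
    databases are rejected so every search hit maps back to a single entry.
    """
    targets = []
    seen = set()
    for db in databases:
        for seq_id, seq in db:
            if seq_id in seen:
                raise ValueError(f"duplicate ID: {seq_id}")
            seen.add(seq_id)
            targets.append((seq_id, seq))
    # Reversed sequences keep the target's length and amino acid composition.
    decoys = [(prefix + seq_id, seq[::-1]) for seq_id, seq in targets]
    return targets + decoys

def to_fasta(entries):
    """Format (id, sequence) pairs as FASTA text."""
    return "".join(f">{seq_id}\n{seq}\n" for seq_id, seq in entries)
```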
iracooke / thermoid_to_scan_onlyid.sh
Created December 9, 2015 05:22
File Cleanup for PRIDE
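#!/bin/bash
# Converts an mzID file from Thermo nativeID format to scan number only nativeID format
file="$1"
# Quote the expressions so the shell cannot glob [0-9], and run both in a single
# sed call so "$file.bak" keeps the original rather than a half-edited copy.
sed -i.bak \
  -e 's/controllerType=[0-9] controllerNumber=[0-9] //g' \
  -e 's/Thermo nativeID format/scan number only nativeID format/g' \
  "$file"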
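The same two substitutions can be done in Python, which sidesteps differences between GNU and BSD `sed`. This minimal sketch mirrors the script's patterns, including single-digit controller values, and does not validate the mzIdentML. The function name is hypothetical.

```python
import re
import sys
from pathlib import Path

# Same pattern as the sed script: single-digit controllerType/controllerNumber.
CONTROLLER = re.compile(r"controllerType=[0-9] controllerNumber=[0-9] ")

def thermo_to_scan_only(text):
    """Turn "controllerType=0 controllerNumber=1 scan=N" IDs into "scan=N"
    and rename the declared nativeID format accordingly."""
    text = CONTROLLER.sub("", text)
    return text.replace("Thermo nativeID format", "scan number only nativeID format")

if __name__ == "__main__":
    path = Path(sys.argv[1])
    original = path.read_text()
    # Keep a backup next to the file, as sed -i.bak does.
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(thermo_to_scan_only(original))
```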

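Nucleotide FASTA files sometimes encode ambiguous bases simply with an 'N'.
Many downstream tools accept N but don't support the full set of IUPAC ambiguity codes.

The Unix tool tr can replace the other codes with N:

  tr 'RYSWKMBDHV' 'NNNNNNNNNN' < input.fasta > output.fasta

tr also rewrites header lines, so uppercase letters in sequence IDs change too. To convert only sequence lines, use sed:

  sed '/^>/!y/RYSWKMBDHV/NNNNNNNNNN/' input.fasta > output.fasta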
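A Python equivalent that leaves headers untouched is sketched below. Like the commands above, it assumes uppercase sequence letters. The function name is hypothetical.

```python
import sys

# The ten IUPAC ambiguity codes other than N, each mapped to N.
AMBIGUOUS_TO_N = str.maketrans("RYSWKMBDHV", "N" * 10)

def mask_ambiguity_codes(lines):
    """Yield FASTA lines with IUPAC ambiguity codes replaced by N.

    Header lines (starting with '>') pass through unchanged so that
    sequence IDs are not altered.
    """
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield line.translate(AMBIGUOUS_TO_N)

if __name__ == "__main__":
    # Usage: python mask_iupac.py < input.fasta > output.fasta
    sys.stdout.writelines(mask_ambiguity_codes(sys.stdin))
```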
iracooke / README.md
Last active October 28, 2022 05:18
NCBI TSA Submission Guide

Steps to submit to TSA

If you have a transcriptome that has been assembled from shotgun reads, the TSA (Transcriptome Shotgun Assembly) database is a good place to put it so that it can be widely accessed.

This guide assumes that you simply want to submit the assembled sequences from your transcriptome without annotations. NCBI sets a high bar for including annotations, so for most non-model organisms the annotations will probably not meet its criteria.

To create a TSA submission, take a look at the NCBI guidelines. This gist is based on those guidelines.
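Before submitting, it can help to check the assembly against the length requirement. The sketch below splits out sequences shorter than a minimum length. The default of 200 nt is TSA's usual minimum, but confirm the current threshold and any other requirements in the NCBI guidelines. The function names and the (id, sequence) representation are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA lines into (id, sequence) pairs; the ID is the header's first word."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)

def split_by_length(records, min_length=200):
    """Partition (id, sequence) pairs into (long enough, too short).

    min_length defaults to 200 nt; check the current NCBI TSA guidelines.
    """
    keep, drop = [], []
    for seq_id, seq in records:
        (keep if len(seq) >= min_length else drop).append((seq_id, seq))
    return keep, drop
```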

Register BioProject

iracooke / README.md
Last active August 20, 2016 00:49
Rename fasta identifiers

Rename Fasta IDs

This script is needed for programs like SecretomeP that truncate FASTA IDs.
Each FASTA entry must remain uniquely identifiable, so this script renames the IDs using a numeric scheme. It also produces a mapping file from old to new IDs so the original IDs can be recovered later.

Use it like this:

 ./rename_fasta.rb yourfasta.fasta
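The renaming idea can be sketched in Python. This is not `rename_fasta.rb` itself: the `seq<N>` naming scheme and the tab-separated mapping format are assumptions for illustration.

```python
def rename_fasta(lines, prefix="seq"):
    """Replace each FASTA header with a short numeric ID.

    Returns (renamed_lines, mapping), where mapping is a list of
    (new_id, original_header) pairs for recovering the original IDs later.
    """
    renamed, mapping = [], []
    for line in lines:
        if line.startswith(">"):
            original = line[1:].strip()
            new_id = f"{prefix}{len(mapping) + 1}"
            mapping.append((new_id, original))
            renamed.append(f">{new_id}\n")
        else:
            renamed.append(line)
    return renamed, mapping

def mapping_tsv(mapping):
    """Format the mapping as tab-separated text: new ID, then original header."""
    return "".join(f"{new}\t{old}\n" for new, old in mapping)
```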