


disulfidebond

  • UWisconsin-Madison
  • Madison, WI
disulfidebond / stata_overview.md
Last active February 10, 2020 17:26
stata description and overview

Overview

This gist is designed to provide a brief overview of how Stata is configured, as well as some basic commands.

A concise way to describe Stata would be that Stata is to R as a multitool is to a team of contractors. Stata was designed with a different purpose in mind than Python or R libraries.

Its strengths include:

  • Users can either type commands in a command-line interface (CLI) or point and click in a graphical user interface (GUI) to perform the same operation
  • Highly customizable charts and graphs can be generated with a single mouse click
  • Although admittedly a bit quirky at first glance, Stata is fairly easy to pick up
disulfidebond / abbreviations_keyterms.md
Last active January 10, 2020 16:49
cancer abbreviations and terms

A

B

BAM: Binary Alignment Map; a binary file format for storing sequence alignment data, widely used in bioinformatics analyses.

C

chromoanagenesis: events that generate complex structural chromosomal abnormalities, which can include chromoplexy and chromothripsis

disulfidebond / common_abbreviations.md
Last active January 10, 2020 16:27
Commonly Used Abbreviations

Updated version can be found here

disulfidebond / graph_compendium.md
Last active January 6, 2020 16:22
Description of graph types and how data is displayed in them

Bar Chart

Bar charts can have several variations, but all share a common theme: the categories of one variable are plotted on the X axis, and the Y axis quantifies the data for each category.

[Figures: Bar Chart Example (barchart_example); Stacked Bar Chart Example]
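As a minimal sketch of the data behind a simple bar chart, the hypothetical helpers below count how often each category of one variable occurs: each category is a position on the X axis, and its count is the bar height on the Y axis. The text rendering is drawn sideways (one row per category) only so that it prints in a terminal; it is an illustration, not a Stata or plotting-library command.

```python
from collections import Counter


def bar_heights(values):
    """Return (category, count) pairs, most frequent first.

    Each category would sit on the X axis; its count is the bar height (Y axis).
    Ties keep the order in which the categories first appear.
    """
    return Counter(values).most_common()


def text_bar_chart(values, symbol="#"):
    """Render the counts as a sideways text bar chart (one row per category)."""
    pairs = bar_heights(values)
    if not pairs:
        return ""
    width = max(len(str(category)) for category, _ in pairs)
    return "\n".join(
        f"{str(category).ljust(width)} | {symbol * count}" for category, count in pairs
    )


if __name__ == "__main__":
    print(text_bar_chart(["A", "B", "A", "C", "A", "B"]))
```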

disulfidebond / card_template.md
Last active December 10, 2019 17:57
Cards in Projects

Welcome to GitHub Projects ✨ We're so excited that you've decided to create a new project! Now that you're here, let's make sure you know how to get the most out of GitHub Projects.

  • Create a new project
  • Give your project a name
  • Press the ? key to see available keyboard shortcuts
  • Add a new column
  • Drag and drop this card to the new column
  • Search for and add issues or PRs to your project
  • Manage automation on columns
  • Archive a card or archive all cards in a column
disulfidebond / r_setup_macosx.md
Last active December 3, 2019 18:23
R Setup Mac OSX

Overview

This writeup describes how to set up R on Mac OSX. The steps below are not numbered, but they should be completed in order.

Check whether the Xcode Command Line Tools are installed:

  1. Open the Terminal.app application, either by searching for it with Spotlight or by opening it from the /Applications/Utilities folder
  2. Type git in the window that opens, and press return.
  3. If you see the following output, git and the Command Line Tools are installed:

usage: git [--version] [--help] [-C <path>] [-c <name>=<value>]
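The same check can be scripted, for example from a Jupyter notebook, as in the minimal Python sketch below. It is a stand-in for typing `git` by hand, under one assumption: it runs `git --version` instead, because bare `git` prints the usage text and exits with an error status. The function name is hypothetical.

```python
import shutil
import subprocess


def git_is_installed(timeout=30):
    """Return True if `git --version` runs successfully.

    On Mac OSX, a working git is treated here as a sign that the Xcode
    Command Line Tools are installed (an assumption; `xcode-select -p`
    is another way to check).
    """
    if shutil.which("git") is None:
        return False  # no git executable on PATH at all
    try:
        result = subprocess.run(
            ["git", "--version"], capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # A working install prints something like "git version 2.x.y" and exits 0.
    return result.returncode == 0 and result.stdout.startswith("git version")


if __name__ == "__main__":
    print("git and Command Line Tools installed:", git_is_installed())
```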

disulfidebond / create_IPD_csvFile.md
Created September 27, 2019 21:33
Workflow to create an IPD csv file

Instructions

Run this notebook from the same directory as the EMBL files that will be submitted to the Otting Lab for IPD submission. The output file is named 'IPD_submission3_TIMESTAMP.csv', where TIMESTAMP is a unique identifier; you may change this to another filename if you wish.

The only required input is a tab-delimited file with the representative animal Identifiers and Comments, formatted as:

  Working genomic allele name	IPD Accession No.	Representative Animal	BLAST comments
  >Mamu-B11L*01:04:01:01	NHP02117	MD103	7 identical fosmids (Rh22777)
  CTCCCCGGACGCCTAGGATGGGGTCATGGCGCCTCGAGCCCTCCTCCTGCTGCTCTCGGGGGCCCTGGCCCTGACCGAGACCTGGG

Enter this filename in the next cell for the animalID_file variable.
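The notebook's own code is not shown here, but reading the tab-delimited input could look like the minimal sketch below. The function names are hypothetical, the column names are copied from the example header above, and the `%Y%m%d%H%M%S` timestamp format is an assumption. It assumes the header is the first line, fields are separated by single tabs, and any line with fewer than four fields (such as a bare sequence line) should be skipped.

```python
import csv
import io
from datetime import datetime

# Column names copied from the example header in the instructions.
COLUMNS = [
    "Working genomic allele name",
    "IPD Accession No.",
    "Representative Animal",
    "BLAST comments",
]


def read_animal_ids(handle):
    """Parse the tab-delimited animal ID file into a list of dicts.

    Lines with fewer than four tab-separated fields are skipped.
    QUOTE_NONE keeps any double quotes in comments as literal text.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [name.strip() for name in next(reader, [])]
    rows = []
    for fields in reader:
        if len(fields) < len(COLUMNS):
            continue
        record = dict(zip(header, (field.strip() for field in fields)))
        rows.append({col: record.get(col, "") for col in COLUMNS})
    return rows


def output_filename(now=None):
    """Build 'IPD_submission3_TIMESTAMP.csv' (timestamp format is an assumption)."""
    now = now or datetime.now()
    return f"IPD_submission3_{now.strftime('%Y%m%d%H%M%S')}.csv"


if __name__ == "__main__":
    sample = (
        "Working genomic allele name\tIPD Accession No.\tRepresentative Animal\tBLAST comments\n"
        ">Mamu-B11L*01:04:01:01\tNHP02117\tMD103\t7 identical fosmids (Rh22777)\n"
    )
    print(output_filename(), read_animal_ids(io.StringIO(sample)))
```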

Overview

Combined Annotation Dependent Depletion (CADD) is a useful tool for querying SNPs of interest. The following is an implementation of batch queries using its API. The code is provided as a Jupyter notebook, which can be run by itself or reused as part of a larger program.

There are several important caveats to keep in mind:

  • The API is experimental and is not intended for retrieving thousands or millions of variants. Do NOT remove the lines of code that pause between requests, and do NOT use this for more than 1000 queries at a time. Doing so will crash the server and leave some very irate researchers at the University of Washington.
  • The API is in its early stages and may change significantly later, which would require the code to be updated as well.
  • The Jupyter notebook is written to accommodate only a single SNP position, but the API also supports a SNP range, such as `22:44
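The throttled batch loop described in these caveats can be sketched as follows. This is not the notebook's code: the function names are hypothetical, the one-second pause is an assumed value, and `base_url` must be filled in from CADD's own API documentation (the `CHROM:POS` path shape is also an assumption). The `fetch` parameter lets the request function be swapped out, so the loop can be tested without contacting the server.

```python
import json
import time
import urllib.request

MAX_QUERIES = 1000    # limit from the caveats: never send more than this per run
PAUSE_SECONDS = 1.0   # assumed delay between requests; keep a pause for real use


def build_query_url(base_url, chrom, pos):
    """Join a base URL and a single 'CHROM:POS' position (path shape is an assumption)."""
    return f"{base_url.rstrip('/')}/{chrom}:{pos}"


def fetch_json(url, timeout=60):
    """Download one URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def batch_query(positions, base_url, pause=PAUSE_SECONDS, fetch=fetch_json):
    """Query each (chrom, pos) pair in turn, sleeping between requests."""
    if len(positions) > MAX_QUERIES:
        raise ValueError(
            f"refusing to send {len(positions)} queries (limit {MAX_QUERIES})"
        )
    results = []
    for index, (chrom, pos) in enumerate(positions):
        if index:  # pause before every request except the first
            time.sleep(pause)
        results.append(fetch(build_query_url(base_url, chrom, pos)))
    return results
```

Checking the limit before the first request means an oversized batch fails immediately instead of partway through a run.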
disulfidebond / parse_correct_gb.md
Last active August 22, 2019 15:24
file parsing with Genbank files

Overview

This gist presents several strategies for parsing and correcting GenBank files, in the general format of a cookbook.

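Delete and append a line by matching a pattern PATTERN

Delete every line that matches PATTERN:

sed '/PATTERN/d' someFile.gb > outputFile.gb

Append a line containing TEXT N after every line that matches PATTERN (run this on the original file, because sed has already removed the matching lines from outputFile.gb):

perl -pe '$_ .= qq(TEXT N\n) if /PATTERN/' someFile.gb > otherOutfile.gb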
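For readers working in a notebook rather than the shell, the two one-liners above can be sketched in Python as follows. The function names are hypothetical, and Python's `re` syntax differs from sed's basic regular expressions, so PATTERN may need adjusting when moving between them.

```python
import re


def delete_matching(lines, pattern):
    """Python analogue of sed '/PATTERN/d': drop every line matching pattern."""
    regex = re.compile(pattern)
    return [line for line in lines if not regex.search(line)]


def append_after_matching(lines, pattern, text):
    """Python analogue of the perl one-liner: add text after each matching line."""
    regex = re.compile(pattern)
    output = []
    for line in lines:
        output.append(line)
        if regex.search(line):
            output.append(text + "\n")
    return output


def rewrite_file(src, dst, transform):
    """Read src, apply transform to its list of lines, and write the result to dst."""
    with open(src) as handle:
        lines = handle.readlines()
    with open(dst, "w") as handle:
        handle.writelines(transform(lines))
```

For example, `rewrite_file("someFile.gb", "outputFile.gb", lambda lines: delete_matching(lines, r"PATTERN"))` mirrors the sed command.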

disulfidebond / parse_bed_file.md
Created July 29, 2019 15:59
BED file parsing

Overview

A BED file needs to be parsed and reformatted into a CSV file. Broadly speaking, there are two options: use a GUI, or use scripting.

GUI

Use BBEdit, Atom, or Apple's TextEdit (see caution) to search and replace. An example is shown here:

[Figure: gui_parsing (GUI search-and-replace example)]

  • Caution: Apple's TextEdit has an extremely useful GUI for search and replace that even simplifies regex. However, it may replace some characters, such as the double-quote character, with ones that are not recognized by all text editors. You've been warned.
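For the scripting option, a minimal Python sketch using the standard `csv` module is below. It assumes a tab-delimited BED file, skips `track`, `browser`, and `#` comment lines, and names the CSV columns after the standard BED fields; the function name is hypothetical. Using `csv.writer` rather than joining fields with commas means fields that already contain commas (such as BED12 `blockSizes`) are quoted correctly.

```python
import csv
import io

# Standard BED field names, in column order.
BED_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes", "blockStarts",
]


def bed_to_csv(bed_handle, csv_handle):
    """Copy tab-delimited BED records to CSV; return the number of records written.

    The header is sized from the first data line; columns beyond the 12
    standard fields are named extra_0, extra_1, and so on.
    """
    writer = csv.writer(csv_handle)
    header_written = False
    count = 0
    for line in bed_handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and browser/track definition lines
        fields = line.split("\t")
        if not header_written:
            extra = [f"extra_{i}" for i in range(len(fields) - len(BED_FIELDS))]
            writer.writerow(BED_FIELDS[: len(fields)] + extra)
            header_written = True
        writer.writerow(fields)
        count += 1
    return count


if __name__ == "__main__":
    demo = io.StringIO("track name=example\nchr1\t10\t20\tgeneA\n")
    out = io.StringIO()
    bed_to_csv(demo, out)
    print(out.getvalue())
```

With real files, open the output with `newline=""`, as the `csv` module expects, for example `with open("input.bed") as bed, open("output.csv", "w", newline="") as out:` (hypothetical filenames).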