@disulfidebond
Last active January 6, 2020 16:22
Description of graph types and how data is displayed in them

Bar Chart

Bar charts come in several variations, but all share a common layout: the categories of one variable are plotted on the X axis, and the Y axis quantifies the value for each category.

Bar Chart Example

barchart_example

Stacked Bar Chart Example

barchart_example-stacked
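
As a minimal sketch of the two variations shown above, the following R code draws a simple and a stacked bar chart with ggplot2. The `variants` data frame, its column names, and its counts are made up for illustration and are not the data behind the example images.

```r
library(ggplot2)

# Hypothetical counts of variant types per sample (illustrative values only)
variants <- data.frame(
  sample = rep(c("S1", "S2", "S3"), each = 2),
  type   = rep(c("SNP", "Indel"), times = 3),
  count  = c(40, 12, 55, 20, 31, 9)
)

# Simple bar chart: one bar per sample, height = total count for that sample
totals <- aggregate(count ~ sample, data = variants, FUN = sum)
p_bar <- ggplot(totals, aes(x = sample, y = count)) +
  geom_col()

# Stacked bar chart: same X axis, each bar split by variant type
p_stacked <- ggplot(variants, aes(x = sample, y = count, fill = type)) +
  geom_col(position = "stack")

print(p_bar)
print(p_stacked)
```

In both plots the X axis carries the categories and the Y axis the quantity; the stacked version only adds a `fill` mapping for the second variable.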

Scatter Plot

Scatter plots and line plots display data by plotting a dependent variable (the Y axis) against an independent variable (the X axis). They use the Cartesian coordinate system, but are usually restricted to a single X-Y quadrant, such as the one with positive X and positive Y values. A scatter plot can also show multiple dependent variables for a single independent variable, and it may be drawn in three dimensions with X, Y, and Z axes.

Scatterplot Example

The example uses a fictitious dataset that shows the number of Single Nucleotide Polymorphisms (SNPs) found versus the distance from the 5' end of an exon.

scatterplot_example
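
A plot of this kind could be produced roughly as follows. This is only a sketch: the data are simulated here with `rpois()`, and the column names `distance_bp` and `snp_count` are assumptions, not the dataset or code used for the image above.

```r
library(ggplot2)

# Simulated (fictitious) data: SNP counts at increasing distance
# from the 5' end of an exon
set.seed(1)
snps <- data.frame(distance_bp = seq(50, 1000, by = 50))
snps$snp_count <- rpois(nrow(snps), lambda = 2 + snps$distance_bp / 100)

# Independent variable on the X axis, dependent variable on the Y axis
p_scatter <- ggplot(snps, aes(x = distance_bp, y = snp_count)) +
  geom_point() +
  labs(x = "Distance from 5' end of exon (bp)", y = "Number of SNPs")

print(p_scatter)
```

Because both variables are non-negative, the plot stays in the positive X, positive Y quadrant.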

Dot Plot

Dot plots are used to visualize how multiple categories or labels of data relate to each other. They are useful when you want a bar-chart-style visualization for numerous categories without cluttering the figure. A Wilkinson dot plot can also be used within other types of graphs, such as a layer in a Circos plot. A Cleveland dot plot (not shown) uses a single dot to mark each quantity instead of a stack of dots.

Wilkinson Dot Plot Example

dotplot_example
# code in R
library(ggplot2)
library(gcookbook)  # provides the `countries` dataset
countries2009 <- subset(countries, Year == 2009 & healthexp > 2000)
p <- ggplot(countries2009, aes(x = infmortality))
p + geom_dotplot(binwidth = 0.25) +
  geom_rug() +
  scale_y_continuous(breaks = NULL) +  # the stacked-dot count axis is not meaningful, so hide it
  theme(axis.title.y = element_blank()) +
  labs(x = "Infant Mortality")
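
For comparison, the Cleveland dot plot mentioned above can be sketched with `geom_point()`. The example below uses R's built-in `mtcars` dataset rather than the `countries` data, purely so that it runs without extra packages; the choice of dataset and the name `car_mpg` are assumptions made for illustration.

```r
library(ggplot2)

# Built-in mtcars data: one row (and therefore one dot) per car model
car_mpg <- data.frame(model = rownames(mtcars), mpg = mtcars$mpg)

# Cleveland dot plot: a single dot per category, categories ordered by value
p_cleveland <- ggplot(car_mpg, aes(x = mpg, y = reorder(model, mpg))) +
  geom_point(size = 2) +
  labs(x = "Miles per gallon", y = NULL)

print(p_cleveland)
```

Ordering the categories by value with `reorder()` is what keeps a plot with many labels readable.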

Violin Plot

A violin plot compares multiple data distributions. The X axis identifies the dataset (group) being compared, and the Y axis shows the values observed in that dataset. The width of each violin at a given Y value shows the distribution of values (usually the estimated density, that is, how frequently values occur near that point).

Violin plots usually have a boxplot overlay that shows the median and interquartile range of the values.

Violin Plot Example Dataset

The example dataset shows Microsatellite Instability (MSI) for a given cell line.

  Cell_Line,MSI
  WT,1
  WT,3
  WT,6
  WT,3
  WT,9
  WT,9
  WT,1
  WT,9
  WT,3
  WT,8
  WT,0
  WT,7
  WT,6
  WT,4
  WT,3
  WT,5
  WT,0
  WT,2
  WT,2
  WT,0
  Mut1,18
  Mut1,10
  Mut1,16
  Mut1,18
  Mut1,15
  Mut1,0
  Mut1,9
  Mut1,16
  Mut1,9
  Mut1,9
  Mut1,3
  Mut1,5
  Mut1,10
  Mut1,16
  Mut1,11
  Mut1,19
  Mut1,1
  Mut1,7
  Mut1,13
  Mut1,16
  Mut2,28
  Mut2,39
  Mut2,2
  Mut2,24
  Mut2,9
  Mut2,26
  Mut2,37
  Mut2,19
  Mut2,37
  Mut2,11
  Mut2,3
  Mut2,15
  Mut2,0
  Mut2,21
  Mut2,38
  Mut2,36
  Mut2,18
  Mut2,20
  Mut2,37
  Mut2,32

# code in R
library(ggplot2)
d <- read.csv('example_violin.csv')
p <- ggplot(d, aes(x = Cell_Line, y = MSI))
# `fun` replaces the deprecated `fun.y` argument (ggplot2 >= 3.3.0)
p + geom_violin() +
  geom_boxplot(width = 0.1, fill = "black", outlier.color = NA) +
  stat_summary(fun = median, geom = "point", fill = "white", shape = 21, size = 2.5)

Violin Plot Example

violinplot_example
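
To make the roles of the violin width and the median marker concrete, the sketch below rebuilds the example dataset inline (so no CSV file is needed), computes the per-group medians, and estimates a kernel density for one group. It uses base R's `density()` with default settings, which is an assumption for illustration; `geom_violin()` may differ in details such as trimming.

```r
# The example dataset from above, rebuilt as a data frame
d <- data.frame(
  Cell_Line = rep(c("WT", "Mut1", "Mut2"), each = 20),
  MSI = c(
    1, 3, 6, 3, 9, 9, 1, 9, 3, 8, 0, 7, 6, 4, 3, 5, 0, 2, 2, 0,
    18, 10, 16, 18, 15, 0, 9, 16, 9, 9, 3, 5, 10, 16, 11, 19, 1, 7, 13, 16,
    28, 39, 2, 24, 9, 26, 37, 19, 37, 11, 3, 15, 0, 21, 38, 36, 18, 20, 37, 32
  )
)

# Medians per cell line: the value a median marker sits on in each violin
medians <- tapply(d$MSI, d$Cell_Line, median)
print(medians)

# Kernel density estimate for one group: the violin is wide where this is high
wt_density <- density(d$MSI[d$Cell_Line == "WT"])
peak <- wt_density$x[which.max(wt_density$y)]
cat("WT density peaks near MSI =", round(peak, 1), "\n")
```

For this dataset the medians are 3 (WT), 10.5 (Mut1), and 22.5 (Mut2).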

Circos Plot

Circos plots are used to show relationships among numerous variables. They are commonly used in bioinformatics to show variants across an organism's entire genome.

Circos Plot Description

A circos plot has different layers to show relationships and quantification. The plot should have a detailed description for each layer that states the type and scale of the data being visualized; unfortunately, this does not always happen.

The outer layer has the identities for the dataset; in the example below, this is the chromosome number. The quantification layers show the quantities for the outer identities; in the example below, this is the number of homologous regions between mouse and human. The inner relationship layer (with 'ribbons') shows how groups of identities are similar to each other; in the example below, this is the homology among regions of the human and mouse genomes.

Note that a circos plot has at least one quantification layer, with technically no maximum, and that the inner relationship layer may not be present.
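
The layer structure described above can be written down as plain R data. This is only a sketch of the data a circos plot needs, not the input format of any particular plotting package; the names `circos_layers` and `validate_layers` and all values are hypothetical.

```r
# Hypothetical layer data for a small circos-style plot (illustrative values only)
circos_layers <- list(
  # Outer layer: the identities, here chromosome names
  identities = c("chr1", "chr2", "chr3"),
  # Quantification layers: here one value per identity; at least one layer is needed
  quantification = list(
    homologous_regions = c(chr1 = 12, chr2 = 7, chr3 = 9)
  ),
  # Relationship layer (optional): each row is one ribbon between two identities
  relationships = data.frame(
    from = c("chr1", "chr2"),
    to   = c("chr3", "chr3")
  )
)

# Check the constraints described above
validate_layers <- function(layers) {
  ids <- layers$identities
  stopifnot(length(layers$quantification) >= 1)
  for (q in layers$quantification) {
    stopifnot(setequal(names(q), ids))
  }
  rel <- layers$relationships
  if (!is.null(rel)) {
    stopifnot(all(rel$from %in% ids), all(rel$to %in% ids))
  }
  invisible(TRUE)
}

validate_layers(circos_layers)

# The relationship layer may be dropped entirely
circos_layers$relationships <- NULL
validate_layers(circos_layers)
```

Keeping the identities separate from the layers makes it easy to confirm that every quantification value and every ribbon endpoint refers to an identity on the outer ring.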

Circos Plot Explanation

circosplot_explained

Circos Plot Example

circosplot_example

References and Sources

  1. Chang, Winston. R Graphics Cookbook. O'Reilly Media, 2013. Chapters 3 and 5; Chapter 6, pp. 135-141.

  2. BioCircos
