@cfhammill
Created April 14, 2015 12:51
Introduction to Igraph and Shiny Presentation
---
title: "An Introduction to Graphs"
author: "Chris Hammill"
date: '2015-04-01'
output:
  beamer_presentation:
    theme: "Boadilla"
    fig_crop: false
    keep_tex: true
    template: chris.beamer
---
```{r setup, echo = FALSE}
library(ggplot2)
library(magrittr)
library(grid)
library(igraph)
library(knitr)
opts_chunk$set(dev = "tikz",fig.width = 5, fig.height = 4, out.width = "0.6\\linewidth", fig.align='center')
options(width=60)
```
```{r, include = FALSE}
setwd("~/Documents/T1DM/graphWork/")
bigtable <- readRDS("bigtableFeb19Cleaned.rds")
```
## About Me
\centering \scalebox{0.45}{\includegraphics{meBySea.jpg}}
- Graduate Student in Biology
- Bioinformatics Research Assistant
- R *Aficionado*
- Data Analysis/Visualization Contractor
- Alumnus of this course
## Why I'm Here
>- Talk about my research
\newline
>- Teach you a bit about graphs
\newline
>- Introduce you to some useful packages
\newline
>- Get you excited about interactive analysis
## Outline
- Introduce graphs
- Introduce igraph
- Introduce Interactivity with Shiny
- Introduce the diabetes project
- Demo the diabetes project app
- Offer resources
## \
- \Large{This presentation was written in R Markdown!} \newline\newline
- \Large{The slides and code will be made available via D2L}
## Outline
- **Introduce graphs**
- Introduce igraph
- Introduce Interactivity with Shiny
- Introduce the diabetes project
- Demo the diabetes project app
- Offer resources
## So What Are Graphs?
```{r, echo = FALSE}
data.frame(x = 1:50, y = sample(1:100,50, replace = TRUE)) %>%
ggplot(aes(x = x, y = y)) + geom_point()
```
This?
## So What Are Graphs?
```{r, echo = FALSE}
data.frame(x = 1:50, y = sample(1:100,50, replace = TRUE)) %>%
ggplot(aes(x = x, y = y)) + geom_point()
grid.lines(0:1, 0:1, gp = gpar(col = 2, lwd = 10))
grid.lines(0:1, 1:0, gp = gpar(col = 2, lwd = 10))
```
Nope!
## So What Are Graphs?
- Graphs are a formal system for representing connections between things
- Graphs are composed of nodes (or vertices) and edges (connections)
- Edges can be weighted or unweighted, directed or not
- Graphs have recently been rebranded as networks \newline
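
As a minimal sketch of these terms in igraph (the names and weights below are invented for illustration, and the snake_case function names assume igraph 1.0 or later):

```r
library(igraph)

# Hypothetical edge list: who emails whom, and how often
emails <- data.frame(
  from   = c("ann", "ann", "bob", "cat"),
  to     = c("bob", "cat", "cat", "ann"),
  weight = c(5, 1, 2, 3)
)

# Each row becomes one directed edge; the extra column
# becomes an edge attribute called "weight"
g <- graph_from_data_frame(emails, directed = TRUE)

vcount(g)       # number of nodes (vertices)
ecount(g)       # number of edges
is_directed(g)  # do the edges have a direction?
is_weighted(g)  # do the edges carry a "weight" attribute?
```
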
## So What Are Graphs?
```{r, echo = FALSE}
baG <- ba.game(10)
plot(baG)
```
So This?
## So What Are Graphs?
```{r, echo = FALSE}
plot(baG)
grid.lines(c(0,.15), c(.25,0), gp = gpar(col = 3, lwd = 10))
grid.lines(c(.15,1), c(0,1), gp = gpar(col = 3, lwd = 10))
```
Yup!
## Graphs in Math
- \centering{Graphs were first described by Euler (of \textit{e} fame)} \newline
-\centering\scalebox{.75}{\includegraphics{Konigsberg.png}} \newline
\centering{The bridges of Konigsberg}
- The name "graph" is due to Sylvester (1878), a choice that is widely considered frustrating
## Graphs For the Rest of Us
- Graphs were brought out of the math domain primarily by social scientists
- For example, Sampson (1968) did a social network analysis of monks in a monastery, identifying social dynamics
## But More Importantly
\centering![](F_icon.png)
## And
\centering![](Logo_2013_Google.png)
## And
\centering![](brain.jpeg)
## So
- Graphs are everywhere \newline
- Social Networks? Graphs \newline
- Internet? Graph \newline
- Metabolic pathways? Graphs \newline \newline
- \textbf{Due to this amazing generality, graph-based representations and algorithms can be incredibly useful for both exploration and inference}
## What Can We Learn From Graphs?
- \textit{\small{Disclaimer: I'm still learning plenty about what can be done using graphs, so this section will necessarily be oversimplified}}. \newline
- Typically graphs are used to answer questions about the nature of their connections
(although graph representations can also be used to carry out immensely complex calculations, as you might have noticed when you learned about artificial neural networks)
- Typical questions include:
1. Where are the hubs (highly connected nodes)?
2. Can the graph be subdivided into clusters or communities?
3. Are there unexpected connections?
But as with any data representation, you're usually limited by your ability to ask interesting questions, not the representation's ability to answer them
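
The three questions above can be sketched with igraph's built-in Zachary karate club graph. This is only one possible approach: the walktrap algorithm is an arbitrary choice of community detection method, "top three by degree" is an arbitrary definition of a hub, and the snake_case names assume igraph 1.0 or later.

```r
library(igraph)

karate <- make_graph("Zachary")  # classic 34-member social network

# 1. Hubs: the three nodes with the most edges
hubs <- order(degree(karate), decreasing = TRUE)[1:3]

# 2. Communities: one possible clustering, via short random walks
communities <- cluster_walktrap(karate)
sizes(communities)               # how many nodes fall in each community

# 3. Candidate "unexpected" connections: edges whose two
#    endpoints were placed in different communities
betweenGroups <- crossing(communities, karate)
E(karate)[betweenGroups]
```
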
## Graph Properties
### Degree Distribution
- Degree is the number of edges a node has
- The distribution of degrees in a graph is interesting and can hint at the process generating the graph
### Diameter
- The longest of the shortest paths between any two nodes
### Average Path
- The average shortest-path length over all pairs of nodes
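
A small worked example, small enough to check by hand, may make these definitions concrete. It is a sketch on a four-node path (an invented toy graph), using function names that assume igraph 1.0 or later:

```r
library(igraph)

# A path of four nodes: 1 - 2 - 3 - 4
pathGraph <- make_graph(c(1, 2,  2, 3,  3, 4), directed = FALSE)

degree(pathGraph)         # 1 2 2 1: the end nodes have one edge each
diameter(pathGraph)       # 3: the shortest path from node 1 to node 4
mean_distance(pathGraph)  # 10/6 (about 1.67) over the six node pairs
```

The six pairwise distances are 1, 2, 3, 1, 2 and 1, which sum to 10, hence the average of 10/6.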
## Outline
- Introduce graphs
- **Introduce igraph**
- Introduce Interactivity with Shiny
- Introduce the diabetes project
- Demo the diabetes project app
- Offer resources
## Creating and Using Graphs
- Manipulating graphs with R is typically done with the `igraph` package, so let's try it out:
First off, install `igraph` and attach it with the usual code:
```{r, eval = FALSE}
install.packages("igraph")
library(igraph)
```
## Create a Random Graph
- For exploration's sake, let's generate a random graph (an Erdos-Renyi random graph)
```{r, cache = TRUE}
randomGraph <- erdos.renyi.game(20, 0.2)
plot(randomGraph)
```
## Summary Statistics
Degree
```{r}
hist(degree(randomGraph))
```
## Summary Statistics
Diameter
```{r}
diameter(randomGraph)
```
Path Length
```{r}
average.path.length(randomGraph)
```
## Other Useful Commands
```{r, eval = FALSE}
# Pull out all the Vertices
V(graph)
# Pull out all the Edges
E(graph)
# Change a component of the edges (or vertices)
E(graph)$weight <- newWeights
# Get all node pairs
get.edgelist(graph)
# Compute the adjacency matrix
get.adjacency(graph)
```
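
The commands above use placeholder names (`graph`, `newWeights`). A runnable sketch of the same operations on a five-node ring follows; the weights are made up, and `as_edgelist()` and `as_adjacency_matrix()` are the igraph 1.0+ names for `get.edgelist()` and `get.adjacency()`, so this assumes a recent igraph.

```r
library(igraph)

ring <- make_ring(5)             # 5 nodes joined in a cycle

V(ring)                          # the vertex sequence
E(ring)                          # the edge sequence
E(ring)$weight <- 1:5            # one (made-up) weight per edge

edgeList <- as_edgelist(ring)    # 5 x 2 matrix, one row per edge
adjacency <- as_adjacency_matrix(ring, sparse = FALSE)  # 5 x 5 matrix
```
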
## Outline
- Introduce graphs
- Introduce igraph
- **Introduce Interactivity with Shiny**
- Introduce the diabetes project
- Demo the diabetes project app
- Offer resources
## Switching gears
\Large{Let's talk about exploratory analysis}
## Interactivity
- A typical first pass of data analysis involves:
1. Visualizing your data
2. Searching for hypotheses to test
3. Tuning parameters and repeating steps 1 and 2
- You will waste untold hours (if you pursue science) doing guess-and-check plot parameter tuning
- You will grow weary in your search and likely settle for less than optimal choices \newline
- \large{\textbf{Why not take the guesswork out and make it faster to explore parameter space?}}
## Enter Shiny
- Shiny is a framework developed by the people at RStudio to bring interactivity to R \newline
- Provides a tool to bring your analyses into the modern age \newline
- Not to mention the benefit in presenting your analyses to non-experts when they can see for themselves how parameters affect the results. \newline
- Slightly frustrating interface, but very little new needs to be learned \newline
## So How Does Shiny Work
- A shiny app is composed of (at least) two files
1. server.R
2. UI.R
- server.R is responsible for performing the calculations in the app
- UI.R is responsible for coordinating input from the user and output from the server
## Minimal Example
### server.R
```{r, eval = FALSE}
library(shiny)
shinyServer(function(input, output){
  output$quadraticPlot <- renderPlot({
    x <- seq(-2,2, length.out = 500)
    y <- input$a * x^2 + input$b * x + input$c
    plot(y ~ x,
         xlim = c(-2,2),
         ylim = c(-2,4),
         type = "l")
  })
})
```
## Minimal Example
### UI.R
```{r, eval = FALSE}
library(shiny)
shinyUI(
  fluidPage(
    sliderInput("a", "a", min = -2L, max = 2L, value = 1),
    sliderInput("b", "b", min = -1L, max = 1L, value = 0),
    sliderInput("c", "c", min = -2L, max = 2L, value = 0),
    plotOutput("quadraticPlot")
  )
)
```
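
Shiny also lets you keep both halves in a single file and combine them with `shinyApp()`. The sketch below is a variation on the example above, not the version used in the talk: the `quadratic()` helper and the `step = 0.1` slider setting are additions made for illustration.

```r
library(shiny)

# The curve drawn by the app, pulled out into a plain function
# so it can be checked without launching a browser
quadratic <- function(x, a, b, c) a * x^2 + b * x + c

ui <- fluidPage(
  sliderInput("a", "a", min = -2, max = 2, value = 1, step = 0.1),
  sliderInput("b", "b", min = -1, max = 1, value = 0, step = 0.1),
  sliderInput("c", "c", min = -2, max = 2, value = 0, step = 0.1),
  plotOutput("quadraticPlot")
)

server <- function(input, output) {
  # Re-runs automatically whenever one of the sliders changes
  output$quadraticPlot <- renderPlot({
    x <- seq(-2, 2, length.out = 500)
    plot(x, quadratic(x, input$a, input$b, input$c),
         xlim = c(-2, 2), ylim = c(-2, 4), type = "l", ylab = "y")
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # uncomment in an interactive session to launch
```
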
## A Not So Minimal Example
\centering \scalebox{1.35}{\includegraphics{graphSample.pdf}}
## Outline
- Introduce graphs
- Introduce igraph
- Introduce Interactivity with Shiny
- **Introduce the diabetes project**
- Demo the diabetes project app
- Offer resources
## Diabetes Project
- Attempting to predict health outcomes for Newfoundlanders suffering from type one diabetes mellitus
- Data from a large cohort of diabetes patients gathered ~10 years ago
- Heterogeneous mix of data sources, types, and completeness
- Lots of data cleaning
## The Data
Three major data sources
>1. Diabetes database \newline
contains information about 631 study participants at the time of study start
>2. Genetics Data \newline
contains genotype markers for 591 study participants (and family members)
>3. 2014 Checkup Database \newline
contains survey data and chart review for ~100 study participants
>- This analysis is only concerned with the individuals for whom we have updated information
>- After cleaning, 300 features exist for the participants
## Analysis Approach
>- For each feature, how well does it correlate with the rest of the features?
\newline
>- Pairwise correlation measures can be treated as a distance measure between features
\newline
>- Correlations can be filtered by significance level
\newline
>- Each significant correlation can be viewed as an edge connecting the two features
\newline\newline
## Creating the Graph
Challenge in going from
### Spread Sheet Representation
```{r}
head(bigtable[25:28,c(1,21,23, 41)])
```
-------------------
\centering \scalebox{1.35}{\includegraphics{graphSample.pdf}}
## Producing the Base Graph
### Convert to a distance matrix
```{r, eval = FALSE}
bt <- pCorrelationMatrix(bigtable)
```
### Convert To Adjacency Matrix
```{r, eval = FALSE}
adjacencyMat <- bt < threshold
```
### Create an Igraph Object
```{r, eval = FALSE}
network <- graph.adjacency(adjacencyMat)
```
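
`pCorrelationMatrix()` is a project helper that is not shown in these slides. As a rough idea of what the three steps might look like end to end, here is a hypothetical stand-in run on the built-in `mtcars` data: the `pairwisePValues()` function, the Pearson test, and the 0.05 cutoff are all assumptions for illustration, and `graph_from_adjacency_matrix()` assumes igraph 1.0 or later.

```r
library(igraph)

# Hypothetical stand-in for pCorrelationMatrix(): p-values of
# pairwise Pearson correlation tests between numeric columns
pairwisePValues <- function(data) {
  vars <- names(data)
  pMat <- matrix(1, nrow = length(vars), ncol = length(vars),
                 dimnames = list(vars, vars))
  for (i in seq_along(vars)) {
    for (j in seq_along(vars)) {
      if (i < j) {
        p <- cor.test(data[[i]], data[[j]])$p.value
        pMat[i, j] <- p
        pMat[j, i] <- p   # the matrix is symmetric
      }
    }
  }
  pMat                    # diagonal stays at 1, so no self-loops later
}

pMat <- pairwisePValues(mtcars[, c("mpg", "wt", "hp", "qsec")])
threshold <- 0.05                 # assumed significance cutoff
adjacencyMat <- pMat < threshold  # TRUE where a correlation is significant

# Multiply by 1 to turn TRUE/FALSE into 1/0 before building the graph
network <- graph_from_adjacency_matrix(adjacencyMat * 1,
                                       mode = "undirected")
```

Each feature becomes a node, and an edge joins two features whenever their correlation passes the cutoff.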
## Converting the Igraph to a data.frame
Create a data.frame of vertices
```{r}
getVertices <- function(graph, vertexNames = NULL){
  # Lay the graph out in 2D (Fruchterman-Reingold)
  vertices <- as.data.frame(layout.fruchterman.reingold(graph))
  names(vertices) <- c("x","y")
  # Default to numeric ids unless names are supplied
  vertices$vertexName <- 1:nrow(vertices)
  if(!is.null(vertexNames)) vertices$vertexName <- vertexNames
  # Point size comes from the "weight" vertex attribute
  vertices$size <- get.vertex.attribute(graph, "weight")
  vertices
}
```
## Converting the Igraph to a data.frame
Create a data.frame of edges
```{r}
getEdges <- function(graph, vertices){
  # Two-column matrix: the vertices at each end of every edge
  edgeLocations <- get.edgelist(graph)
  # Look up the `vertices` rows for both endpoints of each edge
  edgeCoords <- mapply(function(v1,v2){
    c(vertices[v1,], vertices[v2,])
  }, edgeLocations[,1], edgeLocations[,2])
  # Keep x and y of each endpoint; columns 1, 2, 5, 6
  # assume `vertices` has four columns
  edgeFrame <- as.data.frame(t(edgeCoords))[,c(1,2,5,6)]
  edgeFrame[,1:4] <- lapply(edgeFrame[,1:4], as.numeric)
  edgeFrame$weight <- get.edge.attribute(graph, "weight")
  edgeFrame$npo <- get.edge.attribute(graph, "npo")
  names(edgeFrame) <- c("x0", "y0", "x1", "y1", "weight", "npo")
  return(edgeFrame)
}
```
## Do Both and Smoosh 'em Together
```{r}
graph2frame <- function(graph, vertexNames = NULL){
  vertices <- getVertices(graph, vertexNames)
  edges <- getEdges(graph, vertices)
  names(vertices) <- c("x0","y0", "vertexName", "size")
  vertices$x1 <- NA
  vertices$y1 <- NA
  vertices$weight <- NA
  vertices$npo <- NA
  vertices$use <- "vertex"
  edges$vertexName <- NA
  edges$use <- "edge"
  edges$size <- NA
  rbind(vertices, edges)
}
```
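
Data frames like these are convenient for drawing a graph with ggplot2, which is presumably why the conversion is done (that motivation is an assumption; the slides do not say). A self-contained sketch of the idea on a toy ring graph, using igraph 1.0+ function names rather than the helper functions above:

```r
library(igraph)
library(ggplot2)

g <- make_ring(6)
coords <- layout_with_fr(g)            # one (x, y) row per vertex

vertexFrame <- data.frame(x = coords[, 1], y = coords[, 2],
                          vertexName = seq_len(vcount(g)))

# Vertex ids at the two ends of each edge
ends <- as_edgelist(g, names = FALSE)
edgeFrame <- data.frame(
  x0 = coords[ends[, 1], 1], y0 = coords[ends[, 1], 2],
  x1 = coords[ends[, 2], 1], y1 = coords[ends[, 2], 2]
)

# Edges as line segments, vertices as points drawn on top
p <- ggplot() +
  geom_segment(data = edgeFrame,
               aes(x = x0, y = y0, xend = x1, yend = y1)) +
  geom_point(data = vertexFrame, aes(x = x, y = y), size = 4)
```

Printing `p` draws the graph; the layout is randomized, so node positions will differ between runs.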
## Outline
- Introduce graphs
- Introduce igraph
- Introduce Interactivity with Shiny
- Introduce the diabetes project
- **Demo the diabetes project app**
- Offer resources
## The App
## Resources
- [\color{blue}{Igraph}](http://igraph.org/redirect.html)\newline
- [\color{blue}{Ggplot}](http://docs.ggplot2.org/current/)\newline
- [\color{blue}{Shiny}](http://shiny.rstudio.com/)\newline
- [\color{blue}{R Markdown}](http://rmarkdown.rstudio.com/)\newline
- [\color{blue}{Knitr}](http://yihui.name/knitr/)\newline
- [\color{blue}{Datatables for R}](http://rstudio.github.io/DT/)\newline
- [\color{blue}{My Blog!}](http://datamancy.blogspot.ca/)\newline
## Thanks For Having Me
\Large{Any questions?}