Created
September 16, 2019 06:41
-
-
Save ag1805x/89cbc34ca9b20b877a3f070455289906 to your computer and use it in GitHub Desktop.
Using hclust() to cluster RNA-seq samples and visualize dendrogram
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
library(dendextend) | |
countsPC <- read.table("CountsPC_Gene_MultimapOverlap.tsv", header=T, row.names=c(1)) | |
factors <- as.data.frame(read.table("factors.txt", sep="\t", header = TRUE, row.names=c(1))) | |
dist <- dist(t(countsPC), method="euclidean") | |
cluster <- hclust(dist) | |
# main, sub, xlab, ylab | |
dend <- as.dendrogram(cluster) | |
#Add colors to label | |
label_colors_to_use <- as.numeric(factors$SampleType) #Input groups of sample type for label color | |
label_colors_to_use <- label_colors_to_use[order.dendrogram(dend)] #Order group annotation as per in dendrogram | |
#Use to add custom label color | |
for(i in 1:length(label_colors_to_use)){ | |
if(label_colors_to_use[i] == 1){ | |
label_colors_to_use[i] = "blue" | |
}else if(label_colors_to_use[i] == 2){ | |
label_colors_to_use[i] = "red" | |
} | |
} | |
#Add colored bars showing study accession | |
StudyAccession_colors_to_use <- as.numeric(factors$StudyAccession) #Input groups of sample type for label color | |
StudyAccession_colors_to_use <- StudyAccession_colors_to_use[order.dendrogram(dend)] #Order group annotation as per in dendrogram | |
dend <- set(dend, "labels_colors", label_colors_to_use) #Apply colors to labels | |
dend <- hang.dendrogram(dend, hang=-1) #Leaves at same height | |
dend <- set(dend, "branches_lwd", 2) #Thick branches | |
dend <- set(dend, "labels_cex", 1) #label font size | |
tiff(filename="Sample_hclust.tiff", height=10, width=20, units='in', res=300) | |
par(mar = c(10,5,3,1)) | |
plot(dend, main = "Sample Clustering\ndist.method=Eucleadian", ylab = "Height") #Plot the dendrogram | |
colored_bars(colors = StudyAccession_colors_to_use, dend = dend, rowLabels = "Project", sort_by_labels_order = FALSE) | |
dev.off() | |
Thanks for a quick reply, I will try that. for 2) not sure yet how to use that "rowSums", but will search and come back if I'm really stuck.
Just use this on your raw count matrix:
keep <- rowSums(cpm(counts)>1) >1
counts_filt <- counts[keep, ]
After this continue as is with the filtered counts.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
The
factors.txt
file contains sample information. Each row for each sample. Columns for covariate/condition (eg. test or control)etc. As you mentioned, the factors is used for colours here. The same factors file can be used in other steps of analysis including DE testing.This is an issue even I have observed often. May be by reading the original paper more carefully about how they created the plots will help. You can also try the process you mentioned to consider top expressed genes or most variable genes. Instead of using all genes it may be better to filter genes with low/zero counts. eg.
rowSums(cpm(counts_pc)>1) >1