iracooke/README.md

## README.md

      
    Raw
  

              README.md
            
          
    Enrichment analysis with topGO

Input data needed;

A file orf2go.map for your study organism.  It should contain entries like this;

cds.comp1000018_c0_seq1|m.205055	GO:0005618, GO:0016021, GO:0045330, GO:0004857, GO:0030599, GO:0042545
cds.comp1000154_c0_seq1|m.205060	GO:0030136, GO:0031901, GO:0005768


If you don't have an orf2go.map file but you have a Trinity sqlite file you can generate one using trinotate_proteomics as follows;
	orf_to_go.rb Trinotate.sqlite > orf2go.map

A file containing a list of IDs for a specific tissue, or (optionally) with quantitative expression info.

Step 1: Read mappings file

	library(topGO)
	geneID2GO <- readMappings(file = "orf2go.map")
Step2: Create the background set

Note that in this example the file orf2go.map serves 2 purposes. It contains all transcriptID -> GO mappings.  But it also contains a complete list of all the transcriptID's.  This complete list of transcript ID's serves as the background set for the enrichment analysis.  This means that the code to generate the background set is trivial (see below).
	geneNames <- names(geneID2GO)
Step3: Create the sample set

This is the set of ID's in which we will look for enriched GO terms.  This set can be any subset of the background, and the set that is chosen affects the interpretation of the analysis.  In our proteomics analyses we might look at the top 200 most expressed proteins (by iBAQ).  Alternatively the top hits from the differential expression analysis could be used.
	myInterestingGenes=... my code to generate these
It is essential that the items in the myInterestingGenes vector correspond to items within the background ie (geneNames).
	geneList <- factor(as.integer(geneNames %in% myInterestingGenes))
	names(geneList) <- geneNames
Step4: Create GOData object

	GOdata <- new("topGOdata", ontology = "BP", allGenes = geneList,annot = annFUN.gene2GO, gene2GO = geneID2GO)
You might also want to try the other two GO ontologies MF, Molecular Function and CC Cellular Component
Step5: Run analysis

This runs a Fisher's exact test
	resultFis <- runTest(GOdata, algorithm = "classic", statistic = "fisher")
Step6: Look at the results

This creates a plot, but the table allRes is perhaps more useful.
	allRes <- GenTable(GOdata, classic = resultFis,orderBy = "weight", ranksOf = "classic", topNodes = 20)
	showSigOfNodes(GOdata, score(resultFis), firstSigNodes = 5, useInfo = 'all')