R/qtl provides many example data sets in several species and crosses. After loading the package by using either a library() or require() call, these can be loaded into the work space by using the data() function. To load the hyper data set containing measurements on hypertension in mice give the following commands in the R interpreter:
library(qtl) #Load the R/qtl library
data(hyper) #Load the example data set hyper
hyper #Print the data set summary
data(multitrait) #Load the example data set multitrait
multitrait #Print the data set summary
Loading your own data can be done by using the read.cross functionality provided by R/qtl. Try loading the data set on the USB stick into R. Note that the format in which the data is saved is called csvr. Note that we have to explicitly mention the genotype coding used in the file by using the genotypes parameter.
mycross <- read.cross(format="csvr", file="i:/R/dataset.csvr", na.strings="-", genotypes=c("AA","AB"))
mycross #Print the data set summary
R/qtl comes with a range of plotting options the most used are: plot.map(), geno.image(), plot.rf() and plot.scanone(). Plotting a genetic map shows a graphical representation of the location of the markers on the chromosomes. The plot.map() function allows us to input multiple genetic maps to compare against each other. (as long as there are markers occuring in both maps). We use the est.map() function to re-estimate the genetic map of A. thaliana using a different distance measurement.
plot.map(hyper) #Genetic map of M. musculus used in the hyper dataset
plot.map(multitrait) #Genetic map of A. thaliana used in the multitrait dataset
newmap <- est.map(multitrait) #Estimate a new map
plot.map(multitrait, newmap) #Compare with the previous map
Also we can be interested in genotypes, we want to know the amount of missing data and look at the segregation per marker or across chromosomes. Also we can pull out a matrix of genotypes using the pull.geno() function.
geno.image(multitrait)
genotypematrix <- pull.geno(multitrait)
We can fix missing genotype values by multiple imputation or by expectation maximization, this algorithm fills the missing values with the most likely values.
multi_f <- fill.geno(multitrait)
geno.image(multi_f)
We can plot recombination frequencies to detect any seggregation distortion. Furthermore looking at the recombination frequencies gives us an initial idea of the resolution we might obtain in mapping.
plot.rf(multitrait) #Show the recombinations
plot(pull.rf(multitrait,"lod"),"BF.116C") #Show the resolution on marker BF.116C
R/qtl provides many ways of scanning QTL. The user has the option to use an EM algorithm, imputation, Haley-Knott regression, the extended Haley-Knott method, or marker regression. All of these methods can be applied to the cross object using the scanone() function. An example:
result_hk <- scanone(multitrait, pheno.col=1, method= "hk") #Scan the first column using Haley-Knott
result_em <- scanone(multitrait, pheno.col=1, method= "em") #Use expectation maximization
result_imp <- scanone(multitrait, pheno.col=1, method= "imp") #Use imputation
plot(result_hk,result_em,result_imp,lwd=c(4,3,1),col=c("red","green","blue"))
legend("topleft",c("hk","em","imp"),lwd=c(4,3,1),col=c("red","green","blue"))
To determine what LOD scores can be called significant, we use the n.perm parameter of the scanone function:
perm_hk <- scanone(multitrait, pheno.col=1, method= "hk",n.perm=10000)
plot(perm_hk)
summary(perm_hk) #print the thresholds
plot(result_hk)
abline(h=summary(perm_hk)[[1]],col="green",lty=2,lwd=2)
abline(h=summary(perm_hk)[[2]],col="orange",lty=2,lwd=2)
Multiple QTL mapping allows to compensate for known genetic effects, also it provides an unbiased backward selection strategy to select markers most associated with a trait. In most cases MQM cannot handle all markers as cofactors (An overfit of the model because nind() <= cofactors). So as a rule of thumb we can place (# individuals - 20) cofactors, which will be used in backward elimination. Also MQM needs complete genotype information, which one can create using the fill.geno() function of the mqmaugment() function. MQM augmentation makes a dataset unsuitable for usage with scanone, because individuals are augmented (duplicated) inside the cross object.
#Use imputation to fill the missing genotypes
multi_filled <- fill.geno(multitrait)
cofactors <- mqmautocofactors(multi_filled, 40, plot=T)
qtlresult <- mqmscan(multi_filled, cofactors=cofactors, verbose=T, plot=T)
#Use augmentation to fill the missing genotypes
multi_aug <- mqmaugment(multitrait)
geno.image(multi_aug)
nind(multi_aug) #The augmented set contains more individuals then
nind(multitrait) #The unaugmented dataset
qtlresult <- mqmscan(multi_aug, cofactors=cofactors, verbose=T, plot=T)
R/qtl has many additional functions like determining LOD drops, confidence intervals and effect sizes. The help files of these functions provide additional information and an example on how-to use these. A short overview of the functions I use a lot (you can get help in R by using the ? or ??):
effects <- effectscan(sim.geno(multitrait)) #Raw effects on phenotype
lodint(qtlresult, chr=4) #Confidence interval for the QTL on chromosome 4
?calc.genoprob
?sim.geno
Copyright (c) 2010 Danny Arends