DannyArends/RqtlTutorial.md

## RqtlTutorial.md

      
    Raw
  

              RqtlTutorial.md
            
          
    R and R/qtl tutorial

Loading example data

R/qtl provides many example data sets in several species and crosses. After loading the package by using either a library()
or require() call, these can be loaded into the work space by using the data() function. To load the hyper data set containing measurements on hypertension in mice give the following commands in the R interpreter:
    library(qtl)       #Load the R/qtl library
    data(hyper)        #Load the example data set hyper
    hyper              #Print the data set summary
    data(multitrait)   #Load the example data set multitrait
    multitrait         #Print the data set summary
Your own data files

Loading your own data can be done by using the read.cross functionality provided by R/qtl. Try loading the data set on
the USB stick into R. Note that the format in which the data is saved is called csvr. Note that we have to explicitly
mention the genotype coding used in the file by using the genotypes parameter.
    mycross <- read.cross(format="csvr", file="i:/R/dataset.csvr", na.strings="-", genotypes=c("AA","AB"))
    mycross            #Print the data set summary
Plotting data

R/qtl comes with a range of plotting options the most used are: plot.map(), geno.image(), plot.rf() and plot.scanone().
Plotting a genetic map shows a graphical representation of the location of the markers on the chromosomes. The
plot.map() function allows us to input multiple genetic maps to compare against each other. (as long as there are
markers occuring in both maps). We use the est.map() function to re-estimate the genetic map of A. thaliana using a
different distance measurement.
    plot.map(hyper)                #Genetic map of M. musculus used in the hyper dataset
    plot.map(multitrait)           #Genetic map of A. thaliana used in the multitrait dataset
    newmap <- est.map(multitrait)  #Estimate a new map
    plot.map(multitrait, newmap)   #Compare with the previous map
Also we can be interested in genotypes, we want to know the amount of missing data and look at the segregation per
marker or across chromosomes. Also we can pull out a matrix of genotypes using the pull.geno() function.
  geno.image(multitrait)
  genotypematrix <- pull.geno(multitrait)
We can fix missing genotype values by multiple imputation or by expectation maximization, this algorithm fills the
missing values with the most likely values.
    multi_f <- fill.geno(multitrait)
    geno.image(multi_f)
We can plot recombination frequencies to detect any seggregation distortion. Furthermore looking at the
recombination frequencies gives us an initial idea of the resolution we might obtain in mapping.
    plot.rf(multitrait)                        #Show the recombinations
    plot(pull.rf(multitrait,"lod"),"BF.116C")  #Show the resolution on marker BF.116C
Scanning for QTL

R/qtl provides many ways of scanning QTL. The user has the option to use an EM algorithm, imputation,
Haley-Knott regression, the extended Haley-Knott method, or marker regression. All of these methods
can be applied to the cross object using the scanone() function. An example:
    result_hk <- scanone(multitrait, pheno.col=1, method= "hk")     #Scan the first column using Haley-Knott
    result_em <- scanone(multitrait, pheno.col=1, method= "em")     #Use expectation maximization
    result_imp <- scanone(multitrait, pheno.col=1, method= "imp")   #Use imputation
    plot(result_hk,result_em,result_imp,lwd=c(4,3,1),col=c("red","green","blue"))
    legend("topleft",c("hk","em","imp"),lwd=c(4,3,1),col=c("red","green","blue"))
To determine what LOD scores can be called significant, we use the n.perm parameter of the scanone function:
    perm_hk <- scanone(multitrait, pheno.col=1, method= "hk",n.perm=10000)
    plot(perm_hk)
    summary(perm_hk)  #print the thresholds
    plot(result_hk)
    abline(h=summary(perm_hk)[[1]],col="green",lty=2,lwd=2)
    abline(h=summary(perm_hk)[[2]],col="orange",lty=2,lwd=2)
Multiple QTL Mapping

Multiple QTL mapping allows to compensate for known genetic effects, also it provides an unbiased backward
selection strategy to select markers most associated with a trait. In most cases MQM cannot handle all
markers as cofactors (An overfit of the model because nind() <= cofactors). So as a rule of thumb we can
place (# individuals - 20) cofactors, which will be used in backward elimination. Also MQM needs complete
genotype information, which one can create using the fill.geno() function of the mqmaugment() function.
MQM augmentation makes a dataset unsuitable for usage with scanone, because individuals are augmented
(duplicated) inside the cross object.
    #Use imputation to fill the missing genotypes
    multi_filled <- fill.geno(multitrait)
    cofactors <- mqmautocofactors(multi_filled, 40, plot=T)
    qtlresult <- mqmscan(multi_filled, cofactors=cofactors, verbose=T, plot=T)
    
    #Use augmentation to fill the missing genotypes
    multi_aug <- mqmaugment(multitrait)
    geno.image(multi_aug)
    nind(multi_aug)   #The augmented set contains more individuals then
    nind(multitrait)  #The unaugmented dataset
    qtlresult <- mqmscan(multi_aug, cofactors=cofactors, verbose=T, plot=T)
And even more

R/qtl has many additional functions like determining LOD drops, confidence intervals and effect sizes. The
help files of these functions provide additional information and an example on how-to use these. A short
overview of the functions I use a lot (you can get help in R by using the ? or ??):
effects <- effectscan(sim.geno(multitrait))    #Raw effects on phenotype
lodint(qtlresult, chr=4)                       #Confidence interval for the QTL on chromosome 4
?calc.genoprob
?sim.geno

Disclaimer

Copyright (c) 2010 Danny Arends