Skip to content

Instantly share code, notes, and snippets.

@DannyArends
Created February 21, 2012 15:13
Show Gist options
  • Star 3 You must be signed in to star a gist
  • Fork 2 You must be signed in to fork a gist
  • Save DannyArends/1876977 to your computer and use it in GitHub Desktop.
Save DannyArends/1876977 to your computer and use it in GitHub Desktop.
Small demo tutorial for R/qtl

R and R/qtl tutorial

Loading example data

R/qtl provides many example data sets in several species and crosses. After loading the package by using either a library() or require() call, these can be loaded into the work space by using the data() function. To load the hyper data set containing measurements on hypertension in mice give the following commands in the R interpreter:

    library(qtl)       #Load the R/qtl library
    data(hyper)        #Load the example data set hyper
    hyper              #Print the data set summary
    data(multitrait)   #Load the example data set multitrait
    multitrait         #Print the data set summary

Your own data files

Loading your own data can be done by using the read.cross functionality provided by R/qtl. Try loading the data set on the USB stick into R. Note that the format in which the data is saved is called csvr. Note that we have to explicitly mention the genotype coding used in the file by using the genotypes parameter.

    mycross <- read.cross(format="csvr", file="i:/R/dataset.csvr", na.strings="-", genotypes=c("AA","AB"))
    mycross            #Print the data set summary

Plotting data

R/qtl comes with a range of plotting options the most used are: plot.map(), geno.image(), plot.rf() and plot.scanone(). Plotting a genetic map shows a graphical representation of the location of the markers on the chromosomes. The plot.map() function allows us to input multiple genetic maps to compare against each other. (as long as there are markers occuring in both maps). We use the est.map() function to re-estimate the genetic map of A. thaliana using a different distance measurement.

    plot.map(hyper)                #Genetic map of M. musculus used in the hyper dataset
    plot.map(multitrait)           #Genetic map of A. thaliana used in the multitrait dataset
    newmap <- est.map(multitrait)  #Estimate a new map
    plot.map(multitrait, newmap)   #Compare with the previous map

Also we can be interested in genotypes, we want to know the amount of missing data and look at the segregation per marker or across chromosomes. Also we can pull out a matrix of genotypes using the pull.geno() function.

  geno.image(multitrait)
  genotypematrix <- pull.geno(multitrait)

We can fix missing genotype values by multiple imputation or by expectation maximization, this algorithm fills the missing values with the most likely values.

    multi_f <- fill.geno(multitrait)
    geno.image(multi_f)

We can plot recombination frequencies to detect any seggregation distortion. Furthermore looking at the recombination frequencies gives us an initial idea of the resolution we might obtain in mapping.

    plot.rf(multitrait)                        #Show the recombinations
    plot(pull.rf(multitrait,"lod"),"BF.116C")  #Show the resolution on marker BF.116C

Scanning for QTL

R/qtl provides many ways of scanning QTL. The user has the option to use an EM algorithm, imputation, Haley-Knott regression, the extended Haley-Knott method, or marker regression. All of these methods can be applied to the cross object using the scanone() function. An example:

    result_hk <- scanone(multitrait, pheno.col=1, method= "hk")     #Scan the first column using Haley-Knott
    result_em <- scanone(multitrait, pheno.col=1, method= "em")     #Use expectation maximization
    result_imp <- scanone(multitrait, pheno.col=1, method= "imp")   #Use imputation
    plot(result_hk,result_em,result_imp,lwd=c(4,3,1),col=c("red","green","blue"))
    legend("topleft",c("hk","em","imp"),lwd=c(4,3,1),col=c("red","green","blue"))

To determine what LOD scores can be called significant, we use the n.perm parameter of the scanone function:

    perm_hk <- scanone(multitrait, pheno.col=1, method= "hk",n.perm=10000)
    plot(perm_hk)
    summary(perm_hk)  #print the thresholds
    plot(result_hk)
    abline(h=summary(perm_hk)[[1]],col="green",lty=2,lwd=2)
    abline(h=summary(perm_hk)[[2]],col="orange",lty=2,lwd=2)

Multiple QTL Mapping

Multiple QTL mapping allows to compensate for known genetic effects, also it provides an unbiased backward selection strategy to select markers most associated with a trait. In most cases MQM cannot handle all markers as cofactors (An overfit of the model because nind() <= cofactors). So as a rule of thumb we can place (# individuals - 20) cofactors, which will be used in backward elimination. Also MQM needs complete genotype information, which one can create using the fill.geno() function of the mqmaugment() function. MQM augmentation makes a dataset unsuitable for usage with scanone, because individuals are augmented (duplicated) inside the cross object.

    #Use imputation to fill the missing genotypes
    multi_filled <- fill.geno(multitrait)
    cofactors <- mqmautocofactors(multi_filled, 40, plot=T)
    qtlresult <- mqmscan(multi_filled, cofactors=cofactors, verbose=T, plot=T)
    
    #Use augmentation to fill the missing genotypes
    multi_aug <- mqmaugment(multitrait)
    geno.image(multi_aug)
    nind(multi_aug)   #The augmented set contains more individuals then
    nind(multitrait)  #The unaugmented dataset
    qtlresult <- mqmscan(multi_aug, cofactors=cofactors, verbose=T, plot=T)

And even more

R/qtl has many additional functions like determining LOD drops, confidence intervals and effect sizes. The help files of these functions provide additional information and an example on how-to use these. A short overview of the functions I use a lot (you can get help in R by using the ? or ??):

effects <- effectscan(sim.geno(multitrait))    #Raw effects on phenotype
lodint(qtlresult, chr=4)                       #Confidence interval for the QTL on chromosome 4
?calc.genoprob
?sim.geno

Disclaimer

Copyright (c) 2010 Danny Arends

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment