benmarwick/AU13_ARCHY_482_Geoarch.R

## AU13_ARCHY_482_Geoarch.R
## get data from google sheet
# connect to google sheet
require(RCurl) # if you get an error message here that says something like 'there is no package called 'RCurl''
# then you need to install the package by typing this into the console (and then hitting enter): install.packages("RCurl")
# wait for the package to download and install, then run line 3 again before proceeding.
options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE))
# in google spreadsheet, go to file-> publish to web -> get link to publish to web -> get csv file
goog <- "https://docs.google.com/spreadsheet/pub?key=0As7CmPqGXTzldE5UV3FOZTRlYy10ZF9JOWcxYy1tSGc&single=true&gid=0&output=csv"
# assign data from google sheet to R object
data <- read.csv(textConnection(getURL(goog)), stringsAsFactors = FALSE)

###########################################################################
############### Exploring the pH and EC data ################################

# let's just make a nice small table of pH and EC data
# since we have a complete set of those now
pH_EC <- data[,c('Sample.ID',
                 'mean.pH',
                 'mean.EC',
                 'Group.members..initials')]
# we've got a lot of rows at the bottom with NA in them, let's exclude those
pH_EC <- na.omit(pH_EC)

# Now we're ready for some exploratory data analysis! We'll go through
# bunch of simple analysis that will help us understand the data
# better than what we get from just scanning the column of numbers
# Some of the results of these analysis will be details you'll want
# to include in your lab report (and some wont). As you read for the
# Wednesday seminars you should be looking to see what the norms are
# for data display and reporting, so you can adhere to these norms in
# your work.

# here's a simple function to compute some common summary statistics
summary(pH_EC)
# But we see a small problem, the pH and EC variables are
# formatted as 'character' rather than 'numeric', that's easy
# to fix...
pH_EC$mean.pH <- as.numeric(pH_EC$mean.pH)
pH_EC$mean.EC <- as.numeric(pH_EC$mean.EC)
# try the simple function again
summary(pH_EC)

# It's good practise during exploratory data analysis do visualise the
# data to see the shapes of the distributions in case anything
# in ususual or worthy of further investigation. So let's do some plotting

# here's a histogram showing the distribution of pH values
# first load the plotting library
require(ggplot2) # if you get an error, see line 3 and adapt those instructions
# now make the plot
ggplot(pH_EC, aes(mean.pH)) +
  geom_histogram()

# Now write your own code to make a histogram of EC values. Start by
# copying and pasting the code for the pH histogram and then
# modifying that.

# So now we've got a good sense of the distributions of those
# variables, let's see how they change over time in our site

require(rioja) # if you get an error message, see line 3 and adapt
     strat.plot(pH_EC[,-1],
                yvar = pH_EC$Sample.ID,
                y.rev = TRUE,
                ylabel = "Depth below surface (m)",
                col.line = "black", # try others that you like
                lwd.line = 1 )      # ditto
dev.off() # clear the plot

# the resulting plot gives us a good idea about
# the trends over time in the pH and EC variables
# The first EC measurement looks like quite an outlier
# so let's exlcude it and plot ...

strat.plot(pH_EC[-1,2:3],
           yvar = pH_EC$Sample.ID[-1],
           y.rev = TRUE,
           ylabel = "Depth below surface (m)",
           col.line = "black", # try others that you like
           lwd.line = 1 )      # ditto
dev.off() # clear the plot

# One thing that looks a bit odd about this plot is
# the spikey shape of the curve... like a regular
# rhythm of something happening to our samples
# One hypothesis we can propose about this is
# that there is a systematic error in our analysis
# Perhaps one group is consistently different from
# the other groups? Let's test that hypothesis!

# Let's plot the values by each lab group
# starting with EC, which looks relatively flat
# over time...
ggplot(pH_EC, aes(Group.members..initials, mean.EC)) +
  geom_boxplot()
# There does seem to be some difference between groups
# Let's explore this more formally with a statistical test

# One-Way Permutation Test based on
# 9999 Monte-Carlo resamplings.
require(coin) # you'll get an error... see line 3...
oneway_test(mean.EC ~ as.factor(Group.members..initials), data = pH_EC,
            distribution = approximate(B = 9999))
# conventionally we reject the null hypothesis of 'no difference
# between groups' if the p-value is less than 0.05. That is, if
# p-value < 0.05 then we say YES there IS a signficant difference
# between the groups.

# A bit more detail:
# p-values tells you the probability of the difference
# in a collection of random data in which you'd
# have no reason to suspect any relationship. A
# p-value of 0.05 indicates a 5% chance that
# results you are seeing would have come up in
# a random dataset, so you can say with a 95%
# probability of being correct that you are
# observing a real relationship in your data.
# Scholarly convention tends to hold p values
# of less than 0.05 as indicators of a
# statistically signficant relationship. Above
# 0.05 is not significant.
#
# Note that the size of the p-value in your
# test says nothing about the
# ssize of the difference between the groups
# - it is possible to have a
# highly significant result (very small p-value,
# eg. p = 0.001) for a miniscule effect (ie.
# a difference very close to zero) that might not
# be of any real-world importance or archaeological
# meaning. Sample size is an important key to making sense
# of the p-values here.
#
# You need to use your common sense
# when interpreting your statistical data; don't
# just mindlessly copy and paste them and get
# excited when p-values
# are small. Ensure that you're only
# getting excited about data that make
# sense in the real world.


# Are the range of measurements for each sample greater
# than the differences between groups?
data <- data.frame(apply(data, 2, function(x) as.numeric(as.character(x))))[1:41,]
# pH repeated measurement ranges
apply(data[,7:8], 1, max)  - apply(data[,7:8], 1, min)
# EC repeated measurement ranges
ECrep <- data.frame(rep = apply(data[,11:13], 1, max)  - apply(data[,11:13], 1, min))
# plot
ggplot(ECrep, aes(rep)) +
  geom_histogram()

## For calibration of differenct pH/EC meters
# which sample is the max EC value for each group?
library(plyr)
f <- function(x) x[which.max( x[["mean.EC"]] ), ]
## x will be a subset of Dat according to ID
ddply(pH_EC, "Group.members..initials", f)

# which sample is the max EC value for each group?
f <- function(x) x[which.min( x[["mean.EC"]] ), ]
## x will be a subset of Dat according to ID
ddply(pH_EC, "Group.members..initials", f)

###########################################################################
############### Exploring the Mag sus data ################################

# let's just make a nice small table of pH, EC and mag sus data
# since we have a complete set of those now
pH_EC_ms <- data[,c('Sample.ID',
                 'mean.pH',
                 'mean.EC',
                 'MS.LF.reading.average.mass.susceptibility',
                 'MS.FD',
                 'Group.members..initials')]
# we've got a lot of rows at the bottom with NA in them, let's exclude those
# pH_EC_ms <- na.omit(pH_EC_ms)
# replace google docs' '#DIV/0!' with something that R understands
pH_EC_ms[ pH_EC_ms == '#DIV/0!']  = NA
# convert mag sus data type from character to numeric so we
# can do calculations with it
pH_EC_ms$MS.LF.reading.average.mass.susceptibility <- as.numeric(as.character(pH_EC_ms$MS.LF.reading.average.mass.susceptibility))
pH_EC_ms$MS.FD <- as.numeric(as.character(pH_EC_ms$MS.FD))

# rename this variable so it's not such a mouthful
names(pH_EC_ms)[names(pH_EC_ms) == 'MS.LF.reading.average.mass.susceptibility'] <- 'mean.MS.LF'
names(pH_EC_ms)[names(pH_EC_ms) == 'MS.FD'] <- 'mean.MS.FD'

# Now for some Exploratory Data Analysis (this is an actual thing, often referred
# to as EDA: http://en.wikipedia.org/wiki/Exploratory_data_analysis pioneered by
# one of my heros, John Tukey)
summary(pH_EC_ms)

# quick visual inpsection by plot
require(ggplot2)
ggplot(pH_EC_ms, aes(mean.MS.LF)) +
  geom_histogram()

require(rioja)
strat.plot(pH_EC_ms[,-1],
           yvar = pH_EC_ms$Sample.ID,
           y.rev = TRUE,
           ylabel = "Depth below surface (m)",
           col.line = "black", # try others that you like
           lwd.line = 1 )

# biplot of LF and %FD, using sample IDs as data points
# to aid reading of the plot
ggplot(pH_EC_ms, aes(mean.MS.LF, mean.MS.FD)) +
  geom_text(aes(label = Sample.ID))

# change point detection - Binary segmentation (most common method)
require(changepoint)
data_change <- cpt.mean(pH_EC_ms$mean.pH, method='BinSeg')
data_change <- cpt.meanvar(pH_EC_ms$mean.pH,test.stat='Normal',method='PELT', penalty="BIC")
# plot results
plot(data_change, type='l', cpt.col='red', xlab='Sample',
     ylab='Average MS', cpt.width=2, main="Changepoints in Magnetic Susceptibility values")

# change point detection - Bayesian
require(bcp)
data_bcp <- bcp(pH_EC_ms$mean.pH)
plot(data_bcp)

###########################################################################
############### Exploring the LOI data ####################################

## LOI data

pH_EC_ms_LOI <- data[,c('Sample.ID',
                    'mean.pH',
                    'mean.EC',
                    'MS.LF.reading.average.mass.susceptibility',
                    'MS.FD',
                    'LOI.percent.organic.material..Average',
                    'LOI.Percent.carbonate.content..Average',
                    'Group.members..initials')]
# replace google docs' '#DIV/0!' with something that R understands
pH_EC_ms_LOI[ pH_EC_ms_LOI == '#DIV/0!']  = NA
# convert data types from character to numeric so we
# can do calculations with it, do this as a batch
pH_EC_ms_LOI[,1:(length(pH_EC_ms_LOI)-1)] <- apply(pH_EC_ms_LOI[,1:(length(pH_EC_ms_LOI))-1], 2, function(x) as.numeric(as.character(x)))

# rename some variables so it's not such a mouthful
names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'MS.LF.reading.average.mass.susceptibility'] <- 'mean.MS.LF'
names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'MS.FD'] <- 'mean.MS.FD'
names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'LOI.percent.organic.material..Average'] <- 'mean.Organic'
names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'LOI.Percent.carbonate.content..Average'] <- 'mean.CaCO3'

# You know the drill... let's do some EDA..
summary(pH_EC_ms_LOI)

# quick visual inpsection by histogram
require(ggplot2)
ggplot(pH_EC_ms_LOI, aes(mean.Organic)) +
  geom_histogram()

require(ggplot2)
ggplot(pH_EC_ms_LOI, aes(mean.CaCO3)) +
  geom_histogram()

dev.off()

require(rioja)
strat.plot(pH_EC_ms_LOI[,c(-1, -ncol(pH_EC_ms_LOI))],
           yvar = pH_EC_ms_LOI$Sample.ID,
           y.rev = TRUE,
           ylabel = "Depth below surface (m)",
           col.line = "black", # try others that you like
           lwd.line = 1 )

#### slightly more fancy plot ################
# with dendrogram showing groups of similar samples
?chclust # to get information how the clustering is done
# We will add a dendrogram from constrained cluster analysis
# to help identify groups of similar samples. We might conclude
# that similar samples represent similar environments of deposition

# We'll use 'coniss', a stratigraphically constrained cluster
# analysis by method of incremental sum of squares
diss <- dist(pH_EC_ms_LOI[,c(-1, -ncol(pH_EC_ms_LOI))])
clust <- chclust(diss, method = "coniss")
# broken stick model to suggest significant zones, 3?
bstick(clust) # look for a sharp elbow, that's the ideal number of clusters

# Now, plot with clusters, you might want to include this one in your report
x <- strat.plot(pH_EC_ms_LOI[,c(-1, -ncol(pH_EC_ms_LOI))],
                yvar = pH_EC_ms_LOI$Sample.ID,
                y.rev = TRUE,
                ylabel = "Depth below surface (m)",
                col.line = "black", # try others that you like
                lwd.line = 1,  	# ditto
                clust = clust)
# add lines to indicate clusters, leave out if it looks odd
addClustZone(x, clust, 3, col = "red")

##########################################################
##########################################################
# particle size data

# subset just the sieve and pipette data
psd <- data[, names(data) %in% c("Sample.ID",
"percent.mass.of.total.used.for.pipetting.in.5.6.mm.screen..g.",
"percent.mass.of.total.used.for.pipetting.in.4.75.mm.screen..g.",
"percent.mass.of.total.used.for.pipetting.in.2.8.mm.screen..g.",
"percent.mass.of.total.used.for.pipetting.2.33.mm.screen..g.",
"percent.mass.of.total.used.for.pipetting.in.2.00.mm.screen..g.",
"percent.mass.of.total.used.for.pipetting.in.1.18.mm.screen..g." ,
"percent.mass.of.total.used.for.pipetting.in.1.00.mm.screen..g."  ,
"percent.mass.of.total.used.for.pipetting.in.800.um.screen..g."    ,
"percent.mass.of.total.used.for.pipetting.in.710.um.screen..g."     ,
"percent.mass.of.total.used.for.pipetting.in.500.um.screen..g.",
"percent.mass.of.total.used.for.pipetting.in.425.um.screen..g." ,
"percent.mass.of.total.used.for.pipetting.in.355.um.screen..g."  ,
"percent.mass.of.total.used.for.pipetting.in.250.um.screen..g."   ,
"percent.mass.of.total.used.for.pipetting.in.125.um.screen..g."    ,
"percent.mass.of.fraction.62.5...31.3.um",
"percent.mass.of.fraction.31.3...15.6.um" ,
"percent.mass.of.fraction.15.6...7.8.um"   ,
"percent.mass.of.fraction..7.8...3.9.um"    ,
"percent.mass.of.fraction.3.9...1.9.um"      ,
"percent.mass.of.fraction.finer.than.2.um")]


# replace google docs' '#DIV/0!' with something that R understands
psd[ psd == '#DIV/0!']  = NA
# convert data types from character to numeric so we
# can do calculations with it, do this as a batch
psd[,1:(length(psd))] <- apply(psd[,1:(length(psd))], 2, function(x) as.numeric(as.character(x)))

# get sizes from column names
# psd_size <- as.numeric(unlist(regmatches(names(psd),gregexpr("[[:digit:]]+\\.*[[:digit:]]*",names(psd)))))
# standardise units to um
# psd_size <- ifelse(psd_size > 50,  psd_size, psd_size * 1000)
psd_size <- c(62.5, 31.3, 15.6, 7.8, 3.9, 2, 5600, 4750, 2800,
2300, 2000, 1180, 1000, 800, 710, 500, 425, 355, 250, 125)

# tranpose dataframe so we can have a column of size
psd_t <- data.frame(t(psd))
colnames(psd_t) <- psd$Sample.ID
psd_t  <- data.frame(psd_t[-1,])
psd_t$size <- psd_size

# plot of particle size distribution per sample
require(reshape2)
psd_m <- melt(psd_t, id.var = 'size')

# plot all samples (quite slow!)
xaxbreaks <-   c(4000, 2000, 1000, 500, 250, 125, 63, 31, 16, 8, 4, 2, 0.5, 0.12, 0.04)
require(ggplot2)
ggplot(psd_m, aes(size, value)) +
  geom_bar(stat="identity") +
  scale_x_log10(breaks = xaxbreaks, labels = xaxbreaks) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  facet_wrap(~ variable) +
  ylab("percentage mass")

# plot only one sample
psd_m_1 <- psd_m[psd_m$variable == 'X3.5',] # subset sample 3.5, for example, you can change it!
xaxbreaks <-   c(4000, 2000, 1000, 500, 250, 125, 63, 31, 16, 8, 4, 2, 0.5, 0.12, 0.04)
require(ggplot2)
ggplot(psd_m_1, aes(size, value)) +
  geom_bar(stat="identity") +
  scale_x_log10(breaks = xaxbreaks, labels = xaxbreaks) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))  +
  ylab("percentage mass")


# combine variables into three groups: sand, silt and clay
# using Wentworth classes...
# sand: 2000 - 62.5 um
# silt: 62.5 - 4 um
# clay: < 4 um

sand <- colSums(psd_t[ psd_t$size >= 62.5 & psd_t$size < 2000, ] )
silt <- colSums(psd_t[ psd_t$size >= 4 & psd_t$size < 62.5, ] )
clay <- colSums(psd_t[ psd_t$size >= 0 & psd_t$size < 4, ] )

# combine into one data frame
three_classes <- data.frame(CLAY = clay,
                            SILT = silt,
                            SAND = sand)

# remove size row
three_classes <- three_classes[-nrow(three_classes),]

# clean up row names
rownames(three_classes) <- as.numeric(unlist(regmatches(rownames(three_classes),gregexpr("[[:digit:]]+\\.*[[:digit:]]*",rownames(three_classes)))))

# remove NAs
three_classes <- na.omit(three_classes)

# plot samples in ternary plot
# see here for more details:
# http://cran.r-project.org/web/packages/soiltexture/vignettes/soiltexture_vignette.pdf
require(soiltexture)
dev.off() # clear plotting area
# draw empty triangle, just for fun
TT.plot(
  class.sys = "USDA.TT", # "AU.TT" for Australia
  class.p.bg.col = FALSE
) #
# draw triangle with samples plotted
# Display the USDA texture triangle:
geo <- TT.plot(class.sys="USDA.TT")
# Display the text
TT.text(
  tri.data = three_classes,
  geo = geo,
  labels =  rownames(three_classes),
  font = 2,
  col = "blue",
  tri.sum.tst = FALSE
) #

# stratigraphic plot
# change col names to be more pretty
names(three_classes) <- tolower(names(three_classes))
# put cols in a logical order
three_classes <- three_classes[c("sand", "silt", "clay")]
dev.off() # clear plotting area
require(rioja)
strat.plot(three_classes,
           yvar = as.numeric(row.names(three_classes)),
           y.rev = TRUE,
           ylabel = "Depth below surface (m)",
           col.line = "black", # try others that you like
           lwd.line = 1,
           scale.percent=TRUE)


#### slightly more fancy plot: with dendrogram showing groups of similar samples
?chclust # to get information how the clustering is done
# add a dendrogram from constrained cluster analysis
# coniss is a stratigraphically constrained cluster analysis by
# method of incremental sum of squares
diss <- dist(three_classes)
clust <- chclust(diss, method = "coniss")
# broken stick model to suggest significant zones, 3?
bstick(clust)
# plot with clusters
x <- strat.plot(three_classes,
                yvar = as.numeric(row.names(three_classes)),
                y.rev = TRUE,
                ylabel = "Depth below surface (m)",
                col.line = "black", # try others that you like
                lwd.line = 1,  	# ditto
                clust = clust,
)
# add lines to indicate clusters, leave out if it looks odd
addClustZone(x, clust, 3, col = "red")
dev.off() # clear that plot

## Lab_Report_template.Rmd
Title of your lab report
========================================================


## Introduction

You are going to write your lab report as an R Markdown document. Markdown is a simple formatting syntax that is used to write plain text documents that can be created and edited in many different programs and easily converted into HTML and PDF. You should open and edit this document in RStudio, which has many useful features for writing markdown. To learn more, click the **Help** toolbar button for more details on using R Markdown. It should take you five minutes to learn all about Markdown that you need to know to write your lab report. If you want to spend more than five minutes, you can read the detailed syntax [here](http://daringfireball.net/projects/markdown/syntax). When you click the **Knit HTML** button in RStudio, a web page will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

```{r, echo = FALSE}
# get a summary of some built-in data about cars
summary(cars)
```

## Background

The reason why we are using this exotic format for our lab report is to learn about reproducible research. Reproducible research is an approach to science  that connects specific instructions to data analysis and empirical data so that scholarship can be recreated, better understood and verified. In practice, this means that all the results from a publication or report can be reproduced with the code and data available online. The advantages of this approach include:

* We can automatically regenerate documents when our code, data, or assumptions change. For example if we do further analysis of our samples we can just add it on to this report.
* We minimize or eliminate transposition errors that occur when copying results (eg from Excel) into documents.
* We preserve a detailed preserve contextual narrative about why analysis was performed in a certain fashion (ie. we don't have to search on our computer for all the files because everything is one file)
* Documentation for the analytic and computational processes from which conclusions are drawn. In case anyone wants to know _all_ the details of our statistical methods, they can see them all in the code blocks here.

Our specific tool-kit for reproducible research is a very common one: R + RStudio + Markdown. Although it's common, it has a bit of a steep learning curve. There's a lot of documentation and examples on the web that can ease the frustration. You may never (want to) write another document using this tool-kit ever again, but I think it's important that you have an understanding of this approach to doing science, in the hope that some of the ideas will rub off to improve any kind of empirical work that you might do in the future. This approach to doing science is becoming widespread in computer science and some areas of biology and psychology. I expect it's part of a general shift in the way all sciences will be done, and will come to archaeology eventually.

## Methods and Materials

### Chemical analyses

Brief description of how you measured pH, EC, SOM , CaCO~3~...

### Physical analyses

Brief description of how you measured colour, magnetic suceptibility and particle size distributions...

## Results

```{r input-all-data, echo = FALSE, message = FALSE, cache=TRUE}
### This chunk pulls in all our data from google sheet ###
# It shoult be located at the very start of the results section, since all of the
# later chunks depend on it.


# connect to google sheet
require(RCurl) # if you get an error message here that says something like 'there is no package called 'RCurl''
# then you need to install the package by typing this into the console (and then hitting enter): install.packages("RCurl")
# wait for the package to download and install, then run line 3 again before proceeding.
options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE))
# in google spreadsheet, go to file-> publish to web -> get link to publish to web -> get csv file
goog <- "https://docs.google.com/spreadsheet/pub?key=0As7CmPqGXTzldE5UV3FOZTRlYy10ZF9JOWcxYy1tSGc&single=true&gid=0&output=csv"
# assign data from google sheet to R object
data <- read.csv(textConnection(getURL(goog)), stringsAsFactors = FALSE)

##########################################################
# pH, EC, LOI and Mag sus data

# let's just make a nice small table of pH, EC, mag sus & LOI data
# since we have a complete set of those now
pH_EC_ms_LOI <- data[,c('Sample.ID',
                    'mean.pH',
                    'mean.EC',
                    'MS.LF.reading.average.mass.susceptibility',
                    'MS.FD',
                    'LOI.percent.organic.material..Average',
                    'LOI.Percent.carbonate.content..Average',
                    'Group.members..initials')]
# replace google docs' '#DIV/0!' with something that R understands
pH_EC_ms_LOI[ pH_EC_ms_LOI == '#DIV/0!']  = NA
# convert data types from character to numeric so we
# can do calculations with it, do this as a batch
pH_EC_ms_LOI[,1:(length(pH_EC_ms_LOI)-1)] <- apply(pH_EC_ms_LOI[,1:(length(pH_EC_ms_LOI))-1], 2, function(x) as.numeric(as.character(x)))
# rename some variables so it's not such a mouthful
names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'MS.LF.reading.average.mass.susceptibility'] <- 'mean.MS.LF'
names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'MS.FD'] <- 'mean.MS.FD'
names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'LOI.percent.organic.material..Average'] <- 'mean.Organic'
names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'LOI.Percent.carbonate.content..Average'] <- 'mean.CaCO3'


##########################################################
# particle size data

# subset just the sieve and pipette data
psd <- data[, names(data) %in% c("Sample.ID",
"percent.mass.of.total.used.for.pipetting.in.5.6.mm.screen..g.",
"percent.mass.of.total.used.for.pipetting.in.4.75.mm.screen..g.",
"percent.mass.of.total.used.for.pipetting.in.2.8.mm.screen..g.",
"percent.mass.of.total.used.for.pipetting.2.33.mm.screen..g.",
"percent.mass.of.total.used.for.pipetting.in.2.00.mm.screen..g.",
"percent.mass.of.total.used.for.pipetting.in.1.18.mm.screen..g." ,
"percent.mass.of.total.used.for.pipetting.in.1.00.mm.screen..g."  ,
"percent.mass.of.total.used.for.pipetting.in.800.um.screen..g."    ,
"percent.mass.of.total.used.for.pipetting.in.710.um.screen..g."     ,
"percent.mass.of.total.used.for.pipetting.in.500.um.screen..g.",
"percent.mass.of.total.used.for.pipetting.in.425.um.screen..g." ,
"percent.mass.of.total.used.for.pipetting.in.355.um.screen..g."  ,
"percent.mass.of.total.used.for.pipetting.in.250.um.screen..g."   ,
"percent.mass.of.total.used.for.pipetting.in.125.um.screen..g."    ,
"percent.mass.of.fraction.62.5...31.3.um",
"percent.mass.of.fraction.31.3...15.6.um" ,
"percent.mass.of.fraction.15.6...7.8.um"   ,
"percent.mass.of.fraction..7.8...3.9.um"    ,
"percent.mass.of.fraction.3.9...1.9.um"      ,
"percent.mass.of.fraction.finer.than.2.um")]

# replace google docs' '#DIV/0!' with something that R understands
psd[ psd == '#DIV/0!']  = NA
# convert data types from character to numeric so we
# can do calculations with it, do this as a batch
psd[,1:(length(psd))] <- apply(psd[,1:(length(psd))], 2, function(x) as.numeric(as.character(x)))

# get sizes from column names
# psd_size <- as.numeric(unlist(regmatches(names(psd),gregexpr("[[:digit:]]+\\.*[[:digit:]]*",names(psd)))))
# standardise units to um
# psd_size <- ifelse(psd_size > 50,  psd_size, psd_size * 1000)
psd_size <- c(62.5, 31.3, 15.6, 7.8, 3.9, 2, 5600, 4750, 2800,
2300, 2000, 1180, 1000, 800, 710, 500, 425, 355, 250, 125)

# tranpose dataframe so we can have a column of size
psd_t <- data.frame(t(psd))
colnames(psd_t) <- psd$Sample.ID
psd_t  <- data.frame(psd_t[-1,])
psd_t$size <- psd_size
require(reshape2)
psd_m <- melt(psd_t, id.var = 'size')
```

Your one or two sentence summary observations of the most striking changes in the sedimentary sequence at the site...

### Chemical analyses

Now the details... pH, EC, SOM, CaCO~3~ e.g. The pH values ranged from `r max(pH_EC_ms_LOI$mean.pH)` to `r min(pH_EC_ms_LOI$mean.pH)`...

### Physical analyses

Particle size distributions...

```{r psd-plot1, echo=FALSE, message=FALSE, warning=FALSE, fig.cap='this is one plot'}
# plot particle size distributions for all samples (quite slow!)
xaxbreaks <-   c(4000, 2000, 1000, 500, 250, 125, 63, 31, 16, 8, 4, 2, 0.5, 0.12, 0.04)
require(ggplot2)
ggplot(psd_m, aes(size, value)) +
  geom_bar(stat="identity") +
  scale_x_log10(breaks = xaxbreaks, labels = xaxbreaks) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  facet_wrap(~ variable) +
  ylab("percentage mass")
```


```{r psd-plot2, echo=FALSE, message=FALSE, fig.cap='this is one plot'}
# plot only one sample
psd_m_1 <- psd_m[psd_m$variable == 'X3.5',] # subset sample 3.5, for example, you can change it!
xaxbreaks <-   c(4000, 2000, 1000, 500, 250, 125, 63, 31, 16, 8, 4, 2, 0.5, 0.12, 0.04)
require(ggplot2)
ggplot(psd_m_1, aes(size, value)) +
  geom_bar(stat="identity") +
  scale_x_log10(breaks = xaxbreaks, labels = xaxbreaks) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))  +
  ylab("percentage mass")
```


```{r psd-plot4, echo=FALSE, message=FALSE, fig.cap='this is one plot'}
# combine variables into three groups: sand, silt and clay
# using Wentworth classes...
# sand: 2000 - 62.5 um
# silt: 62.5 - 4 um
# clay: < 4 um

sand <- colSums(psd_t[ psd_t$size >= 62.5 & psd_t$size < 2000, ] )
silt <- colSums(psd_t[ psd_t$size >= 4 & psd_t$size < 62.5, ] )
clay <- colSums(psd_t[ psd_t$size >= 0 & psd_t$size < 4, ] )

# combine into one data frame
three_classes <- data.frame(CLAY = clay,
                            SILT = silt,
                            SAND = sand)

# remove size row
three_classes <- three_classes[-nrow(three_classes),]

# clean up row names
rownames(three_classes) <- as.numeric(unlist(regmatches(rownames(three_classes),gregexpr("[[:digit:]]+\\.*[[:digit:]]*",rownames(three_classes)))))

# remove NAs
three_classes <- na.omit(three_classes)

# plot samples in ternary plot
# see here for more details:
# http://cran.r-project.org/web/packages/soiltexture/vignettes/soiltexture_vignette.pdf
require(soiltexture)
# draw triangle with samples plotted
# Display the USDA texture triangle:
geo <- TT.plot(class.sys="USDA.TT")
# Display the text
TT.text(
  tri.data = three_classes,
  geo = geo,
  labels =  rownames(three_classes),
  font = 2,
  col = "blue",
  tri.sum.tst = FALSE
) #


```


```{r psd-plot3, echo=FALSE, message=FALSE, fig.cap='this is one plot'}
# stratigraphic plot
# change col names to be more pretty
names(three_classes) <- tolower(names(three_classes))
# put cols in a logical order
three_classes <- three_classes[c("sand", "silt", "clay")]
require(rioja)
strat.plot(three_classes,
           yvar = as.numeric(row.names(three_classes)),
           y.rev = TRUE,
           ylabel = "Depth below surface (m)",
           col.line = "black", # try others that you like
           lwd.line = 1,
           scale.percent=TRUE)

```


```{r psd-plot5, echo=FALSE, message=FALSE, fig.cap='this is one plot'}
#### slightly more fancy plot: with dendrogram showing groups of similar samples
# add a dendrogram from constrained cluster analysis
# coniss is a stratigraphically constrained cluster analysis by
# method of incremental sum of squares
diss <- dist(three_classes)
clust <- chclust(diss, method = "coniss")
# broken stick model to suggest significant zones, 3?
# bstick(clust)
# plot with clusters
x <- strat.plot(three_classes,
                yvar = as.numeric(row.names(three_classes)),
                y.rev = TRUE,
                ylabel = "Depth below surface (m)",
                col.line = "black", # try others that you like
                lwd.line = 1,    # ditto
                clust = clust,
)
# add lines to indicate clusters, leave out if it looks odd
addClustZone(x, clust, 3, col = "red")
```

Magnetic susceptibilty values ranged from `r round(max(na.omit(pH_EC_ms_LOI$mean.MS.LF)),0)` to `r round(min(na.omit(pH_EC_ms_LOI$mean.MS.LF)),0)`...

```{r LF-FD-biplot, echo=FALSE, message=FALSE, fig.cap='this is one plot'}
# biplot of LF and %FD, using sample IDs as data points
# to aid reading of the plot
require(ggplot2)
ggplot(na.omit(pH_EC_ms_LOI), aes(mean.MS.LF, mean.MS.FD)) +
  geom_text(aes(label = Sample.ID)) +
  xlab("insert x-axis label here") +
  ylab("insert y-axis lable here")
```


Loss on ignition data...

```{r LOI-stratigraphic-plot, echo=FALSE, message=FALSE, warnings=FALSE, fig.cap='here are the data!'}
require(rioja)
strat.plot(pH_EC_ms_LOI[,c(-1, -ncol(pH_EC_ms_LOI))],
           yvar = pH_EC_ms_LOI$Sample.ID,
           y.rev = TRUE,
           ylabel = "Depth below surface (m)",
           col.line = "black", # try others that you like
           lwd.line = 1 )
```

You will probably want to have a table in your results, here's what you'd do for a simple table (it will look funny when you Knit HTML, but will look good in the PDF, which is what matters). This style of table is for simple qualitative tables (ie. mostly text in the table, not numbers, read on for making tables of numbers...):


```{r table2, echo=FALSE, message=FALSE, warnings=FALSE, results='asis'}
tabl <- "
| Tables        | Are           | Cool  |
|---------------|:-------------:|------:|
| col 3 is      | right-aligned | $1600 |
| col 2 is      | centered      |   $12 |
| zebra stripes | are neat      |    $1 |
"
cat(tabl)
```

Or we can make a table from our data, so the table will update when we update our data (a much better approach! Do this for any lab data you want to tabulate). For example:

```{r table-data, echo=FALSE, message=FALSE, warnings=FALSE, results='asis'}
require(pander)
panderOptions('table.split.table', Inf)
set.caption("My great data")
pander(pH_EC_ms_LOI[,c(-1, -ncol(pH_EC_ms_LOI))], style = 'rmarkdown')
```

### Discussion

Here's where you describe the implications of your data in terms of:

1. Past human behaviours
2. Past environments
3. Site formation processes

You will need to draw on previously published studies to make new connections with your data. Show how your data support or contradict previous claims.

### References

Use APA style, look on [google scholar](http://scholar.google.com.offcampus.lib.washington.edu/) for the little 'cite' link that will generate nicely formatted references for you.


```{r,  code-to-make-PDF, echo=FALSE, message=FALSE, eval=FALSE}
# This chunck is to run the code and generate the PDF
# it will not appear in the PDF because we've set 'echo=FALSE' and
# it will not run when you knit HTML because we've set 'eval=FALSE' so
# you'll need to run this block line-by-line to get the PDF (save
# the file first!)

# set working directory - you'll need to change this to the folder where
# you've saved this file that you're working on. Note the double-backslashes!
wd <- "E:\\My Documents\\My UW\\Teaching\\2008 482 Geoarchaeology\\2013 Autumn 482 & 486\\AU13 482 & 486 Lab report"
setwd(wd) # assumes wd has been set earlier in the line above

# Load packages. By now you know you'll need to install these packages
# the first time that you need them.
require(knitr)
require(markdown)

# Process this .Rmd and generate a .pdf file (including smart punctuation and grey background of code blocks)
# For this step you'll need to have two other programs installed on your computer
# 1. Pandoc: http://johnmacfarlane.net/pandoc/installing.html
# 2. LaTeX: follow the instructions on the Pandoc download page

filen <- "Lab_Report_template" # name of this markdown file without suffix
knit(paste0(filen,".Rmd"))
system(paste0("pandoc -s ", paste0(filen,".md"), " -t latex -o ", paste0(filen,".pdf"), " --highlight-style=tango  -S"))

# Now find the location on your computer where the PDF file was created:
getwd()
```
	## get data from google sheet
	# connect to google sheet
	require(RCurl) # if you get an error message here that says something like 'there is no package called 'RCurl''
	# then you need to install the package by typing this into the console (and then hitting enter): install.packages("RCurl")
	# wait for the package to download and install, then run line 3 again before proceeding.
	options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE))
	# in google spreadsheet, go to file-> publish to web -> get link to publish to web -> get csv file
	goog <- "https://docs.google.com/spreadsheet/pub?key=0As7CmPqGXTzldE5UV3FOZTRlYy10ZF9JOWcxYy1tSGc&single=true&gid=0&output=csv"
	# assign data from google sheet to R object
	data <- read.csv(textConnection(getURL(goog)), stringsAsFactors = FALSE)

	###########################################################################
	############### Exploring the pH and EC data ################################

	# let's just make a nice small table of pH and EC data
	# since we have a complete set of those now
	pH_EC <- data[,c('Sample.ID',
	'mean.pH',
	'mean.EC',
	'Group.members..initials')]
	# we've got a lot of rows at the bottom with NA in them, let's exclude those
	pH_EC <- na.omit(pH_EC)

	# Now we're ready for some exploratory data analysis! We'll go through
	# bunch of simple analysis that will help us understand the data
	# better than what we get from just scanning the column of numbers
	# Some of the results of these analysis will be details you'll want
	# to include in your lab report (and some wont). As you read for the
	# Wednesday seminars you should be looking to see what the norms are
	# for data display and reporting, so you can adhere to these norms in
	# your work.

	# here's a simple function to compute some common summary statistics
	summary(pH_EC)
	# But we see a small problem, the pH and EC variables are
	# formatted as 'character' rather than 'numeric', that's easy
	# to fix...
	pH_EC$mean.pH <- as.numeric(pH_EC$mean.pH)
	pH_EC$mean.EC <- as.numeric(pH_EC$mean.EC)
	# try the simple function again
	summary(pH_EC)

	# It's good practise during exploratory data analysis do visualise the
	# data to see the shapes of the distributions in case anything
	# in ususual or worthy of further investigation. So let's do some plotting

	# here's a histogram showing the distribution of pH values
	# first load the plotting library
	require(ggplot2) # if you get an error, see line 3 and adapt those instructions
	# now make the plot
	ggplot(pH_EC, aes(mean.pH)) +
	geom_histogram()

	# Now write your own code to make a histogram of EC values. Start by
	# copying and pasting the code for the pH histogram and then
	# modifying that.

	# So now we've got a good sense of the distributions of those
	# variables, let's see how they change over time in our site

	require(rioja) # if you get an error message, see line 3 and adapt
	strat.plot(pH_EC[,-1],
	yvar = pH_EC$Sample.ID,
	y.rev = TRUE,
	ylabel = "Depth below surface (m)",
	col.line = "black", # try others that you like
	lwd.line = 1 ) # ditto
	dev.off() # clear the plot

	# the resulting plot gives us a good idea about
	# the trends over time in the pH and EC variables
	# The first EC measurement looks like quite an outlier
	# so let's exlcude it and plot ...

	strat.plot(pH_EC[-1,2:3],
	yvar = pH_EC$Sample.ID[-1],
	y.rev = TRUE,
	ylabel = "Depth below surface (m)",
	col.line = "black", # try others that you like
	lwd.line = 1 ) # ditto
	dev.off() # clear the plot

	# One thing that looks a bit odd about this plot is
	# the spikey shape of the curve... like a regular
	# rhythm of something happening to our samples
	# One hypothesis we can propose about this is
	# that there is a systematic error in our analysis
	# Perhaps one group is consistently different from
	# the other groups? Let's test that hypothesis!

	# Let's plot the values by each lab group
	# starting with EC, which looks relatively flat
	# over time...
	ggplot(pH_EC, aes(Group.members..initials, mean.EC)) +
	geom_boxplot()
	# There does seem to be some difference between groups
	# Let's explore this more formally with a statistical test

	# One-Way Permutation Test based on
	# 9999 Monte-Carlo resamplings.
	require(coin) # you'll get an error... see line 3...
	oneway_test(mean.EC ~ as.factor(Group.members..initials), data = pH_EC,
	distribution = approximate(B = 9999))
	# conventionally we reject the null hypothesis of 'no difference
	# between groups' if the p-value is less than 0.05. That is, if
	# p-value < 0.05 then we say YES there IS a signficant difference
	# between the groups.

	# A bit more detail:
	# p-values tells you the probability of the difference
	# in a collection of random data in which you'd
	# have no reason to suspect any relationship. A
	# p-value of 0.05 indicates a 5% chance that
	# results you are seeing would have come up in
	# a random dataset, so you can say with a 95%
	# probability of being correct that you are
	# observing a real relationship in your data.
	# Scholarly convention tends to hold p values
	# of less than 0.05 as indicators of a
	# statistically signficant relationship. Above
	# 0.05 is not significant.
	#
	# Note that the size of the p-value in your
	# test says nothing about the
	# ssize of the difference between the groups
	# - it is possible to have a
	# highly significant result (very small p-value,
	# eg. p = 0.001) for a miniscule effect (ie.
	# a difference very close to zero) that might not
	# be of any real-world importance or archaeological
	# meaning. Sample size is an important key to making sense
	# of the p-values here.
	#
	# You need to use your common sense
	# when interpreting your statistical data; don't
	# just mindlessly copy and paste them and get
	# excited when p-values
	# are small. Ensure that you're only
	# getting excited about data that make
	# sense in the real world.


	# Are the range of measurements for each sample greater
	# than the differences between groups?
	data <- data.frame(apply(data, 2, function(x) as.numeric(as.character(x))))[1:41,]
	# pH repeated measurement ranges
	apply(data[,7:8], 1, max) - apply(data[,7:8], 1, min)
	# EC repeated measurement ranges
	ECrep <- data.frame(rep = apply(data[,11:13], 1, max) - apply(data[,11:13], 1, min))
	# plot
	ggplot(ECrep, aes(rep)) +
	geom_histogram()

	## For calibration of differenct pH/EC meters
	# which sample is the max EC value for each group?
	library(plyr)
	f <- function(x) x[which.max( x[["mean.EC"]] ), ]
	## x will be a subset of Dat according to ID
	ddply(pH_EC, "Group.members..initials", f)

	# which sample is the max EC value for each group?
	f <- function(x) x[which.min( x[["mean.EC"]] ), ]
	## x will be a subset of Dat according to ID
	ddply(pH_EC, "Group.members..initials", f)

	###########################################################################
	############### Exploring the Mag sus data ################################

	# let's just make a nice small table of pH, EC and mag sus data
	# since we have a complete set of those now
	pH_EC_ms <- data[,c('Sample.ID',
	'mean.pH',
	'mean.EC',
	'MS.LF.reading.average.mass.susceptibility',
	'MS.FD',
	'Group.members..initials')]
	# we've got a lot of rows at the bottom with NA in them, let's exclude those
	# pH_EC_ms <- na.omit(pH_EC_ms)
	# replace google docs' '#DIV/0!' with something that R understands
	pH_EC_ms[ pH_EC_ms == '#DIV/0!'] = NA
	# convert mag sus data type from character to numeric so we
	# can do calculations with it
	pH_EC_ms$MS.LF.reading.average.mass.susceptibility <- as.numeric(as.character(pH_EC_ms$MS.LF.reading.average.mass.susceptibility))
	pH_EC_ms$MS.FD <- as.numeric(as.character(pH_EC_ms$MS.FD))

	# rename this variable so it's not such a mouthful
	names(pH_EC_ms)[names(pH_EC_ms) == 'MS.LF.reading.average.mass.susceptibility'] <- 'mean.MS.LF'
	names(pH_EC_ms)[names(pH_EC_ms) == 'MS.FD'] <- 'mean.MS.FD'

	# Now for some Exploratory Data Analysis (this is an actual thing, often referred
	# to as EDA: http://en.wikipedia.org/wiki/Exploratory_data_analysis pioneered by
	# one of my heros, John Tukey)
	summary(pH_EC_ms)

	# quick visual inpsection by plot
	require(ggplot2)
	ggplot(pH_EC_ms, aes(mean.MS.LF)) +
	geom_histogram()

	require(rioja)
	strat.plot(pH_EC_ms[,-1],
	yvar = pH_EC_ms$Sample.ID,
	y.rev = TRUE,
	ylabel = "Depth below surface (m)",
	col.line = "black", # try others that you like
	lwd.line = 1 )

	# biplot of LF and %FD, using sample IDs as data points
	# to aid reading of the plot
	ggplot(pH_EC_ms, aes(mean.MS.LF, mean.MS.FD)) +
	geom_text(aes(label = Sample.ID))

	# change point detection - Binary segmentation (most common method)
	require(changepoint)
	data_change <- cpt.mean(pH_EC_ms$mean.pH, method='BinSeg')
	data_change <- cpt.meanvar(pH_EC_ms$mean.pH,test.stat='Normal',method='PELT', penalty="BIC")
	# plot results
	plot(data_change, type='l', cpt.col='red', xlab='Sample',
	ylab='Average MS', cpt.width=2, main="Changepoints in Magnetic Susceptibility values")

	# change point detection - Bayesian
	require(bcp)
	data_bcp <- bcp(pH_EC_ms$mean.pH)
	plot(data_bcp)

	###########################################################################
	############### Exploring the LOI data ####################################

	## LOI data

	pH_EC_ms_LOI <- data[,c('Sample.ID',
	'mean.pH',
	'mean.EC',
	'MS.LF.reading.average.mass.susceptibility',
	'MS.FD',
	'LOI.percent.organic.material..Average',
	'LOI.Percent.carbonate.content..Average',
	'Group.members..initials')]
	# replace google docs' '#DIV/0!' with something that R understands
	pH_EC_ms_LOI[ pH_EC_ms_LOI == '#DIV/0!'] = NA
	# convert data types from character to numeric so we
	# can do calculations with it, do this as a batch
	pH_EC_ms_LOI[,1:(length(pH_EC_ms_LOI)-1)] <- apply(pH_EC_ms_LOI[,1:(length(pH_EC_ms_LOI))-1], 2, function(x) as.numeric(as.character(x)))

	# rename some variables so it's not such a mouthful
	names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'MS.LF.reading.average.mass.susceptibility'] <- 'mean.MS.LF'
	names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'MS.FD'] <- 'mean.MS.FD'
	names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'LOI.percent.organic.material..Average'] <- 'mean.Organic'
	names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'LOI.Percent.carbonate.content..Average'] <- 'mean.CaCO3'

	# You know the drill... let's do some EDA..
	summary(pH_EC_ms_LOI)

	# quick visual inpsection by histogram
	require(ggplot2)
	ggplot(pH_EC_ms_LOI, aes(mean.Organic)) +
	geom_histogram()

	require(ggplot2)
	ggplot(pH_EC_ms_LOI, aes(mean.CaCO3)) +
	geom_histogram()

	dev.off()

	require(rioja)
	strat.plot(pH_EC_ms_LOI[,c(-1, -ncol(pH_EC_ms_LOI))],
	yvar = pH_EC_ms_LOI$Sample.ID,
	y.rev = TRUE,
	ylabel = "Depth below surface (m)",
	col.line = "black", # try others that you like
	lwd.line = 1 )

	#### slightly more fancy plot ################
	# with dendrogram showing groups of similar samples
	?chclust # to get information how the clustering is done
	# We will add a dendrogram from constrained cluster analysis
	# to help identify groups of similar samples. We might conclude
	# that similar samples represent similar environments of deposition

	# We'll use 'coniss', a stratigraphically constrained cluster
	# analysis by method of incremental sum of squares
	diss <- dist(pH_EC_ms_LOI[,c(-1, -ncol(pH_EC_ms_LOI))])
	clust <- chclust(diss, method = "coniss")
	# broken stick model to suggest significant zones, 3?
	bstick(clust) # look for a sharp elbow, that's the ideal number of clusters

	# Now, plot with clusters, you might want to include this one in your report
	x <- strat.plot(pH_EC_ms_LOI[,c(-1, -ncol(pH_EC_ms_LOI))],
	yvar = pH_EC_ms_LOI$Sample.ID,
	y.rev = TRUE,
	ylabel = "Depth below surface (m)",
	col.line = "black", # try others that you like
	lwd.line = 1, # ditto
	clust = clust)
	# add lines to indicate clusters, leave out if it looks odd
	addClustZone(x, clust, 3, col = "red")

	##########################################################
	##########################################################
	# particle size data

	# subset just the sieve and pipette data
	psd <- data[, names(data) %in% c("Sample.ID",
	"percent.mass.of.total.used.for.pipetting.in.5.6.mm.screen..g.",
	"percent.mass.of.total.used.for.pipetting.in.4.75.mm.screen..g.",
	"percent.mass.of.total.used.for.pipetting.in.2.8.mm.screen..g.",
	"percent.mass.of.total.used.for.pipetting.2.33.mm.screen..g.",
	"percent.mass.of.total.used.for.pipetting.in.2.00.mm.screen..g.",
	"percent.mass.of.total.used.for.pipetting.in.1.18.mm.screen..g." ,
	"percent.mass.of.total.used.for.pipetting.in.1.00.mm.screen..g." ,
	"percent.mass.of.total.used.for.pipetting.in.800.um.screen..g." ,
	"percent.mass.of.total.used.for.pipetting.in.710.um.screen..g." ,
	"percent.mass.of.total.used.for.pipetting.in.500.um.screen..g.",
	"percent.mass.of.total.used.for.pipetting.in.425.um.screen..g." ,
	"percent.mass.of.total.used.for.pipetting.in.355.um.screen..g." ,
	"percent.mass.of.total.used.for.pipetting.in.250.um.screen..g." ,
	"percent.mass.of.total.used.for.pipetting.in.125.um.screen..g." ,
	"percent.mass.of.fraction.62.5...31.3.um",
	"percent.mass.of.fraction.31.3...15.6.um" ,
	"percent.mass.of.fraction.15.6...7.8.um" ,
	"percent.mass.of.fraction..7.8...3.9.um" ,
	"percent.mass.of.fraction.3.9...1.9.um" ,
	"percent.mass.of.fraction.finer.than.2.um")]


	# replace google docs' '#DIV/0!' with something that R understands
	psd[ psd == '#DIV/0!'] = NA
	# convert data types from character to numeric so we
	# can do calculations with it, do this as a batch
	psd[,1:(length(psd))] <- apply(psd[,1:(length(psd))], 2, function(x) as.numeric(as.character(x)))

	# get sizes from column names
	# psd_size <- as.numeric(unlist(regmatches(names(psd),gregexpr("[[:digit:]]+\\.[[:digit:]]",names(psd)))))
	# standardise units to um
	# psd_size <- ifelse(psd_size > 50, psd_size, psd_size * 1000)
	psd_size <- c(62.5, 31.3, 15.6, 7.8, 3.9, 2, 5600, 4750, 2800,
	2300, 2000, 1180, 1000, 800, 710, 500, 425, 355, 250, 125)

	# tranpose dataframe so we can have a column of size
	psd_t <- data.frame(t(psd))
	colnames(psd_t) <- psd$Sample.ID
	psd_t <- data.frame(psd_t[-1,])
	psd_t$size <- psd_size

	# plot of particle size distribution per sample
	require(reshape2)
	psd_m <- melt(psd_t, id.var = 'size')

	# plot all samples (quite slow!)
	xaxbreaks <- c(4000, 2000, 1000, 500, 250, 125, 63, 31, 16, 8, 4, 2, 0.5, 0.12, 0.04)
	require(ggplot2)
	ggplot(psd_m, aes(size, value)) +
	geom_bar(stat="identity") +
	scale_x_log10(breaks = xaxbreaks, labels = xaxbreaks) +
	theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
	facet_wrap(~ variable) +
	ylab("percentage mass")

	# plot only one sample
	psd_m_1 <- psd_m[psd_m$variable == 'X3.5',] # subset sample 3.5, for example, you can change it!
	xaxbreaks <- c(4000, 2000, 1000, 500, 250, 125, 63, 31, 16, 8, 4, 2, 0.5, 0.12, 0.04)
	require(ggplot2)
	ggplot(psd_m_1, aes(size, value)) +
	geom_bar(stat="identity") +
	scale_x_log10(breaks = xaxbreaks, labels = xaxbreaks) +
	theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
	ylab("percentage mass")


	# combine variables into three groups: sand, silt and clay
	# using Wentworth classes...
	# sand: 2000 - 62.5 um
	# silt: 62.5 - 4 um
	# clay: < 4 um

	sand <- colSums(psd_t[ psd_t$size >= 62.5 & psd_t$size < 2000, ] )
	silt <- colSums(psd_t[ psd_t$size >= 4 & psd_t$size < 62.5, ] )
	clay <- colSums(psd_t[ psd_t$size >= 0 & psd_t$size < 4, ] )

	# combine into one data frame
	three_classes <- data.frame(CLAY = clay,
	SILT = silt,
	SAND = sand)

	# remove size row
	three_classes <- three_classes[-nrow(three_classes),]

	# clean up row names
	rownames(three_classes) <- as.numeric(unlist(regmatches(rownames(three_classes),gregexpr("[[:digit:]]+\\.[[:digit:]]",rownames(three_classes)))))

	# remove NAs
	three_classes <- na.omit(three_classes)

	# plot samples in ternary plot
	# see here for more details:
	# http://cran.r-project.org/web/packages/soiltexture/vignettes/soiltexture_vignette.pdf
	require(soiltexture)
	dev.off() # clear plotting area
	# draw empty triangle, just for fun
	TT.plot(
	class.sys = "USDA.TT", # "AU.TT" for Australia
	class.p.bg.col = FALSE
	) #
	# draw triangle with samples plotted
	# Display the USDA texture triangle:
	geo <- TT.plot(class.sys="USDA.TT")
	# Display the text
	TT.text(
	tri.data = three_classes,
	geo = geo,
	labels = rownames(three_classes),
	font = 2,
	col = "blue",
	tri.sum.tst = FALSE
	) #

	# stratigraphic plot
	# change col names to be more pretty
	names(three_classes) <- tolower(names(three_classes))
	# put cols in a logical order
	three_classes <- three_classes[c("sand", "silt", "clay")]
	dev.off() # clear plotting area
	require(rioja)
	strat.plot(three_classes,
	yvar = as.numeric(row.names(three_classes)),
	y.rev = TRUE,
	ylabel = "Depth below surface (m)",
	col.line = "black", # try others that you like
	lwd.line = 1,
	scale.percent=TRUE)


	#### slightly more fancy plot: with dendrogram showing groups of similar samples
	?chclust # to get information how the clustering is done
	# add a dendrogram from constrained cluster analysis
	# coniss is a stratigraphically constrained cluster analysis by
	# method of incremental sum of squares
	diss <- dist(three_classes)
	clust <- chclust(diss, method = "coniss")
	# broken stick model to suggest significant zones, 3?
	bstick(clust)
	# plot with clusters
	x <- strat.plot(three_classes,
	yvar = as.numeric(row.names(three_classes)),
	y.rev = TRUE,
	ylabel = "Depth below surface (m)",
	col.line = "black", # try others that you like
	lwd.line = 1, # ditto
	clust = clust,
	)
	# add lines to indicate clusters, leave out if it looks odd
	addClustZone(x, clust, 3, col = "red")
	dev.off() # clear that plot
	Title of your lab report
	========================================================


	## Introduction

	You are going to write your lab report as an R Markdown document. Markdown is a simple formatting syntax that is used to write plain text documents that can be created and edited in many different programs and easily converted into HTML and PDF. You should open and edit this document in RStudio, which has many useful features for writing markdown. To learn more, click the Help toolbar button for more details on using R Markdown. It should take you five minutes to learn all about Markdown that you need to know to write your lab report. If you want to spend more than five minutes, you can read the detailed syntax [here](http://daringfireball.net/projects/markdown/syntax). When you click the Knit HTML button in RStudio, a web page will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

	```{r, echo = FALSE}
	# get a summary of some built-in data about cars
	summary(cars)
	```

	## Background

	The reason why we are using this exotic format for our lab report is to learn about reproducible research. Reproducible research is an approach to science that connects specific instructions to data analysis and empirical data so that scholarship can be recreated, better understood and verified. In practice, this means that all the results from a publication or report can be reproduced with the code and data available online. The advantages of this approach include:

	* We can automatically regenerate documents when our code, data, or assumptions change. For example if we do further analysis of our samples we can just add it on to this report.
	* We minimize or eliminate transposition errors that occur when copying results (eg from Excel) into documents.
	* We preserve a detailed preserve contextual narrative about why analysis was performed in a certain fashion (ie. we don't have to search on our computer for all the files because everything is one file)
	* Documentation for the analytic and computational processes from which conclusions are drawn. In case anyone wants to know _all_ the details of our statistical methods, they can see them all in the code blocks here.

	Our specific tool-kit for reproducible research is a very common one: R + RStudio + Markdown. Although it's common, it has a bit of a steep learning curve. There's a lot of documentation and examples on the web that can ease the frustration. You may never (want to) write another document using this tool-kit ever again, but I think it's important that you have an understanding of this approach to doing science, in the hope that some of the ideas will rub off to improve any kind of empirical work that you might do in the future. This approach to doing science is becoming widespread in computer science and some areas of biology and psychology. I expect it's part of a general shift in the way all sciences will be done, and will come to archaeology eventually.

	## Methods and Materials

	### Chemical analyses

	Brief description of how you measured pH, EC, SOM , CaCO~3~...

	### Physical analyses

	Brief description of how you measured colour, magnetic suceptibility and particle size distributions...

	## Results

	```{r input-all-data, echo = FALSE, message = FALSE, cache=TRUE}
	### This chunk pulls in all our data from google sheet ###
	# It shoult be located at the very start of the results section, since all of the
	# later chunks depend on it.


	# connect to google sheet
	require(RCurl) # if you get an error message here that says something like 'there is no package called 'RCurl''
	# then you need to install the package by typing this into the console (and then hitting enter): install.packages("RCurl")
	# wait for the package to download and install, then run line 3 again before proceeding.
	options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE))
	# in google spreadsheet, go to file-> publish to web -> get link to publish to web -> get csv file
	goog <- "https://docs.google.com/spreadsheet/pub?key=0As7CmPqGXTzldE5UV3FOZTRlYy10ZF9JOWcxYy1tSGc&single=true&gid=0&output=csv"
	# assign data from google sheet to R object
	data <- read.csv(textConnection(getURL(goog)), stringsAsFactors = FALSE)

	##########################################################
	# pH, EC, LOI and Mag sus data

	# let's just make a nice small table of pH, EC, mag sus & LOI data
	# since we have a complete set of those now
	pH_EC_ms_LOI <- data[,c('Sample.ID',
	'mean.pH',
	'mean.EC',
	'MS.LF.reading.average.mass.susceptibility',
	'MS.FD',
	'LOI.percent.organic.material..Average',
	'LOI.Percent.carbonate.content..Average',
	'Group.members..initials')]
	# replace google docs' '#DIV/0!' with something that R understands
	pH_EC_ms_LOI[ pH_EC_ms_LOI == '#DIV/0!'] = NA
	# convert data types from character to numeric so we
	# can do calculations with it, do this as a batch
	pH_EC_ms_LOI[,1:(length(pH_EC_ms_LOI)-1)] <- apply(pH_EC_ms_LOI[,1:(length(pH_EC_ms_LOI))-1], 2, function(x) as.numeric(as.character(x)))
	# rename some variables so it's not such a mouthful
	names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'MS.LF.reading.average.mass.susceptibility'] <- 'mean.MS.LF'
	names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'MS.FD'] <- 'mean.MS.FD'
	names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'LOI.percent.organic.material..Average'] <- 'mean.Organic'
	names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'LOI.Percent.carbonate.content..Average'] <- 'mean.CaCO3'


	##########################################################
	# particle size data

	# subset just the sieve and pipette data
	psd <- data[, names(data) %in% c("Sample.ID",
	"percent.mass.of.total.used.for.pipetting.in.5.6.mm.screen..g.",
	"percent.mass.of.total.used.for.pipetting.in.4.75.mm.screen..g.",
	"percent.mass.of.total.used.for.pipetting.in.2.8.mm.screen..g.",
	"percent.mass.of.total.used.for.pipetting.2.33.mm.screen..g.",
	"percent.mass.of.total.used.for.pipetting.in.2.00.mm.screen..g.",
	"percent.mass.of.total.used.for.pipetting.in.1.18.mm.screen..g." ,
	"percent.mass.of.total.used.for.pipetting.in.1.00.mm.screen..g." ,
	"percent.mass.of.total.used.for.pipetting.in.800.um.screen..g." ,
	"percent.mass.of.total.used.for.pipetting.in.710.um.screen..g." ,
	"percent.mass.of.total.used.for.pipetting.in.500.um.screen..g.",
	"percent.mass.of.total.used.for.pipetting.in.425.um.screen..g." ,
	"percent.mass.of.total.used.for.pipetting.in.355.um.screen..g." ,
	"percent.mass.of.total.used.for.pipetting.in.250.um.screen..g." ,
	"percent.mass.of.total.used.for.pipetting.in.125.um.screen..g." ,
	"percent.mass.of.fraction.62.5...31.3.um",
	"percent.mass.of.fraction.31.3...15.6.um" ,
	"percent.mass.of.fraction.15.6...7.8.um" ,
	"percent.mass.of.fraction..7.8...3.9.um" ,
	"percent.mass.of.fraction.3.9...1.9.um" ,
	"percent.mass.of.fraction.finer.than.2.um")]

	# replace google docs' '#DIV/0!' with something that R understands
	psd[ psd == '#DIV/0!'] = NA
	# convert data types from character to numeric so we
	# can do calculations with it, do this as a batch
	psd[,1:(length(psd))] <- apply(psd[,1:(length(psd))], 2, function(x) as.numeric(as.character(x)))

	# get sizes from column names
	# psd_size <- as.numeric(unlist(regmatches(names(psd),gregexpr("[[:digit:]]+\\.[[:digit:]]",names(psd)))))
	# standardise units to um
	# psd_size <- ifelse(psd_size > 50, psd_size, psd_size * 1000)
	psd_size <- c(62.5, 31.3, 15.6, 7.8, 3.9, 2, 5600, 4750, 2800,
	2300, 2000, 1180, 1000, 800, 710, 500, 425, 355, 250, 125)

	# tranpose dataframe so we can have a column of size
	psd_t <- data.frame(t(psd))
	colnames(psd_t) <- psd$Sample.ID
	psd_t <- data.frame(psd_t[-1,])
	psd_t$size <- psd_size
	require(reshape2)
	psd_m <- melt(psd_t, id.var = 'size')
	```

	Your one or two sentence summary observations of the most striking changes in the sedimentary sequence at the site...

	### Chemical analyses

	Now the details... pH, EC, SOM, CaCO~3~ e.g. The pH values ranged from `r max(pH_EC_ms_LOI$mean.pH)` to `r min(pH_EC_ms_LOI$mean.pH)`...

	### Physical analyses

	Particle size distributions...

	```{r psd-plot1, echo=FALSE, message=FALSE, warning=FALSE, fig.cap='this is one plot'}
	# plot particle size distributions for all samples (quite slow!)
	xaxbreaks <- c(4000, 2000, 1000, 500, 250, 125, 63, 31, 16, 8, 4, 2, 0.5, 0.12, 0.04)
	require(ggplot2)
	ggplot(psd_m, aes(size, value)) +
	geom_bar(stat="identity") +
	scale_x_log10(breaks = xaxbreaks, labels = xaxbreaks) +
	theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
	facet_wrap(~ variable) +
	ylab("percentage mass")
	```


	```{r psd-plot2, echo=FALSE, message=FALSE, fig.cap='this is one plot'}
	# plot only one sample
	psd_m_1 <- psd_m[psd_m$variable == 'X3.5',] # subset sample 3.5, for example, you can change it!
	xaxbreaks <- c(4000, 2000, 1000, 500, 250, 125, 63, 31, 16, 8, 4, 2, 0.5, 0.12, 0.04)
	require(ggplot2)
	ggplot(psd_m_1, aes(size, value)) +
	geom_bar(stat="identity") +
	scale_x_log10(breaks = xaxbreaks, labels = xaxbreaks) +
	theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
	ylab("percentage mass")
	```


	```{r psd-plot4, echo=FALSE, message=FALSE, fig.cap='this is one plot'}
	# combine variables into three groups: sand, silt and clay
	# using Wentworth classes...
	# sand: 2000 - 62.5 um
	# silt: 62.5 - 4 um
	# clay: < 4 um

	sand <- colSums(psd_t[ psd_t$size >= 62.5 & psd_t$size < 2000, ] )
	silt <- colSums(psd_t[ psd_t$size >= 4 & psd_t$size < 62.5, ] )
	clay <- colSums(psd_t[ psd_t$size >= 0 & psd_t$size < 4, ] )

	# combine into one data frame
	three_classes <- data.frame(CLAY = clay,
	SILT = silt,
	SAND = sand)

	# remove size row
	three_classes <- three_classes[-nrow(three_classes),]

	# clean up row names
	rownames(three_classes) <- as.numeric(unlist(regmatches(rownames(three_classes),gregexpr("[[:digit:]]+\\.[[:digit:]]",rownames(three_classes)))))

	# remove NAs
	three_classes <- na.omit(three_classes)

	# plot samples in ternary plot
	# see here for more details:
	# http://cran.r-project.org/web/packages/soiltexture/vignettes/soiltexture_vignette.pdf
	require(soiltexture)
	# draw triangle with samples plotted
	# Display the USDA texture triangle:
	geo <- TT.plot(class.sys="USDA.TT")
	# Display the text
	TT.text(
	tri.data = three_classes,
	geo = geo,
	labels = rownames(three_classes),
	font = 2,
	col = "blue",
	tri.sum.tst = FALSE
	) #


	```



	```{r psd-plot3, echo=FALSE, message=FALSE, fig.cap='this is one plot'}
	# stratigraphic plot
	# change col names to be more pretty
	names(three_classes) <- tolower(names(three_classes))
	# put cols in a logical order
	three_classes <- three_classes[c("sand", "silt", "clay")]
	require(rioja)
	strat.plot(three_classes,
	yvar = as.numeric(row.names(three_classes)),
	y.rev = TRUE,
	ylabel = "Depth below surface (m)",
	col.line = "black", # try others that you like
	lwd.line = 1,
	scale.percent=TRUE)

	```


	```{r psd-plot5, echo=FALSE, message=FALSE, fig.cap='this is one plot'}
	#### slightly more fancy plot: with dendrogram showing groups of similar samples
	# add a dendrogram from constrained cluster analysis
	# coniss is a stratigraphically constrained cluster analysis by
	# method of incremental sum of squares
	diss <- dist(three_classes)
	clust <- chclust(diss, method = "coniss")
	# broken stick model to suggest significant zones, 3?
	# bstick(clust)
	# plot with clusters
	x <- strat.plot(three_classes,
	yvar = as.numeric(row.names(three_classes)),
	y.rev = TRUE,
	ylabel = "Depth below surface (m)",
	col.line = "black", # try others that you like
	lwd.line = 1, # ditto
	clust = clust,
	)
	# add lines to indicate clusters, leave out if it looks odd
	addClustZone(x, clust, 3, col = "red")
	```

	Magnetic susceptibilty values ranged from `r round(max(na.omit(pH_EC_ms_LOI$mean.MS.LF)),0)` to `r round(min(na.omit(pH_EC_ms_LOI$mean.MS.LF)),0)`...

	```{r LF-FD-biplot, echo=FALSE, message=FALSE, fig.cap='this is one plot'}
	# biplot of LF and %FD, using sample IDs as data points
	# to aid reading of the plot
	require(ggplot2)
	ggplot(na.omit(pH_EC_ms_LOI), aes(mean.MS.LF, mean.MS.FD)) +
	geom_text(aes(label = Sample.ID)) +
	xlab("insert x-axis label here") +
	ylab("insert y-axis lable here")
	```


	Loss on ignition data...

	```{r LOI-stratigraphic-plot, echo=FALSE, message=FALSE, warnings=FALSE, fig.cap='here are the data!'}
	require(rioja)
	strat.plot(pH_EC_ms_LOI[,c(-1, -ncol(pH_EC_ms_LOI))],
	yvar = pH_EC_ms_LOI$Sample.ID,
	y.rev = TRUE,
	ylabel = "Depth below surface (m)",
	col.line = "black", # try others that you like
	lwd.line = 1 )
	```

	You will probably want to have a table in your results, here's what you'd do for a simple table (it will look funny when you Knit HTML, but will look good in the PDF, which is what matters). This style of table is for simple qualitative tables (ie. mostly text in the table, not numbers, read on for making tables of numbers...):


	```{r table2, echo=FALSE, message=FALSE, warnings=FALSE, results='asis'}
	tabl <- "
	\| Tables \| Are \| Cool \|
	\|---------------\|:-------------:\|------:\|
	\| col 3 is \| right-aligned \| $1600 \|
	\| col 2 is \| centered \| $12 \|
	\| zebra stripes \| are neat \| $1 \|
	"
	cat(tabl)
	```

	Or we can make a table from our data, so the table will update when we update our data (a much better approach! Do this for any lab data you want to tabulate). For example:

	```{r table-data, echo=FALSE, message=FALSE, warnings=FALSE, results='asis'}
	require(pander)
	panderOptions('table.split.table', Inf)
	set.caption("My great data")
	pander(pH_EC_ms_LOI[,c(-1, -ncol(pH_EC_ms_LOI))], style = 'rmarkdown')
	```

	### Discussion

	Here's where you describe the implications of your data in terms of:

	1. Past human behaviours
	2. Past environments
	3. Site formation processes

	You will need to draw on previously published studies to make new connections with your data. Show how your data support or contradict previous claims.

	### References

	Use APA style, look on [google scholar](http://scholar.google.com.offcampus.lib.washington.edu/) for the little 'cite' link that will generate nicely formatted references for you.


	```{r, code-to-make-PDF, echo=FALSE, message=FALSE, eval=FALSE}
	# This chunck is to run the code and generate the PDF
	# it will not appear in the PDF because we've set 'echo=FALSE' and
	# it will not run when you knit HTML because we've set 'eval=FALSE' so
	# you'll need to run this block line-by-line to get the PDF (save
	# the file first!)

	# set working directory - you'll need to change this to the folder where
	# you've saved this file that you're working on. Note the double-backslashes!
	wd <- "E:\\My Documents\\My UW\\Teaching\\2008 482 Geoarchaeology\\2013 Autumn 482 & 486\\AU13 482 & 486 Lab report"
	setwd(wd) # assumes wd has been set earlier in the line above

	# Load packages. By now you know you'll need to install these packages
	# the first time that you need them.
	require(knitr)
	require(markdown)

	# Process this .Rmd and generate a .pdf file (including smart punctuation and grey background of code blocks)
	# For this step you'll need to have two other programs installed on your computer
	# 1. Pandoc: http://johnmacfarlane.net/pandoc/installing.html
	# 2. LaTeX: follow the instructions on the Pandoc download page

	filen <- "Lab_Report_template" # name of this markdown file without suffix
	knit(paste0(filen,".Rmd"))
	system(paste0("pandoc -s ", paste0(filen,".md"), " -t latex -o ", paste0(filen,".pdf"), " --highlight-style=tango -S"))

	# Now find the location on your computer where the PDF file was created:
	getwd()
	```