R code for working on the data from our labs. Note that these are no longer actively developed. For the most recent code, see here: https://github.com/benmarwick/au13uwgeoarchlab
## get data from google sheet
# connect to google sheet
require(RCurl) # if you get an error message here that says something like 'there is no package called 'RCurl''
# then you need to install the package by typing this into the console (and then hitting enter): install.packages("RCurl")
# wait for the package to download and install, then re-run the require() line above before proceeding.
options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE))
# in google spreadsheet, go to file -> publish to web -> get link to publish to web -> get csv file
goog <- "https://docs.google.com/spreadsheet/pub?key=0As7CmPqGXTzldE5UV3FOZTRlYy10ZF9JOWcxYy1tSGc&single=true&gid=0&output=csv"
# assign data from google sheet to R object
data <- read.csv(textConnection(getURL(goog)), stringsAsFactors = FALSE)
###########################################################################
############### Exploring the pH and EC data ##############################
# let's just make a nice small table of pH and EC data
# since we have a complete set of those now
pH_EC <- data[,c('Sample.ID',
                 'mean.pH',
                 'mean.EC',
                 'Group.members..initials')]
# we've got a lot of rows at the bottom with NA in them, let's exclude those
pH_EC <- na.omit(pH_EC)
# Now we're ready for some exploratory data analysis! We'll go through a
# bunch of simple analyses that will help us understand the data
# better than we can from just scanning the columns of numbers.
# Some of the results of these analyses will be details you'll want
# to include in your lab report (and some won't). As you read for the
# Wednesday seminars you should be looking to see what the norms are
# for data display and reporting, so you can adhere to these norms in
# your work.
# here's a simple function to compute some common summary statistics
summary(pH_EC)
# But we see a small problem: the pH and EC variables are
# formatted as 'character' rather than 'numeric'. That's easy
# to fix...
pH_EC$mean.pH <- as.numeric(pH_EC$mean.pH)
pH_EC$mean.EC <- as.numeric(pH_EC$mean.EC)
# try the simple function again
summary(pH_EC)
# It's good practice during exploratory data analysis to visualise the
# data to see the shapes of the distributions, in case anything
# is unusual or worthy of further investigation. So let's do some plotting.
# here's a histogram showing the distribution of pH values
# first load the plotting library
require(ggplot2) # if you get an error, see the RCurl note above and adapt those instructions
# now make the plot
ggplot(pH_EC, aes(mean.pH)) +
  geom_histogram()
# Now write your own code to make a histogram of EC values. Start by
# copying and pasting the code for the pH histogram and then
# modifying that.
# So now we've got a good sense of the distributions of those
# variables, let's see how they change over time in our site
require(rioja) # if you get an error message, see the RCurl note above and adapt
strat.plot(pH_EC[,-1],
           yvar = pH_EC$Sample.ID,
           y.rev = TRUE,
           ylabel = "Depth below surface (m)",
           col.line = "black", # try others that you like
           lwd.line = 1) # ditto
dev.off() # clear the plot
# the resulting plot gives us a good idea about
# the trends over time in the pH and EC variables
# The first EC measurement looks like quite an outlier
# so let's exclude it and plot ...
strat.plot(pH_EC[-1,2:3],
           yvar = pH_EC$Sample.ID[-1],
           y.rev = TRUE,
           ylabel = "Depth below surface (m)",
           col.line = "black", # try others that you like
           lwd.line = 1) # ditto
dev.off() # clear the plot
# One thing that looks a bit odd about this plot is
# the spikey shape of the curve... like a regular
# rhythm of something happening to our samples.
# One hypothesis we can propose about this is
# that there is a systematic error in our analysis.
# Perhaps one group is consistently different from
# the other groups? Let's test that hypothesis!
# Let's plot the values by each lab group,
# starting with EC, which looks relatively flat
# over time...
ggplot(pH_EC, aes(Group.members..initials, mean.EC)) +
  geom_boxplot()
# There does seem to be some difference between groups.
# Let's explore this more formally with a statistical test:
# a one-way permutation test based on
# 9999 Monte Carlo resamplings.
require(coin) # if you get an error, see the RCurl note above
oneway_test(mean.EC ~ as.factor(Group.members..initials), data = pH_EC,
            distribution = approximate(B = 9999))
# conventionally we reject the null hypothesis of 'no difference
# between groups' if the p-value is less than 0.05. That is, if
# p-value < 0.05 then we say YES, there IS a significant difference
# between the groups.
# A bit more detail:
# the p-value tells you the probability of seeing a difference
# at least as large as the one in your data if the null hypothesis
# were true, that is, if there were really no difference between
# the groups. A p-value of 0.05 means that, even with no real
# difference, a difference this large would still turn up in
# about 5% of random datasets. Scholarly convention tends to hold
# p-values of less than 0.05 as indicators of a statistically
# significant relationship; above 0.05 is conventionally
# reported as not significant.
#
# Note that the size of the p-value in your
# test says nothing about the
# size of the difference between the groups
# - it is possible to have a
# highly significant result (very small p-value,
# eg. p = 0.001) for a minuscule effect (ie.
# a difference very close to zero) that might not
# be of any real-world importance or archaeological
# meaning. Sample size is an important key to making sense
# of the p-values here.
#
# You need to use your common sense
# when interpreting your statistical data; don't
# just mindlessly copy and paste them and get
# excited when p-values
# are small. Ensure that you're only
# getting excited about results that make
# sense in the real world.
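# To make the permutation test above less of a black box, here is a tiny
# hand-rolled version on made-up numbers (NOT our lab data): shuffle the
# group labels many times and ask how often chance alone produces a
# difference in group means as big as the one observed. This is only a
# sketch of the idea behind coin's oneway_test, not a substitute for it.

```r
set.seed(1)
a <- c(5.1, 5.3, 5.2, 5.6)   # pretend measurements from group A
b <- c(5.8, 6.0, 5.7, 6.1)   # pretend measurements from group B
obs <- abs(mean(a) - mean(b))   # the observed difference in means
pooled <- c(a, b)
perm_diffs <- replicate(9999, {
  shuffled <- sample(pooled)    # shuffle the group labels
  abs(mean(shuffled[1:4]) - mean(shuffled[5:8]))
})
# p-value: proportion of shuffles with a difference at least as big as observed
p_value <- mean(perm_diffs >= obs)
p_value
```

# With these made-up values the two groups barely overlap, so very few of the
# 9999 shuffles match the observed difference and the p-value comes out small,
# which is exactly the situation where oneway_test reports significance.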
# Are the ranges of repeated measurements for each sample greater
# than the differences between groups?
data <- data.frame(apply(data, 2, function(x) as.numeric(as.character(x))))[1:41,]
# pH repeated measurement ranges
apply(data[,7:8], 1, max) - apply(data[,7:8], 1, min)
# EC repeated measurement ranges
ECrep <- data.frame(rep = apply(data[,11:13], 1, max) - apply(data[,11:13], 1, min))
# plot
ggplot(ECrep, aes(rep)) +
  geom_histogram()
## For calibration of different pH/EC meters
# which sample has the max EC value for each group?
library(plyr)
f <- function(x) x[which.max( x[["mean.EC"]] ), ]
## x will be a subset of pH_EC for each group
ddply(pH_EC, "Group.members..initials", f)
# which sample has the min EC value for each group?
f <- function(x) x[which.min( x[["mean.EC"]] ), ]
## x will be a subset of pH_EC for each group
ddply(pH_EC, "Group.members..initials", f)
###########################################################################
############### Exploring the Mag sus data ################################
# let's just make a nice small table of pH, EC and mag sus data
# since we have a complete set of those now
pH_EC_ms <- data[,c('Sample.ID',
                    'mean.pH',
                    'mean.EC',
                    'MS.LF.reading.average.mass.susceptibility',
                    'MS.FD',
                    'Group.members..initials')]
# we've got a lot of rows at the bottom with NA in them, let's exclude those
# pH_EC_ms <- na.omit(pH_EC_ms)
# replace google docs' '#DIV/0!' with something that R understands
pH_EC_ms[ pH_EC_ms == '#DIV/0!'] = NA
# convert mag sus data type from character to numeric so we
# can do calculations with it
pH_EC_ms$MS.LF.reading.average.mass.susceptibility <- as.numeric(as.character(pH_EC_ms$MS.LF.reading.average.mass.susceptibility))
pH_EC_ms$MS.FD <- as.numeric(as.character(pH_EC_ms$MS.FD))
# rename these variables so they're not such a mouthful
names(pH_EC_ms)[names(pH_EC_ms) == 'MS.LF.reading.average.mass.susceptibility'] <- 'mean.MS.LF'
names(pH_EC_ms)[names(pH_EC_ms) == 'MS.FD'] <- 'mean.MS.FD'
# Now for some Exploratory Data Analysis (this is an actual thing, often referred
# to as EDA: http://en.wikipedia.org/wiki/Exploratory_data_analysis pioneered by
# one of my heroes, John Tukey)
summary(pH_EC_ms)
# quick visual inspection by plot
require(ggplot2)
ggplot(pH_EC_ms, aes(mean.MS.LF)) +
  geom_histogram()
require(rioja)
strat.plot(pH_EC_ms[,-1],
           yvar = pH_EC_ms$Sample.ID,
           y.rev = TRUE,
           ylabel = "Depth below surface (m)",
           col.line = "black", # try others that you like
           lwd.line = 1)
# biplot of LF and %FD, using sample IDs as data points
# to aid reading of the plot
ggplot(pH_EC_ms, aes(mean.MS.LF, mean.MS.FD)) +
  geom_text(aes(label = Sample.ID))
# change point detection - binary segmentation (most common method)
# we analyse the mag sus values here to match the plot labels below;
# the changepoint functions can't handle NAs, so we drop those first
require(changepoint)
data_change <- cpt.mean(na.omit(pH_EC_ms$mean.MS.LF), method='BinSeg')
# or estimate changes in both mean and variance with the PELT method:
data_change <- cpt.meanvar(na.omit(pH_EC_ms$mean.MS.LF), test.stat='Normal', method='PELT', penalty="BIC")
# plot results
plot(data_change, type='l', cpt.col='red', xlab='Sample',
     ylab='mean MS', cpt.width=2, main="Changepoints in magnetic susceptibility values")
# change point detection - Bayesian
require(bcp)
data_bcp <- bcp(na.omit(pH_EC_ms$mean.MS.LF))
plot(data_bcp)
###########################################################################
############### Exploring the LOI data ####################################
## LOI data
pH_EC_ms_LOI <- data[,c('Sample.ID',
                        'mean.pH',
                        'mean.EC',
                        'MS.LF.reading.average.mass.susceptibility',
                        'MS.FD',
                        'LOI.percent.organic.material..Average',
                        'LOI.Percent.carbonate.content..Average',
                        'Group.members..initials')]
# replace google docs' '#DIV/0!' with something that R understands
pH_EC_ms_LOI[ pH_EC_ms_LOI == '#DIV/0!'] = NA
# convert data types from character to numeric so we can do
# calculations with them, as a batch (everything except the last column)
pH_EC_ms_LOI[,1:(length(pH_EC_ms_LOI)-1)] <- apply(pH_EC_ms_LOI[,1:(length(pH_EC_ms_LOI)-1)], 2, function(x) as.numeric(as.character(x)))
# rename some variables so they're not such a mouthful
names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'MS.LF.reading.average.mass.susceptibility'] <- 'mean.MS.LF'
names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'MS.FD'] <- 'mean.MS.FD'
names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'LOI.percent.organic.material..Average'] <- 'mean.Organic'
names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'LOI.Percent.carbonate.content..Average'] <- 'mean.CaCO3'
# You know the drill... let's do some EDA...
summary(pH_EC_ms_LOI)
# quick visual inspection by histogram
require(ggplot2)
ggplot(pH_EC_ms_LOI, aes(mean.Organic)) +
  geom_histogram()
ggplot(pH_EC_ms_LOI, aes(mean.CaCO3)) +
  geom_histogram()
dev.off()
require(rioja)
strat.plot(pH_EC_ms_LOI[,c(-1, -ncol(pH_EC_ms_LOI))],
           yvar = pH_EC_ms_LOI$Sample.ID,
           y.rev = TRUE,
           ylabel = "Depth below surface (m)",
           col.line = "black", # try others that you like
           lwd.line = 1)
#### slightly more fancy plot ################
# with dendrogram showing groups of similar samples
?chclust # for information on how the clustering is done
# We will add a dendrogram from constrained cluster analysis
# to help identify groups of similar samples. We might conclude
# that similar samples represent similar environments of deposition.
# We'll use 'coniss', a stratigraphically constrained cluster
# analysis by the method of incremental sum of squares
diss <- dist(pH_EC_ms_LOI[,c(-1, -ncol(pH_EC_ms_LOI))])
clust <- chclust(diss, method = "coniss")
# broken stick model to suggest significant zones, 3?
bstick(clust) # look for a sharp elbow, that's the ideal number of clusters
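# For the curious: the 'broken stick' null model that bstick() compares
# against has a simple closed form. If a stick of length 1 is broken at
# random into n pieces, the expected proportion of the k-th largest piece
# is (1/n) * sum(1/i) for i from k to n. A minimal sketch of that formula,
# independent of rioja:

```r
# expected proportions of the k-th largest of n random stick pieces
broken_stick <- function(n) sapply(1:n, function(k) sum(1 / (k:n)) / n)
bs <- broken_stick(5)
round(bs, 3)   # decreasing proportions, largest piece first
sum(bs)        # the pieces of the stick always sum to 1
```

# Zones are 'significant' roughly where the observed reduction in dispersion
# from adding another cluster exceeds this broken-stick expectation.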
# Now, plot with clusters; you might want to include this one in your report
x <- strat.plot(pH_EC_ms_LOI[,c(-1, -ncol(pH_EC_ms_LOI))],
                yvar = pH_EC_ms_LOI$Sample.ID,
                y.rev = TRUE,
                ylabel = "Depth below surface (m)",
                col.line = "black", # try others that you like
                lwd.line = 1, # ditto
                clust = clust)
# add lines to indicate clusters, leave out if it looks odd
addClustZone(x, clust, 3, col = "red")
##########################################################
##########################################################
# particle size data
# subset just the sieve and pipette data
psd <- data[, names(data) %in% c("Sample.ID",
    "percent.mass.of.total.used.for.pipetting.in.5.6.mm.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.4.75.mm.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.2.8.mm.screen..g.",
    "percent.mass.of.total.used.for.pipetting.2.33.mm.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.2.00.mm.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.1.18.mm.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.1.00.mm.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.800.um.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.710.um.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.500.um.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.425.um.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.355.um.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.250.um.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.125.um.screen..g.",
    "percent.mass.of.fraction.62.5...31.3.um",
    "percent.mass.of.fraction.31.3...15.6.um",
    "percent.mass.of.fraction.15.6...7.8.um",
    "percent.mass.of.fraction..7.8...3.9.um",
    "percent.mass.of.fraction.3.9...1.9.um",
    "percent.mass.of.fraction.finer.than.2.um")]
# replace google docs' '#DIV/0!' with something that R understands
psd[ psd == '#DIV/0!'] = NA
# convert data types from character to numeric so we
# can do calculations with them, as a batch
psd[,1:(length(psd))] <- apply(psd[,1:(length(psd))], 2, function(x) as.numeric(as.character(x)))
# get sizes from column names
# psd_size <- as.numeric(unlist(regmatches(names(psd),gregexpr("[[:digit:]]+\\.*[[:digit:]]*",names(psd)))))
# standardise units to um
# psd_size <- ifelse(psd_size > 50, psd_size, psd_size * 1000)
psd_size <- c(62.5, 31.3, 15.6, 7.8, 3.9, 2, 5600, 4750, 2800,
              2300, 2000, 1180, 1000, 800, 710, 500, 425, 355, 250, 125)
# transpose the dataframe so we can have a column of sizes
psd_t <- data.frame(t(psd))
colnames(psd_t) <- psd$Sample.ID
psd_t <- data.frame(psd_t[-1,])
psd_t$size <- psd_size
# plot of particle size distribution per sample
require(reshape2)
psd_m <- melt(psd_t, id.var = 'size')
# plot all samples (quite slow!)
xaxbreaks <- c(4000, 2000, 1000, 500, 250, 125, 63, 31, 16, 8, 4, 2, 0.5, 0.12, 0.04)
require(ggplot2)
ggplot(psd_m, aes(size, value)) +
  geom_bar(stat="identity") +
  scale_x_log10(breaks = xaxbreaks, labels = xaxbreaks) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  facet_wrap(~ variable) +
  ylab("percentage mass")
# plot only one sample
psd_m_1 <- psd_m[psd_m$variable == 'X3.5',] # subset sample 3.5, for example, you can change it!
xaxbreaks <- c(4000, 2000, 1000, 500, 250, 125, 63, 31, 16, 8, 4, 2, 0.5, 0.12, 0.04)
require(ggplot2)
ggplot(psd_m_1, aes(size, value)) +
  geom_bar(stat="identity") +
  scale_x_log10(breaks = xaxbreaks, labels = xaxbreaks) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ylab("percentage mass")
# combine variables into three groups: sand, silt and clay
# using Wentworth classes...
# sand: 2000 - 62.5 um
# silt: 62.5 - 4 um
# clay: < 4 um
sand <- colSums(psd_t[ psd_t$size >= 62.5 & psd_t$size < 2000, ])
silt <- colSums(psd_t[ psd_t$size >= 4 & psd_t$size < 62.5, ])
clay <- colSums(psd_t[ psd_t$size >= 0 & psd_t$size < 4, ])
# combine into one data frame
three_classes <- data.frame(CLAY = clay,
                            SILT = silt,
                            SAND = sand)
# remove the 'size' row
three_classes <- three_classes[-nrow(three_classes),]
# clean up row names
rownames(three_classes) <- as.numeric(unlist(regmatches(rownames(three_classes),gregexpr("[[:digit:]]+\\.*[[:digit:]]*",rownames(three_classes)))))
# remove NAs
three_classes <- na.omit(three_classes)
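# As an aside, the same Wentworth grouping can be written with cut(),
# shown here on a made-up vector of sizes (the values are hypothetical,
# just to illustrate the binning):

```r
# sizes in um are binned into clay (< 4), silt (4-62.5) and sand (62.5-2000)
sizes <- c(1.9, 3.9, 7.8, 31.3, 125, 500, 1000)   # made-up fraction sizes
classes <- cut(sizes,
               breaks = c(0, 4, 62.5, 2000),
               labels = c("clay", "silt", "sand"),
               right = FALSE)   # left-closed bins, matching size >= lower bound
table(classes)
```

# cut() returns a factor, so table() counts how many fractions fall into
# each class; the colSums approach above does the same grouping but sums
# the percentage-mass values instead of counting.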
# plot samples in a ternary plot
# see here for more details:
# http://cran.r-project.org/web/packages/soiltexture/vignettes/soiltexture_vignette.pdf
require(soiltexture)
dev.off() # clear plotting area
# draw an empty triangle, just for fun
TT.plot(
  class.sys = "USDA.TT", # "AU.TT" for Australia
  class.p.bg.col = FALSE
)
# draw the triangle with samples plotted
# Display the USDA texture triangle:
geo <- TT.plot(class.sys="USDA.TT")
# Display the text
TT.text(
  tri.data = three_classes,
  geo = geo,
  labels = rownames(three_classes),
  font = 2,
  col = "blue",
  tri.sum.tst = FALSE
)
# stratigraphic plot
# change col names to be prettier
names(three_classes) <- tolower(names(three_classes))
# put cols in a logical order
three_classes <- three_classes[c("sand", "silt", "clay")]
dev.off() # clear plotting area
require(rioja)
strat.plot(three_classes,
           yvar = as.numeric(row.names(three_classes)),
           y.rev = TRUE,
           ylabel = "Depth below surface (m)",
           col.line = "black", # try others that you like
           lwd.line = 1,
           scale.percent = TRUE)
#### slightly more fancy plot: with dendrogram showing groups of similar samples
?chclust # for information on how the clustering is done
# add a dendrogram from constrained cluster analysis
# coniss is a stratigraphically constrained cluster analysis by
# the method of incremental sum of squares
diss <- dist(three_classes)
clust <- chclust(diss, method = "coniss")
# broken stick model to suggest significant zones, 3?
bstick(clust)
# plot with clusters
x <- strat.plot(three_classes,
                yvar = as.numeric(row.names(three_classes)),
                y.rev = TRUE,
                ylabel = "Depth below surface (m)",
                col.line = "black", # try others that you like
                lwd.line = 1, # ditto
                clust = clust)
# add lines to indicate clusters, leave out if it looks odd
addClustZone(x, clust, 3, col = "red")
dev.off() # clear that plot
Title of your lab report
========================================================
## Introduction
You are going to write your lab report as an R Markdown document. Markdown is a simple formatting syntax used to write plain-text documents that can be created and edited in many different programs and easily converted into HTML and PDF. You should open and edit this document in RStudio, which has many useful features for writing markdown. To learn more, click the **Help** toolbar button for more details on using R Markdown. It should take you about five minutes to learn everything about Markdown that you need to write your lab report. If you want to spend more than five minutes, you can read the detailed syntax [here](http://daringfireball.net/projects/markdown/syntax). When you click the **Knit HTML** button in RStudio, a web page will be generated that includes both the content and the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r, echo = FALSE}
# get a summary of some built-in data about cars
summary(cars)
```
## Background
The reason why we are using this exotic format for our lab report is to learn about reproducible research. Reproducible research is an approach to science that connects the specific instructions for data analysis to the empirical data, so that scholarship can be recreated, better understood and verified. In practice, this means that all the results from a publication or report can be reproduced with the code and data available online. The advantages of this approach include:
* We can automatically regenerate documents when our code, data, or assumptions change. For example, if we do further analysis of our samples we can just add it on to this report.
* We minimise or eliminate transposition errors that occur when copying results (eg. from Excel) into documents.
* We preserve a detailed contextual narrative about why the analysis was performed in a certain fashion (ie. we don't have to search our computers for all the files, because everything is in one file).
* We document the analytic and computational processes from which conclusions are drawn. In case anyone wants to know _all_ the details of our statistical methods, they can see them in the code blocks here.
Our specific tool-kit for reproducible research is a very common one: R + RStudio + Markdown. Although it's common, it has a bit of a steep learning curve. There's a lot of documentation and there are many examples on the web that can ease the frustration. You may never (want to) write another document using this tool-kit again, but I think it's important that you have an understanding of this approach to doing science, in the hope that some of the ideas will rub off to improve any kind of empirical work you might do in the future. This approach is becoming widespread in computer science and some areas of biology and psychology. I expect it's part of a general shift in the way all sciences will be done, and it will come to archaeology eventually.
## Methods and Materials
### Chemical analyses
Brief description of how you measured pH, EC, SOM, CaCO~3~...
### Physical analyses
Brief description of how you measured colour, magnetic susceptibility and particle size distributions...
## Results
```{r input-all-data, echo = FALSE, message = FALSE, cache=TRUE}
### This chunk pulls in all our data from the google sheet ###
# It should be located at the very start of the results section, since all of the
# later chunks depend on it.
# connect to google sheet
require(RCurl) # if you get an error message here that says something like 'there is no package called 'RCurl''
# then you need to install the package by typing this into the console (and then hitting enter): install.packages("RCurl")
# wait for the package to download and install, then re-run the require() line above before proceeding.
options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE))
# in google spreadsheet, go to file -> publish to web -> get link to publish to web -> get csv file
goog <- "https://docs.google.com/spreadsheet/pub?key=0As7CmPqGXTzldE5UV3FOZTRlYy10ZF9JOWcxYy1tSGc&single=true&gid=0&output=csv"
# assign data from google sheet to R object
data <- read.csv(textConnection(getURL(goog)), stringsAsFactors = FALSE)
##########################################################
# pH, EC, LOI and Mag sus data
# let's just make a nice small table of pH, EC, mag sus & LOI data
# since we have a complete set of those now
pH_EC_ms_LOI <- data[,c('Sample.ID',
                        'mean.pH',
                        'mean.EC',
                        'MS.LF.reading.average.mass.susceptibility',
                        'MS.FD',
                        'LOI.percent.organic.material..Average',
                        'LOI.Percent.carbonate.content..Average',
                        'Group.members..initials')]
# replace google docs' '#DIV/0!' with something that R understands
pH_EC_ms_LOI[ pH_EC_ms_LOI == '#DIV/0!'] = NA
# convert data types from character to numeric so we can do
# calculations with them, as a batch (everything except the last column)
pH_EC_ms_LOI[,1:(length(pH_EC_ms_LOI)-1)] <- apply(pH_EC_ms_LOI[,1:(length(pH_EC_ms_LOI)-1)], 2, function(x) as.numeric(as.character(x)))
# rename some variables so they're not such a mouthful
names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'MS.LF.reading.average.mass.susceptibility'] <- 'mean.MS.LF'
names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'MS.FD'] <- 'mean.MS.FD'
names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'LOI.percent.organic.material..Average'] <- 'mean.Organic'
names(pH_EC_ms_LOI)[names(pH_EC_ms_LOI) == 'LOI.Percent.carbonate.content..Average'] <- 'mean.CaCO3'
##########################################################
# particle size data
# subset just the sieve and pipette data
psd <- data[, names(data) %in% c("Sample.ID",
    "percent.mass.of.total.used.for.pipetting.in.5.6.mm.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.4.75.mm.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.2.8.mm.screen..g.",
    "percent.mass.of.total.used.for.pipetting.2.33.mm.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.2.00.mm.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.1.18.mm.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.1.00.mm.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.800.um.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.710.um.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.500.um.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.425.um.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.355.um.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.250.um.screen..g.",
    "percent.mass.of.total.used.for.pipetting.in.125.um.screen..g.",
    "percent.mass.of.fraction.62.5...31.3.um",
    "percent.mass.of.fraction.31.3...15.6.um",
    "percent.mass.of.fraction.15.6...7.8.um",
    "percent.mass.of.fraction..7.8...3.9.um",
    "percent.mass.of.fraction.3.9...1.9.um",
    "percent.mass.of.fraction.finer.than.2.um")]
# replace google docs' '#DIV/0!' with something that R understands
psd[ psd == '#DIV/0!'] = NA
# convert data types from character to numeric so we
# can do calculations with them, as a batch
psd[,1:(length(psd))] <- apply(psd[,1:(length(psd))], 2, function(x) as.numeric(as.character(x)))
# get sizes from column names
# psd_size <- as.numeric(unlist(regmatches(names(psd),gregexpr("[[:digit:]]+\\.*[[:digit:]]*",names(psd)))))
# standardise units to um
# psd_size <- ifelse(psd_size > 50, psd_size, psd_size * 1000)
psd_size <- c(62.5, 31.3, 15.6, 7.8, 3.9, 2, 5600, 4750, 2800,
              2300, 2000, 1180, 1000, 800, 710, 500, 425, 355, 250, 125)
# transpose the dataframe so we can have a column of sizes
psd_t <- data.frame(t(psd))
colnames(psd_t) <- psd$Sample.ID
psd_t <- data.frame(psd_t[-1,])
psd_t$size <- psd_size
require(reshape2)
psd_m <- melt(psd_t, id.var = 'size')
```
Your one or two sentence summary observations of the most striking changes in the sedimentary sequence at the site...
### Chemical analyses
Now the details... pH, EC, SOM, CaCO~3~ e.g. The pH values ranged from `r min(na.omit(pH_EC_ms_LOI$mean.pH))` to `r max(na.omit(pH_EC_ms_LOI$mean.pH))`...
### Physical analyses
Particle size distributions...
```{r psd-plot1, echo=FALSE, message=FALSE, warning=FALSE, fig.cap='this is one plot'}
# plot particle size distributions for all samples (quite slow!)
xaxbreaks <- c(4000, 2000, 1000, 500, 250, 125, 63, 31, 16, 8, 4, 2, 0.5, 0.12, 0.04)
require(ggplot2)
ggplot(psd_m, aes(size, value)) +
  geom_bar(stat="identity") +
  scale_x_log10(breaks = xaxbreaks, labels = xaxbreaks) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  facet_wrap(~ variable) +
  ylab("percentage mass")
```
```{r psd-plot2, echo=FALSE, message=FALSE, fig.cap='this is one plot'}
# plot only one sample
psd_m_1 <- psd_m[psd_m$variable == 'X3.5',] # subset sample 3.5, for example, you can change it!
xaxbreaks <- c(4000, 2000, 1000, 500, 250, 125, 63, 31, 16, 8, 4, 2, 0.5, 0.12, 0.04)
require(ggplot2)
ggplot(psd_m_1, aes(size, value)) +
  geom_bar(stat="identity") +
  scale_x_log10(breaks = xaxbreaks, labels = xaxbreaks) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ylab("percentage mass")
```
```{r psd-plot4, echo=FALSE, message=FALSE, fig.cap='this is one plot'}
# combine variables into three groups: sand, silt and clay
# using Wentworth classes...
# sand: 2000 - 62.5 um
# silt: 62.5 - 4 um
# clay: < 4 um
sand <- colSums(psd_t[ psd_t$size >= 62.5 & psd_t$size < 2000, ])
silt <- colSums(psd_t[ psd_t$size >= 4 & psd_t$size < 62.5, ])
clay <- colSums(psd_t[ psd_t$size >= 0 & psd_t$size < 4, ])
# combine into one data frame
three_classes <- data.frame(CLAY = clay,
                            SILT = silt,
                            SAND = sand)
# remove the 'size' row
three_classes <- three_classes[-nrow(three_classes),]
# clean up row names
rownames(three_classes) <- as.numeric(unlist(regmatches(rownames(three_classes),gregexpr("[[:digit:]]+\\.*[[:digit:]]*",rownames(three_classes)))))
# remove NAs
three_classes <- na.omit(three_classes)
# plot samples in a ternary plot
# see here for more details:
# http://cran.r-project.org/web/packages/soiltexture/vignettes/soiltexture_vignette.pdf
require(soiltexture)
# draw the triangle with samples plotted
# Display the USDA texture triangle:
geo <- TT.plot(class.sys="USDA.TT")
# Display the text
TT.text(
  tri.data = three_classes,
  geo = geo,
  labels = rownames(three_classes),
  font = 2,
  col = "blue",
  tri.sum.tst = FALSE
)
```
```{r psd-plot3, echo=FALSE, message=FALSE, fig.cap='this is one plot'} | |
# stratigraphic plot | |
# change col names to be more pretty | |
names(three_classes) <- tolower(names(three_classes)) | |
# put cols in a logical order | |
three_classes <- three_classes[c("sand", "silt", "clay")] | |
require(rioja) | |
strat.plot(three_classes, | |
yvar = as.numeric(row.names(three_classes)), | |
y.rev = TRUE, | |
ylabel = "Depth below surface (m)", | |
col.line = "black", # try others that you like | |
lwd.line = 1, | |
scale.percent=TRUE) | |
``` | |
```{r psd-plot5, echo=FALSE, message=FALSE, fig.cap='Stratigraphic plot with a CONISS dendrogram'}
#### slightly more fancy plot: with dendrogram showing groups of similar samples
# add a dendrogram from constrained cluster analysis
# CONISS is a stratigraphically constrained cluster analysis by the
# method of incremental sum of squares
diss <- dist(three_classes)
clust <- chclust(diss, method = "coniss")
# broken stick model to suggest significant zones, 3?
# bstick(clust)
# plot with clusters
x <- strat.plot(three_classes,
                yvar = as.numeric(row.names(three_classes)),
                y.rev = TRUE,
                ylabel = "Depth below surface (m)",
                col.line = "black", # try others that you like
                lwd.line = 1, # ditto
                clust = clust)
# add lines to indicate clusters, leave out if it looks odd
addClustZone(x, clust, 3, col = "red")
```
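The commented-out `bstick(clust)` call in the chunk above compares the cluster dispersions against a broken-stick model to suggest how many zones are statistically meaningful. As a rough illustration of the underlying idea (a base-R sketch, not rioja's implementation), the broken-stick expectation for the k-th largest of n random segments can be computed directly:

```r
# Broken-stick model: expected proportion of total length for the k-th
# largest piece when a unit stick is broken at random into n pieces.
# Clusters that explain more variance than this expectation are candidates
# for "significant" zones.
broken_stick <- function(n) {
  sapply(seq_len(n), function(k) sum(1 / (k:n)) / n)
}
bs <- broken_stick(5)
round(bs, 3) # decreasing expectations
sum(bs)      # the expectations sum to 1
```

Comparing these values against the proportion of dispersion explained by each split is essentially what `bstick()` plots for you.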
Magnetic susceptibility values ranged from `r round(min(na.omit(pH_EC_ms_LOI$mean.MS.LF)),0)` to `r round(max(na.omit(pH_EC_ms_LOI$mean.MS.LF)),0)`...
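The paired inline `min()`/`max()` expressions above can also be written with `range()`, which returns both extremes in one call and handles missing values via `na.rm` instead of `na.omit()`. A small base-R sketch, with made-up values standing in for `mean.MS.LF`:

```r
# Hypothetical magnetic susceptibility readings, one of them missing
ms_lf <- c(12.4, 35.1, NA, 27.8, 19.3)
# na.rm = TRUE drops the NA, like na.omit() does in the inline code
rng <- round(range(ms_lf, na.rm = TRUE), 0)
rng # 12 35 (minimum first, then maximum)
```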
```{r LF-FD-biplot, echo=FALSE, message=FALSE, fig.cap='Biplot of low-frequency magnetic susceptibility and frequency-dependent susceptibility'}
# biplot of LF and %FD, using sample IDs as data points
# to aid reading of the plot
require(ggplot2)
ggplot(na.omit(pH_EC_ms_LOI), aes(mean.MS.LF, mean.MS.FD)) +
  geom_text(aes(label = Sample.ID)) +
  xlab("insert x-axis label here") +
  ylab("insert y-axis label here")
```
Loss on ignition data...
```{r LOI-stratigraphic-plot, echo=FALSE, message=FALSE, warning=FALSE, fig.cap='Stratigraphic plot of the pH, EC, magnetic susceptibility and loss on ignition data'}
require(rioja)
strat.plot(pH_EC_ms_LOI[, c(-1, -ncol(pH_EC_ms_LOI))],
           yvar = pH_EC_ms_LOI$Sample.ID,
           y.rev = TRUE,
           ylabel = "Depth below surface (m)",
           col.line = "black", # try others that you like
           lwd.line = 1)
```
You will probably want a table in your results. Here's how to make a simple one (it will look odd when you Knit HTML, but will look good in the PDF, which is what matters). This style of table is for simple qualitative tables (i.e. mostly text rather than numbers; read on for making tables of numbers):
```{r table2, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
tabl <- "
| Tables        | Are           | Cool  |
|---------------|:-------------:|------:|
| col 3 is      | right-aligned | $1600 |
| col 2 is      | centered      | $12   |
| zebra stripes | are neat      | $1    |
"
cat(tabl)
```
Or we can make a table directly from our data, so the table updates whenever we update the data (a much better approach; do this for any lab data you want to tabulate). For example:
```{r table-data, echo=FALSE, message=FALSE, warning=FALSE, results='asis'}
require(pander)
panderOptions('table.split.table', Inf)
set.caption("My great data")
pander(pH_EC_ms_LOI[, c(-1, -ncol(pH_EC_ms_LOI))], style = 'rmarkdown')
```
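If the numeric columns in your table print with too many decimal places, round them before passing the data frame to `pander()`. A minimal sketch with a hypothetical data frame (the column names here just mimic the lab data; substitute your own):

```r
# Toy data frame standing in for the lab measurements
dat <- data.frame(sample  = c("A", "B"),
                  mean.pH = c(7.123456, 8.987654),
                  mean.EC = c(0.1234, 0.5678))
num <- sapply(dat, is.numeric)         # identify the numeric columns
dat[num] <- lapply(dat[num], round, 2) # round each to 2 decimal places
dat
```

The rounded data frame can then go straight into `pander()` as in the chunk above.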
### Discussion
Here's where you describe the implications of your data in terms of:
1. Past human behaviours
2. Past environments
3. Site formation processes
You will need to draw on previously published studies to make new connections with your data. Show how your data support or contradict previous claims.
### References
Use APA style, look on [google scholar](http://scholar.google.com.offcampus.lib.washington.edu/) for the little 'cite' link that will generate nicely formatted references for you.
```{r, code-to-make-PDF, echo=FALSE, message=FALSE, eval=FALSE}
# This chunk runs the code and generates the PDF.
# It will not appear in the PDF because we've set 'echo=FALSE' and
# it will not run when you knit HTML because we've set 'eval=FALSE', so
# you'll need to run this block line-by-line to get the PDF (save
# the file first!)
# set working directory - you'll need to change this to the folder where
# you've saved the file that you're working on. Note the double backslashes!
wd <- "E:\\My Documents\\My UW\\Teaching\\2008 482 Geoarchaeology\\2013 Autumn 482 & 486\\AU13 482 & 486 Lab report"
setwd(wd) # assumes wd has been set in the line above
# Load packages. By now you know you'll need to install these packages
# the first time that you need them.
require(knitr)
require(markdown)
# Process this .Rmd and generate a .pdf file (including smart punctuation
# and a grey background for code blocks).
# For this step you'll need two other programs installed on your computer:
# 1. Pandoc: http://johnmacfarlane.net/pandoc/installing.html
# 2. LaTeX: follow the instructions on the Pandoc download page
filen <- "Lab_Report_template" # name of this markdown file without the suffix
knit(paste0(filen, ".Rmd"))
# Note: the -S (smart punctuation) flag below is for pandoc 1.x; pandoc 2+
# removed it in favour of '-f markdown+smart'
system(paste0("pandoc -s ", paste0(filen, ".md"), " -t latex -o ", paste0(filen, ".pdf"), " --highlight-style=tango -S"))
# Now find the location on your computer where the PDF file was created:
getwd()
```