@benmarwick
benmarwick / rotate-axis-labels-ggplot2.R
Last active Sep 13, 2021
I can never remember how to rotate the x-axis labels with ggplot2: theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
View rotate-axis-labels-ggplot2.R
# Adapted from https://stackoverflow.com/a/7267364/1036500 by Andrie de Vries
# This is it: theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
library(ggplot2)
td <- expand.grid(
hjust=c(0, 0.5, 1),
vjust=c(0, 0.5, 1),
angle=c(0, 45, 90),
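
A minimal sketch of the one-liner in use. The built-in `mtcars` data and the object names `cars` and `p` are assumptions chosen for illustration and are not part of the gist.

```r
library(ggplot2)

# Hypothetical example data: car names from the built-in mtcars data set
cars <- data.frame(model = rownames(mtcars), mpg = mtcars$mpg)

# Rotate the x-axis labels 90 degrees; hjust = 1 right-aligns the text
# against the axis and vjust = 0.5 centres each label on its tick mark
p <- ggplot(cars, aes(model, mpg)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

print(p)
```

Changing `angle` to 45 usually also calls for different `hjust` and `vjust` values, which is what the grid of combinations in the gist is meant to explore.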
@benmarwick
benmarwick / shakespeare_plays_genres.Rmd
Last active Sep 6, 2021
Quick and basic cluster analysis of Shakespeare's plays using R and full text from http://shakespeare.mit.edu/
View shakespeare_plays_genres.Rmd
Quick and dirty look at Shakespeare's plays
====
Introduction
----
I was recently inspired by the posts of Andrew Collier ([1](http://www.exegetic.biz/blog/2013/09/text-mining-the-complete-works-of-william-shakespeare/) and [2](http://www.exegetic.biz/blog/2013/09/clustering-the-words-of-william-shakespeare/)) and an earlier post by [Matt Jockers](http://www.matthewjockers.net/2009/02/13/machine-classifying-novels-and-plays-by-genre/) to take a recreational look at the plays of Shakespeare.
Motivated by Jockers, the specific topic I was interested in is the genres of the plays. For example, are the genres discrete or is there lots of overlap? Are the genres equal in variation, or is one genre very focused and another very diverse? What are the key attributes that define the genres? And can I reproduce Jockers' use of high frequency words to identify genres? Related to Jockers' work on high frequency words is an earlier study by [Brainerd (1979)](http://www.jstor.org/stable/30207229) who used pronouns
@benmarwick
benmarwick / common-sci-symbols.md
Last active Aug 9, 2021
Commonly used scientific symbols in pandoc markdown
View common-sci-symbols.md

Commonly used scientific symbols in pandoc markdown

Encoding is UTF-8; rendering to PDF needs xelatex, like this:

---
output:
  pdf_document:
    latex_engine: xelatex
---
@benmarwick
benmarwick / ggFactoPlot.R
Created Mar 20, 2012
FactoMineR PCA plot with ggplot2
View ggFactoPlot.R
# Plotting the output of FactoMineR's PCA using ggplot2
#
# load libraries
library(FactoMineR)
library(ggplot2)
library(scales)
library(grid)
library(plyr)
library(gridExtra)
#
@benmarwick
benmarwick / test.R
Last active Jul 6, 2021
Convert a folder of text files into a single CSV file with one column for the file names and one column of the text of the file. A function in R.
View test.R
# test it by creating some small text files to run the function on
txt <- c("here is", "some text", "to test", "this function with", "'including a leading quote", '"and another leading quote')
# make text files
dir.create("testdir")
for (i in seq_along(txt)) {
  writeLines(txt[i], paste0("testdir/outfile-", i, ".txt"))
}
# run the function and then look in the CSV file that is produced.
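
The function itself is not shown in this preview. As a rough sketch of the idea in the description, assuming only base R, a hypothetical helper named `txt_folder_to_csv` (not the gist's own function) could look like this:

```r
# Hypothetical helper, not the gist's own function: read every .txt file in
# `dir` and write a CSV with one column of file names and one of file text
txt_folder_to_csv <- function(dir, csv_file) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  # collapse each file's lines into a single string
  text <- vapply(files,
                 function(f) paste(readLines(f, warn = FALSE), collapse = "\n"),
                 character(1))
  out <- data.frame(file = basename(files), text = unname(text),
                    stringsAsFactors = FALSE)
  # write.csv quotes strings and doubles embedded quotes, so leading quote
  # characters in the text survive a round trip
  write.csv(out, csv_file, row.names = FALSE)
  invisible(out)
}

# Try it on a temporary folder so nothing in the working directory is touched
tmp <- file.path(tempdir(), "txt-to-csv-demo")
dir.create(tmp, showWarnings = FALSE)
demo_txt <- c("here is", "'including a leading quote", '"and another leading quote')
for (i in seq_along(demo_txt)) {
  writeLines(demo_txt[i], file.path(tmp, paste0("outfile-", i, ".txt")))
}
csv_path <- file.path(tempdir(), "txt-to-csv-demo.csv")
txt_folder_to_csv(tmp, csv_path)
result <- read.csv(csv_path, stringsAsFactors = FALSE)
```

Reading the CSV back with `read.csv` is a quick way to check that text beginning with a quote character was written correctly.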
@benmarwick
benmarwick / super-and-sub-script-labels.R
Created Jan 2, 2019
ggplot axis labels with superscript and subscript
View super-and-sub-script-labels.R
AntroSO42 <- read.csv("antroSO42-.csv", header = TRUE)
Bp <- AntroSO42[, 2:4]
library(tidyverse)
Bp %>%
gather(value, variable, -Class) %>%
ggplot(aes(Class,
variable)) +
geom_boxplot() +
facet_wrap( ~ value) +
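
The preview stops before the labels are set. One common way to get superscripts and subscripts in a ggplot2 axis label is a plotmath `expression()`. The sketch below uses made-up data, and the sulfate label with its units is an assumption for illustration, not taken from the gist.

```r
library(ggplot2)

# Hypothetical data standing in for the gist's CSV file
d <- data.frame(Class = rep(c("A", "B"), each = 10),
                sulfate = c(1:10, 6:15))

# plotmath syntax: [ ] sets a subscript, ^ sets a superscript, ~ adds a space
y_label <- expression(SO[4]^"2-" ~ (mg ~ L^-1))

p <- ggplot(d, aes(Class, sulfate)) +
  geom_boxplot() +
  ylab(y_label)

print(p)
```

`bquote()` is an alternative when part of the label has to come from a variable.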
View plotting-archaeology-papers-with-R-code.R
# also at https://gist.github.com/benmarwick/f11ae49ab9afde0071b133012ff76cbc
ctv <- "https://raw.githubusercontent.com/benmarwick/ctv-archaeology/master/README.md"
library(tidyverse)
library(glue)
archy_ctv_readme <- readLines(ctv)
# get just the articles
@benmarwick
benmarwick / rich-diffs-rmd-in-local-git-repo.R
Last active Jun 21, 2021
GitHub doesn't show rich diffs for Rmd files. That can make collaborative writing tough. Here's how to see rich diffs of two commits of a single R Markdown document on a GitHub repo or local Git repo
View rich-diffs-rmd-in-local-git-repo.R
# How to see rich diffs of two commits of a single R Markdown document in a local Git repo
# https://github.com/lorenzwalthert/gitsum
library("gitsum")
library("tidyverse")
# To browse the commits locally, open an RStudio
# project that is using Git version control, then ...
# Set the path within the project to the Rmd file
@benmarwick
benmarwick / captions_and_crossrefs.rmd
Last active Jun 11, 2021
Auto-numbering and cross-referencing of figures and tables in rmarkdown
View captions_and_crossrefs.rmd
---
title: "Auto-numbering and cross-referencing of figures and tables in rmarkdown"
output: html_document
---
NOTE: I recommend using the bookdown package and `output: bookdown::html_document2` to make captions and cross-references more easily than the method described below.
TODO: check this out: https://github.com/adletaw/captioner
Here's how to use:
@benmarwick
benmarwick / viralarchive.Rmd
Created Jun 1, 2021
Object recognition in images in #viralarchive tweets
View viralarchive.Rmd
I used the Python library GetOldTweets3 to get the tweets because the rtweet package cannot get tweets older than 6-9 days. Details about this Python library are here: https://github.com/Mottl/GetOldTweets3
I used this line in the shell to get tweets using the #viralarchive hashtag:
```{bash, engine.opts="-l", eval = F}
GetOldTweets3 --querysearch 'viralarchive' --maxtweets 10000
```