@PolMine
Last active March 9, 2017 17:08
Packaged corpus installation
Installing a packaged corpus from the PolMine repository
--------------------------------------------------------
As an experiment, I have put a corpus of plenary protocols ("PLPRBT") into a private repository that I host on the PolMine server. This is how to get it.

You will need the devtools package to install the latest development version of polmineR. On Windows, installing devtools may require that Rtools is installed.
```{r}
install.packages("devtools")
```
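If you rerun this setup, a small guard avoids reinstalling devtools every time. The helper `ensure_package()` below is a hypothetical convenience function, not part of devtools or polmineR; it only uses base R's `requireNamespace()` and `install.packages()`.

```r
# Hypothetical helper (not part of devtools or polmineR):
# install a package only if it cannot already be loaded.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # TRUE if the package can be loaded now, FALSE otherwise
  requireNamespace(pkg, quietly = TRUE)
}
```

Calling `ensure_package("devtools")` installs devtools only when it is missing and returns `TRUE` once it can be loaded.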
Now, install the development version of the polmineR package.
```{r}
devtools::install_github("PolMine/polmineR", ref = "dev")
```
Load polmineR and call `install.corpus()`, pointing the `repo` argument at the PolMine package repository:
```{r}
library(polmineR)
install.corpus("plprbt.pvs2017", repo = "http://polmine.sowi.uni-due.de/packages")
```
The corpus is fairly large (80 million tokens) and the package is 1.3 GB, so the download may take a while.
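If the download aborts with a timeout, R's default limit of 60 seconds (`getOption("timeout")`) may be too short for a package of this size. A minimal sketch, assuming that `install.corpus()` downloads through R's standard mechanism (`install.packages()`/`download.file()`), which honours this option: raise the timeout before calling `install.corpus()`. The value of 3600 seconds is an arbitrary choice.

```r
# Assumption: install.corpus() downloads via download.file(), which
# honours the "timeout" option (default: 60 seconds).
# Raise the limit to one hour (arbitrary), but never lower an existing higher value.
old_opts <- options(timeout = max(3600, getOption("timeout")))

# Show the timeout now in effect (in seconds)
getOption("timeout")
```

Then run `install.corpus()` as above; once the installation has finished, `options(old_opts)` restores the previous setting.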
When the download is finished, check whether the corpus has been installed correctly.
```{r}
use("plprbt.pvs2017")
corpus()
```
You should get a data.frame with one row listing the corpus "PLPRBT". To confirm that the corpus can actually be queried, try the following commands:
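```{r}
# Token frequencies for the positional attribute "word"
count("PLPRBT", pAttribute = "word")
# Concordance (keyword in context) for "Ungleichheit" ("inequality")
kwic("PLPRBT", "Ungleichheit")
# Words occurring in the context of "Ungleichheit"
context("PLPRBT", "Ungleichheit")
```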
The data package includes an HTML vignette with a (German) explanation of the corpus. Please note that it includes some overhead.
```{r}
browseVignettes(package = "plprbt.pvs2017")
```
Enjoy!