@tomastitera
Last active May 2, 2018 17:44
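# Set the working directory (machine-specific path; adjust as needed).
setwd("~/R/snmupdpipe")
library(udpipe)
# Load the pre-trained Czech UD 2.0 model; the .udpipe file must already be in the working directory.
udmodel_czech <- udpipe_load_model(file = "czech-ud-2.0-170801.udpipe")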
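# Read the tab-separated export; keep text columns as character rather than factor.
page <- read.csv("page_179497582061065_2018_04_25_14_18_33.tab", sep = "\t", encoding = "UTF-8", stringsAsFactors = FALSE)
# Extract the post texts as a character vector.
korpus.raw <- as.character(page$post_message)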
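# Tokenize, tag, lemmatize and parse the posts, then convert the annotation to a data frame.
x <- udpipe_annotate(udmodel_czech, x = korpus.raw)
x <- as.data.frame(x)
View(x)
# Keep only tokens tagged as nouns (universal POS tag "NOUN").
stats <- subset(x, upos %in% "NOUN")
View(stats)
# Count how often each noun lemma occurs; txt_freq() returns key, freq and freq_pct, sorted by frequency.
stats <- txt_freq(x = stats$lemma)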
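library(lattice)
# Reverse the level order so the most frequent lemma is drawn at the top of the chart.
stats$key <- factor(stats$key, levels = rev(stats$key))
# Plot the 30 most frequent noun lemmas as a horizontal bar chart.
barchart(key ~ freq, data = head(stats, 30), col = "cadetblue", main = "Most frequent nouns", xlab = "Freq")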