Giorgio Comai giocomai

giocomai / RemoveNewLines
Created May 2, 2015 15:39
Remove line breaks from the selection in LibreOffice; useful after copying and pasting text from a PDF
sub RemoveNewLine
rem ----------------------------------------------------------------------
rem define variables
dim document as object
dim dispatcher as object
rem ----------------------------------------------------------------------
rem get access to the document
document = ThisComponent.CurrentController.Frame
dispatcher = createUnoService("com.sun.star.frame.DispatchHelper")
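The macro above is only shown in part. As a rough analogue for text that has already been pasted into R, the sketch below replaces each line break, together with any spaces or tabs around it, with a single space. The helper name `remove_newlines()` is hypothetical, and unlike a careful cleanup it makes no attempt to preserve paragraph breaks.

```r
# Hypothetical helper: join text broken across lines (e.g. pasted from a PDF)
# into one line. Each line break, plus any spaces or tabs around it, becomes
# a single space; Windows-style "\r\n" breaks are handled as well.
remove_newlines <- function(x) {
  joined <- gsub("[ \t]*\r?\n[ \t]*", " ", x)
  # Collapse any leftover runs of spaces and trim both ends
  trimws(gsub(" {2,}", " ", joined))
}

pasted <- "This sentence was\ncopied from a\r\nPDF file, one line at a time."
remove_newlines(pasted)
# returns "This sentence was copied from a PDF file, one line at a time."
```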
giocomai / importCountryNamesRu
Created October 6, 2015 20:50
Import into R the names of the world's countries in Russian
library("XML")
library("plyr")
countriesRu <- ldply(xmlToList(readLines("https://www.artlebedev.ru/tools/country-list/xml/")), data.frame)
countriesRu <- countriesRu[-1,]
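The `ldply(..., data.frame)` call turns the nested list produced by `xmlToList()` into one data frame. The sketch below shows roughly what that step does, using only base R. The input list is a hypothetical stand-in: the field names (`name`, `english`, `alpha2`) are assumptions for illustration, not the actual schema of the artlebedev.ru XML.

```r
# Hypothetical stand-in for what xmlToList() returns: one list element per
# country node, each holding that node's fields. Field names are assumed.
countryList <- list(
  country = list(name = "Австрия", english = "Austria", alpha2 = "AT"),
  country = list(name = "Азербайджан", english = "Azerbaijan", alpha2 = "AZ"),
  country = list(name = "Албания", english = "Albania", alpha2 = "AL")
)

# Turn each element into a one-row data frame...
rows <- lapply(countryList, data.frame, stringsAsFactors = FALSE)
# ...then stack the rows into a single data frame
countriesRu <- do.call(rbind, unname(rows))

str(countriesRu)
```

One difference to keep in mind: for a named list like this one, `ldply()` may also add an `.id` column holding the element names, which the base R version does not create.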
sudo dnf install R R-RCurl curl-devel R-zoo R-XML openssl-devel libxml2-devel
giocomai / UN_country_names.R
Created May 28, 2017 07:39
Extract the names of all UN member states from the official United Nations website in R #rstats
library("rvest")
read_html(x = "http://www.un.org/en/member-states/") %>%
  html_nodes(xpath = "//span[@class='member-state-name']") %>%
  html_text()
giocomai / Turkish language wikipedia pageviews.R
Last active June 5, 2017 13:41
Extracts pageview dumps for the Turkish-language version of Wikipedia for April 2017 and creates basic graphs
library("rvest")
library("tidyverse")
library("lubridate")
library("scales")
Sys.setlocale(category = "LC_TIME", locale = "en_IE")
dumpList <- read_html("https://dumps.wikimedia.org/other/pageviews/2017/2017-04/")
links <- tibble(filename = html_attr(html_nodes(dumpList, "a"), "href")) %>% # extracting links
  filter(grepl(x = filename, "projectviews")) %>% # keeping only aggregated data by project
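After the `projectviews` links are kept, each file still has to be tied to an hour and read. The sketch below shows one way to do both in base R. It rests on assumptions that are not in the gist: filenames follow the pattern `projectviews-YYYYMMDD-HHMMSS`, each file line has the form `project - views bytes`, and Turkish Wikipedia appears as `tr` (desktop) and `tr.m` (mobile). The filenames and view counts are made up for illustration.

```r
# Hypothetical link targets; in the gist they come from the scraped listing.
# Assumed naming pattern: projectviews-YYYYMMDD-HHMMSS
filenames <- c("pageviews-20170401-000000.gz",
               "projectviews-20170401-000000",
               "projectviews-20170401-010000")

# Keep only the per-project aggregates, like the gist's filter() step
projectFiles <- filenames[grepl("projectviews", filenames)]

# Turn the date and hour encoded in each filename into a UTC timestamp
stamps <- sub("^projectviews-([0-9]{8})-([0-9]{6})$", "\\1\\2", projectFiles)
hours <- as.POSIXct(stamps, format = "%Y%m%d%H%M%S", tz = "UTC")

# Hypothetical content of one file, assuming "project - views bytes" lines
fileLines <- c("en - 900000 0", "tr - 1500 0", "tr.m - 2300 0")
views <- read.table(text = fileLines, stringsAsFactors = FALSE,
                    col.names = c("project", "title", "views", "bytes"))

# Total hourly views for the (assumed) desktop and mobile Turkish projects
turkishViews <- sum(views$views[views$project %in% c("tr", "tr.m")])
```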
giocomai / 2017-06-19-wikipediaTurk.R
Created June 20, 2017 07:26
Create a graph of pageviews to Turkish-language Wikipedia projects (April to June 2017)
library("rvest")
library("tidyverse")
library("lubridate")
library("scales")
Sys.setlocale(category = "LC_TIME", locale = "en_IE")
dumpListApril <- read_html("https://dumps.wikimedia.org/other/pageviews/2017/2017-04/")
linksApril <- tibble(filename = html_attr(html_nodes(dumpListApril, "a"), "href")) %>% # extracting links
  filter(grepl(x = filename, "projectviews")) %>% # keeping only aggregated data by project
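Since the dump listings follow the `2017/2017-MM/` URL pattern visible in the April URL above, one way to cover April to June without a separate variable per month is to build the URLs programmatically. This is a minimal sketch: `dumpListUrls` is a hypothetical name, and the commented-out line assumes rvest is loaded and a network connection is available.

```r
# Months covered by the graph: April (4) to June (6) 2017
months <- 4:6

# One listing URL per month, following the pattern of the April URL
dumpListUrls <- sprintf("https://dumps.wikimedia.org/other/pageviews/2017/2017-%02d/", months)
names(dumpListUrls) <- month.name[months]

print(dumpListUrls)

# With rvest loaded and network access, every listing could be read at once:
# dumpLists <- lapply(dumpListUrls, read_html)
```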
giocomai / EUtemplate.R
Last active June 28, 2017 13:45
Castarter template for downloading a website, extracting metadata and exporting a dataset in R
## Install castarter (devtools required for installing from GitHub)
# install.packages("devtools")
devtools::install_github("giocomai/castarter")
## Load castarter
library("castarter")
## Set project and website name
SetCastarter(project = "EuropeanUnion", website = "EuropeanParliament")
giocomai / SOTEU2017_post.Rmd
Last active September 13, 2017 09:19
Exploring SOTEU speeches
---
title: 'Exploring #SOTEU speeches'
author: "Giorgio Comai (OBC Transeuropa/#edjnet)"
date: "13 September 2017"
output:
  html_document:
    code_folding: hide
    theme: readable
---
giocomai / 2017-09-19 - EuropeanUnion - EuropeanCommission.R
Created September 27, 2017 18:54
Extract all press releases, speeches and statements issued by the European Commission with R and castarter
## Install castarter (devtools required for installing from GitHub)
# install.packages("devtools")
devtools::install_github(repo = "giocomai/castarter", ref = "development")
setwd("~/R")
## Load castarter
library("castarter")
## Set project and website name
SetCastarter(project = "EuropeanUnion", website = "EuropeanCommission")
giocomai / 2017-09-19 - EuropeanUnion - EuropeanCommission.R
Last active September 27, 2017 19:03
Extract all press releases, speeches and statements issued by the European Commission with R and castarter
## Install castarter (devtools required for installing from GitHub)
# install.packages("devtools")
devtools::install_github(repo = "giocomai/castarter", ref = "development")
setwd("~/R")
## Load castarter
library("castarter")
## Set project and website name
SetCastarter(project = "EuropeanUnion", website = "EuropeanCommission")