# Load the packages we’re going to be using:
# Alongside the usual stuff like tidyverse and magrittr, we’ll be using rvest for some web-scraping, jsonlite to parse some JSON, and extrafont to load some nice custom fonts
needs(tidyverse, magrittr, rvest, jsonlite, extrafont)
# Before we go on, two things to note:
# First, on web scraping:
# You should always check the terms of the site you are extracting data from, to make sure scraping (often referred to as `crawling`) is not prohibited. One way to do this is to visit the website’s `robots.txt` page and ensure that a) there is nothing explicitly stating that crawlers are not permitted, and b) ideally, the site simply states that all user agents are permitted (indicated by a line saying `User-Agent: *`). Both of those are the case for our use-case today (see https://www.ultimatetennisstatistics.com/robots.txt).
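# As a quick sanity check, that robots.txt can be read straight from R. A minimal, deliberately naive sketch (for anything serious, the robotstxt package and its paths_allowed() helper are a more robust fit):
robots <- readLines("https://www.ultimatetennisstatistics.com/robots.txt")
any(grepl("User-agent: *", robots, fixed = TRUE))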
# And second, about those custom fonts:
# The packages we'll be using
packages <- c("rvest","dplyr","tidyr","pipeR","ggplot2","stringr","data.table")
# From those packages, which ones are not yet installed?
newPackages <- packages[!(packages %in% as.character(installed.packages()[,"Package"]))]
# If any weren't already installed, install them now
if(length(newPackages)) install.packages(newPackages)
# Now make sure all necessary packages are loaded
invisible(lapply(packages, library, character.only = TRUE))
@dannguyen
dannguyen / faa-333-pdf-gathering.md
Last active June 19, 2021 13:18
Using wget + grep to explore inconveniently organized federal data (FAA Section 333 Exemptions)

if !database: wget + grep

The Federal Aviation Administration is posting PDFs of the Section 333 exemptions that it grants, i.e. the exemptions for operators who want to fly drones commercially before the FAA finishes its rulemaking. A journalist wanted to look for exemptions granted to operators in a given U.S. state. But the FAA doesn't appear to have an easy-to-read data file to use and doesn't otherwise list exemptions by location of operator.

However, since their exemptions page is just one giant HTML table listing the PDFs, we can use wget to fetch all the PDFs, run pdftotext on each file, and then [grep](https://medium.com/@rualthanzauva/grep-was-a-private-command-of-m
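To keep with the R used elsewhere on this page, the same idea can be sketched without wget: rvest pulls the PDF links out of that one big table, and pdftools::pdf_text() stands in for pdftotext. This is a rough sketch only; the page URL, the CSS selector, and the example state are assumptions, not verified:

library(rvest)
library(pdftools)

faa_url <- "https://www.faa.gov/uas/section_333/333_authorizations/"  # illustrative URL, not verified
page <- read_html(faa_url)
pdf_urls <- page %>% html_nodes("table a") %>% html_attr("href")
pdf_urls <- xml2::url_absolute(pdf_urls[grepl("\\.pdf$", pdf_urls)], faa_url)

# Fetch each PDF and keep the ones whose text mentions the state of interest
hits <- Filter(function(u) {
  tmp <- tempfile(fileext = ".pdf")
  download.file(u, tmp, mode = "wb", quiet = TRUE)
  any(grepl("Texas", pdf_text(tmp)))  # "Texas" is just an example
}, pdf_urls)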

@primaryobjects
primaryobjects / saveChart.R
Last active October 9, 2021 02:21
Add text outside the chart area of a ggplot2 graph in R and save the resulting chart to a png file.
require(ggplot2)
require(gridExtra)
require(grid)

saveChart <- function(chart, fileName) {
  # Draw attribution.
  chart <- chart + geom_text(aes(label = 'sentimentview.com', x = 2.5, y = 0), hjust = -2, vjust = 6, color = "#a0a0a0", size = 3.5)
  # Disable the clip-area so the attribution can render outside the panel.
  gt <- ggplot_gtable(ggplot_build(chart))
  gt$layout$clip[gt$layout$name == "panel"] <- "off"
  # Save the resulting chart, out-of-panel text included, to a png file.
  png(fileName)
  grid.draw(gt)
  dev.off()
}
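A quick usage sketch (mtcars ships with R; the output file name is arbitrary):
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
saveChart(p, "mtcars-chart.png")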
@briatte
briatte / charlie-data.r
Last active August 29, 2015 14:16
scrape front covers from Charlie Hebdo – source: http://stripsjournal.canalblog.com/
#
# download 338 Charlie Hebdo covers with keywords
#
library(dplyr)
library(XML)
library(lubridate)
library(stringr)
library(ggplot2)
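# A rough sketch of the download step that would follow (the archive URL and the XPath are guesses about the blog's layout, not verified against the site):
archive <- htmlParse("http://stripsjournal.canalblog.com/archives/index.html")
img_urls <- xpathSApply(archive, "//img/@src")
covers <- img_urls[str_detect(img_urls, "charlie")]  # keyword filter, illustrative only
for (u in covers) download.file(u, destfile = basename(u), mode = "wb")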
import networkx as nx
from lxml import etree
import re
import itertools

def getNamesInAction(action, textNames, nameDict):
    # Go through the names in order of decreasing length: find each one in the
    # action, record it, then remove it before continuing, so shorter names
    # nested inside longer ones are not matched twice.
    act = action
    sortNames = sorted(textNames, key=len, reverse=True)
    returnNames = []
    for name in sortNames:
        if name in act:
            # Assumes nameDict maps a surface name to its canonical form
            returnNames.append(nameDict.get(name, name))
            act = act.replace(name, '')
    return returnNames
@milesgrimshaw
milesgrimshaw / Kickstarter_Geocoding.R
Last active August 29, 2015 13:57
Data prep for geocoding
# Load desired packages
library(lubridate)
library(stringr)
library(ggplot2)
library(scales)
# Set the working directory
getwd()
setwd("~/Desktop/Patreon/")