
https://tilburgsciencehub.com

Hannes Datta hannesdatta

hannesdatta / hashes.ipynb
Created March 6, 2024 10:34
Anonymizing usernames for web scraping projects
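The notebook does not render on this page. One common way to anonymize usernames before storing scraped data is keyed hashing (HMAC-SHA-256). The same username always maps to the same pseudonym, so records stay linkable. Unlike a plain unsalted hash, outsiders cannot re-identify common usernames by hashing guesses unless they know the key. This is a sketch of that general technique, not the notebook's code. `anonymize` and `SECRET_KEY` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical secret key: keep it out of version control (e.g. read it from an
# environment variable). Anyone holding the key can re-identify users by hashing guesses.
SECRET_KEY = b"replace-with-a-long-random-secret"


def anonymize(username: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym for `username` (keyed SHA-256, hex-encoded)."""
    # Strip surrounding whitespace so ' alice' and 'alice' get the same pseudonym
    normalized = username.strip()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
```

Store only `anonymize(username)` in the scraped dataset, and keep the key separately from the data.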
hannesdatta / script.R
Created January 31, 2024 10:42
Code from the first session of dPrep 2024 (https://dprep.hannesdatta.com)
# use cases
# as a calculator
1 + 1
# to assign variables
x = 5
# calculation w/ variables
x + 5
hannesdatta / commands.txt
Created October 13, 2023 13:44
starting up R from the command line/terminal
R --vanilla < "filename.R" # echoes every command together with its output on screen
Rscript filename.R # commands are not echoed; you only see the output they produce
R -e "unlink('*.*')" # executes one R command (here: deletes all files with an extension in the working directory)
R -e "rmarkdown::render('filename.Rmd', output_file='../paper/output/filename.pdf')"
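When these R steps are part of a larger workflow driven from Python, the `Rscript` call can be wrapped with the standard library's `subprocess` module. This is a minimal sketch, assuming R is installed and `Rscript` is on the PATH. `run_r_script` is a hypothetical helper name, and `dry_run` only returns the command without running it.

```python
import shutil
import subprocess
from typing import List


def run_r_script(path: str, dry_run: bool = False) -> List[str]:
    """Run an R script non-interactively via Rscript and return the command used."""
    cmd = ["Rscript", path]
    if dry_run:
        return cmd  # show what would be executed, without running it
    if shutil.which("Rscript") is None:
        raise FileNotFoundError("Rscript not found on PATH; is R installed?")
    # check=True raises CalledProcessError if the R script exits with an error
    subprocess.run(cmd, check=True)
    return cmd
```

Calling it for several scripts in sequence stops the workflow at the first failing script, because of `check=True`.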
hannesdatta / scraper.py
Last active September 14, 2023 13:42
Web Scraping Mistakes: Handling Lists in Python: Code for https://youtu.be/RV9WOlqmL3E
# FINAL CODE
import requests
from bs4 import BeautifulSoup
# Define the URL and user-agent header
url = 'https://www.coolblue.nl/tweedekans-product/2191236'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}
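A frequent list-related mistake in scrapers is taking `[0]` from a lookup that returns a list (BeautifulSoup's `find_all` always does) when the page contains no match. This raises `IndexError` and stops the scraper. Whether this is exactly the mistake the video discusses is an assumption. The sketch below shows a defensive pattern. `first_or_none` is a hypothetical helper, and the selector in the comments (`'span', class_='price'`) is made up.

```python
from typing import Any, List, Optional


def first_or_none(items: List[Any]) -> Optional[Any]:
    """Return the first element of a result list, or None if nothing matched."""
    # Indexing an empty list with [0] would raise IndexError
    return items[0] if items else None


# Hypothetical use with BeautifulSoup (selector names are made up):
#   soup = BeautifulSoup(requests.get(url, headers=headers).text, 'html.parser')
#   price = first_or_none(soup.find_all('span', class_='price'))
#   price_text = price.get_text(strip=True) if price is not None else None
#   # ...or, when every match matters, loop over the list instead of taking [0]:
#   all_prices = [el.get_text(strip=True) for el in soup.find_all('span', class_='price')]

print(first_or_none(['first match', 'second match']))  # first match
print(first_or_none([]))  # None
```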
hannesdatta / scripts.R
Created March 2, 2023 11:29
A data cleaning script for demonstration of a setup-input-transformation-output building block
# Setup/initialization
library(tidyverse)
## Wipe any previously downloaded files
unlink('*.zip')
unlink('*.csv')
## Download raw data (mode = 'wb' keeps the binary zip file intact on Windows)
download.file('https://github.com/hannesdatta/course-dprep/raw/master/content/docs/tutorials/data-preparation/data_without_duplicates.zip', 'data.zip', mode = 'wb')
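The preview stops after the setup and download steps. In Python, the same setup-input-transformation-output layout could look like the sketch below. It uses only the standard library. The transformation (dropping exact duplicate rows) is an assumed example and not necessarily what the original script does. The function names are hypothetical.

```python
import csv
import glob
import os
import zipfile


def setup(workdir="."):
    """Setup: delete .zip/.csv files left over from a previous run."""
    for pattern in ("*.zip", "*.csv"):
        for path in glob.glob(os.path.join(workdir, pattern)):
            os.remove(path)


def load(zip_path, workdir="."):
    """Input: unzip the archive and read all rows of every CSV inside it."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(workdir)
        csv_names = [n for n in zf.namelist() if n.endswith(".csv")]
    rows = []
    for name in csv_names:
        with open(os.path.join(workdir, name), newline="", encoding="utf-8") as f:
            rows.extend(csv.DictReader(f))
    return rows


def transform(rows):
    """Transformation (example): drop exact duplicate rows, keeping the first one."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def save(rows, out_path):
    """Output: write the cleaned rows to a CSV file."""
    if not rows:
        return  # nothing to write
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

A full run would call `setup()`, download the archive (for example with `urllib.request.urlretrieve(url, 'data.zip')`), and then call `save(transform(load('data.zip')), 'data_clean.csv')`. The output file name is hypothetical. Keeping each stage in its own function makes it easy to rerun or test one stage at a time.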
hannesdatta / exercises.Rmd
Created February 16, 2023 11:37
dprep-exercises-2023-02-16
---
title: "dPrep Tutorial"
output: html_document
date: "2023-02-16"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
hannesdatta / books_to_scrape.ipynb
Created September 20, 2022 13:58
Getting product descriptions and unique product category links from books.toscrape.com
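The notebook does not render here. Getting unique category links usually requires two steps. First, resolve the relative `href` values against the page URL. Second, drop duplicates while keeping the page order. This is a sketch of that step only. `unique_links` is a hypothetical helper, and the example paths mimic books.toscrape.com's category URLs for illustration.

```python
from urllib.parse import urljoin


def unique_links(base_url, hrefs):
    """Resolve relative hrefs against base_url and drop duplicates, keeping order."""
    seen = set()
    result = []
    for href in hrefs:
        absolute = urljoin(base_url, href)  # e.g. 'catalogue/...' -> 'https://.../catalogue/...'
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

Comparing absolute URLs rather than raw `href` strings means that a relative and an absolute spelling of the same link count as one.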
hannesdatta / exercise_3.9.ipynb
Created September 2, 2022 11:24
Solution to exercise 3.9 in my Python Bootcamp Tutorial (https://odcm.hannesdatta.com/docs/tutorials/pythonbootcamp/)
hannesdatta / script.R
Created September 1, 2022 09:29
R scripts written in the intro class to R
# This is the R Bootcamp - demo (written by Hannes)
1+1
cat("Hello!")
name <- 'Hannes'
dir.create('data')
dir.create('data_output')
dir.create('documents')
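For the Python side of the course, the same project folders can be created with `pathlib`. This is a minimal sketch; `create_project_dirs` is a hypothetical helper, and the folder names are taken from the R demo. Unlike R's `dir.create()`, which warns if a folder already exists, `exist_ok=True` makes rerunning the script silent.

```python
from pathlib import Path


def create_project_dirs(base=".", folders=("data", "data_output", "documents")):
    """Create the project folders under `base`; safe to run more than once."""
    paths = []
    for name in folders:
        path = Path(base) / name
        # exist_ok=True: no error if the folder already exists
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```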
hannesdatta / scrape_reddit.py
Created May 11, 2022 09:48
Searching reddit and saving search results with a web scraper
# Setup
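# Make selenium and chromedriver work
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
# Selenium >= 4.6 can also locate a driver on its own: driver = webdriver.Chrome()
# Selenium 4 expects the driver path inside a Service object, not as a positional argument
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))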
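The preview ends after the driver setup. The next two steps are opening a search page and saving what is scraped, and both can be sketched without a browser. Two things here are assumptions: that reddit's search page accepts the query in a `q` parameter at `https://www.reddit.com/search/`, and that results are stored as JSON Lines. `reddit_search_url`, `save_results`, and the output file name are hypothetical.

```python
import json
from urllib.parse import urlencode


def reddit_search_url(query: str) -> str:
    """Build a search URL (the /search/?q= pattern is an assumption about reddit)."""
    return "https://www.reddit.com/search/?" + urlencode({"q": query})


def save_results(results, path: str) -> None:
    """Append one JSON object per line (JSON Lines), so repeated runs add to the file."""
    with open(path, "a", encoding="utf-8") as f:
        for item in results:
            f.write(json.dumps(item, ensure_ascii=False) + "\n")


# With the driver from above (not run here):
#   driver.get(reddit_search_url("data science"))
#   ...extract titles and links from the page, then for example:
#   save_results([{"query": "data science", "title": title, "url": link}], "reddit_results.json")
```

Appending one JSON object per line means a scraper that crashes halfway still leaves every result written so far readable.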