Karl Lorey lorey

@lorey
lorey / selenium_xhr_requests_via_performance_logging.py
Last active April 14, 2024 09:11
Access Chrome's network tab (e.g. XHR requests) with Selenium
#
# This small example shows how to access JS-based requests via Selenium.
# This gives you access to the raw data for scraping,
# for example on many JS-intensive/React-based websites.
#
from time import sleep
from selenium import webdriver
from selenium.webdriver import DesiredCapabilities
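
With performance logging enabled, each entry returned by `driver.get_log('performance')` is a dict whose `message` field holds a JSON-encoded DevTools event. The following is a minimal sketch of filtering such entries for XHR responses. The helper name `extract_xhr_urls` and the sample entries are illustrative assumptions and not part of the gist; the sketch works on plain dicts so it can be tried without a browser.

```python
import json


def extract_xhr_urls(log_entries):
    """Return the URLs of all XHR responses found in performance log entries."""
    urls = []
    for entry in log_entries:
        # each entry wraps the DevTools event as a JSON string
        event = json.loads(entry["message"])["message"]
        if event.get("method") != "Network.responseReceived":
            continue
        params = event.get("params", {})
        if params.get("type") == "XHR":
            urls.append(params["response"]["url"])
    return urls


# hypothetical entries shaped like the output of driver.get_log('performance')
sample_entries = [
    {"message": json.dumps({"message": {
        "method": "Network.responseReceived",
        "params": {"type": "XHR",
                   "response": {"url": "https://example.com/api/items"}},
    }})},
    {"message": json.dumps({"message": {
        "method": "Network.responseReceived",
        "params": {"type": "Document",
                   "response": {"url": "https://example.com/"}},
    }})},
    {"message": json.dumps({"message": {
        "method": "Page.loadEventFired",
        "params": {},
    }})},
]

print(extract_xhr_urls(sample_entries))
```

In a real session, the list passed in would come from the driver instead of `sample_entries`.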
@lorey
lorey / markdown_to_text.py
Last active April 8, 2024 03:25
Markdown to Plaintext in Python
from bs4 import BeautifulSoup
from markdown import markdown
import re
def markdown_to_text(markdown_string):
    """ Converts a markdown string to plaintext """
    # md -> html -> text since BeautifulSoup can extract text cleanly
    html = markdown(markdown_string)
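
The html -> text step of this pipeline can be illustrated without third-party packages. The sketch below is a stand-in that uses the standard library's `html.parser` instead of BeautifulSoup, so it is not the gist's own implementation; the names `_TextCollector` and `html_to_text` are made up for illustration.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Return the concatenated text content of an HTML string."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.chunks)


print(html_to_text("<p>Hello <em>world</em></p>"))
```

BeautifulSoup handles malformed markup and whitespace more gracefully, which is presumably why the gist relies on it.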
@lorey
lorey / scrape_spiegel_online.py
Last active April 14, 2023 09:19
spiegel.de article scraper
"""
To use this:
pip install requests
pip install --pre mlscraper
To automatically build any scraper, check out https://github.com/lorey/mlscraper
"""
import logging
@lorey
lorey / avoiding-https-connection-pool-errors.py
Created May 21, 2019 11:54
Dealing with HTTPSConnectionPool errors in requests with adapters and backoff
# this snippet will deal with errors like HTTPSConnectionPool: Max retries exceeded with url...
# by using a backoff factor
# further reading:
# - docs: https://2.python-requests.org/en/master/user/advanced/#transport-adapters
# - stack overflow issue: https://stackoverflow.com/a/47475019
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
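
A minimal sketch of how these imports are typically combined: a `Retry` policy with a backoff factor is wrapped in an `HTTPAdapter` and mounted on a session. The function name `build_retrying_session`, the retry count, the backoff factor and the status codes are assumptions chosen for illustration, not values taken from the gist.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_retrying_session(total_retries=3, backoff_factor=0.5):
    """Create a session that retries failed requests with increasing delays."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff_factor,           # delays grow with each retry
        status_forcelist=[500, 502, 503, 504],   # also retry on these statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # use the retrying adapter for both schemes
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


session = build_retrying_session()
```

Requests made through `session.get(...)` then go through the mounted adapter and follow the retry policy.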
@lorey
lorey / firefox-profile-with-automatic-download.py
Created April 1, 2017 10:05
Selenium: Prevent download dialog and download file automatically
# adapted from http://stackoverflow.com/a/25251803
import os
from selenium import webdriver

profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/vcard') # type of file to download
# use the out folder of the script path
profile.set_preference('browser.download.dir', os.path.join(os.path.dirname(os.path.abspath(__file__)), 'out'))
@lorey
lorey / basic.py
Created March 16, 2019 16:40
Keeping Pandas DataFrames clean when importing JSON
# pandas.io.json.json_normalize is deprecated since pandas 1.0; use the top-level function
import pandas as pd
df = pd.json_normalize(data)
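
To show what the normalization does, here is a small sketch with made-up records (the `data` list below is an assumption, not the gist's data). Nested dicts become flat columns whose names join the keys with a dot. It uses the top-level `pd.json_normalize`, available since pandas 1.0.

```python
import pandas as pd

# hypothetical nested JSON records
data = [
    {"id": 1, "user": {"name": "Ada", "address": {"city": "London"}}},
    {"id": 2, "user": {"name": "Linus", "address": {"city": "Helsinki"}}},
]

# nested keys are flattened into columns such as "user.address.city"
df = pd.json_normalize(data)
print(sorted(df.columns))
```

Each record becomes one row, and no column holds a nested dict.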
@lorey
lorey / rsync-android.sh
Last active April 23, 2021 13:14
This command allows you to rsync your android files to your linux/unix system
# this command will rsync your files via MTP from android to your linux system
# took me a while to find a working combination, so here's the documentation
# 1. plug in phone via USB
# 2. select image or file transfer (image will sync only images, files everything)
# 3. open android in your file system (to make sure it's mounted)
# 4. run the following command
rsync -h --progress --stats -r -tgo -p -l -D --delete "/run/user/1000/gvfs/{insert path here}/" ./{your path without trailing slash}
@lorey
lorey / block-slack-user.js
Last active April 16, 2020 22:29
Block a user in Slack
//
// This will hide all messages from a specific user in Slack. Enjoy the silence.
//
// get the owner id of a message
// -> loops back through list to find owner
function getOwnerId(i) {
  var current = i;
  var sender = current.querySelector(".c-message__sender_link");
  var ownerId = sender ? sender.dataset.messageSender : null;
@lorey
lorey / delete-files-that-contain-specific-string.sh
Created July 11, 2019 15:41
Delete all files that contain a specific string via command line
# say we want to delete all files that contain the string "trash"
# source: https://stackoverflow.com/a/4529138
# 1) create a file that lists all files to delete
find .cache/ | xargs grep -l "trash" | awk '{print "rm "$1}' > delete.sh
# 2) review the generated script for mistakes before running it
vim delete.sh
# 3) make the file executable and execute
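
The same list-first, delete-later idea can be sketched in Python. This is an alternative to the shell pipeline, not a translation of it: the function name `files_containing` is hypothetical, and the sketch only collects matching paths so they can be reviewed before anything is removed.

```python
from pathlib import Path


def files_containing(root, needle):
    """Return all regular files below root whose text contains needle."""
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # ignore undecodable bytes so binary files do not raise
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file, skip it
        if needle in text:
            matches.append(path)
    return matches
```

After reviewing the returned list, each path could be removed with `path.unlink()`.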
@lorey
lorey / pandas-nested-parameters.py
Created May 29, 2019 07:59
Function to flatten hierarchical parameters when training an sklearn pipeline
from pandas import json_normalize


def hierarchical_to_flattened_parameters(parameters_dict):
    """
    Flatten a hierarchical dict to an sklearn parameter set.

    :param parameters_dict: hierarchical dict
    :return: flattened dict
    """
    return json_normalize(parameters_dict, sep='__').to_dict(orient='records')[0]
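
For comparison, the same flattening can be sketched with plain recursion instead of `json_normalize`. This is a swapped-in technique for illustration; the function name `flatten_parameters` and the example parameter names are assumptions.

```python
def flatten_parameters(parameters, sep="__", prefix=""):
    """Flatten a nested dict into a single level, joining keys with sep."""
    flat = {}
    for key, value in parameters.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # descend into nested dicts, carrying the joined key as prefix
            flat.update(flatten_parameters(value, sep=sep, prefix=name))
        else:
            flat[name] = value
    return flat


# hypothetical nested parameters for a two-step pipeline
nested = {"clf": {"C": 1.0, "kernel": "rbf"}, "pca": {"n_components": 2}}
flat = flatten_parameters(nested)
print(flat)
```

The double underscore matches the convention sklearn uses to address parameters of nested estimators.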