Skip to content

Instantly share code, notes, and snippets.

@wshayes
wshayes / python_example.py
Created Dec 24, 2019
[python multiprocessing example] writing to file from a queue #python #multiprocessing
View python_example.py
# https://stackoverflow.com/a/13530258/886938
import multiprocessing as mp
import time
fn = 'c:/temp/temp.txt'
def worker(arg, q):
'''stupidly simulates long running process'''
start = time.clock()
@snakers4
snakers4 / parse_cc_index.py
Last active Aug 23, 2020
Plain common crawl pre-processing
View parse_cc_index.py
import gc
import gzip
import time
import json
import shutil
import os,sys
import tldextract
import collections
import pandas as pd
from tqdm import tqdm
@wwek
wwek / httpsproxy.go
Created Oct 29, 2017
https proxy in golang
View httpsproxy.go
// https://medium.com/@mlowicki/http-s-proxy-in-golang-in-less-than-100-lines-of-code-6a51c2f2c38c
// #!/usr/bin/env bash
// case `uname -s` in
// Linux*) sslConfig=/etc/ssl/openssl.cnf;;
// Darwin*) sslConfig=/System/Library/OpenSSL/openssl.cnf;;
// esac
// openssl req \
// -newkey rsa:2048 \
// -x509 \
@sirodoht
sirodoht / migrate-django.md
Last active Sep 18, 2021
How to migrate Django from SQLite to PostgreSQL
View migrate-django.md

How to migrate Django from SQLite to PostgreSQL

Dump existing data:

python3 manage.py dumpdata > datadump.json

Change settings.py to Postgres backend.

Make sure you can connect on PostgreSQL. Then:

@racitup
racitup / html_to_text.py
Last active Jul 29, 2021
Extract text from html in python using BeautifulSoup4
View html_to_text.py
from bs4 import BeautifulSoup, NavigableString, Tag
def html_to_text(html):
"Creates a formatted text email message as a string from a rendered html template (page)"
soup = BeautifulSoup(html, 'html.parser')
# Ignore anything in head
body, text = soup.body, []
for element in body.descendants:
# We use type and not isinstance since comments, cdata, etc are subclasses that we don't want
if type(element) == NavigableString:
@valyala
valyala / README.md
Last active Sep 20, 2021
Optimizing postgresql table for more than 100K inserts per second
View README.md

Optimizing postgresql table for more than 100K inserts per second

  • Create UNLOGGED table. This reduces the amount of data written to persistent storage by up to 2x.
  • Set WITH (autovacuum_enabled=false) on the table. This saves CPU time and IO bandwidth on useless vacuuming of the table (since we never DELETE or UPDATE the table).
  • Insert rows with COPY FROM STDIN. This is the fastest possible approach to insert rows into table.
  • Minimize the number of indexes in the table, since they slow down inserts. Usually an index on time timestamp with time zone is enough.
  • Add synchronous_commit = off to postgresql.conf.
  • Use table inheritance for fast removal of old data:
@neu5ron
neu5ron / valid_domain_name_regex
Last active Oct 5, 2020
Valid domain name regex including internationalized domain name
View valid_domain_name_regex
domain_regex = r'(([\da-zA-Z])([_\w-]{,62})\.){,127}(([\da-zA-Z])[_\w-]{,61})?([\da-zA-Z]\.((xn\-\-[a-zA-Z\d]+)|([a-zA-Z\d]{2,})))'
#Python
domain_regex = '{0}$'.format(domain_regex)
valid_domain_name_regex = re.compile(domain_regex, re.IGNORECASE)
self.domain_name = self.domain_name.lower().strip().encode('ascii')
if re.match(valid_domain_name_regex, self.domain_name ):
return True
else:
return False
View user-agents.txt
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.94 Chrome/37.0.2062.94 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/600.8.9 (KHTML, like Gecko) Version/8.0.8 Safari/600.8.9
Mozilla/5.0 (iPad; CPU OS 8_4_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12H321 Safari/600.1.4
Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.10240
Mozilla/5.0 (Windows NT 6.3; WOW64; rv:40.0)
@JeffBelback
JeffBelback / docker-destroy-all.sh
Last active Aug 23, 2021
Destroy all Docker Containers and Images
View docker-destroy-all.sh
#!/bin/bash
# Stop all containers
containers=`docker ps -a -q`
if [ -n "$containers" ] ; then
docker stop $containers
fi
# Delete all containers
containers=`docker ps -a -q`
if [ -n "$containers" ]; then
docker rm -f -v $containers