Elias Kunnas eliask

@eliask
eliask / update_factorio_server.sh
Created April 10, 2019 08:39
Simple updater for a headless Factorio server
#!/usr/bin/env bash
set -Eeuo pipefail
new_version=$(
curl -fsL https://updater.factorio.com/get-available-versions |
jq -r '."core-linux_headless64"[].to|select(.)' |
python -c'import sys;v=max(sys.stdin,key=lambda x:tuple(map(int,x.split("."))));print(v.strip())'
)
cur_version=$(cat "$HOME/factorio/.cur-version" || true)
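The inline Python one-liner in the preview above picks the newest version by comparing the dotted components as integers, so that 0.17.10 sorts after 0.17.9. A minimal standalone sketch of that comparison follows; the function names and the sample version strings are hypothetical and are not part of the gist.

```python
def parse_version(text):
    """Turn a dotted version string such as '0.17.32' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def newest_version(versions):
    """Return the highest version string, comparing components numerically."""
    candidates = (v.strip() for v in versions if v.strip())
    return max(candidates, key=parse_version)


if __name__ == "__main__":
    # Hypothetical sample input; the real list comes from the updater endpoint.
    sample = ["0.17.9", "0.17.10", "0.16.51"]
    print(newest_version(sample))
```

A plain string comparison would rank "0.17.9" above "0.17.10", which is why the components are converted to integers first.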
@eliask
eliask / top10k_finnish_domain_other_than_fi.txt
Created March 7, 2019 11:13
Top 10k domains other than .fi classified as Finnish from Common Crawl 2019-02
68026 fi.freelancer.com
21031 fi.hotels.com
13819 fi.pinterest.com
12616 geocaching.com
9991 dwensa.info
9991 pelaajalehti.com
9986 menot.info
9939 forums.offipalsta.com
9493 etuovi.com
9319 nettikone.com
@eliask
eliask / top10k_fi_domains_from_commoncrawl_2019-02.txt
Last active March 6, 2019 20:27
Top 10k .fi domains from the 2019-02 Common Crawl index (counted by the number of pages in the index from the given domain). ~10 million .fi pages in total.
77110 yle.fi
58025 aalto.fi
54801 iltalehti.fi
52602 helsinki.fi
26634 tekniikanmaailma.fi
26594 tulospalvelu.fi
22594 aprs.fi
20667 oikotie.fi
20482 uusisuomi.fi
20445 ilmatieteenlaitos.fi
@eliask
eliask / extract_har_audio.py
Created November 1, 2018 15:28
Extract audio/video parts from a HAR file
import sys
import base64
import json
def main(path):
    with open(path, 'rt') as fh:
        j = json.load(fh)
    for entry in j['log']['entries']:
        if not 200 <= entry['response']['status'] < 300:
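The preview above is cut off after the status check, so the rest of the gist is not shown here. As a rough sketch of how such an extraction could continue, assuming the standard HAR layout in which each response carries a `content` object with `mimeType`, `text`, and an optional `encoding` of `base64`, the helper below yields the decoded bodies of audio and video responses. The function name `iter_media_bodies` is hypothetical and this is not the author's code.

```python
import base64


def iter_media_bodies(har):
    """Yield (url, mime_type, body_bytes) for successful audio/video responses."""
    for entry in har["log"]["entries"]:
        response = entry["response"]
        if not 200 <= response["status"] < 300:
            continue  # skip redirects and errors, as in the preview
        content = response.get("content", {})
        mime_type = content.get("mimeType", "")
        if not mime_type.startswith(("audio/", "video/")):
            continue
        text = content.get("text")
        if text is None:
            continue  # the HAR exporter did not store the body
        if content.get("encoding") == "base64":
            body = base64.b64decode(text)
        else:
            body = text.encode("utf-8")
        yield entry["request"]["url"], mime_type, body
```

Each yielded body can then be written to its own file or concatenated, depending on how the media was split across requests.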
@eliask
eliask / verkkokauppa_com_sales.py
Created October 16, 2018 14:11
verkkokauppa.com sales 2018-10
import requests
headers = {
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'fi',
'User-Agent': 'Mozilla/5.0',
'Accept': '*/*',
'Referer': 'https://www.verkkokauppa.com/fi/syyssiivous',
'Connection': 'keep-alive',
}
@eliask
eliask / zip_most_recent_timestamp.py
Created August 9, 2018 16:19
Get the most recent timestamp from a ZIP file
#! /usr/bin/env python
# Prints the most recent timestamp for a file in a ZIP file (RFC-3339 format without timezone)
import datetime as dt
import sys
import zipfile
if not sys.argv[1:]:
    print('Usage: {} foo.zip'.format(sys.argv[0]))
    sys.exit(1)
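The preview stops after the argument check. One way the lookup described in the header comment could be done, sketched here with only the standard library, is to read each member's `ZipInfo.date_time` tuple and format the maximum with `isoformat()`. The function name `most_recent_timestamp` is hypothetical and the gist's own implementation may differ.

```python
import datetime as dt
import zipfile


def most_recent_timestamp(zip_file):
    """Return the newest member timestamp as 'YYYY-MM-DDTHH:MM:SS', or None if empty."""
    with zipfile.ZipFile(zip_file) as zf:
        # date_time is a (year, month, day, hour, minute, second) tuple.
        stamps = [dt.datetime(*info.date_time) for info in zf.infolist()]
    return max(stamps).isoformat() if stamps else None
```

ZIP member timestamps carry no timezone, which matches the "without timezone" note in the header comment.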
@eliask
eliask / arff2json.sh
Created June 24, 2018 21:17
Convert arff to json with Python and liac-arff
#!/bin/sh
# Usage: arff2json.sh < foo.arff > foo.json
# Requires arff: pip install liac-arff
# Why shell instead of Python? Probably because I thought the default one-liner was neat :)
# Default format: .data = list of lists
python -c 'import json,sys,arff;json.dump(arff.load(sys.stdin),sys.stdout)'
exit 0
# Or convert data to attribute maps: .data = list of JSON objects/dicts
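The alternative mentioned in the last comment is cut off in the preview. A minimal sketch of that conversion is given below, assuming the dictionary shape that liac-arff's `arff.load` returns: an `attributes` list of `(name, type)` pairs and a `data` list of rows. The function name `rows_as_dicts` and the sample dataset are hypothetical.

```python
def rows_as_dicts(dataset):
    """Map each row of a liac-arff style dataset to {attribute name: value}."""
    names = [name for name, _type in dataset["attributes"]]
    return [dict(zip(names, row)) for row in dataset["data"]]


if __name__ == "__main__":
    import json

    # Hypothetical dataset in the shape produced by arff.load().
    sample = {
        "relation": "weather",
        "attributes": [("outlook", ["sunny", "rainy"]), ("temperature", "REAL")],
        "data": [["sunny", 85.0], ["rainy", 65.0]],
    }
    print(json.dumps(rows_as_dicts(sample)))
```

With the real library, the result of `arff.load(sys.stdin)` could be passed to `rows_as_dicts` before dumping it as JSON.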
@eliask
eliask / dump_ryver.py
Created June 11, 2018 13:31
Dump/Archive/Export Ryver.com forum chat history
#! /usr/bin/env python3
# Usage: dump_ryver.py <PHPSESSID> <forum name, like "foo.ryver.com"> <forum ids, e.g. 1217356 ...>
# Requires Python 3.6 or newer and the `requests` library:
# pip install requests
import requests
import sys
import time
session_id = sys.argv[1]
forum_name = sys.argv[2]
@eliask
eliask / using_zstd.py
Created June 9, 2018 21:38
Using zstd in Python (decompression)
#
# pip install zstandard
#
# The zstandard bindings are a little off, compared to gzip, etc.
# So small tricks like this are needed to fully decompress a file in-memory:
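import zstandard as zstd
dctx = zstd.ZstdDecompressor()
# Close the file once every decompressed chunk has been joined
with open('foo.zst', 'rb') as fh:
    data = b''.join(dctx.read_to_iter(fh))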
@eliask
eliask / ndjson_to_csv.py
Created June 9, 2018 20:56
Convert newline-delimited JSON (ND-JSON) to CSV
#! /usr/bin/env python3
# Usage: ndjson_to_csv.py <files... or stdin> > output.csv
# NB: Assumes that each line is a simple JSON object with no nested arrays or objects
import csv
import json
import sys
import fileinput
from collections import namedtuple, OrderedDict
lines = fileinput.input()
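The preview ends right after the input is opened. The sketch below shows one possible way to finish the conversion under the same assumption stated in the header (flat JSON objects, no nesting), using `csv.DictWriter` from the standard library. It is not the gist's own code: the original imports `namedtuple` and `OrderedDict`, which suggests a different approach, and the function name `ndjson_to_csv` is hypothetical.

```python
import csv
import json


def ndjson_to_csv(lines, out):
    """Write flat JSON objects (one per line) to `out` as CSV."""
    rows = [json.loads(line) for line in lines if line.strip()]
    # Collect column names in first-seen order across all rows.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row become empty cells
```

This version reads all rows into memory so that the header can cover every key; for very large inputs a two-pass or streaming design would be more appropriate. It could be called as `ndjson_to_csv(fileinput.input(), sys.stdout)`.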