@marco-schmidt
marco-schmidt / download_twitter.sh
Created September 25, 2021 16:35
Download Internet Archive Twitter stream archives
#!/bin/bash
# Purpose: Download a month of archive.org Twitter stream tar files.
# Author: Marco Schmidt
# Usage: ./download_twitter.sh DEST_DIR YEAR MONTH PATHYEAR PATHMONTH
# Requirements: (1) tools bash, date, mkdir, nohup, seq, sleep, wget
# (2) network connectivity to archive.org via https
# (3) enough free disk space
# Examples: (1)
# regular case, the address contains the same month/year in path and file name
marco-schmidt / data.tsv
Last active August 25, 2021 07:55
Create SVG percentiles graph from tab-separated value (TSV) file with gnuplot
2021-06-11T00:00:00Z 15118 22650 27127
2021-06-11T00:10:00Z 14975 22515 27077
2021-06-11T00:20:00Z 15302 22733 27063
2021-06-11T00:30:00Z 15004 22636 27116
2021-06-11T00:40:00Z 15030 22634 27090
2021-06-11T00:50:00Z 14961 22585 27066
2021-06-11T01:00:00Z 15188 22604 27125
2021-06-11T01:10:00Z 15160 22584 27076
2021-06-11T01:20:00Z 15186 22641 27116
2021-06-11T01:30:00Z 15096 22709 27166
marco-schmidt / TimestampParser.java
Last active August 14, 2021 23:42
Parse legacy timestamp strings that carry no timezone information and interpret them in a known timezone, taking daylight saving time into account.
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
// parse timestamp strings not containing timezone information (but timezone is known)
// without daylight saving time this would be easy (add fixed string like "+01:00")
// how to use Java 8 time API?
// with Java 11+ simply run as: java TimestampParser.java
public class TimestampParser
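The preview above stops at the class declaration, so the following is only a minimal sketch of the idea described in the comments, not the gist's actual code. It assumes a hypothetical timestamp layout (`uuuu-MM-dd HH:mm:ss`) and a hypothetical zone (`Europe/Berlin`); the class and method names are invented for illustration. The local timestamp is parsed first and the known zone is attached afterwards, so the zone rules pick the offset that applies on that date.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper class, not part of the original gist.
class TimestampSketch {
    // Assumed input layout; adjust the pattern to the real legacy format.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    // Parse a zone-less timestamp and interpret it in the given zone.
    // The zone's rules decide whether standard or daylight saving time applies.
    static ZonedDateTime parse(String text, ZoneId zone) {
        LocalDateTime local = LocalDateTime.parse(text, FORMAT);
        return local.atZone(zone);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Europe/Berlin"); // assumed zone for the example
        // A winter and a summer timestamp end up with different offsets.
        System.out.println(parse("2021-01-15 12:00:00", zone).getOffset());
        System.out.println(parse("2021-07-15 12:00:00", zone).getOffset());
    }
}
```

Local times that fall into a daylight saving gap or overlap are resolved by `atZone` according to its documented rules; if such timestamps matter, they probably need explicit handling.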
marco-schmidt / retrieve-imdb-tsv.sh
Created May 30, 2021 23:01
Shell script to download IMDb data files
#!/bin/bash
# Purpose: bash script to retrieve IMDb tsv files described at https://www.imdb.com/interfaces/
# Created: 2018-10-03
# Requires: bash, date, mkdir, pushd, wget, ls, du, gzip, popd
# write permission and enough space (~ 650 MB as of 2018) in argument directory
# make sure a directory argument was given, otherwise exit with usage instructions
if [ -z "$1" ]; then
echo "Usage: '$0 <directory>' to download IMDb tsv files to 'directory/YYYY/YYYY-MM-DD'."
echo "Note: <directory> must exist already, subdirectories will be created."
marco-schmidt / HowManyTicTacToeGames.java
Created May 30, 2021 22:43
Java application to compute the number of distinct tic-tac-toe games
/**
* Compute number of distinct tic-tac-toe games.
* This is a popular interview question.
* With Java 11+ run as:
* <pre>
* java HowManyTicTacToeGames.java
* </pre>
 * @see <a href="https://en.wikipedia.org/wiki/Tic-tac-toe">Tic-tac-toe (Wikipedia)</a>
* @author Marco Schmidt
*/
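The preview ends with the Javadoc comment, so the author's implementation is not visible here. One common way to approach the question is a plain recursive enumeration, sketched below under the assumption that a "game" is a sequence of moves that stops as soon as a player completes a line or the board is full. All names are hypothetical.

```java
// Hypothetical sketch, not the gist's actual implementation.
class TicTacToeCount {
    // The eight winning lines, as indices into a 9-cell board.
    private static final int[][] LINES = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},
        {0, 4, 8}, {2, 4, 6}
    };

    // True if the given player (1 or 2) occupies a complete line.
    static boolean wins(int[] board, int player) {
        for (int[] line : LINES) {
            if (board[line[0]] == player
                && board[line[1]] == player
                && board[line[2]] == player) {
                return true;
            }
        }
        return false;
    }

    // Count all move sequences from this position; 0 marks an empty cell.
    static long countGames(int[] board, int player, int movesMade) {
        long total = 0;
        for (int cell = 0; cell < 9; cell++) {
            if (board[cell] != 0) {
                continue;
            }
            board[cell] = player;
            if (wins(board, player) || movesMade + 1 == 9) {
                total++; // game over: win or full board
            } else {
                total += countGames(board, 3 - player, movesMade + 1);
            }
            board[cell] = 0; // undo the move before trying the next cell
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countGames(new int[9], 1, 0)); // prints 255168
    }
}
```

This counts move sequences, so games that differ only by rotation or reflection are counted separately.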
marco-schmidt / .gitignore
Last active July 30, 2021 21:28
Process textual log files with Java 8's stream API
*.gz
*.log
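Only the `.gitignore` file of this gist is previewed, so the Java code itself is not shown. As a rough illustration of processing log lines with the Java 8 stream API, the sketch below counts lines per log level. The line layout (`<timestamp> <LEVEL> <message>`) and all names are assumptions made for the example, not taken from the gist.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; the log format is assumed, not taken from the gist.
class LogLevelCount {
    // Group lines by their second space-separated field and count each group.
    // Lines with fewer than two fields are skipped.
    static Map<String, Long> countByLevel(Stream<String> lines) {
        return lines
            .map(line -> line.split(" ", 3))
            .filter(parts -> parts.length >= 2)
            .collect(Collectors.groupingBy(
                parts -> parts[1], TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        Stream<String> sample = Stream.of(
            "2021-07-30T21:28:00Z INFO started",
            "2021-07-30T21:28:01Z ERROR failed to open file",
            "2021-07-30T21:28:02Z INFO done");
        System.out.println(countByLevel(sample));
    }
}
```

For a real file, the stream could come from `Files.lines(Paths.get(...))` inside a try-with-resources block so the file handle is closed after processing.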