Created December 8, 2021 17:17
Love2D wiki downloader: downloads the principal wiki documentation pages.

The Love2D wiki is a great starting resource. I mostly do small-time game development for fun, on long flights or train journeys, or out in remote places. Frankly, those are the only times I am guaranteed not to be disturbed for sufficiently long periods.

And so, I need the main wiki in a local, offline, usable format.

This script is a considerate slurper which shouldn't DoS the server in the slightest.

Only the principal wiki pages are fetched. None of the images are fetched either (not pretty offline, but it works for documentation purposes).
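As a rough illustration of what "considerate" can mean here, the sketch below builds a wget command array whose flags follow the comments in the script further down (follow links to a limited depth, keep timestamps, convert links, don't go upwards, wait between requests, stay quiet). The function name `build_wget_command`, the `interval` and `depth` values, and the image reject list are assumptions made for this example, not values taken from the original script.

```bash
#!/usr/bin/env bash
# Sketch only: interval, depth and the reject list are assumed values,
# not taken from the original script.
interval=2   # seconds to wait between requests (assumption)
depth=1      # how many link levels to follow (assumption)

build_wget_command() {
    # Populate the global array "command" with a polite wget invocation.
    command=(
        wget
        --recursive --level="$depth"   # follow links, but only up to --level
        --timestamping                 # compare local and server timestamps
        --convert-links                # rewrite links so they work locally
        --no-parent                    # never ascend above the start URL
        --wait="$interval"             # pause between requests
        --no-verbose                   # keep terminal output short
        --reject='*.png,*.jpg,*.gif'   # skip images (assumed pattern list)
    )
}

build_wget_command
# Print the command instead of running it, so nothing is downloaded here.
printf '%s\n' "${command[*]}"
```

To actually fetch a page with this array, it would be run as `"${command[@]}" "$wiki_url"`, the same shape as the loop in the script.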

To run the slurp:


After this, you can even create a ZIP file:

bash zipit

And post it somewhere so other people don't need to run this themselves.
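The `zipit` argument is the name of a function inside the script: called without arguments the script runs the full download, and called with arguments it runs only the named steps. A minimal sketch of that dispatch pattern is shown here, assuming two placeholder steps (`step_one`, `step_two`) and a wrapper named `dispatch`; these names are hypothetical and stand in for the real functions.

```bash
#!/usr/bin/env bash
# Hypothetical step functions standing in for the real ones.
step_one() { echo "ran step_one"; }
step_two() { echo "ran step_two"; }

dispatch() {
    if [[ -z "$*" ]]; then
        # No arguments: run every step in order.
        # Each step runs in a subshell, so a cd or exit inside it
        # cannot affect the steps that follow.
        ( step_one )
        ( step_two )
    else
        # Arguments given: treat each one as a function name and run it.
        for subcommand in "$@"; do
            if declare -F "$subcommand" >/dev/null; then
                ( "$subcommand" )
            else
                echo "Unknown step: $subcommand" >&2
                return 1
            fi
        done
    fi
}
```

Checking the name with `declare -F` before calling it is an addition for this sketch; it stops a typo on the command line from being executed as an arbitrary command.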

#!/usr/bin/env bash
set -euo pipefail
# This is a semi-intelligent API sucker for the Love2D documentation wiki
# It downloads the bare essentials for offline documentation reading without traversing the entire wiki
# We use some specific pages because they sublink to locations in the wiki
# that are of direct use.
# Useful to have offline directly
# Useful to be aware of, even if accessing them must be done whilst online
# Delay for $interval seconds to avoid DoS-ing the server
main() {
# This script can be called without arguments to run the full download
# Or you can specifically execute certain steps (the function names)
if [[ -z "$*" ]]; then
# Encapsulate in subshells
# to allow running standalone
for subcommand in "$@"; do
download_wiki() {
# Ol' buddy :)
# Follow links (up to --level)
# Only recursively fetch once
# This will get all the main pages we're after
# If the destination file exists already, we should not re-attempt download,
# when using multiple base sources. This is checked via the local file
# and server file's timestamps, therefore:
# Adjust file timestamps to match server timestamps
# Convert absolute links to work locally
# Don't track URLs upwards
# Don't DDoS the server
# Don't produce pointless terminal output, progress, etc
if [[ ! -f .wgetrc ]]; then
echo "robots = off" > ./.wgetrc
for wiki_url in "${base_sources[@]}"; do
"${command[@]}" "$wiki_url"
switch_in() {
cd || exit 1
cleanup_pages() {
# Remove some useless pages, including the internationalised Main pages
# (which will not link appropriately any further)
rm -r Talk:* Special:* Main_Page_*.html
# Files with colons in them may not resolve correctly
# Identify those files specifically, and adjust them in source pages
echo "Adjusting colonic filenames in sources ..."
for colonic in "${colonics[@]}"; do
colopatterns+=(-e "s/$colonic/${colonic//:/__}/g")
sed "${colopatterns[@]:1}" -i *.html
echo "... done."
rename 's/:/__/g' *.html
adjust_styles() {
# Download styles and patch main stylesheet links
wget "$styles_link" -O styles.css
sed 's|rel="stylesheet" href="/w/load.php|rel="stylesheet" href="styles.css|' -i *.html
zipit() {
zipname="love_wiki-$(date '+%F').zip"
python -m zipfile -c "$zipname" ./ && echo "Created $zipname"
main "$@"
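The colon handling in `cleanup_pages` can be sketched on its own. The version below is a standalone approximation, not the script's exact code: the function name `fix_colons` and its directory argument are assumptions, it relies on GNU `sed -i`, and it uses `mv` where the script uses `rename`. It only rewrites literal occurrences of each file name, and it treats the name as a sed pattern, so names containing characters such as `/` or `&` would need escaping first.

```bash
#!/usr/bin/env bash
# Replace ':' with '__' in HTML file names, and update references to
# those names inside every HTML page in the same directory.
fix_colons() {
    local dir="$1" path name fixed
    local -a patterns=()

    for path in "$dir"/*:*.html; do
        [[ -e "$path" ]] || continue        # the glob matched nothing
        name="${path##*/}"                  # strip the directory part
        fixed="${name//:/__}"               # Category:X.html -> Category__X.html
        patterns+=(-e "s/$name/$fixed/g")   # one sed expression per file
        mv -- "$path" "$dir/$fixed"
    done

    # Nothing to rewrite if no file name contained a colon.
    (( ${#patterns[@]} )) || return 0

    # Apply all substitutions to every page in a single sed run.
    sed -i "${patterns[@]}" "$dir"/*.html
}
```

Collecting every expression first and running `sed` once keeps the number of passes over the pages constant, however many colon-named files there are.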