This bash script downloads a whole subreddit wiki. Thanks to u/ThePixelHunter (https://www.reddit.com/user/ThePixelHunter/) for the original script; it was adapted here to work on Windows machines by adding a line that strips carriage returns from the subreddit names.
#!/bin/bash
set -x
# Requires: bash coreutils curl jq
USER_AGENT='superagent/1.0'
EXPORTDIR="exports"

while read -r line; do
    SUBREDDIT="$line"
    # Strip Windows carriage returns (\r) left behind by CRLF line endings
    SUBREDDIT=$(echo "$SUBREDDIT" | sed 's/\r//g')
    while read -r line; do
        # Nested wiki pages contain slashes; replace them so the page name is a safe filename
        PAGE=$(echo "$line" | sed 's|/|--|g')
        mkdir -p "./$EXPORTDIR/$SUBREDDIT/wiki/$PAGE"
        # Fetch the raw API response for the page
        curl -s --user-agent "$USER_AGENT" "https://www.reddit.com/r/$SUBREDDIT/wiki/$PAGE.json" > "./$EXPORTDIR/$SUBREDDIT/wiki/$PAGE.json"
        # Log the page name and curl's exit status
        printf '%s %s\n' "$SUBREDDIT/wiki/$PAGE" "$?"
        # Extract the raw wiki markdown from the JSON response
        jq -r '.data.content_md' "./$EXPORTDIR/$SUBREDDIT/wiki/$PAGE.json" > "./$EXPORTDIR/$SUBREDDIT/wiki/$PAGE.md"
        # Remove the empty placeholder directories created by mkdir above
        find . -type d -exec rmdir '{}' \; > /dev/null 2>&1
    done < <(curl -s --user-agent "$USER_AGENT" "https://www.reddit.com/r/$SUBREDDIT/wiki/pages.json" | jq -r '.data | .[]')
done < subreddits.list
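
To run it (a minimal sketch; the file name export-wiki.sh is only an example, not part of the gist), save the script next to a subreddits.list file and execute it from that directory:

chmod +x export-wiki.sh
./export-wiki.sh
# Results land under ./exports/<subreddit>/wiki/ as <page>.json and <page>.md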

mathisgauthey commented Aug 8, 2022

Basic idea:

Guide

It exports each subreddit's wiki into a directory structure, leaving behind a JSON file (raw API response) and a Markdown file (raw wiki page text) for each wiki page.
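
For example, with a subreddit named DataHoarder whose wiki has pages index and faq (hypothetical page names), the export would look like:

exports/
  DataHoarder/
    wiki/
      index.json
      index.md
      faq.json
      faq.md

Nested page names such as config/sidebar are flattened to config--sidebar.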

The script expects a subreddits.list file in the format:

DataHoarder
Homelab
DHExchange

...separated by unix newlines \n, not Windows newlines \r\n.
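
If subreddits.list was created on Windows, the script already strips the carriage returns, but the file can also be converted up front, e.g. with GNU sed (or dos2unix, if installed):

sed -i 's/\r$//' subreddits.list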

There's no error handling, and you'll definitely hit the API rate limits at some point.
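
If you do hit the limit, a simple (untested) mitigation is to pause between page fetches, e.g. by adding a sleep right after the curl call inside the inner loop:

sleep 2   # wait a couple of seconds between requests to stay under Reddit's rate limit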

The edit I made so it works on Windows with Git Bash or WSL (Windows Subsystem for Linux):

SUBREDDIT=$(echo "$SUBREDDIT" | sed 's/\r//g')
