#!/bin/sh
# This is a four-part process, and it's awful, and I'm sorry. I'd have
# automated more of it, but the interstitials we've put on the etherpads
# have foiled my efforts there and I wanted to get this out fast.
#
# This will give you:
# - A folder full of all your team pads in their current state
# in text form, and
# - A single zip file containing all of them.
#
# This _will not give you_:
# - Password-locked etherpads.
#
# You need a few things for this to work:
# - Firefox,
# - A Mozilla VPN connection,
# - wget and zip (available from Unix package managers everywhere)
# - a degree of comfort with a terminal.
#
# If you don't know with certainty that you have all those things,
# email me at mhoye@mozilla.com and I will do my best to help you.
# If you need the recorded history of an etherpad, team or not, I have
# a way to export that per-pad that this margin is too narrow to contain.
#
# The process, for which I again apologize, is:
# - mkdir yourself a new dir somewhere, save this shell script into it,
# and make it executable.
# - Connect to Mozilla's VPN.
# - Log into your team etherpad site, click through the interstitials
# and click on the "all pads" tab.
# - This is the gross manual part. Go to File -> Save Page As and
# select Format: Web Page, HTML Only. Name the file "all-pads",
# all lowercase, no file extension.
# - Finally, in your terminal window, type in:
#
# ./etherscrape.sh [team name]
#
# where [team name] is whatever your etherpad URLs start with. For
# example, if that URL starts with "https://firefox-ux.old-etherpad..."
# then you'd type in "./etherscrape.sh firefox-ux"
# - Hit enter, and let it run. This process can take a few minutes.
WORKDIR=./saved_etherpads
if [ $# -eq 0 ];
then
echo
echo "Usage: ./etherscrape.sh [team name]"
echo
exit 1
fi
echo "Team Name is: $1"
# mkdir -p creates the directory only if it does not already exist.
mkdir -p "$WORKDIR"
echo "Saving files to $WORKDIR"
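# Pull each pad name out of the saved "all pads" page, then fetch that
# pad's latest text export. printf prints the progress dots portably.
for i in $(grep padmeta all-pads | sed -e 's/.*href="\///' -e 's/".*//')
do
    wget "https://$1.old-etherpad.webapp.phx1.mozilla.com/ep/pad/export/$i/latest?format=txt" \
        -O "$WORKDIR/$i.txt" > /dev/null 2>&1 && printf '.'
done
echo " Done."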
echo "Compressing files..."
zip -r "$1.zip" "$WORKDIR" > /dev/null 2>&1
echo "Done. Compressed file is $1.zip"
@mhoye

Owner Author

commented Oct 7, 2015

I'm accepting improvements to this, obviously.

@mhoye

Owner Author

commented Oct 7, 2015

Tested on OSX 10.10.5, for what it's worth. There's a report of some pathological shell behavior on 10.8.something.

@MichaelKohler


commented Oct 7, 2015

As far as I understand, this will grab every href on the page, even ones that don't point to an etherpad. Is that what we want?
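
To see what the script's filter actually keeps, here is a small sketch that runs the same grep and sed steps over a made-up page. The sample markup is an assumption (the real "all pads" HTML may look different), and `extract_pad_ids` is a name introduced only for this illustration.

```shell
# Hypothetical stand-in for the saved "all-pads" page; the real markup may differ.
sample='<a href="/about">About</a>
<td class="padmeta"><a href="/team-notes">team-notes</a></td>
<td class="padmeta"><a href="/roadmap-2015">roadmap-2015</a></td>'

# The script's filter as a function: keep only lines mentioning "padmeta",
# strip everything up to and including href="/, then strip everything
# from the next double quote onward.
extract_pad_ids() {
    grep padmeta | sed -e 's/.*href="\///' -e 's/".*//'
}

printf '%s\n' "$sample" | extract_pad_ids
```

On this sample the `/about` link is dropped because its line does not contain "padmeta". Any link that shares a line with "padmeta" would still be picked up, and because the first sed pattern is greedy, a line carrying several hrefs yields only the last one.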

@andymckay


commented Oct 7, 2015

My team etherpads were all private, and I couldn't spot a way to make them all public without laboriously clicking on each pad, then the alert, then the public button. Instead I dumped my cookies from Firefox and passed them through to wget in that script using --load-cookies.
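
A minimal sketch of that approach, assuming the cookies were exported to a Netscape-format file named `cookies.txt` (the filename and both helper function names are made up for illustration). `--load-cookies` is a standard wget option; everything else mirrors the URL and output path the script already uses.

```shell
# Assumption: cookies.txt is a Netscape-format cookie file exported from
# the Firefox profile that is logged in to the team site.
COOKIES=${COOKIES:-./cookies.txt}
WORKDIR=${WORKDIR:-./saved_etherpads}

# Build the text-export URL for one pad: $1 is the team name, $2 the pad name.
pad_export_url() {
    printf 'https://%s.old-etherpad.webapp.phx1.mozilla.com/ep/pad/export/%s/latest?format=txt\n' "$1" "$2"
}

# Fetch one pad as text, sending the saved cookies with the request.
fetch_pad() {
    mkdir -p "$WORKDIR"
    wget --load-cookies "$COOKIES" "$(pad_export_url "$1" "$2")" \
        -O "$WORKDIR/$2.txt" > /dev/null 2>&1
}
```

With the VPN connected, `fetch_pad firefox-ux some-pad` would save `some-pad` (a placeholder pad name) to `./saved_etherpads/some-pad.txt`. Adding the same `--load-cookies` argument to the wget line in the script's loop should have the same effect for every pad.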

@mhoye

Owner Author

commented Oct 7, 2015

If you've got the Cookie Manager+ addon, you can dump the single cookie you need into a file and use that.
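
For reference, a one-cookie file that wget's `--load-cookies` can read might be written by hand like this. It is a sketch under assumptions: the cookie name and value are placeholders (the real ones are not given here), the domain reuses the `firefox-ux` example from the script's header, and `write_cookie_file` is a hypothetical helper. The layout is the Netscape cookies.txt format, with seven tab-separated fields per cookie.

```shell
# Write a one-cookie file in the Netscape cookies.txt layout: domain,
# subdomain flag, path, secure flag, expiry (Unix time), name, value.
write_cookie_file() {
    # $1 domain, $2 cookie name, $3 cookie value, $4 expiry, $5 output file
    {
        printf '# Netscape HTTP Cookie File\n'
        printf '%s\tFALSE\t/\tTRUE\t%s\t%s\t%s\n' "$1" "$4" "$2" "$3"
    } > "$5"
}

# Placeholder name and value: replace them with the real session cookie.
# The expiry is set one day ahead so the cookie is not treated as stale.
write_cookie_file "firefox-ux.old-etherpad.webapp.phx1.mozilla.com" \
    "SESSION_COOKIE_NAME" "SESSION_COOKIE_VALUE" \
    "$(( $(date +%s) + 86400 ))" ./cookies.txt
```

The resulting `./cookies.txt` can then be handed to wget with `--load-cookies ./cookies.txt`.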
