Created by @vjt on February 10, 2009 17:13
Opensource.org mirror script
<html>
<head>
<title>The mirror is currently updating. Please wait.</title>
</head>
<body>
<h2>The mirror is currently resyncing to the master site.</h2>
<h3>The process will be completed in a matter of minutes.</h3>
<p>In the meantime, try to visit the <a href="http://opensource.org">original site</a> or, if it's unreachable, try one of the following mirrors:</p>
<p><a href="http://opensource.gds.tuwien.ac.at/">Austria</a> |
<a href="http://os.fsfmirror.com/">Belgium</a> |
<a href="http://opensource.usrbinruby.net/">Canada</a> (<a href="http://open2.mirrors-r-us.net/">2</a>) |
<a href="http://opensource.mirrors.typhon.net/">France</a> (<a href="http://os3.fsfmirror.com/">2</a>) |
<a href="http://opensource.mirroring.de/">Germany</a> (<a href="http://opensource.linux-mirror.org/">2</a>, <a href="http://os2.fsfmirror.com/">3</a>, <a href="http://opensource.erde3.net/">4</a>) |
<a href="http://open3.mirrors-r-us.net">Japan</a> (<a href="http://open1.mirrors-r-us.net/">2</a>) |
<a href="http://os3.osmirror.com/">Singapore</a> |
<a href="http://opensource.openmirrors.org">UK</a> (<a href="http://2opensource.openmirrors.org">2</a>) |
USA: <a href="http://www.free-soft.org/mirrors/www.opensource.org/">LA</a>, <a href="http://osmirror.com/">Montana</a> (<a href="http://os2.osmirror.com/">2</a>), (<a href="http://opensource2.usrbinruby.net/">3</a>)</p>
<p style="margin-top: 50px; font-style:italic">Thanks for visiting opensource.antifork.org!</p>
</body>
</html>
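While the script that follows runs, the public name `$MIRROR_DIR` is a symlink to `$WIP_DIR`, the directory holding the page above. At the end the script deletes that symlink and only then moves the new copy into place, so for a brief moment the path does not exist at all. A minimal sketch of an atomic alternative, assuming GNU coreutils (for `mv -T`) and that the public name is always a symlink, never a real directory; `publish_dir` and the `.tmp-link` suffix are hypothetical names, not part of the original script:

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script.
# publish_dir TARGET LINK: make LINK a symlink to TARGET, replacing an
# existing LINK symlink with a single rename(2), so clients see either the
# old target or the new one and never a missing path.
# Assumes GNU coreutils (mv -T) and that LINK, if present, is a symlink.
publish_dir() {
  local target=$1 link=$2
  # Build the new link next to LINK so the rename stays on one filesystem
  local tmp_link="${link}.tmp-link.$$"
  # -s symbolic, -f replace a stale temp link, -n do not descend into a
  # stale temp link that happens to point at a directory
  ln -sfn -- "$target" "$tmp_link"
  # -T treats LINK as a plain destination name: the new symlink replaces
  # LINK instead of being moved into the directory LINK points to
  mv -T -- "$tmp_link" "$link"
}

# Usage from inside $MIRROR_BASE, with the original script's variables:
#   publish_dir "$WIP_DIR" "$MIRROR_DIR"    # show the placeholder
#   publish_dir "$WORK_DIR" "$MIRROR_DIR"   # go live with the new copy
```

With this layout each run's `$WORK_DIR` keeps its random name and stays on disk, so older copies would have to be pruned separately. Relative targets are resolved from the link's own directory, which matches the original script working inside `$MIRROR_BASE`.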
#!/bin/bash
# Opensource.ORG quick&dirty mirror script.
# (C) 2007-2010 Marcello Barnaba <vjt@openssl.it>
#
# Released under the terms of the Beerware License: if we
# meet some day and you think this stuff is worth it, you
# can buy me a beer in return.
#
# The absolute parent path of the public mirror directory
MIRROR_BASE="/home/httpd/antifork.org/htdocs"
# The path, relative to $MIRROR_BASE, that contains the mirrored HTML
MIRROR_DIR="opensource.antifork.org"
# The URI to mirror - why is this a variable?!
MIRROR_URI="http://opensource.org"
# A directory containing a placeholder page to show to clients
# while the script is running - a fail whale, in Twitter terms
WIP_DIR="${MIRROR_DIR}-updating"
# A temporary working directory. Everyone needs one!
WORK_DIR="${MIRROR_DIR}-$RANDOM"
# The wget(1) command: absolute path to the binary plus default flags
WGET="/usr/bin/wget --quiet"
# Abort on errors .. yuck!
set -e
# .................................................
# NO USER SERVICEABLE PARTS BELOW THE DOTTED LINE .
# .................................................
pushd "$MIRROR_BASE" >/dev/null
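# wget's mirror mode keeps re-downloading robots.txt, so
# the script erases the mirror and downloads it again from
# scratch every time. For this kind of site that's ok, but
# it should be fixed.
#
# While the download runs, the public directory is a
# symlink to the placeholder page in $WIP_DIR.
rm -rf "$MIRROR_DIR"
ln -s "$WIP_DIR" "$MIRROR_DIR"
mkdir "$WORK_DIR"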
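# Download HTML and images. In recursive mode wget exits with
# status 8 if the server answered any request with an error
# (a single 404 is enough), so accept that status instead of
# letting set -e abort the run with the placeholder online.
#
$WGET --domains=opensource.org --convert-links --level=0 \
--mirror --page-requisites --no-host-directories \
"$MIRROR_URI" --exclude-directories '/user,/event' \
--directory-prefix="$WORK_DIR" --html-extension || [ $? -eq 8 ]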
# Get all the stylesheets linked from the index page, the one
# page that will, hopefully, never disappear.
# It uses Ruby and Nokogiri: you'd best have them installed :-p
#
pushd "$WORK_DIR" >/dev/null
PARSER="require 'nokogiri'; puts Nokogiri::HTML(File.read('index.html')).search('link[rel=stylesheet]').map {|f| CGI.unescape(f.attr('href'))}.select {|f| File.exist?(f)}.join(10.chr)"
STYLESHEETS=$(ruby -rcgi -e "$PARSER")
popd >/dev/null
# Download all the images referenced in the CSS stylesheets.
# A missing image only prints a warning instead of aborting.
#
for stylesheet in $STYLESHEETS; do
base=$(dirname "$stylesheet")
sed -n 's#.*url(\(.*\)).*#\1#p' < "$WORK_DIR/$stylesheet" | sort -u | while read -r image; do
mkdir -p "$(dirname "$WORK_DIR/$base/$image")"
$WGET -O "$WORK_DIR/$base/$image" "$MIRROR_URI/$base/$image" || echo "warning: could not fetch $base/$image" >&2
done
done
# Remove the symlink to the placeholder page
rm -f "$MIRROR_DIR"
# Put the mirror online - yay!
mv "$WORK_DIR" "$MIRROR_DIR"
popd >/dev/null
# EOF
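The `sed` expression in the stylesheet loop yields at most one URL per line (the leading `.*` is greedy, so only the last `url(` counts) and keeps any quotes around the path, so `url("img/a.png")` would be fetched as `"img/a.png"`. A sketch of a more forgiving extractor, assuming a `grep` that supports `-o` (GNU and BSD grep both do); the function name `css_urls` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script.
# css_urls: read CSS on stdin, print every url(...) target once,
# with surrounding quotes and spaces removed and data: URIs skipped.
css_urls() {
  # -o prints each url(...) match on its own line, not just one per line
  grep -o 'url([^)]*)' |
    # strip the url( prefix, the ) suffix, then one optional quote each side
    sed -e 's/^url([[:space:]]*//' -e 's/[[:space:]]*)$//' \
        -e "s/^[\"']//" -e "s/[\"']\$//" |
    # data: URIs are inline, there is nothing to download
    grep -v '^data:' |
    sort -u
}
```

In the loop it would take the place of the `sed`/`sort` stage, e.g. `css_urls < "$WORK_DIR/$stylesheet" | while read -r image; do ...`. Absolute (`http://...`) or root-relative (`/img/...`) URLs would still need their own handling, since the loop joins every result onto `$base`.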
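Because of `set -e`, any failing command after the `ln -s` leaves the placeholder online and a half-filled `<mirror>-<random>` scratch directory under `$MIRROR_BASE`, and every failed run adds another one. A minimal sketch of an `EXIT` trap handler that removes the scratch copy when the run fails; `cleanup_on_failure` is a hypothetical name, and it cannot restore the previous mirror, which the original script deletes before downloading:

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script.
# cleanup_on_failure STATUS DIR: if the run ended with a non-zero STATUS,
# remove the half-built scratch copy DIR. Pass DIR as an absolute path,
# because the script may exit while pushd has moved it into $WORK_DIR.
cleanup_on_failure() {
  local status=$1 scratch=$2
  if [ "$status" -ne 0 ] && [ -d "$scratch" ]; then
    rm -rf -- "$scratch"
    echo "mirror update failed (status $status), removed $scratch" >&2
  fi
}

# Intended installation, right after `mkdir "$WORK_DIR"`. The single quotes
# delay expansion until exit time, when $? holds the script's exit status:
#   trap 'cleanup_on_failure $? "$MIRROR_BASE/$WORK_DIR"' EXIT
```

On a successful run the status is 0 and the scratch directory has already been renamed to `$MIRROR_DIR`, so the handler does nothing.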