Created February 10, 2009 17:13
Opensource.org mirror script
<html>
<head>
<title>The mirror is currently updating. Please wait.</title>
</head>
<body>
<h2>The mirror is currently resyncing with the master site.</h2>
<h3>The process should be complete within a few minutes.</h3>
<p>In the meantime, try visiting the <a href="http://opensource.org">original site</a> or, if it is unreachable, one of the following mirrors:</p>
<p><a href="http://opensource.gds.tuwien.ac.at/">Austria</a> | <a href="http://os.fsfmirror.com/">Belgium</a> | <a href="http://opensource.usrbinruby.net/">Canada</a> (<a href="http://open2.mirrors-r-us.net/">2</a>) | <a href="http://opensource.mirrors.typhon.net/">France</a> (<a href="http://os3.fsfmirror.com/">2</a>) | <a href="http://opensource.mirroring.de/">Germany</a> (<a href="http://opensource.linux-mirror.org/">2</a>,<a href="http://os2.fsfmirror.com/">3</a>,<a href="http://opensource.erde3.net/">4</a>) | <a href="http://open3.mirrors-r-us.net">Japan</a> (<a href="http://open1.mirrors-r-us.net/">2</a>) | <a href="http://os3.osmirror.com/">Singapore</a> | <a href="http://opensource.openmirrors.org">UK</a> (<a href="http://2opensource.openmirrors.org">2</a>) | USA: <a href="http://www.free-soft.org/mirrors/www.opensource.org/">LA</a>, <a href="http://osmirror.com/">Montana</a> (<a href="http://os2.osmirror.com/">2</a>), (<a href="http://opensource2.usrbinruby.net/">3</a>)</p>
<p style="margin-top: 50px; font-style:italic">Thanks for visiting opensource.antifork.org!</p>
</body>
</html>
#!/bin/bash
# Opensource.org quick & dirty mirror script.
# (C) 2007-2010 Marcello Barnaba <vjt@openssl.it>
#
# Released under the terms of the Beerware License: if we
# meet some day and you think this stuff is worth it, you
# can buy me a beer in return.
#
# The absolute parent path of the public mirror directory
MIRROR_BASE="/home/httpd/antifork.org/htdocs"
# The path, relative to $MIRROR_BASE, that contains the mirrored HTML
MIRROR_DIR="opensource.antifork.org"
# The URI to mirror - why is this a variable?!
MIRROR_URI="http://opensource.org"
# A directory containing a placeholder page to show to clients
# while the script is running - a fail whale, in Twitter terms
WIP_DIR="${MIRROR_DIR}-updating"
# A temporary working directory. Everyone needs one!
WORK_DIR="${MIRROR_DIR}-$RANDOM"
# The wget(1) command: absolute path to the binary plus default options
WGET="/usr/bin/wget --quiet"
# Abort on errors... yuck!
set -e
# .................................................
# NO USER SERVICEABLE PARTS BELOW THE DOTTED LINE .
# .................................................
pushd "$MIRROR_BASE" >/dev/null
# wget madness: it downloads robots.txt continuously, so
# erase the mirror and re-download everything every time.
# For this kind of site it's OK, but it should be fixed.
#
# While the download runs, point the public directory at
# the placeholder page.
rm -rf "$MIRROR_DIR"
ln -s "$WIP_DIR" "$MIRROR_DIR"
mkdir "$WORK_DIR"
# Download the HTML pages and the images they reference
#
$WGET --mirror --level=0 --page-requisites --convert-links \
  --html-extension --no-host-directories \
  --domains=opensource.org --exclude-directories='/user,/event' \
  --directory-prefix="$WORK_DIR" "$MIRROR_URI"
# Get all the stylesheets linked from the index page, the one
# page that will hopefully never disappear.
# This needs Ruby and the Nokogiri gem: you'd better have them installed :-p
#
pushd "$WORK_DIR" >/dev/null
PARSER="require 'nokogiri'; puts Nokogiri::HTML(File.read('index.html')).search('link[rel=stylesheet][href]').map {|f| CGI.unescape(f.attr('href'))}.select {|f| File.exist?(f)}.join(10.chr)"
STYLESHEETS=$(ruby -rcgi -e "$PARSER")
popd >/dev/null
# Download all the images referenced in the CSS stylesheets.
# url(...) paths are taken relative to the stylesheet's directory,
# with any surrounding quotes stripped.
#
for stylesheet in $STYLESHEETS; do
  base=$(dirname "$stylesheet")
  sed -n "s#.*url(['\"]\{0,1\}\([^)'\"]*\)['\"]\{0,1\}).*#\1#p" < "$WORK_DIR/$stylesheet" | sort -u | while read -r image; do
    mkdir -p "$(dirname "$WORK_DIR/$base/$image")"
    $WGET -O "$WORK_DIR/$base/$image" "$MIRROR_URI/$base/$image"
  done
done
# Take down the placeholder page (remove the symlink)
rm -f "$MIRROR_DIR"
# Put the mirror online - yay!
mv "$WORK_DIR" "$MIRROR_DIR"
popd >/dev/null
# EOF
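The sed expression in the image-download loop captures at most one url(...) per line and treats every path as relative to the stylesheet's directory. Below is a minimal sketch of a more thorough extraction step, assuming bash and a grep that supports `-o` (GNU and BSD grep both do). `css_urls` and `resolve_css_path` are hypothetical helper names and are not part of the original script.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the original mirror script.

# css_urls FILE: print every distinct url(...) target found in a stylesheet,
# one per line, with surrounding whitespace and quotes removed. data: URIs,
# absolute http(s) URLs and protocol-relative //host URLs are skipped, since
# they do not point at files on the mirrored site.
css_urls() {
  grep -o 'url([^)]*)' "$1" |
    sed -e 's/^url(//' -e 's/)$//' \
        -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        -e "s/^[\"']//" -e "s/[\"']\$//" |
    grep -v -e '^data:' -e '^https\{0,1\}://' -e '^//' |
    sort -u
}

# resolve_css_path BASE URL: turn URL, as written in a stylesheet that lives
# in directory BASE (relative to the site root), into a path relative to the
# site root. A leading "/" means root-relative; anything else is relative to
# BASE. "." and ".." segments are folded, and ?query/#fragment is dropped.
resolve_css_path() {
  local base=$1 url=$2 path part
  local -a parts=() out=()
  url=${url%%[?#]*}                # drop any ?query or #fragment
  if [[ $url == /* ]]; then
    path=$url                      # root-relative: ignore BASE
  else
    path=$base/$url                # relative to the stylesheet's directory
  fi
  IFS=/ read -r -a parts <<< "$path"
  for part in "${parts[@]}"; do
    case $part in
      ''|.) ;;                     # skip empty and "." segments
      ..) if (( ${#out[@]} > 0 )); then
            out=("${out[@]:0:${#out[@]}-1}")   # step up one directory
          fi ;;
      *) out+=("$part") ;;
    esac
  done
  (IFS=/; printf '%s\n' "${out[*]}")   # join the segments with "/"
}
```

Inside the download loop, the sed pipeline could then be replaced by `css_urls "$WORK_DIR/$stylesheet"`, with each result passed through `resolve_css_path "$base"` before building both the local output path and the `$MIRROR_URI` download URL.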
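The mirror script deletes the public directory before downloading and only moves the new tree into place at the end. As a result, the mirror is offline for the whole run, and a run aborted by `set -e` leaves the placeholder symlink and the half-filled work directory behind. Below is a minimal sketch of an alternative publish step. It is not the original script's method, and it assumes GNU coreutils (`ln -n`, `mv -T`) with every directory on one filesystem. Each run downloads into its own directory, and the public name is always a symlink that is replaced atomically. `publish_dir`, `current` and the `mirror.XXXXXX` naming are hypothetical.

```shell
#!/bin/bash
# Hypothetical alternative publish step, not the original script's method.
# Assumes GNU coreutils (ln -n, mv -T) and a single filesystem.
set -e

# publish_dir NEW_DIR LINK: point the symlink LINK at NEW_DIR without LINK
# ever going missing. A temporary symlink is created next to LINK and then
# renamed over it; rename(2) replaces the old link atomically.
publish_dir() {
  local new_dir=$1 link=$2
  local tmp_link="$link.tmp.$$"
  ln -sfn "$new_dir" "$tmp_link"
  mv -T "$tmp_link" "$link"
}

# Demo in a scratch directory: two mirror runs published one after another.
demo_root=$(mktemp -d)
cd "$demo_root"
for run in 1 2; do
  work_dir=$(mktemp -d ./mirror.XXXXXX)
  trap 'rm -rf "$work_dir"' EXIT           # discard this run's tree on failure
  echo "run $run" > "$work_dir/index.html" # stands in for the wget download
  publish_dir "$work_dir" current          # swap the public symlink
  trap - EXIT                              # success: keep the tree
done
cat current/index.html                     # prints "run 2"
```

With this layout the previous mirror stays online until the new one is complete, so the placeholder page is only needed for the very first run. The link name must be a symlink or absent, because `mv -T` cannot replace a real, non-empty directory. Pruning old run directories is left out of the sketch.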