Skip to content

Instantly share code, notes, and snippets.

@efrecon
Last active February 1, 2019 13:57
Show Gist options
  • Save efrecon/31a1d4020b6fdc3d11e101b35e8cd499 to your computer and use it in GitHub Desktop.
Save efrecon/31a1d4020b6fdc3d11e101b35e8cd499 to your computer and use it in GitHub Desktop.
Continuous mirroring between directories

Directory Mirroring

This script will mirror/backup the latest directories/files matching a glob-style pattern in a directory to another one. It is able to only keep a number of copies in the destination and to skip some items from the source. The script is able to perform the mirror operation on and on, making it suitable for situations where access to cron is not an option.

Command-Line Options

The script takes a number of (single) dash-led options at the command line. These are:

  • -src should point to the source directory.
  • -allow should contain a glob-style pattern. The latest of the directories or files matching the pattern will be copied to the destination directory, unless its name matches the -ignoreoption (see below). Latest is by modification time.
  • -ignore can contain a glob-style pattern of directories of files to ignore from the list matching the -allow option. When empty, nothing will be ignored. The default is to skip anything that ends with .pending.
  • -dst contains the path to the destination directory where to copy items and keep only a number of copies (see -keep)
  • -tmp can point to a directory in which to create copies before these are moved into the destination directory. When the value of the option is empty (the default), copies are made directly. A temporary directory led by the string mirror and with the extension .d will be created within the directory specified by -tmp at every copy.
  • -keep is an integer containing the number of copies to keep in the destination directory. When 0 or negative, all copies will be kept.
  • -period is the time to wait between each mirroring. When empty or negative, a single mirroring will occur. The period can be expressed as an integer (in seconds), or in a human-readable format such as 2 months 4d. The modifiers can be abbreviated to the their minimal unique matcher, in upper or lower case. The recognised modifiers are years, months, weeks, days, hours, minutes, seconds. To allow for single letter modifiers, months are abbreviated lower-case m and minutes are abbreviated M, and nothing else.
  • -wait can be used to specify an initial waiting time. This is either a duration specification (see -period above for allowed modifiers) or a time range. A time range is represented by two durations separated by a : (colon) sign. When a range is specified the random period between the two will be picked up. This can be handy to avoid several instances of the mirroring operation to compete after the same resources.
Copyright (c) 2019, Emmanuel Frécon <efrecon@gmail.com>
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors
may be used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#!/usr/bin/env tclsh
package require Tcl 8.6
set options {
-src "/backup"
-dst "/tmp"
-allow "*-*"
-ignore "*.pending"
-tmp ""
-period 900
-wait 0
-keep 3
}
# Quick options parser
foreach {opt val} $argv {
if { [dict exists $options $opt] } {
dict set options $opt $val
} else {
puts stderr "$opt unknown option, should be [join [dict keys $options] ,\ ]"
exit
}
}
##### Following code from https://wiki.tcl-lang.org/page/Converting+human+time+durations
proc HowLong {len unit} {
if { [string is integer -strict $len] } {
switch -glob -- $unit {
"\[Yy\]*" {
return [expr {$len*31536000}]; # Leap years?
}
"\[Mm\]\[Oo\]*" -
"m*" {
return [expr {$len*2592000}]
}
"\[Ww\]*" {
return [expr {$len*604800}]
}
"\[Dd\]*" {
return [expr {$len*86400}]
}
"\[Hh\]*" {
return [expr {$len*3600}]
}
"\[Mm\]\[Ii\]*" -
"M" {
return [expr {$len*60}]
}
"\[Ss\]*" {
return $len
}
}
}
return 0
}
proc Duration { str } {
set words {}
while {[scan $str %s%n word length] == 2} {
lappend words $word
set str [string range $str $length end]
}
set seconds 0
for {set i 0} {$i<[llength $words]} {incr i} {
set f [lindex $words $i]
if { [scan $f %d%n n length] == 2 } {
set unit [string range $f $length end]
if { $unit eq "" } {
incr seconds [HowLong $n [lindex $words [incr i]]]
} else {
incr seconds [HowLong $n $unit]
}
}
}
return $seconds
}
##### End of code from https://wiki.tcl-lang.org/page/Converting+human+time+durations
# TempName -- Generate a temporary file name
#
# Generate a name that can be used for the creation of temporary
# files, this name will be generated out of a (possibly empty)
# prefix, random characters and an extension.
#
# Arguments:
# ext Extension for the file, with/out leading dot.
# size Size of the random characters
# pfx Prefix to lead the name
# allowed Set of characters to use in the name
#
# Results:
# A good-to-use file name
#
# Side Effects:
# None.
proc TempName { { ext "" } { size 10 } { pfx "" } { allowed "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"}} {
set name $pfx
for {set i 0} {$i < $size} {incr i} {
append name [string index $allowed [expr {int(rand()*[string length $allowed])}]]
}
append name .[string trimleft $ext "."]
return $name
}
proc ::mirror {} {
global options
set t_start [clock milliseconds]
# Collect all files/dirs that match the source specification and that do not
# match the -ignore option, together with their latest modification times.
set consider [list]
foreach fname [glob -nocomplain -directory [dict get $options -src] -tails -- [dict get $options -allow]] {
if { [dict get $options -ignore] eq "" || ![string match [dict get $options -ignore] $fname] } {
set fpath [file join [dict get $options -src] $fname]
lappend consider $fpath [file mtime $fpath]
}
}
# Source will be the latest one, i.e. the head of the sorted list, sorted on
# file modification time.
set src [lindex [lsort -stride 2 -index 1 -integer -decreasing $consider] 0]
# Perform the mirror. When -tmp is not empty, we hurry up copying the source
# to a temporary directory (supposedly on a local disk) to "own" a copy of
# what is to be copied and have the time to copy this to the (supposedly)
# remote destination.
if { [dict get $options -tmp] eq "" } {
puts stdout "Copying $src directly to [dict get $options -dst]"
file copy -force -- $src [dict get $options -dst]
} else {
set tmpdir [file join [dict get $options -tmp] [TempName ".d" 10 mirror]]
file mkdir $tmpdir
puts stdout "Copying $src to temporary dir at $tmpdir"
if { [catch {file copy -force -- $src $tmpdir} err] } {
puts stderr "!! $err"
}
foreach fpath [glob -nocomplain -directory $tmpdir -- *] {
puts stdout "Copying $fpath to [dict get $options -dst]"
if { [catch {file copy -force -- $fpath [dict get $options -dst]} err] } {
puts stderr "!! $err"
}
}
file delete -force -- $tmpdir
}
# When -keep is not 0, remove old mirrors at the remote destination.
if { [dict get $options -keep] > 0 } {
# Collect all files/dirs that match the specification at the destination
# and that do not match the -ignore option, together with their latest
# modification times.
set mirrored [list]
foreach fname [glob -nocomplain -directory [dict get $options -dst] -tails -- [dict get $options -allow]] {
if { [dict get $options -ignore] eq "" || ![string match [dict get $options -ignore] $fname] } {
set fpath [file join [dict get $options -dst] $fname]
lappend mirrored $fpath [file mtime $fpath]
}
}
# Order by decreasing time the mirrored items and select (and delete)
# away the oldest one.
set mirrored [lsort -stride 2 -index 1 -integer -decreasing $mirrored]
foreach {fpath mtime } [lrange $mirrored [expr {[dict get $options -keep]*2}] end] {
puts stdout "Removing old backup $fpath"
file delete -force -- $fpath
}
}
# Estimate elapsed milliseconds for the mirror operations and take this into
# account when scheduling next mirror. This is because mirroring can take a
# long time.
if { [dict get $options -period] eq "" || [dict get $options -period] <= 0 } {
puts stdout "No more mirror, period was '[dict get $options -period]'"
} else {
set elapsed [expr {[clock milliseconds]-$t_start}]
set next [expr {[dict get $options -period]*1000-$elapsed}]
if { $next < 0 } {
set next 0
}
puts stdout "Next mirror in [expr {int($next/1000)}] seconds"
after $next ::mirror
}
}
# Resolve backup period to milliseconds, we allow human-readable durations such
# as 1w, 2 months 4d or 2y -3m. When an integer, this is in milliseconds.
set period [dict get $::options -period]
if { $period ne "" && ! [string is integer -strict $period] } {
dict set ::options -period [Duration $period]
puts stdout "Converted human-readable period: '$period' to [dict get $::options -period] s."
}
# Wait can introduce a first-time delaying period, either random between two
# values separated by a colon sign, or fixed. All these can also be expressed in
# human-redable form, or as an integer (milliseconds)
set wait [dict get $::options -wait]
if { [string first ":" $wait] >= 0 } {
# Extract min and max and make sure to have good defaults for them
lassign [split $wait :] min max
if { $min eq "" } { set min 0 }
if { $max eq "" } { set max [dict get $::options -period] }
# Accept human-readable durations
if { ! [string is integer -strict $min] } {
set min [Duration $min]
}
if { ! [string is integer -strict $max] } {
set max [Duration $max]
}
set wait [expr {$min + int(rand()*($max - $min))}]
} elseif { ![string is integer -strict $wait] } {
set wait [Duration $wait]
}
if { ![file isdirectory [dict get $options -dst]] } {
puts stdout "Creating destination directory [dict get $options -dst]"
file mkdir [dict get $options -dst]
}
if { $wait > 0 } {
puts stdout "Waiting $wait s. before taking first mirror..."
after [expr {1000*$wait}] ::mirror
} else {
after idle ::mirror
}
vwait forever
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment