

@ltiao
Last active December 14, 2015 03:09

Autobackup

Overview

Automated backup management.

Installation

Required Software

Most of the software listed below can and should be installed via a package manager such as yum or aptitude. Similarly, Python modules can and should be installed via easy_install or preferably pip.

Any one of the following databases that Django supports (MySQL recommended):

  • MySQL
  • SQLite
  • PostgreSQL
  • Oracle

Apache Modules

  • mod_wsgi

Python Modules

  • MySQLdb (or the module corresponding to the database chosen)
  • pexpect

Getting autobackup

Download and extract the files into a directory on which the apache user has read, write and execute permissions, or, preferably, simply run git clone git://github.com/ltiao/autobackup.git inside such a directory.

This directory is the root of the Django project and shall be referred to as such throughout this document.

For example:

cd /home
git clone git://github.com/ltiao/autobackup.git

This will create a directory in /home named autobackup (the project root) containing all the files from this repository.

Set up the database

Create a database. E.g. in MySQL:

CREATE DATABASE autobackup CHARACTER SET utf8 COLLATE utf8_general_ci;

Next, edit the database settings in <project-root>/autobackup/settings/production_settings.py, e.g.:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql', # Add 'postgresql_psycopg2', 'mysql', 'sqlite3' or 'oracle'.
        'NAME': 'autobackup',                      # Or path to database file if using sqlite3.
        'USER': '',                      # Not used with sqlite3.
        'PASSWORD': '',                  # Not used with sqlite3.
        'HOST': '',                      # Set to empty string for localhost. Not used with sqlite3.
        'PORT': '',                      # Set to empty string for default. Not used with sqlite3.
    },
}

Here, only USER and PASSWORD remain to be filled in.
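If MySQL is not available, SQLite needs no USER or PASSWORD at all. A minimal sketch of the same setting for SQLite, assuming the project root from the installation example (/home/autobackup); the database file name autobackup.db and the PROJECT_ROOT variable are hypothetical, not part of the project:

```python
import os

# Project root from the installation example (cd /home; git clone ...).
PROJECT_ROOT = '/home/autobackup'

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        # For SQLite, NAME is the path to the database file (hypothetical name).
        'NAME': os.path.join(PROJECT_ROOT, 'autobackup.db'),
        # USER, PASSWORD, HOST and PORT are not used with sqlite3.
    },
}
```

With SQLite, the apache user also needs write access to the directory containing the database file, because SQLite creates its journal file next to it.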

Once this is configured, create the model tables by running

python manage.py syncdb

from the project root.

This runs all the SQL commands printed by python manage.py sqlall task.

Set up Apache

Lastly, set up Apache and mod_wsgi by adding the following lines to httpd.conf:

Alias /static/ /<project-root>/static/

<Directory /<project-root>/static>
Order deny,allow
Allow from all
</Directory>

WSGIScriptAlias /autobackup /<project-root>/autobackup/wsgi.py

<Directory /<project-root>/autobackup>
<Files wsgi.py>
Order deny,allow
Allow from all
</Files>
</Directory>

Restart Apache. If the installation was successful, you should now be able to log in to the autobackup web interface at http://localhost/autobackup/.

Usage

The core autobackup script is located at /<project-root>/script/backup.py. Its usage is:

Usage: backup.py [options] [GROUP]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -g GROUP, --backup-group=GROUP
                        Execute backup tasks under groups associated with the
                        supplied group abbreviation name
  -c, --clean           Delete old backup files
  -m, --send-mail       Send details of completed backup tasks by email
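The help text above has the layout produced by Python's optparse module. The following is a hypothetical reconstruction of such a parser, not the script's actual code: the dest names, the version string and the rule for combining -g with the positional GROUP argument are all assumptions.

```python
from optparse import OptionParser


def build_parser():
    # Hypothetical: dest names and the version string are assumptions.
    parser = OptionParser(usage='%prog [options] [GROUP]', version='%prog 1.0')
    parser.add_option('-g', '--backup-group', dest='group', metavar='GROUP',
                      help='Execute backup tasks under groups associated with '
                           'the supplied group abbreviation name')
    parser.add_option('-c', '--clean', action='store_true', dest='clean',
                      default=False, help='Delete old backup files')
    parser.add_option('-m', '--send-mail', action='store_true', dest='send_mail',
                      default=False,
                      help='Send details of completed backup tasks by email')
    return parser


def resolve_group(options, args):
    # Assumption: -g takes precedence over the positional GROUP argument.
    if options.group:
        return options.group
    return args[0] if args else None
```

For example, build_parser().parse_args(['-g', 'NET', '-c']) selects the group NET and enables cleaning.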

Backups

The supplied backup group argument is used to query Groups by their abbreviation field (autobackup.task_group.abbreviation), and the enabled backup Tasks under the matching Group(s) are then executed.
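The selection step can be sketched in plain Python, with namedtuples standing in for the Django models. Only the abbreviation field comes from the text; the name, group and enabled fields are hypothetical.

```python
from collections import namedtuple

# Plain stand-ins for the Django models (field names other than
# abbreviation are hypothetical).
Group = namedtuple('Group', 'name abbreviation')
Task = namedtuple('Task', 'name group enabled')


def tasks_for_group(tasks, abbreviation):
    """Return the enabled tasks whose group matches the supplied abbreviation."""
    return [t for t in tasks
            if t.enabled and t.group.abbreviation == abbreviation]
```

With the Django ORM, and assuming those field names, the same selection would read roughly Task.objects.filter(group__abbreviation=abbreviation, enabled=True).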

Each Task is executed by first instantiating the concrete subclass in functions.py named by the function_name of the Task's related Function.

For example, a Function with function_name=A causes an object of class A to be instantiated, if such a class exists in functions.py. If not, an AttributeError is raised.
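The lookup described above maps naturally onto getattr, which raises AttributeError when the module has no attribute of that name. A minimal sketch, using a throwaway module object in place of functions.py; ExampleBackup and instantiate_backup are hypothetical names:

```python
import types

# Stand-in for the project's functions.py module.
functions = types.ModuleType('functions')


class ExampleBackup(object):  # hypothetical backup class
    def __init__(self, task):
        self.task = task


functions.ExampleBackup = ExampleBackup


def instantiate_backup(function_name, task, module=functions):
    """Instantiate the class named function_name from module.

    getattr raises AttributeError if no such class exists.
    """
    backup_class = getattr(module, function_name)
    return backup_class(task)
```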

The backup itself is then carried out by calling the instance's execute method, which performs the backup task in a separate thread. This way, many backup tasks can run simultaneously.
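A minimal sketch of an execute method that runs the work in its own thread, so that several backups can proceed at once. ThreadedBackup is a hypothetical stand-in; whether the real execute returns its thread is an assumption.

```python
import threading


class ThreadedBackup(object):
    """Hypothetical stand-in for a backup object run in its own thread."""

    def __init__(self, name):
        self.name = name
        self.successful = False

    def run(self):
        # The real backup work would happen here.
        self.successful = True

    def execute(self):
        # Start the backup in a separate thread and return immediately,
        # so the caller can start the next task straight away.
        thread = threading.Thread(target=self.run, name=self.name)
        thread.start()
        return thread


backups = [ThreadedBackup('task-%d' % i) for i in range(3)]
threads = [b.execute() for b in backups]
# Wait for every backup to finish.
for t in threads:
    t.join()
```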

The rationale for wrapping the functions that perform the backups in classes is to make use of the Template Method Pattern.

Many devices share the same backup procedure, but some differ in subtle ways.

Fundamentally, all backup tasks are the same: a setup step, which may involve authentication, directory creation, etc., followed by the real work. Throughout the process, we'd like to report the progress and outcome of the backup and, if it succeeds, store the path of the backed-up file.

These actions are all encapsulated in the AbstractBackup class. With this, it becomes extremely easy to extend the behaviour of existing functions and create new ones.
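A minimal sketch of what a Template Method base class along these lines could look like. The hook names (setup, perform_backup, cleanup) and the attributes (messages, successful, result_file_abs_path) come from the example in this document; the run method, the step order and the record_event hook are assumptions, not the project's actual AbstractBackup.

```python
import logging

logger = logging.getLogger(__name__)


class AbstractBackup(object):
    """Sketch of a Template Method base class for backup tasks."""

    def __init__(self, task):
        self.task = task
        self.messages = []
        self.successful = False
        self.result_file_abs_path = ''

    # Steps that concrete subclasses override.
    def setup(self):
        pass

    def perform_backup(self):
        raise NotImplementedError

    def cleanup(self):
        pass

    def record_event(self):
        # Hypothetical hook: the project writes an Event row here,
        # regardless of the backup's outcome.
        logger.info('%s finished, successful=%s', self.task, self.successful)

    def run(self):
        """The template method: a fixed order of steps shared by all backups."""
        try:
            self.setup()
            self.perform_backup()
        except Exception as e:
            self.messages.append('Backup failed: {0}'.format(e))
            logger.exception('Backup of %s failed', self.task)
        finally:
            self.cleanup()
            self.record_event()
```

A subclass only overrides the steps that differ; the ordering and the error reporting stay in one place.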

For example, the commands needed to back up Cisco switch running configurations are very similar to those for Cisco firewalls; the only difference lies in the sequence of commands for pushing the configuration file to the TFTP server. So we can simply have the Cisco firewall backup class subclass the Cisco switch backup class (strictly, there should be an abstract Cisco backup class that both the switch and firewall backups subclass).
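The suggested abstract Cisco class can be sketched as follows. The class names are hypothetical, and the command strings are illustrative assumptions rather than the project's actual sequences; check them against your devices before use.

```python
class AbstractCiscoBackup(object):
    """Shared Cisco logic; only the TFTP push sequence differs per device type."""

    def __init__(self, host, tftp_server, filename):
        self.host = host
        self.tftp_server = tftp_server
        self.filename = filename

    def login_commands(self):
        # Steps common to switches and firewalls (illustrative).
        return ['enable']

    def push_commands(self):
        raise NotImplementedError

    def command_sequence(self):
        # Template method: common prefix, then the device-specific push.
        return self.login_commands() + self.push_commands()


class CiscoSwitchBackup(AbstractCiscoBackup):
    def push_commands(self):
        # Interactive form: the device prompts for the host and file name.
        return ['copy running-config tftp:', self.tftp_server, self.filename]


class CiscoFirewallBackup(AbstractCiscoBackup):
    def push_commands(self):
        # One-line form with the destination given as a URL.
        return ['copy running-config tftp://{0}/{1}'.format(
            self.tftp_server, self.filename)]
```

In the project, each line would presumably be sent to the device through pexpect (for example with a spawn object's sendline method); that part is omitted here.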

The following example is a new backup task that backs up Yahoo! Finance news headlines to a text file.

import datetime
import json
import os
import urllib2  # Python 2 module

# AbstractBackup, BACKUP_BASE_DIRECTORY and logger are assumed to be defined
# in functions.py, alongside the other backup classes.

class YahooBackup(AbstractBackup):
    def cleanup(self):
        pass  # Nothing to clean up after this backup

    def setup(self):
        # Fetch the headlines as JSON from the Yahoo! Query Language (YQL) service
        self.result = json.loads(urllib2.urlopen("http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Ffinance.yahoo.com%2Fq%3Fs%3Dyhoo%22%20and%20xpath%3D'%2F%2Fdiv%5B%40id%3D%22yfi_headlines%22%5D%2Fdiv%5B2%5D%2Ful%2Fli%2Fa'&diagnostics=true&format=json").read())

        # Create the destination directory (one per day: news/YYYY/MM/DD)
        self.path = os.path.join(BACKUP_BASE_DIRECTORY, 'news', datetime.datetime.now().strftime("%Y/%m/%d"))
        if not os.path.exists(self.path):
            logger.info('Daily backup directory [{0}] does not exist. Creating now...'.format(self.path))
            os.makedirs(self.path)

    def perform_backup(self):
        # Parse the data to get the headlines
        data = u''
        for headline in self.result['query']['results']['a']:
            data += u'*\tHeadline: {content}\n\tLink: {href}\n'.format(**headline)

        # Save to the destination directory
        dump_filename = os.path.join(self.path, 'yahoo_finance_news.txt')
        try:
            # "with" closes the file even if the write fails
            with open(dump_filename, 'w') as f:
                f.write(data.encode('utf8'))
        except Exception as e:
            # Save any exception so it can later be read from the web interface
            self.messages.append('Could not write to {0}: {1}'.format(dump_filename, e))
            logger.exception('Could not write to {0}'.format(dump_filename))
        else:
            # No exception occurred: set the success flag and save the path to the resulting backup file
            self.successful = True
            self.result_file_abs_path = dump_filename

Lines such as these

self.successful = True
self.result_file_abs_path = dump_filename

are needed to save the result of the backup to the database for the web interface. If they are omitted when a backup succeeds, the task will be reported as unsuccessful in the web interface.

The Event is created after a backup task has finished executing:

# Write the backup event to database. This must occur regardless of the backup's outcome.
Event.objects.create(name=os.getpid(), messages='\n'.join(self.messages), backup_successful=self.successful, backup_file_path=self.result_file_abs_path, task=self.task)

The Event model has a timestamp field called created which is automatically given the current time so all the web interface has to do is display all events for a given day.
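Displaying the events for a given day amounts to selecting those whose created timestamp falls in a half-open one-day range. A minimal sketch with a namedtuple standing in for the Event model (only the created and backup_successful fields come from the text; task here is a plain label):

```python
import datetime
from collections import namedtuple

# Stand-in for the Event model; created mirrors its auto-populated timestamp.
Event = namedtuple('Event', 'task backup_successful created')


def events_for_day(events, day):
    """Return the events whose created timestamp falls on the given date."""
    start = datetime.datetime.combine(day, datetime.time.min)
    end = start + datetime.timedelta(days=1)
    return [e for e in events if start <= e.created < end]
```

With the Django ORM, the same range could be expressed as Event.objects.filter(created__gte=start, created__lt=end).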

Cleaning

The backup.py --clean command retrieves all events older than 6 months (or whatever settings.BACKUP_TTL is set to), that is, all backups that were performed and completed more than 6 months ago. For each one, it deletes the backup file named by backup_file_path, if it still exists, and then deletes the event itself.
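A minimal sketch of the cleaning logic, assuming BACKUP_TTL is a datetime.timedelta (its actual type is not stated) and using dicts as hypothetical stand-ins for Event rows; dropping an event from the returned list stands in for deleting it.

```python
import datetime
import os

# Assumption: BACKUP_TTL is a timedelta, roughly six months here.
BACKUP_TTL = datetime.timedelta(days=182)


def clean(events, now, ttl=BACKUP_TTL):
    """Delete the backup files of events older than ttl; return the events to keep.

    Each event is a dict with 'created' and 'backup_file_path' keys
    (a hypothetical stand-in for the Event model).
    """
    cutoff = now - ttl
    kept = []
    for event in events:
        if event['created'] < cutoff:
            path = event['backup_file_path']
            # The file may already have been removed by hand.
            if path and os.path.exists(path):
                os.remove(path)
            # Not keeping the event stands in for deleting it.
        else:
            kept.append(event)
    return kept
```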

Email Notifications

Users belonging to the user group autobackup are subscribed to the mailing list and notified of the outcome of the backup tasks run so far that day. Once notified, the task's mailed flag is set to True, so it will not appear in later notifications that day.

This way, mail notifications can be scheduled to run several times a day.

To add a recipient to the email notifications, simply add a user to the group autobackup.
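A minimal sketch of one notification run: collect the members of the autobackup group, pick today's events that have not been mailed yet, send one summary and set their mailed flag. The Event class, the user dicts and the send callable are hypothetical stand-ins so the logic can run without Django.

```python
import datetime


class Event(object):
    """Hypothetical stand-in for the Event model."""

    def __init__(self, task, successful, created, mailed=False):
        self.task = task
        self.successful = successful
        self.created = created
        self.mailed = mailed


def send_notifications(events, users, today, send):
    """Mail today's not-yet-mailed events to members of the autobackup group."""
    recipients = [u['email'] for u in users if 'autobackup' in u['groups']]
    pending = [e for e in events
               if e.created.date() == today and not e.mailed]
    if not recipients or not pending:
        return 0
    lines = ['{0}: {1}'.format(e.task, 'OK' if e.successful else 'FAILED')
             for e in pending]
    send('Autobackup report for {0}'.format(today.isoformat()),
         '\n'.join(lines), recipients)
    for e in pending:
        e.mailed = True  # excluded from later notifications the same day
    return len(pending)
```

In a Django project, send could wrap django.core.mail.send_mail(subject, message, from_email, recipient_list), and the recipients could come from User.objects.filter(groups__name='autobackup').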

SMTP settings

Settings for the SMTP server can be modified in <project-root>/autobackup/settings/production_settings.py

See https://docs.djangoproject.com/en/dev/ref/settings/#email-backend
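For example, the SMTP section of production_settings.py might look like the following. The setting names are standard Django settings; every value is a hypothetical placeholder to be replaced with your own server's details.

```python
# Hypothetical values; replace with your SMTP server's details.
EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'
EMAIL_HOST = 'smtp.example.com'
EMAIL_PORT = 587
EMAIL_HOST_USER = 'autobackup@example.com'
EMAIL_HOST_PASSWORD = ''
EMAIL_USE_TLS = True
DEFAULT_FROM_EMAIL = 'autobackup@example.com'
```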

Documentation

Class Diagram

[Figure: class diagram]

Design

Template Method Design Pattern

Notes

Issues

Task List

Settings

License

Copyright © 2013 Louis Tiao
