@cbaenziger
Forked from mlongob/hbase_backup.md
Last active August 1, 2016 14:37
HBase backup solutions

Introduction

This is a proposed procedure for HBase table backups in a secure HBase cluster. Requirements:

  • Live backups (cannot disable the table or take HBase offline)
  • Self-service (a non-HBase user can back up and restore their own data)
  • Automatable procedure (Oozie controlled)
  • Works on a secure cluster (a cluster whose /hbase folder is not world-readable)
  • Supports off-cluster backups
      • Backup location might not have an installed instance of HBase, just HDFS
      • Backup location does not have credentials for the hbase user
      • Backup location is secured (could be in a different Kerberos domain)

Setup

There are two clusters:

  • hbcluster is a live HBase cluster
  • bkcluster is an HDFS cluster

We want to periodically back up table snapshots from hbcluster to bkcluster, and we want an automated procedure to restore those snapshots from bkcluster to hbcluster.

The backup and restore operations are initiated by the table admin user foousr, who belongs to the same krb5 realm used by both clusters.

+-------------+              +--------------+
|             |    Backup    |              |
| Live Hbase/ +-------------->  Backup HDFS |
| HDFS Cluster|              |  Cluster     |
|             |    Restore   |              |
| 'hbcluster' <--------------+  'bkcluster' |
|             |              |              |
+-------------+              +--------------+

Ideally the HBase community will maintain the HBase-specific code (e.g. code touching HFiles and WALs):

  • For snapshot export we have: org.apache.hadoop.hbase.snapshot.ExportSnapshot
  • For incremental table backup export we may have: org.apache.hadoop.hbase.backup.BackupCopier? We would need to stop at step 6 on page 9 of HBASE-7912's design doc and take over the copying as the tenant sees fit.

Proposed solution - ExportSnapshot in Oozie action

Backup procedure

  1. On hbcluster, foousr creates a snapshot of table tableA called tableA-snapshot-2015-11-23 by running: foousr@hbcluster-worker $> echo "snapshot 'tableA', 'tableA-snapshot-2015-11-23'" | hbase shell
  2. On hbcluster, foousr triggers the exportSnapshot custom Oozie action with arguments ('table-snapshot-name', 'hdfs-uri') -> ('tableA-snapshot-2015-11-23', '/user/foousr/hbase-backups')
  3. On hbcluster, Oozie runs the ExportSnapshot MapReduce job to export the snapshot to the target URI on the local cluster. Oozie has HDFS admin privileges and can read all files on HDFS, including files in the /hbase folder. The job changes ownership of the newly created files to foousr. (/usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 'tableA-snapshot-2015-11-23' -copy-to '/user/foousr/hbase-backups' -chuser 'foousr')
  4. foousr can now copy the files in /user/foousr/hbase-backups off cluster with either distcp or ExportSnapshot.
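
Steps 1 and 4 are the parts foousr runs directly, so they can be sketched as a small script. This is a minimal sketch, not a tested procedure: it only prints the commands, it uses the HBase shell's snapshot command for step 1, and the hdfs://hbcluster and hdfs://bkcluster filesystem URIs are assumptions that must be replaced with the real NameNode addresses.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands for backup steps 1 and 4 instead of running them.
set -euo pipefail

# Snapshot name in the <table>-snapshot-<YYYY-MM-DD> pattern used in step 1.
snapshot_name() {
  printf '%s-snapshot-%s\n' "$1" "$2"
}

# Step 1: command line that feeds a snapshot statement to the HBase shell.
snapshot_cmd() {
  local table="$1" snapshot="$2"
  printf "echo \"snapshot '%s', '%s'\" | hbase shell\n" "$table" "$snapshot"
}

# Step 4: copy the exported files off cluster with distcp.
# Both filesystem URIs are assumptions; substitute the real NameNode addresses.
offcluster_copy_cmd() {
  local src_fs="$1" dst_fs="$2" dir="$3"
  printf "hadoop distcp '%s%s' '%s%s'\n" "$src_fs" "$dir" "$dst_fs" "$dir"
}

snap="$(snapshot_name tableA 2015-11-23)"
snapshot_cmd tableA "$snap"
offcluster_copy_cmd hdfs://hbcluster hdfs://bkcluster /user/foousr/hbase-backups
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; once reviewed, each printed line can be run by foousr on an hbcluster worker.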

Restore procedure

  1. foousr populates /user/foousr/hbase-backups on hbcluster with the previously backed-up snapshot (using distcp or ExportSnapshot)
  2. On hbcluster, foousr triggers the importSnapshot custom Oozie action with arguments ('table-snapshot-name', 'hdfs-uri') -> ('tableA-snapshot-2015-11-23', '/user/foousr/hbase-backups')
  3. On hbcluster, Oozie runs the ExportSnapshot MapReduce job to import the snapshot from the given URI into /hbase and changes ownership of the files to the hbase user (/usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 'tableA-snapshot-2015-11-23' -copy-from '/user/foousr/hbase-backups' -copy-to '/hbase' -chuser 'hbase')
  4. On hbcluster, foousr calls the clone_snapshot or restore_snapshot HBaseAdmin API to restore the snapshot.
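
Step 4 offers two choices with different effects on the live table. The sketch below prints the HBase shell equivalents of the two HBaseAdmin calls; it is an illustration under assumptions, not part of the proposal. The target table name tableA_restored is hypothetical, and the disable/enable wrapper reflects that restore_snapshot requires the table to be disabled.

```shell
#!/usr/bin/env bash
# Sketch only: prints HBase shell statements for restore step 4.
set -euo pipefail

# Option A: materialize the snapshot as a new table, leaving any existing table untouched.
clone_statements() {
  local snapshot="$1" new_table="$2"
  printf "clone_snapshot '%s', '%s'\n" "$snapshot" "$new_table"
}

# Option B: roll an existing table back in place. The table has to be
# disabled while restore_snapshot runs, so it is briefly unavailable.
restore_statements() {
  local snapshot="$1" table="$2"
  printf "disable '%s'\n" "$table"
  printf "restore_snapshot '%s'\n" "$snapshot"
  printf "enable '%s'\n" "$table"
}

# tableA_restored is a hypothetical name for the cloned table.
clone_statements tableA-snapshot-2015-11-23 tableA_restored
restore_statements tableA-snapshot-2015-11-23 tableA
```

The printed statements can be piped into hbase shell by foousr. Cloning into a new table avoids downtime on the original table, while restoring in place keeps the table name but takes it offline for the duration of the restore.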

Work items

  • In order to run ExportSnapshot, the oozie user needs to be enabled to run YARN jobs (we can handle that with some chef-bach recipe adjustment)
  • We need to write the two custom oozie actions for import/export

Pros

  • No extra load on region servers
  • No hbase code change needed
  • The HBase community will write the HBase WAL and HFile MapReduce code; Oozie will provide a privilege escalation mechanism independent of the HBase user running everything.
  • The tenant controls where they back up their data (tape, another cluster, leave it on their cluster, etc.)

Cons

  • Extra copy of the data on source cluster
  • Have to write two custom oozie actions
  • Oozie action has to perform security checks for foousr

Oozie action pseudocode

ExportSnapshot Action

Parameters: user (inferred), destinationPath, snapshotName, numMappers

<< Check that <user> has write permissions on <destinationPath> >>
/usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot '<snapshotName>' -copy-to '<destinationPath>' -chuser '<user>' -mappers <numMappers>
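
Because the action runs with elevated privileges, its parameters deserve validation before they reach the command line. The following is a minimal sketch of that idea, assuming a conservative character policy that is not HBase's own naming rule; the function names are hypothetical, and the write-permission check from the pseudocode is deliberately left out. It prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch only: validates the action parameters and prints the export command.
set -euo pipefail

# Conservative character set for snapshot names (an assumed policy).
valid_snapshot_name() {
  [[ "$1" =~ ^[A-Za-z0-9_.-]+$ ]]
}

# Absolute path made of safe characters only, so quoting cannot be broken.
valid_path() {
  [[ "$1" =~ ^/[A-Za-z0-9_./-]+$ ]]
}

# numMappers must be a positive integer.
valid_mappers() {
  [[ "$1" =~ ^[1-9][0-9]*$ ]]
}

export_snapshot_cmd() {
  local user="$1" dest="$2" snapshot="$3" mappers="$4"
  valid_snapshot_name "$snapshot" || { echo "bad snapshot name" >&2; return 1; }
  valid_path "$dest" || { echo "bad destination path" >&2; return 1; }
  valid_mappers "$mappers" || { echo "bad mapper count" >&2; return 1; }
  # The write-permission check on the destination is not implemented here.
  printf "/usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot '%s' -copy-to '%s' -chuser '%s' -mappers %s\n" \
    "$snapshot" "$dest" "$user" "$mappers"
}

export_snapshot_cmd foousr /user/foousr/hbase-backups tableA-snapshot-2015-11-23 4
```

Rejecting unexpected characters up front means a crafted snapshot name or path cannot smuggle extra arguments into a job that can read everything under /hbase.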

ImportSnapshot Action

Parameters: user (inferred), sourcePath, snapshotName, numMappers

<< Check that <user> has read permissions on <sourcePath> >>
/usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot '<snapshotName>' -copy-from '<sourcePath>' -copy-to '/hbase' -chuser 'hbase' -mappers <numMappers>
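
The import direction is the more sensitive one, since it writes into /hbase as the hbase user. One possible stand-in for the read-permission check is to confine the source path to the user's own home directory. This is a sketch under that assumption: the /user/<user>/ layout and the function names are hypothetical, the rule is stricter than a real permission check, and validation of the other parameters is omitted for brevity. The command is printed, not executed.

```shell
#!/usr/bin/env bash
# Sketch only: confines the import source to the user's home directory
# and prints the import command.
set -euo pipefail

# True when the path lies under /user/<user>/ and contains no ".." sequence.
under_home() {
  local user="$1" path="$2"
  case "$path" in
    *..*) return 1 ;;
    "/user/$user"/*) return 0 ;;
    *) return 1 ;;
  esac
}

import_snapshot_cmd() {
  local user="$1" src="$2" snapshot="$3" mappers="$4"
  under_home "$user" "$src" || { echo "source path outside home of $user" >&2; return 1; }
  printf "/usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot '%s' -copy-from '%s' -copy-to '/hbase' -chuser 'hbase' -mappers %s\n" \
    "$snapshot" "$src" "$mappers"
}

import_snapshot_cmd foousr /user/foousr/hbase-backups tableA-snapshot-2015-11-23 4
```

A path-prefix rule is easy to audit but would refuse legitimate sources outside the home directory, so a real action may prefer an actual HDFS permission check performed on behalf of the user.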