This is a proposed procedure for HBase table backups in a secure HBase cluster.

Requirements:
- Live backups (we cannot disable the table or take HBase offline)
- Self-service (a non-HBase user can back up and restore their own data)
- Automatable procedure (Oozie controlled)
- Runs on a secure cluster (a cluster whose /hbase folder is not world-readable)
- Supports off-cluster backups:
  - The backup location might not have an installed instance of HBase, just HDFS
  - The backup location does not have credentials for the hbase user
  - The backup location is secured (it could be in a different Kerberos domain)
There are two clusters:
- hbcluster is a live HBase cluster
- bkcluster is an HDFS cluster

We want to periodically back up table snapshots from hbcluster to bkcluster, and we want an automated procedure to restore such snapshots from bkcluster to hbcluster.
The backup and restore operations will be initiated by the table admin user foousr, which belongs to the same Kerberos (krb5) realm used by both clusters.
+-------------+ +--------------+
| | Backup | |
| Live Hbase/ +--------------> Backup HDFS |
| HDFS Cluster| | Cluster |
| | Restore | |
| 'hbcluster' <--------------+ 'bkcluster' |
| | | |
+-------------+ +--------------+
Ideally, the HBase community will maintain the HBase-specific code (e.g. code touching HFiles and WALs):
- For snapshot export we have: org.apache.hadoop.hbase.snapshot.ExportSnapshot
- For incremental table backup export we have: org.apache.hadoop.hbase.backup.BackupCopier
- We need to stop at step 6 on page 9 of HBASE-7912's design doc and take over copying as the tenant deems fit.
Backup procedure:
- On hbcluster, foousr creates a snapshot of table tableA called tableA-snapshot-2015-11-23 by running:
  foousr@hbcluster-worker $> echo "snapshot 'tableA', 'tableA-snapshot-2015-11-23'" | hbase shell
- On hbcluster, foousr triggers the exportSnapshot custom Oozie action with arguments ('table-snapshot-name', 'hdfs-uri') -> ('tableA-snapshot-2015-11-23', '/user/foousr/hbase-backups').
- On hbcluster, Oozie runs the ExportSnapshot MapReduce job to export the snapshot to the target local URI. Oozie has HDFS admin privileges and can read all files on HDFS, including files in the /hbase folder. Oozie will chown the newly created files to foousr:
  /usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 'tableA-snapshot-2015-11-23' -copy-to '/user/foousr/hbase-backups' -chuser 'foousr'
- foousr can now copy the files in /user/foousr/hbase-backups off cluster with either distcp or ExportSnapshot.
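The foousr side of the backup steps can be sketched as a few POSIX shell functions. This is a minimal sketch, not a tested tool: the helper names, the bkcluster NameNode URI (hdfs://bkcluster-nn:8020) and the choice to mirror /user/foousr/hbase-backups at the same path on bkcluster are assumptions, and Kerberos trust between the two realms is assumed to be in place already. Commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Minimal sketch of the foousr side of the backup flow (hypothetical helper
# names). Dry run by default: commands are printed; set RUN=1 to execute.
set -eu

BK_NN="${BK_NN:-hdfs://bkcluster-nn:8020}"  # hypothetical bkcluster NameNode URI
BACKUP_DIR="/user/foousr/hbase-backups"      # exportSnapshot target on hbcluster

# Print a command line; execute it only when RUN=1.
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Dated snapshot name, e.g. tableA-snapshot-2015-11-23 (defaults to today).
snapshot_name() {
  printf '%s-snapshot-%s\n' "$1" "${2:-$(date +%F)}"
}

# HBase shell statement for an online snapshot; the table stays enabled.
snapshot_stmt() {
  printf "snapshot '%s', '%s'\n" "$1" "$2"
}

# Last step, after the exportSnapshot Oozie action has finished: copy the
# exported files to bkcluster. With -update, distcp copies the contents of
# BACKUP_DIR into the target and skips files that are already there unchanged.
copy_off_cluster() {
  run hadoop distcp -update "$BACKUP_DIR" "$BK_NN$BACKUP_DIR"
}
```

For example, snapshot_stmt tableA "$(snapshot_name tableA)" | hbase shell takes today's snapshot, and once the exportSnapshot Oozie action has finished, RUN=1 copy_off_cluster copies the export to bkcluster. Using -update is intended to avoid re-copying files already present on bkcluster, such as archived HFiles shared by successive snapshots of the same table.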
Restore procedure:
- foousr populates /user/foousr/hbase-backups with the previously backed-up snapshot (using distcp or ExportSnapshot).
- On hbcluster, foousr triggers the importSnapshot custom Oozie action with arguments ('table-snapshot-name', 'hdfs-uri') -> ('tableA-snapshot-2015-11-23', '/user/foousr/hbase-backups').
- On hbcluster, Oozie runs the ExportSnapshot MapReduce job to import the snapshot from the given source URI and changes ownership of the files to the hbase user:
  /usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 'tableA-snapshot-2015-11-23' -copy-from '/user/foousr/hbase-backups' -copy-to '/hbase' -chuser 'hbase'
- On hbcluster, foousr runs clone_snapshot or restore_snapshot (HBase shell commands backed by the HBaseAdmin cloneSnapshot and restoreSnapshot APIs) to restore the snapshot.
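The foousr side of the restore steps can be sketched in POSIX shell. As a minimal sketch, it assumes hypothetical helper names, a hypothetical bkcluster NameNode URI (hdfs://bkcluster-nn:8020), the backup mirrored at the same path on bkcluster, and dry-run output unless RUN=1 is set. It prints the HBase shell statements for either restore route; note that restore_snapshot only works on a disabled table, whereas clone_snapshot creates a new table and leaves the original untouched.

```shell
#!/bin/sh
# Minimal sketch of the foousr side of the restore flow (hypothetical helper
# names). Dry run by default: commands are printed; set RUN=1 to execute.
set -eu

BK_NN="${BK_NN:-hdfs://bkcluster-nn:8020}"  # hypothetical bkcluster NameNode URI
BACKUP_DIR="/user/foousr/hbase-backups"      # importSnapshot source on hbcluster

# Print a command line; execute it only when RUN=1.
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# First step: repopulate BACKUP_DIR on hbcluster from bkcluster.
copy_back() {
  run hadoop distcp -update "$BK_NN$BACKUP_DIR" "$BACKUP_DIR"
}

# Last step, option A: materialize the snapshot as a new table
# (the target table must not exist yet).
clone_stmt() {  # $1 = snapshot name, $2 = new table name
  printf "clone_snapshot '%s', '%s'\n" "$1" "$2"
}

# Last step, option B: roll an existing table back to the snapshot.
# restore_snapshot requires a disabled table, so the table is briefly offline.
restore_stmts() {  # $1 = table name, $2 = snapshot name
  printf "disable '%s'\nrestore_snapshot '%s'\nenable '%s'\n" "$1" "$2" "$1"
}
```

For example, RUN=1 copy_back brings the files back, and after the importSnapshot action has run, clone_stmt tableA-snapshot-2015-11-23 tableA_restored | hbase shell creates a new table (tableA_restored is a hypothetical name). Cloning is the gentler option on a live cluster because it never disables the original table.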
Required work:
- For Oozie to run ExportSnapshot, the oozie user needs to be allowed to run YARN jobs (we can handle that with some chef-bach recipe adjustments).
- We need to write the two custom Oozie actions for import and export.
Pros:
- No extra load on region servers
- No HBase code change needed
- The HBase community will write the HBase WAL and HFile MapReduce code; Oozie will provide a privilege escalation mechanism independent of the HBase user running everything.
- The tenant controls where they back up their data (tape, another cluster, leaving it on their cluster, etc.)
Cons:
- Extra copy of the data on the source cluster
- Two custom Oozie actions have to be written
- The Oozie action has to perform security checks for foousr
exportSnapshot Oozie action:
Parameters: user (inferred), destinationPath, snapshotName, numMappers
<< Check that <user> has write permissions on <destinationPath> >>
/usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot '<snapshotName>' -copy-to '<destinationPath>' -chuser '<user>' -mappers <numMappers>
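One way the exportSnapshot action body could look, as a hypothetical POSIX shell sketch run by the Oozie service user. It assumes Oozie is configured as a Hadoop proxy user allowed to impersonate the requesting user (so that HADOOP_PROXY_USER makes the permission check run with that user's rights), and a Hadoop release whose hdfs dfs -test supports -w. Because the action runs with elevated privileges, it also rejects unexpected input; the accepted snapshot-name characters are a conservative choice, not HBase's exact naming rule. Commands are printed and only executed when RUN=1.

```shell
#!/bin/sh
# Hypothetical body of the exportSnapshot Oozie action, run by the privileged
# Oozie service user on behalf of <user>. Dry run unless RUN=1 is set.
set -eu

# Print a command line; execute it only when RUN=1.
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Conservative input checks (our assumption, not HBase's exact naming rules).
# A leading '-' is rejected so a value cannot be mistaken for an option.
valid_snapshot() {
  case "$1" in
    ''|[.-]*|*[!A-Za-z0-9_.-]*) return 1 ;;
  esac
}
valid_number() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
}
valid_abs_path() {
  case "$1" in
    /*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: export_snapshot_action <user> <destinationPath> <snapshotName> <numMappers>
export_snapshot_action() {
  user="$1" dest="$2" snap="$3" mappers="$4"
  valid_snapshot "$snap" || { echo "invalid snapshot name: $snap" >&2; return 1; }
  valid_number "$mappers" || { echo "invalid mapper count: $mappers" >&2; return 1; }
  valid_abs_path "$dest" || { echo "destination must be absolute: $dest" >&2; return 1; }

  # << Check that <user> has write permissions on <destinationPath> >>
  # Runs as <user> via proxy-user impersonation (assumed to be configured).
  # As written, <destinationPath> must already exist for the check to pass.
  run env HADOOP_PROXY_USER="$user" hdfs dfs -test -w "$dest" ||
    { echo "$user cannot write to $dest" >&2; return 1; }

  # Export the snapshot; -chuser makes <user> the owner of the copied files.
  run /usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot "$snap" -copy-to "$dest" -chuser "$user" -mappers "$mappers"
}
```

The permission check fails the action explicitly rather than relying on set -e, so it still blocks the export if the function is ever called from a conditional context.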
importSnapshot Oozie action:
Parameters: user (inferred), sourcePath, snapshotName, numMappers
<< Check that <user> has read permissions on <sourcePath> >>
/usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot '<snapshotName>' -copy-from '<sourcePath>' -copy-to '/hbase' -chuser 'hbase' -mappers <numMappers>
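The importSnapshot action body can be sketched in POSIX shell. This is hypothetical: it assumes the Oozie service user may impersonate the requesting user through HADOOP_PROXY_USER, that hdfs dfs -test supports -r on this Hadoop release, and that the input checks (a conservative character set of our choosing) are acceptable. Commands are printed and only executed when RUN=1.

```shell
#!/bin/sh
# Hypothetical body of the importSnapshot Oozie action, run by the privileged
# Oozie service user on behalf of <user>. Dry run unless RUN=1 is set.
set -eu

# Print a command line; execute it only when RUN=1.
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Usage: import_snapshot_action <user> <sourcePath> <snapshotName> <numMappers>
import_snapshot_action() {
  user="$1" src="$2" snap="$3" mappers="$4"
  # Conservative input checks (our assumption, not HBase's exact naming rules).
  case "$snap" in
    ''|[.-]*|*[!A-Za-z0-9_.-]*) echo "invalid snapshot name: $snap" >&2; return 1 ;;
  esac
  case "$mappers" in
    ''|*[!0-9]*) echo "invalid mapper count: $mappers" >&2; return 1 ;;
  esac
  case "$src" in
    /*) ;;
    *) echo "source must be absolute: $src" >&2; return 1 ;;
  esac

  # << Check that <user> has read permissions on <sourcePath> >>
  # Runs as <user> via proxy-user impersonation (assumed to be configured).
  # This checks only the top-level path, not every snapshot file below it.
  run env HADOOP_PROXY_USER="$user" hdfs dfs -test -r "$src" ||
    { echo "$user cannot read $src" >&2; return 1; }

  # Copy into the HBase root dir; -chuser hands the files to the hbase user.
  run /usr/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
    -snapshot "$snap" -copy-from "$src" -copy-to /hbase -chuser hbase \
    -mappers "$mappers"
}
```

Since ExportSnapshot itself runs with Oozie's privileges, a stricter action would also confirm that foousr can read every file under sourcePath, not just the top-level directory.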