I'm in the midst of trying to wrangle an HBase backup/restore to/from S3 or
HDFS, built around exporting/backing up one table at a time using
org.apache.hadoop.hbase.mapreduce.Export from HBASE-1684.
Just a reminder:
Usage: Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
In the pseudocode below:

- persistant_store is some kind of non-HBase store in the cloud that you can
  just push stuff onto.
- all_my_Hbase_tables_to_be_backedup is a list of table names.
- create_table is a function that would properly create a new HBase table from
  the name and schema passed in as arguments.
Can I assume that if I do the following pseudocode on HBase 0.20.3 or 0.90.x
to get an initial full backup to S3:

starttime = beginning_of_time
endtime = NOW_Minus_60_seconds
versions = 100000  (the largest number of versions we keep; we do some weird
things with versions in some tables)
for table in all_my_Hbase_tables_to_be_backedup
do
    $HADOOP_HOME/bin/hadoop jar $HBASE_HOME/hbase-0.20.3.jar export \
        $table \
        s3n://somebucket/$table/ \
        $versions \
        $starttime \
        $endtime
    store_times_for_table_in_persistant_store( $table $starttime $endtime )
    store_schema_for_table_in_persistant_store( $table get_schema_from_HBase($table) )
done
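
For what it's worth, here is that first loop as a minimal runnable bash
sketch. It assumes Export's starttime/endtime are milliseconds since the epoch
(HBase cell timestamps are in ms); the TABLES list and the store_* helpers are
placeholders of my own, not anything HBase provides:

#!/usr/bin/env bash
# Hypothetical table list; stands in for all_my_Hbase_tables_to_be_backedup.
TABLES="table_foo table_baz"
VERSIONS=100000
STARTTIME=0                                   # beginning_of_time
ENDTIME=$(( ( $(date +%s) - 60 ) * 1000 ))    # now minus 60 seconds, in ms

for table in $TABLES; do
    $HADOOP_HOME/bin/hadoop jar $HBASE_HOME/hbase-0.20.3.jar export \
        "$table" "s3n://somebucket/$table/" "$VERSIONS" "$STARTTIME" "$ENDTIME"
    # Placeholder helpers; one possible implementation is sketched after the
    # incremental loop below.
    store_times_for_table_in_persistant_store "$table" "$STARTTIME" "$ENDTIME"
    store_schema_for_table_in_persistant_store "$table" "$(get_schema_from_HBase "$table")"
done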
Then do incremental backups from that point on:
endtime = NOW_Minus_60_seconds
versions = 100000
for table in all_my_Hbase_tables_to_be_backedup
do
    starttime = get_last_endtime_from_persistant_store( $table )
    $HADOOP_HOME/bin/hadoop jar $HBASE_HOME/hbase-0.20.3.jar export \
        $table \
        s3n://somebucket/$table/ \
        $versions \
        $starttime \
        $endtime
    store_times_for_table_in_persistant_store( $table $starttime $endtime )
    store_schema_for_table_in_persistant_store( $table get_schema_from_HBase($table) )
done
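
One simple way to implement those persistent-store helpers, sketched below, is
a per-table marker file on S3 itself written with hadoop fs. The meta/ layout
and helper bodies are entirely hypothetical:

# Hypothetical helpers backed by marker files under s3n://somebucket/meta/.
store_times_for_table_in_persistant_store() {
    local table=$1 starttime=$2 endtime=$3
    # Only the last endtime is needed to pick up the next incremental window.
    echo "$endtime" > "/tmp/${table}.last_endtime"
    # Older hadoop fs -put has no overwrite flag, so remove any previous marker.
    $HADOOP_HOME/bin/hadoop fs -rm "s3n://somebucket/meta/${table}.last_endtime" 2>/dev/null
    $HADOOP_HOME/bin/hadoop fs -put "/tmp/${table}.last_endtime" \
        "s3n://somebucket/meta/${table}.last_endtime"
}

get_last_endtime_from_persistant_store() {
    local table=$1
    $HADOOP_HOME/bin/hadoop fs -cat "s3n://somebucket/meta/${table}.last_endtime"
}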
The Import usage:
Usage: Import <tablename> <inputdir>
If I wanted to restore a backed-up table (table_foo) to a destination table
(table_bar) in the HBase cluster running this command, which may or may not be
the same HBase the table was originally backed up from, I can do the following
against the exports on S3:

create_table( table_bar, get_schema_from_persistant_store(table_foo) )
$HADOOP_HOME/bin/hadoop jar $HBASE_HOME/hbase-0.20.3.jar import \
    table_bar \
    s3n://somebucket/table_foo/
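
Spelled out a bit as a sketch: the create step is hand-waved, since the stored
schema format is whatever the persistent store holds; here I just issue an
hbase shell create with a hypothetical column family that would have to match
what was exported:

# Hypothetical restore of table_foo's backup into a new table, table_bar.
# 'cf1' is a stand-in; families and VERSIONS must match the exported schema.
echo "create 'table_bar', {NAME => 'cf1', VERSIONS => 100000}" | \
    $HBASE_HOME/bin/hbase shell

$HADOOP_HOME/bin/hadoop jar $HBASE_HOME/hbase-0.20.3.jar import \
    table_bar \
    s3n://somebucket/table_foo/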
If I wanted to do a full restore, I would just loop through all the tables
with the above import process on an HBase cluster that didn't yet have those
tables, something like the sketch below.
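
Roughly, reusing the hypothetical TABLES list and helpers from the sketches
above:

for table in $TABLES; do
    # Recreate each table from its stored schema first (placeholder helper),
    # then import the export back in under the same name.
    create_table "$table" "$(get_schema_from_persistant_store "$table")"
    $HADOOP_HOME/bin/hadoop jar $HBASE_HOME/hbase-0.20.3.jar import \
        "$table" "s3n://somebucket/$table/"
done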
Would I pretty much be guaranteed to get a proper backup snapshotted at the
specified endtime of each run?

Could this be used to copy the data from one HBase cluster to another (in
particular, to go from a production HBase 0.20.3 to a fresh new 0.90.1)?
One normal backup/restore feature that is missing: there is no easy way to
restore to an arbitrary point in time, as opposed to the last backup. I
presume the worst case would be to restore everything and then delete the rows
with timestamps after the point in time one wanted?
Please let me know what I might be missing, or what the downsides would be to
such a way of doing backups.
Thanks!
Rob
__________________
Robert J Berger - CTO
Runa Inc.
520 San Antonio Rd Suite 210, Mountain View, CA 94040
+1 408-838-8896
http://blog.ibd.com
http://workatruna.com