I'm in the midst of trying to wrangle an HBase backup/restore to/from S3 or HDFS,
built around export/backup of one table at a time
using org.apache.hadoop.hbase.mapreduce.Export from HBASE-1684.

Just a reminder:

Usage: Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
In the pseudocode below:

persistant_store is some kind of non-HBase store in the Cloud that you can just
push stuff onto.
all_my_Hbase_tables_to_be_backedup is a list of table names.
create_table is a function that would properly create a new HBase table with the
given name, based on the schema passed in as an argument.
Can I assume that if I do the following (pseudocode) on HBase 0.20.3 or 0.90.x
I will get an initial full backup to S3:

starttime = beginning_of_time
endtime = NOW_Minus_60_seconds
versions = 100000 (the largest number of versions we keep; we do some weird
things with versions in some tables)
for table in all_my_Hbase_tables_to_be_backedup
do
  $HADOOP_HOME/bin/hadoop jar $HBASE_HOME/hbase-0.20.3.jar export \
    $table \
    s3n://somebucket/$table/ \
    $versions \
    $starttime \
    $endtime
  store_times_for_table_in_persistant_store( $table $starttime $endtime )
  store_schema_for_table_in_persistant_store( $table get_schema_from_HBase($table) )
done
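As an aside on how those helpers might look: HBase timestamps are epoch
milliseconds, so Export's <starttime>/<endtime> arguments are milliseconds too.
Here is a minimal sketch of the persistant_store pieces, assuming a made-up
s3n://somebucket/meta/ prefix and /tmp staging; the function names are just the
placeholders from the pseudocode above, not real tools:

endtime=$(( ( $(date +%s) - 60 ) * 1000 ))   # NOW_Minus_60_seconds, in epoch milliseconds
starttime=0                                  # beginning_of_time for the initial full backup
versions=100000

get_schema_from_HBase() {
  # $1 = table; the "describe" output from the HBase shell stands in for the schema
  echo "describe '$1'" | $HBASE_HOME/bin/hbase shell
}

store_times_for_table_in_persistant_store() {
  # $1 = table, $2 = starttime, $3 = endtime
  echo "$2 $3" > /tmp/$1.times
  # remove any previous copy first; the error on the very first run is harmless
  $HADOOP_HOME/bin/hadoop fs -rm s3n://somebucket/meta/$1.times
  $HADOOP_HOME/bin/hadoop fs -copyFromLocal /tmp/$1.times s3n://somebucket/meta/$1.times
}

store_schema_for_table_in_persistant_store() {
  # $1 = table, $2 = schema text (e.g. from get_schema_from_HBase)
  echo "$2" > /tmp/$1.schema
  $HADOOP_HOME/bin/hadoop fs -rm s3n://somebucket/meta/$1.schema
  $HADOOP_HOME/bin/hadoop fs -copyFromLocal /tmp/$1.schema s3n://somebucket/meta/$1.schema
}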
Then do incremental backups from that point on:

endtime = NOW_Minus_60_seconds
versions = 100000

for table in all_my_Hbase_tables_to_be_backedup
do
  starttime = get_last_endtime_from_persistant_store( $table )
  $HADOOP_HOME/bin/hadoop jar $HBASE_HOME/hbase-0.20.3.jar export \
    $table \
    s3n://somebucket/$table/ \
    $versions \
    $starttime \
    $endtime
  store_times_for_table_in_persistant_store( $table $starttime $endtime )
  store_schema_for_table_in_persistant_store( $table get_schema_from_HBase($table) )
done
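For the incremental chaining, get_last_endtime_from_persistant_store just reads
back whatever the previous run stored. One wrinkle the pseudocode glosses over:
the Export MapReduce job will refuse to run if its output directory already
exists, so each incremental run probably needs its own output subdirectory
(e.g. keyed by the run's endtime) rather than reusing s3n://somebucket/$table/.
A rough sketch, under the same assumptions as above:

get_last_endtime_from_persistant_store() {
  # $1 = table; the endtime is the second field of the stored "starttime endtime" pair
  $HADOOP_HOME/bin/hadoop fs -cat s3n://somebucket/meta/$1.times | awk '{print $2}'
}

# used inside the incremental loop, with a per-run output directory:
#   starttime=$(get_last_endtime_from_persistant_store $table)
#   ... export $table s3n://somebucket/$table/$endtime/ ...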
The Import usage:

Usage: Import <tablename> <inputdir>

If I wanted to restore a backed up table (table_foo) to a destination table
(table_bar) in the HBase cluster that is running this command, which may or may
not be the same HBase the table was originally backed up from, I can do the
following against the exports in S3:
create_table( table_bar, get_schema_from_persistant_store(table_foo) )
$HADOOP_HOME/bin/hadoop jar $HBASE_HOME/hbase-0.20.3.jar import \
  table_bar \
  s3n://somebucket/table_foo/
If I wanted to do a full restore, I would just loop through all the tables with
the above import process on an HBase cluster that didn't yet have those tables
(sketched below).
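A rough sketch of that restore path, under the same assumptions as above. The
create statement below is only a placeholder: the real column families and
VERSIONS settings would have to be rebuilt from the schema saved at backup time,
since Import only writes into a table that already exists:

restore_table() {
  # $1 = source (backed-up) table name, $2 = destination table name
  # 'cf' is a placeholder column family; derive the real create statement from
  # the schema stored by store_schema_for_table_in_persistant_store
  echo "create '$2', {NAME => 'cf', VERSIONS => 100000}" | $HBASE_HOME/bin/hbase shell
  $HADOOP_HOME/bin/hadoop jar $HBASE_HOME/hbase-0.20.3.jar import \
    $2 \
    s3n://somebucket/$1/
}

# single-table restore as above:
#   restore_table table_foo table_bar
# full restore onto a cluster that doesn't yet have the tables (assuming
# all_my_Hbase_tables_to_be_backedup is a space-separated list of names):
for table in $all_my_Hbase_tables_to_be_backedup
do
  restore_table $table $table
done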
Would I pretty much be guaranteed to get a proper backup snapshotted at the
specified endtime of each run?
Could this be used to copy the data from one HBase cluster to another (in
particular, to go from a production HBase 0.20.3 to a fresh new 0.90.1)?
One normal backup/restore feature that is missing is that there is no easy way
to restore to a point in time as opposed to the last backup. I presume the worst
case would be to restore everything and then delete rows with timestamps after
the point in time one wanted?
Please let me know what I might be missing or what the downsides would be to
such a way of doing backups.

Thanks!
Rob
__________________
Robert J Berger - CTO
Runa Inc.
520 San Antonio Rd Suite 210, Mountain View, CA 94040
+1 408-838-8896
http://blog.ibd.com
http://workatruna.com