@apple-corps
Last active August 29, 2015 14:05
Large discrepancy in HBase rootdir size after a CopyTable operation in HBase 0.92.1-cdh4.1.3
The guide I used as a reference:
http://blog.pivotal.io/pivotal/products/migrating-an-apache-hbase-table-between-different-clusters
Supposedly the original command used to create the table on cluster A:
create 'ADMd5', {NAME => 'a', BLOOMFILTER => 'ROW', VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0'}
How I created the target table on cluster B:
create 'ADMd5','a',{
BLOOMFILTER => 'ROW',
VERSIONS => '1',
COMPRESSION => 'SNAPPY',
MIN_VERSIONS => '0',
SPLITS =>['/++ASUZm4u7YsTcF/VtK6Q==',
'/zyuFR1VmhJyF4rbWsFnEg==',
'0sZYnBd83ul58d1O8I2JnA==',
'2+03N7IicZH3ltrqZUX6kQ==',
'4+/slRQtkBDU7Px6C9MAbg==',
'6+1dGCQ/IBrCsrNQXe/9xQ==',
'7+2pvtpHUQHWkZJoouR9wQ==',
'8+4n2deXhzmrpe//2Fo6Fg==',
'9+4SKW/BmNzpL68cXwKV1Q==',
'A+4ajStFkjEMf36cX5D9xg==',
'B+6Zm6Kccb3l6iM2L0epxQ==',
'C+6lKKDiOWl5qrRn72fNCw==',
'D+6dZMyn7m+NhJ7G07gqaw==',
'E+6BrimmrpAd92gZJ5hyMw==',
'G+5tisu4xWZMOJnDHeYBJg==',
'I+7fRy4dvqcM/L6dFRQk9g==',
'J+8ECMw1zeOyjfOg/ypXJA==',
'K+7tenLYn6a1aNLniL6tbg==']}
How the tables now appear in hbase shell:
table A:
describe 'ADMd5'
DESCRIPTION ENABLED
{NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VER true
SIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0370 seconds
table B:
hbase(main):003:0> describe 'ADMd5'
DESCRIPTION ENABLED
{NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VE true
RSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0280 seconds
The size of the containing folder in HDFS:
table A:
sudo -u hdfs hadoop fs -dus -h /a_d
dus: DEPRECATED: Please use 'du -s' instead.
227.4g /a_d
table B:
sudo -u hdfs hadoop fs -dus -h /a_d
dus: DEPRECATED: Please use 'du -s' instead.
501.0g /a_d
@apple-corps

I have found the error: I made a mistake with the compression and bloom filter settings. The new table doesn't have them enabled, and the old one does. However, I'm wondering how I can create a table with splits, a bloom filter, and compression all enabled. Shouldn't the following command return an error?

hbase(main):001:0> create 'ADMd5','a',{
hbase(main):002:1* BLOOMFILTER => 'ROW',
hbase(main):003:1* VERSIONS => '1',
hbase(main):004:1* COMPRESSION => 'SNAPPY',
hbase(main):005:1* MIN_VERSIONS => '0',
hbase(main):006:1* SPLITS =>['/++ASUZm4u7YsTcF/VtK6Q==',
hbase(main):007:2* '/zyuFR1VmhJyF4rbWsFnEg==',
hbase(main):008:2* '0sZYnBd83ul58d1O8I2JnA==',
hbase(main):009:2* '2+03N7IicZH3ltrqZUX6kQ==',
hbase(main):010:2* '4+/slRQtkBDU7Px6C9MAbg==',
hbase(main):011:2* '6+1dGCQ/IBrCsrNQXe/9xQ==',
hbase(main):012:2* '7+2pvtpHUQHWkZJoouR9wQ==',
hbase(main):013:2* '8+4n2deXhzmrpe//2Fo6Fg==',
hbase(main):014:2* '9+4SKW/BmNzpL68cXwKV1Q==',
hbase(main):015:2* 'A+4ajStFkjEMf36cX5D9xg==',
hbase(main):016:2* 'B+6Zm6Kccb3l6iM2L0epxQ==',
hbase(main):017:2* 'C+6lKKDiOWl5qrRn72fNCw==',
hbase(main):018:2* 'D+6dZMyn7m+NhJ7G07gqaw==',
hbase(main):019:2* 'E+6BrimmrpAd92gZJ5hyMw==',
hbase(main):020:2* 'G+5tisu4xWZMOJnDHeYBJg==',
hbase(main):021:2* 'I+7fRy4dvqcM/L6dFRQk9g==',
hbase(main):022:2* 'J+8ECMw1zeOyjfOg/ypXJA==',
hbase(main):023:2* 'K+7tenLYn6a1aNLniL6tbg==',]}
0 row(s) in 1.8010 seconds

hbase(main):024:0> describe 'ADMd5'
DESCRIPTION ENABLED
{NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOO true
MFILTER => 'NONE', REPLICATION_SCOPE => '0', VERS
IONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS
=> '0', TTL => '2147483647', BLOCKSIZE => '65536'
, IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0420 seconds
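
Since the create command reports success either way, one way to catch silently dropped settings is to compare what was requested against what describe reports. The plain-Ruby sketch below does that for a one-line descriptor string. The helper names `parse_attributes` and `mismatches` are hypothetical, and the wrapped two-column shell output shown above would have to be joined into a single line first.

```ruby
# Hypothetical helper: pull KEY => 'value' pairs out of a one-line descriptor string.
# If a key appears more than once (e.g. NAME), the last occurrence wins.
def parse_attributes(descriptor)
  descriptor.scan(/(\w+) => '([^']*)'/).to_h
end

# Attributes requested in the create command above.
EXPECTED = {
  'BLOOMFILTER' => 'ROW',
  'VERSIONS'    => '1',
  'COMPRESSION' => 'SNAPPY'
}.freeze

# Returns { key => [expected, actual] } for every attribute that differs.
def mismatches(descriptor, expected = EXPECTED)
  actual = parse_attributes(descriptor)
  expected.each_with_object({}) do |(key, want), diff|
    diff[key] = [want, actual[key]] unless actual[key] == want
  end
end

# The describe output from above, joined by hand into one line and trimmed.
described = "{NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'NONE', " \
            "REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', " \
            "MIN_VERSIONS => '0'}]}"

mismatches(described).each do |key, (want, got)|
  puts "#{key}: expected #{want}, got #{got}"
end
```

An empty result from `mismatches` would mean the table picked up every requested attribute.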

@apple-corps

The correct syntax is:

create 'ADMd5',{
NAME => 'a',
VERSIONS => '1',
COMPRESSION => 'SNAPPY',
BLOOMFILTER => 'ROW',
},
{
SPLITS => ['/++ASUZm4u7YsTcF/VtK6Q==',
'/zyuFR1VmhJyF4rbWsFnEg==',
'0sZYnBd83ul58d1O8I2JnA==',
'2+03N7IicZH3ltrqZUX6kQ==',
'4+/slRQtkBDU7Px6C9MAbg==',
'6+1dGCQ/IBrCsrNQXe/9xQ==',
'7+2pvtpHUQHWkZJoouR9wQ==',
'8+4n2deXhzmrpe//2Fo6Fg==',
'9+4SKW/BmNzpL68cXwKV1Q==',
'A+4ajStFkjEMf36cX5D9xg==',
'B+6Zm6Kccb3l6iM2L0epxQ==',
'C+6lKKDiOWl5qrRn72fNCw==',
'D+6dZMyn7m+NhJ7G07gqaw==',
'E+6BrimmrpAd92gZJ5hyMw==',
'G+5tisu4xWZMOJnDHeYBJg==',
'I+7fRy4dvqcM/L6dFRQk9g==',
'J+8ECMw1zeOyjfOg/ypXJA==',
'K+7tenLYn6a1aNLniL6tbg==',]
}
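
A possible reading of why the earlier command did not raise an error (an assumption on my part, not something confirmed from the HBase source): the bare 'a' creates a family with default settings, and a hash that contains SPLITS is treated as table-level options, so the family attributes mixed into it are ignored. The plain-Ruby sketch below is a simplified model of that reading. `classify_create_args` and `DEFAULT_FAMILY` are hypothetical names, the defaults are copied from the describe output, and the split keys are shortened placeholders.

```ruby
# Simplified, hypothetical model of how create arguments might be sorted into
# column families and table-level options. This is NOT HBase's actual code.
DEFAULT_FAMILY = {
  'BLOOMFILTER' => 'NONE',
  'VERSIONS'    => '3',
  'COMPRESSION' => 'NONE'
}.freeze

def classify_create_args(*args)
  families = []
  splits = nil
  args.each do |arg|
    case arg
    when String
      # A bare string names a family that gets default settings.
      families << DEFAULT_FAMILY.merge('NAME' => arg)
    when Hash
      if arg.key?('SPLITS')
        # Assumed behaviour: only SPLITS is read; sibling keys are dropped.
        splits = arg['SPLITS']
      else
        # A hash without SPLITS describes a column family.
        families << DEFAULT_FAMILY.merge(arg)
      end
    else
      raise ArgumentError, "unsupported argument: #{arg.inspect}"
    end
  end
  { families: families, splits: splits }
end

# Shape of the first attempt: family name, then one hash mixing everything.
first = classify_create_args(
  'a',
  { 'BLOOMFILTER' => 'ROW', 'COMPRESSION' => 'SNAPPY', 'SPLITS' => ['k1', 'k2'] }
)

# Shape of the corrected command: family hash and SPLITS hash kept separate.
fixed = classify_create_args(
  { 'NAME' => 'a', 'VERSIONS' => '1', 'COMPRESSION' => 'SNAPPY', 'BLOOMFILTER' => 'ROW' },
  { 'SPLITS' => ['k1', 'k2'] }
)

puts first[:families].inspect
puts fixed[:families].inspect
```

Under this model the first call yields a family 'a' with all defaults, and only the second carries SNAPPY and the ROW bloom filter. That would match the describe output above, but the model itself is unverified.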
