@apple-corps
Last active August 29, 2015 14:05
Large discrepancy in HBase rootdir size after a CopyTable operation in HBase 0.92.1-cdh4.1.3
The guide I used as a reference:
http://blog.pivotal.io/pivotal/products/migrating-an-apache-hbase-table-between-different-clusters
Supposedly the original command used to create the table on cluster A:
create 'ADMd5', {NAME => 'a', BLOOMFILTER => 'ROW', VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0'}
How I created the target table on cluster B:
create 'ADMd5','a',{
BLOOMFILTER => 'ROW',
VERSIONS => '1',
COMPRESSION => 'SNAPPY',
MIN_VERSIONS => '0',
SPLITS =>['/++ASUZm4u7YsTcF/VtK6Q==',
'/zyuFR1VmhJyF4rbWsFnEg==',
'0sZYnBd83ul58d1O8I2JnA==',
'2+03N7IicZH3ltrqZUX6kQ==',
'4+/slRQtkBDU7Px6C9MAbg==',
'6+1dGCQ/IBrCsrNQXe/9xQ==',
'7+2pvtpHUQHWkZJoouR9wQ==',
'8+4n2deXhzmrpe//2Fo6Fg==',
'9+4SKW/BmNzpL68cXwKV1Q==',
'A+4ajStFkjEMf36cX5D9xg==',
'B+6Zm6Kccb3l6iM2L0epxQ==',
'C+6lKKDiOWl5qrRn72fNCw==',
'D+6dZMyn7m+NhJ7G07gqaw==',
'E+6BrimmrpAd92gZJ5hyMw==',
'G+5tisu4xWZMOJnDHeYBJg==',
'I+7fRy4dvqcM/L6dFRQk9g==',
'J+8ECMw1zeOyjfOg/ypXJA==',
'K+7tenLYn6a1aNLniL6tbg==']}
How the tables now appear in hbase shell:
Cluster A:
describe 'ADMd5'
DESCRIPTION ENABLED
{NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} true
1 row(s) in 0.0370 seconds
Cluster B:
hbase(main):003:0> describe 'ADMd5'
DESCRIPTION ENABLED
{NAME => 'ADMd5', FAMILIES => [{NAME => 'a', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} true
1 row(s) in 0.0280 seconds
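To make the mismatch between the two descriptors easier to see, here is a minimal Ruby sketch that diffs the column family attributes copied from the two describe outputs above. The helper name family_diff is hypothetical and is not part of the HBase shell; the sketch only compares the values by hand and does not talk to either cluster.

```ruby
# Column family 'a' attributes as reported by `describe 'ADMd5'` on each
# cluster (values copied by hand from the shell output above).
table_a = {
  'BLOOMFILTER' => 'NONE', 'REPLICATION_SCOPE' => '0', 'VERSIONS' => '3',
  'COMPRESSION' => 'NONE', 'MIN_VERSIONS' => '0', 'TTL' => '2147483647',
  'BLOCKSIZE' => '65536', 'IN_MEMORY' => 'false', 'BLOCKCACHE' => 'true'
}
table_b = {
  'BLOOMFILTER' => 'ROW', 'REPLICATION_SCOPE' => '0', 'VERSIONS' => '1',
  'COMPRESSION' => 'SNAPPY', 'MIN_VERSIONS' => '0', 'TTL' => '2147483647',
  'BLOCKSIZE' => '65536', 'IN_MEMORY' => 'false', 'BLOCKCACHE' => 'true'
}

# Hypothetical helper: returns { attribute => [value_on_a, value_on_b] }
# for every attribute whose value differs between the two descriptors.
def family_diff(a, b)
  (a.keys | b.keys).each_with_object({}) do |key, diff|
    diff[key] = [a[key], b[key]] unless a[key] == b[key]
  end
end

differences = family_diff(table_a, table_b)
differences.each do |attr, (on_a, on_b)|
  puts "#{attr}: A=#{on_a} B=#{on_b}"
end
```

Only BLOOMFILTER, VERSIONS and COMPRESSION differ; every other attribute is identical on both clusters.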
The size of the containing folder in HDFS:
Cluster A:
sudo -u hdfs hadoop fs -dus -h /a_d
dus: DEPRECATED: Please use 'du -s' instead.
227.4g /a_d
Cluster B:
sudo -u hdfs hadoop fs -dus -h /a_d
dus: DEPRECATED: Please use 'du -s' instead.
501.0g /a_d
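As a quick worked calculation of how large the discrepancy is, the sketch below uses only the two sizes printed above. The variable names are illustrative. (As the deprecation notice says, the same figures can be obtained with `hadoop fs -du -s -h /a_d`.)

```ruby
# Sizes of /a_d as printed by `hadoop fs -dus -h` above (the "g" suffix).
size_a_g = 227.4
size_b_g = 501.0

# How many times larger the copy is, and by how much in absolute terms.
ratio   = (size_b_g / size_a_g).round(2)
extra_g = (size_b_g - size_a_g).round(1)

puts "Cluster B uses #{ratio}x the space of cluster A (#{extra_g}g more)"
```

The copy on cluster B takes roughly 2.2 times the space of the original on cluster A.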
@apple-corps

The correct syntax is:

create 'ADMd5',{
NAME => 'a',
VERSIONS => '1',
COMPRESSION => 'SNAPPY',
BLOOMFILTER => 'ROW',
},
{
SPLITS => ['/++ASUZm4u7YsTcF/VtK6Q==',
'/zyuFR1VmhJyF4rbWsFnEg==',
'0sZYnBd83ul58d1O8I2JnA==',
'2+03N7IicZH3ltrqZUX6kQ==',
'4+/slRQtkBDU7Px6C9MAbg==',
'6+1dGCQ/IBrCsrNQXe/9xQ==',
'7+2pvtpHUQHWkZJoouR9wQ==',
'8+4n2deXhzmrpe//2Fo6Fg==',
'9+4SKW/BmNzpL68cXwKV1Q==',
'A+4ajStFkjEMf36cX5D9xg==',
'B+6Zm6Kccb3l6iM2L0epxQ==',
'C+6lKKDiOWl5qrRn72fNCw==',
'D+6dZMyn7m+NhJ7G07gqaw==',
'E+6BrimmrpAd92gZJ5hyMw==',
'G+5tisu4xWZMOJnDHeYBJg==',
'I+7fRy4dvqcM/L6dFRQk9g==',
'J+8ECMw1zeOyjfOg/ypXJA==',
'K+7tenLYn6a1aNLniL6tbg==',]
}
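Before pasting a long SPLITS list into the shell, it can help to sanity-check it offline. The sketch below is plain Ruby, not an HBase API: it checks that the split keys above are in strictly ascending byte order and counts the regions they produce, since N split points yield N + 1 regions. It assumes the keys are plain ASCII, as they are here, so that Ruby's byte-wise String comparison matches HBase's row key ordering.

```ruby
# Split keys copied from the create statement above.
splits = [
  '/++ASUZm4u7YsTcF/VtK6Q==', '/zyuFR1VmhJyF4rbWsFnEg==',
  '0sZYnBd83ul58d1O8I2JnA==', '2+03N7IicZH3ltrqZUX6kQ==',
  '4+/slRQtkBDU7Px6C9MAbg==', '6+1dGCQ/IBrCsrNQXe/9xQ==',
  '7+2pvtpHUQHWkZJoouR9wQ==', '8+4n2deXhzmrpe//2Fo6Fg==',
  '9+4SKW/BmNzpL68cXwKV1Q==', 'A+4ajStFkjEMf36cX5D9xg==',
  'B+6Zm6Kccb3l6iM2L0epxQ==', 'C+6lKKDiOWl5qrRn72fNCw==',
  'D+6dZMyn7m+NhJ7G07gqaw==', 'E+6BrimmrpAd92gZJ5hyMw==',
  'G+5tisu4xWZMOJnDHeYBJg==', 'I+7fRy4dvqcM/L6dFRQk9g==',
  'J+8ECMw1zeOyjfOg/ypXJA==', 'K+7tenLYn6a1aNLniL6tbg=='
]

# Each key must sort strictly after its predecessor (no duplicates,
# no out-of-order entries).
ascending = splits.each_cons(2).all? { |lower, upper| (lower <=> upper) < 0 }

# N split points divide the key space into N + 1 regions.
region_count = splits.length + 1

puts "split keys ascending: #{ascending}"
puts "regions created:      #{region_count}"
```

For this list the keys are already in order, and the 18 split points pre-split the table into 19 regions.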
