count.diff
diff --git a/core/src/main/ruby/hbase.rb b/core/src/main/ruby/hbase.rb
index de9d006..4ba2a98 100644
--- a/core/src/main/ruby/hbase.rb
+++ b/core/src/main/ruby/hbase.rb
@@ -31,6 +31,8 @@ module HBaseConstants
MAXLENGTH = "MAXLENGTH"
CACHE_BLOCKS = "CACHE_BLOCKS"
REPLICATION_SCOPE = "REPLICATION_SCOPE"
+ INTERVAL = 'INTERVAL'
+ CACHE = 'CACHE'
# Load constants from hbase java API
def self.promote_constants(constants)
diff --git a/core/src/main/ruby/hbase/table.rb b/core/src/main/ruby/hbase/table.rb
index 7e1c808..51115bf 100644
--- a/core/src/main/ruby/hbase/table.rb
+++ b/core/src/main/ruby/hbase/table.rb
@@ -63,11 +63,11 @@ module Hbase
#----------------------------------------------------------------------------------------------
# Count rows in a table
- def count(interval = 1000)
+ def count(interval = 1000, caching_rows = 10)
# We can safely set scanner caching with the first key only filter
scan = Scan.new
scan.cache_blocks = false
- scan.caching = 10
+ scan.caching = caching_rows
scan.setFilter(FirstKeyOnlyFilter.new)
# Run the scanner
diff --git a/core/src/main/ruby/shell/commands/count.rb b/core/src/main/ruby/shell/commands/count.rb
index 4341776..f65b98c 100644
--- a/core/src/main/ruby/shell/commands/count.rb
+++ b/core/src/main/ruby/shell/commands/count.rb
@@ -6,17 +6,26 @@ module Shell
Count the number of rows in a table. This operation may take a LONG
time (Run '$HADOOP_HOME/bin/hadoop jar hbase.jar rowcount' to run a
counting mapreduce job). Current count is shown every 1000 rows by
- default. Count interval may be optionally specified. Examples:
+ default. Count interval may be optionally specified. Scan caching
+ is enabled on count scans by default. Default cache size is 10 rows.
+ If your rows are small in size, you may want to increase this
+ parameter. Examples:
hbase> count 't1'
- hbase> count 't1', 100000
+ hbase> count 't1', INTERVAL => 100000
+ hbase> count 't1', CACHE => 1000
EOF
end
- def command(table, interval = 1000)
+ def command(table, params = {})
+ params = {
+ 'INTERVAL' => 1000,
+ 'CACHE' => 10
+ }.merge(params)
+
now = Time.now
formatter.header
- count = table(table).count(interval) do |cnt, row|
+ count = table(table).count(params['INTERVAL'].to_i, params['CACHE'].to_i) do |cnt, row|
formatter.row([ "Current count: #{cnt}, row: #{row}" ])
end
formatter.footer(now, count)
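
The option handling in count.rb can be sketched in isolation as follows. This is a minimal illustration, not part of the patch: `count_options` and `DEFAULTS` are hypothetical names, and accepting a bare integer (the pre-patch `count 't1', 100000` form) is an assumption added here for backward compatibility.

```ruby
# Hypothetical helper, not part of the patch: normalizes the optional
# argument of `count` into an options hash with the patch's defaults.
DEFAULTS = { 'INTERVAL' => 1000, 'CACHE' => 10 }.freeze

def count_options(params = {})
  # Assumption: treat a non-Hash argument as the old positional interval.
  params = { 'INTERVAL' => params } unless params.is_a?(Hash)

  # User-supplied keys override the defaults; missing keys keep them.
  opts = DEFAULTS.merge(params)

  # Coerce to integers, as the patch does before calling table.count.
  { 'INTERVAL' => opts['INTERVAL'].to_i, 'CACHE' => opts['CACHE'].to_i }
end

p count_options({})                  # no options given
p count_options('CACHE' => 1000)     # override the scanner caching only
p count_options(100000)              # old positional interval form
```

Defaulting the argument to an empty hash is what lets a plain `count 't1'` work; without a default, Ruby raises an ArgumentError before the merge is reached.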
