@abajwa-hw
Last active December 1, 2020 18:09
HBase indexing to Solr in HDP 2.3

Background

The HBase Indexer provides the ability to stream events from HBase to Solr for near-real-time searching. It is included with HDPSearch as an additional service. The indexer works by acting as an HBase replication sink: as updates are written to HBase, the events are asynchronously replicated to the HBase Indexer processes, which in turn create Solr documents and push them to Solr.

Steps

  • Download and start the HDP 2.3 sandbox VM, which comes with Lucidworks HDP Search installed (under /opt/lucidworks-hdpsearch)

  • Point the HBase Indexer to ZooKeeper by configuring hbase-indexer-site.xml

vi /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase-indexer-site.xml

<?xml version="1.0"?>
<configuration>
   <property>
      <name>hbaseindexer.zookeeper.connectstring</name>
      <value>sandbox.hortonworks.com:2181</value>
   </property>
   <property>
      <name>hbase.zookeeper.quorum</name>
      <value>sandbox.hortonworks.com</value>
   </property>
</configuration>
  • In Ambari > HBase > Configs > Custom hbase-site add the below properties, but do not restart HBase just yet:
hbase.replication=true
replication.source.ratio=1.0
replication.source.nb.capacity=1000
replication.replicationsource.implementation=com.ngdata.sep.impl.SepReplicationSource
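For reference, on a cluster not managed by Ambari these settings would sit in hbase-site.xml as <property> elements. A minimal sketch of that mapping, assuming a hypothetical helper named props_to_xml (it is not part of HBase or Ambari) that reads key=value lines on stdin:

```shell
# props_to_xml: hypothetical helper, not an HBase or Ambari tool.
# Reads key=value lines on stdin and prints one <property> element per line.
props_to_xml() {
  while IFS='=' read -r key value; do
    # Skip blank lines
    [ -n "$key" ] || continue
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$key" "$value"
  done
}
# Usage:
#   printf 'hbase.replication=true\n' | props_to_xml
```

On the sandbox itself, Ambari writes these elements into /etc/hbase/conf/hbase-site.xml, so the helper is only illustrative.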
  • Copy Solr's HBase-related libs to $HBASE_HOME/lib
cp /opt/lucidworks-hdpsearch/hbase-indexer/lib/hbase-sep* /usr/hdp/current/hbase-master/lib/
  • Restart HBase

  • Copy hbase-site.xml to hbase-indexer's conf dir

cp /etc/hbase/conf/hbase-site.xml /opt/lucidworks-hdpsearch/hbase-indexer/conf/
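Before starting the indexer, it may be worth confirming that the copied file actually carries the replication settings added in Ambari. A rough sketch, assuming a hypothetical helper named check_sep_props and assuming each property name appears as <name>…</name> on a single line (the layout Ambari normally generates):

```shell
# check_sep_props: hypothetical helper; reports which of the replication
# properties from the Ambari step are missing from an hbase-site.xml file.
# Returns 0 when all four are present, 1 otherwise.
check_sep_props() {
  file=$1
  missing=0
  for prop in hbase.replication \
              replication.source.ratio \
              replication.source.nb.capacity \
              replication.replicationsource.implementation; do
    # -F matches the string literally, so the dots are not regex wildcards
    if ! grep -qF "<name>$prop</name>" "$file"; then
      echo "missing: $prop"
      missing=1
    fi
  done
  return "$missing"
}
# Usage (path taken from the copy step above):
#   check_sep_props /opt/lucidworks-hdpsearch/hbase-indexer/conf/hbase-site.xml
```

If anything is reported missing, the likely cause is that HBase was not restarted after the Ambari change, so the file on disk was never regenerated.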
  • Start Solr in cloud mode (pointing to ZK)
cd /opt/lucidworks-hdpsearch/solr
bin/solr start -c -z sandbox.hortonworks.com:2181
  • Create collection
  bin/solr create -c hbaseCollection \
     -d data_driven_schema_configs \
     -n myCollConfigs \
     -s 2 \
     -rf 2 
  • Start the HBase Indexer
cd /opt/lucidworks-hdpsearch/hbase-indexer/bin/
./hbase-indexer server
  • In a second terminal, create the table to be indexed in HBase. Open the hbase shell and run the command below to create a table named "indexdemo-user" with a single column family named "info". Note that the REPLICATION_SCOPE of the column family must be set to 1:
create 'indexdemo-user', { NAME => 'info', REPLICATION_SCOPE => '1' }
!quit
  • Now we'll write the definition for an indexer that will index the indexdemo-user table as its contents are updated.
vi /opt/lucidworks-hdpsearch/hbase-indexer/indexdemo-indexer.xml

<?xml version="1.0"?>
<indexer table="indexdemo-user">
  <field name="firstname_s" value="info:firstname"/>
  <field name="lastname_s" value="info:lastname"/>
  <field name="age_i" value="info:age" type="int"/>
</indexer>
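  • The above file defines three pieces of information that will be used for indexing: which HBase columns to read (info:firstname, info:lastname, info:age), how to interpret their values (age as an int, the others as the default string type), and the Solr field each one is stored in (firstname_s, lastname_s, age_i).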
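To make the mapping concrete, the same definition can be generated from a short list of "solr_field hbase_column [type]" lines. This is only a sketch: gen_indexer_xml is a hypothetical helper, not part of the hbase-indexer distribution, and it does no XML escaping.

```shell
# gen_indexer_xml: hypothetical helper, not shipped with hbase-indexer.
# $1 is the HBase table name; stdin carries one mapping per line:
#   <solr_field> <family:qualifier> [type]
gen_indexer_xml() {
  table=$1
  printf '<?xml version="1.0"?>\n<indexer table="%s">\n' "$table"
  while read -r name value type; do
    # Skip blank lines
    [ -n "$name" ] || continue
    if [ -n "$type" ]; then
      printf '  <field name="%s" value="%s" type="%s"/>\n' "$name" "$value" "$type"
    else
      # No type given: omit the attribute and let the indexer use its default
      printf '  <field name="%s" value="%s"/>\n' "$name" "$value"
    fi
  done
  printf '</indexer>\n'
}
# Usage:
#   gen_indexer_xml indexdemo-user <<'EOF'
#   firstname_s info:firstname
#   lastname_s info:lastname
#   age_i info:age int
#   EOF
```

Fed the three mappings shown in the usage comment, the helper prints the same XML as indexdemo-indexer.xml above.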

  • Next, create an indexer based on the created indexer xml file.

/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer add-indexer \
  -n hbaseindexer \
  -c /opt/lucidworks-hdpsearch/hbase-indexer/indexdemo-indexer.xml \
  -cp solr.zk=sandbox.hortonworks.com:2181 \
  -cp solr.collection=hbaseCollection
  • Check that the indexer was created
/opt/lucidworks-hdpsearch/hbase-indexer/bin/hbase-indexer list-indexers
  • Check that the HBase Indexer server output (in the first terminal) includes the line below
INFO supervisor.IndexerSupervisor: Started indexer for hbaseindexer
  • Log back in to the hbase shell and try adding some data to the indexdemo-user table
hbase> put 'indexdemo-user', 'row1', 'info:firstname', 'John'
hbase> put 'indexdemo-user', 'row1', 'info:lastname', 'Smith'
  • Run commit
curl "http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true"

  • Now try updating and then deleting the row you've just added: run the put and deleteall commands in the hbase shell, and run a commit from the terminal after each change
hbase> put 'indexdemo-user', 'row1', 'info:firstname', 'Jim'
curl "http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true"
hbase> deleteall 'indexdemo-user', 'row1'
curl "http://sandbox.hortonworks.com:8983/solr/hbaseCollection/update?commit=true"
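
To see the effect of each commit above, the collection can be queried through Solr's /select handler. A small sketch, assuming a hypothetical helper named solr_select_url that only builds the URL (the query string is passed through as-is, so it must already be URL-safe):

```shell
# solr_select_url: hypothetical helper that builds a Solr /select URL.
# $1 = Solr base URL, $2 = collection name, $3 = query (already URL-safe)
solr_select_url() {
  printf '%s/%s/select?q=%s&wt=json&indent=true\n' "$1" "$2" "$3"
}
# Usage (host and collection as used in the steps above):
#   curl "$(solr_select_url http://sandbox.hortonworks.com:8983/solr hbaseCollection '*:*')"
```

If replication and the indexer are working, the response should show a document for row1 after the puts, the changed firstname_s after the update, and no document after the deleteall.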