@rajkrrsingh
Last active November 4, 2021 08:24
Jump-start guide for Hive Replication V2. To learn more about Hive replication, refer to https://cwiki.apache.org/confluence/display/Hive/HiveReplicationv2Development

Prerequisite Hive settings:

set hive.server2.logging.operation.level=execution;
set hive.metastore.transactional.event.listeners=org.apache.hive.hcatalog.listener.DbNotificationListener;
set hive.metastore.dml.events=true;
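Optional: if you would rather persist these properties than set them per session, they can be written as ordinary property entries in hive-site.xml. The minimal Python sketch below only renders that XML fragment from the three settings above. The helper name to_hive_site_xml is hypothetical. Which service (HiveServer2 or the metastore) has to pick up each property depends on your deployment, so check that before relying on it.

```python
import xml.etree.ElementTree as ET

# The three prerequisite settings listed above.
REPL_SETTINGS = {
    "hive.server2.logging.operation.level": "execution",
    "hive.metastore.transactional.event.listeners":
        "org.apache.hive.hcatalog.listener.DbNotificationListener",
    "hive.metastore.dml.events": "true",
}


def to_hive_site_xml(settings):
    """Render a dict of Hive properties as a hive-site.xml <configuration> fragment.

    Hypothetical helper for illustration: it only formats text and does not
    validate property names or values.
    """
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root, space="  ")  # pretty-print (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_hive_site_xml(REPL_SETTINGS))
```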

Set up the database and table

create database sampledb with dbproperties('repl.source.for'='1,2,3');
create table sampledb.sampletble (id int);
insert into sampledb.sampletble values (1), (2), (3);

Bootstrap dump

repl dump sampledb;
-- or, to write the dump under a specific root directory:
repl dump sampledb with ('hive.repl.rootdir'='/tmp/hive/repl');

+----------------------------------------------------+---------------+
|                      dump_dir                      | last_repl_id  |
+----------------------------------------------------+---------------+
| /tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0 | 2925          |
+----------------------------------------------------+---------------+
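REPL DUMP returns its result as a one-row table: dump_dir is what REPL LOAD reads from, and last_repl_id is where the next incremental dump starts. If you script the cycle around Beeline, the two values can be pulled out of the table text. The Python sketch below assumes exactly the two-column layout shown above; the function name parse_repl_dump_result is hypothetical.

```python
def parse_repl_dump_result(output):
    """Extract (dump_dir, last_repl_id) from Beeline's table output of REPL DUMP.

    Assumes the layout shown above: a header row with 'dump_dir' and
    'last_repl_id', followed by a single data row.
    """
    rows = []
    for line in output.splitlines():
        line = line.strip()
        # Skip +----+ border lines and blank lines; keep only | ... | rows.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(cells)
    if len(rows) < 2 or rows[0] != ["dump_dir", "last_repl_id"]:
        raise ValueError("unexpected REPL DUMP output layout")
    dump_dir, last_repl_id = rows[1]
    return dump_dir, int(last_repl_id)


if __name__ == "__main__":
    bootstrap_output = """
+----------------------------------------------------+---------------+
|                      dump_dir                      | last_repl_id  |
+----------------------------------------------------+---------------+
| /tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0 | 2925          |
+----------------------------------------------------+---------------+
"""
    print(parse_repl_dump_result(bootstrap_output))
```

Parsing the table rather than the INFO log keeps the helper independent of the log level, at the cost of depending on Beeline's default table output format.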

Bootstrap load

-- Run on the DR cluster, or on the same cluster using a different database name

repl load sampledb_replica from '/tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0';

INFO  : Starting task [Stage-0:REPL_BOOTSTRAP_LOAD] in serial mode
INFO  : REPL::START: {"dbName":"sampledb_replica","dumpDir":"hdfs://hdp31a.hdp.local:8020/tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0","loadType":"BOOTSTRAP","numTables":1,"numFunctions":0,"loadStartTime":1563253421}
INFO  : Root Tasks / Total Tasks : 1 / 9 
INFO  : completed load task run : 1
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Starting task [Stage-1:DDL] in serial mode
INFO  : Starting task [Stage-2:DDL] in serial mode
INFO  : Starting task [Stage-3:REPL_TXN] in serial mode
INFO  : Replicated WriteId state for DbName: sampledb_replica TableName: sampletble ValidWriteIdList: sampledb.sampletble:1:9223372036854775807::
INFO  : Starting task [Stage-4:COPY] in serial mode
INFO  : Copying data from hdfs://hdp31a.hdp.local:8020/tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0/sampledb/sampletble/data to hdfs://hdp31a.hdp.local:8020/warehouse/tablespace/managed/hive/sampledb_replica.db/sampletble/.hive-staging_hive_2019-07-16_05-03-41_924_3159588847098506185-19/-ext-10026
INFO  : Starting task [Stage-5:MOVE] in serial mode
INFO  : Moving data to directory hdfs://hdp31a.hdp.local:8020/warehouse/tablespace/managed/hive/sampledb_replica.db/sampletble from hdfs://hdp31a.hdp.local:8020/warehouse/tablespace/managed/hive/sampledb_replica.db/sampletble/.hive-staging_hive_2019-07-16_05-03-41_924_3159588847098506185-19/-ext-10026
INFO  : Starting task [Stage-6:DDL] in serial mode
INFO  : Starting task [Stage-7:REPL_STATE_LOG] in serial mode
INFO  : REPL::TABLE_LOAD: {"dbName":"sampledb_replica","tableName":"sampletble","tableType":"MANAGED_TABLE","tablesLoadProgress":"1/1","loadTime":1563253422}
INFO  : Starting task [Stage-8:REPL_STATE_LOG] in serial mode
INFO  : REPL::END: {"dbName":"sampledb_replica","loadType":"BOOTSTRAP","numTables":1,"numFunctions":0,"loadEndTime":1563253422,"dumpDir":"hdfs://hdp31a.hdp.local:8020/tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0","lastReplId":"2925"}
INFO  : Starting task [Stage-9:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20190716050341_f370af6d-a294-418d-83e0-115318a0aa0a); Time taken: 0.479 seconds
INFO  : OK
No rows affected (0.705 seconds)
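The REPL:: lines in the load log carry JSON payloads, and the REPL::END record reports lastReplId. A quick sanity check is that it matches the dump's last_repl_id (2925 in both places above). The Python sketch below extracts those records with a regular expression modeled on this run's output. The pattern and the function names parse_repl_log and last_repl_id_from_log are assumptions, and the log format may differ across Hive versions.

```python
import json
import re

# Matches log lines such as:
#   INFO  : REPL::END: {"dbName":"sampledb_replica", ..., "lastReplId":"2925"}
REPL_RECORD = re.compile(r"REPL::(?P<kind>[A-Z_]+): (?P<payload>\{.*\})\s*$")


def parse_repl_log(log_text):
    """Return (kind, payload) pairs for every REPL:: record in a log.

    kind is e.g. 'START', 'TABLE_LOAD', 'EVENT_DUMP', 'EVENT_LOAD' or 'END';
    payload is the JSON object that follows it, decoded into a dict.
    """
    records = []
    for line in log_text.splitlines():
        match = REPL_RECORD.search(line)
        if match:
            records.append((match.group("kind"), json.loads(match.group("payload"))))
    return records


def last_repl_id_from_log(log_text):
    """Return lastReplId from the REPL::END record as an int, or None if absent."""
    for kind, payload in parse_repl_log(log_text):
        if kind == "END":
            return int(payload["lastReplId"])
    return None


if __name__ == "__main__":
    sample = (
        'INFO  : REPL::START: {"dbName":"sampledb_replica","loadType":"BOOTSTRAP"}\n'
        'INFO  : REPL::END: {"dbName":"sampledb_replica","loadType":"BOOTSTRAP","lastReplId":"2925"}\n'
        "INFO  : OK\n"
    )
    dump_last_repl_id = 2925  # last_repl_id reported by the bootstrap REPL DUMP
    print("load reached dump point:", last_repl_id_from_log(sample) == dump_last_repl_id)
```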

Make a change on the source database for incremental replication to pick up

insert into sampledb.sampletble values (5), (6), (7);

Incremental dump, starting from the bootstrap dump's last_repl_id (2925)

repl dump sampledb from 2925 with ('hive.repl.rootdir'='/tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0');

INFO  : Starting task [Stage-0:REPL_DUMP] in serial mode
INFO  : REPL::START: {"dbName":"sampledb","dumpType":"INCREMENTAL","estimatedNumEvents":18,"dumpStartTime":1563253877}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2926","eventType":"EVENT_OPEN_TXN","eventsDumpProgress":"1/18","dumpTime":1563253877}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2927","eventType":"EVENT_ALTER_DATABASE","eventsDumpProgress":"2/18","dumpTime":1563253877}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2928","eventType":"EVENT_ALTER_DATABASE","eventsDumpProgress":"3/18","dumpTime":1563253877}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2929","eventType":"EVENT_COMMIT_TXN","eventsDumpProgress":"4/18","dumpTime":1563253885}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2930","eventType":"EVENT_OPEN_TXN","eventsDumpProgress":"5/18","dumpTime":1563253885}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2931","eventType":"EVENT_ABORT_TXN","eventsDumpProgress":"6/18","dumpTime":1563253885}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2932","eventType":"EVENT_OPEN_TXN","eventsDumpProgress":"7/18","dumpTime":1563253885}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2933","eventType":"EVENT_ABORT_TXN","eventsDumpProgress":"8/18","dumpTime":1563253885}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2934","eventType":"EVENT_OPEN_TXN","eventsDumpProgress":"9/18","dumpTime":1563253885}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2935","eventType":"EVENT_ABORT_TXN","eventsDumpProgress":"10/18","dumpTime":1563253885}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2936","eventType":"EVENT_OPEN_TXN","eventsDumpProgress":"11/18","dumpTime":1563253885}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2942","eventType":"EVENT_COMMIT_TXN","eventsDumpProgress":"12/18","dumpTime":1563253885}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2943","eventType":"EVENT_OPEN_TXN","eventsDumpProgress":"13/18","dumpTime":1563253885}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2944","eventType":"EVENT_ALLOC_WRITE_ID","eventsDumpProgress":"14/18","dumpTime":1563253886}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2945","eventType":"EVENT_ALTER_TABLE","eventsDumpProgress":"15/18","dumpTime":1563253886}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2946","eventType":"EVENT_ALTER_TABLE","eventsDumpProgress":"16/18","dumpTime":1563253886}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2947","eventType":"EVENT_COMMIT_TXN","eventsDumpProgress":"17/18","dumpTime":1563253886}
INFO  : REPL::EVENT_DUMP: {"dbName":"sampledb","eventId":"2948","eventType":"EVENT_OPEN_TXN","eventsDumpProgress":"18/18","dumpTime":1563253886}
INFO  : REPL::END: {"dbName":"sampledb","dumpType":"INCREMENTAL","actualNumEvents":18,"dumpEndTime":1563253886,"dumpDir":"/tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0/7bc2acc3-d788-4028-b188-49e92d124436","lastReplId":"2948"}
INFO  : Completed executing command(queryId=hive_20190716051116_14527a00-293f-494c-b8e8-f3faee9ea37d); Time taken: 9.451 seconds
INFO  : OK
+----------------------------------------------------+---------------+
|                      dump_dir                      | last_repl_id  |
+----------------------------------------------------+---------------+
| /tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0/7bc2acc3-d788-4028-b188-49e92d124436 | 2948          |
+----------------------------------------------------+---------------+
1 row selected (17.111 seconds)
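Each cycle feeds the previous dump's last_repl_id into the next dump's FROM clause: the bootstrap reported 2925, the dump from 2925 wrote events 2926 through 2948 and reported 2948, so the next incremental dump would use from 2948. The Python sketch below builds the statements in the same form used in this guide. The function names are hypothetical, and paths and database names are inserted without any quoting or escaping, so it assumes trusted inputs.

```python
def repl_dump_sql(db, from_id=None, root_dir=None):
    """Build a REPL DUMP statement in the form used in this guide.

    from_id=None gives a bootstrap dump; otherwise an incremental dump of the
    events after from_id (the previous dump's last_repl_id).
    """
    sql = f"repl dump {db}"
    if from_id is not None:
        sql += f" from {int(from_id)}"
    if root_dir is not None:
        sql += f" with ('hive.repl.rootdir'='{root_dir}')"
    return sql + ";"


def repl_load_sql(target_db, dump_dir):
    """Build the REPL LOAD statement that applies a dump directory to target_db."""
    return f"repl load {target_db} from '{dump_dir}';"


if __name__ == "__main__":
    last_repl_id = 2948  # from the incremental dump result above
    print(repl_dump_sql("sampledb", from_id=last_repl_id))
    print(repl_load_sql(
        "sampledb_replica",
        "/tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0/7bc2acc3-d788-4028-b188-49e92d124436",
    ))
```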

Incremental load

repl load sampledb_replica from '/tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0/7bc2acc3-d788-4028-b188-49e92d124436';

INFO  : Compiling command(queryId=hive_20190716051521_57336e5c-9250-4f2d-b269-bfd9d6993fae): repl load sampledb_replica from '/tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0/7bc2acc3-d788-4028-b188-49e92d124436'
INFO  : REPL::START: {"dbName":"sampledb_replica","dumpDir":"hdfs://hdp31a.hdp.local:8020/tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0/7bc2acc3-d788-4028-b188-49e92d124436","loadType":"INCREMENTAL","numEvents":18,"loadStartTime":1563254121}
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20190716051521_57336e5c-9250-4f2d-b269-bfd9d6993fae); Time taken: 0.266 seconds
INFO  : Executing command(queryId=hive_20190716051521_57336e5c-9250-4f2d-b269-bfd9d6993fae): repl load sampledb_replica from '/tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0/7bc2acc3-d788-4028-b188-49e92d124436'
INFO  : Starting task [Stage-0:REPL_INCREMENTAL_LOAD] in serial mode
INFO  : Added alloc write id task : Stage-47
INFO  : Iteration 1 done with num task : 71, lastReplayedEvent : 2948
INFO  : Starting task [Stage-0:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-1:REPL_TXN] in serial mode
INFO  : Replayed OpenTxn Event for policy sampledb_replica.* with srcTxn [1148] and target txn id [1157]
INFO  : Starting task [Stage-2:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-3:DDL] in serial mode
INFO  : Starting task [Stage-4:REPL_STATE_LOG] in serial mode
INFO  : REPL::EVENT_LOAD: {"dbName":"sampledb_replica","eventId":"2926","eventType":"EVENT_OPEN_TXN","eventsLoadProgress":"1/18","loadTime":1563254122}
INFO  : Starting task [Stage-5:DDL] in serial mode
INFO  : Starting task [Stage-6:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-7:DDL] in serial mode
INFO  : Starting task [Stage-8:REPL_STATE_LOG] in serial mode
INFO  : REPL::EVENT_LOAD: {"dbName":"sampledb_replica","eventId":"2927","eventType":"EVENT_ALTER_DATABASE","eventsLoadProgress":"2/18","loadTime":1563254122}
INFO  : Starting task [Stage-9:DDL] in serial mode
INFO  : Starting task [Stage-10:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-11:DDL] in serial mode
INFO  : Starting task [Stage-12:REPL_STATE_LOG] in serial mode
INFO  : REPL::EVENT_LOAD: {"dbName":"sampledb_replica","eventId":"2928","eventType":"EVENT_ALTER_DATABASE","eventsLoadProgress":"3/18","loadTime":1563254122}
INFO  : Starting task [Stage-14:REPL_TXN] in serial mode
INFO  : Replayed OpenTxn Event for policy sampledb_replica.* with srcTxn [1149] and target txn id [1158]
INFO  : Starting task [Stage-15:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-16:DDL] in serial mode
INFO  : Starting task [Stage-17:REPL_STATE_LOG] in serial mode
INFO  : REPL::EVENT_LOAD: {"dbName":"sampledb_replica","eventId":"2930","eventType":"EVENT_OPEN_TXN","eventsLoadProgress":"4/18","loadTime":1563254122}
INFO  : Starting task [Stage-18:REPL_TXN] in serial mode
INFO  : Replayed AbortTxn Event for policy sampledb_replica.* with srcTxn 1149
INFO  : Starting task [Stage-19:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-20:DDL] in serial mode
INFO  : Starting task [Stage-21:REPL_STATE_LOG] in serial mode
INFO  : REPL::EVENT_LOAD: {"dbName":"sampledb_replica","eventId":"2931","eventType":"EVENT_ABORT_TXN","eventsLoadProgress":"5/18","loadTime":1563254122}
INFO  : Starting task [Stage-22:REPL_TXN] in serial mode
INFO  : Replayed OpenTxn Event for policy sampledb_replica.* with srcTxn [1150] and target txn id [1159]
INFO  : Starting task [Stage-23:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-24:DDL] in serial mode
INFO  : Starting task [Stage-25:REPL_STATE_LOG] in serial mode
INFO  : REPL::EVENT_LOAD: {"dbName":"sampledb_replica","eventId":"2932","eventType":"EVENT_OPEN_TXN","eventsLoadProgress":"6/18","loadTime":1563254122}
INFO  : Starting task [Stage-26:REPL_TXN] in serial mode
INFO  : Replayed AbortTxn Event for policy sampledb_replica.* with srcTxn 1150
INFO  : Starting task [Stage-27:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-28:DDL] in serial mode
INFO  : Starting task [Stage-29:REPL_STATE_LOG] in serial mode
INFO  : REPL::EVENT_LOAD: {"dbName":"sampledb_replica","eventId":"2933","eventType":"EVENT_ABORT_TXN","eventsLoadProgress":"7/18","loadTime":1563254122}
INFO  : Starting task [Stage-30:REPL_TXN] in serial mode
INFO  : Replayed OpenTxn Event for policy sampledb_replica.* with srcTxn [1151] and target txn id [1160]
INFO  : Starting task [Stage-31:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-32:DDL] in serial mode
INFO  : Starting task [Stage-33:REPL_STATE_LOG] in serial mode
INFO  : REPL::EVENT_LOAD: {"dbName":"sampledb_replica","eventId":"2934","eventType":"EVENT_OPEN_TXN","eventsLoadProgress":"8/18","loadTime":1563254122}
INFO  : Starting task [Stage-34:REPL_TXN] in serial mode
INFO  : Replayed AbortTxn Event for policy sampledb_replica.* with srcTxn 1151
INFO  : Starting task [Stage-35:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-36:DDL] in serial mode
INFO  : Starting task [Stage-37:REPL_STATE_LOG] in serial mode
INFO  : REPL::EVENT_LOAD: {"dbName":"sampledb_replica","eventId":"2935","eventType":"EVENT_ABORT_TXN","eventsLoadProgress":"9/18","loadTime":1563254122}
INFO  : Starting task [Stage-38:REPL_TXN] in serial mode
INFO  : Replayed OpenTxn Event for policy sampledb_replica.* with srcTxn [1152] and target txn id [1161]
INFO  : Starting task [Stage-39:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-40:DDL] in serial mode
INFO  : Starting task [Stage-41:REPL_STATE_LOG] in serial mode
INFO  : REPL::EVENT_LOAD: {"dbName":"sampledb_replica","eventId":"2936","eventType":"EVENT_OPEN_TXN","eventsLoadProgress":"10/18","loadTime":1563254122}
INFO  : Starting task [Stage-43:REPL_TXN] in serial mode
INFO  : Replayed OpenTxn Event for policy sampledb_replica.* with srcTxn [1153] and target txn id [1162]
INFO  : Starting task [Stage-44:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-45:DDL] in serial mode
INFO  : Starting task [Stage-46:REPL_STATE_LOG] in serial mode
INFO  : REPL::EVENT_LOAD: {"dbName":"sampledb_replica","eventId":"2943","eventType":"EVENT_OPEN_TXN","eventsLoadProgress":"11/18","loadTime":1563254123}
INFO  : Starting task [Stage-47:REPL_TXN] in serial mode
INFO  : Replayed alloc write Id Event for repl policy: sampledb_replica.* db Name : sampledb_replica txnToWriteIdList: [TxnToWriteId(txnId:1153, writeId:2)] table name: sampletble
INFO  : Starting task [Stage-48:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-49:DDL] in serial mode
INFO  : Starting task [Stage-50:DDL] in serial mode
INFO  : Starting task [Stage-51:REPL_STATE_LOG] in serial mode
INFO  : REPL::EVENT_LOAD: {"dbName":"sampledb_replica","eventId":"2944","eventType":"EVENT_ALLOC_WRITE_ID","eventsLoadProgress":"12/18","loadTime":1563254123}
INFO  : Starting task [Stage-52:DDL] in serial mode
INFO  : Starting task [Stage-53:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-54:DDL] in serial mode
INFO  : Starting task [Stage-55:DDL] in serial mode
INFO  : Starting task [Stage-56:REPL_STATE_LOG] in serial mode
INFO  : REPL::EVENT_LOAD: {"dbName":"sampledb_replica","eventId":"2945","eventType":"EVENT_ALTER_TABLE","eventsLoadProgress":"13/18","loadTime":1563254123}
INFO  : Starting task [Stage-57:DDL] in serial mode
INFO  : Starting task [Stage-58:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-59:DDL] in serial mode
INFO  : Starting task [Stage-60:DDL] in serial mode
INFO  : Starting task [Stage-61:REPL_STATE_LOG] in serial mode
INFO  : REPL::EVENT_LOAD: {"dbName":"sampledb_replica","eventId":"2946","eventType":"EVENT_ALTER_TABLE","eventsLoadProgress":"14/18","loadTime":1563254123}
INFO  : Starting task [Stage-62:COPY] in serial mode
INFO  : Copying data from hdfs://hdp31a.hdp.local:8020/tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0/7bc2acc3-d788-4028-b188-49e92d124436/2947/sampledb.sampletble/data to hdfs://hdp31a.hdp.local:8020/warehouse/tablespace/managed/hive/sampledb_replica.db/sampletble/.hive-staging_hive_2019-07-16_05-15-21_755_7981027464462344763-23/-ext-10000
INFO  : Starting task [Stage-63:MOVE] in serial mode
INFO  : Moving data to directory hdfs://hdp31a.hdp.local:8020/warehouse/tablespace/managed/hive/sampledb_replica.db/sampletble from hdfs://hdp31a.hdp.local:8020/warehouse/tablespace/managed/hive/sampledb_replica.db/sampletble/.hive-staging_hive_2019-07-16_05-15-21_755_7981027464462344763-23/-ext-10000
INFO  : Starting task [Stage-65:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-64:REPL_TXN] in serial mode
INFO  : Replayed CommitTxn Event for replPolicy: sampledb_replica.* with srcTxn: 1153WriteEventInfos: [WriteEventInfo(writeId:2, database:sampledb_replica, table:sampletble, files:hdfs://hdp31a.hdp.local:8020/warehouse/tablespace/managed/hive/sampledb.db/sampletble/delta_0000002_0000002_0000/bucket_00000###delta_0000002_0000002_0000)]
INFO  : Starting task [Stage-66:DDL] in serial mode
INFO  : Starting task [Stage-67:DDL] in serial mode
INFO  : Starting task [Stage-68:DDL] in serial mode
INFO  : Starting task [Stage-69:REPL_STATE_LOG] in serial mode
INFO  : REPL::EVENT_LOAD: {"dbName":"sampledb_replica","eventId":"2947","eventType":"EVENT_COMMIT_TXN","eventsLoadProgress":"15/18","loadTime":1563254123}
INFO  : Starting task [Stage-70:REPL_TXN] in serial mode
INFO  : Replayed OpenTxn Event for policy sampledb_replica.* with srcTxn [1154] and target txn id [1163]
INFO  : Starting task [Stage-71:DEPENDENCY_COLLECTION] in serial mode
INFO  : Starting task [Stage-72:DDL] in serial mode
INFO  : Starting task [Stage-73:REPL_STATE_LOG] in serial mode
INFO  : REPL::EVENT_LOAD: {"dbName":"sampledb_replica","eventId":"2948","eventType":"EVENT_OPEN_TXN","eventsLoadProgress":"16/18","loadTime":1563254123}
INFO  : Starting task [Stage-74:REPL_STATE_LOG] in serial mode
INFO  : REPL::END: {"dbName":"sampledb_replica","loadType":"INCREMENTAL","numEvents":18,"loadEndTime":1563254123,"dumpDir":"hdfs://hdp31a.hdp.local:8020/tmp/hive/repl/38896729-67d5-41b2-90dc-46eeed4c5dd0/7bc2acc3-d788-4028-b188-49e92d124436","lastReplId":"2948"}
INFO  : Completed executing command(queryId=hive_20190716051521_57336e5c-9250-4f2d-b269-bfd9d6993fae); Time taken: 1.888 seconds
INFO  : OK
No rows affected (2.871 seconds)
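The incremental load replays each source transaction as a new transaction on the target (for example srcTxn 1148 became target txn id 1157), and then replays its abort or commit. The Python sketch below collects that mapping from the "Replayed ... Event" lines. The regular expressions are modeled on this run's output and the helper name txn_outcomes is hypothetical; message wording may differ in other Hive versions. A status of "opened" only means no abort or commit line was printed for that transaction in the log.

```python
import re

# Modeled on lines printed in the incremental load above.
OPEN_TXN = re.compile(r"Replayed OpenTxn Event .*srcTxn \[(\d+)\] and target txn id \[(\d+)\]")
ABORT_TXN = re.compile(r"Replayed AbortTxn Event .*srcTxn (\d+)")
COMMIT_TXN = re.compile(r"Replayed CommitTxn Event .*srcTxn: (\d+)")


def txn_outcomes(log_text):
    """Map source txn id -> (target txn id or None, status) from a load log.

    status is 'aborted' or 'committed' if such a replay line was seen,
    otherwise 'opened'.
    """
    result = {}
    for line in log_text.splitlines():
        if m := OPEN_TXN.search(line):
            result[int(m.group(1))] = [int(m.group(2)), "opened"]
        elif m := ABORT_TXN.search(line):
            result.setdefault(int(m.group(1)), [None, "opened"])[1] = "aborted"
        elif m := COMMIT_TXN.search(line):
            result.setdefault(int(m.group(1)), [None, "opened"])[1] = "committed"
    return {src: tuple(entry) for src, entry in result.items()}


if __name__ == "__main__":
    sample = (
        "INFO  : Replayed OpenTxn Event for policy sampledb_replica.* with srcTxn [1153] and target txn id [1162]\n"
        "INFO  : Replayed CommitTxn Event for replPolicy: sampledb_replica.* with srcTxn: 1153WriteEventInfos: []\n"
    )
    for src, (target, status) in txn_outcomes(sample).items():
        print(src, "->", target, status)
```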