@bvaradar
bvaradar / Create Table
Last active October 27, 2020 22:29
Debezium Load Generation
CREATE TABLE temp_20200817(
id integer PRIMARY KEY,
temp_id integer,
temp_status VARCHAR(16),
update_type VARCHAR(14),
price numeric(16,8),
quantity numeric,
timestamp timestamp,
seqno integer,
ts timestamp,
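For the load-generation side, one lightweight approach is a script that emits randomized INSERT statements against the table above, which can then be piped into psql so Debezium picks up the change events. A minimal sketch (column values and statuses like 'OPEN' are illustrative; only the columns visible in this truncated preview are filled):

```python
import random
import datetime

def make_insert(row_id):
    """Build one INSERT for the temp_20200817 table shown above.
    Only the columns visible in the gist preview are filled; any
    columns truncated out of the preview are left to their defaults."""
    now = datetime.datetime.now().isoformat(sep=" ", timespec="seconds")
    return (
        "INSERT INTO temp_20200817 "
        "(id, temp_id, temp_status, update_type, price, quantity, seqno, ts) "
        f"VALUES ({row_id}, {random.randint(1, 100)}, 'OPEN', 'INSERT', "
        f"{random.uniform(1, 500):.8f}, {random.randint(1, 10)}, {row_id}, '{now}');"
    )

if __name__ == "__main__":
    for i in range(5):
        print(make_insert(i))
```

The generated statements can be applied with something like `python gen.py | psql -d <db>` (database name is a placeholder).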
bvaradar / patch
Last active October 13, 2020 23:18
0.6 Patch for Debezium Support
diff --git a/hudi-common/src/main/java/org/apache/hudi/common/model/DebeziumAvroPayload.java b/hudi-common/src/main/java/org/apache/hudi/common/model/DebeziumAvroPayload.java
new file mode 100644
index 00000000..cae8f13a
--- /dev/null
+++ b/hudi-common/src/main/java/org/apache/hudi/common/model/DebeziumAvroPayload.java
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
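The diff preview above is cut off in the license header, but the idea behind a Debezium payload class is a last-writer-wins merge: when two versions of a record collide, keep the one with the higher ordering value (e.g. the source sequence number). A minimal sketch of that idea in plain Python, not the actual Hudi API (field names are assumptions based on the table above):

```python
def merge(current, incoming, ordering_field="seqno"):
    """Last-writer-wins merge: keep whichever record carries the
    higher ordering value. A sketch of the payload concept only,
    not the real DebeziumAvroPayload implementation."""
    if incoming.get(ordering_field, 0) >= current.get(ordering_field, 0):
        return incoming
    return current

old = {"id": 1, "temp_status": "OPEN", "seqno": 5}
new = {"id": 1, "temp_status": "CLOSED", "seqno": 6}
merged = merge(old, new)  # the later change wins
```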
bvaradar / Confluent Installation and load generation
Last active April 4, 2022 03:17
Structured Streaming Simple Testbed setup using kafka
#Download the Confluent platform as a zip locally: https://www.confluent.io/download/
#Choose the zip option. Unzip after download and set it up in your home directory.
export CONFLUENT_HOME=<path_to_confluent_home>
export PATH=$PATH:$CONFLUENT_HOME/bin
# Start services
confluent local start
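With the services up, a quick way to feed the testbed is to generate JSON records and pipe them into the console producer. A small generator sketch (the record fields and topic name below are illustrative, not from the gist):

```python
import json
import datetime

def make_record(i):
    """One JSON event per line, in the shape a console producer
    expects. Field names here are illustrative placeholders."""
    return json.dumps({
        "key": i,
        "ts": datetime.datetime.now().isoformat(),
        "value": f"event-{i}",
    })

if __name__ == "__main__":
    for i in range(10):
        print(make_record(i))
```

Usage would look something like `python gen.py | kafka-console-producer --broker-list localhost:9092 --topic test_topic` (broker address and topic are assumptions).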
Hoodie Columns removed
{
"type" : "record",
"name" : "spark_schema",
"fields" : [ {
"name" : "timestamp",
"type" : [ "null", "double" ],
"default" : null
}, {
"name" : "_row_key",
"type" : [ "null", "string" ],
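The schema fragment above is truncated, but a quick way to confirm the Hoodie metadata columns really were removed is to parse the schema JSON and check field names. A sketch using a completed stand-in for the truncated fragment (only "timestamp" and "_row_key" are visible in the preview):

```python
import json

# Completed stand-in for the truncated schema fragment above;
# only the fields visible in the preview are included.
schema_json = """
{
  "type": "record",
  "name": "spark_schema",
  "fields": [
    {"name": "timestamp", "type": ["null", "double"], "default": null},
    {"name": "_row_key", "type": ["null", "string"], "default": null}
  ]
}
"""

schema = json.loads(schema_json)
field_names = [f["name"] for f in schema["fields"]]
# Hudi metadata columns all start with "_hoodie_"; none should remain.
hoodie_fields = [n for n in field_names if n.startswith("_hoodie_")]
```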
bvaradar / screenshot.png
Last active October 25, 2019 04:17
Screenshot
Configs: attached (ds_configs.tgz)
Extract and upload the configs:
tar -zxvf ds_configs.tgz
hadoop fs -copyFromLocal ds_configs <DFS_CONFIG_ROOT>/
Spark Submit Command:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_CONF_DIR=/home/guoyihua/wireline/hadoop-conf
export HUDI_UTILITIES_BUNDLE=<PATH_TO>/hoodie-utilities-0.4.8-SNAPSHOT.jar
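With those variables exported, the DeltaStreamer job is launched via spark-submit. The class name below follows the com.uber.hoodie naming that matches the hoodie-utilities-0.4.8 bundle referenced above, and the flags and paths are placeholders that vary by Hudi version — treat this as a sketch, not the exact command:

```shell
# Sketch only: class name and flags follow 0.4.x (com.uber.hoodie)
# conventions; verify against your Hudi version before running.
spark-submit \
  --master yarn \
  --class com.uber.hoodie.utilities.deltastreamer.HoodieDeltaStreamer \
  $HUDI_UTILITIES_BUNDLE \
  --props <DFS_CONFIG_ROOT>/ds_configs/<props_file> \
  --target-base-path <DFS_BASE_PATH> \
  --target-table <table_name> \
  --storage-type COPY_ON_WRITE
```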
bvaradar / gist:ec8a2af4bc06fe74a63825e2e1d6b074
Last active June 7, 2019 00:32
Hoodie CLI output using new CLI
->compactions show all
╔═════════════════════════╤═══════════╤═══════════════════════════════╗
║ Compaction Instant Time │ State     │ Total FileIds to be Compacted ║
╠═════════════════════════╪═══════════╪═══════════════════════════════╣
║ 20190605181247          │ COMPLETED │ 1106                          ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20190605115126          │ COMPLETED │ 1204                          ║
╟─────────────────────────┼───────────┼───────────────────────────────╢
║ 20190605053033          │ COMPLETED │ 1303                          ║
╚═════════════════════════╧═══════════╧═══════════════════════════════╝
bvaradar / Spark Stage Retry Reproducing Patch
Last active April 29, 2019 22:42
Spark Stage Retry Reproducing Patch
diff --git a/docker/compose/docker-compose_hadoop284_hive233_spark231.yml b/docker/compose/docker-compose_hadoop284_hive233_spark231.yml
index bbb9f10e..015c9e2b 100644
--- a/docker/compose/docker-compose_hadoop284_hive233_spark231.yml
+++ b/docker/compose/docker-compose_hadoop284_hive233_spark231.yml
@@ -145,6 +145,45 @@ services:
- "8081:8081"
environment:
- "SPARK_MASTER=spark://sparkmaster:7077"
+ - "SPARK_WORKER_WEBUI_PORT=8081"
+ links:
# Apply this patch to add the DFS properties:
https://github.com/bvaradar/hudi/commit/a4f79a7ab6955503e3cca0a36876305a544991ee
Instead of Step (1) in the demo:
varadarb-C02SH0P1G8WL:hudi varadarb$ docker exec -it adhoc-2 /bin/bash
# Create the DFS root directory
root@adhoc-2:/opt# hadoop fs -mkdir -p /var/data/input_batch/