import org.apache.hudi.client.validator.SparkPreCommitValidator
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Row

class CustomPreCommitValidator extends SparkPreCommitValidator {
  override def validateRecordsBeforeAndAfter(before: Dataset[Row],
      after: Dataset[Row], partitionsAffected: Set[String]): Unit = {
    // Custom validation logic
    // Perform data quality checks, apply business rules, etc.
  }
}
CREATE TABLE demo (
  id int,
  name string,
  email string,
  phoneNumber string,
  ts timestamp
)
USING hudi
OPTIONS (
  primaryKey = "id",
-- INSERT statements
INSERT INTO demo VALUES (1, 'TestName1', CURRENT_TIMESTAMP);
INSERT INTO demo VALUES (2, NULL, CURRENT_TIMESTAMP), (3, 'TestName1', CURRENT_TIMESTAMP);
-- CREATE TABLE statement
CREATE TABLE demo (
  id int,
  name string,
  ts timestamp
)
USING hudi
OPTIONS (
  primaryKey = "id",
  preCombineField = "ts",
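// `df` stands in for the DataFrame being written; ValidatorClass1 and
// ValidatorClass2 are placeholder class names.
df.write.format("hudi").option("hoodie.precommit.validators",
  "org.apache.hudi.client.validator.ValidatorClass1," +
  "org.apache.hudi.client.validator.ValidatorClass2").save("path/to/data")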
spark.sql("""select marketplace, product_id, count(*),
  avg(star_rating) from <TABLE> where product_category = 'Books'
  group by 1,2 having avg(star_rating) >= 4 order by 3 desc,4 desc""").show(20)
select * from tableA where customer='xyz' and dt between '2022-12-01' AND '2023-01-31'
set sql-client.execution.result-mode=tableau;
-- create the datagen table
CREATE TABLE sourceT (
  uuid varchar(20),
  name varchar(10),
  age int,
  ts timestamp(3)
) WITH (
  'connector' = 'datagen',
execution.checkpointing.interval: 30s
execution.checkpointing.mode: EXACTLY_ONCE
state.backend: rocksdb
state.checkpoints.dir: file:///${work_path}/hudi-demo/ckps
state.checkpoints.num-retained: 10
state.backend.incremental: true