Student: Kai Jiang (jiangkai@gmail.com)
Mentor: Kenneth Knowles
https://docs.google.com/spreadsheets/d/12iO0vnPWJC-SFp1dBXd_iClf2ERjewl6IRAC2Z0AzdY/edit#gid=0
- List TPC-H performance results on Spark, Flink, and Dataflow
- List features that Beam SQL does not yet support
PRs opened: https://github.com/apache/beam/pulls/vectorijk
Branch with the TPC-H batch test suite for Beam SQL: https://github.com/vectorijk/beam/tree/tpch
Benchmark Beam SQL on Spark, on both a standalone cluster and YARN
- Compare with SparkSQL as baseline
- Beam SQL issues with Spark Runner
Benchmark Beam SQL on Flink, on both a standalone cluster and YARN
CI (Jenkins regression test)
I would like to thank my mentor, Kenneth, for this opportunity. It was a pleasure to work with the Beam community and the SQL team.