Project: Apache Beam
Organization: The Apache Software Foundation
Student: Tanay Tummalapalli (ttanay100@gmail.com)
Mentor: Pablo Estrada (pabloem@google.com)
Project: A Python Sink for BigQuery with File Loads in Streaming
JIRA Issue: BEAM-6611
This project's aim was to add support for the File Loads method of inserting data into BigQuery for streaming pipelines.
PR#8871 added support for the file loads method of writing to BigQuery in streaming mode. It has been merged.
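To give a rough idea of what file loads means for a streaming pipeline: instead of inserting each row as it arrives, rows are collected over a fixed interval and each collected batch is loaded in one go. The snippet below is only a simplified sketch in plain Python, independent of Beam. The function name `batch_rows_for_load_jobs`, the `triggering_frequency_secs` parameter, and the sample rows are assumptions made up for illustration, and it is not the implementation from PR#8871.

```python
from collections import defaultdict


def batch_rows_for_load_jobs(timestamped_rows, triggering_frequency_secs):
    """Group (timestamp_secs, row) pairs into fixed-interval batches.

    Hypothetical helper: each returned batch stands in for the rows that
    would be written to a file and handed to a single load job.
    """
    if triggering_frequency_secs <= 0:
        raise ValueError("triggering_frequency_secs must be positive")

    batches = defaultdict(list)
    for timestamp_secs, row in timestamped_rows:
        # Rows arriving within the same interval share a batch index.
        batch_index = int(timestamp_secs // triggering_frequency_secs)
        batches[batch_index].append(row)

    # Return the non-empty batches in time order.
    return [batches[index] for index in sorted(batches)]


# Made-up sample data: (arrival time in seconds, row).
rows = [(0, {"id": 1}), (30, {"id": 2}), (61, {"id": 3}), (200, {"id": 4})]
batches = batch_rows_for_load_jobs(rows, triggering_frequency_secs=60)
print(len(batches))
```

In the Beam Python SDK, this interval roughly plays the role of the `triggering_frequency` argument of `WriteToBigQuery` when its `method` is set to `FILE_LOADS`.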
Since the main project was finished ahead of schedule, I worked on PR#9242 to harden the BigQuery file loads sink. It has been merged.
I also worked on other issues in Apache Beam during GSoC, mostly in the area of BigQuery I/O. This work includes tests, documentation, as well as new PTransforms.
You can find the list of PRs I worked on here.
You can also find the Kanban board of things I worked on here.
- BEAM-8012 Optimize Streaming Inserts in BigQuery IO
- BEAM-3759 Add full support for PaneInfo descriptor in Python SDK
It was a great learning experience working on Apache Beam. I got an opportunity to work with and learn from the best engineers in the world. I like the community that this project has. I'll continue to contribute to Apache Beam even after GSoC is over.
I'd like to thank my mentor Pablo. I learned a great deal from him and am grateful for this opportunity.
I'd also like to thank the amazing Beam community for being supportive and encouraging.