@ttanay

ttanay/GSoC.md

Last active Jan 20, 2020
Wrap-up Report for Google Summer of Code '19

Google Summer of Code '19 Report

Project Details

Project: Apache Beam
Organization: The Apache Software Foundation

Student: Tanay Tummalapalli (ttanay100@gmail.com)
Mentor: Pablo Estrada (pabloem@google.com)

Project: A Python Sink for BigQuery with File Loads in Streaming
JIRA Issue: BEAM-6611

The aim of this project was to add support for the File Loads method of inserting data into BigQuery from streaming pipelines.

Progress

PR #8871 added support for the File Loads method of writing to BigQuery in streaming mode. It has been merged.
Since the main project finished ahead of schedule, I worked on PR #9242 to harden the BigQuery File Loads sink. It has also been merged.

During GSoC I also worked on other Apache Beam issues, mostly in the area of BigQuery I/O. This work includes tests, documentation, and new PTransforms.
You can find the list of PRs I worked on here.

You can also find the Kanban board of things I worked on here.

Future Work

  • BEAM-8012 Optimize Streaming Inserts in BigQuery IO
  • BEAM-3759 Add full support for PaneInfo descriptor in Python SDK

Conclusion

Working on Apache Beam was a great learning experience. I had the opportunity to work with, and learn from, some of the best engineers in the world, and I really like this project's community. I'll continue to contribute to Apache Beam after GSoC is over.

I'd like to thank my mentor Pablo. I learned a great deal from him and am grateful for this opportunity.
I'd also like to thank the amazing Beam community for being supportive and encouraging.
