@cherbel
Last active August 23, 2021 16:32

Adding Dynamic Partitioning for Interval Joins to AsterixDB (GSoC 2021)

  • By Caleb Herbel

  • Mentors: Preston Carman and Tin Vu

Project Description

The goal of this project was to add dynamic partitioning for interval joins in Asterix Database (AsterixDB). Currently, interval joins in AsterixDB only support a static range hint that must be supplied in the query. With a static range hint, a user has to preprocess their data to determine split points that evenly distribute the data among partitions. With a dynamic range hint, the range hint is calculated at runtime, which makes it easier to quickly run an interval join query. This project creates a new optimized query plan for dynamic interval joins and adds the supporting code for that plan.
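To make the idea of split points concrete, here is a minimal sketch in Python (not AsterixDB code) of deriving evenly spaced split points from the overall interval range and mapping an interval to partitions. The function names, the representation of intervals as `(start, end)` tuples, and the choice to send an interval to every partition it overlaps are all assumptions made for illustration; the actual range map and partitioner in AsterixDB may behave differently.

```python
from bisect import bisect_right


def equal_width_split_points(intervals, num_partitions):
    """Return num_partitions - 1 split points that divide the overall
    range [min start, max end] into equally wide sub-ranges.

    Hypothetical helper: `intervals` is a list of (start, end) tuples.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if not intervals:
        return []
    lo = min(start for start, _ in intervals)
    hi = max(end for _, end in intervals)
    width = (hi - lo) / num_partitions
    return [lo + width * i for i in range(1, num_partitions)]


def partition_of(point, split_points):
    """Index of the partition a single point falls into.

    A point equal to a split point goes to the partition on its right.
    """
    return bisect_right(split_points, point)


def partitions_for_interval(interval, split_points):
    """All partitions an interval overlaps (an assumed replication policy)."""
    start, end = interval
    first = partition_of(start, split_points)
    last = partition_of(end, split_points)
    return list(range(first, last + 1))
```

With a static range hint, the list returned by `equal_width_split_points` is what the user would have to work out ahead of time and write into the query; with a dynamic range hint, the equivalent computation happens at runtime as part of the query plan.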

Current State

This project is not yet in its final state. The new query plan is complete and optimizes correctly. The supporting code for that query plan has been implemented, including updates to the interval merge join that allow dynamic access to the range hint, but there are a few bugs that I was not able to work out before I ran out of time. The biggest issue is the final step in creating the range map, which is found in AbstractUnionIntervalRangeMapAggregateFunction.java: the correct information is being supplied to this function, but I was having trouble writing the range map to bytes. Since this function runs to completion and sits toward the end of the query plan, I expect that once this issue is solved the range map will make it to the join. Even if it does not, most of the remaining work should be minor bug fixes. The second issue that may have to be addressed is in the join framework: it is possible that the Allen's relations factories cannot access the range map dynamically. I did not expect this issue, and it might require work that I probably would not have had time for this summer either way. Lastly, if this code makes it to code review, the query plan will have to be reviewed and possibly changed before its final submission.

Future Work

I laid out a few plans for this project above, but in the future, it would be nice to see dynamic partitioning for interval joins using data sampling rather than calculating split points based on the overall interval range.
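As a rough illustration of the sampling idea, the following Python sketch picks split points from quantiles of a random sample of interval start points, so that skewed data still lands in roughly equal-sized partitions. It is not AsterixDB code: the function name, the `sample_size` and `seed` parameters, and the choice to sample only start points are assumptions made for this example.

```python
import random


def sampled_split_points(intervals, num_partitions, sample_size, seed=0):
    """Return num_partitions - 1 split points taken from the quantiles
    of a random sample of interval start points.

    Hypothetical helper: `intervals` is a list of (start, end) tuples.
    """
    starts = [start for start, _ in intervals]
    if not starts or num_partitions < 2:
        return []
    rng = random.Random(seed)  # seeded so repeated runs give the same sample
    sample = sorted(rng.sample(starts, min(sample_size, len(starts))))
    # Pick the sample value at each i/num_partitions quantile.
    return [
        sample[(len(sample) * i) // num_partitions]
        for i in range(1, num_partitions)
    ]
```

For data such as one hundred short intervals near zero plus a single outlier far away, equal-width splitting over the overall range would put almost everything in the first partition, while quantile-based split points follow where the data actually is.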

As for my own future plans for this project, after spending more than 600 hours on various projects in AsterixDB and committing over 13,000 lines of code, I think my time working on this project has come to a close.

Project Challenges

My challenges with this project turned out to be quite different from any of the issues I've experienced in AsterixDB before. I had a lot of issues with the query plan. Since this was a new area of AsterixDB for me, it took me a while to grasp how it worked, and because of this I had a lot of trouble debugging it. I was also only working on the code part-time and had a more difficult summer outside of GSoC, so I ended up grasping the material later than I had hoped. Ultimately, all of these things put me behind my original schedule. Even so, I think that with another week and some help I could have wrapped up most of the outstanding issues that I know exist.

My Commits and Repositories

My Repository

Optimized Interval Join Commit

See Commit Message for Details

Thanks

Thanks to Dr. Carman and Tin Vu for meeting with me every week and giving me lots of useful feedback on code, advice, and direction. Our weekly meetings were very helpful to me; they helped me become better at presenting my problems, at presenting the code that I had worked on, and at coding in general.

Thanks to Tin for always swiftly answering my questions, and being readily available if I needed him.

Thanks to Preston Carman for meeting with me whenever I needed help outside of our weekly meetings when I got stuck, or when I needed advice.
