Student: Sergio Esteves
Mentors: Von Gosling and Xinyu Zhou
This project aimed to design, implement, and evaluate a plugin that integrates RocketMQ with HBase, a large-scale non-relational database. The plugin comprises two parts: (i) the HBase sink, which replicates HBase tables to RocketMQ topics, and (ii) the HBase source, which replicates RocketMQ topics to HBase tables.
The HBase sink involved creating a replication endpoint for HBase. This endpoint tracks updates (put and delete operations) performed on specified tables and replicates them to a RocketMQ topic. For this replication process, I created a RocketMQ producer that uses reliable synchronous transmission to push the messages to a RocketMQ server.
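The sink's flow (track a put or delete, encode it, send it synchronously) can be sketched roughly as follows. This is a minimal illustration that uses only the Java standard library, not the plugin's actual code: `SinkSketch`, `Update`, `SyncSender`, and the tab-separated payload format are all hypothetical names and choices made for this example. In a real deployment, `SyncSender` would be backed by a RocketMQ producer and the updates would come from HBase's replication machinery.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

final class SinkSketch {
    /** Kinds of update the endpoint tracks (puts and deletes). */
    public enum Op { PUT, DELETE }

    /** Hypothetical flattened view of one tracked HBase update. */
    public static final class Update {
        final Op op;
        final String table, rowKey, column, value;

        public Update(Op op, String table, String rowKey, String column, String value) {
            this.op = op;
            this.table = table;
            this.rowKey = rowKey;
            this.column = column;
            this.value = value;
        }
    }

    /** Hypothetical stand-in for a synchronous producer: returns true once the send is acknowledged. */
    public interface SyncSender {
        boolean send(String topic, byte[] body);
    }

    /** Encodes an update as one tab-separated line (assumed format); deletes carry an empty value. */
    public static byte[] encode(Update u) {
        String value = (u.op == Op.DELETE) ? "" : u.value;
        return String.join("\t", u.op.name(), u.table, u.rowKey, u.column, value)
                .getBytes(StandardCharsets.UTF_8);
    }

    /** Sends updates in order and stops at the first unacknowledged one; returns how many were acknowledged. */
    public static int replicate(List<Update> updates, String topic, SyncSender sender) {
        int acked = 0;
        for (Update u : updates) {
            if (!sender.send(topic, encode(u))) {
                break; // the caller can retry from index `acked`
            }
            acked++;
        }
        return acked;
    }

    public static void main(String[] args) {
        List<Update> updates = List.of(
                new Update(Op.PUT, "users", "row1", "cf:name", "Ada"),
                new Update(Op.DELETE, "users", "row2", "cf:name", null));
        // A sender that only prints; a real one would block until the broker acknowledges.
        int acked = replicate(updates, "hbase-users", (topic, body) -> {
            System.out.println(topic + " <- " + new String(body, StandardCharsets.UTF_8));
            return true;
        });
        System.out.println("acknowledged: " + acked);
    }
}
```

Returning the number of acknowledged updates lets the caller resume from the first unsent one, which is one simple way to preserve ordering under synchronous sends.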
The HBase source is a daemon program that continuously pulls messages, at regular time intervals, from specified RocketMQ topics and writes them to HBase tables. To pull messages from a RocketMQ server, I created a RocketMQ consumer that uses the broadcast message model. Finally, I created an HBase client that writes the messages to HBase tables in batches.
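The pull-then-batch-write loop described above might look roughly like the sketch below. It is an assumption-laden illustration built on the Java standard library only: `SourceSketch`, `Puller`, `BatchWriter`, and the batch size and interval parameters are hypothetical, standing in for a RocketMQ pull consumer and an HBase client that the plugin would actually use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class SourceSketch {
    /** Hypothetical stand-in for a pull consumer: returns up to maxMessages, or an empty list when none remain. */
    public interface Puller {
        List<String> pull(String topic, int maxMessages);
    }

    /** Hypothetical stand-in for an HBase client that writes many rows in one call. */
    public interface BatchWriter {
        void writeBatch(String table, List<String> rows);
    }

    /** Pulls until the topic is empty, writing each pulled batch to the table; returns the total written. */
    public static int drainOnce(String topic, String table, int batchSize,
                                Puller puller, BatchWriter writer) {
        int written = 0;
        while (true) {
            List<String> batch = puller.pull(topic, batchSize);
            if (batch.isEmpty()) {
                return written;
            }
            writer.writeBatch(table, new ArrayList<>(batch)); // defensive copy of the pulled batch
            written += batch.size();
        }
    }

    /** Runs drainOnce at a fixed delay, which approximates the "regular time intervals" of the daemon. */
    public static ScheduledExecutorService start(String topic, String table, int batchSize,
                                                 long intervalMillis,
                                                 Puller puller, BatchWriter writer) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(
                () -> drainOnce(topic, table, batchSize, puller, writer),
                0, intervalMillis, TimeUnit.MILLISECONDS);
        return scheduler; // the caller shuts it down to stop the daemon
    }
}
```

Using a fixed delay rather than a fixed rate means a slow batch write postpones the next pull instead of letting pulls pile up; that is a design choice of this sketch, not something the project description specifies.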
This integration plugin for HBase improves RocketMQ's offline storage capabilities and benefits users with stringent large-scale, data-intensive processing needs.
Repository containing all code developed for GSoC
Pull request to main (apache) repository
- add fault tolerance capabilities
Spending the summer working on a massive-scale publish/subscribe message queue system such as RocketMQ was a very interesting and enriching experience. I had the opportunity to learn about the architecture, principles, concepts, and properties that allow a messaging system to achieve low latency, high throughput, and high scalability while remaining reliable in the presence of arbitrary failures.
LGTM. It would be helpful if you could open a PR against the external repository.