"Add a new storage backend to the OpenStack Message Queuing Service" proposal for Google Summer of Code 2014.
Student

Victoria Martínez de la Cruz

Mentor

Alejandro Cabrera

Project

Add a new storage backend to the OpenStack Message Queuing Service

Abstract

Marconi currently supports two storage engines, MongoDB and SQLAlchemy. While these are great storage options, we cannot settle for just these two.

Different user needs call for different storage solutions, and staying one step ahead by supporting them is expected of a service like Marconi.

The main goal of this proposal is to identify a storage engine that fits Marconi's interests and to implement a driver for it.

Why we want to add a new storage driver

The current Marconi version supports two storage engines, MongoDB and SQLAlchemy. While these are great database options, we cannot settle for just these two.

Different applications call for different storage solutions, and staying one step ahead by supporting them is expected of a service like Marconi.

From the project development point of view, a new driver would also be a great addition for the queue flavors support[1] that will be implemented in a future release.

In addition, having more drivers will help the Marconi community tease out which parts of the API are hard to map to different backends.

This feature is of vital importance for the project and is being addressed by several blueprints.

Goals

The main goal of this proposal is to identify a storage engine that fits Marconi's interests and to implement a driver for it, following the style of the existing drivers to keep consistency across the project.

To do this, we need to research the storage engines currently available, analyze their main features, and select one according to Marconi's goals.

The storage engine can be selected based on different aspects, such as performance, security, ease of use, or utility.

Then we can start the development stage by writing the code for the different components of the driver, the corresponding unit tests and the associated documentation in parallel.

Benefits

Adding a new storage backend has many benefits. Some of them are:

  • Efficiency and performance. It will be possible to deliver the features particular applications need with a service that better fits their requirements. Taking those requirements into account and providing storage support for them can have a great impact on overall system efficiency and performance.
  • Compatibility. Supporting more storage alternatives will enhance compatibility with other services, therefore simplifying deployment.
  • Scalability. It will be feasible to switch between storage solutions if an application's requirements change, improving scalability as well.

Selecting a storage engine

Differences between databases continue to increase with the availability of many SQL, NewSQL and NoSQL alternatives. Choosing the appropriate one for a project right at the beginning is an important task.

There are several storage engines we could take into account. Here are some possibilities, classified by purpose and popularity.

  • OpenStack Swift - Designed to withstand hardware failures without any downtime. It is great for plenty of different applications, from small deployments for storing VM images, to mission critical storage clusters for high-volume websites. Implementing a driver for it has an added value since it would provide a better integration with OpenStack.
  • RethinkDB - It scales to multiple machines with very little effort. Sharding and replication can be set up in a few simple steps. It's easy to set up and learn, and it has an intuitive query language that supports useful operations like table joins, groupings, and aggregations. It is a good choice for applications that need flexible schemas and scalability. It's also great if you value ease of use and want cluster administration to be an easier task.
  • Riak - Focused on fault tolerance. Provides high availability and scalability. It is a great solution for applications where even seconds of downtime are unbearable.
  • CouchDB - Its main goal is consistency and ease of use. It is usually implemented in applications where versioning is important.

Less widely known storage engines with great features

  • Couchbase - Memcached-compatible, but with persistence and clustering. It is suitable for any application where low-latency data access, high concurrency support and high availability are required.
  • VoltDB - It is an in-memory database intended for fast transactions and rapidly changing data. Great for applications where you need to act fast on massive amounts of incoming data.

Special purpose storage engines

  • ElasticSearch - Provides scalable search, has near real-time search, and supports multitenancy. Works like a charm when the application deals with objects with (flexible) fields and advanced search functionality is needed.
  • Neo4j - It stores data structured in graphs rather than in tables. It is ideal for storing interconnected data.

Implementing the driver

Given that I have the MongoDB and SQLAlchemy drivers as reference models, I have a good idea of the modules that have to be implemented. For this reason, I will proceed with the implementation following a bottom-up approach.

I will start with the queues, messages and claims controllers, continue with the catalogue and shards controllers, and finish with the integration of all of the implemented controllers into the driver. In the following section I list all the methods that have to be implemented, based on an inspection of the current storage backends.

It is important to write unit tests in parallel to make sure that the code works as expected and that the implementation stays on schedule. I also consider it important to add detailed documentation to each controller at this stage to make the code more legible and, therefore, more maintainable.

Schedule and milestones

Before I start coding, I want to become more familiar with the Marconi team and their workflow. To do this, I plan to:

  • Discuss with the Marconi community the selection of the storage engine. I will probably bring up this topic during the weekly team meeting.
  • Inspect currently available storage drivers in Marconi - MongoDB and SQLAlchemy. This will give me a better background to start coding the driver for the selected storage engine.
  • Write new tests and improve existing tests. Since I don't have much experience working with the testing tools used in Marconi, I would like to gain some insight in this area by performing testing tasks. This way, I would contribute some tests while gaining more experience in testing.

Then I will start the implementation following this dependency chain:

  • queues > messages > claims
  • catalogue > shards

By this time, I expect that most of the driver functionality will be complete.

After that, I would like to spend the last weeks of the internship performing more testing tasks and improving the documentation written in parallel with the implementation.

It's important to mention that during the first three weeks of the internship I will have exams at college and won't be able to work full time (I estimate 75% of full speed). I will make up for the missing hours by working on weekends. Exams at my college are usually separated by several days, so I will let my mentor know in advance when I will be away and when I will make up those hours.

Select a storage engine (first milestone) (2 weeks)

Research existing storage engines (0.5 weeks)

  • Learn development internals about the current storage engines supported by Marconi.
  • Look for storage engines available and make a pre-selection.
  • Compare the selected storage engines and analyze which may be the best according to the Marconi goals for the next release.

Discuss with Marconi team members the available possibilities (1 day)

  • During the weekly team meeting, request feedback from the Marconi team on the storage engines I have pre-selected.
  • Consider their opinions and my personal ideas to select the storage engine to implement.

Inspect storage drivers currently available in Marconi (2 days)

Get familiar with the testing tools used by Marconi and contribute some tests (1 week)

  • Learn about the testing tools used in Marconi.
  • Contribute additional tests and test enhancements. There is a lot to do in this area, e.g. bug 1251740 and bug 1254158. A minimal sketch of the kind of test I have in mind follows this list.
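
To illustrate the kind of test I have in mind, here is a minimal sketch of a unit test exercising a queue controller. It uses Python's standard unittest; the package name newbackend and the DataDriver constructor are placeholders for the driver I will write, and the real Marconi test suite may provide its own base classes and fixtures.

```python
import unittest
import uuid

from oslo.config import cfg

# NOTE: "newbackend" is a placeholder name for the new driver package.
from newbackend import driver


class QueueControllerTest(unittest.TestCase):
    """Exercise the basic lifecycle of a queue."""

    def setUp(self):
        super(QueueControllerTest, self).setUp()
        self.driver = driver.DataDriver(cfg.ConfigOpts())
        self.controller = self.driver.queue_controller
        self.project = str(uuid.uuid4())

    def test_create_exists_delete(self):
        self.controller.create('fizbit', project=self.project)
        self.assertTrue(self.controller.exists('fizbit', project=self.project))

        self.controller.delete('fizbit', project=self.project)
        self.assertFalse(self.controller.exists('fizbit', project=self.project))


if __name__ == '__main__':
    unittest.main()
```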

Implement the driver for the selected storage engine (second milestone) (6.5 weeks)

Set up the repository and stub out the driver (0.5 weeks)

  • Create a new GitHub project.
  • List Marconi as a dependency.
  • Stub out the driver implementation. This will include the following files (a rough stub sketch follows the list):
    • driver.py - Manages connection with the DB
    • options.py - Storage backend driver configuration options
    • queues.py - Queues controller
    • messages.py - Messages controller
    • claims.py - Claims controller
    • catalogue.py - Queues catalogue controller
    • shards.py - Shards management storage controller
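
As a rough illustration, each stub would mirror the structure of the existing drivers, with every method raising NotImplementedError until it is filled in. Below is a sketch of queues.py; the exact base class name is an assumption to be confirmed against marconi.queues.storage.base.

```python
# queues.py - stub for the queues controller (sketch only)
#
# NOTE: the base class name (QueueBase) is an assumption to be confirmed
# against marconi.queues.storage.base; the layout mirrors the existing
# MongoDB and SQLAlchemy drivers.

from marconi.queues import storage


class QueueController(storage.QueueBase):

    def create(self, name, project=None):
        raise NotImplementedError

    def delete(self, name, project=None):
        raise NotImplementedError

    def exists(self, name, project=None):
        raise NotImplementedError

    def stats(self, name, project=None):
        raise NotImplementedError

    def get_metadata(self, name, project=None):
        raise NotImplementedError

    def set_metadata(self, name, metadata, project=None):
        raise NotImplementedError

    def list(self, project=None, marker=None,
             limit=storage.DEFAULT_QUEUES_PER_PAGE, detailed=False):
        raise NotImplementedError
```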

Implement driver connection to storage backend and configuration options (0.5 weeks)

  • driver.py
    • connection(conf)
    • is_alive(self)
  • options.py
    • _config_options()
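
A rough sketch of these two pieces follows. Since the storage engine has not been selected yet, backendclient stands in for whatever client library the chosen engine ships with, and the option group and option names are illustrative; oslo.config is used for configuration, as in the existing drivers.

```python
# driver.py / options.py - connection and configuration sketch
#
# NOTE: "backendclient", the option group and the option names are
# placeholders; the real values depend on the storage engine selected.

from oslo.config import cfg

import backendclient  # hypothetical client library

_BACKEND_GROUP = 'drivers:storage:newbackend'

_BACKEND_OPTIONS = [
    cfg.StrOpt('uri', default='backend://localhost:1234',
               help='Connection URI for the storage backend.'),
    cfg.IntOpt('max_reconnect_attempts', default=10,
               help='Maximum number of connection retries.'),
]


def _config_options():
    """Return the (group, options) pairs this driver registers."""
    return [(_BACKEND_GROUP, _BACKEND_OPTIONS)]


class DataDriver(object):
    """Holds the connection to the storage backend (sketch)."""

    def __init__(self, conf):
        self.conf = conf
        self.conf.register_opts(_BACKEND_OPTIONS, group=_BACKEND_GROUP)
        self._options = self.conf[_BACKEND_GROUP]
        self.connection = backendclient.connect(self._options.uri)

    def is_alive(self):
        # Probe the backend with a lightweight call; the exact health
        # check depends on the client library of the chosen engine.
        try:
            self.connection.ping()
            return True
        except backendclient.ConnectionError:
            return False
```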

Implement queues, messages and claims. Write unit tests and basic documentation (3 weeks)

  • queues.py
    • create(self, name, project=None)
    • delete(self, name, project=None) - Delete all messages/claims in queue
    • exists(self, name, project=None)
    • stats(self, name, project=None)
    • get_metadata(self, name, project=None)
    • set_metadata(self, name, metadata, project=None)
    • list(self, project=None, marker=None, limit=storage.DEFAULT_QUEUES_PER_PAGE, detailed=False)
  • messages.py
    • get(self, queue_name, message_id, project=None)
    • bulk_get(self, queue_name, message_ids, project=None)
    • post(self, queue_name, messages, client_uuid, project=None)
    • delete(self, queue_name, message_id, project=None, claim=None)
    • bulk_delete(self, queue_name, message_ids, project=None)
    • first(self, queue_name, project=None, sort=1)
    • _claimed(self, queue_name, claim_id, expires=None, limit=None, project=None)
    • _unclaim(self, queue_name, claim_id, project=None)
    • _active(self, queue_name, marker=None, echo=False, client_uuid=None, fields=None, project=None, limit=None)
    • _count(self, queue_name, project=None, include_claimed=False)
    • list(self, queue_name, project=None, marker=None, limit=storage.DEFAULT_MESSAGES_PER_PAGE, echo=False, client_uuid=None, include_claimed=False)
  • claims.py
    • get(self, queue, claim_id, project=None)
    • create(self, queue, metadata, project=None, limit=storage.DEFAULT_MESSAGES_PER_CLAIM)
    • update(self, queue, claim_id, metadata, project=None)
    • delete(self, queue, claim_id, project=None)
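
To give a flavor of what these signatures look like once implemented, here is a hedged sketch of a few queue controller methods written against a hypothetical key/value style client; self._client and its methods (set_if_absent, exists, get) are stand-ins for whatever API the selected storage engine exposes, and error handling is reduced to a placeholder comment.

```python
# queues.py - partial sketch of a few controller methods
#
# NOTE: self._client and its methods are placeholders for the client API
# of the storage engine that ends up being selected.

import json


class QueueController(object):  # would inherit from the storage base class

    def __init__(self, driver):
        self.driver = driver
        self._client = driver.connection

    def _key(self, name, project=None):
        # Namespace queues by project so that tenants do not collide.
        return 'queue.{0}.{1}'.format(project or '', name)

    def create(self, name, project=None):
        # True if the queue was created, False if it already existed.
        return self._client.set_if_absent(self._key(name, project),
                                          json.dumps({'m': {}}))

    def exists(self, name, project=None):
        return self._client.exists(self._key(name, project))

    def get_metadata(self, name, project=None):
        doc = self._client.get(self._key(name, project))
        if doc is None:
            # Raise the "queue does not exist" error from Marconi's
            # storage errors module here.
            raise LookupError(name)
        return json.loads(doc)['m']
```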

Implement catalogue and shards. Write unit tests and basic documentation (2 weeks)

  • catalogue.py
    • insert(self, project, queue, shard)
    • delete(self, project, queue)
    • update(self, project, queue, shard=None)
    • get(self, project, queue)
    • exists(self, project, queue)
    • list(self, project)
    • drop_all(self)
  • shards.py
    • create(self, name, weight, uri, options=None)
    • delete(self, name)
    • update(self, name, **kwargs)
    • get(self, name, detailed=False)
    • exists(self, name)
    • list(self, marker=None, limit=10, detailed=False)
    • drop_all(self)
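
The control-plane controllers are thin CRUD layers. As a short hedged sketch, the shards controller could map onto the same hypothetical key/value client used above; again, self._client and its methods are placeholders.

```python
# shards.py - partial sketch (control plane)
#
# NOTE: self._client and its methods are placeholders for the selected
# engine's client API.

import json


class ShardsController(object):  # would inherit from the storage base class

    def __init__(self, driver):
        self._client = driver.connection

    def create(self, name, weight, uri, options=None):
        doc = {'w': weight, 'u': uri, 'o': options or {}}
        self._client.set('shard.' + name, json.dumps(doc))

    def exists(self, name):
        return self._client.exists('shard.' + name)

    def delete(self, name):
        self._client.delete('shard.' + name)
```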

Implement driver support for the added controllers. Write unit tests and basic documentation (0.5 weeks)

  • driver.py
    • queues_database(self)
    • queue_controller(self)
    • message_databases(self)
    • message_controller(self)
    • claim_controller(self)
    • shards_controller(self)
    • catalogue_controller(self)
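
The integration step is essentially wiring the controllers into the driver. A minimal sketch of that wiring follows; the real classes would combine this with the connection handling sketched earlier and extend the corresponding storage base classes, and the existing drivers use lazy-property decorators that are omitted here for brevity.

```python
# driver.py - exposing the controllers (sketch)
#
# NOTE: "newbackend" is a placeholder name for the new driver package.

from newbackend import catalogue
from newbackend import claims
from newbackend import messages
from newbackend import queues
from newbackend import shards


class DataDriver(object):  # would extend the data driver base class

    @property
    def queue_controller(self):
        return queues.QueueController(self)

    @property
    def message_controller(self):
        return messages.MessageController(self)

    @property
    def claim_controller(self):
        return claims.ClaimController(self)


class ControlDriver(object):  # would extend the control driver base class

    @property
    def catalogue_controller(self):
        return catalogue.CatalogueController(self)

    @property
    def shards_controller(self):
        return shards.ShardsController(self)
```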

Run additional tests and enhance technical documentation (third milestone) (2.5 weeks)

Run additional tests (1.5 weeks)

  • Create and run additional tests to assess the nuances of the added backend.
  • Reuse currently available functional tests to run against the added storage engine.

Create a user manual and improve technical documentation in general (1 week)

  • Create a brief user manual based on the documentation added during the implementation stage, filling in any details that may be missing.
  • Update and extend, if necessary, the technical documentation added during the implementation stage.

If time permits...

It would be nice to provide more extensive documentation for the added storage engine. I could include:

  • A detailed description of the features provided by the storage engine and a brief comparison with similar solutions
  • Benchmarks comparing the added backend with existing backends (e.g. MongoDB)
  • Examples demonstrating where the added backend would be more effective

Also, it would be great to create a package and make it available on PyPI.
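
If packaging does happen, a minimal setup.py along the following lines would be enough to publish the driver on PyPI. The package name, version, and especially the entry point group used by Marconi to discover external storage drivers are assumptions to be confirmed against the Marconi tree.

```python
# setup.py - minimal packaging sketch (names and entry point group assumed)

from setuptools import setup

setup(
    name='marconi-newbackend',          # hypothetical package name
    version='0.1.0',
    description='New storage backend driver for Marconi',
    packages=['newbackend'],
    install_requires=['marconi'],
    entry_points={
        # The group name must match whatever Marconi uses to discover
        # external storage drivers (to be confirmed).
        'marconi.queues.storage': [
            'newbackend = newbackend.driver:DataDriver',
        ],
    },
)
```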

About me

I'm Victoria Martínez de la Cruz, a Licentiate in Computer Sciences student at Universidad Nacional del Sur in Bahía Blanca, Buenos Aires, Argentina. From January to April 2013 I worked as an OpenStack intern as part of the Outreach Program for Women (OPW).

In college, among other things, I learned about computer architectures, operating system internals, good practices in software development and, of course, several programming languages including Python, Java and C.

During my internship in OpenStack, I was able to apply my knowledge and got some experience working with Python. I also had the chance to learn about tools and practices used in real world software development organizations, which I enjoyed sharing in my personal blog. Some of my most visited posts are In a nutshell: How OpenStack works, Logging and debugging in OpenStack and Getting started with Marconi, the message queue for OpenStack.

I had been looking forward to contributing to an open-source organization for many years - I was 13 years old when I first heard about and became interested in the open-source philosophy - and thanks to the OPW I got in touch with the OpenStack community.

Since last year, I have been contributing mainly to OpenStack Dashboard (Horizon) and OpenStack Internationalization.

The Marconi project has caught my attention since its beginning, but due to time constraints, mainly college duties, I haven't been able to make stronger contributions. I'm really excited to have this opportunity to finally contribute to Marconi with a significant feature and to blog about it so new contributors can join our effort in the near future.

For more details about my contributions to OpenStack, you can visit my personal blog and also check a list of recent contributions in Gerrit.

You can reach me on IRC at irc.freenode.org in #openstack, #openstack-horizon and #openstack-marconi - my IRC handle is vkmc - or through my personal email at victoria@vmartinezdelacruz.com.


  1. Users will be able to modify queue options like durability, persistence, ordering, access and storage backend depending on their use case.
