@braoru
Last active March 31, 2016 12:15

Our RIAK install

In production we have two 5-node rings called ClusterA and ClusterB. The two clusters use replication in both directions.

Naming convention

In our internal naming, micro-services are called "components" (example: the catalog micro-service will be called the catalog component, or just "catalog").

The new global architecture

Globally, what we are building is a micro-services (components) based environment. Each component does as small a business task as possible. All components run within a PaaS (OpenShift and Riak) and are scaled (in number of instances) according to the load.

Depending on a component's data-persistence needs, we choose different backends. Currently we are using MSSQL, Redis, Riak, InfluxDB, MongoDB and Elasticsearch.

There are several use cases where we use Riak, and this document tries to list some of them.

All these components run in full active-active mode across two active data-centers (ClusterA/MTA and ClusterB/MTB).

When we need to read many keys in a bucket

Sometimes we need to read a lot of keys from a bucket.

What we do is:

- Create all the keys we need
- Create a set used as an inverted index

When we need to read a lot of keys, we load the set, then use the information within the set to read all the required keys.
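
A rough sketch of that pattern, assuming the official Python client, a bucket type named "sets" configured with datatype = set, and made-up bucket and key names:

import riak

client = riak.RiakClient(pb_port=8087)

data_bucket = client.bucket('orders')
index_bucket = client.bucket_type('sets').bucket('orders_index')

# Write path: store each key and record it in the set used as an inverted index.
index = index_bucket.new('2016-03-31')
for key, payload in [('order-1', {'total': 245}), ('order-2', {'total': 12})]:
    data_bucket.new(key, data=payload).store()
    index.add(key)
index.store()

# Read path: load the set, then fetch every key it references.
index = index_bucket.get('2016-03-31')
orders = [data_bucket.get(key).data for key in index.value]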

Backends

We are using bitcask only and we have multiple bitcask configurations based on our needs.

multi_backend.bitcask_1d.storage_backend = bitcask
multi_backend.bitcask_1d.bitcask.expiry = 1d
multi_backend.bitcask_1d.bitcask.data_root = $(platform_data_dir)/data/bitcask_1d

multi_backend.bitcask_90d.storage_backend = bitcask
multi_backend.bitcask_90d.bitcask.expiry = 90d
multi_backend.bitcask_90d.bitcask.data_root = $(platform_data_dir)/data/bitcask_90d

multi_backend.bitcask_noexpire.storage_backend = bitcask
multi_backend.bitcask_noexpire.bitcask.data_root = $(platform_data_dir)/data/bitcask_noexpire
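
To show how a bucket ends up on one of those backends, here is a sketch with the Python client; it assumes riak.conf selects the multi backend, and the bucket names are invented:

import riak

client = riak.RiakClient(pb_port=8087)

# Short-lived session data goes to the 1-day Bitcask backend,
# durable reference data to the never-expiring one.
client.bucket('sessions').set_property('backend', 'bitcask_1d')
client.bucket('catalog').set_property('backend', 'bitcask_noexpire')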

Component monitoring

As we don't know directly how many instances of each component are running in each data-center at a time T, we need a way to know the state of the systems (minimal business running requirements).

All components create a bucket named after themselves and periodically update it. These buckets contain sets with the last update date and some health-check information.

Monitoring tools just read the information within all the buckets to know the state of each component across the two data-centers.
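
A minimal sketch of that heartbeat with the Python client, assuming a "sets" bucket type; the component name, instance ids and set entries are made up:

import datetime
import riak

client = riak.RiakClient(pb_port=8087)
health_bucket = client.bucket_type('sets').bucket('catalog')  # bucket named after the component

def report_health(instance_id):
    # Each instance periodically pushes its last update date and health-check info.
    # (Old entries would have to be discarded to keep the set small.)
    status = health_bucket.new(instance_id)
    status.add('last_update:' + datetime.datetime.utcnow().isoformat())
    status.add('healthy:true')
    status.store()

def read_component_state():
    # Monitoring reads every instance entry to derive the component state.
    return {key: health_bucket.get(key).value for key in health_bucket.get_keys()}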

Jobs monitoring

Some components need to run periodic jobs, but each job must run on only one instance across the two data-centers. To ensure a single run, we use a quorum-based approach. The quorum itself is stored in MSSQL (to ensure absolute atomicity of some operations).

But, to monitor all the jobs, each component uses a bucket in Riak to store sets of information about jobs (last executions, processing time, errors and so on).
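
For example, recording the outcome of a job run could look like this (a sketch with the Python client; the gist mentions sets, but for brevity this stores one plain JSON object per job, and all names are assumptions):

import datetime
import riak

client = riak.RiakClient(pb_port=8087)
jobs_bucket = client.bucket('catalog_jobs')  # one bucket per component

def record_job_run(job_name, duration_ms, error=None):
    # Keep the last execution date, processing time and error (if any) per job.
    report = {
        'last_execution': datetime.datetime.utcnow().isoformat(),
        'processing_time_ms': duration_ms,
        'error': error,
    }
    jobs_bucket.new(job_name, data=report).store()

record_job_run('refresh-catalog-cache', duration_ms=5230)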

The cache-support case

CacheSupport is a library in charge of allowing components to "share a cache".

All component instances need data in a local in-memory cache.

The first component instance able to run the cache-populating job (this job is quorum based) runs it. The job reads all the required databases, processes the data and stores it (it can be very large, but we are working on splitting it) as serialized, encoded and compressed JSON within Riak.

All other components periodically read the cache and copy all the data into memory.
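
A sketch of that exchange with the Python client, assuming zlib-compressed JSON and invented bucket and key names:

import json
import zlib
import riak

client = riak.RiakClient(pb_port=8087)
cache_bucket = client.bucket('cache_support')

def publish_cache(data):
    # Cache-populating job: serialize, compress and store the blob in Riak.
    blob = zlib.compress(json.dumps(data).encode('utf-8'))
    cache_bucket.new('catalog_cache', encoded_data=blob,
                     content_type='application/octet-stream').store()

def load_cache():
    # Every other instance: read the blob back and keep it in local memory.
    obj = cache_bucket.get('catalog_cache')
    return json.loads(zlib.decompress(obj.encoded_data).decode('utf-8'))

publish_cache({'products': [{'id': 1, 'name': 'tomato'}]})
local_cache = load_cache()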

Authentication

Our authentication component uses Riak to store users and each user's enciphered cryptographic material. Once a component or a user is authenticated, it receives a JWT token to authenticate requests on all components.
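
Roughly, issuance could look like this (a sketch only; PyJWT, the bucket name and verify_credentials are hypothetical, and the actual credential check is elided):

import jwt  # PyJWT, assumed here for token issuance
import riak

client = riak.RiakClient(pb_port=8087)
users = client.bucket('users')

def verify_credentials(stored_material, presented_secret):
    # Placeholder: the real check uses the user's enciphered cryptographic material.
    return True

def authenticate(username, presented_secret, signing_key):
    user = users.get(username)
    if not user.exists or not verify_credentials(user.data, presented_secret):
        return None
    # The caller presents this JWT to every other component.
    return jwt.encode({'sub': username}, signing_key, algorithm='HS256')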

Token

We use JWT tokens for user authentication/authorization when users access a component, and for component authentication/authorization when components access other components.

Those tokens are signed and enciphered with a rotating pool of keys. All the keys are enciphered and stored within Riak, and all components need to read them to open and validate tokens.
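
A sketch of the validation side, assuming PyJWT, a hypothetical "token_keys" bucket holding the rotating pool, and a "kid" header identifying the key; key decryption is elided:

import jwt  # PyJWT
import riak

client = riak.RiakClient(pb_port=8087)
key_bucket = client.bucket('token_keys')

def decipher(stored_key):
    # Placeholder: keys are stored enciphered in Riak and must be deciphered first.
    return stored_key

def validate_token(token):
    # Pick the right key from the rotating pool using the token's key id header.
    kid = jwt.get_unverified_header(token)['kid']
    key = decipher(key_bucket.get(kid).data)
    return jwt.decode(token, key, algorithms=['HS256'])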

Cart

In the future we would like to store users' carts within Riak.

Component business monitoring

Business monitoring is our way to transfer business-oriented logs (e.g. Bob created a $245 order, Tom added a tomato to his cart at 12:45 pm, and so on) from components to the BAM component.

The process goes like this:

Component A wants to send messages to the BAM component.

First, component A tries to send the message with an HTTPS POST directly to the BAM component.

If component A can't connect to BAM, A stores the messages within Riak: it creates a bucket for the relevant minute and stores the events in that bucket. Periodically, A tries to re-send the stored messages. When all messages are sent, the corresponding minute buckets are deleted.
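
A sketch of that fallback path (Python client plus the requests library; the BAM URL, the bucket naming and the event id field are assumptions):

import datetime
import requests
import riak

client = riak.RiakClient(pb_port=8087)

def store_for_retry(event):
    # One bucket per minute, plus a set acting as the index of stored keys.
    minute = datetime.datetime.utcnow().strftime('%Y%m%d%H%M')
    bucket = client.bucket('bam_pending_' + minute)
    index = client.bucket_type('sets').bucket('bam_pending_index').new(minute)
    bucket.new(event['id'], data=event).store()
    index.add(event['id'])
    index.store()

def send_event(event):
    try:
        requests.post('https://bam.internal/events', json=event, timeout=2).raise_for_status()
    except requests.RequestException:
        store_for_retry(event)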

When BAM receives a message, the message is stored: BAM creates a bucket for the relevant minute in Riak and stores the event in that bucket. Once the event is stored, BAM responds to the POST with a 200. As said before, a set is added in the bucket to help with reading all the keys. In parallel to storing the message in Riak, the message is sent to all backends (SQL and Elasticsearch). When all backends have accepted the message, it is removed from Riak.

Periodically, BAM tries to read unprocessed events from Riak and send them to the corresponding backends (SQL and Elasticsearch). When an event is stored in all destination backends, we remove it from Riak.
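
And the periodic replay could look roughly like this (a sketch; forward_to_backends stands in for the real SQL and Elasticsearch writes, and the naming follows the previous sketch):

import riak

client = riak.RiakClient(pb_port=8087)
index_bucket = client.bucket_type('sets').bucket('bam_pending_index')

def forward_to_backends(event):
    # Placeholder for the SQL and Elasticsearch writes.
    return True

def replay_pending(minute):
    bucket = client.bucket('bam_pending_' + minute)
    index = index_bucket.get(minute)
    for key in index.value:
        obj = bucket.get(key)
        if obj.exists and forward_to_backends(obj.data):
            # Once every backend has accepted the event, remove it from Riak.
            obj.delete()
            index.discard(key)
    index.store()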

Under normal operation, a key stays in Riak for a few milliseconds. Keys are stored for a longer time only in case of failures.
