@csabahenk
Created September 29, 2013 14:01

Multiple accounts per volume for Gluster for Swift

authors: Csaba Henk / csaba at redhat dot com, Ramana Raja / rraja at redhat dot com

Problem Statement

At present, Gluster for Swift (G4S) allows only one account to reside in a Gluster volume. An account maps to a single Gluster volume: the root directory of the volume's mount point ( /mnt/gluster-object/$vol_name ) serves as the account.

G4S needs to allow multiple accounts to reside in the same Gluster volume. To enable such a feature, an account would need to be implemented as a subdirectory (child) of a Gluster volume mount point's root directory ( /mnt/gluster-object/$vol_name/$acc_name ).

Proposed Design and Implementation

Store the account name and volume name (Gluster volume) in the ring files

The ring files "map the names of entities stored on disk and their physical location"¹. At present only the volume name is stored.

One of the fields of the ring data structure is a list of the devices in the cluster. Each element of this list is a dictionary that identifies the drive where the data actually resides. The 'device' key of the dictionary corresponds to the "on disk name of the device on the server"². G4S currently sets the 'device' key to the Gluster volume name. The idea is to use the same dictionary to store the account name as well, by setting its pre-existing but currently unused 'meta' key³ to the account name.
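To make the idea concrete, here is a minimal sketch of what device entries might look like before and after the change. Only the 'device' and 'meta' keys are taken from the text above; the other keys and all concrete values (IDs, address, port, account and volume names) are illustrative assumptions, not actual G4S output.

```python
# Hypothetical device entry as G4S builds it today.
# Only 'device' and 'meta' matter here; every value is made up.
current_dev = {
    "id": 0,
    "zone": 1,
    "ip": "127.0.0.1",
    "port": 6012,
    "weight": 100.0,
    "device": "vol1",  # Gluster volume name; today it doubles as the account
    "meta": "",        # currently unused by G4S
}

# Proposed: 'device' keeps naming the volume, 'meta' names the account,
# so several accounts can point at the same volume.
proposed_devs = [
    dict(current_dev, id=0, meta="acc1"),
    dict(current_dev, id=1, meta="acc2"),
]

# The account -> volume mapping can then be read back from the device list.
account_to_volume = {dev["meta"]: dev["device"] for dev in proposed_devs}
print(account_to_volume)
```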

The above would be done by modifying the gluster-swift-gen-builders script, which builds the ring files for G4S. The script would now take both volume names and the accounts (to be created in the volumes) as input parameters.

For example, the volume names and account names would be passed to the ring builder script as follows:

# gluster-swift-gen-builders acc1[:vol1] acc2[:vol2] ...

whereby

  • the volume part of the argument (after the colon) is optional; if omitted, it is assumed to be the same as the account name (which keeps the command-line interface backward compatible);
  • account names are required to be pairwise distinct (so that the account → volume mapping is well defined).
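The two rules above could be sketched as follows. This is only an illustration of the argument handling, assuming a hypothetical helper named parse_account_specs; the actual gluster-swift-gen-builders script is a shell script and is not structured this way.

```python
def parse_account_specs(specs):
    """Turn 'acc[:vol]' arguments into an account -> volume mapping.

    Hypothetical helper, not part of G4S.
    """
    mapping = {}
    for spec in specs:
        account, sep, volume = spec.partition(":")
        if not account:
            raise ValueError("missing account name in %r" % spec)
        if sep and not volume:
            raise ValueError("empty volume name in %r" % spec)
        if account in mapping:
            # Account names must be pairwise distinct.
            raise ValueError("duplicate account name: %r" % account)
        # An omitted volume defaults to the account name
        # (backward compatible with the volume-only invocation).
        mapping[account] = volume or account
    return mapping


print(parse_account_specs(["acc1:vol1", "acc2:vol1", "vol3"]))
```

Note that several accounts may share a volume, while a repeated account name is rejected.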

In the gluster-swift-gen-builders script the devices would be added to the cluster using the following command,

swift-ring-builder <builder_file> add [--region <region>] --zone <zone> --ip <ip> \
 --port  <port> --replication-ip <r_ip> --replication-port <r_port> \
 --device <GlusterFS volume name> --meta <account name> --weight <weight>
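As a rough illustration of how an account/volume pair maps onto that command, the sketch below assembles the argument list in Python. The function name and all default values (address, ports, zone, weight) are assumptions made for the example; only the flags themselves come from the command shown above.

```python
def ring_add_command(builder_file, volume, account,
                     ip="127.0.0.1", port=6010, zone=1, weight=100.0):
    """Build the argv for adding one account/volume pair to a builder file.

    Hypothetical helper; defaults are placeholders, not G4S settings.
    """
    return [
        "swift-ring-builder", builder_file, "add",
        "--zone", str(zone),
        "--ip", ip,
        "--port", str(port),
        # Assumption: replication traffic uses the same address and port.
        "--replication-ip", ip,
        "--replication-port", str(port),
        "--device", volume,   # Gluster volume name
        "--meta", account,    # account name
        "--weight", str(weight),
    ]


print(" ".join(ring_add_command("account.builder", "vol1", "acc1")))
```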

The present gluster-swift-gen-builders script creates new builder files each time it is run. This means that whenever G4S users want to add a device to a cluster, they also have to pass all the devices already in the cluster as command line arguments to the script, so they are burdened with finding the existing devices. Currently this can be done fairly easily by listing the Gluster volumes. Note, however, that this procedure fails to yield the correct list when not all of the user's Gluster volumes are configured as Swift devices.

However, once volumes and accounts are separated, the exact configuration would be stored only in the ring files. We can't require users to remember the account:volume list themselves or to unpack the ring files to extract it. So a listing command is required, tentatively named gluster-swift-list-accounts. It would:

  • extract the volume/account pairs from each ring file ( /etc/swift/{account,container,object}.ring.gz )
  • consistency check 1: check whether the volume and the account exist for each node in the ring files
  • consistency check 2: check whether the volume/account pair lists of the ring files are identical
  • if either of the above checks fails, exit with an error suggesting "ring files are corrupt or Swift deployment is not Gluster based"
  • if the checks pass, print out the pairs in account:volume ... format
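The steps above can be sketched as follows. To stay self-contained, the sketch works on plain lists of device dictionaries rather than loading real ring files. The function names, the exception class, and the interpretation of consistency check 1 as "the account directory exists under the volume mount point" are all assumptions of this example.

```python
import os


class RingConsistencyError(Exception):
    """Raised when the ring data fails a consistency check (hypothetical)."""


def account_volume_pairs(devs):
    # Skip None entries in case the device list has holes.
    return sorted({(dev["meta"], dev["device"])
                   for dev in devs if dev is not None})


def list_accounts(rings, mount_root="/mnt/gluster-object",
                  exists=os.path.isdir):
    """rings maps a ring name to its list of device dicts."""
    message = "ring files are corrupt or Swift deployment is not Gluster based"
    pair_lists = [account_volume_pairs(devs) for devs in rings.values()]
    # Consistency check 2: all rings carry the same account/volume pairs.
    if not pair_lists or any(p != pair_lists[0] for p in pair_lists[1:]):
        raise RingConsistencyError(message)
    # Consistency check 1 (assumed meaning): the account directory exists
    # inside the mounted volume.
    for account, volume in pair_lists[0]:
        if not account or not exists(os.path.join(mount_root, volume, account)):
            raise RingConsistencyError(message)
    return " ".join("%s:%s" % pair for pair in pair_lists[0])


devs = [{"device": "vol1", "meta": "acc1"}, {"device": "vol1", "meta": "acc2"}]
rings = {"account": devs, "container": devs, "object": devs}
# The filesystem check is stubbed out so the example runs anywhere.
print(list_accounts(rings, exists=lambda path: True))
```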

The users could now add accounts in the following form:

acc2v_old=`gluster-swift-list-accounts` && gluster-swift-gen-builders $acc2v_old newacc1:vol1

This would still be suboptimal, as proper, fault-tolerant account-adding should be built into gluster-swift-gen-builders; however, we refrain from making substantial changes to the current form of the script, as we think it should be rewritten in Python before trying to enhance it.

Modify the account → device (volume) lookup

  • The REST client makes a request of the form /account[/container[/object]].
  • The request is intercepted by the Swift proxy server, which looks up the device (G4S: volume) corresponding to this URL and passes the request on to the appropriate internal server (account / container / object). In detail:
  1. the main Swift routine that takes care of routing is swift.proxy.controllers.base.Controller.GETorHEAD_base
  2. it fetches the backend parameters from the ring in the form of node dicts via the iter_nodes method (in case of G4S there is just a single node)
  3. that is passed down to the http_connect method which sends the request to the appropriate internal server using the /device/partition/account[/container[/object]] address format.

In the case of G4S, step 2 is monkey-patched to use the gluster.swift.common.ring.Ring._get_part_nodes method, which currently looks up the node whose 'device' value is the same as the requested account. We would instead look up the node whose 'meta' value matches the account; its 'device' would thus become an independent specifier for the Gluster volume of the storage backend.
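The difference between the two lookups can be illustrated with a minimal sketch over a plain list of device dictionaries. The function names are hypothetical and this is not the actual _get_part_nodes code; the partition value 0 is a placeholder.

```python
def nodes_by_device(devs, account):
    """Current behaviour as described above: 'device' equals the account."""
    return [dev for dev in devs if dev["device"] == account]


def nodes_by_meta(devs, account):
    """Proposed behaviour: 'meta' carries the account name."""
    return [dev for dev in devs if dev["meta"] == account]


def internal_path(node, partition, account):
    # Address format used towards the internal servers.
    return "/%s/%s/%s" % (node["device"], partition, account)


devs = [
    {"device": "vol1", "meta": "acc1"},
    {"device": "vol1", "meta": "acc2"},
]

# With the current lookup, 'acc2' finds nothing, since no volume has that name.
print(nodes_by_device(devs, "acc2"))
# With the proposed lookup, 'acc2' resolves to a node on volume 'vol1'.
node = nodes_by_meta(devs, "acc2")[0]
print(internal_path(node, 0, "acc2"))
```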

Modify the storage backend layout and the internal REST API → path mapping

G4S monkey-patches the GET method of the internal account/container/object servers with routines that map the /device/partition/account[/container[/object]] address to filesystem paths, according to the layout used on the Gluster storage node.

The internal servers instantiate G4S specific classes – respectively,

whereby the first two are application-specific customization of a common base class, gluster.swift.common.DiskCommon.

These classes implement the request → filesystem path mapping and interact with the local filesystem. All of them are initialized with the components of the REST request, which include the two parameters device and account. The account, however, is silently ignored because, in the current model, it carries no additional information.
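A minimal sketch of the path mapping, contrasting the current layout with the one proposed in the problem statement ( /mnt/gluster-object/$vol_name/$acc_name ). The function names are hypothetical, and placing containers and objects as nested path components below the account level is an assumption of this example rather than a description of the G4S classes.

```python
import posixpath

MOUNT_ROOT = "/mnt/gluster-object"


def current_path(device, account, container=None, obj=None):
    # Current model: the account is ignored, the volume root is the account.
    parts = [MOUNT_ROOT, device] + [p for p in (container, obj) if p]
    return posixpath.join(*parts)


def proposed_path(device, account, container=None, obj=None):
    # Proposed model: the account becomes a directory level under the volume.
    parts = [MOUNT_ROOT, device, account] + [p for p in (container, obj) if p]
    return posixpath.join(*parts)


print(current_path("vol1", "vol1", "photos", "cat.jpg"))
print(proposed_path("vol1", "acc2", "photos", "cat.jpg"))
```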

As discussed above, this will no longer be the case: the account will be independent information, and it will form a new layer in the path hierarchy. The disk utility classes should therefore take the account into consideration and perform path manipulations accordingly. In particular,

Design Concerns

  • How do we make sure that a user cannot accidentally or intentionally access another user's account, i.e., access other directories in the Gluster volume?

  • How can we limit the storage usage of an account/user? Maybe we can use Gluster-quota's CLI to enforce a storage limit for an account/user?

  • How do we delete accounts?

  • Can we allow the previous users of G4S to smoothly upgrade to the revised G4S that'd allow multiple accounts to reside in a Gluster volume?

¹ Swift Documentation, "Swift Architectural Overview"

² Swift Documentation, "The Rings"

³ ibid.

⁴ Being able to list the accounts is a desired introspection feature, regardless of the account-adding problem. Therefore gluster-swift-list-accounts is not just a stop-gap measure until proper integrated account addition is implemented, but a useful component in its own right.

⁵ OpenStack Object Storage API v1 Reference: API for accounts, containers, objects.

⁶ Also the partition number, but in the case of G4S that is a synthetic dummy value.
