@balamurugana
Created September 17, 2018 11:28

DataStore

A lock-free storage layer which supports uploading, downloading, and deleting data via Put, Get, and Delete respectively. Every Put uses the tmp directory as interim storage, and every Delete is staged: actual removal is done once all in-flight Gets are finished.

DataStore
|-- data/
|   `-- <INDEX>/
|       `-- <UUID>/
|           |-- <DATA>
|           `-- <DATA>.checksum
`-- tmp/
  • INDEX is the first two hex characters (the most significant byte) of the UUID.
  • All data stored under a UUID is checksummed. The checksum file format is as follows:
{Single line JSON of checksum header}\n
<Block-1 checksum>\n
<Block-2 checksum>\n
<Block-3 checksum>\n
...
...
<Block-N checksum>\n
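The INDEX-based path layout above can be sketched as a small helper. `dataDir` is a hypothetical name, not from the gist; it only illustrates how a UUID maps to its `data/<INDEX>/<UUID>` directory.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// dataDir returns the datastore directory for a UUID: <root>/data/<INDEX>/<UUID>,
// where INDEX is the first two hex characters (most significant byte) of the UUID.
// Illustrative helper; the actual implementation is not part of the gist.
func dataDir(root, uuid string) string {
	index := uuid[:2]
	return filepath.Join(root, "data", index, uuid)
}

func main() {
	fmt.Println(dataDir("/store", "cb20e9ca-5c3c-443c-25ae-24a3220c634b"))
}
```

Bucketing by the most significant byte spreads UUIDs across at most 256 subdirectories, keeping any single directory from growing unbounded.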

The checksum header is as follows:

	type ChecksumHeader struct {
		HashName   string `json:"hashName"`
		HashKey    string `json:"hashKey"`
		HashLength int    `json:"hashLength"`
		BlockSize  int    `json:"blockSize"`
		BlockCount int    `json:"blockCount"`
		DataLength int64  `json:"dataLength"`
	}

Erasure backend

The erasure backend is a virtual disk composed of multiple local and/or remote disks. The erasure disk performs Put, Get, Delete and List of data under a cluster-level lock (involving all disks), keeping the critical region as small as possible.

               ErasureDisk
                    |
  +---------+-------+---------------+
  |         |                       |
Disk-1    Disk-2    ...    ...    Disk-N

Each local or remote disk is laid out as follows:

<DISK>/
|-- buckets/
|   `-- <BUCKET>/
|       |-- meta.json
|       `-- objects/
|           `-- <OBJECT>/
|               |-- meta.json
|               `-- meta.json.<VERSION_ID>
|-- data/
|   `-- <INDEX>/
|       `-- <UUID>/
|           |-- <DATA>
|           `-- <DATA>.checksum
|-- tmp/
`-- trans/

Put with minimum lock

  1. Upload the input stream into the datastore under a UUID on all disks.
  2. Lock cluster level.
  3. Create object meta.json with reference to datastore.
  4. Unlock cluster level.

Get with minimum lock

  1. Read-Lock cluster level.
  2. Read the object's meta.json.
  3. Get data stream from datastore of all disks.
  4. Read-unlock cluster level.
  5. Erasure decode the data stream and write to the client.
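The Get sequence can be sketched the same way, again with a local `sync.RWMutex` standing in for the cluster-level lock and an in-memory map standing in for meta.json; `objectStore` and `get` are illustrative names only.

```go
package main

import (
	"fmt"
	"sync"
)

// objectStore stands in for the per-disk bucket/object metadata.
type objectStore struct {
	mu   sync.RWMutex      // stands in for the cluster-level lock
	meta map[string]string // object name -> datastore UUID (stands in for meta.json)
}

// get holds only a read lock, so concurrent Gets never block each other;
// only a writer (Put/Delete) excludes them.
func (s *objectStore) get(object string) (string, error) {
	s.mu.RLock()               // step 1: read-lock cluster level
	uuid, ok := s.meta[object] // step 2: read the object's meta.json
	// step 3: the data stream would be fetched from the datastore of all
	// disks here, while the read lock is still held.
	s.mu.RUnlock() // step 4: read-unlock cluster level
	if !ok {
		return "", fmt.Errorf("%s: object not found", object)
	}
	// step 5: erasure decoding and writing to the client happen unlocked.
	return uuid, nil
}

func main() {
	s := &objectStore{meta: map[string]string{"photo.jpg": "cb20e9ca-5c3c-443c-25ae-24a3220c634b"}}
	uuid, err := s.get("photo.jpg")
	fmt.Println(uuid, err)
}
```

Note that the erasure decode (step 5) is deliberately outside the lock: it is CPU-bound work that does not touch shared metadata.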