Sachin Jayant Joshi sachin-j-joshi

@sachin-j-joshi
sachin-j-joshi / Restore-state-from-Long-Term-Storage.md
Last active September 13, 2021 16:56
Restore state from Long Term Storage

Version 0.9.1

Long Term Storage

We are simplifying the Long-Term Storage design and laying the foundation for supporting more external storage adapters and more exciting features in the future.

We continue to improve the new storage layout. In this release we have fixed critical issues (such as 5853 and 5456) and added important improvements (such as 5698 and 5460).

For more details, please refer to PDP-34 SLTS.

sachin-j-joshi / SLTS-boot.md
Last active April 22, 2021 20:21
SLTS : Log Structured Snapshotting Metadata Store

Bootstrap

  1. During Segment Store startup, the container metadata and storage metadata table segments are added as pinned segments inside SLTS.
  2. During startup of the secondary services, chunks for the container metadata and storage metadata table segments are discovered, and metadata about them is populated in the storage metadata segment. (This is done to avoid a circular dependency.)

Why does SLTS need a System Journal?

[Image: Turtle all the way down]

SLTS Storage System Segments.

SLTS stores metadata about all segments and their chunks in special metadata table segments. In this document we refer to them as "Storage System Segments".

sachin-j-joshi / SLTS-Garbage-Collector.md
Last active March 30, 2021 20:00
SLTS Garbage Collection Design

Background

Key Features

  • SLTS operations do not immediately delete the chunks that are not needed.
  • Instead, chunks to be deleted are handled as follows:
    • They are first marked for deletion in metadata, and that change is committed as part of the SLTS operation (e.g. during truncate, concat, or write).
    • The name of the chunk is put into the GC queue.
    • Each container has a dedicated background thread that polls this GC queue and deletes the chunks and the associated metadata.
    • The actual deletion task happens on the storage thread.
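
The flow described above can be sketched roughly as follows. This is a minimal illustration in Python, not the SLTS implementation: `ChunkStore`, `GarbageCollector`, `mark_for_deletion`, and the `active` flag are hypothetical names, and an in-memory dictionary stands in for both chunk storage and the metadata store.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class ChunkStore:
    """Hypothetical in-memory stand-in for chunk storage plus its metadata."""

    def __init__(self):
        self.chunks = {}    # chunk name -> bytes
        self.metadata = {}  # chunk name -> {"active": bool}

    def add(self, name, data):
        self.chunks[name] = data
        self.metadata[name] = {"active": True}


class GarbageCollector:
    """Sketch of deferred deletion: mark, enqueue, delete later in the background."""

    def __init__(self, store):
        self.store = store
        self.gc_queue = queue.Queue()
        # Assumption: a single-threaded executor models the "storage thread".
        self.storage_executor = ThreadPoolExecutor(max_workers=1)

    def mark_for_deletion(self, name):
        # Mark the chunk in metadata (in SLTS this change is committed as
        # part of the operation itself), then enqueue its name for the GC.
        self.store.metadata[name]["active"] = False
        self.gc_queue.put(name)

    def _delete(self, name):
        # Runs on the storage executor; deletes only chunks still marked inactive.
        meta = self.store.metadata.get(name)
        if meta is not None and not meta["active"]:
            self.store.chunks.pop(name, None)
            del self.store.metadata[name]

    def run(self, stop_event, poll_interval=0.05):
        # Background loop: poll the GC queue and hand deletions to the storage thread.
        while not stop_event.is_set():
            try:
                name = self.gc_queue.get(timeout=poll_interval)
            except queue.Empty:
                continue
            self.storage_executor.submit(self._delete, name).result()
            self.gc_queue.task_done()

    def shutdown(self):
        self.storage_executor.shutdown(wait=True)


def demo():
    store = ChunkStore()
    store.add("chunk-1", b"obsolete")
    store.add("chunk-2", b"live")
    gc = GarbageCollector(store)
    stop = threading.Event()
    worker = threading.Thread(target=gc.run, args=(stop,), daemon=True)
    worker.start()
    gc.mark_for_deletion("chunk-1")  # e.g. after a truncate
    gc.gc_queue.join()               # wait until the queued deletion is processed
    stop.set()
    worker.join()
    gc.shutdown()
    return sorted(store.chunks)
```

Calling `demo()` marks one chunk, waits for the background worker to drain the queue, and returns the names of the chunks that remain. Separating the marking step from the physical delete keeps the foreground operation cheap, since it only touches metadata and a queue.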
sachin-j-joshi / Simple-Storage-Subsystem-Design-Backup-11-12.md
Created November 12, 2019 18:45
Simple-Storage-Subsystem-Design-Backup-11-12.md

Background

This document discusses the design for the Unified Chunk Management Layer, which is part of a larger effort, the Pravega Simplified Storage Subsystem (S3).

This also covers the design for PDP 34: CRWD Tier 2. https://github.com/pravega/pravega/wiki/PDP-34:-CRWD-Tier-2

Motivation

Here are some of the problems we face today that motivate this design:

  • HDFS requires complex fencing logic, which makes it slow and hard to maintain.
  • There is an impedance mismatch between the current model and the way Amazon S3 works.
sachin-j-joshi / pdp-38.md
Created October 30, 2019 18:27
PDP-38: Support for multi-tier storage

Summary

Motivation

Support multiple “tiers”

  • Cloud storage tier - Amazon S3, Azure, GCP
  • Cold storage tier
  • Edge tier
  • Fast/expensive tiers with fancy hardware (e.g. Optane)

Current situation

Prerequisite

This PDP requires implementing PDP-34

sachin-j-joshi / Feature-Design-Document-Template.md
Created October 29, 2019 18:52
Feature Design Document Template

Background

This Document discusses design for .

Motivation

Here are some of the problems we are facing today that are motivating this design

  • Problem 1
  • Problem 2
  • Problem 3
sachin-j-joshi / Pravega-Simplified-Storage-Subsystem-(S3)-design-backup.md
Created October 14, 2019 21:02
This document discusses the design for the Pravega Simplified Storage Subsystem (S3). This also covers the design for PDP 34: CRWD Tier 2. https://github.com/pravega/pravega/wiki/PDP-34:-CRWD-Tier-2

Background

This document discusses the design for the Pravega Simplified Storage Subsystem (S3).

This also covers the design for PDP 34: CRWD Tier 2. https://github.com/pravega/pravega/wiki/PDP-34:-CRWD-Tier-2

Design Objectives

Goals

  • Support additional cloud storage options by simplifying the API contract and requirements on the tier-2 storage bindings.
sachin-j-joshi / Pravega-Simplified-Storage-Subsystem-(S3)-design.md
Last active August 14, 2020 16:44
Pravega Simplified Storage Subsystem (S3)

Background

This document discusses the design for the Unified Chunk Management Layer, which is part of a larger effort, the Pravega Simplified Storage Subsystem (S3).

This also covers the design for PDP 34: CRWD Tier 2. https://github.com/pravega/pravega/wiki/PDP-34:-Simplified-Tier-2

Motivation

Here are some of the problems we face today that motivate this design:

  • HDFS requires complex fencing logic, which makes it slow and hard to maintain.
  • There is an impedance mismatch between the current model and the way Amazon S3 works.