brson/engine-abstraction-update.md Secret

## engine-abstraction-update.md

    TiKV engine abstraction status update

Hi All.
TiKV is in the process of encapsulating RocksDB in a family of generic traits,
with the intent to add support for more storage engines.
The tracking issue for this effort is
https://github.com/tikv/tikv/issues/6402

This is a status update on what has been done so far, what the plan is going
forward, and a description of how to help. Help is wanted.
Thanks to @5kbpers, @aknuds1, and @hicqu for help.
TL;DR

The engine_traits crate contains a new set of abstract key/value storage
engine traits, implemented by engine_rocks. Relatively soon, TiKV will
interact with the storage engine entirely through these traits, and have no
direct dependencies on RocksDB.
Enough work has been completed to have some confidence in the process, but
there is much more to be done.
You can help by claiming tasks on the tracking issue.
TiKV developers should read on to understand what is happening and should read
the engine_traits crate docs for details.
Current status

As of today there is a crate, engine_traits, that defines a large family of
storage engine traits and their associated types. These traits closely
mirror the design of the RocksDB engine wrappers defined in rust-rocksdb and
the current engine crate, with the primary exception that the crate has no
dependencies on any RocksDB code.
There is another crate, engine_rocks, that implements these traits for
RocksDB.
Neither is complete.
The (old) engine crate contains similar traits, with wrappers around
rust-rocksdb, but they are not isolated from the underlying engine. The first
phase of the current engine abstraction effort is to migrate callers of engine
to call engine_traits.
We have been slowly migrating TiKV to use engine_traits and engine_rocks,
learning how to do so in the process. There have been a lot of false starts
and backtracking, but today we have completed the following:

Redefined most of the APIs from engine in engine_traits

tikv/tikv#5445
tikv/tikv#5696


Migrated the sst_importer crate completely to engine_traits,
removing the concrete RocksDB dependency.

tikv/tikv#5657
tikv/tikv#5835


Migrated all Snapshot callers to engine_traits and engine_rocks

tikv/tikv#5901
tikv/tikv#6006


"Pulled up" generic Snapshots through parts of the TiKV codebase

tikv/tikv#6122


More Snapshot abstraction

tikv/tikv#6196


Detailed design

I'm not going to describe the detailed design here. Instead, look at the crate
docs for engine_traits. They describe the design, the porting
process, and tips for how to use the new APIs successfully, particularly during
the transition to the new abstractions.
Note that at this time we are not attempting to redesign the storage engine
abstraction. That we will do in the future. For now we are simply trying to
eliminate TiKV's direct dependency on RocksDB by adding an intermediate
abstraction layer.
What to expect in the future

The TiKV codebase is going to begin to carry more generic type parameters.
Essentially any code that transitively depends on the storage engine will carry
one extra type parameter, usually E: KvEngine, but sometimes over some other
trait.
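
To give a feel for what "carrying one extra type parameter" looks like, here is a minimal sketch. The `KvEngine` trait below is a tiny stand-in, not the real `engine_traits::KvEngine`, and `MemEngine` and `RegionStore` are hypothetical names invented for illustration only.

```rust
use std::collections::BTreeMap;

// Tiny stand-in for the real `engine_traits::KvEngine`, which is much larger.
pub trait KvEngine {
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
}

// Hypothetical in-memory engine, so the sketch runs without RocksDB.
#[derive(Default)]
pub struct MemEngine {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KvEngine for MemEngine {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
}

// Hypothetical component that reaches the engine only through the trait.
// The `E: KvEngine` parameter is the "one extra type parameter".
pub struct RegionStore<E: KvEngine> {
    engine: E,
}

impl<E: KvEngine> RegionStore<E> {
    pub fn new(engine: E) -> Self {
        RegionStore { engine }
    }

    // Store some state under the big-endian encoding of the region id.
    pub fn save(&mut self, region_id: u64, state: &[u8]) {
        self.engine.put(&region_id.to_be_bytes(), state);
    }

    // Read back whatever was stored for the region, if anything.
    pub fn load(&self, region_id: u64) -> Option<Vec<u8>> {
        self.engine.get(&region_id.to_be_bytes())
    }
}
```

Anything that owns or calls a `RegionStore<E>` then needs its own `E: KvEngine` parameter, which is how the parameter spreads through the codebase.
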
The engine crate is going to quickly disappear. If you find it is missing APIs
you expect, look instead in engine_traits. For now it is possible in many
parts of TiKV to use these traits concretely through the Rocks* types in
engine_rocks, but in the future they will only be available through
bounded type parameters.
During the transition you will see .c() methods temporarily sprinkled through
the TiKV codebase. These are making conversions like from Arc<DB> to
RocksEngine, and will go away soon.
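
As a rough illustration of the idea behind `.c()`, the sketch below converts a shared raw handle into a wrapper type at the call site. `DB` and `RocksEngine` here are stand-ins, and the `Compat` trait with its signature is an assumption made for this example; check engine_rocks for the actual definition.

```rust
use std::sync::Arc;

// Stand-in for the raw database handle from rust-rocksdb.
pub struct DB {
    pub path: String,
}

// Stand-in for `engine_rocks::RocksEngine`: a wrapper around the shared handle.
#[derive(Clone)]
pub struct RocksEngine {
    db: Arc<DB>,
}

impl RocksEngine {
    pub fn from_db(db: Arc<DB>) -> Self {
        RocksEngine { db }
    }

    pub fn path(&self) -> &str {
        &self.db.path
    }
}

// Hypothetical compatibility trait: `c()` turns an old-style value into
// the corresponding new-style wrapper.
pub trait Compat {
    type Other;
    fn c(&self) -> Self::Other;
}

impl Compat for Arc<DB> {
    type Other = RocksEngine;

    fn c(&self) -> RocksEngine {
        // Cloning the Arc only bumps a reference count; the database is shared.
        RocksEngine::from_db(Arc::clone(self))
    }
}

// New-style code that accepts only the wrapper type.
pub fn describe(engine: &RocksEngine) -> String {
    format!("engine at {}", engine.path())
}
```

Old-style code holding an `Arc<DB>` can then call `describe(&db.c())` without being rewritten first, which is why such calls are only needed during the transition.
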
Please resist adding new dependencies on engine or rust-rocksdb. Especially
if you find yourself working on code that is already using engine_traits and
engine_rocks, try to extend those instead.
How to help

We need help.
Read the engine_traits crate docs for design guidelines,
porting guidelines, and refactoring tips.
Coordinate on tikv/tikv#4184
The issue is updated to contain a checklist of tasks that definitely need to
happen, along with the name of whoever is working on each. I do not know,
though, in what order they need to happen. Instead, I have documented my own efforts
here
When you decide to take a task, say so on the issue
before attempting it so that someone else doesn't duplicate your work.
Some porting principles

I know some people are eager for this work to be done so that they can
begin introducing new storage engines.
Unfortunately, I cannot easily identify exactly what needs to happen beyond one
or two steps. But I can offer some direction, advice, and coordination.
In the effort so far I have consistently found that I only discovered the next
step after trying the wrong next step, often several times. This is a big
refactoring effort, so that's not surprising.
I have two guidelines that I follow to decide what to do next:

Use the crate system to forcibly break dependencies. Do this by completely
removing access to APIs that break the desired abstraction boundaries. e.g.

define the engine traits in their own crate that does not list any
concrete RocksDB dependencies in the manifest, so it is impossible to
break the abstraction.
e.g. when porting the sst_importer crate, completely remove any concrete RocksDB
dependencies, so that once the work is done it can't be accidentally undone.
e.g. when abstracting Snapshot, migrate all callers at once and delete the
old Snapshot API, so that it can't be reintroduced.


Never duplicate code. When code is duplicated in an active codebase, one of
those duplications will end up wrong. We migrate an entire subsystem at a time
without leaving the possibility of accessing the feature any way but through
the abstraction.

Based on what I've learned, the port is happening in several phases:
1) Migrating the `engine` abstractions
2) Eliminating direct-use of `rocksdb` re-exports
3) "Pulling up" the generic abstractions through TiKV
4) Isolating test cases from RocksDB

These are described in the engine_traits crate docs. Those
who would like to contribute should read them.
These phases need to happen more-or-less in sequence, but can be partially
completed in parallel. For example, now that Snapshot has been migrated from
engine to engine_traits, with TiKV depending on concrete RocksSnapshot
types, efforts can begin to "pull up" generic type parameters through TiKV for
code that uses snapshots, eliminating the dependency on the concrete
RocksSnapshot types.
Obstacles encountered so far

Almost every patch I begin turns up other work that must be done first. Every
patch ends up being a process of hacking, stashing, hacking on a prerequisite,
stashing, hacking on a prerequisite of a prerequisite, etc. No patch so far has
ended with me actually accomplishing what I set out to. I've thrown away a lot
of code.
Associated types can't contain lifetimes. This requires "generic associated
types", which is not implemented in Rust. To compensate, some abstractions need
to be modified so that they carry reference counted pointers in order to hold
resources open.
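
A minimal sketch of that compensation, using stand-in types rather than the real engine_traits definitions: because the associated `Snapshot` type cannot take a lifetime parameter, the snapshot owns a reference-counted pointer to the database instead of borrowing from the engine. `Db`, `ArcEngine`, and `ArcSnapshot` are hypothetical names, and the sketch shows only the ownership shape, not real point-in-time snapshot semantics.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

// Stand-in for a database handle that must stay open while snapshots exist.
pub struct Db {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
}

pub trait Snapshot {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
}

pub trait KvEngine {
    // Without generic associated types this cannot be `type Snapshot<'a>`,
    // so the snapshot type cannot be declared as borrowing from `&self`.
    type Snapshot: Snapshot;
    fn snapshot(&self) -> Self::Snapshot;
}

pub struct ArcEngine {
    db: Arc<Db>,
}

impl ArcEngine {
    pub fn new(data: BTreeMap<Vec<u8>, Vec<u8>>) -> Self {
        ArcEngine { db: Arc::new(Db { data }) }
    }
}

// The snapshot holds its own reference-counted pointer to the database.
pub struct ArcSnapshot {
    db: Arc<Db>,
}

impl Snapshot for ArcSnapshot {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.db.data.get(key).cloned()
    }
}

impl KvEngine for ArcEngine {
    type Snapshot = ArcSnapshot;

    fn snapshot(&self) -> ArcSnapshot {
        // Bump the reference count so the database outlives this engine handle.
        ArcSnapshot { db: Arc::clone(&self.db) }
    }
}
```

The cost of this shape is a reference-count increment per snapshot; the benefit is that a snapshot stays usable even after the engine handle that produced it has been dropped.
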
I have recently started the generics "pull up" phase, that is, abstracting TiKV
over engines. In the process I have run into a new problem that needs to be
solved soon: an inability to specify that two different
associated types must be the same type. engine_traits is made up of a family
of traits, connected through associated types. Sometimes these associated types
identify the same type. e.g. Snapshot::KvEngine and
SstWriterBuilder::KvEngine. Sometimes callers need to know that these two
associated types are the same type in order to compile, but Rust can't express
this today.
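
To make the situation concrete, here is a small sketch with stand-in traits; the method names and the `Named*` types are hypothetical and are not taken from engine_traits. In a toy case like this one, a single generic function can tie the two associated types together with an associated-type binding in its `where` clause. I am assuming the difficulty described above is about stating that relationship once, across the real trait family, which this sketch does not attempt.

```rust
pub trait KvEngine {
    fn name(&self) -> String;
}

pub trait Snapshot {
    type KvEngine: KvEngine;
    fn engine(&self) -> &Self::KvEngine;
}

pub trait SstWriterBuilder {
    type KvEngine: KvEngine;
    fn build_for(&self, engine: &Self::KvEngine) -> String;
}

// The `KvEngine = S::KvEngine` binding lets `snap.engine()` be passed to
// `builder.build_for`. Without it, the compiler treats the two associated
// types as unrelated and the call does not type-check.
pub fn writer_for_snapshot<S, W>(snap: &S, builder: &W) -> String
where
    S: Snapshot,
    W: SstWriterBuilder<KvEngine = S::KvEngine>,
{
    builder.build_for(snap.engine())
}

// Hypothetical concrete types, present only so the sketch can be exercised.
pub struct NamedEngine(pub String);

impl KvEngine for NamedEngine {
    fn name(&self) -> String {
        self.0.clone()
    }
}

pub struct NamedSnapshot(pub NamedEngine);

impl Snapshot for NamedSnapshot {
    type KvEngine = NamedEngine;

    fn engine(&self) -> &NamedEngine {
        &self.0
    }
}

pub struct NamedBuilder;

impl SstWriterBuilder for NamedBuilder {
    type KvEngine = NamedEngine;

    fn build_for(&self, engine: &NamedEngine) -> String {
        format!("sst writer for {}", engine.name())
    }
}
```

Even where this works, the binding has to be repeated on every function that mixes the two traits, so it is a per-call-site measure rather than a fix in the trait definitions.
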
Other observations

The engine_traits crate makes it quite easy to understand the
complex API surface of TiKV's storage engine, since it is mostly API,
little implementation, all in one place. This should be a boon for
understanding and contributing to TiKV.