@DallasC
Created December 28, 2020 02:17

Proposal: TiAi

Abstract

Develop a machine learning model using Deep Reinforcement Learning (DRL) to tune TiDB (TiKV/RocksDB) in order to decrease latency and increase throughput.

Background

Database tuning is a hard problem, and even experienced DBAs can have trouble getting optimal performance. This difficulty increases even more with a relatively new database like TiDB, since DBAs don't have decades of experience to build on as they do with more traditional SQL databases.

Large companies like Huawei, Oracle, Google, and Microsoft have implemented AI systems to help automatically tune database deployments. TiDB has also done some experimenting in this area with auto-tikv, which is based on OtterTune and uses ML techniques such as Gaussian Processes and clustering to learn tuning settings.

These companies' techniques are based on DRL, which has demonstrated better performance than OtterTune and custom tuning done by DBAs. This proposal aims to bring the same cutting-edge machine learning performance benefits to TiDB's auto-tuning offerings.

Proposal

Use DRL based on QTune's query-aware approach to develop a system that can automatically tune TiDB's configuration settings.

First we featurize the SQL queries to gather information about them, including query type, tables, and query cost. Then we feed the query features into the DRL model to dynamically choose suitable configurations. For the DRL model we implement a Double-State Deep Deterministic Policy Gradient (DS-DDPG) using actor-critic networks. The DS-DDPG model can automatically solve the tuning problem by learning the actor-critic policy according to both the database state and query information.

The overall plan can be broken down into 5 main components.

1. Query2Vec: Vectorize SQL request information so our models can be trained on it (a sketch follows this list).

  1. Vectorize query information
    Based on the SQL query, capture the operation (insert/select/update/delete) and the table(s) being queried, and store them in a (4 + t)-dimensional vector, where t is the number of tables. Each value is 1 if the corresponding operation or table is used and 0 otherwise.
  2. Vectorize cost information
    To get query costs we can use the query plan generated by the database engine, sum the cost estimates of its operations, and then normalize the cost by subtracting the mean and dividing by the standard deviation. We maintain a p-dimensional vector, where p is the number of available operation types.
  3. Create overall query vector
    Simply concatenate the query-information vector and the cost vector.
  4. Support multiple query vectors
    For a workload, take the union of the individual query vectors. For each table that is used, replace the 1 with the number of rows in that table. For the cost vector, sum up the costs.
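As a concrete illustration of steps 1-4, here is a minimal Python sketch of Query2Vec. It is only a sketch: the operation list, the plan-operator list, and the assumption that plan costs have already been extracted from the query plan are illustrative choices, not part of the proposal.

```python
# Minimal Query2Vec sketch. Assumptions: plan costs are already extracted from the
# query plan, and the operation / plan-operator vocabularies below are illustrative.
import numpy as np

OPERATIONS = ["insert", "select", "update", "delete"]            # the 4 query types
PLAN_OPERATORS = ["TableScan", "IndexScan", "HashJoin", "Sort"]  # p operation types (example)

def query2vec(op, tables, plan_costs, all_tables, cost_mean, cost_std):
    """Return the concatenation of the (4 + t)-dim query vector and the p-dim cost vector."""
    # (4 + t)-dimensional query-information vector: 1 if used, 0 otherwise
    q = np.zeros(len(OPERATIONS) + len(all_tables))
    q[OPERATIONS.index(op)] = 1.0
    for tbl in tables:
        q[len(OPERATIONS) + all_tables.index(tbl)] = 1.0

    # p-dimensional cost vector: sum the plan's cost estimates per operator, then
    # normalize by subtracting the mean and dividing by the standard deviation
    c = np.zeros(len(PLAN_OPERATORS))
    for operator, cost in plan_costs.items():
        c[PLAN_OPERATORS.index(operator)] += cost
    c = (c - cost_mean) / cost_std

    # overall query vector = concatenation of the two parts
    return np.concatenate([q, c])

# Example: a SELECT touching table "orders" out of the tables ["users", "orders"]
vec = query2vec("select", ["orders"], {"TableScan": 120.0, "Sort": 30.0},
                ["users", "orders"], cost_mean=50.0, cost_std=25.0)
```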

2. Setup Environment: Get metrics and stats from the database to feed into our models.

  1. Collect inner-state metrics. These are the configuration settings that can be tuned, like cache size, memory limits, etc.
  2. Collect outer-state metrics. These are the performance indicators, like the number of transactions, deadlocks, etc.
  3. Execute a workload and compute a reward based on performance (a minimal reward sketch follows this list).
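For step 3, the reward needs to compare performance under the new configuration against a baseline. A minimal sketch, assuming a simple weighted combination of throughput gain and latency reduction (the weights and metric names are assumptions, not part of the proposal):

```python
# Hypothetical reward function: positive when the new knobs improve performance.
def compute_reward(baseline, current, w_tps=0.5, w_lat=0.5):
    # relative throughput gain (higher throughput is better)
    tps_gain = (current["tps"] - baseline["tps"]) / baseline["tps"]
    # relative latency reduction (lower latency is better)
    lat_gain = (baseline["latency_p95"] - current["latency_p95"]) / baseline["latency_p95"]
    return w_tps * tps_gain + w_lat * lat_gain

# Example: metrics measured before and after applying a new configuration
baseline = {"tps": 1200.0, "latency_p95": 45.0}
current = {"tps": 1350.0, "latency_p95": 40.0}
print(compute_reward(baseline, current))  # positive reward => the tuning helped
```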

3. Train the Predictor: The Predictor aims to predict how the performance metrics change when processing a query in the database.

  1. Get training data
  • A set of tuples T = {〈v, S, I, ∆S〉}, where:
    • v is the feature vector of a query
    • S is the outer-state metrics
    • I is the inner-state metrics
    • ∆S is the change in the outer-state metrics when processing v
  • For each 〈v, S, I〉, we train the Predictor to output a value that is close to ∆S.
  • Training data is obtained for each query q as follows:
    1. Use Query2Vector to generate v and obtain the metrics S and I from the database.
    2. Run q in the database and record the metrics change ∆S.
  2. Train the Predictor, which is composed of four fully connected layers (a sketch of the network follows this list):
  • Input layer: accepts the feature vector and outputs the mapped tensor to the hidden layers.
  • 2 hidden layers: a series of non-linear data transformations.
  • Output layer: restricts the tensor to the scale of the database state and generates a vector representing the predicted metric changes.
  • Use ReLU for the hidden layers; the weights in the network are initialized from the standard normal distribution.
  • Use the Adam algorithm to train the Predictor.
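A minimal sketch of the Predictor described above, using PyTorch purely for illustration. The four-layer structure, ReLU activations, normal weight initialization, and Adam training come from the description above; the hidden width and all other details are assumptions.

```python
import torch
import torch.nn as nn

class Predictor(nn.Module):
    """Four fully connected layers: input -> two hidden (ReLU) -> output (predicted ∆S)."""
    def __init__(self, feature_dim, state_dim, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),   # input layer: maps the feature vector
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),    # hidden layer 1
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),    # hidden layer 2
            nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),     # output layer: predicted metric changes ∆S
        )
        # weights initialized from the standard normal distribution, as described above
        for layer in self.net:
            if isinstance(layer, nn.Linear):
                nn.init.normal_(layer.weight)
                nn.init.zeros_(layer.bias)

    def forward(self, x):
        return self.net(x)
```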

Pseudocode:

TrainPredictor(W, TP)
// W: the weights of the neural network
// TP: the training set
while (!converged) do
    foreach (v, S, I, ∆S) in TP do
        Generate the output G of 〈v, S, I〉;
        Accumulate the backward propagation error: E = E + 0.5 * |G − ∆S|^2;
    Compute gradient(E) and update the weights W
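The pseudocode maps to a standard supervised regression loop. A hedged PyTorch version, reusing the Predictor sketch above (the batch loader, learning rate, and fixed epoch count standing in for the convergence check are assumptions):

```python
import torch
import torch.nn as nn

def train_predictor(predictor, loader, epochs=50, lr=1e-3):
    """Minimize 0.5 * |G − ∆S|^2 over the training set TP, as in TrainPredictor above."""
    optimizer = torch.optim.Adam(predictor.parameters(), lr=lr)  # Adam, as specified
    mse = nn.MSELoss()
    for _ in range(epochs):                        # stand-in for the "while (!converged)" check
        for features, delta_s in loader:           # features = concat(v, S, I); target = ∆S
            g = predictor(features)                # the output G of 〈v, S, I〉
            loss = 0.5 * mse(g, delta_s)           # E = 0.5 * |G − ∆S|^2
            optimizer.zero_grad()
            loss.backward()                        # backward propagation of the error
            optimizer.step()                       # update the weights W
    return predictor
```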

4. Train the Actor-Critic Module: The Actor-Critic module aims to tune the database configurations.

  1. Get training data
    1. Given a query workload, randomly select a subset of queries and generate a sample workload.
    2. Generate its feature vector via Query2Vector.
    3. Predict the database metrics S′ via the Predictor.
    4. Get an action A (a vector of knob values) via the Actor.
    5. Deploy the action in the database.
    6. Run the workload against the database to get a reward R.
    7. Get a new database metric S′2 by updating S′ with the new metrics.
    8. Repeat the above steps to get A2 and R2.
    9. Iterating in this way, we get a set of triples T1A = 〈(S′1, A1, R1), (S′2, A2, R2), ..., (S′t, At, Rt)〉 until the average reward value is good enough (e.g., the average reward of ten runs is larger than 10).
  2. Train the Actor-Critic model (a sketch of the update step follows this list)
  • Use the experience replay method to train the Actor and Critic with the training data set T1A = 〈(S′1, A1, R1), (S′2, A2, R2), ..., (S′t, At, Rt)〉 from above.
  • For each triple (S′i, Ai, Ri), update the Actor and Critic as follows:
    1. Update the actor policy A using the gradient ∇A = ∇Ai Q(S′i, Ai | πC) · ∇θπA πA(S′i | θπA), where θπA denotes the parameters of A and Q(S′i, Ai | πC) is the Q-value computed by the Critic.
    2. Estimate the real action-value Y using the Bellman equation, combining the reward and the Q-value: Y = Ri + τ · Q(S′i+1, A(S′i+1 | θπA) | πC), where τ is a tuning factor that trades off the Q-value and the reward value.
    3. Calculate the loss value L from Q and Y; the Critic updates the weights in C by minimizing the loss L = (Q(S′i, Ai | πC) − Yi)^2.
  • We run these three steps for i = t−1 down to i = 1. The algorithm terminates if the model has converged (e.g., the performance improvement is smaller than a threshold); otherwise we select the next training data T2A, T3A, ···.
  • To improve the stability of training, we can introduce two extra target actor and critic networks (whose policies are A and C respectively). These two networks are updated at every step, but their weights (the parameters of the policies) are updated more slowly than those of the normal networks. The weights in the normal critic network are then updated by minimizing the loss computed against the target networks.
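To make the three update steps concrete, here is a hedged PyTorch sketch of a single DDPG-style update. It assumes the Critic is a network taking (state, action) as inputs, omits the target networks mentioned above, and uses the standard chain-rule form of the actor gradient; none of the shapes or hyperparameters are fixed by the proposal.

```python
import torch
import torch.nn.functional as F

def actor_critic_update(actor, critic, actor_opt, critic_opt, s, a, r, s_next, tau=0.99):
    """One (S′i, Ai, Ri) update following steps (1)-(3) above."""
    # (1) actor update: ascending ∇A Q(S′i, A | πC) · ∇θ πA(S′i | θ) is equivalent to
    #     descending the negated Q-value of the actor's own action
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # (2) Bellman target: Y = R + τ · Q(S′i+1, A(S′i+1 | θπA) | πC)
    with torch.no_grad():
        y = r + tau * critic(s_next, actor(s_next))

    # (3) critic update: minimize L = (Q(S′i, Ai | πC) − Y)^2
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
```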

Pseudocode:

TrainAgent(A, C, TA)
// A: the actor's policy
// C: the critic's policy
// TA: the training data
while (!converged) do
    Get a batch of training data
    for (i = t−1 down to 1) do
        Update the weights in A;
        Estimate the action-value Y;
        Update the weights in C by minimizing the loss value

5. Overall: Given a request (a query or a query workload), Query2Vec generates a vector based on the SQL. The Predictor takes the feature vector and generates a predicted state change ∆S, which is combined with the database's current metrics to obtain the predicted state S′. The Actor takes S′ as input and outputs an action (a vector of suggested knob values). We deploy the new configuration in the database and execute the query.
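Putting the pieces together, the online tuning path could look roughly like the sketch below. Every callable passed in (query2vec, get_metrics, get_knobs, apply_knobs, run_workload) is hypothetical glue code, not an existing TiDB or TiKV API.

```python
import numpy as np

def tune_and_run(sql, query2vec, get_metrics, get_knobs, predictor, actor,
                 apply_knobs, run_workload):
    """One online tuning step; all callable arguments are hypothetical glue code."""
    v = query2vec(sql)                               # Query2Vec feature vector for the request
    s = get_metrics()                                # current outer-state metrics S
    i = get_knobs()                                  # current inner state I (knob values)
    delta_s = predictor(np.concatenate([v, s, i]))   # Predictor: predicted metrics change ∆S
    s_pred = s + delta_s                             # predicted state S′ after the request
    knobs = actor(s_pred)                            # Actor: suggested knob values
    apply_knobs(knobs)                               # deploy the new configuration
    return run_workload(sql)                         # execute the query / workload
```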

This allows tuning to optimize either latency (query level) or throughput (workload level). Although we optimize for one at the cost of the other, it has been shown that the metric not being optimized for still outperforms other ML models like CDBTune and OtterTune. There is a method to optimize for both by adding clustering of the queries: this adds a couple of extra steps in which we take the query vector, pass it through another model that recognizes patterns, run clustering algorithms, and train the model based on the clusters. This would be useful for things like analytical queries. This extra cluster-level model can be added later as future work.

TrainDS-DDPG()
    Generate training data TP;
    TrainPredictor(W, TP);
    Generate training data TA;
    TrainAgent(A, C, TA);

Rationale

There are a number of studies on automatic knob tuning, ranging from heuristics to machine learning to, more recently, deep learning, e.g. BestConfig, OtterTune, and CDBTune.

Heuristic-based approaches like BestConfig use a heuristic method to search for the optimal configuration from history, but come with a big disadvantage: they may not find good knob values if there are no similar configurations in the historical data. These perform better than default settings but often can't compete with experienced DBAs.

OtterTune (which auto-tikv is based on) utilizes machine learning techniques to analyze configurations. This relies on a large number of high-quality training examples drawn from DBAs' experience, which can be hard to obtain.

CDBTune uses deep reinforcement learning (DRL) with an actor-critic network similar to this proposal, but it only takes metrics into account and doesn't consider query information, which can have a large effect on the tuning environment.

The drawback is that each approach takes more development effort than the previous one. Heuristic-based approaches are relatively straightforward to implement -> OtterTune adds more complexity and steps by using more modern machine learning techniques -> CDBTune takes it a step further and is based on leading-edge machine learning research -> QTune (what this proposal is based on) goes a step further still and adds more specific domain knowledge on top of cutting-edge DRL techniques.

Testing Plan

Workload: Use different query workloads such as JOB, Sysbench, etc.
Measure: Use latency and throughput to evaluate performance.
Compare: Compare against different approaches, i.e. default settings, auto-tikv, and (maybe) manual DBA tuning if we can find someone.

Open issues (if applicable)

N/A
