@DallasC
Created December 28, 2020 02:17

Proposal: TiAi

Abstract

Develop a machine learning model using deep reinforcement learning (DRL) to tune TiDB (and the underlying TiKV/RocksDB) in order to decrease latency and increase throughput.

Background

Database tuning is a hard problem, and even experienced DBAs can have trouble getting optimal performance. The difficulty is greater still with a relatively new database like TiDB, since DBAs don't have decades of experience to build on as they do with more traditional SQL databases.

Large companies like Huawei, Oracle, Google, and Microsoft have implemented AI systems to help automatically tune database deployments. TiDB has also experimented in this area with auto-tikv, which is based on OtterTune and uses ML techniques such as Gaussian Processes and clustering to learn tuning settings.

The large companies' techniques are based on DRL, which has demonstrated better performance than OtterTune and manual tuning by DBAs. This proposal aims to bring those same cutting-edge machine learning performance benefits to TiDB's auto-tuning offerings.

Proposal

Use DRL based on QTune's query-aware approach to develop a system that can automatically tune TiDB's configuration settings.

First we featurize the SQL queries, gathering information such as the query type, the tables involved, and the query cost. Then we feed the query features into the DRL model, which dynamically chooses suitable configurations. For the DRL model we implement a Double-State Deep Deterministic Policy Gradient (DS-DDPG) using actor-critic networks. The DS-DDPG model can automatically solve the tuning problem by learning the actor-critic policy according to both the database state and the query information.

The overall plan can be broken down into five main components.

1. Query2Vec: vectorize SQL request information so our models can be trained on it (a sketch of this vectorization follows the sub-steps below).

  1. Vectorize query information
    Based on the SQL query, capture the operation (insert/select/update/delete) and the table(s) being queried, and store them in a (4 + t)-dimensional vector, where t is the number of tables. Each value is 1 if that operation/table is used and 0 otherwise.
  2. Vectorize cost information
    To get query costs we can use the query plan generated by the database engine, sum the cost estimates of each operation, then normalize the cost by subtracting the mean and dividing by the standard deviation. We maintain a p-dimensional vector, where p is the number of available operation types.
  3. Create the overall query vector
    Concatenate the query-information vector and the cost vector.
  4. Support multiple query vectors
    For a workload we take the union of its query vectors. For each table that is used, the 1 is replaced with the number of rows in the table, and the cost vectors are summed.
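
As a rough illustration only (the operation list, example table names, plan-operator categories, and the `plan_costs` input format are hypothetical placeholders, not TiDB APIs), the vectorization could look something like this:

```python
import numpy as np

OPERATIONS = ["insert", "select", "update", "delete"]             # 4 operation slots
TABLES = ["t1", "t2", "t3"]                                       # t tables (example schema)
PLAN_OPERATORS = ["scan", "index_lookup", "join", "sort", "agg"]  # p operator types

def query_vector(op: str, tables_used: list[str]) -> np.ndarray:
    """(4 + t)-dimensional 0/1 vector: which operation and which tables a query touches."""
    v = np.zeros(len(OPERATIONS) + len(TABLES))
    v[OPERATIONS.index(op)] = 1.0
    for t in tables_used:
        v[len(OPERATIONS) + TABLES.index(t)] = 1.0
    return v

def cost_vector(plan_costs: dict[str, list[float]], mean: float, std: float) -> np.ndarray:
    """p-dimensional vector of summed, normalized per-operator cost estimates."""
    v = np.zeros(len(PLAN_OPERATORS))
    for i, operator in enumerate(PLAN_OPERATORS):
        total = sum(plan_costs.get(operator, []))
        v[i] = (total - mean) / std if std else 0.0
    return v

def query2vec(op, tables_used, plan_costs, mean, std):
    """Overall feature vector: concatenation of the query-info and cost vectors."""
    return np.concatenate([query_vector(op, tables_used), cost_vector(plan_costs, mean, std)])
```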

2. Setup Environment: get metrics and stats from the database to feed into our models (a sketch of the reward computation follows this list).

  1. Collect inner-state metrics. These are the configuration settings which can be tuned, like cache size, memory, etc.
  2. Collect outer-state metrics. These are the performance indicators, like the number of transactions, deadlocks, etc.
  3. Execute a workload and compute a reward based on the resulting performance.
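
One possible shape for the reward in step 3, sketched in Python; the specific metrics and equal weighting here are assumptions, not something fixed by this proposal:

```python
def compute_reward(perf, perf_prev, perf_init):
    """Hypothetical reward: relative improvement in throughput and latency
    versus both the previous run and the initial (default-config) run."""
    def rel(curr, base, higher_is_better):
        delta = (curr - base) / base
        return delta if higher_is_better else -delta

    tps_gain = 0.5 * (rel(perf["tps"], perf_prev["tps"], True) +
                      rel(perf["tps"], perf_init["tps"], True))
    lat_gain = 0.5 * (rel(perf["latency"], perf_prev["latency"], False) +
                      rel(perf["latency"], perf_init["latency"], False))
    return tps_gain + lat_gain
```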

3. Train the Predictor: the Predictor aims to predict how the performance metrics change when a query is processed by the database (a sketch of the network and training loop follows the pseudocode below).

  1. Get training data
  • A set of tuples T = {〈v, S, I, ∆S〉}
    • v is the feature vector of a query
    • S is the outer metrics
    • I is the inner state
    • ∆S is the change in the outer metrics caused by processing v
  • For each 〈v, S, I〉, we train the Predictor to output a value close to ∆S.
  • Training data is obtained for each query q by:
    1. Using Query2Vec to generate v and reading the metrics S, I from the database.
    2. Running q in the database and recording the metrics change ∆S.
  2. Train the Predictor, which is composed of four fully connected layers:
  • Input layer: accepts the feature vector and outputs the mapped tensor to the hidden layers.
  • Two hidden layers: a series of non-linear data transformations.
  • Output layer: restricts the tensor to the scale of the database state and generates a vector representing the predicted database metric changes.
  • Use ReLU for the hidden layers; the weights in the network are initialized from the standard normal distribution.
  • Use the Adam algorithm to train the Predictor.

Pseudocode:

TrainPredictor(W, TP)
// W: the weights of the neural network
// TP: the training set
while (!converged) do
    foreach (v, S, I, ∆S) in TP do
        Generate the output G of 〈v, S, I〉;
        Accumulate the backward-propagation error: E = E + 0.5 * |G − ∆S|^2;
    Compute gradient(E), update weights W
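
A minimal PyTorch sketch of this Predictor and its training loop; the layer sizes are placeholders, and the dataset is assumed to yield precomputed (〈v, S, I〉, ∆S) tensors:

```python
import torch
import torch.nn as nn

class Predictor(nn.Module):
    """Four fully connected layers: input -> 2 hidden (ReLU) -> output (predicted ∆S)."""
    def __init__(self, feature_dim, state_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )
        # initialize weights from the standard normal distribution
        for m in self.net:
            if isinstance(m, nn.Linear):
                nn.init.normal_(m.weight)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        return self.net(x)

def train_predictor(model, dataset, epochs=100, lr=1e-3):
    """dataset yields (x, delta_s) pairs where x = concat(v, S, I) and delta_s = ∆S."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, delta_s in dataset:
            opt.zero_grad()
            loss = 0.5 * loss_fn(model(x), delta_s)  # E = 0.5 * |G − ∆S|^2
            loss.backward()
            opt.step()
```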

4. Train the Actor-Critic Module: the Actor-Critic module aims to tune the database configurations (a sketch of the update steps follows the pseudocode below).

  1. Get training data
    1. Given a query workload, randomly select a subset of queries and generate a sample workload.
    2. Generate its feature vector via Query2Vec.
    3. Predict the database metrics S′ via the Predictor.
    4. Get an action A (a vector of knob values) via the Actor.
    5. Deploy the action in the database.
    6. Run the workload to get a reward R.
    7. Get a new database metric S′2 by updating S′ with the new metrics.
    8. Repeat the above steps to get A2 and R2.
    9. Iterating in this way, we get a set of triples T1A = 〈(S′1, A1, R1), (S′2, A2, R2), ..., (S′t, At, Rt)〉, stopping once the average reward value is good enough (e.g., the average reward of ten runs is larger than 10).
  2. Train the Actor-Critic model
  • Use the experience-replay method to train the Actor and Critic with the training set T1A = 〈(S′1, A1, R1), (S′2, A2, R2), ..., (S′t, At, Rt)〉 from above.
  • For each triple (S′i, Ai, Ri), update the Actor and Critic as follows:
    1. Update the Actor policy A using the gradient ∇A = ∇_Ai Q(S′i, Ai | πC) · ∇_θ πA(S′i | θ), where θ denotes the parameters of A and Q(S′i, Ai | πC) is the Q-value computed by the Critic.
    2. Estimate the real action-value Y using the Bellman equation, combining the reward and the Q-value: Yi = Ri + τ · Q(S′i+1, πA(S′i+1 | θ) | πC), where τ is a tuning factor that trades off the Q-value against the reward.
    3. Calculate the loss L from Q and Y; the Critic updates the weights in C by minimizing L = (Q(S′i, Ai | πC) − Yi)^2.
  • We run these three steps from i = t − 1 down to i = 1. The algorithm terminates once the model has converged (e.g., the performance improvement is smaller than a threshold); otherwise we select the next training data T2A, T3A, ....
  • To improve training stability, we can introduce two extra target actor and critic networks (with policies A and C respectively). These target networks are updated at every step, but their weights (the policy parameters) change more slowly than those of the normal networks. The weights in the normal Critic network are then updated by minimizing the loss against the target.

Pseudocode:

TrainAgent(A, C, TA)
// A: the actor's policy
// C: the critic's policy
// TA: the training data
while (!converged) do
    Get a training data set;
    for (i = t − 1 down to 1) do
        Update the weights in A;
        Estimate the action-value Y;
        Update the weights in C by minimizing the loss
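
A minimal DDPG-style sketch of these update steps in PyTorch; the network shapes, the discount/tuning factor τ, and the target-network soft-update rate are placeholder assumptions:

```python
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a database state S′ to a vector of (normalized) knob values."""
    def __init__(self, state_dim, knob_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, knob_dim), nn.Tanh(),  # knobs scaled to [-1, 1]
        )
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Maps a (state, action) pair to a scalar Q-value."""
    def __init__(self, state_dim, knob_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + knob_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def make_targets(actor, critic):
    """Target networks start as exact copies of the normal networks."""
    return copy.deepcopy(actor), copy.deepcopy(critic)

def update_step(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, tau=0.99, soft=0.005):
    s, a, r, s_next = batch  # tensors for (S′i, Ai, Ri, S′i+1)

    # (2) Bellman estimate of the real action-value Y, using the target networks
    with torch.no_grad():
        y = r + tau * target_critic(s_next, target_actor(s_next))

    # (3) Critic update: minimize (Q(S′i, Ai) − Yi)^2
    critic_loss = nn.functional.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # (1) Actor update: deterministic policy gradient through the Critic
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft-update the target networks (slower than the normal networks)
    for tgt, src in ((target_actor, actor), (target_critic, critic)):
        for pt, p in zip(tgt.parameters(), src.parameters()):
            pt.data.mul_(1 - soft).add_(soft * p.data)
```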

5. Overall: given a request (a query or a query workload), Query2Vec generates a feature vector from the SQL. The Predictor takes the feature vector and generates a predicted state change ∆S, which is applied to the current metrics to form the predicted state S′. The Actor takes S′ as input and outputs an action (a vector of suggested knob values). We deploy the new configuration in the database and execute the query. A short end-to-end sketch follows.
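
Stitched together, a single tuning step might look like the sketch below; the `db.collect_*`, `apply_knobs`, and `run_workload` helpers are hypothetical glue around TiDB/TiKV, not existing APIs:

```python
import torch

def tune_once(sql_requests, query2vec, predictor, actor, db):
    """One query-aware tuning step: featurize -> predict ∆S -> choose knobs -> deploy."""
    v = query2vec(sql_requests)                  # feature vector for the request/workload
    s = db.collect_outer_metrics()               # outer metrics S (performance indicators)
    i = db.collect_inner_state()                 # inner state I (current knob values)
    delta_s = predictor(torch.cat([v, s, i]))    # predicted change ∆S in the outer metrics
    knobs = actor(s + delta_s)                   # Actor maps the predicted state S′ to knob values
    db.apply_knobs(knobs)                        # deploy the new configuration
    return db.run_workload(sql_requests)         # execute and measure latency / throughput
```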

This allows tuning to optimize latency (query-level) or throughput (workload-level). Although we can optimize for one at the cost of the other, it has been shown that the metric not being optimized for still outperforms other ML models such as CDBTune and OtterTune. There is a method to optimize for both by adding clustering over the queries: this adds a couple of extra steps where we take the query vector, pass it through another model that recognizes patterns, then through clustering algorithms, and train the model on the resulting clusters. This would be useful for things like analytical queries. The extra cluster-level model can be added later as future work.

TrainDS-DDPG()
    Generate training data TP;
    TrainPredictor(W, TP);
    Generate training data TA;
    TrainAgent(A, C, TA);

Rationale

There are a number of studies on automatic knob tuning, ranging from heuristics to machine learning and, more recently, deep learning, such as BestConfig, OtterTune, and CDBTune.

Heuristic-based approaches like BestConfig use a heuristic method to search for the optimal configuration in historical data, but they come with a big disadvantage: they may not find good knob values if there are no similar configurations in the historical data. These approaches perform better than default settings but often can't compete with experienced DBAs.

OtterTune (which auto-tikv is based on) utilizes machine learning techniques to analyze configurations. This relies on a large number of high-quality training examples drawn from DBAs' experience, which can be hard to obtain.

CDBTune uses deep reinforcement learning (DRL) with an actor-critic network similar to this proposal, but it only takes metrics into account and doesn't consider query information, which can have a large effect on the tuning environment.

The drawback is that each approach takes more development effort than the previous one. Heuristic-based approaches are relatively straightforward to implement; OtterTune adds more complexity and steps by using more modern machine learning techniques; CDBTune takes it a step further and is based on leading-edge machine learning research; and QTune (which this proposal is based on) goes a step further still, adding database-specific domain knowledge on top of cutting-edge DRL techniques.

Testing Plan

Workload: Use different query workloads such as JOB, Sysbench, etc.
Measure: Use latency and throughput to evaluate performance.
Compare: Compare against different approaches, i.e. default settings, auto-tikv, and (maybe) manual DBA tuning if we can find someone.

Open issues (if applicable)

N/A

DallasC commented Jan 5, 2021

Update

After discussions with other people and further looking into the implementation, I think that the project is a bit too big in scope for this event. With that in mind, I think the following would be best.

Note: I haven't updated the original RFC yet, but when I have some time I'll fill in the changes.

New Scope

Decrease the scope of the project by cutting out the query vectorization feature and just implementing the deep reinforcement learning model. With this we only have to implement the model for TiKV and don't have to worry about TiDB, which helps reduce complexity.

Preparatory Work

In addition to decreasing the scope, I've been working on preparing the environment (metric collection, knob tuning, deployment) so that during the event we can focus on the DRL model and pipeline, which is the fun part and the core of the project.

Unresolved Concerns

Training time: there may not be enough time to train the model, depending on how long it takes to implement and the resources available. This could affect having a complete demo. Not quite sure what to do about this yet.

Future Implementation

Just as a note, even after this is finished I think it would be cool to complete the original idea and add query vectorization. I'm particularly interested to see how much of an impact it has, given TiDB's unique architecture compared to a traditional database.

Also, thinking even further ahead, I think it would be interesting to train it on when to scale out/in to optimize resource usage. If you could add costs as a factor (i.e. cloud-computing server costs), it would be interesting to see whether it could optimize for good performance while also keeping costs low.

cove9988 commented Jan 5, 2021

Very interesting and challenging one! I was thinking about a SQL bot for syntax validation, best practices, and performance-optimization recommendations. Are you still recruiting team members?

DallasC commented Jan 6, 2021

Hey, I also think the SQL bot would be a cool idea. Parsing the query plans and, if one is suboptimal or expensive, either inserting optimization hints or recommending a new query to the user would be interesting to do. I'm not quite sure how best to do that or how involved it would be, though, haha.

Yeah, so far it's just me, but I would be more than happy to work together. Also, the current idea is still up for change if you have any ideas; I've started some light work but haven't gotten too far yet. Mostly just research and setting up the environment to be able to pull metrics and change knobs.

cove9988 commented Jan 6, 2021

Based on the SQL query capture the operation(insert/select/update/delete) and the table(s) to be queried and store them in a 4+t dimensional vector where t is the number of tables. These values can be 0 if not used or 1 if in use.
Most of the performance optimization DBAs focus on in real life is conditional SELECT statements, right? If the scope were limited to selection, it would reduce 4 + t to t.

cove9988 commented Jan 6, 2021

CatBoost is probably a good option for ML as it is less complicated and has a super-fast training speed; however, I am not sure whether it can easily fit into the proposed DRL model.

@pentium3

Nice to see you cited the AutoTiKV project :)

You can refer to this link for more details.
