maguoshun/苍蝇腿也是肉.md Secret

## 苍蝇腿也是肉.md

      
    Raw
  

              苍蝇腿也是肉.md
            
          
    TiDB 优化器支持 Skip Scan


团队名称：苍蝇腿也是肉
作者：马国舜
项目进展：基本设计完成，可以开始开发，开发过程中可能会对设计有部分调整
https://github.com/maguoshun/tidb

项目介绍：

skip scan 是一种高效的索引扫描方式，在某些场景下，可以极大减少数据的扫描量，提高查询效率。
比如对如下场景：
create table t(a int primary key, b int, c int, d int, index idx_c_b(c, b));
insert into t values(1,2,3,100),(4,5,6,200),(7,8,9,300);
select b, c from t where b > 5;
在目前的优化方式中，会对 idx_c_b 索引进行 full index scan，扫描整个索引表。在有了 skip scan 之后，该查询会被分为 3 个 range 上的 index range scan，range 分别为
r1: (3/5/-oo, 3/5/+oo)
r2: (6/5/-oo, 6/5/+oo)
r3: (9/5/-oo, 9/5/-oo)
因为跳过了部分不符合要求的行，所以使用 skip scan 时可以减少扫描的行数，在索引的前几个字段区分度较差时，优化效果最为明显。
在 TiDB 的场景下，相当于把一次 index full scan，转为了多次的 index seek + index range scan，但是因为 seek 的开销较大，计算 cost 时需要考虑多次的 seek 是否会抵消扫描数据量减少所带来的收益。
也可参考 mysql skip scan
项目设计

实现的主要思路是在 Exector 的 Next 过程中，当场计算 next prefix 的 range，达到多个小 range 跳跃扫描的目的，而不是提前把大 range 计算好后再构造 Exector。注意代价计算时考虑 distinct 的数目。具体步骤如下：


支持 hint skip scan 和 hint no skip scan 语法，并在 explain 中支持 index skip scan；


对 OperandSelection 增加 index skip scan 的 transformation rule，并实现该 rule 相应的 Match 函数和 OnTransform 函数，Match 函数的实现参考 mysql 对 skip scan 的限制，OnTransform 函数中只对 condition 进行优化裁剪，但不针对索引进行 range 构造，确保 range 可以用于 skip scan。skip scan 的限制如下：


Table T has at least one compound index with key parts of the form ([A_1, ..., A_k,] B_1, ..., B_m, C [, D_1, ..., D_n]). Key parts A and D may be empty, but B and C must be nonempty.


The query references only one table.


The query does not use GROUP BY or DISTINCT.


The query references only columns in the index.


The predicates on A_1, ..., A_k must be equality predicates and they must be constants. This includes the IN() operator.


The query must be a conjunctive query; that is, an AND of OR conditions: (cond1(key_part1) OR cond2(key_part1)) AND (cond1(key_part2) OR ...) AND ...


There must be a range condition on C.


Conditions on D columns are permitted. Conditions on D must be in conjunction with the range condition on C.


对 skip scan 的 rule 实现对应的 implementation rule，并实现相应的 Match 函数和 OnImplement 函数，这里应该没有什么特殊的处理，和 index scan 基本一致；


实现 index skip scan 的 CalcCost 函数，计算代价时要考虑 distinct 值的数目；


实现 index skip scan 的 Exector，在 Next 的过程中，跳跃的进行扫描，一个 prefix 扫描完后，用 NextPrefix 调到下一个 prefix，并构造新的 range;