Skip to content

Instantly share code, notes, and snippets.

@indutny
Last active February 3, 2023 05:42
Show Gist options
  • Save indutny/ae44fd93dde2736205609d19a21b87cc to your computer and use it in GitHub Desktop.
Save indutny/ae44fd93dde2736205609d19a21b87cc to your computer and use it in GitHub Desktop.
Notes on FTS5 Merging Algorithm

Inserting new values

fts5FlushOneHash controls this and it puts a new segment on level 0, and then calls:

  1. fts5StructurePromote for level 0
  2. fts5IndexAutomerge
  3. fts5IndexCrisismerge

Level promotion

Promoting level of a structure means either:

  • Leaving it where it is
  • Promoting it up to the next populated level if it is larger (page-wise) than the last segment there
  • Promoting it down to the previous populated level if it is smaller (page-wise) than the some segment there

('optimize') command

Performed by sqlite3Fts5IndexOptimize. Calls fts5IndexOptimizeStruct which puts all segments from all levels to a newly created pStruct->nLevel + 1 ( unless all segments are already on the same level). Then fts5IndexMergeLevel is called repeatedly with nRem set to FTS5_OPT_WORK_UNIT (1000) until the highest level after fts5IndexOptimizeStruct has 0 segments.

fts5IndexMergeLevel merges at most nRem leaf pages (if pnRem is non-null) into a level iLvl + 1 (which might be created). level->nMerge is initially 0, but if not all leaf pages from the segment were merged - we set it to the number of segments at the time of the merge so that when we continue - we won't merge new segments.

'automerge' option

Performed by fts5IndexAutomerge on every hash flush (insert/update/delete). We pass to it the number of leaf pages that was just written (which is most usually 1), and it increments pStruct->nWriteCounter. The nRem passed to fts5IndexMerge is computed as:

(
    floor(newWriteCounter / FTS5_WORK_UNIT) -
    floor(oldWriteCounter / FTS5_WORK_UNIT)
) * FTS5_WORK_UNIT * pStruct->nLevel

Given that FTS5_WORK_UNIT is 64 and writeCounter is usually incremented by 1, every 64 inserts nRem becomes 64 * pStruct->nLevel. This acts as an optimization so that we don't perform automatic merges too often. Thus we pass this nRem to fts5IndexMerge and nMin is set to the actual automerge value (4 by default).

fts5IndexMerge finds the most populated level measured by either looking at the number of merge-in-progress segments (nMerge) or just number of segments. If the number of such segments is less than nMin (4) and no merging for this level is currently in progress - we end.

Otherwise fts5IndexMergeLevel is called for that level with nRem (see 'optimize' section above).

'crisismerge' option

Performed by fts5IndexCrisismerge on every hash flush (insert/update/delete). The action of this function is to merge and promote (starting from level 0) all levels that have more than crisismerge (16) segments. The merging stops as soon as we encounter a level with less than crisismerge (16) segments.

Each level is merged with fts5IndexMergeLevel (where pnRem is 0 and thus merging will complete by the end of the call) and promoted with fts5StructurePromote (see optimize for details).

('merge', N) command

Performed by sqlite3Fts5IndexMerge.

If N is negative - fts5IndexOptimizeStruct is called (see optimize above) and in most scenarios all segments from all levels are moved to a newly created level. Then fts5IndexMerge is called (see automerge) with nMin set to 2 (basically no minimum) and nRem set to abs(N) (which is -N because N is negative).

If N is positive - just fts5IndexMerge is called and nMin is set to the value of usermerge option (4 by default) and nRem is set to N. The meaning of this is that it will find most populated level (which is actually the one we just created with fts5IndexOptimizeStruct that has all the segments from the structure) with at least 4 pages and merge up to N leafs into a newly created level.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment