STEP 1: When AddSymbol() is called
----------------------------------------
alenblen;label_name_1;label_value_1;label_name_2;label_name_3;label_value_2;label_value_3;
remarks: "alenblen" is a placeholder that reserves space for the symbol table metadata, i.e. the size of the symbol table and the number of symbols.
It is overwritten in finishSymbol().
STEP 2: When finishSymbol() is called
----------------------------------------
<symbol_table_size_bytes>;<number_of_symbols>;label_name_1;label_value_1;label_name_2;label_name_3;label_value_2;label_value_3;
This is then flushed to disk and loaded into memory by NewSymbols() for faster access later.
SYMBOL TABLE
-------------
<symbol_table_size_bytes>;<number_of_symbols>;label_name_1;label_value_1;label_name_2;label_name_3;label_value_2;label_value_3;
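remarks: a minimal sketch of STEP 1 and STEP 2, in Python rather than the Go of the real code. It reserves the "alenblen" placeholder, appends the symbols, then patches the header once the size and count are known. The two 4-byte big-endian header fields, the 1-byte length prefix per symbol, and the helper names add_symbols/finish_symbols are assumptions made for illustration, not the actual Prometheus encoding.

```python
import struct

HEADER_LEN = 8  # assumed: 4 bytes for the table size + 4 bytes for the symbol count


def add_symbols(symbols):
    """STEP 1: reserve the header, then append each symbol after it."""
    buf = bytearray(b"alenblen")  # placeholder, the same 8 bytes the notes show
    for sym in sorted(symbols):
        data = sym.encode("utf-8")
        # Assumed framing: 1-byte length prefix, so symbols must be < 256 bytes.
        buf += bytes([len(data)]) + data
    return buf


def finish_symbols(buf, num_symbols):
    """STEP 2: overwrite the placeholder with the real size and symbol count."""
    body_size = len(buf) - HEADER_LEN  # assumed: size excludes the header itself
    buf[:HEADER_LEN] = struct.pack(">II", body_size, num_symbols)
    return buf


table = add_symbols(["job", "api", "instance", "a:9090"])
placeholder = bytes(table[:HEADER_LEN])
finish_symbols(table, 4)
size, count = struct.unpack(">II", table[:HEADER_LEN])
```

The point of the placeholder is that the size is only known after the last symbol is written, so the header is patched in place instead of buffering the whole table first.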
STEP 3: When AddSeries(ref, lset, chunks) is called to insert as series_1
-------------------------------------------------------------
<symbol_table_size_bytes>;<number_of_symbols>;label_name_1;label_value_1;label_name_2;label_name_3;label_value_2;label_value_3;<number_of_bytes_in_current_series_added>\
<number_of_label_pairs>;label_name_index_of_series_1;label_value_index_of_series_1;<number_of_chunks>;<series_1_chunk_0_mint>;<series_1_chunk_0_range>;<series_1_chunk_0_reference_number>;\
<series_1_chunk_1_mint>;<series_1_chunk_1_range>;<series_1_chunk_1_reference_number>;....series_1 ends....;<hash_for_series_1_contents>
STEP 4: When AddSeries(ref, lset, chunks) is called to insert as series_2
-------------------------------------------------------------
<symbol_table_size_bytes>;<number_of_symbols>;label_name_1;label_value_1;label_name_2;label_name_3;label_value_2;label_value_3;<number_of_bytes_in_current_series_added>\
<number_of_label_pairs>;label_name_index_of_series_1;label_value_index_of_series_1;<number_of_chunks>;<series_1_chunk_0_mint>;<series_1_chunk_0_range>;<series_1_chunk_0_reference_number>;\
<series_1_chunk_1_mint>;<series_1_chunk_1_range>;<series_1_chunk_1_reference_number>;....series_1 ends....;<hash_for_series_1_contents>;<number_of_bytes_in_current_series_added>\
<number_of_label_pairs>;label_name_index_of_series_2;label_value_index_of_series_2;<number_of_chunks>;<series_2_chunk_0_mint>;<series_2_chunk_0_range>;<series_2_chunk_0_reference_number>;\
<series_2_chunk_1_mint>;<series_2_chunk_1_range>;<series_2_chunk_1_reference_number>;....series_2 ends....;<hash_for_series_2_contents>
SERIES TABLE
-------------
<number_of_bytes_in_series_1_added><number_of_label_pairs>;label_name_index_of_series_1;label_value_index_of_series_1;<number_of_chunks>;<series_1_chunk_0_mint>;<series_1_chunk_0_range>;<series_1_chunk_0_reference_number>;
<series_1_chunk_1_mint>;<series_1_chunk_1_range>;<series_1_chunk_1_reference_number>
In short, for each series added:
<num_of_bytes_in_series><num_label_pairs>;label_name_index;label_value_index;....;<number_of_chunks>;<chunk_0_mint>;<chunk_0_range>;<chunk_0_reference_number>;
<chunk_1_mint>;<chunk_1_range>;<chunk_1_reference_number>;......;<hash>
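remarks: a minimal sketch of how one series entry could be assembled from the fields listed above, in Python rather than the Go of the real code. Labels are stored as indexes into the sorted symbol table, and each chunk is stored as mint, range (maxt - mint) and reference. The ";"-separated text form mirrors these notes and is not the binary encoding; zlib.crc32 is only a stand-in for <hash>, and encode_series is a hypothetical helper name.

```python
import zlib


def encode_series(lset, chunks, symbol_index):
    """Return one series entry as text, following the layout in these notes.

    lset:   list of (label_name, label_value) pairs
    chunks: list of (mint, maxt, ref) tuples, sorted by mint
    """
    fields = [len(lset)]
    for name, value in sorted(lset):
        # Labels are written as symbol-table indexes, not as strings.
        fields += [symbol_index[name], symbol_index[value]]
    fields.append(len(chunks))
    for mint, maxt, ref in chunks:
        fields += [mint, maxt - mint, ref]  # the range is stored, not maxt
    body = ";".join(str(f) for f in fields)
    checksum = zlib.crc32(body.encode("ascii"))  # stand-in for <hash>
    return f"{len(body)};{body};{checksum}"


symbols = sorted(["__name__", "up", "job", "api"])
symbol_index = {s: i for i, s in enumerate(symbols)}
entry = encode_series(
    [("job", "api"), ("__name__", "up")],
    [(1000, 1999, 8), (2000, 2999, 72)],
    symbol_index,
)
```

Storing indexes instead of strings is what makes the symbol table worthwhile: a label name that appears in thousands of series is written out in full only once.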
STEP 5: writePostings() is called after all series are added to the SERIES TABLE
---------------------------------------------------------------------------------
--------------------------------START A TEMPORARY FILE, SAY POT (a file that represents Postings Offset Table)
2;label_name_1;label_value_1;<F1_row_1>
2;label_name_1;label_value_2;<F2_row_2>
....
--------------------------------START A TEMPORARY FILE, SAY P (a file that represents Postings)
<number_of_bytes_current_postings>;<number_of_series_ids_for_label_name_1_and_label_value_1>;series_id_1;series_id_2;series_id_3;......;<hash>
<number_of_bytes_current_postings>;<number_of_series_ids_for_label_name_2_and_label_value_2>;series_id_2;series_id_3;series_id_5;......;<hash>
....
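remarks: a minimal sketch of STEP 5, in Python rather than the Go of the real code. It inverts the series label sets into "label pair -> series ids" lists, writes each list into P, and records in POT where that list starts. Reading the last POT field as a byte offset into P is an assumption about these notes; the byte count and hash fields are left out for brevity, and build_postings/write_postings are hypothetical helper names.

```python
from collections import defaultdict


def build_postings(series):
    """series: dict of series_id -> list of (label_name, label_value)."""
    postings = defaultdict(list)
    for series_id in sorted(series):  # ids end up sorted inside each list
        for pair in series[series_id]:
            postings[pair].append(series_id)
    return postings


def write_postings(postings):
    """Return (P, POT): P holds the id lists, POT maps each pair to its offset in P."""
    p_file = bytearray()
    pot = []
    for (name, value), ids in sorted(postings.items()):
        # 2 = number of keys in this row (label name and label value).
        pot.append((2, name, value, len(p_file)))
        line = ";".join(str(i) for i in [len(ids), *ids]) + "\n"
        p_file += line.encode("ascii")
    return bytes(p_file), pot


series = {
    1: [("job", "api"), ("env", "prod")],
    2: [("job", "api"), ("env", "dev")],
    3: [("job", "db"), ("env", "prod")],
}
postings = build_postings(series)
p_file, pot = write_postings(postings)
```

With this shape, answering job="api" means finding the row in POT, seeking to its offset in P, and reading one id list.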
STEP 6: writeLabelIndices() is called after finishing writing the POT & P files. writeLabelIndices() uses the POT
------------------------------------------------------------------------------------------------------------------
It reads from the POT file, forms a list of "label_name: [array of value indexes]", and writes that list into the main index file.
LABEL INDEX
---------------
<num_bytes>;1;<num_of_values_indexes>;<hash_from_1_till_now>;label_value_1;hash;label_value_2;hash;....;final_hash_sum;
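remarks: a minimal sketch of the grouping done in STEP 6, in Python rather than the Go of the real code. It walks the POT rows and collects, per label name, the symbol-table index of every value seen for that name. The POT row shape (num_keys, name, value, offset) and the helper name label_indices are assumptions made for illustration.

```python
def label_indices(pot_rows, symbol_index):
    """Group POT rows by label name and collect the symbol index of each value."""
    out = {}
    for _num_keys, name, value, _offset in pot_rows:
        out.setdefault(name, []).append(symbol_index[value])
    return out


symbols = sorted(["env", "dev", "prod", "job", "api", "db"])
symbol_index = {s: i for i, s in enumerate(symbols)}
pot_rows = [
    (2, "env", "dev", 0),
    (2, "env", "prod", 4),
    (2, "job", "api", 10),
    (2, "job", "db", 16),
]
indices = label_indices(pot_rows, symbol_index)
```

Because the POT rows are already sorted by name and value, each list of value indexes comes out in sorted order without a separate sort.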
STEP 7: writePostings() takes the file P and writes its contents to the main index file
STEP 8: writeLabelIndexesOffsetTable() writes the label names for the values that were written in writeLabelIndices().
The offset of each written list of values was recorded alongside its label name at that time, and those offsets are used here.
-------------------------------------------------------------------------------------------------------------------
<num_bytes_in_this_section>;<num_label_names>;hash;
1;label_name_1;<offset_for_values_in_label_indices>;hash;
1;label_name_2;<offset_for_values_in_label_indices>;hash;
1;label_name_3;<offset_for_values_in_label_indices>;hash;
hash_sum;
STEP 9: writePostingsOffsetTable() opens the POT file and writes the data from there into the main index file.
-----------------------------------------------------------------------------------------------------------------
<num_bytes_in_this_section>;<num_postings_offset_entries_a.k.a_num_label_pairs>;hash;
<num_keys>;label_name_1;label_value_1;offset;hash;
<num_keys>;label_name_1;label_value_2;offset;hash;
.....
hash_sum;
STEP 10: writeTOC() writes the actual byte position for each table
--------------------------------------------------------------------
symbol_table_pos;series_table_pos;label_indices;label_index_offset;postings;postings_offset
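remarks: a minimal sketch of STEP 10, in Python rather than the Go of the real code. Sections are written in the order of the steps above, the start position of each one is remembered, and the positions are written at the end in the TOC order shown here. The six 8-byte big-endian fields, the absence of a checksum, and the helper names write_index/read_toc are assumptions made for illustration.

```python
import io
import struct

# Order in which the steps above write the sections into the main index file.
WRITE_ORDER = ["symbols", "series", "label_indices", "postings",
               "label_index_offset", "postings_offset"]
# Order in which the TOC lists the positions.
TOC_ORDER = ["symbols", "series", "label_indices", "label_index_offset",
             "postings", "postings_offset"]
TOC_FORMAT = ">6Q"  # assumed: six unsigned 8-byte big-endian offsets


def write_index(payloads):
    """Write every section, then append the TOC with each section's start."""
    out = io.BytesIO()
    positions = {}
    for name in WRITE_ORDER:
        positions[name] = out.tell()  # byte position where this section begins
        out.write(payloads[name])
    out.write(struct.pack(TOC_FORMAT, *(positions[n] for n in TOC_ORDER)))
    return out.getvalue()


def read_toc(index_bytes):
    """The TOC has a fixed size, so a reader can find it from the end of the file."""
    toc_size = struct.calcsize(TOC_FORMAT)
    values = struct.unpack(TOC_FORMAT, index_bytes[-toc_size:])
    return dict(zip(TOC_ORDER, values))


payloads = {name: f"<{name}>".encode("ascii") for name in WRITE_ORDER}
index_bytes = write_index(payloads)
toc = read_toc(index_bytes)
```

Putting a fixed-size TOC last lets a reader open the file, read the tail, and jump straight to any table without scanning the sections before it.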
Created
December 16, 2021 11:27
This Gist explains Prometheus's tsdb index layout along with some implementation details at various steps.