@vmx
Last active January 20, 2023 15:47
Filecoin PreCommit1 phase labelling

This pseudocode describes how the labelling in PC1 works. It uses the production parameters for the 32GiB sector size.

Glossary:

  • node: A "node" refers to a 256-bit (32 bytes) value of the padded input data. A 32 GiB sector has 2^30 (about one billion) nodes.
  • label: A "label" refers to a 256-bit (32 bytes) value of one of the layers. It's the result of hashing the node index, layer index, replica ID and multiple labels together.
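
The node count follows directly from the sector size and the node size. A quick check, where the constant names are chosen for illustration only:

```python
SECTOR_SIZE = 32 * 1024**3  # 32 GiB in bytes
NODE_SIZE = 32              # one node is 256 bits = 32 bytes

num_nodes = SECTOR_SIZE // NODE_SIZE
print(num_nodes)           # 1073741824
print(num_nodes == 2**30)  # True
```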

Inputs:

  • parent_file: A pre-generated file (it's fixed and the same for everyone) that contains a list of random 32-bit integers. They are used as offsets for reading labels from the layer files.
  • padded_data: A 32GiB file that has the padding already applied. "Padding" in this case means that the two most significant bits of the last byte of every 32-byte node are zero. It could be validated as:
def validate_padding(data):
    for (offset, byte) in enumerate(data):
        # Every 256 bits => the last byte of every 32 bytes
        if (offset + 1) % 32 == 0:
            # The two most significant bits are not zero
            if (byte & 0b00111111) != byte:
                raise Exception("invalid padding")
  • replica_id: The ID of the current replica.

Outputs:

  • The result is 11 layers, where the last one is the replication key. That key is used to create a unique replica of the input data. You can think of it as xor'ing it with the input data, just in a fancier cryptographic way.
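
To make the xor analogy concrete, here is a toy sketch. It only illustrates the analogy from the text and is not the encoding the proofs actually use; the `encode` and `decode` names are made up for this example.

```python
def encode(data: bytes, key: bytes) -> bytes:
    """Combine data and key byte by byte (toy XOR stand-in)."""
    if len(data) != len(key):
        raise ValueError("data and key must have the same length")
    return bytes(d ^ k for d, k in zip(data, key))

# XOR is its own inverse, so decoding is the same operation.
decode = encode

data = b"some sector data"
key = bytes(range(len(data)))  # stand-in for the last layer
replica = encode(data, key)
assert decode(replica, key) == data
```

Anyone who can recompute the key can recover the original data from the replica, and without the key the replica does not reveal it.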
# The number of layers is hard-coded in the proofs.
NUM_LAYERS = 11

def open_layer_file(layer):
    '''Creates a file on disk where the data of the current layer is stored.
    '''
    pass

def nodes_iter(data):
    '''Returns an iterator, where each element is a 256-bit (32 bytes) long
    big integer.
    '''
    pass


def sha256(data):
    '''Returns the SHA-256 of the given input data.
    '''
    pass

def read_label(file, offset):
    '''Returns the current label, a 256-bit (32 bytes) value, from the given
    offset (multiplied by `NODE_SIZE`) of the given file.
    '''
    pass

def get_offsets(file):
    '''Returns a tuple consisting of 6 base label offsets and 8 expansion
    label offsets.

    Each function call advances the file cursor by 14 offsets (32-bit values),
    so the file is simply traversed sequentially. This means that the file
    size is exactly `(SECTOR_SIZE / NODE_SIZE) * NUM_LABELS * MEM_SIZE_32_BIT`,
    e.g. `((32 * 1024**3) / 32) * 14 * 4` bytes (56 GiB).

    The base label offsets will be used to get labels from the current layer,
    the expansion label offsets will be used to get the labels from the
    previous layer.
    '''
    pass

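def pc1(parent_file, padded_data, replica_id):
    # Note: `|` below denotes concatenation of the values, not a bitwise OR.
    previous_file = None
    for layer in range(NUM_LAYERS):
        current_file = open_layer_file(layer)
        # Every layer uses the same offsets, so start again at the beginning
        # of the parent file.
        parent_file.seek(0)
        for (offset, _node) in enumerate(nodes_iter(padded_data)):
            # Always consume the offsets, so that the file cursor stays in
            # sync with the node index.
            [base_offsets, expansion_offsets] = get_offsets(parent_file)
            # The first node doesn't have a previous node, hence don't use any
            # base or expansion labels.
            if offset == 0:
                label = sha256(layer | offset | replica_id)
            else:
                base_labels = [read_label(current_file, parent) for parent in base_offsets]

                # The first layer doesn't have a previous layer, hence we
                # cannot use the expansion labels from the previous layer.
                if layer == 0:
                    expansion_labels = []
                else:
                    expansion_labels = [read_label(previous_file, parent) for parent in expansion_offsets]

                label = sha256(layer | offset | replica_id | base_labels | expansion_labels)
            # Store the label, so that later nodes and the next layer can
            # read it.
            current_file.write(label)
        previous_file = current_file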
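
The pseudocode above can be turned into a small runnable toy to see the data flow. The following is a minimal sketch under several assumptions that are not part of the original: offsets are stored as little-endian 32-bit integers, the layer and node index are encoded with `struct.pack("<IQ", ...)`, layers are kept in memory instead of files, and the parent file is a tiny made-up one. The names `label_layers` and `make_toy_parent_file` are hypothetical.

```python
import hashlib
import io
import struct

NODE_SIZE = 32      # bytes per label
NUM_BASE = 6        # base label offsets per node
NUM_EXPANSION = 8   # expansion label offsets per node


def get_offsets(parent_file):
    """Read the next 14 offsets (assumed little-endian u32) from the file."""
    raw = parent_file.read((NUM_BASE + NUM_EXPANSION) * 4)
    offsets = struct.unpack("<14I", raw)
    return offsets[:NUM_BASE], offsets[NUM_BASE:]


def read_label(layer, offset):
    """Return the 32-byte label at the given node offset of an in-memory layer."""
    return layer[offset * NODE_SIZE:(offset + 1) * NODE_SIZE]


def make_toy_parent_file(num_nodes):
    """Build a made-up parent file; base offsets always point at earlier nodes."""
    buf = io.BytesIO()
    for i in range(num_nodes):
        base = [(i * 7 + k) % i if i else 0 for k in range(NUM_BASE)]
        expansion = [(i * 13 + k * 5) % num_nodes for k in range(NUM_EXPANSION)]
        buf.write(struct.pack("<14I", *base, *expansion))
    buf.seek(0)
    return buf


def label_layers(parent_file, num_nodes, num_layers, replica_id):
    layers = []
    for layer_index in range(num_layers):
        parent_file.seek(0)  # every layer re-reads the same offsets
        current = bytearray()
        for node_index in range(num_nodes):
            base, expansion = get_offsets(parent_file)
            hasher = hashlib.sha256()
            # Assumed encoding of the layer and node index.
            hasher.update(struct.pack("<IQ", layer_index, node_index))
            hasher.update(replica_id)
            if node_index > 0:
                for offset in base:
                    hasher.update(read_label(current, offset))
                if layer_index > 0:
                    for offset in expansion:
                        hasher.update(read_label(layers[-1], offset))
            current += hasher.digest()
        layers.append(bytes(current))
    return layers


layers = label_layers(make_toy_parent_file(8), 8, 3, b"\x01" * 32)
print(len(layers), len(layers[0]))  # 3 256
```

With 8 nodes the toy parent file is 8 * 14 * 4 = 448 bytes. The same formula with the production parameters gives 2^30 * 14 * 4 bytes, i.e. 56 GiB.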