mrjoshuak/c4_formal.md

## c4_formal.md

      
Normative Language

C4 Base 58

C4 Base 58 shall be a positional numeral system with a radix (base) of 58 and
shall represent integer values 0 through 57, inclusive, using the case-sensitive
characters "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz" in that
order from the Latin set 1 as defined by ISO/IEC 8859-1. (These are the characters
for the numerals 1-9, followed by the upper-case letters A-Z, followed by the
lower-case letters a-z, excluding the characters “0” zero, “O” upper-case oh,
“I” upper-case i, and “l” lower-case el.)
Note:   The above encoding can be matched with the regular expression:
[1-9A-HJ-NP-Za-km-z]
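
As a rough illustration of the alphabet and the regular expression above, the following Python sketch maps each character to its digit value and checks a string against the pattern. The names `C4_BASE58`, `DIGIT_VALUE`, and `is_base58` are illustrative only and are not defined by this specification.

```python
import re

# The 58 characters in digit order: index 0 is "1", index 57 is "z".
C4_BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Digit value of each character (hypothetical helper table).
DIGIT_VALUE = {ch: value for value, ch in enumerate(C4_BASE58)}

# One or more characters from the character class given in the note.
_BASE58_RE = re.compile(r"[1-9A-HJ-NP-Za-km-z]+")


def is_base58(text: str) -> bool:
    """Return True if text is non-empty and uses only C4 Base 58 characters."""
    return _BASE58_RE.fullmatch(text) is not None
```

Because the characters appear in ascending ISO/IEC 8859-1 order, comparing two equal-length C4 Base 58 strings character by character gives the same result as comparing the integers they represent.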

C4 Digest

A C4 Digest shall be the 64-byte SHA-512 message digest of the digital data unit
to be identified, computed as described by ISO/IEC 10118-3:2004 Information technology –
Security techniques – Hash-functions – Part 3: Dedicated hash-functions.
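
A minimal Python sketch of this step, assuming the digital data unit is available as a byte string; the function name `c4_digest` is hypothetical. It relies on the SHA-512 implementation in the standard `hashlib` module.

```python
import hashlib


def c4_digest(data: bytes) -> bytes:
    """Return the 64-byte SHA-512 message digest of the given data."""
    return hashlib.sha512(data).digest()


digest = c4_digest(b"example data")
print(len(digest))  # 64
```

For data too large to hold in memory, the same digest can be built incrementally by calling `update()` on a `hashlib.sha512()` object for each chunk.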
C4 ID

The C4 ID shall be presented as a 90-character string of alphanumeric
characters from the Latin set 1 as defined by ISO/IEC 8859-1, consisting of:
A. C4 Prefix: The 2-character string “c4” (the lower-case letter “c” followed by the numeral “4”).
B. C4 Suffix: An 88-character string representing the unique message digest of the identified digital data unit. The C4 Suffix shall be calculated as follows:

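Encode the 64-byte C4 Digest, interpreted as an unsigned integer, as a C4 Base 58 integer having exactly 88 digits, padding on the left with the zero digit “1” as needed.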

Note:   The above encoding can be matched with the regular expression:
c4[1-9A-HJ-NP-Za-km-z]{88}

Examples:
Sample C4 IDs and corresponding decimal numbers:
"c41111111111111111111111111111111111111111111111111111111111111111111111111111111111111111"
0
"c411111111111111111111111111111111111111111111111111111111111111111111111111111111111BukQL"
123456789
"c467rpwLCuS5DGA8KGZXKsVQ7dnPb9goRLoKfgGbLfQg9WoLUgNY77E2jT11fem3coV9nAkguBACzrU1iyZM4B8roQ"
13407807929942597099574024998205846127479365820592393377723561443721764030073546976801874298166903427690031858186486050853753882811946569946433649006084095
C4 ID Tree

The C4 ID Tree shall be presented as a string of 3 or more C4 IDs. Zero or more white space
characters may separate adjacent C4 IDs. The C4 ID Tree consists of:
A. Root ID: The first C4 ID.
B. Nodes: A Node shall be a set of 3 C4 IDs: the first C4 ID of a Node shall be the ‘label’, the second C4 ID shall be the ‘left’ id, and the third C4 ID shall be the ‘right’ id. The C4 ID Tree shall have one or more Nodes. The indexes of the left and right ids shall be relative to the index of the ‘label’ as follows:


Label index: i


Left index: 2*i+1


Right index: 2*i+2


The label of a node is computed as follows:


The left and right C4 IDs shall each be decoded to a 64-byte message digest.


The left and right message digests shall be concatenated to 128 bytes by appending the 64 bytes of the greater message digest after the 64 bytes of the lesser message digest.


The label shall be the C4 ID of the 128-byte concatenated message digests.


C. Item ID: The C4 ID of data that represents one of the items in the list of items represented by the C4 ID Tree. An Item ID shall be a C4 ID for which the left and right ids have an index that is larger than the number of C4 IDs in the string, or which does not match the label of the node formed by its left and right ids.
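
The string layout and index arithmetic described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `parse_tree`, `child_indexes`, and `has_children` are hypothetical, indexes are taken to be zero-based with the Root ID at index 0, and the sketch checks only whether both child indexes fall inside the string. Deciding whether an in-range C4 ID is a Node label or an Item ID would additionally require recomputing the label from its left and right ids.

```python
import re
from typing import List, Tuple

C4_ID_RE = re.compile(r"c4[1-9A-HJ-NP-Za-km-z]{88}")


def parse_tree(text: str) -> List[str]:
    """Split a C4 ID Tree string into its C4 IDs, ignoring white space."""
    compact = "".join(text.split())  # drop all white space between IDs
    if len(compact) % 90 != 0:
        raise ValueError("length is not a multiple of 90 characters")
    ids = [compact[k:k + 90] for k in range(0, len(compact), 90)]
    for c4id in ids:
        if C4_ID_RE.fullmatch(c4id) is None:
            raise ValueError("not a well-formed C4 ID: " + c4id)
    if len(ids) < 3:
        raise ValueError("a C4 ID Tree needs 3 or more C4 IDs")
    return ids


def child_indexes(i: int) -> Tuple[int, int]:
    """Return the (left, right) indexes for the label at index i."""
    return 2 * i + 1, 2 * i + 2


def has_children(ids: List[str], i: int) -> bool:
    """Return True if both child indexes of index i lie inside the list."""
    _left, right = child_indexes(i)
    return right < len(ids)
```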
The C4 ID Tree shall be a binary hash tree, or Merkle tree, where the node labels are based on C4 IDs. The C4 ID Tree shall be calculated as follows:

1. A list of C4 IDs shall be sorted in ascending order according to the binary values of their characters in the ISO/IEC 8859-1 standard, which is also the order of the characters listed here:

   123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz

2. From this list of C4 IDs a Branch shall be created for each pair.

3. A Branch shall then be created from each pair of Root IDs from the previous step.

4. If the number of C4 IDs in a list is odd, the remaining C4 ID shall be added to the list of Root IDs in the next round.

5. Steps 2 and 3 shall be repeated until a single Branch remains.

6. All Branches that form the C4 ID Tree shall be listed sequentially in left-to-right, breadth-first order, starting from the last Branch to be generated.
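
One possible reading of the pairing rounds above is sketched below in Python. It computes only the final Root ID and does not produce the breadth-first listing of Branches. It assumes that the list is sorted once before the first round, that digests are read as big-endian integers, and that the Root ID of a pair is the C4 ID of the SHA-512 digest of the two sorted, concatenated digests. All function names are hypothetical.

```python
import hashlib
from typing import List

C4_BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def encode_id(digest: bytes) -> str:
    """Encode a 64-byte digest as a C4 ID (assumption: big-endian)."""
    value = int.from_bytes(digest, "big")
    digits = ""
    while value > 0:
        value, digit = divmod(value, 58)
        digits = C4_BASE58[digit] + digits
    return "c4" + digits.rjust(88, "1")


def decode_id(c4id: str) -> bytes:
    """Decode a C4 ID back to its 64-byte digest (assumption: big-endian)."""
    value = 0
    for ch in c4id[2:]:
        value = value * 58 + C4_BASE58.index(ch)
    return value.to_bytes(64, "big")


def branch_root(id_a: str, id_b: str) -> str:
    """Return the Root ID for a pair: lesser digest first, then greater."""
    lesser, greater = sorted((decode_id(id_a), decode_id(id_b)))
    return encode_id(hashlib.sha512(lesser + greater).digest())


def tree_root(ids: List[str]) -> str:
    """Reduce a list of C4 IDs to a single Root ID, round by round."""
    if len(ids) < 2:
        raise ValueError("need at least two C4 IDs")
    level = sorted(ids)  # step 1: ascending order, done once
    while len(level) > 1:
        pairs = range(0, len(level) - 1, 2)
        next_level = [branch_root(level[k], level[k + 1]) for k in pairs]
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # odd one out joins the next round
        level = next_level
    return level[0]
```

Because the input is sorted first, the result does not depend on the order in which the C4 IDs are supplied.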


A Branch of a C4 ID Tree shall be presented as a string of 3 consecutive C4 IDs, consisting of:


Root ID: A C4 ID that identifies the Left and Right Child IDs.


Left and Right Child IDs: Two C4 IDs in ascending order.


A C4 ID Tree Branch shall be created as follows:


Two C4 IDs shall be decoded into two 64-byte message digests.


The 2 message digests shall be sorted in ascending order and concatenated to 128 bytes.


The lesser of the two Child IDs shall be the Left Child. [A question for Josh: Do we have to sort more than once?]


The Root ID shall be computed from the 128-byte concatenated message digests.


The 3 C4 IDs shall be combined into a string in the following order: Root ID, Left Child ID, Right Child ID.
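
The Branch steps above could be sketched in Python roughly as follows. The sketch assumes big-endian integer encoding of digests and assumes that "computed from the 128-byte concatenated message digests" means taking the C4 ID of their SHA-512 digest. The function names and the `sep` parameter (the white space placed between IDs, empty by default) are hypothetical.

```python
import hashlib

C4_BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def encode_id(digest: bytes) -> str:
    """Encode a 64-byte digest as a C4 ID (assumption: big-endian)."""
    value = int.from_bytes(digest, "big")
    digits = ""
    while value > 0:
        value, digit = divmod(value, 58)
        digits = C4_BASE58[digit] + digits
    return "c4" + digits.rjust(88, "1")


def decode_id(c4id: str) -> bytes:
    """Decode a C4 ID back to its 64-byte digest (assumption: big-endian)."""
    value = 0
    for ch in c4id[2:]:
        value = value * 58 + C4_BASE58.index(ch)
    return value.to_bytes(64, "big")


def make_branch(id_a: str, id_b: str, sep: str = "") -> str:
    """Return the Branch string: Root ID, Left Child ID, Right Child ID."""
    # Sort the two decoded digests so the lesser becomes the Left Child.
    left, right = sorted((decode_id(id_a), decode_id(id_b)))
    root = hashlib.sha512(left + right).digest()
    return sep.join(encode_id(d) for d in (root, left, right))
```

Sorting the decoded digests makes the Branch independent of the order in which the two Child IDs are passed in.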


Images of sample trees: one even, one odd.
Pseudocode

Compute C4 ID from C4 Digest

SET temp to the input C4 Digest interpreted as an unsigned integer
SET c4Base58 to "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
SET zero to "1"
SET base to 58
SET result to ""

LOOP WHILE temp > 0
  digit = temp MODULO base
  temp = temp DIVIDE base, discarding the remainder
  result = c4Base58[digit] + result
END LOOP

LOOP WHILE LENGTH(result) < 88
  result = zero + result
END LOOP

result = "c4" + result
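
The pseudocode above might be realized in Python along these lines. This is a sketch rather than a reference implementation: the function name `c4_id_from_digest` is hypothetical, and reading the digest as a big-endian integer is an assumption that this text does not state explicitly.

```python
C4_BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def c4_id_from_digest(digest: bytes) -> str:
    """Encode a 64-byte C4 Digest as a 90-character C4 ID."""
    if len(digest) != 64:
        raise ValueError("a C4 Digest must be exactly 64 bytes")
    # Assumption: the digest is read as a big-endian unsigned integer.
    value = int.from_bytes(digest, "big")
    digits = []
    while value > 0:
        value, digit = divmod(value, 58)
        digits.append(C4_BASE58[digit])  # least significant digit first
    # Reverse to most-significant-first, then left-pad with "1" (zero).
    suffix = "".join(reversed(digits)).rjust(88, "1")
    return "c4" + suffix


# The first two sample C4 IDs from the Examples, built from their integers.
print(c4_id_from_digest((0).to_bytes(64, "big")) == "c4" + "1" * 88)  # True
print(c4_id_from_digest((123456789).to_bytes(64, "big"))[-5:])        # BukQL
```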

Compute C4 Digest from C4 ID

Note: result must be 64 bytes! How do we say this is pseudocode? - joshua
SET c4id to the input C4 ID
SET c4Base58 to "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
SET base to 58
SET result to 0
SET i to 2

LOOP WHILE i < 90
  temp = INDEX OF c4id[i] IN c4Base58
  result = result * base + temp
  i = i + 1
END LOOP

result = result represented as 64 bytes
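
A Python sketch of the decoding direction, under the same caveats as any illustration here: the names `C4_ID_RE` and `c4_digest_from_id` are hypothetical, and writing the integer out as big-endian bytes is an assumption. The input is first checked against the regular expression given in the C4 ID section.

```python
import re

C4_BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
C4_ID_RE = re.compile(r"c4[1-9A-HJ-NP-Za-km-z]{88}")


def c4_digest_from_id(c4id: str) -> bytes:
    """Decode a 90-character C4 ID into its 64-byte C4 Digest."""
    if C4_ID_RE.fullmatch(c4id) is None:
        raise ValueError("not a well-formed C4 ID")
    value = 0
    for ch in c4id[2:]:  # skip the "c4" prefix
        value = value * 58 + C4_BASE58.index(ch)
    # Assumption: big-endian output. to_bytes raises OverflowError if the
    # value does not fit in 64 bytes, which a valid digest always does.
    return value.to_bytes(64, "big")


digest = c4_digest_from_id("c4" + "1" * 83 + "BukQL")
print(int.from_bytes(digest, "big"))  # 123456789
```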