Skip to content

Instantly share code, notes, and snippets.

@samcv
Last active March 7, 2017 10:42
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save samcv/fca0ba6e76dba9f8efa860d50ad124ca to your computer and use it in GitHub Desktop.
Save samcv/fca0ba6e76dba9f8efa860d50ad124ca to your computer and use it in GitHub Desktop.

Sorted table

This is part of my brainstorming for how to go from codepoint numbers to rows in the bitfield that store the property data for each codepoint. Due to there being hundreds of thounsands of codepoints, we cannot just store an array that stores hundreds of thousands of items.

sorted_table

Meaning of this table explained in the pseudocode below.

low high bitfield_row miss
14 27 0 14
71 90 39 24
107 122 35 40

Pseudocode example

if cp >= low && cp <= high {
  /* If it's between high and low, we have the result in the bitfield_row already */
  return bitfield_row
}
else {
  /* If it is not between low and high, but it is smaller than `low` in the next column
  we can subtract `miss` and get the bitfield row for that codepoint */
  return lookup[cp - miss]
}

Example

Here is an example that uses linear search. If this data structure is used, it does not preclude the use of binary rather than linear search, or fastpaths for the most common characters.

/* used for misses, contains bitfield row numbers */
uint16_t lookup[] = {
    0,0,0,0,0,0,0,0,0,1,2,3,4,5,6,6,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,36,9,14,38,38,38,38,..}

struct table  {
    uint32_t low;
    uint32_t high;
    uint32_t bitfield_row;
    uint32_t miss;
};
typedef struct table table;

table sorted_table[] = {
  {14, 27, 0, 14},
  {71, 90, 39, 24},
  {107, 22, 35, 40}
};
#define sorted_table_elems ...
int get_offset_new (uint32_t cp) {
    int i = 0;
    for (;1;i++) {
        if (cp < sorted_table[i+1].low) {
            if (cp >= sorted_table[i].low && cp <= sorted_table[i].high) {
                return sorted_table[i].bitfield_row;
            }
            /* Case to catch cp's less than the first point in the table
             * eventually there should be an extra row at the beginning of the table for this? */
            else if (cp < sorted_table[i].low) {
                return point_index[cp];
            }
            else {
                return point_index[cp - sorted_table[i].miss];
            }
        }
    }
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment