@donmccurdy
Last active September 21, 2021 17:06
Meshopt Compression and Tabular Data

Evaluation: Meshopt Compression with Tabular Data

Summary: Short evaluation of Meshopt compression for use with tabular data. For the purposes of this evaluation, tabular data is defined as a dataset having many observations ("rows"), where each observation consists of one or more property values from a common schema.

Steps

I downloaded the results of the 2015 NYC Tree Census from BigQuery's NYC Street Trees public dataset, also available through NYC Open Data. The 2015 census consists of 683,788 rows and 41 columns, and is about 500 MB when exported as JSON. Much of that data is text, and because meshopt is designed for numeric input, we have good reason to believe results for the numeric columns will be "at least as good" as results for the dataset as a whole. For a simpler evaluation, and to establish an upper bound on compression ratio, only numeric columns were included in this evaluation.

Using the attached script main.js, we compute a series of estimates for individual columns, and for the entire dataset.

For each column:

  • Uncompressed byte size in smallest appropriate binary format
  • Meshopt-compressed byte size
  • Gzip-compressed byte size
  • Meshopt+Gzip-compressed byte size
  • Meshopt decoding time

For the entire dataset:

  • Total uncompressed byte size
  • Total gzip-compressed byte size
  • Total meshopt-compressed byte size
    • (a) with columns compressed in isolation
    • (b) with columns interleaved before compression
  • Total meshopt-and-gzip-compressed byte size
    • (a) with columns compressed in isolation
    • (b) with columns interleaved before compression

The purpose of evaluating columns compressed both interleaved and in isolation is to better understand the pros and cons of accessor- and bufferView-based storage for tabular data in 3D Tiles and glTF. The current draft specification stores columns using one buffer view apiece, preventing interleaving. Accessor storage would allow interleaving, which benefits both compression size and decoding speed, at least for triangle mesh data. Whether those benefits extend to tabular data is unknown, and is an outcome of interest for this evaluation.

Lossy filters were excluded from evaluation; all encoding below is lossless. Exponential filters may be appropriate for some column types, and should be considered in any production implementation. I see no reason to believe that filters would disproportionately benefit either interleaved or isolated arrays.
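
For reference, a filtered encoding of one float32 column might look like the sketch below. This assumes the encodeFilterExp and 'EXPONENTIAL' filter options exposed by the meshoptimizer JS API; no filters were used for any results in this document, and the column name and bit depth here are illustrative only.

// Sketch: lossy exponential filter on one float32 column, then meshopt.
const count = 683788;
const latitude = new Float32Array(count); // ... filled from the dataset

// Quantize to ~15 significant bits (lossy); stride is 4 bytes per scalar.
const filtered = MeshoptEncoder.encodeFilterExp(latitude, count, 4, 15);
const compressed = MeshoptEncoder.encodeVertexBuffer(filtered, count, 4);

// Decoding applies the inverse filter when the filter is named explicitly.
const target = new Uint8Array(count * 4);
MeshoptDecoder.decodeVertexBuffer(target, count, 4, compressed, 'EXPONENTIAL');
const decoded = new Float32Array(target.buffer);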

Meshopt conforms to the padding requirements of GPU APIs; each array element must align to 4-byte boundaries. For scalar uint8[] columns this requires 3 bytes of padding per 1 byte of data, which mostly disappears in compressed storage but is a significant increase in uncompressed memory. That additional padding is included in meshopt compression results, and omitted from uncompressed and gzip-only results.
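
For example, a five-element uint8 column grows from 5 to 20 bytes when padded; this is what padArray in the attached main.js does:

const src = new Uint8Array([3, 1, 4, 1, 5]); // 5 bytes of data
const dst = new Uint8Array(src.length * 4);  // 20 bytes after padding
for (let i = 0; i < src.length; i++) {
  dst[i * 4] = src[i]; // each value is followed by 3 zero padding bytes
}
// dst = [3,0,0,0, 1,0,0,0, 4,0,0,0, 1,0,0,0, 5,0,0,0]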

More details about meshopt encoding are available in the meshoptimizer package documentation.

Results

After several executions of the script on this dataset, the table of results below appears typical. The difference in total decoding time (5% faster with interleaved columns) was not consistent across trials. With many more columns, or many fewer rows, interleaving might take better advantage of SIMD and the difference could become significant, but that effect was not observed here.

All sizes below are given in bytes:

┌─────────────────────┬──────────────┬───────────┬───────────┬──────────────┬─────────────┐
│ prop                │ uncompressed │   meshopt │      gzip │ meshopt+gzip │ decode (ms) │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ tree_id (uint32)    │    2,735,152 │ 1,218,851 │ 1,349,885 │    1,127,803 │        11.9 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ block_id (uint32)   │    2,735,152 │   853,827 │   700,396 │      755,100 │         9.8 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ tree_dbh (uint16)   │    1,367,576 │   598,936 │   520,736 │      503,556 │         8.8 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ stump_diam (uint8)  │      683,788 │   125,769 │    36,136 │       51,217 │         7.5 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ cb_num (uint16)     │    1,367,576 │    53,548 │     8,559 │        7,968 │         6.8 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ borocode (uint8)    │      683,788 │    42,854 │       753 │          200 │         6.7 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ cncldist (uint8)    │      683,788 │    71,408 │    17,535 │       19,386 │         6.8 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ st_assem (uint8)    │      683,788 │   104,110 │    30,896 │       37,327 │         8.8 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ st_senate (uint8)   │      683,788 │    87,424 │    24,164 │       27,681 │         6.8 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ boro_ct (uint32)    │    2,735,152 │    65,061 │    11,364 │       14,148 │         7.1 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ latitude (float32)  │    2,735,152 │ 1,041,441 │ 1,436,100 │      994,986 │         7.8 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ longitude (float32) │    2,735,152 │   990,308 │ 1,354,792 │      937,450 │         6.8 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ x_sp (float32)      │    2,735,152 │ 1,455,694 │ 1,825,429 │    1,401,403 │         6.5 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ y_sp (float32)      │    2,735,152 │ 1,608,317 │ 2,054,044 │    1,569,914 │         6.9 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ TOTAL (SEPARATE)    │   25,300,156 │ 8,317,548 │ 9,370,789 │    7,448,139 │       109.0 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ TOTAL (COMBINED)    │              │ 8,146,167 │           │    7,607,325 │       103.8 │
└─────────────────────┴──────────────┴───────────┴───────────┴──────────────┴─────────────┘

To summarize the results: meshopt compression alone produces smaller output than gzip alone, and meshopt+gzip does better still. Appropriate use of exponential filters may improve the average compression ratio further. However, not all results are unambiguously positive:

  • For smaller storage types (uint8, uint16), meshopt increases size compared to gzip alone, and meshopt+gzip is sometimes, but not always, smaller than gzip alone. This is likely a result of the required 4-byte element alignment.
  • No significant difference in compression ratio is apparent between interleaved and isolated columns: interleaved does somewhat better before gzip, isolated somewhat better after. The difference seems small enough to be case-dependent, but it is also quite possible that gzip achieves better results on isolated arrays because they contain tightly packed runs of similar values. We can conjecture that tabular data is more likely to have such runs than triangle mesh data.
  • No significant difference in decoding speed is apparent between interleaved and isolated columns. Such a difference would likely appear with a high enough columns-to-rows ratio, but large row counts seem like the more common scenario, as in this dataset.

Compression ratios are about -70% for Meshopt+Gzip (7,448,139 / 25,300,156 ≈ 0.29) and -63% for Gzip alone (9,370,789 / 25,300,156 ≈ 0.37). Use of lossy encoding filters would improve the Meshopt compression ratio further — in earlier tests on triangle mesh data, the improvement was about -15% on average, compared to lossless Meshopt compression.

Recommendations

While I'd hoped that the test would indicate a conclusive advantage for interleaving property arrays, in either compression ratio or decoding speed, it does not. However, implementing this did highlight the requirement of 4-byte alignment for array elements, which cannot be satisfied under the current bufferView-based storage plan. As a result, use of meshopt compression with EXT_feature_metadata may require either (a) refactoring the schema to use accessor-based storage that supports padding, or (b) repacking the data into 4-byte array types, like Int32Array or Float32Array, when users want to take advantage of Meshopt.

main.js

import fs from 'fs';
import path from 'path';
import ndjson from 'ndjson';
import gzipSize from 'gzip-size';
import Table from 'cli-table';
import { performance } from 'perf_hooks';
import { MeshoptDecoder, MeshoptEncoder } from 'meshoptimizer';
import { createTable, schema } from './schema.js';

await Promise.all([MeshoptDecoder.ready, MeshoptEncoder.ready]);

const ROW_COUNT = 683788;

let rowCount = 0;
const table = createTable(ROW_COUNT);

// Stream the NDJSON export row by row, parsing each numeric column
// into its preallocated typed array.
fs.createReadStream(path.resolve('./new_york_tree_census_2015.json'))
  .pipe(ndjson.parse())
  .on('data', (row) => {
    for (const prop in table) {
      table[prop][rowCount] = schema[prop].parse(row[prop]);
    }
    rowCount++;
  })
  .on('end', main)
  .on('error', error);

function main() {
  console.log(`Parsed ${rowCount} rows, ${Object.keys(table).length} columns.`);

  const totalBytes = new Array(4).fill(0);
  let totalDecodeMillis = 0;

  const results = new Table({
    head: ['prop', 'uncompressed', 'meshopt', 'gzip', 'meshopt+gzip', 'decode (ms)'],
    colAligns: ['left', 'right', 'right', 'right', 'right', 'right'],
  });

  // Per-column sizes: uncompressed, meshopt, gzip, meshopt+gzip.
  for (const prop in table) {
    const propBytes = new Array(4).fill(0);
    const {compressed, decodeMillis} = compressColumn(table[prop]);
    totalBytes[0] += (propBytes[0] = table[prop].byteLength);
    totalBytes[1] += (propBytes[1] = compressed.byteLength);
    totalBytes[2] += (propBytes[2] = gzipSize.sync(Buffer.from(table[prop].buffer)));
    totalBytes[3] += (propBytes[3] = gzipSize.sync(compressed));
    totalDecodeMillis += decodeMillis;
    results.push([
      `${prop} (${formatType(table[prop])})`,
      ...propBytes.map(formatLong),
      formatMillis(decodeMillis),
    ]);
  }

  results.push(['TOTAL (SEPARATE)', ...totalBytes.map(formatLong), formatMillis(totalDecodeMillis)]);

  // Whole-table sizes, with all columns interleaved before compression.
  const {compressed, decodeMillis} = compressTable(table);
  const compressedGzipped = gzipSize.sync(compressed);
  results.push([
    'TOTAL (COMBINED)',
    '',
    formatLong(compressed.byteLength),
    '',
    formatLong(compressedGzipped),
    formatMillis(decodeMillis),
  ]);

  console.log(results.toString());
  process.exit(0);
}

function error(e) {
  console.error(e);
  console.error(`Error after ${rowCount} rows.`);
  process.exit(1);
}

// -------- COMPRESSION HELPERS --------

/** Rounds a byte length up to the nearest multiple of 4. */
function pad(n) {
  return Math.ceil(n / 4) * 4;
}

/** Pads each element of a uint8/uint16 array to a 4-byte stride. */
function padArray(src) {
  if (src.BYTES_PER_ELEMENT === 4) {
    return src;
  } else if (src.BYTES_PER_ELEMENT > 4) {
    throw new Error('64-bit types not implemented.');
  }
  const dstStride = 4 / src.BYTES_PER_ELEMENT;
  const dst = new src.constructor(src.length * dstStride);
  for (let i = 0; i < src.length; i++) {
    dst[i * dstStride] = src[i];
  }
  return dst;
}

/** Compresses a single column in isolation, and times its decode. */
function compressColumn(array) {
  // TODO(confirm): Output and check RMSE.
  const paddedArray = padArray(array);
  const compressed = MeshoptEncoder.encodeVertexBuffer(paddedArray, ROW_COUNT, 4);
  const target = new Uint8Array(paddedArray.byteLength);
  const t0 = performance.now();
  MeshoptDecoder.decodeVertexBuffer(target, ROW_COUNT, 4, compressed);
  const decodeMillis = performance.now() - t0;
  return {compressed, decodeMillis};
}

/** Interleaves all columns into one buffer, compresses it, and times its decode. */
function compressTable(table) {
  let bytesPerRow = 0;
  for (const prop in table) {
    bytesPerRow += table[prop].BYTES_PER_ELEMENT;
  }
  bytesPerRow = pad(bytesPerRow);

  const tableBuffer = new ArrayBuffer(ROW_COUNT * bytesPerRow);
  const tableView = new DataView(tableBuffer);

  // Write each column into its interleaved position within each row.
  // Note: DataView accessors default to big-endian byte order.
  let rowByteOffset = 0;
  for (const prop in table) {
    for (let i = 0; i < ROW_COUNT; i++) {
      const byteOffset = i * bytesPerRow + rowByteOffset;
      const value = table[prop][i];
      switch (table[prop].constructor) {
        case Float32Array:
          tableView.setFloat32(byteOffset, value);
          break;
        case Uint32Array:
          tableView.setUint32(byteOffset, value);
          break;
        case Uint16Array:
          tableView.setUint16(byteOffset, value);
          break;
        case Uint8Array:
          tableView.setUint8(byteOffset, value);
          break;
        default:
          throw new Error('Unsupported array type.');
      }
    }
    rowByteOffset += table[prop].BYTES_PER_ELEMENT;
  }

  // TODO(confirm): Output and check RMSE.
  const compressed = MeshoptEncoder.encodeVertexBuffer(new Uint8Array(tableBuffer), ROW_COUNT, bytesPerRow);
  const target = new Uint8Array(tableBuffer.byteLength);
  const t0 = performance.now();
  MeshoptDecoder.decodeVertexBuffer(target, ROW_COUNT, bytesPerRow, compressed);
  const decodeMillis = performance.now() - t0;
  return {compressed, decodeMillis};
}

// -------- FORMATTING HELPERS --------

function formatLong(value) {
  return value.toLocaleString();
}

function formatMillis(ms) {
  return ms.toFixed(1);
}

// (Unused in this script.)
function formatBytes(bytes, decimals = 2) {
  if (bytes === 0) return '0 Bytes';
  const k = 1024;
  const dm = decimals < 0 ? 0 : decimals;
  const sizes = ['Bytes', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'];
  const i = Math.floor(Math.log(bytes) / Math.log(k));
  return parseFloat((bytes / Math.pow(k, i)).toFixed(dm)) + ' ' + sizes[i];
}

function formatType(array) {
  return array.constructor.name.replace('Array', '').toLowerCase();
}

package.json

{
  "type": "module",
  "dependencies": {
    "cli-table": "^0.3.6",
    "gzip-size": "^6.0.0",
    "meshoptimizer": "^0.16.1",
    "ndjson": "^2.0.0"
  }
}

schema.js

// Column schema for the 2015 tree census. Columns with a typed `Array`
// are numeric and participate in the evaluation; string and date columns
// are defined but unused here.
export const schema = {
  tree_id: { parse: Number, Array: Uint32Array, required: true },
  block_id: { parse: Number, Array: Uint32Array },
  created_at: { parse: (v) => new Date(v).toLocaleDateString('en-US') },
  tree_dbh: { parse: Number, Array: Uint16Array },
  stump_diam: { parse: Number, Array: Uint8Array },
  curb_loc: { parse: String },
  status: { parse: String },
  health: { parse: String },
  spc_latin: { parse: String },
  spc_common: { parse: String },
  steward: { parse: String },
  guards: { parse: String },
  sidewalk: { parse: String },
  user_type: { parse: String },
  problems: { parse: String },
  root_stone: { parse: String },
  root_grate: { parse: String },
  root_other: { parse: String },
  trunk_wire: { parse: String },
  trnk_light: { parse: String },
  trnk_other: { parse: String },
  brch_light: { parse: String },
  brch_shoe: { parse: String },
  brch_other: { parse: String },
  address: { parse: String },
  zipcode: { parse: String },
  zip_city: { parse: String },
  cb_num: { parse: Number, Array: Uint16Array },
  borocode: { parse: Number, Array: Uint8Array },
  boroname: { parse: String },
  cncldist: { parse: Number, Array: Uint8Array },
  st_assem: { parse: Number, Array: Uint8Array },
  st_senate: { parse: Number, Array: Uint8Array },
  nta: { parse: String },
  nta_name: { parse: String },
  boro_ct: { parse: Number, Array: Uint32Array },
  state: { parse: String },
  latitude: { parse: Number, Array: Float32Array },
  longitude: { parse: Number, Array: Float32Array },
  x_sp: { parse: Number, Array: Float32Array },
  y_sp: { parse: Number, Array: Float32Array },
};

export function createTable(count) {
  const table = {};
  for (const prop in schema) {
    // Preallocate typed arrays for numeric columns only.
    if (schema[prop].parse === Number) {
      table[prop] = new schema[prop].Array(count);
    }
  }
  return table;
}

@zeux commented Sep 21, 2021

One thing I'd recommend trying when compressing arrays of uint8 or uint16 is artificially setting the stride to 4 (bytes), and adjusting the count accordingly. E.g. to compress a sequence of 7 uint8 numbers you can set the stride to 4, count to 2, and add a dummy padding element at the end.

In general the results here make sense to me. For cases where you see degradation in compression when meshopt+gzip is used instead of gzip, it looks like the data is highly redundant, so gzip has a very high degree of compression. meshopt assumes data is related but not necessarily redundant, and doesn't have a way to disable delta encoding, which may break up the structure a bit in some cases. Maybe there are some other, subtler effects here, or maybe the meshopt codec just needs to be tuned a bit more on this type of data.

One other note re: interleaving is that it restricts you a bit if you want to use filters, but I don't think it's very important in this particular case, because here when the data is integer you shouldn't use any filters, and if it's floating point then using exponential filters doesn't prevent interleaving, as long as only floating point data is interleaved with itself.

@zeux commented Sep 21, 2021

P.S. If the point of this exercise is to figure out whether a future extension should use accessors or not, then it looks to me like compression-wise the answer is "it doesn't really matter one way or the other" :) Assuming my suggestion wrt uint8/uint16 works fine which I think it should -- it's not super effective for morph targets but that's for a very specific reason that I don't think would apply to tabular data generally.

@zeux commented Sep 21, 2021

P.P.S. And yeah for gzip without meshopt it's common to see better results on deinterleaved data because gzip can take advantage of repetition or different statistics in different parts of the stream. Although there are also cases where it can be the reverse because the cost of repeating the entire row wholesale is much higher with deinterleaved data because you need to spend match bits several times. Once you apply meshopt compression the difference should largely disappear because meshopt codecs internally do bytewise deinterleaving.

@donmccurdy (Author)

Thank you @zeux! As you mentioned, the original prompt for this was to determine whether an extension for tabular data would have technical reasons to prefer accessors, and that doesn't seem to be necessary. This isn't a Khronos extension, so a bit of adjustment or custom application of the meshopt codec might not be out of the question down the road; we're just trying to lay the groundwork properly. I tested your suggestion of setting the stride artificially and padding the end of the array rather than every element (a sketch follows the list below), and the results are quite interesting.

  • Significant decrease in pre-gzip size of uint8 and uint16 columns, as expected
  • Mixed results in post-gzip size of each column, overall a modest decrease
  • Significant improvement in decoding time
  • Implementation of combined / interleaved case was not changed, so no difference there
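
A sketch of one way to apply that change in compressColumn: a hypothetical compressColumnPacked keeps elements tightly packed and pads only the tail of the array. This is illustrative, not necessarily the exact code used to produce the results below.

function compressColumnPacked(array) {
  // Present the tightly packed bytes as `paddedByteLength / 4` elements of
  // stride 4, rather than one 4-byte-padded element per row.
  const bytes = new Uint8Array(array.buffer, array.byteOffset, array.byteLength);
  const paddedByteLength = Math.ceil(bytes.byteLength / 4) * 4;
  const padded = new Uint8Array(paddedByteLength); // zero-filled tail padding
  padded.set(bytes);
  return MeshoptEncoder.encodeVertexBuffer(padded, paddedByteLength / 4, 4);
}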

Full results below:

┌─────────────────────┬──────────────┬───────────┬───────────┬──────────────┬─────────────┐
│ prop                │ uncompressed │   meshopt │      gzip │ meshopt+gzip │ decode (ms) │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ tree_id (uint32)    │    2,735,152 │ 1,218,851 │ 1,349,885 │    1,127,803 │        12.6 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ block_id (uint32)   │    2,735,152 │   853,827 │   700,396 │      755,100 │         9.9 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ tree_dbh (uint16)   │    1,367,576 │   592,008 │   520,736 │      512,688 │         4.4 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ stump_diam (uint8)  │      683,788 │   100,963 │    36,136 │       48,990 │         2.1 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ cb_num (uint16)     │    1,367,576 │    34,982 │     8,559 │        9,044 │         3.7 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ borocode (uint8)    │      683,788 │    10,969 │       753 │          196 │         2.4 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ cncldist (uint8)    │      683,788 │    53,999 │    17,535 │       23,344 │         2.0 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ st_assem (uint8)    │      683,788 │    93,140 │    30,896 │       43,615 │         2.1 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ st_senate (uint8)   │      683,788 │    71,434 │    24,164 │       32,523 │         2.6 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ boro_ct (uint32)    │    2,735,152 │    65,061 │    11,364 │       14,148 │         7.3 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ latitude (float32)  │    2,735,152 │ 1,041,441 │ 1,436,100 │      994,986 │         6.6 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ longitude (float32) │    2,735,152 │   990,308 │ 1,354,792 │      937,450 │         8.3 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ x_sp (float32)      │    2,735,152 │ 1,455,694 │ 1,825,429 │    1,401,403 │         7.9 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ y_sp (float32)      │    2,735,152 │ 1,608,317 │ 2,054,044 │    1,569,914 │         6.5 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ TOTAL (SEPARATE)    │   25,300,156 │ 8,190,994 │ 9,370,789 │    7,471,204 │        78.5 │
├─────────────────────┼──────────────┼───────────┼───────────┼──────────────┼─────────────┤
│ TOTAL (COMBINED)    │              │ 8,146,167 │           │    7,607,325 │       110.0 │
└─────────────────────┴──────────────┴───────────┴───────────┴──────────────┴─────────────┘

@zeux commented Sep 21, 2021

Cool thanks. The practical implication of the increased stride is that the delta compression works in gaps of 4 or 2, instead of being sequential. Which is often not that big of a deal.

If you're feeling adventurous you could compare it with a stride=1 or stride=2:

  • You'll need to change the .js wrappers to remove the assertions that check stride alignment
  • You'll need to disable SIMD in the decoder .js

I think the scalar implementation of decoder, and the encoder, will work correctly with stride=1 after that.
