
@anjijava16
Created October 3, 2022 00:57

Parquet compression options

Parquet is designed for large-scale data and supports several compression codecs. The best choice depends on your data and workload: some codecs favor compression and decompression speed, while others favor smaller files.

LZ4: Compression codec loosely based on the LZ4 compression algorithm, but with an additional undocumented framing scheme. The framing is part of the original Hadoop compression library; it was first copied into parquet-mr and later emulated, with mixed results, by parquet-cpp.

LZO: Compression codec based on, or interoperable with, the LZO compression library.

GZIP: Compression codec based on the GZIP format defined by RFC 1952 (not the closely related "zlib" or "deflate" formats).

Snappy: The default compression codec in many Parquet writers, such as Spark and pandas.

ZSTD: Compression codec based on the Zstandard format defined by RFC 8478. Among the codecs listed here, it typically achieves the highest compression ratio.
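The distinction in the GZIP entry between the gzip, "zlib", and "deflate" formats can be seen with Python's standard library alone. The following is a minimal sketch, not Parquet code: the sample `payload` and the helper `looks_like_gzip` are made up for illustration. It compresses the same bytes three ways and shows that only the RFC 1952 output carries the gzip magic bytes.

```python
import gzip
import zlib

# Hypothetical sample data; any repetitive byte string works.
payload = b"parquet column chunk " * 200

# gzip framing (RFC 1952): header with magic bytes 0x1f 0x8b, CRC32 trailer.
gzip_bytes = gzip.compress(payload)

# zlib framing (RFC 1950): 2-byte header, Adler-32 trailer.
zlib_bytes = zlib.compress(payload)

# Raw deflate (RFC 1951): negative wbits tells zlib to emit no header or trailer.
raw = zlib.compressobj(wbits=-15)
deflate_bytes = raw.compress(payload) + raw.flush()


def looks_like_gzip(data: bytes) -> bool:
    """Return True if the data starts with the gzip magic bytes."""
    return data[:2] == b"\x1f\x8b"


for name, blob in [("gzip", gzip_bytes), ("zlib", zlib_bytes), ("deflate", deflate_bytes)]:
    print(f"{name:8s} size={len(blob):5d} gzip_magic={looks_like_gzip(blob)}")

# Each stream only round-trips through the matching decoder settings.
assert gzip.decompress(gzip_bytes) == payload
assert zlib.decompress(zlib_bytes) == payload
assert zlib.decompress(deflate_bytes, wbits=-15) == payload
```

All three streams use the same underlying deflate algorithm, so their sizes differ only by the few bytes of framing. A reader that expects one framing will generally reject the others, which is why the codec definition names RFC 1952 explicitly.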
