@niyue
Last active December 8, 2021 11:15
Arrow IPC file format

magic ARROW1

padding to 8 bytes
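A minimal sketch of checking the leading magic, assuming the whole file has already been read into a `bytes` object. The function name `has_arrow_file_header` is illustrative, not part of any Arrow library.

```python
MAGIC = b"ARROW1"
HEADER_SIZE = 8  # 6 magic bytes followed by 2 padding bytes


def has_arrow_file_header(data: bytes) -> bool:
    """Return True if data starts with the ARROW1 magic and leaves room for the padding."""
    return len(data) >= HEADER_SIZE and data[:len(MAGIC)] == MAGIC
```

The padding bytes are not inspected here; the first message is expected to start at byte 8.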

streaming format with EOS

Schema

fields (Field[])

  • name (string)
  • nullable (bool)
  • type (Type)
    • List
    • Map
    • Union
    • Int32
    • Int64
    • Utf8
    • ...
  • children (Field[])
  • dictionary (DictionaryEncoding)
    • id (long)
    • indexType (Int)
    • isOrdered (bool)
    • dictionaryKind (DictionaryKind)
  • custom_metadata (KeyValue[])

custom_metadata (KeyValue[])

features (Feature[])

  • UNUSED
  • DICTIONARY_REPLACEMENT
  • COMPRESSED_BODY

dictionary[0]

  • continuation (32 bit 0xFFFFFFFF)
  • metadata size (int32, little endian)
  • metadata buffer (Message)
    • MetadataVersion
    • MessageHeader (DictionaryBatch)
      • id (long)
      • data (RecordBatch)
        • compression (BodyCompression)
      • isDelta (bool)
    • bodyLength (long)
    • custom_metadata (KeyValue[])
  • padding
  • body (compressed or uncompressed array buffers)

dictionary[1]

...

record batch[0]

  • continuation (32 bit 0xFFFFFFFF)
  • metadata size (int32, little endian)
  • metadata buffer (Message)
    • MetadataVersion
    • MessageHeader (RecordBatch)
      • length (long)
      • nodes (FieldNode[])
        • length (long)
        • null_count (long)
      • buffers (Buffer[])
        • offset (long)
        • length (long)
      • compression (BodyCompression)
        • codec (CompressionType, LZ4_FRAME | ZSTD)
        • method (BodyCompressionMethod, BUFFER)
    • bodyLength (long)
    • custom_metadata (KeyValue[])
      • key (string)
      • value (string)
  • padding
  • body (compressed or uncompressed array buffers)

record batch[1]

...
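Every dictionary and record batch message above shares the same 8-byte prefix. The sketch below reads that prefix with the standard `struct` module; `read_message_prefix` is a hypothetical helper name. It assumes, following the Arrow columnar format specification, that the metadata size already includes the padding, so the body would begin at `metadata_start + metadata_size`. Decoding the Flatbuffers `Message` itself (needed to obtain `bodyLength`) is out of scope here.

```python
import struct

CONTINUATION = 0xFFFFFFFF


def read_message_prefix(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (metadata_size, metadata_start) for the message starting at pos."""
    # 4-byte continuation marker, then a little-endian int32 metadata size.
    continuation, metadata_size = struct.unpack_from("<Ii", buf, pos)
    if continuation != CONTINUATION:
        raise ValueError(f"missing continuation marker at offset {pos}")
    return metadata_size, pos + 8
```

For example, a message at offset 8 with 256 bytes of metadata would have its metadata buffer at bytes 16 to 272.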

EOS (end of stream [optional])

  • continuation (0xFFFFFFFF)
  • metadata size (0x00000000)
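Put together, the end-of-stream marker is eight fixed bytes, so a reader can detect it with a plain comparison. A small sketch; `is_end_of_stream` is an illustrative name:

```python
# Continuation marker 0xFFFFFFFF followed by a zero metadata size.
EOS_MARKER = b"\xff\xff\xff\xff\x00\x00\x00\x00"


def is_end_of_stream(buf: bytes, pos: int) -> bool:
    """Return True if the 8 bytes at pos are the end-of-stream marker."""
    return buf[pos:pos + len(EOS_MARKER)] == EOS_MARKER
```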

footer

MetadataVersion

  • V1 | V2 | V3 | V4 | V5

Schema

Block[0] dictionary

Block[1] dictionary

...

Block[0] record batch

  • offset (long)
  • metadata length (int)
  • body length (long)
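As a rough illustration, one `Block` can be decoded from raw bytes with `struct`. This sketch assumes the usual FlatBuffers struct alignment rules, under which the 4-byte metadata length is followed by 4 bytes of padding so that the body length lands on an 8-byte boundary (24 bytes in total). The `Block` tuple and `decode_block` are hypothetical names; real readers would normally use generated Flatbuffers accessors instead.

```python
import struct
from typing import NamedTuple

# int64 offset, int32 metadata length, 4 padding bytes, int64 body length
BLOCK_LAYOUT = struct.Struct("<qi4xq")


class Block(NamedTuple):
    offset: int
    metadata_length: int
    body_length: int


def decode_block(raw: bytes, pos: int = 0) -> Block:
    """Decode one Block struct starting at pos (assumed 24-byte layout)."""
    return Block(*BLOCK_LAYOUT.unpack_from(raw, pos))
```

The message a block points to starts at `offset` in the file, and its body would follow `metadata_length` bytes later.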

Block[1] record batch

...

KeyValue[] custom_metadata

footer length (int32, little endian)

magic ARROW1
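Because the file ends with the footer length and the magic, the footer can be located by reading backwards from the end. A minimal sketch, assuming the whole file is in memory; `locate_footer` is an illustrative helper, and decoding the Flatbuffers footer itself is left out.

```python
import struct

MAGIC = b"ARROW1"


def locate_footer(data: bytes) -> tuple[int, int]:
    """Return (footer_start, footer_end) byte offsets of the serialized footer."""
    if data[-len(MAGIC):] != MAGIC:
        raise ValueError("trailing ARROW1 magic not found")
    # The little-endian int32 footer length sits just before the trailing magic.
    footer_end = len(data) - len(MAGIC) - 4
    if footer_end < 8:
        raise ValueError("file too short to contain a footer")
    (footer_length,) = struct.unpack_from("<i", data, footer_end)
    footer_start = footer_end - footer_length
    if footer_length < 0 or footer_start < 8:
        raise ValueError("invalid footer length")
    return footer_start, footer_end
```

`data[footer_start:footer_end]` would then hold the footer bytes, whose Block entries give random access to each dictionary and record batch.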
