Tresys Technology tresys

Describing Self-Descriptive Data with DFDL and Apache Daffodil

Self-descriptive data can be difficult to process because its format is not fixed but is instead described by metadata. The [Data Format Description Language (DFDL)] and [Apache Daffodil] are powerful tools that can describe and parse a wide variety of data, but some self-descriptive formats can still prove challenging, particularly when they are logically self-descriptive. Below we explain what DFDL is, what Apache Daffodil is, and a generic approach for using them to describe and parse complex self-descriptive data.

Introduction to DFDL

The [Data Format Description Language (DFDL)] is a specification, developed by the [Open Grid Forum], capable of describing many data formats, including both textual and binary, scientific and numeric, legacy and modern, commercial record-oriented, and many industry and military standards. It defines a language that is a subset of W3C XML Schema to describe the logical format of the dat