@colstrom
Last active November 16, 2023 03:24
Discovery Process for Undocumented JSON Formats

The following process should be fairly universal for any deserializable data format. The data that prompted it was JSON, so the examples use that, but the same general process works for YAML, TOML, and similar formats.

Create a new project

cargo new --bin sample-project
cd sample-project

Add a few dependencies

  • serde (with the derive feature enabled) for generic (de)serialization.
  • serde_json for (de)serializing with JSON specifically.
  • anyhow for simple error handling.
cargo add serde --features derive
cargo add serde_json anyhow

Write a very small program

In src/main.rs:

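// Start with no fields at all; each run reveals the next one to add.
#[derive(Debug, Clone, serde::Deserialize)]
#[serde(deny_unknown_fields)]
struct Sample {}

fn main() -> anyhow::Result<()> {
  // Read the whole sample file into memory.
  let json = std::fs::read_to_string("./path/to/sample.json")?;
  // Fails on the first key that Sample does not declare.
  let sample: Sample = serde_json::from_str(&json)?;
  println!("{sample:#?}");
  Ok(())
}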

This program attempts to load a single JSON file from a fixed path and deserialize it into a Sample struct. Because Sample has serde(deny_unknown_fields) set and declares no fields, deserialization will fail for any valid non-empty JSON object.

Run it

cargo run
  • This will fail with an error.
    • That error will indicate what (currently undefined) field was encountered.
  • Add it to your Sample struct.
    • Make a guess as to the type; it's not important yet.
  • Run it again.
    • If the field cannot be deserialized into the type you guessed, it will fail with an error.
      • This (new) error will give you some insight into the value. You can also just look at the JSON.
      • Update your Sample struct accordingly.
    • Otherwise, return to the beginning of this list until there are no more errors.

Now do it again with another sample file.

  • If you encounter a (new) error that one of the fields you have defined is missing, then that field must be optional.
    • Change the type from T to Option<T> and continue.
  • If you encounter an error saying a map was found where another type was expected (for example, "invalid type: map, expected a string"), the value is a nested object. Define another struct, use it as the type for that field, and follow the same process to discover its fields. Continue until every available sample deserializes without errors.

For a sufficiently large and diverse set of valid samples, this process should produce an equally comprehensive and correct set of data structures.

If you have enough sample data to be confident that it is representative of all reasonable variations, consider looking at any fields with a String type.

Do they all have the same limited set of values? That field might be an enum.
