clemensv/avroschema.md

Apache Avro Schema - Formal Specification


Abstract

This document provides a comprehensive specification of the schema definition
system used by Apache Avro. It details the structure and syntax of Avro
schemas.
The serialization rules of the Avro binary and JSON encodings are
not defined in this document.

Contents


Apache Avro Schema - Formal Specification
  Abstract
  Contents
  1. Introduction
  2. Notational Conventions
  3. Schema Specification
    3.1. Schema Declarations
      3.1.1. Schema documents
      3.1.2. Media Type
    3.2. Documentation Strings
    3.3. Named Types
      3.3.1. Alias Names
    3.4. Naming conventions
    3.5. Extensibility
    3.6. Primitive Type Schemas
      3.6.1. null
      3.6.2. boolean
      3.6.3. int
      3.6.4. long
      3.6.5. float
      3.6.6. double
      3.6.7. bytes
      3.6.8. string
    3.7. Fixed Type
    3.8. Logical Types
      3.8.1. decimal
      3.8.2. UUID
      3.8.3. date
      3.8.4. time-millis
      3.8.5. time-micros
      3.8.6. timestamp-millis
      3.8.7. timestamp-micros
      3.8.8. local-timestamp-millis
      3.8.9. local-timestamp-micros
      3.8.10. duration
    3.9. record Type
      3.9.1. record field Declarations
    3.10. enum Type
    3.11. array
    3.12. map
    3.13. Type Unions
  4. The "Parsing Canonical Form" for Avro Schemas
    4.1. Transforming into Parsing Canonical Form
  5. Schema Fingerprints
  6. Security Considerations
  7. IANA Considerations
    7.1. Media Type Registration
  8. References


1. Introduction

Apache Avro is a data serialization framework used within Apache Hadoop and in
many messaging and eventing contexts. Avro provides a compact, fast binary data
format and simple integration with dynamic languages. Avro depends on schemas,
defined in JSON, that describe the data being serialized and deserialized; JSON
keeps the schemas easy to read and write. This document is a formal
specification of the Avro schema system, detailing the syntax and semantics of
Avro schemas.
2. Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in RFC 2119.
3. Schema Specification

3.1. Schema Declarations

An Avro schema is a JSON string, object, or array that defines the structure of
data being serialized or deserialized. Primitive type schemas are represented as
JSON strings, while logical and complex type schemas are represented as JSON
objects. Type unions are represented as JSON arrays.


| Type kind | Avro Schema |
|-----------|-------------|
| primitive | `"string"` |
| logical | `{ "type": "int", "logicalType": "date" }` |
| complex | `{ "type": "array", "items": "string" }` |
| union | `["null", "string"]` |


Complex Avro schemas of type record, enum, and fixed are
named types, which have a fullname composed of a
namespace and a name. The namespace is a string that commonly
identifies the schema's organization or project, and the name is a string
that identifies the schema within the namespace.
All named types used within a schema MUST be declared where they are first used.
Named type declarations are visible within the entire schema document once
declared, independent of where in the overall type hierarchy the declaration
occurs.
Subsequent references to a declared named type MUST be made by its fullname.
3.1.1. Schema documents

An Avro schema document, which is a restriction of the general Avro schema
pattern to enable sharing of schemas across different parties, MUST contain
either a single named type or a union of named types at its root. This
restriction ensures that code generation tools can generate code for the schema
with unambiguous type names.
All complex types used in a schema document MUST be defined within the same
schema document. There is no import or include mechanism for referencing types
defined in other schema documents. This restriction ensures that the schema is
self-contained and can be easily shared and distributed.
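As a rough illustration of the root-level rule, the following Python sketch checks whether an already-parsed schema qualifies as the root of a schema document. The function names are hypothetical, the sketch treats an empty union as invalid (an assumption), and it does not validate anything below the root.

```python
import json

NAMED_TYPES = ("record", "enum", "fixed")


def is_named_type(schema) -> bool:
    """True if schema is a JSON object declaring a record, enum, or fixed type."""
    return isinstance(schema, dict) and schema.get("type") in NAMED_TYPES


def is_schema_document_root(schema) -> bool:
    """True if schema is a single named type or a non-empty union of named types."""
    if isinstance(schema, list):
        return len(schema) > 0 and all(is_named_type(branch) for branch in schema)
    return is_named_type(schema)


record = json.loads('{"type": "record", "name": "Employee", "fields": []}')
print(is_schema_document_root(record))                             # True
print(is_schema_document_root(json.loads('["null", "string"]')))   # False
```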
3.1.2. Media Type

The media type for Avro schema documents is application/vnd.apache.avro.schema+json.
See IANA Considerations for more information.
3.2. Documentation Strings

All Avro schemas and record field declarations MAY contain an OPTIONAL doc
attribute, which is a string that provides human-readable documentation for the
schema. The doc attribute is used to describe the purpose and usage of the
schema.
Example:
{
  "type": "record",
  "name": "Employee",
  "fields": [
    { "name": "name", "type": "string", "doc": "The name of the employee" },
    { "name": "email", "type": "string", "doc": "The email address" }
  ],
  "doc": "A record representing an employee"
}
3.3. Named Types

Named types MUST be defined with a REQUIRED name and OPTIONAL namespace
attribute. Schemas with record, enum, and fixed types are named types.
The name attribute is a REQUIRED string that identifies the schema within the
namespace. The namespace attribute is an OPTIONAL string that identifies a
scope for names.
When the namespace attribute is not present, the schema is in the namespace of
its enclosing schema. When there is no enclosing schema, the schema is in the
default namespace. The default namespace is an empty string.
A schema MAY contain multiple named types within the same namespace or
across different namespaces.
The value of the name attribute MUST be a non-empty string and start with a
letter from a-z or A-Z. Subsequent characters MUST be letters from a-z or
A-Z, digits, or underscores (_). This restriction ensures that the name
attribute is a valid identifier in most programming languages and databases.
The value of the namespace attribute MUST be a sequence of one or more
name-like strings separated by dots (.).
The fullname of a named type is the concatenation of the namespace and
name attributes, separated by a dot (.). For a type in the default (empty)
namespace, the fullname is the name alone.
name = ALPHA *(ALPHA / DIGIT / "_")
namespace = name *("." name)
fullname = [namespace "."] name
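The grammar above can be checked mechanically. The following Python sketch is one possible reading of it; the helper names are hypothetical, and the empty string is accepted as the default namespace as described earlier in this section.

```python
import re

# name = ALPHA *(ALPHA / DIGIT / "_"), restricted to ASCII letters and digits
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_name(name: str) -> bool:
    return NAME_RE.fullmatch(name) is not None


def is_valid_namespace(namespace: str) -> bool:
    # The empty string denotes the default namespace.
    if namespace == "":
        return True
    return all(is_valid_name(part) for part in namespace.split("."))


def fullname(name: str, namespace: str = "") -> str:
    """Join namespace and name with a dot; the default namespace adds nothing."""
    if not is_valid_name(name):
        raise ValueError(f"invalid name: {name!r}")
    if not is_valid_namespace(namespace):
        raise ValueError(f"invalid namespace: {namespace!r}")
    return f"{namespace}.{name}" if namespace else name


print(fullname("Contact", "com.example"))  # com.example.Contact
```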
The following is an example of a record schema named Contact in the
com.example namespace. It has a nested record schema named Address defined
at first use for the mailingAddress field, which inherits the namespace from
its enclosing schema. The type is referenced again by fullname for the
billingAddress field. The "fullname" of the resulting schema is
com.example.Contact.
{
  "type": "record",
  "name": "Contact",
  "namespace": "com.example",
  "fields": [
    { "name": "name", "type": "string" },
    { "name": "email", "type": "string" },
    {
      "name": "mailingAddress",
      "type": {
        "type": "record",
        "name": "Address",
        "fields": [
          { "name": "street", "type": "string" },
          { "name": "city", "type": "string" },
          { "name": "state", "type": "string" },
          { "name": "zip", "type": "string" }
        ]
      }
    },
    { "name": "billingAddress", "type": "com.example.Address" }
  ]
}
3.3.1. Alias Names

Named types MAY have an OPTIONAL aliases attribute, which is an array of
strings that are alternative names for the named type. The aliases attribute
MUST NOT contain the name attribute of the named type.
The aliases attribute is used to maintain compatibility when the name of a
named type changes. When a named type is renamed, the aliases attribute can be
used to specify the old name of the type. This allows readers to recognize the
old name and map it to the new name.
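A reader might use aliases along the following lines. This is only a sketch: the function names are hypothetical, and it assumes (as the Apache Avro specification does, though this document does not state it) that an alias without a dot is resolved relative to the namespace of the type it belongs to.

```python
def qualify(name: str, namespace: str) -> str:
    """Return name unchanged if already qualified, else prefix the namespace."""
    if "." in name or not namespace:
        return name
    return f"{namespace}.{name}"


def accepted_names(schema: dict, enclosing_namespace: str = "") -> set:
    """All fullnames under which a reader would recognize this named type."""
    namespace = schema.get("namespace", enclosing_namespace)
    names = {qualify(schema["name"], namespace)}
    names.update(qualify(alias, namespace) for alias in schema.get("aliases", []))
    return names


def reader_accepts(reader_schema: dict, writer_fullname: str) -> bool:
    return writer_fullname in accepted_names(reader_schema)


reader = {
    "type": "record",
    "name": "Contact",
    "namespace": "com.example",
    "aliases": ["Person", "org.legacy.Customer"],  # former names of the type
    "fields": [],
}
print(reader_accepts(reader, "com.example.Person"))  # True
```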
3.4. Naming conventions

It is RECOMMENDED for the namespace attribute to be a reverse domain name of a
domain that your organization controls, such as com.example, to avoid naming
conflicts. It is also RECOMMENDED for the namespace expression to be in
lowercase.
It is RECOMMENDED for the name attribute of named types to use PascalCase,
where the first letter of each word is capitalized and there are no spaces or
underscores.
It is RECOMMENDED for the name attribute of record fields to use camelCase,
where the first letter of the first word is lowercase and the first letter of
each subsequent word is capitalized, with no spaces or underscores.
3.5. Extensibility

Avro schemas are extensible, allowing for the addition of any user-defined
attributes to any schema. Extension attributes are ignored by Avro's built-in
processing, but can be used by custom processing tools. Extension attributes
MUST be made accessible by Apache Avro implementations for reading and writing.
To avoid conflicts with future Avro extensions, the names of user-defined
attributes SHOULD be chosen to avoid collisions. It is RECOMMENDED to use a
prefix, as in myorg_myattribute, to denote user-defined attributes.
3.6. Primitive Type Schemas

The primitive types in Avro are defined in this section.
3.6.1. null

Represents an absence of a value. Used in Avro to allow optional fields or to
represent non-existent values in data records.
3.6.2. boolean

Represents a boolean value, true or false. This type is commonly utilized for
flags and boolean status indicators in data.
3.6.3. int

Represents a 32-bit signed integer. It accommodates integer values ranging from
$(-2^{31})$ to $(2^{31}-1)$.
3.6.4. long

Represents a 64-bit signed integer. It can store values from $(-2^{63})$ to
$(2^{63}-1)$.
3.6.5. float

Represents a single precision 32-bit IEEE 754 floating-point number. Suitable
for numerical values that do not require the precision of double-precision types
but need to cover a broad range of values. IEEE 754 single-precision floats have
an approximate precision of 7 decimal digits and can represent non-zero
magnitudes ranging from approximately $(1.4 \times 10^{-45})$ to $(3.4 \times 10^{38})$.
3.6.6. double

Represents a double precision 64-bit IEEE 754 floating-point number. This type
provides roughly double the precision of the float type, with an approximate
precision of 15 decimal digits. It can accommodate values ranging from about
$(4.9 \times 10^{-324})$ to $(1.8 \times 10^{308})$.
3.6.7. bytes

Represents a sequence of 8-bit unsigned bytes. Used to store raw binary data,
such as file contents or binary-encoded values.
3.6.8. string

Represents a sequence of Unicode characters encoded in UTF-8. This type is ideal
for textual data that may include any character from the Unicode standard.
3.7. Fixed Type

The fixed type is a named type that represents a
fixed-size sequence of bytes. The size of the fixed-size sequence is defined by
the size attribute, which is an integer.
For example, a SHA-256 hash value can be represented as a fixed type with a
size of 32 bytes.
{
  "type": "fixed",
  "name": "SHA256",
  "size": 32
}
Since the fixed type is a named type, it MUST be declared where it is first
used and can then be referenced by its fullname.
3.8. Logical Types

Logical types provide a way to extend the primitive types with additional
semantics.
3.8.1. decimal

The decimal logical type represents arbitrary-precision fixed-point numbers.
It is defined by two attributes: precision and scale. The precision
attribute specifies the total number of digits in the number, while the scale
attribute specifies the number of digits to the right of the decimal point.
The decimal logical type annotates the bytes or fixed type. The bytes
contain the two's complement representation of the unscaled integer value in
big-endian byte order. The REQUIRED precision and OPTIONAL scale attributes are
stored as metadata in the schema; when scale is omitted, it is zero.
{
  "type": "bytes",
  "logicalType": "decimal",
  "precision": 10,
  "scale": 2
}
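To make the byte representation concrete, here is a minimal Python sketch for the example schema above. It assumes the big-endian two's complement encoding of the unscaled integer used by Apache Avro; the function names are hypothetical and the precision limit is not enforced.

```python
from decimal import Decimal


def encode_decimal(value: Decimal, scale: int) -> bytes:
    """Encode value as the two's complement bytes of its unscaled integer."""
    scaled = value.scaleb(scale)  # multiply by 10**scale
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} has more than {scale} fractional digits")
    unscaled = int(scaled)
    # One extra bit for the sign, rounded up to whole bytes.
    length = (unscaled.bit_length() + 8) // 8
    return unscaled.to_bytes(length, byteorder="big", signed=True)


def decode_decimal(data: bytes, scale: int) -> Decimal:
    unscaled = int.from_bytes(data, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


encoded = encode_decimal(Decimal("123.45"), scale=2)
print(encoded.hex())                     # 3039 (12345 as a big-endian integer)
print(decode_decimal(encoded, scale=2))  # 123.45
```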
3.8.2. UUID

The uuid logical type represents a universally unique identifier (UUID) as
defined by RFC 4122. The UUID is a
128-bit value that is typically represented as 32 hexadecimal digits in five
groups separated by hyphens (36 characters in total).
The uuid logical type annotates the string primitive type to indicate that
the string value is a UUID.
Example:
{
  "type": "string",
  "logicalType": "uuid"
}
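A value for this logical type could be validated as sketched below with Python's standard uuid module. The function name is hypothetical, and insisting on the hyphenated 36-character form is an assumption of this sketch rather than a rule stated above.

```python
import uuid


def is_canonical_uuid(text: str) -> bool:
    """True if text is a UUID in the hyphenated 8-4-4-4-12 hexadecimal form."""
    try:
        parsed = uuid.UUID(text)
    except ValueError:
        return False
    # uuid.UUID also accepts braces or missing hyphens; reject those forms here.
    return str(parsed) == text.lower()


print(is_canonical_uuid("123e4567-e89b-12d3-a456-426614174000"))  # True
print(is_canonical_uuid("not-a-uuid"))                            # False
```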
3.8.3. date

The date logical type represents a calendar date without a time component. It
is defined as the number of days since the Unix epoch, January 1, 1970. The
date logical type annotates the int primitive type.
Example:
{
  "type": "int",
  "logicalType": "date"
}
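The day count can be derived with ordinary calendar arithmetic. A minimal Python sketch, with hypothetical function names:

```python
from datetime import date, timedelta

UNIX_EPOCH_DATE = date(1970, 1, 1)


def to_avro_date(value: date) -> int:
    """Number of days from the Unix epoch to value (negative before 1970)."""
    return (value - UNIX_EPOCH_DATE).days


def from_avro_date(days: int) -> date:
    return UNIX_EPOCH_DATE + timedelta(days=days)


print(to_avro_date(date(2000, 1, 1)))  # 10957
print(from_avro_date(10957))           # 2000-01-01
```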
3.8.4. time-millis

The time-millis logical type represents a time of day with millisecond
precision. It is defined as the number of milliseconds after midnight. The
time-millis logical type annotates the int primitive type.
Example:
{
  "type": "int",
  "logicalType": "time-millis"
}
3.8.5. time-micros

The time-micros logical type represents a time of day with microsecond
precision. It is defined as the number of microseconds after midnight. The
time-micros logical type annotates the long primitive type.
Example:
{
  "type": "long",
  "logicalType": "time-micros"
}
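Both time-of-day types count units elapsed since midnight and differ only in resolution. The following Python sketch shows the arithmetic; the function names are hypothetical, and truncating sub-millisecond digits for time-millis is an assumption of this sketch.

```python
from datetime import time


def to_time_micros(value: time) -> int:
    """Microseconds after midnight, suitable for a long."""
    seconds = (value.hour * 60 + value.minute) * 60 + value.second
    return seconds * 1_000_000 + value.microsecond


def to_time_millis(value: time) -> int:
    """Milliseconds after midnight; always below 86,400,000, so it fits an int."""
    return to_time_micros(value) // 1000


t = time(hour=1, minute=2, second=3, microsecond=4000)
print(to_time_millis(t))  # 3723004
print(to_time_micros(t))  # 3723004000
```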
3.8.6. timestamp-millis

The timestamp-millis logical type represents an instant in time with
millisecond precision. It is defined as the number of milliseconds since the Unix
epoch, January 1, 1970 00:00:00.00 UTC. The timestamp-millis logical type annotates the long
primitive type.
Example:
{
  "type": "long",
  "logicalType": "timestamp-millis"
}
3.8.7. timestamp-micros

The timestamp-micros logical type represents an instant in time with
microsecond precision. It is defined as the number of microseconds since the Unix
epoch, January 1, 1970 00:00:00.00 UTC. The timestamp-micros logical type annotates the long
primitive type.
Example:
{
  "type": "long",
  "logicalType": "timestamp-micros"
}
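The two timestamp types differ only in their unit. A minimal Python sketch of the conversion from a timezone-aware datetime follows; the function names are hypothetical, and rejecting naive datetimes is a choice made for this sketch.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _require_aware(value: datetime) -> None:
    if value.tzinfo is None:
        raise ValueError("an instant requires a timezone-aware datetime")


def to_timestamp_millis(value: datetime) -> int:
    _require_aware(value)
    return (value - UNIX_EPOCH) // timedelta(milliseconds=1)


def to_timestamp_micros(value: datetime) -> int:
    _require_aware(value)
    return (value - UNIX_EPOCH) // timedelta(microseconds=1)


instant = datetime(2000, 1, 1, tzinfo=timezone.utc)
print(to_timestamp_millis(instant))  # 946684800000
print(to_timestamp_micros(instant))  # 946684800000000
```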
3.8.8. local-timestamp-millis

The local-timestamp-millis logical type represents a timestamp in a local
timezone, regardless of which specific time zone is considered local, with
millisecond precision. It is defined as the number of milliseconds from
January 1, 1970 00:00:00.00 in that local timezone. The
local-timestamp-millis logical type annotates the long primitive type.
Example:
{
  "type": "long",
  "logicalType": "local-timestamp-millis"
}
3.8.9. local-timestamp-micros

The local-timestamp-micros logical type represents a timestamp in a local
timezone, regardless of which specific time zone is considered local, with
microsecond precision. It is defined as the number of microseconds from
January 1, 1970 00:00:00.00 in that local timezone. The
local-timestamp-micros logical type annotates the long primitive type.
Example:
{
  "type": "long",
  "logicalType": "local-timestamp-micros"
}
3.8.10. duration

The duration logical type represents an amount of time defined by a number of
months, days and milliseconds. This is not equivalent to a number of
milliseconds, because, depending on the moment in time from which the duration
is measured, the number of days in the month and number of milliseconds in a day
may differ. Other standard periods such as years, quarters, hours and minutes
can be expressed through these basic periods.
A duration logical type annotates an Avro fixed type of size 12, which stores
three little-endian unsigned 32-bit integers that represent durations at
different granularities of time. The first stores a number of months, the second
stores a number of days, and the third stores a number of milliseconds.
Example:
{
  "type": "fixed",
  "name": "Duration",
  "size": 12,
  "logicalType": "duration"
}
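The 12-byte layout maps directly onto Python's struct module. A minimal sketch with hypothetical function names:

```python
import struct

# Three little-endian unsigned 32-bit integers: months, days, milliseconds.
DURATION_FORMAT = "<III"


def encode_duration(months: int, days: int, milliseconds: int) -> bytes:
    """Pack the three components; struct.error is raised if one is out of range."""
    return struct.pack(DURATION_FORMAT, months, days, milliseconds)


def decode_duration(data: bytes) -> tuple:
    if len(data) != 12:
        raise ValueError("a duration value must be exactly 12 bytes")
    return struct.unpack(DURATION_FORMAT, data)


encoded = encode_duration(1, 2, 3)
print(encoded.hex())             # 010000000200000003000000
print(decode_duration(encoded))  # (1, 2, 3)
```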
3.9. record Type

The record type is a named type that represents a set of
named fields. Each field has a name and a type. The record type is used to
define structured data types.
The following attributes are used to define a record type:

name, namespace, aliases: See Named Types.
doc: See Documentation Strings.
fields: An array of field declarations

3.9.1. record field Declarations

A field declaration is an object that contains the following attributes:

name: The name of the field. The value of the name attribute MUST be a
non-empty string and start with a letter from a-z or A-Z. Subsequent
characters MUST be letters from a-z or A-Z, digits, or underscores (_).
This restriction ensures that the name attribute is a valid identifier in
most programming languages and databases.
aliases: See Alias Names.
type: The type of the field. The type attribute's value MUST be an
Avro schema expression.
doc: See Documentation Strings.
default: The default value of the field. The default attribute's value
MUST be a valid value of the field's type. The default attribute is OPTIONAL.
order: The sort order of the field. The order attribute is OPTIONAL and
MUST be one of the following string values:

ascending: The field is sorted in ascending order.
descending: The field is sorted in descending order.
ignore: The field is not sorted.


The default attribute is used to provide a default value for the field when
the field is not present in the serialized data.
Since the value is declared as a JSON value in the Avro Schema, the default
value MUST be encoded in JSON in accordance with the following mapping:


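| Avro Type | JSON Type | Example | Note |
|-----------|-----------|---------|------|
| null | null | `null` | |
| boolean | boolean | `true` | |
| int | number | `42` | |
| long | number | `42` | |
| float | number | `3.14` | |
| double | number | `3.14` | |
| bytes | string | `"\u00FF"` | Each byte is written as the Unicode code point of the same value (U+0000 to U+00FF), typically as a unicode escape sequence |
| string | string | `"hello"` | |
| fixed | string | `"\u00FF"` | Fixed values are encoded the same way as bytes |
| enum | string | `"SYMBOL"` | |
| array | array | `[]` | |
| map | object | `{}` | |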
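The string form of bytes and fixed defaults is the least obvious row of this mapping. The sketch below assumes, as the Apache Avro specification does, that each byte value 0 to 255 corresponds to the Unicode code point of the same number, which is exactly what Python's latin-1 codec implements. The function names are hypothetical.

```python
import json


def bytes_to_json_default(data: bytes) -> str:
    """Render raw bytes as the JSON string literal used for a default value."""
    return json.dumps(data.decode("latin-1"))


def json_default_to_bytes(json_text: str) -> bytes:
    """Recover the raw bytes; fails if any code point is above U+00FF."""
    return json.loads(json_text).encode("latin-1")


print(bytes_to_json_default(b"\xff"))       # "\u00ff"
print(json_default_to_bytes('"\\u00FF"'))   # b'\xff'
```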


3.10. enum Type

The named enum type defines a set of symbols. An enum typed value MUST be one
of those symbols.
The following attributes are used to define an enum type:

name, namespace, aliases: See Named Types.
doc: See Documentation Strings.
symbols: An array of strings that represent the symbols of the enum.
default: OPTIONAL. The default symbol of the enum. If defined, the default
attribute's value MUST be one of the values declared in symbols.

The string values in the symbols array MUST be unique. The string values are
subject to the same syntax rules as the name attribute of named types.
Example:
{
  "type": "enum",
  "name": "Color",
  "namespace": "com.example",
  "symbols": ["RED", "GREEN", "BLUE"]
}
3.11. array

The array type represents a list of values, all of the same type specified by
the items attribute.
The following attributes are used to define an array type:

items: The type of the elements in the array. The items attribute's value
MUST be an Avro schema expression.
default: The default value of the array. The default attribute's value
MUST be a valid value of the array's type. The default attribute is OPTIONAL.

Example:
{
  "type": "array",
  "items": "string"
}
3.12. map

The map type represents a set of key-value pairs, where the keys are strings
and the values are of the specified type.
The following attributes are used to define a map type:

values: The type of the values in the map. The values attribute's value
MUST be an Avro schema expression.
default: The default value of the map. The default attribute's value MUST
be a valid value of the map's type. The default attribute is OPTIONAL.

Example:
{
  "type": "map",
  "values": "int"
}
3.13. Type Unions

A type union is an array of Avro schema expressions. A value of a type union
MUST be a valid value of exactly one of the types in the union.
All types in a type union MUST be distinct.
Any primitive type MUST be included at most once, which also applies to logical
type annotations. A UUID logical type, which annotates string, and a
string primitive type therefore MUST NOT appear in the same type union.
A union MUST NOT contain more than one array type and MUST NOT contain more
than one map type. A choice between several array or map types therefore needs
to be modeled as a single array or map whose items or values type is a type union.
A union MAY contain multiple, distinct named types directly or by reference.
Named types are distinct if they have different fullnames.
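The distinctness rules above reduce to computing one key per branch and rejecting repeats. The Python sketch below illustrates this under simplifying assumptions: the function names are hypothetical, string branches are taken to be primitive type names or fullname references, and nested declarations are not inspected.

```python
NAMED_TYPES = ("record", "enum", "fixed")


def branch_key(branch, enclosing_namespace: str = "") -> str:
    """Key under which a union branch must be unique."""
    if isinstance(branch, str):
        return branch  # primitive type name or reference by fullname
    kind = branch["type"]
    if kind in NAMED_TYPES:
        namespace = branch.get("namespace", enclosing_namespace)
        name = branch["name"]
        return f"{namespace}.{name}" if namespace else name
    # array, map, or an annotated primitive such as a uuid string
    return kind


def check_union(branches: list, enclosing_namespace: str = "") -> None:
    """Raise ValueError if two branches share the same key."""
    seen = set()
    for branch in branches:
        key = branch_key(branch, enclosing_namespace)
        if key in seen:
            raise ValueError(f"union contains more than one {key!r} branch")
        seen.add(key)
        

check_union(["null", "string"])  # passes silently
```

With this keying, `["string", {"type": "string", "logicalType": "uuid"}]` is rejected because both branches map to `string`, while two records with different fullnames are accepted.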
A very common use case for type unions is to declare optionality for values by
joining the desired type of the value with the null type in a type union. The
following example shows a type union that represents a string or a null value.
["null", "string"]
Type unions can otherwise be used to represent values that may be of different
types. The following example shows a type union that represents a string or a
boolean value.
["string", "boolean"]
Another fairly common case for type unions is to provide a choice of two or
more record types. This pattern MAY also be used to define a collection of
record types in a single schema document.
With multiple records in a type union being permitted, it is RECOMMENDED for all
such records to be structurally distinct. This means that the records should
have different fields or field types. This helps avoid ambiguity when reading
union data in cases where data structures are described with Avro Schema but
are serialized with an encoding that does not carry type markers.
[
  {
    "type": "record",
    "name": "Person",
    "fields": [
      { "name": "name", "type": "string" },
      { "name": "age", "type": "int" }
    ]
  },
  {
    "type": "record",
    "name": "Organization",
    "fields": [
      { "name": "name", "type": "string" },
      { "name": "employees", "type": { "type": "array", "items": "Person" } }
    ]
  }
]
4. The "Parsing Canonical Form" for Avro Schemas

One of the defining characteristics of Avro's binary encoding is that a reader
must use the schema used by the writer of the data in order to know how to read
the data. This assumption results in a data format that’s compact and also
amenable to many forms of schema evolution. However, the specification so far
has not defined what it means for the reader to have the “same” schema as the
writer. Does the schema need to be textually identical? Well, clearly adding or
removing some whitespace to a JSON expression does not change its meaning. At
the same time, reordering the fields of records clearly does change the meaning.
So what does it mean for a reader to have "the same" schema as a writer?
The Parsing Canonical Form is a transformation of a writer’s schema that lets
us define what it means for two schemas to be "the same" for the purpose of
reading data written against the schema. It is called Parsing Canonical Form
because the transformations strip away parts of the schema, like "doc"
attributes, that are irrelevant to readers trying to parse incoming data. It is
called Canonical Form because the transformations normalize the JSON text
(such as the order of attributes) in a way that eliminates unimportant
differences between schemas. If the Parsing Canonical Forms of two different
schemas are textually equal, then those schemas are "the same" as far as any
reader is concerned, i.e., there is no serialized data that would allow a reader
to distinguish data generated by a writer using one of the original schemas from
data generated by a writer using the other original schema.
Section 4.1 specifies the transformations that define Parsing Canonical
Form. But with a well-defined canonical form, it can be convenient to go one
step further, transforming these canonical forms into simple integers
(“fingerprints”) that can be used to uniquely identify schemas. Section 5
recommends some standard practices for generating such fingerprints.
4.1. Transforming into Parsing Canonical Form

Assuming an input schema (in JSON form) that’s already UTF-8 text for a valid
Avro schema (including all quotes as required by JSON), the following
transformations will produce its Parsing Canonical Form:

[PRIMITIVES] Convert primitive schemas to their simple form (e.g., int
instead of {"type":"int"}).
[FULLNAMES] Replace short names with fullnames, using applicable namespaces
to do so. Then eliminate namespace attributes, which are now redundant.
[STRIP] Keep only attributes that are relevant to parsing data, which are:
type, name, fields, symbols, items, values, size. Strip all
others (e.g., doc and aliases).
[ORDER] Order the appearance of fields of JSON objects as follows: name,
type, fields, symbols, items, values, size. For example, if an
object has type, name, and size fields, then the name field should appear
first, followed by the type and then the size fields.
[STRINGS] For all JSON string literals in the schema text, replace any
escaped characters (e.g., \uXXXX escapes) with their UTF-8 equivalents.
[INTEGERS] Eliminate quotes around and any leading zeros in front of JSON
integer literals (which appear in the size attributes of fixed schemas).
[WHITESPACE] Eliminate all whitespace in JSON outside of string literals.
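The transformations above can be sketched as a recursive function over an already-parsed schema. This is an illustrative sketch, not a reference implementation: the function names are hypothetical, it assumes the input is valid and was parsed with a JSON library (so quoted integers are not handled), and it resolves unqualified type references against the enclosing namespace.

```python
import json

PRIMITIVES = ("null", "boolean", "int", "long", "float", "double", "bytes", "string")
NAMED_TYPES = ("record", "enum", "fixed")


def _quote(text: str) -> str:
    # [STRINGS] keep non-ASCII characters as themselves rather than \uXXXX escapes
    return json.dumps(text, ensure_ascii=False)


def canonical_form(schema, namespace: str = "") -> str:
    if isinstance(schema, str):  # primitive name or reference to a named type
        if schema in PRIMITIVES or "." in schema or not namespace:
            return _quote(schema)
        return _quote(f"{namespace}.{schema}")  # [FULLNAMES]
    if isinstance(schema, list):  # union
        return "[" + ",".join(canonical_form(s, namespace) for s in schema) + "]"
    kind = schema["type"]
    if kind in PRIMITIVES:  # [PRIMITIVES]; [STRIP] also drops logicalType
        return _quote(kind)
    parts = {}  # insertion order implements [ORDER]
    if kind in NAMED_TYPES:
        name = schema["name"]
        if "." in name:
            namespace = name.rsplit(".", 1)[0]
        else:
            namespace = schema.get("namespace", namespace)
            name = f"{namespace}.{name}" if namespace else name
        parts["name"] = _quote(name)
    parts["type"] = _quote(kind)
    if kind == "record":
        fields = [
            '{"name":%s,"type":%s}' % (_quote(f["name"]), canonical_form(f["type"], namespace))
            for f in schema["fields"]
        ]
        parts["fields"] = "[" + ",".join(fields) + "]"
    elif kind == "enum":
        parts["symbols"] = "[" + ",".join(_quote(s) for s in schema["symbols"]) + "]"
    elif kind == "array":
        parts["items"] = canonical_form(schema["items"], namespace)
    elif kind == "map":
        parts["values"] = canonical_form(schema["values"], namespace)
    elif kind == "fixed":
        parts["size"] = str(int(schema["size"]))  # [INTEGERS]
    # [WHITESPACE] nothing is emitted between tokens
    return "{" + ",".join(f'"{key}":{value}' for key, value in parts.items()) + "}"


print(canonical_form({"type": "fixed", "name": "SHA256", "size": 32, "doc": "hash"}))
# {"name":"SHA256","type":"fixed","size":32}
```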

5. Schema Fingerprints

"[A] fingerprinting algorithm is a procedure that maps an arbitrarily large data
item (such as a computer file) to a much shorter bit string, its fingerprint,
that uniquely identifies the original data for all practical purposes" (quoted
from Wikipedia). In the Avro context, fingerprints of Parsing Canonical Form can
be useful in a number of applications; for example, to cache encoder and decoder
objects, to tag data items with a short substitute for the writer’s full schema,
and to quickly negotiate common-case schemas between readers and writers.
In designing fingerprinting algorithms, there is a fundamental trade-off between
the length of the fingerprint and the probability of collisions. To help
application designers find appropriate points within this trade-off space, while
encouraging interoperability and ease of implementation, we recommend using one
of the following three algorithms when fingerprinting Avro schemas:

When applications can tolerate longer fingerprints, we recommend using the
SHA-256 digest algorithm to generate 256-bit fingerprints of Parsing Canonical
Forms. Most languages today have SHA-256 implementations in their libraries.
At the opposite extreme, the smallest fingerprint we recommend is a 64-bit
Rabin fingerprint. Below, we provide pseudo-code for this algorithm that can
be easily translated into any programming language. 64-bit fingerprints should
guarantee uniqueness for schema caches of up to a million entries (for such a
cache, the chance of a collision is 3E-8). We don’t recommend shorter
fingerprints, as the chance of a collision is too great (for example, with
32-bit fingerprints, a cache with as few as 100,000 schemas has a 50% chance
of having a collision).
Between these two extremes, we recommend using the MD5 message digest to
generate 128-bit fingerprints. These make sense only where very large numbers
of schemas are being manipulated (tens of millions); otherwise, 64-bit
fingerprints should be sufficient. As with SHA-256, MD5 implementations are
found in most libraries today.

These fingerprints are not meant to provide any security guarantees, even the
longer SHA-256-based ones. Most Avro applications should be surrounded by
security measures that prevent attackers from writing random data and otherwise
interfering with the consumers of schemas. We recommend that these surrounding
mechanisms be used to prevent collision and pre-image attacks (i.e., “forgery”)
on schema fingerprints, rather than relying on the security properties of the
fingerprints themselves.
Rabin fingerprints are cyclic redundancy checks computed using irreducible
polynomials. In the style of the Appendix of RFC 1952 (pg 10), which defines the
CRC-32 algorithm, here’s our definition of the 64-bit AVRO fingerprinting
algorithm:
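final class AvroFingerprint {
  // Fingerprint of the empty message; also used as the CRC polynomial.
  static final long EMPTY = 0xc15d213aa4d7a795L;
  // Per-byte lookup table, built lazily on first use.
  static long[] FP_TABLE = null;

  // Computes the 64-bit Rabin fingerprint of buf.
  static long fingerprint64(byte[] buf) {
    if (FP_TABLE == null) initFPTable();
    long fp = EMPTY;
    for (int i = 0; i < buf.length; i++)
      fp = (fp >>> 8) ^ FP_TABLE[(int)(fp ^ buf[i]) & 0xff];
    return fp;
  }

  // Precomputes the effect of shifting each possible byte value through the CRC.
  static void initFPTable() {
    long[] table = new long[256];
    for (int i = 0; i < 256; i++) {
      long fp = i;
      for (int j = 0; j < 8; j++)
        fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
      table[i] = fp;
    }
    FP_TABLE = table;
  }
}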
Readers interested in the mathematics behind this algorithm may want to read
Chapter 14 of the Second Edition of Hacker’s Delight. (Unlike RFC-1952 and the
book chapter, we prepend a single one bit to messages. We do this because CRCs
ignore leading zero bits, which can be problematic. Our code prepends a one-bit
by initializing fingerprints using EMPTY, rather than initializing using zero as
in RFC-1952 and the book chapter.)
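For readers working outside Java, the following Python transliteration of the pseudo-code above may be useful. It is a sketch, not a reference implementation: the function names other than fingerprint64 are hypothetical, and because Python integers are unbounded the result is an unsigned 64-bit value, with to_signed64 converting it to the value a Java long would hold.

```python
import hashlib

EMPTY = 0xC15D213AA4D7A795


def _build_table() -> list:
    table = []
    for i in range(256):
        fp = i
        for _ in range(8):
            # -(fp & 1) is 0 or -1, so the mask selects either 0 or EMPTY
            fp = (fp >> 1) ^ (EMPTY & -(fp & 1))
        table.append(fp)
    return table


FP_TABLE = _build_table()


def fingerprint64(buf: bytes) -> int:
    """64-bit Rabin fingerprint of buf as an unsigned integer."""
    fp = EMPTY
    for byte in buf:
        fp = (fp >> 8) ^ FP_TABLE[(fp ^ byte) & 0xFF]
    return fp


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed two's complement long."""
    return value - (1 << 64) if value >= (1 << 63) else value


def sha256_fingerprint(canonical_form: str) -> bytes:
    """256-bit fingerprint of a Parsing Canonical Form given as text."""
    return hashlib.sha256(canonical_form.encode("utf-8")).digest()


print(hex(fingerprint64(b"")))  # 0xc15d213aa4d7a795
```

The fingerprint of the empty input equals EMPTY because the loop body never runs, which matches the initialization described above.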
6. Security Considerations

Care must be taken when processing Avro schemas and data to avoid schema
injection attacks, unauthorized data exposure, and issues arising from malformed
data structures.
7. IANA Considerations

7.1. Media Type Registration

This specification defines the application/vnd.apache.avro.schema+json media
type for Avro schema documents, which is to be registered with IANA.
8. References


RFC 2119: Key words for use in RFCs to Indicate Requirement Levels
RFC 3986: Uniform Resource Identifier
RFC 4648: The Base16, Base32, and Base64 Data Encodings
RFC 5646: Tags for Identifying Languages
RFC 7231: Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content
RFC 8259: The JavaScript Object Notation (JSON) Data Interchange Format