Last active
May 26, 2021 21:51
-
-
Save swcurran/7d88f9866175af96a70e5c6c00fa0148 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# Aries RFC 0017: Attachments | |
- Authors: [Daniel Hardman](daniel.hardman@gmail.com), Sam Curren, Andrew Whitehead | |
- Status: [ACCEPTED](/README.md#accepted) | |
- Since: 2019-01-31 | |
- Status Note: Used in a number of other RFCs. | |
- Start Date: 2018-12-24 | |
- Tags: [concept](/tags.md#concept) | |
## Summary | |
Explains the three canonical ways to attach data to | |
an agent message. | |
## Motivation | |
[DIDComm](../0005-didcomm/README.md) messages | |
use a structured format with a defined schema and a | |
small inventory of scalar data types (string, number, date, etc). | |
However, it will be quite common for messages to supplement formalized | |
exchange with arbitrary data--images, documents, or types of | |
media not yet invented. | |
We need a way to "attach" such content to DIDComm messages. This method | |
must be flexible, powerful, and usable without requiring new schema | |
updates for every dynamic variation. | |
## Tutorial | |
### Messages versus Data | |
Before explaining how to associate data with a message, it is worth | |
pondering exactly how these two categories of information differ. | |
It is common for newcomers to DIDComm to argue that messages are just | |
data, and vice versa. After all, any data can be transmitted over | |
DIDComm; doesn't that turn it into a message? And any message can | |
be saved; doesn't that make it data? | |
While it is true that messages and data are highly related, | |
some semantic differences matter: | |
* __Messages are primarily about communication__. Their meaning is tied | |
to a communication context. [Messages are a mechanism whereby | |
state evolves in a protocol](../0003-protocols/README.md#ingredients). | |
Protocols are [versioned according to the structure and semantics of | |
messages](../0003-protocols/README.md#semver-rules-for-protocols). | |
Messages are usually small, consisting of a modest number of fields with | |
a structure that's focused on furthering the goals of their protocol. | |
* __Data has meaning at rest__, in many different DIDComm protocols, or | |
in important contexts beyond DIDComm. Data may be very large and very | |
complex. It may come in formats that are quite independent from | |
DIDComm, versioned independently from the protocols that share it. | |
Data may be produced, consumed or handled as part of a | |
protocol, but the actual content of the data is usually not where | |
processing at the protocol level focuses. In agent codebases, it would | |
be common for data handling to be implemented in different classes | |
or libraries from the handlers for messages. | |
Some examples: | |
* A protocol to negotiate the release of medical records might cause | |
X-Rays, genomes, and many other artifacts to be transmitted. These | |
artifacts are data, whereas the information packets that arrange the | |
transmission and provide a carrying mechanism for the artifacts are | |
messages. | |
* A DIDComm message can be used to [report an error](../../features/0035-report-problem/README.md). Descriptive | |
parameters that change how the error is processed are probably | |
part of the message, whereas a log file that provides supporting | |
information should be thought of as data rather than the message. | |
* The protocol for issuing credentials consists of messages that flow | |
through certain steps. One of the steps eventually delivers a credential. | |
The credential is _data_; it has meaning even when the protocol is | |
complete, and the protocol version may evolve independent of the data | |
format of the credential itself. The fact that the credential is transmitted | |
through a message does not change the credential's primary status as | |
data. | |
* A protocol to schedule a venue for an event might produce a confirmation | |
message when it finishes. This message might include a map of the | |
venue, instructions about how to unlock the gate, pictures of certain | |
resources, and so forth. This collateral is _data_, whereas the messages | |
that signal progression through the steps of scheduling are not. | |
* The [DID Exchange Protocol](../../features/0023-did-exchange/README.md) | |
exchanges messages to establish a connection between two parties. Part of | |
what's exchanged is a DID Doc. The DID Doc is more like _data_ than it is | |
like an ordinary _message_, since it has meaning at rest and outside the | |
protocol. | |
The line between these two concepts may not be perfectly crisp in all cases, | |
and that is okay. It is clear enough, most of the time, to provide context | |
for the central question of this RFC, which is: | |
> How do we send data along with messages? | |
### 3 Ways | |
Data can be "attached" to DIDComm messages in 3 ways: | |
1. Inlining | |
2. Embedding | |
3. Appending | |
#### Inlining | |
In __inlining__, data is directly assigned as the value paired with a JSON key | |
in a DIDComm message. For example, a message about arranging a rendezvous may | |
inline data about a location: | |
[![inlined location](inlined.png)](rendezvous.json) | |
This inlined data is in Google Maps pinning format. It has a meaning at rest, | |
outside the message that conveys it, and the versioning of its structure | |
may evolve independently of the versioning of the rendezvous protocol. | |
Only JSON data can be inlined, since any other data format would break | |
JSON format rules. | |
#### Embedding | |
In __embedding__, a JSON data structure called an __attachment descriptor__ | |
is assigned as the value paired with a JSON key in a DIDComm message. (Or, an array of | |
attachment descriptors could be assigned.) By convention, the key name for such | |
attachment fields ends with `~attach`, making it a [field-level decorator]( | |
../0011-decorators/README.md#decorator-scope) that | |
can share common handling logic in agent code. The attachment descriptor structure | |
describes the MIME type and other properties of the data, in much the | |
same way that MIME headers and body describe and contain an attachment | |
in an email message. Given an imaginary protocol that photographers could | |
use to share their favorite photo with friends, the embedded data might | |
manifest like this: | |
[![embedded photo](embedded.png)](photo-share.json) | |
Embedding is a less direct mechanism than inlining, because the data is no | |
longer readable by a human inspecting the message; it is | |
base64url-encoded instead. A benefit of this approach is that the data | |
can be any MIME type instead of just JSON, and that the data comes | |
with useful metadata that can facilitate saving it as a separate | |
file. | |
#### Appending | |
__Appending__ is accomplished using the `~attach` decorator, which can be added to any | |
message to include arbitrary data. The decorator is an array of attachment | |
descriptor structures (the same structure used for embedding). For example, a | |
message that conveys evidence found at a crime scene might include the | |
following decorator: | |
[![appended attachments](appended.png)](crime-scene.json) | |
### Choosing the right approach | |
These methods for attaching sit along a continuum that is somewhat like | |
the continuum between strong, statically typed languages versus | |
dynamic, duck-typed languages in programming. The more strongly | |
typed the attachments are, the more strongly bound the attachments | |
are to the protocol that conveys them. Each choice has advantages | |
and disadvantages. | |
![comparison](3-approaches-comparison.png) | |
Inlined data is strongly typed; the schema for its associated | |
message must specify the name of the data field, plus what type | |
of data it contains. Its format is always some kind of JSON--often | |
JSON-LD with a `@type` and/or `@context` field to provide greater | |
clarity and some independence of versioning. Simple and small data | |
is the best fit for inlining. As mentioned earlier, the Connection | |
Protocol inlines a DID Doc in its `connection_request` and | |
`connection_response` messages. | |
Embedded data is still associated with a known field in the | |
message schema, but it can have a broader set of possible | |
formats. A credential exchange protocol might embed a credential | |
in the final message that does credential issuance. | |
Appended attachments are the most flexible but also the hardest | |
to run through semantically sophisticated processing. They do not require | |
any specific declaration in the schema of a message, although they can be | |
referenced in fields defined by the schema via their nickname (see below). | |
A protocol that needs to pass an arbitrary collection of artifacts | |
without strong knowledge of their semantics might find this | |
helpful, as in the example mentioned above, where scheduling a venue | |
causes various human-usable payloads to be delivered. | |
### IDs for attachments | |
The `@id` field within an attachment descriptor is used to refer unambiguously | |
to an appended (or less ideally, embedded) attachment, and works like an HTML | |
anchor. It is resolved relative to the root `@id` of the message and only has | |
to be unique within a message. For example, | |
imagine a fictional message type that's used to apply for an art scholarship, | |
that requires photos of art demonstrating techniques A, B, and C. | |
We could have 3 different attachment descriptors--but what if the same | |
work of art demonstrates both technique A and technique B? We don't want | |
to attach the same photo twice... | |
What we can do is stipulate that the datatype of `A_pic`, `B_pic`, and `C_pic` | |
is an __attachment reference__, and that the | |
references will point to appended attachments. A fragment of the result might | |
look like this: | |
[![@ids example](ids.png)](ids.json) | |
Another example of nickname use appeared in the [first example of | |
appended attachments above](#appending), where the `notes` field | |
refered to the `@id`s of the various attachments. | |
This indirection offers several benefits: | |
* The same attachment may be referenced at more than one place in the | |
schema, without duplicating the content. | |
* The message may now include any | |
number of attachments besides the 3 that are required, and mingle them | |
freely with the others. (In our example, | |
perhaps the student wants to add side and overhead shots of each work of art, | |
not just front views). | |
* Attachments may now have formal semantics, instead of requiring human | |
intelligence to handle. (How many times have we written emails with multiple | |
attachments, and added verbiage like, "My photo of the door is attached as | |
image1.jpeg; my photo of the damaged trunk is attached as image2.jpeg"?) | |
We could use this same technique with embedded attachments (that is, assign | |
a nickname to an embedded attachment, and refer to that nickname in another | |
field where attached data could be embedded), but this is not | |
considered best practice. The reason is that it requires a field in the schema | |
to have two possible data types--one a string that's a nickname reference, and | |
one an attachment descriptor. Generally, we like fields to have a single datatype | |
in a schema. | |
### Content Formats | |
There are multiple ways to include content in an attachment. Only one method | |
should be used per attachment. | |
#### base64url | |
This content encoding is an obvious choice for any content different than JSON. | |
You can embed content of any type using this method. Examples are plentiful | |
throughout the document. Note that this encoding is always [base64url encoding, not plain base64](https://tools.ietf.org/html/rfc4648#section-5), and that padding is not required. When decoding, all Aries software should accept content that is base64 or base64url encoded, padded or unpadded. | |
#### json | |
If you are embedding an attachment that is JSON, you can embed it directly in | |
JSON format to make access easier, by replacing `data.base64` with `data.json`, | |
where the value assigned to `data.json` is the attached content: | |
[![embedded JSON example](embedded-json.png)](embedded-json.json) | |
This is an overly trivial example of [GeoJSON](https://tools.ietf.org/html/rfc7946), | |
but hopefully it illustrates the technique. In cases where there is no mime type | |
to declare, it may be helpful to use [JSON-LD's `@type` construct]( | |
../../concepts/0047-json-ld-compatibility/README.md#type) | |
to clarify the specific flavor of JSON in the embedded attachment. | |
#### links | |
All examples discussed so far include an attachment *by value*--that is, the | |
attachment's bytes are directly inlined in the message in some way. This | |
is a useful mode of data delivery, but it is not the only mode. | |
Another way that attachment data can be incorporated is *by reference*. For | |
example, you can link to the content on a web server by replacing `data.base64` | |
or `data.json` with `data.links` in an attachment descriptor: | |
[![links example](crime-scene-links.png)](crime-scene-links.json) | |
When you provide such a link, you are creating a logical association between the | |
message and an attachment that can be fetched separately. This makes it possible | |
to send brief descriptors of attachments and to make the downloading of the heavy | |
content optional (or parallelizable) for the recipient. | |
The `links` field is plural (an array) to allow multiple locations to be | |
offered for the same content. This allows an agent to fetch attachments using | |
whichever mechanism(s) are best suited to its individual needs and capabilities. | |
##### Supported URI Types | |
The set of supported URI types in an attachment link is limited to: | |
- HTTP | |
- HTTPS | |
Additional URI types may be added via updates to this RFC. | |
If an attachment link with an unsupported URI is received, the agent SHOULD | |
respond with a [Problem Report](../../features/0035-report-problem/README.md) | |
indicated the problem. | |
An ecosystem (coordinating set of agents working in a specific business area) | |
may agree to support other URI types within that ecosystem. As such, implementing a | |
mechanism to easily add support for other attachment link URI types might be useful, | |
but is not required. | |
### Signing Attachments | |
In some cases it may be desirable to sign an attachment in addition to or instead of | |
signing the message as a whole. Consider a home-buying protocol; the home inspection | |
needs to be signed even when it is removed from a messaging flow. Attachments may | |
also be signed by a party separate from the sender of the message, or using a different | |
signing key when the sender is performing key rotation. | |
Embedded and appended attachments support signatures by the addition of a `data.jws` field | |
containing a signature in [JWS (RFC 7515) format](https://tools.ietf.org/html/rfc7515) | |
with [Detached Content](https://tools.ietf.org/html/rfc7515#appendix-F). The payload of the JWS is the raw bytes of the attachment, appropriately base64url-encoded per JWS rules. If these raw bytes are incorporated by value in the DIDComm message, they are already base64url-encoded in `data.base64` and are thus directly substitutable for the suppressed `data.jws.payload` field; if they are externally referenced, then the bytes must be fetched via the URI in `data.links` and base64url-encoded before the JWS can be fully reconstituted. Signatures over inlined JSON attachments are not currently defined as this | |
depends upon a canonical serialization for the data. | |
Sample JWS-signed attachment: | |
```jsonc | |
{ | |
"@type": "https://didcomm.org/xhomebuy/1.0/home_insp", | |
"inspection_date": "2020-03-25", | |
"inspection_address": "123 Villa de Las Fuentes, Toledo, Spain", | |
"comment": "Here's that report you asked for.", | |
"report~attach": { | |
"mime-type": "application/pdf", | |
"filename": "Garcia-inspection-March-25.pdf", | |
"data": { | |
"base64": "eyJ0eXAiOiJKV1QiLA0KICJhbGciOiJIUzI1NiJ... (bytes omitted to shorten)", | |
"jws": { | |
// payload: ..., <-- omitted: refer to base64 content when validating | |
"header": { | |
"kid": "did:key:z6MkmjY8GnV5i9YTDtPETC2uUAW6ejw3nk5mXF5yci5ab7th" | |
}, | |
"protected": "eyJhbGciOiJFZERTQSIsImlhdCI6MTU4Mzg4... (bytes omitted)", | |
"signature": "3dZWsuru7QAVFUCtTd0s7uc1peYEijx4eyt5... (bytes omitted)" | |
} | |
} | |
} | |
} | |
``` | |
Here, the JWS structure inlines a public key value in | |
[did:key format](../../features/0360-use-did-key/README.md) within the unprotected header's | |
`kid` field. It may also use a DID URL to reference a key within a resolvable DIDDoc. | |
Supported DID URLs should specify a timestamp and/or version for the containing | |
document. | |
The JWS protected header consists of at least the following parameter indicating | |
an Edwards curve digital signature: | |
```jsonc | |
{ | |
"alg": "EdDSA" | |
} | |
``` | |
Additional protected and unprotected header parameters may be included in the JWS | |
and must be ignored by implementations if not specifically supported. Any registered | |
header parameters defined by the JWS RFC must be used according to the specification | |
if present. | |
Multiple signatures may be included using the JWS General Serialization syntax. When a | |
single signature is present, the Flattened Serialization syntax should be preferred. | |
Because each JWS contains an unprotected header with the signing key information, the | |
JWS Compact Serialization cannot be supported. | |
### Size Considerations | |
DIDComm messages should be small, as a general rule. Just as it's a bad | |
idea to send email messages with multi-GB attachments, it would be bad | |
to send DIDComm messages with huge amounts of data inside them. Remember, a | |
message is about advancing a protocol; usually that can be done without | |
gigabytes or even megabytes of JSON fields. Remember as well that DIDComm messages | |
may be sent over channels having size constraints tied to the transport--an | |
HTTP POST or Bluetooth or NFC or AMQP payload of more than a few MB | |
may be problematic. | |
Size pressures in messaging are likely to come from attached data. A | |
good rule of thumb might be to not make DIDComm messages bigger than | |
email or MMS messages--whenever more data needs to be attached, use the | |
inclusion-by-reference technique to allow the data to be fetched | |
separately. | |
#### Security Implications | |
Attachments are a notorious vector for malware and mischief with email. For | |
this reason, agents that support attachments MUST perform input validation | |
on attachments, and MUST NOT invoke risky actions on attachments until such | |
validation has been performed. The status of input validation with respect | |
to attachment data MUST be reflected in the Message Trust Context associated | |
with the data's message. | |
#### Privacy Implications | |
When attachments are inlined, they enjoy the same security and transmission | |
guarantees as all agent communication. However, given the right context, | |
a large inlined attachment may be recognizable by its size, even if it is | |
carefully encrypted. | |
If attachment content is fetched from an external source, then new | |
complications arise. The security guarantees may change. Data streamed from a CDN | |
may be observable in flight. URIs may be correlating. Content may not be | |
immutable or tamper-resistant. | |
However, these issues are not necessarily a problem. If a DIDComm message | |
wants to attach a 4 GB ISO file of a linux distribution, it may be perfectly | |
fine to do so in the clear. Downloading it is unlikely to introduce strong | |
correlation, encryption is unnecessary, and the torrent itself prevents | |
malicious modification. | |
Code that handles attachments will need to use wise policy to decide whether | |
attachments are presented in a form that meets its needs. | |
## Reference | |
### Attachment Descriptor structure | |
* `@id`: A [JSON-LD construct]( | |
../../concepts/0047-json-ld-compatibility/README.md#id) | |
that uniquely identifies attached content within the scope of a given | |
message. Recommended on appended attachment descriptors. Possible but generally | |
unused on embedded attachment descriptors. Never required if no references | |
to the attachment exist; if omitted, then there is no way to | |
refer to the attachment later in the thread, in error messages, and so forth. | |
Because `@id` is used to compose URIs, it is recommended that this | |
name be brief and avoid spaces and other characters that require URI | |
escaping. | |
* `description`: An optional human-readable description of the content. | |
* `filename`: A hint about the name that might be used if this attachment is | |
persisted as a file. It is not required, and need not be unique. If this field | |
is present and `mime-type` is not, the extension on the filename may be used | |
to infer a MIME type. | |
* `mime-type`: Describes the MIME type of the attached content. Optional but | |
recommended. | |
* `lastmod_time`: A hint about when the content in this attachment was last | |
modified. | |
* `byte_count`: Optional, and mostly relevant when content is included by | |
reference instead of by value. Lets the receiver guess how expensive it will be, | |
in time, bandwidth, and storage, to fully fetch the attachment. | |
* `data`: A JSON object that gives access to the actual content of the | |
attachment. Contains the following subfields: | |
* `jws`: A JSON Web Signature over the content of the attachment. Optional. | |
* `sha256`: The hash of the content. Optional. Used as an integrity check if | |
content is inlined. if content is only referenced, then including this field | |
makes the content tamper-evident. This may be redundant, if the content is | |
stored in an inherently immutable container like content-addressable storage. | |
This may also be undesirable, if dynamic content at a specified link is | |
beneficial. Including a hash without including a way to fetch the content via | |
link is a form of proof of existence. | |
* `links`: A list of zero or more locations at which the content may be fetched. | |
Optional. | |
* `base64`: Base64url-encoded data, when representing arbitrary content inline instead | |
of via `links`. Optional. | |
* `json`: Directly embedded JSON data, when representing content inline instead of | |
via `links`, and when the content is natively conveyable as JSON. Optional. | |
## Drawbacks | |
By providing 3 different choices, we impose additional complexity on | |
agents that will receive messages. They have to handle attachments | |
in 3 different modes. | |
## Rationale and alternatives | |
Originally, we only proposed the most flexible method of attaching--appending. | |
However, feedback from the community suggested that stronger binding to | |
schema was desirable. Inlining was independently invented, and is suggested | |
by JSON-LD anyway. Embedding without appending eliminates some valuable | |
features such as unnamed and undeclared ad-hoc attachments. So we ended | |
up wanting to support all 3 modes. | |
## Prior art | |
Multipart MIME (see RFCs [822](https://tools.ietf.org/html/rfc822), | |
[1341](https://tools.ietf.org/html/rfc1341), and | |
[2045](https://tools.ietf.org/html/rfc2045)) defines a mechanism | |
somewhat like this. Since we are using JSON instead of email | |
messages as the core model, we can't use these mechanisms directly. | |
However, they are an inspiration for what we are showing here. | |
# Unresolved questions | |
- N/A | |
## Implementations | |
The following lists the implementations (if any) of this RFC. Please do a pull request to add your implementation. If the implementation is open source, include a link to the repo or to the implementation within the repo. Please be consistent in the "Name" field so that a mechanical processing of the RFCs can generate a list of all RFCs supported by an Aries implementation. | |
Name / Link | Implementation Notes | |
--- | --- | |
[Aries Cloud Agent - Python](https://github.com/hyperledger/aries-cloudagent-python) | in credential exchange | |
[Streetcred.id](https://streetcred.id/) | Commercial mobile and web app built using Aries Framework - .NET |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment