iay/Aggregator Specification.md

Aggregator Specification

This is a simplified specification for a metadata aggregator with the general capabilities of the eduGAIN service. The specification attempts to make mandatory those aspects of operation which are necessary for the integrity of the service without unnecessarily constraining the metadata which is exchanged through the service.

Upstream Metadata

Upstream Configuration

Information required to configure an upstream channel:

Location to fetch the metadata from. The aggregator MUST be able to handle both http:-scheme and https:-scheme locations.
RSA public key corresponding to the private key with which the metadata document will be signed. This will normally be made available in the form of an X.509 certificate.
The registrationAuthority value to be associated with the channel. The aggregator need not be capable of associating more than one registrationAuthority with each channel.
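
As an illustration, the three configuration items above could be held in a small record such as the following. This is only a sketch: the `UpstreamChannel` name and its field names are hypothetical and are not defined by this specification.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class UpstreamChannel:
    """Hypothetical configuration record for one upstream channel."""

    location: str                # where the metadata is fetched from
    signing_key_pem: str         # RSA public key, or an X.509 certificate carrying it
    registration_authority: str  # the single value associated with this channel

    def __post_init__(self) -> None:
        # Both http: and https: locations must be supported; any other
        # scheme is treated here as a configuration error (an assumption).
        scheme = urlparse(self.location).scheme.lower()
        if scheme not in ("http", "https"):
            raise ValueError(f"unsupported location scheme: {scheme!r}")
```

Holding exactly one `registration_authority` per channel reflects the statement that the aggregator need not associate more than one value with each channel.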

Upstream Metadata Acquisition

Processing of each upstream metadata document includes the following:

Fetch the aggregate from the configured location using HTTP GET.

In the case of an https: location, the aggregator MUST NOT rely on the TLS connection for trust purposes; trust in the document comes from the signature verification described below.
If a metadata document has previously been fetched from this channel, HTTP conditional GET SHOULD be used.
If the metadata fetch fails for whatever reason, use a previously fetched version of the metadata document for this channel, as long as that document's validUntil instant is not in the past.


Verify that the fetched document schema-validates against the SAML metadata schema.
Verify that the document's document element is md:EntitiesDescriptor.
Verify that the document's document element possesses a validUntil attribute.
Verify that the validUntil on the metadata document is not in the past.
Verify that the validUntil on the metadata document is not more than 30 days in the future.
Verify that the XML signature on the document meets the following criteria:

The signature is valid.
The signature was made using an explicit ID reference, not an empty reference¹
The signature reference refers to the document element²
The signature was made with the public key configured for the channel.³
The signature's digest algorithm is at least as strong as SHA-256

Specifically, MD5 and SHA-1 are not permitted as digest algorithms⁴


The signature's signature method is RSA with an associated digest at least as strong as SHA-256

Specifically, MD5 and SHA-1 are not permitted as the digest associated with the signature method⁴


The signature's transforms contain only permissible values:

Enveloped signature
Exclusive canonicalisation with or without comments
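
The non-cryptographic signature criteria above (explicit reference to the document element, algorithm strength, permitted transforms) can be sketched as a policy check over the parsed `ds:Signature`. The function name and error strings below are hypothetical, and the allow-lists are one reading of "at least as strong as SHA-256". The sketch assumes the signature is a direct child of the document element and inspects only its first `ds:Reference`. It does not verify that the signature is valid or that it was made with the channel's key; those checks need an XML Signature implementation.

```python
import xml.etree.ElementTree as ET

NS = {"ds": "http://www.w3.org/2000/09/xmldsig#"}

# Assumed allow-lists: SHA-256 or stronger, RSA signature methods only.
ALLOWED_DIGESTS = {
    "http://www.w3.org/2001/04/xmlenc#sha256",
    "http://www.w3.org/2001/04/xmldsig-more#sha384",
    "http://www.w3.org/2001/04/xmlenc#sha512",
}
ALLOWED_SIGNATURE_METHODS = {
    "http://www.w3.org/2001/04/xmldsig-more#rsa-sha256",
    "http://www.w3.org/2001/04/xmldsig-more#rsa-sha384",
    "http://www.w3.org/2001/04/xmldsig-more#rsa-sha512",
}
ALLOWED_TRANSFORMS = {
    "http://www.w3.org/2000/09/xmldsig#enveloped-signature",
    "http://www.w3.org/2001/10/xml-exc-c14n#",
    "http://www.w3.org/2001/10/xml-exc-c14n#WithComments",
}


def signature_policy_errors(root: ET.Element) -> list:
    """Return a list of policy violations; an empty list means none were found."""
    signed_info = root.find("ds:Signature/ds:SignedInfo", NS)
    if signed_info is None:
        return ["document element carries no ds:Signature"]
    reference = signed_info.find("ds:Reference", NS)
    if reference is None:
        return ["signature has no ds:Reference"]

    errors = []
    uri = reference.get("URI", "")
    doc_id = root.get("ID")
    if uri == "":
        errors.append("empty reference")
    elif not doc_id or uri != "#" + doc_id:
        errors.append("reference does not refer to the document element")

    digest = reference.find("ds:DigestMethod", NS)
    if digest is None or digest.get("Algorithm") not in ALLOWED_DIGESTS:
        errors.append("digest algorithm missing or not permitted")

    method = signed_info.find("ds:SignatureMethod", NS)
    if method is None or method.get("Algorithm") not in ALLOWED_SIGNATURE_METHODS:
        errors.append("signature method missing or not permitted")

    for transform in reference.findall("ds:Transforms/ds:Transform", NS):
        if transform.get("Algorithm") not in ALLOWED_TRANSFORMS:
            errors.append("transform not permitted")
    return errors
```

Making the empty-reference and weak-digest rules configurable per channel, as the footnotes suggest, would amount to passing different allow-lists or flags into such a function.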


An upstream channel which cannot deliver a document (fetched or from cache) that meets all of the above tests is regarded as empty.
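
A minimal sketch of the validUntil checks and the fetched-or-cached choice follows. The `Candidate` type and function names are hypothetical. The sketch covers only the validUntil tests, assumes instants are written in UTC with a trailing `Z` or an explicit offset, and takes the reading that the cached copy is tried whenever the fetched one is absent or unacceptable.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_VALIDITY = timedelta(days=30)


@dataclass
class Candidate:
    """Hypothetical holder for a document and its validUntil attribute."""
    body: bytes
    valid_until: Optional[str]  # None when the attribute is absent


def parse_instant(value: str) -> datetime:
    # Assumption: a trailing "Z" or an explicit offset; naive values are read as UTC.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def valid_until_acceptable(valid_until: Optional[str], now: datetime) -> bool:
    """validUntil must be present, not in the past, and at most 30 days ahead."""
    if valid_until is None:
        return False
    instant = parse_instant(valid_until)
    return now <= instant <= now + MAX_VALIDITY


def select_document(fetched: Optional[Candidate],
                    cached: Optional[Candidate],
                    now: datetime) -> Optional[Candidate]:
    """Prefer the fetched document, then the cache; None means the channel is empty."""
    for candidate in (fetched, cached):
        if candidate is not None and valid_until_acceptable(candidate.valid_until, now):
            return candidate
    return None
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check easy to test.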
If an upstream channel does deliver a document that meets all of the above tests, that document MAY be cached (along with its validUntil instant, and headers from the associated GET response) for use in later conditional GET operations and in future fetches where no document is available.
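
To show how cached response headers feed a later conditional GET, the sketch below builds a request with the standard library's `urllib.request`. The `build_request` name is hypothetical, and it assumes the cached headers were stored under the exact keys `ETag` and `Last-Modified`.

```python
import urllib.request
from typing import Mapping, Optional


def build_request(location: str,
                  cached_headers: Optional[Mapping[str, str]] = None
                  ) -> urllib.request.Request:
    """Build a GET request, made conditional when cached validators exist."""
    request = urllib.request.Request(location, method="GET")
    if cached_headers:
        etag = cached_headers.get("ETag")
        last_modified = cached_headers.get("Last-Modified")
        if etag:
            request.add_header("If-None-Match", etag)
        if last_modified:
            request.add_header("If-Modified-Since", last_modified)
    return request
```

When such a request is sent with `urllib.request.urlopen`, a 304 Not Modified response is raised as `urllib.error.HTTPError` with `code == 304`, so a caller would catch that case and reuse the cached document.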
The resulting metadata document is disassembled into individual md:EntityDescriptor documents for further processing; the disassembly process MUST allow for the possibility of nested md:EntitiesDescriptor elements. All md:EntitiesDescriptor elements from the input document are discarded.
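
The disassembly step can be sketched with the standard library's `xml.etree.ElementTree`. The `disassemble` name is hypothetical; the recursion handles nested `md:EntitiesDescriptor` elements and yields only the `md:EntityDescriptor` elements.

```python
import xml.etree.ElementTree as ET

MD = "urn:oasis:names:tc:SAML:2.0:metadata"
ENTITIES = f"{{{MD}}}EntitiesDescriptor"
ENTITY = f"{{{MD}}}EntityDescriptor"


def disassemble(element: ET.Element):
    """Yield every md:EntityDescriptor beneath element, at any nesting depth."""
    for child in element:
        if child.tag == ENTITIES:
            # Nested group: descend into it, but do not yield the group itself.
            yield from disassemble(child)
        elif child.tag == ENTITY:
            yield child
```

Entities are yielded in document order, and other children such as `ds:Signature` or `md:Extensions` on the aggregate are skipped.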
Upstream Entity Processing

The per-entity documents from the upstream metadata acquisition process described above are processed independently.⁵ Errors detected while processing an individual md:EntityDescriptor document MUST NOT have any effect on the processing of other entities from the same upstream channel.
The following processing is performed on each entity document:

Remove all instances of the following:

*/@xsi:schemaLocation
*/@xml:base
md:EntityDescriptor/@ID⁶
md:EntityDescriptor/@validUntil⁷
md:EntityDescriptor/@cacheDuration⁷


Validate against the following schemas:

MDRPI


Verify that the entity's md:EntityDescriptor contains an md:Extensions which in turn contains an mdrpi:RegistrationInfo whose registrationAuthority exactly matches the configured value for this channel.
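
The attribute removal step above can be sketched as follows, again with `xml.etree.ElementTree`. The `strip_attributes` name is hypothetical. It assumes the element passed in is the md:EntityDescriptor document element of a per-entity document. Schema validation against MDRPI is not shown, since the Python standard library has no XML Schema validator.

```python
import xml.etree.ElementTree as ET

XSI_SCHEMA_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def strip_attributes(entity: ET.Element) -> None:
    """Remove the listed attributes from a per-entity document, in place."""
    # xsi:schemaLocation and xml:base are removed from every element.
    for element in entity.iter():
        element.attrib.pop(XSI_SCHEMA_LOCATION, None)
        element.attrib.pop(XML_BASE, None)
    # ID, validUntil and cacheDuration are removed from the EntityDescriptor only.
    for name in ("ID", "validUntil", "cacheDuration"):
        entity.attrib.pop(name, None)
```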

Failure to validate against any of the rules described in this section results in the entity being removed from the collection being presented by the upstream channel.
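
The registrationAuthority check and the per-entity isolation of failures might look like the sketch below. The function names are hypothetical. The broad `except` is a deliberate choice for the sketch, so that an unexpected error on one entity removes only that entity.

```python
import xml.etree.ElementTree as ET

MD = "urn:oasis:names:tc:SAML:2.0:metadata"
MDRPI = "urn:oasis:names:tc:SAML:metadata:rpi"
NS = {"md": MD, "mdrpi": MDRPI}


def registration_authority_matches(entity: ET.Element, expected: str) -> bool:
    """True if md:Extensions/mdrpi:RegistrationInfo carries exactly the expected value."""
    info = entity.find("md:Extensions/mdrpi:RegistrationInfo", NS)
    return info is not None and info.get("registrationAuthority") == expected


def filter_entities(entities, expected: str) -> list:
    """Keep entities that pass the check; a failure affects only that entity."""
    kept = []
    for entity in entities:
        try:
            if registration_authority_matches(entity, expected):
                kept.append(entity)
        except Exception:
            # Errors on one entity must not affect the others from this channel.
            continue
    return kept
```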
Metadata Combination

The separate entity collections from each upstream feed are combined into a single collection. If an md:EntityDescriptor/@entityID value appears in more than one upstream feed, the resulting collection MUST contain only one of the entities; the others MUST be discarded. In particular, the aggregator MUST NOT attempt to merge or otherwise combine the clashing entity descriptions.
The algorithm used to determine which of a set of clashing entities survives MUST be deterministic and predictable. Options include:

A precedent-based algorithm based on the historically first upstream feed to present an entity with the given entityID.
A precedence-based algorithm in which a published fixed ordering on the upstream feeds determines which upstream feed takes precedence.
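
As a sketch of the second option, a fixed published ordering can be applied by visiting the feeds in precedence order and keeping the first entity seen for each entityID. The `combine_feeds` name is hypothetical, and each feed is assumed to be a mapping from entityID to its entity.

```python
from typing import Dict, Iterable, Mapping, TypeVar

E = TypeVar("E")


def combine_feeds(feeds_in_precedence_order: Iterable[Mapping[str, E]]) -> Dict[str, E]:
    """Combine feeds; on an entityID clash the earlier feed's entity survives."""
    combined: Dict[str, E] = {}
    for feed in feeds_in_precedence_order:
        for entity_id, entity in feed.items():
            # setdefault keeps the existing entry, so later clashing entities
            # are discarded whole rather than merged.
            combined.setdefault(entity_id, entity)
    return combined
```

Given the same feeds in the same order, the result is always the same, which is what makes the outcome deterministic and predictable.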

Downstream Aggregate

The signature profile for the downstream aggregate SHALL conform to the signature profile described in section 3.1 of [SAML2Meta]. In particular:

Enveloped signature,
ds:Reference containing a URI reference to the document element's ID attribute.

In addition, the following constraints will apply:

The downstream aggregate's ID attribute SHALL be either randomly generated or based on the date and time to at least second resolution,
Digest method of SHA-256,
Signature method of RSA + SHA-256,
Exclusive canonicalisation (with or without comments)
Transforms other than the enveloped signature and exclusive canonicalisation transforms SHALL NOT be included.
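
The ID constraint above admits two approaches, both sketched here with hypothetical function names. The leading underscore is an addition of the sketch: it keeps the value a valid XML ID, which cannot begin with a digit.

```python
import secrets
from datetime import datetime, timezone
from typing import Optional


def random_aggregate_id() -> str:
    """Randomly generated ID: 128 bits from the OS random source, hex encoded."""
    return "_" + secrets.token_hex(16)


def timestamp_aggregate_id(now: Optional[datetime] = None) -> str:
    """ID based on the UTC date and time, to one-second resolution."""
    now = now or datetime.now(timezone.utc)
    return now.astimezone(timezone.utc).strftime("_%Y%m%dT%H%M%SZ")
```

For example, `timestamp_aggregate_id(datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))` returns `_20240102T030405Z`.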

Footnotes


¹ Note: some current eduGAIN upstreams do not meet this rule. It should probably be made configurable on a per-channel basis, with the default being to disallow empty references.

² This helps to avoid "wrapping attacks".

³ If the public key for the channel is supplied in the form of an X.509 certificate, other aspects of the certificate such as its expiry date MUST NOT form part of signature verification. This is in accordance with the SAML metadata interoperability profile.

⁴ Note: most current eduGAIN upstreams do not meet this rule. It should probably be made configurable on a per-channel basis, with the default being to disallow the weak MD5 and SHA-1 digests.

⁵ There should be a single exception to this rule, which is that an upstream aggregate MUST NOT contain more than one md:EntityDescriptor with the same entityID attribute value. It's not obvious whether the resolution of this condition should be rejection of the aggregate, rejection of one of the entities in question or rejection of all of the clashing entities. Clashes between upstream feeds must of course be handled elsewhere in any case.

⁶ Although ID attributes can sometimes be useful for debugging, the fact that they are defined as unique within a particular XML document means that combining multiple sources of ID can result in clashes and an invalid output document. The simplest way to avoid this is to strip all incoming ID values; there really isn't sufficient benefit to a more active collision-avoidance approach.

⁷ These SHOULD NOT appear other than on the document element anyway, but sometimes do. Discarding them is a superior solution compared to inventing a meaning for something we shouldn't see.