@dch
Last active August 29, 2015 13:56
PPSP-TP feedback

Tracker Review

Summary

  • the draft is too long for the 4 requirements from rfc6972
  • much of the included content is not directly relevant and arguably confusing
  • there is not enough separation of concern between TP and PPSP
  • technically it might be replaced with simpler alternatives (with some compromises)
  • I'm not intending to implement this in its current form

Requirements from rfc6972

These are the minimum requirements a tracker implementation would need to meet, if it were to comply with the original requirements.

section 6.3 specifies the following requirements for the TP:

PPSP.TP.REQ-1

  • MUST be able to get a list of peers. The tracker may optionally tailor that response.

PPSP.TP.REQ-2

  • The tracker protocol MUST report the peer's activity in the swarm to the tracker.

I'm interested in how much information MUST be reported. We already participate in the swarm, and I'd like to minimise duplication.

PPSP.TP.REQ-3

  • the tracker MUST take the frequency of message exchange and bandwidth use into consideration when communicating chunk availability information.

Why is this a requirement for the tracker? Communicating chunk information is the job of the swarm, i.e. of the peers.

PPSP.TP.REQ-4

  • The tracker protocol MUST have a provision for the tracker to authenticate the peer.

Detailed Feedback

Right now I don't see a need to implement TP at all. i.e. all the information I need to locate a suitable swarm can be provided with far simpler approaches, and the bulk of the information about management of a swarm can be provided from the controlling / seeding peers I have within the swarm. Finally, if my peers are not involved in the swarm at all, I don't see a need to provide a tracker service for that swarm at all. The only useful feature is to be able to re-request a list of valid peers from the tracker if for some reason the current ones were either found to be unreliable or unavailable.

The tracker protocol would be much simpler if in fact most of it were moved back into the peer protocol. As a result of the split, significant duplication of peer state, chunk lists, etc. is required. We have transaction ids, which effectively duplicate channels that already exist in PPSP. And each of these now requires trackers & peers to duplicate that state as well, with more & more almost-duplicated code.

Clearly I wasn't involved in the earlier discussions & decisions to split these in the WG, but I raise this as an implementer looking for alternate ways to provide the same functionality.

There is clear value for telcos and providers in a lot of this information, but that doesn't mean we need to specify and mandate it in the RFC. What would happen if this functionality was left to providers and implementers, to include within their products? A simple extension field to include arbitrary information or fields could be all that is sufficient. Approx 50% of the draft covers MAY use cases that are only of use if both the tracker and the peer support these features; there's a strong chance that in wider deployment the peers connecting to a swarm will be RFC compliant but not identical providers. Consider bittorrent and http clients/servers as a suitable example.

Useful features such as certification and authorisation could add considerable value and justify a separate tracker protocol. So we need to make our minds up: what is the TP here to do, what is the minimum feature set to do so, and do we need a separate protocol / transport, along with all the code & memory to operate this in practice?

Objective

  • communication between trackers and peers
  • provide meta info
  • report streaming status
  • obtain peer lists from trackers (in addition to PPSP internal swarm discovery)
  • used for content registration and location

Concerns

  • lack of clarity in some places (2.2, ...) about what is IN the tracker protocol and what is simply a dialogue to support the discussion & understanding of the flow. This is clear when you understand PPSP in detail, but will not be so for a newcomer to the RFCs.
  • too much padding information (... not in the scope of this document...). Leave these out completely.
  • use of XML to support PPSP, a protocol designed to be efficient & lightweight, exchanging data that does not need to be self-validating. It's not that complicated! Let's slim this down.
  • conflation of connecting to a swarm with swarm management (stats etc)
  • we will need to be very careful about looking "like" HTTP. Most of the internet (firewalls, proxies etc) will simply assume we are HTTP. So we do need to be compliant with that completely. Extensions will be tricky.
  • a number of the proposed parameters are simply impossible for a peer to obtain reliably, and arguably of no benefit in their current form.
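
To put the XML concern above in rough numbers, here is a minimal sketch comparing a packed binary peer list with an XML one. The element and attribute names (`PeerList`, `Peer`, `ip`, `port`), the addresses, and the 6-byte IPv4 record layout are assumptions made for illustration; they are not taken from the draft.

```python
import socket
import struct
import xml.etree.ElementTree as ET

# Hypothetical peers, using documentation-range addresses.
peers = [("192.0.2.1", 7000), ("192.0.2.2", 7001), ("192.0.2.3", 7002)]

def encode_binary(peers):
    # 4-byte IPv4 address + 2-byte port, network byte order: 6 bytes per peer.
    return b"".join(
        struct.pack("!4sH", socket.inet_aton(ip), port) for ip, port in peers
    )

def decode_binary(data):
    # Inverse of encode_binary; data must be a multiple of 6 bytes long.
    return [
        (socket.inet_ntoa(ip), port)
        for ip, port in struct.iter_unpack("!4sH", data)
    ]

def encode_xml(peers):
    # Assumed element names; the draft's actual schema may differ.
    root = ET.Element("PeerList")
    for ip, port in peers:
        ET.SubElement(root, "Peer", ip=ip, port=str(port))
    return ET.tostring(root)

binary = encode_binary(peers)
xml_bytes = encode_xml(peers)
print(len(binary), len(xml_bytes))
```

The binary form costs a fixed 6 bytes per IPv4 peer, while the XML form grows with the length of every tag and attribute name. The trade-off is that the binary layout needs explicit versioning if fields are ever added.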

Review

2.2 Enrollment and Bootstrap

There are 5 phases as I see this:

  • locate the content you want to receive (outside tracker protocol)
  • receive enough information to contact the swarm (hash, some peer addresses)
  • connect to the swarm (peer protocol, outside tracker again)
  • send periodic updates to the tracker
  • possibly leave the swarm & tracker

It is not clear whether this diagram is mandatory to implement.

2.4 State Machines

It would make more sense to call the STARTED state ACTIVE or RUNNING.

Do we really need a state machine defined per peer in the tracker protocol? I'd be happier to leave that out and simply say that the tracker needs to keep track of the involved peers, and recommend what information a tracker might need to observe. It's important to understand that most of this information is not required for inter-operability.
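
As an illustration of how little state that might be, here is a minimal sketch of a tracker-side peer table with no per-peer state machine. The class name `PeerTable`, its methods, and the 120-second timeout are hypothetical choices for this example, not anything specified by the draft.

```python
import time

class PeerTable:
    """Tracks which peers were recently seen in each swarm (hypothetical sketch)."""

    def __init__(self, timeout=120.0, clock=time.monotonic):
        self.timeout = timeout  # seconds of silence before a peer is dropped (assumed)
        self.clock = clock      # injectable so the expiry logic can be tested
        self.swarms = {}        # swarm hash -> {peer address: last-seen time}

    def touch(self, swarm, peer):
        # Called on any message from the peer; doubles as join and keep-alive.
        self.swarms.setdefault(swarm, {})[peer] = self.clock()

    def remove(self, swarm, peer):
        # Explicit leave; harmless if the peer or swarm is unknown.
        self.swarms.get(swarm, {}).pop(peer, None)

    def peers(self, swarm):
        # Expire silent peers lazily: a peer that lost connectivity
        # never gets the chance to say goodbye.
        now = self.clock()
        seen = self.swarms.get(swarm, {})
        for peer in [p for p, t in seen.items() if now - t > self.timeout]:
            del seen[peer]
        return sorted(seen)
```

Everything a peer list response needs is the set of recently seen addresses per swarm; anything beyond that is an operator's choice rather than an inter-operability requirement.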

3.1 Request/Response Syntax and Semantics

Re format, please pick one, binary or text. IMO binary is the way to go but I'm open to discussion. If binary, drop HTTP compliance. If text, keep HTTP.

Please split this section into explanatory text & a separate table, with MUST/MAY etc. Personally I find the C struct approach hard to follow, considering the number of optional parameters. Are there alternative formats used in other RFCs that we could use?

The majority of information requested as MAY should IMO be dropped. It will likely not be implemented, and is of arguable benefit.

E.g. what is the significance of the asn or the connection type, whether I'm using NAT or not from the tracker's perspective, if I have concurrent active links (how common is that?), what my bandwidth level is? And how does a client know this information? Is it important that I'm connecting over 3G, when in fact at my home, 3G is more reliable and faster than ADSL (outside business hours!)? How would the behaviour of a peer in the PPSP layer change, if this information was known? I posit not at all.

4.1 CONNECT Request

The connect request with LEECH or SEED suggests a binary mode of operation for peers. This will not be the case; for example, a live streaming peer may never get to a full SEED state, having dropped initial packets. This is not BitTorrent ;-).

CONNECT LEAVE seems weird to a native English speaker. It's like clicking on Start button in Windows to Shut Down the computer.

"When a peer plans to leave a previously joined swarm, it should set"

Please use normative language here; MAY/SHOULD/MUST. IMO this information is already available in the peers themselves, as a list of active peers. A peer may, as a result of loss of connectivity, not be able to send a CONNECT LEAVE message.

4.2 FIND Request

The draft must be specific about how a well-formed FIND request can be validated, and also what happens if it is NOT valid. Do we ignore it?

4.3 STAT_REPORT

Re keep-alive: this duplicates functionality within the peer protocol itself.

4.4 Error and Recovery conditions

"If a peer fails to read": does this mean that the peer receives but cannot parse the response? If so, re-sending will not help. The peer should re-send its tracker request, rather than require the tracker to decide if the condition is transient or not. Let's push state management out to peers and not try to manage it in the tracker, where possible.

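One way to read the section 4.4 comments is that the peer owns the retry policy. The following is a minimal sketch under that assumption; the names `request_with_retry` and `send`, the exponential backoff, and the choice of exceptions are all hypothetical and not taken from the draft.

```python
import time

def request_with_retry(send, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds or the attempts run out.

    send is a hypothetical callable that performs one tracker request and
    returns the parsed response. It is assumed to raise OSError on transport
    failure and ValueError when the response cannot be parsed.
    """
    for attempt in range(attempts):
        try:
            return send()
        except OSError:
            # Transient transport problem: back off and try again,
            # unless this was the last allowed attempt.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    # A ValueError (unparseable response) is deliberately not caught:
    # re-sending the same request is unlikely to help in that case.
```

With this shape the tracker stays stateless about failures: it simply answers whichever requests arrive, and the peer decides what counts as transient.
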
5 Operations and Manageability

As section 5 is mainly boiler-plate, I make a few general comments.

Why do we "propose" syslog and SNMP? These are surely internal management tools, and of no relevance to an internet-facing tracker protocol. Is this simply an IETF expected norm for RFCs?

6 Security Considerations

I'm concerned that the requirements referred to here need to be improved significantly; many sections (e.g. 6.3) simply speculate on possible options.

6.1 Authentication between Tracker and Peers

Please, please, please spare me from the hell that is implementing OAuth 2.0, just to use a simple tracker protocol. Do we have any idea what the impact of this is on the fast-start ideals put forward in PPSP itself?

6.4 Pro-incentive parameter trustfulness

This discussion really belongs in the peer protocol. We've IMO made an effort there to avoid specifying incentive schemes and methods; I don't believe this should be included in the tracker protocol.

Alternatives - using (m)DNS for location

NB: this doesn't meet all requirements for TP at this point; however, I believe the information is already available within the peer protocol.

  • obtaining a list of peers (leecher mode) could be supported via simple DNS or MDNS within a link-local network (e.g. a single site)

  • proposing a new content stream (seeder mode) can also be supported via DNS...

  • in practice I think a peer in either mode (even a high-end datacentre based peer) will likely not participate in multiple swarms for the same hash from a tracker's point of view; rather, the same hash may be provided through multiple peers

  • a short TTL, and the option of providing different DNS responses to different peers (IP addresses), have worked well in other geo-aware services I've worked on.

  • this could be supported functionally by using a ppsp:// url:

    ppsp://peer.id/hash?number&key1=value1&key2=value2&key3=value3

where the key/value pairs represent the desired swarm parameters, such as chunk size. When the peer.id DNS name is looked up, the DNS server will return the list of peers in the usual DNS format. TTLs ensure that the operator can trade off stability for accuracy. We could debate using PTR records or other approaches but this is definitely an option.
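
As a rough sketch of how a client might consume such a URL: the function names `parse_ppsp_url` and `resolve_peers` are hypothetical, the port is an assumption (no standard port is defined), and the bare `number` token is simply kept as a key with an empty value because its meaning is not defined here.

```python
import socket
from urllib.parse import urlsplit, parse_qsl

def parse_ppsp_url(url):
    """Split a ppsp:// URL into (DNS name, swarm hash, parameters)."""
    parts = urlsplit(url)
    if parts.scheme != "ppsp":
        raise ValueError("not a ppsp:// URL")
    swarm_hash = parts.path.lstrip("/")
    # keep_blank_values=True retains bare tokens such as "number",
    # which carry no "=value" part.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return parts.hostname, swarm_hash, params

def resolve_peers(host, port, resolver=socket.getaddrinfo):
    """Return the distinct addresses DNS hands back for the peer name."""
    infos = resolver(host, port)
    # Each entry is (family, type, proto, canonname, sockaddr); the address
    # is the first field of sockaddr. The set removes duplicates that appear
    # once per socket type.
    return sorted({info[4][0] for info in infos})

host, swarm_hash, params = parse_ppsp_url(
    "ppsp://peer.id/hash?number&key1=value1&key2=value2&key3=value3"
)
```

Calling `resolve_peers(host, port)` then performs an ordinary address lookup, so any geo-aware or short-TTL behaviour stays entirely on the DNS operator's side.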

I'd like to see a defined / standard port for the PPSP layer too btw, I'm not sure if the TP also needs its own one.

I'd see this as very similar to a simple web page with a variety of links with different peer options. In practice I think peer streams will be defined once by an operator, and the user e.g. in a web browser

Nits

  • "Then the leecher starts to initiate" -> "Then the leecher initiates"
  • "If the leecher plan to switch to another straw, it will initiate " s/straw/swarm/
  • "How the portal learn the encoding type" s/learn/learns/
  • "peer obtain above information" s/obtain/obtains/
  • "Content-Lenght: <ContentLenght>" -> "Content-Length: <ContentLength>"
  • "Priority: the preference of IP address on which the requesting peer get the swarm." -> ?? I don't understand this at all sorry.