dch/ppsp-tp-feedback.md

Tracker Review

Summary


The draft is too long for the 4 requirements from RFC 6972.
Much of the included content is not directly relevant, and arguably confusing.
There is not enough separation of concerns between the TP and PPSP.
Technically, it might be replaced with simpler alternatives (with some
compromises).
I do not intend to implement this in its current form.

Requirements from RFC 6972

These are the minimum requirements a tracker implementation would need to
meet in order to comply with the original requirements. Section 6.3 of
RFC 6972 specifies the following requirements for the TP:
PPSP.TP.REQ-1


The TP MUST allow a peer to get a list of peers. The tracker may optionally
tailor that response.

PPSP.TP.REQ-2


The tracker protocol MUST report the peer's activity in the swarm to the
tracker.

I'm interested in how much information MUST be reported. We already
participate in the swarm, and I'd like to minimise duplication.
PPSP.TP.REQ-3


the tracker MUST take the frequency of message exchange and bandwidth use
into consideration when communicating chunk availability information.

Why is this a requirement for the tracker? Communicating chunk information
is the job of the swarm, i.e. of the peer protocol.
PPSP.TP.REQ-4

The tracker protocol MUST have a provision for the tracker to authenticate
the peer.
Detailed Feedback

Right now I don't see a need to implement the TP at all: all the
information I need to locate a suitable swarm can be provided with far
simpler approaches, and most of the information about managing a swarm can
be provided by the controlling/seeding peers I have within the swarm.
Finally, if my peers are not involved in the swarm at all, I don't see a
need to provide a tracker service for that swarm. The only useful feature
is the ability to re-request a list of valid peers from the tracker if the
current ones turn out to be unreliable or unavailable.
The tracker protocol would be much simpler if most of it were moved back
into the peer protocol. The current split requires significant duplication
of peer state, chunk lists, etc. We have transaction IDs, which effectively
duplicate the channels that already exist in PPSP, and each of these
requires trackers and peers to duplicate that state as well, with more and
more almost-duplicated code.
Clearly I wasn't involved in the earlier WG discussions and decisions to
split these, but I am writing as an implementer looking for alternative
ways to provide the same functionality.
There is clear value for telcos and providers in a lot of this information,
but that doesn't mean we need to specify and mandate it in the RFC. What
would happen if this functionality were left to providers and implementers
to include within their products? A simple extension field carrying
arbitrary information could be sufficient. Approximately 50% of the draft
covers MAY use cases that are only useful if both the tracker and the peer
support these features; there's a strong chance that in wider deployment
the peers connecting to a swarm will be RFC-compliant but come from
different providers. BitTorrent and HTTP clients/servers are a good
example.
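To make the extension-field idea concrete, here is a minimal sketch, assuming a message already decoded into a dict: the receiver picks out the core fields it needs and carries everything else as opaque extensions, so a tracker and a peer from different providers still interoperate. The field names (`swarm_id`, `peer_id`, `x-telco-asn`) are hypothetical and not taken from the draft.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical core fields that every implementation must understand.
CORE_FIELDS = ("swarm_id", "peer_id")


@dataclass
class TrackerRequest:
    swarm_id: str
    peer_id: str
    # Provider-specific key/value pairs; a receiver that does not
    # recognise a key simply ignores it.
    ext: Dict[str, Any] = field(default_factory=dict)


def parse_request(message: Dict[str, Any]) -> TrackerRequest:
    """Split a decoded message into core fields and opaque extensions."""
    missing = [name for name in CORE_FIELDS if name not in message]
    if missing:
        raise ValueError(f"missing core fields: {missing}")
    ext = {k: v for k, v in message.items() if k not in CORE_FIELDS}
    return TrackerRequest(message["swarm_id"], message["peer_id"], ext)


# A tracker that only cares about the core fields never has to look at ext.
req = parse_request({"swarm_id": "abc123", "peer_id": "p1", "x-telco-asn": 64500})
print(req.ext)  # {'x-telco-asn': 64500}
```

The point is that only the core fields need to be in the RFC; everything telco-specific can ride along without either side having to implement it.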
Useful features such as certification and authorisation could add
considerable value and justify a separate tracker protocol. So we need to
make up our minds: what is the TP here for, what is the minimum feature
set needed to do that, and do we need a separate protocol/transport, along
with all the code and memory required to operate it in practice?
Objective


communication between trackers and peers
provide meta info
report streaming status
obtain peer lists from trackers (in addition to PPSP internal swarm
discovery)
support content registration and location

Concerns


lack of clarity in some places (2.2, ...) about what is IN the tracker
protocol, versus what is simply dialogue to support discussion and
understanding of the flow.
This is clear when you understand PPSP in detail, but will not be so for a
newcomer to the RFCs.
too much padding (... not in the scope of this document...). Leave these
out completely.
use of XML to support PPSP, a protocol designed to be efficient and
lightweight, when the data exchanged does not need to be self-validating.
It's not that complicated! Let's slim this down.
conflation of connecting to a swarm with swarm management (stats, etc.)
we will need to be very careful about looking "like" HTTP. Most of the
internet (firewalls, proxies, etc.) will simply assume we are HTTP, so if
we look like HTTP we need to be fully compliant with it. Extensions will be
tricky.
a number of the proposed parameters are simply impossible for a peer to
obtain reliably, and arguably of no benefit in their current form.

Review

2.2 Enrollment and Bootstrap

As I see it, there are five phases:

locate the content you want to receive (outside the tracker protocol)
receive enough information to contact the swarm (hash, some peer
addresses)
connect to the swarm (peer protocol, again outside the tracker protocol)
send periodic updates to the tracker
possibly leave the swarm and the tracker

It is not clear whether this diagram is mandatory to implement.
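One way to make the draft's scope explicit is to tag each of the phases listed above with the protocol that carries it. This is a hypothetical sketch of my own reading, not the draft's model; the phase names and the `Carrier` enum are illustrative only.

```python
from enum import Enum
from typing import List


class Carrier(Enum):
    OUT_OF_BAND = "outside both protocols"
    TRACKER = "tracker protocol"
    PEER = "peer protocol"


# The phases in the order a joining peer goes through them, each tagged
# with the protocol(s) involved (my interpretation, not the draft's).
PHASES = [
    ("locate content", {Carrier.OUT_OF_BAND}),
    ("receive enough information to contact the swarm", {Carrier.TRACKER}),
    ("connect to the swarm", {Carrier.PEER}),
    ("send periodic updates to the tracker", {Carrier.TRACKER}),
    ("leave the swarm and the tracker", {Carrier.PEER, Carrier.TRACKER}),
]


def tracker_phases() -> List[str]:
    """Return only the phases the tracker protocol actually has to define."""
    return [name for name, carriers in PHASES if Carrier.TRACKER in carriers]


print(tracker_phases())
```

Laid out this way, only a subset of the flow needs normative text in the TP; the rest is supporting narrative.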
2.4 State Machines

It would make more sense to call the STARTED state ACTIVE or RUNNING.
Do we really need a state machine defined per peer in the tracker protocol?
I'd be happier to leave that out and simply say that the tracker needs to
keep track of the involved peers, and recommend what information a tracker
might need to observe. It's important to understand that most of this
information is not required for inter-operability.
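A minimal sketch of that alternative, assuming the tracker only needs to know which peers it has heard from recently: every message from a peer refreshes a timestamp, and stale peers simply expire. The TTL, the `limit` parameter and the "most recently seen first" ordering are my own illustrative choices, not anything from the draft; the same lookup also covers REQ-1 (returning a possibly tailored peer list).

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Address = Tuple[str, int]  # (IP address, port)


@dataclass
class SwarmRegistry:
    """Per-swarm table of recently seen peers; no per-peer state machine.

    A peer counts as active while its last message is at most `ttl` seconds
    old. There is no explicit LEAVE handling: stale entries just expire.
    """

    ttl: float = 60.0
    clock: Callable[[], float] = time.monotonic
    # swarm_id -> {peer address -> time of last message}
    _swarms: Dict[str, Dict[Address, float]] = field(default_factory=dict, init=False)

    def seen(self, swarm_id: str, addr: Address) -> None:
        """Record any message from a peer (join, report, keep-alive, ...)."""
        self._swarms.setdefault(swarm_id, {})[addr] = self.clock()

    def peers(self, swarm_id: str, requester: Optional[Address] = None,
              limit: int = 50) -> List[Address]:
        """Return up to `limit` active peers, excluding the requester itself."""
        now = self.clock()
        table = self._swarms.get(swarm_id, {})
        for addr in [a for a, t in table.items() if now - t > self.ttl]:
            del table[addr]  # expire peers that have gone quiet
        # Most recently seen first: one simple way to "tailor" the response.
        ordered = sorted(table, key=table.get, reverse=True)
        return [a for a in ordered if a != requester][:limit]
```

A peer that loses connectivity and never says goodbye costs nothing here: its entry just ages out.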
3.1 Request/Response Syntax and Semantics

Regarding the message format, please pick one: binary or text. IMO binary
is the way to go, but I'm open to discussion. If binary, drop HTTP
compliance. If text, keep HTTP.
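For comparison with XML, a binary encoding can be very small. This sketch uses a hypothetical type-length-value (TLV) layout: a 1-byte type, a 2-byte big-endian length, then the value. None of the type codes come from the draft. Skipping unknown types on decode also gives a cheap extension mechanism.

```python
import struct
from typing import Dict, Iterator, Tuple

# Hypothetical type codes, for illustration only.
T_SWARM_ID = 1
T_PEER_PORT = 2
KNOWN_TYPES = {T_SWARM_ID, T_PEER_PORT}

HEADER = struct.Struct("!BH")  # type (1 byte), length (2 bytes), network order


def encode(fields: Dict[int, bytes]) -> bytes:
    """Concatenate one TLV record per (type, value) pair."""
    out = bytearray()
    for ftype, value in fields.items():
        out += HEADER.pack(ftype, len(value)) + value
    return bytes(out)


def iter_tlv(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, value) records, rejecting truncated input."""
    offset = 0
    while offset < len(data):
        if offset + HEADER.size > len(data):
            raise ValueError("truncated TLV header")
        ftype, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        if offset + length > len(data):
            raise ValueError("truncated TLV value")
        yield ftype, data[offset:offset + length]
        offset += length


def decode(data: bytes) -> Dict[int, bytes]:
    """Keep known fields; silently skip unknown (extension) types."""
    return {t: v for t, v in iter_tlv(data) if t in KNOWN_TYPES}


msg = encode({
    T_SWARM_ID: bytes.fromhex("ab" * 20),    # 20-byte swarm ID
    T_PEER_PORT: struct.pack("!H", 7777),
    200: b"vendor",                          # unknown to this receiver
})
print(len(msg))  # 37
```

The whole request, including a vendor extension, fits in 37 bytes; the XML equivalent would be several times that before any HTTP headers.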
Please split this section into explanatory text and a separate table
listing MUST/MAY etc. for each parameter. Personally I find the C struct
approach hard to follow, given the number of optional parameters. Are there
alternative formats used in other RFCs that we could adopt?
Most of the information requested as MAY should IMO be dropped. It will
likely not be implemented, and is of arguable benefit.
E.g. from the tracker's perspective, what is the significance of the ASN,
the connection type, whether or not I'm behind NAT, whether I have
concurrent active links (how common is that?), or what my bandwidth level
is? And how does a client know this information? Is it important that I'm
connecting over 3G, when in fact at my home 3G is more reliable and faster
than ADSL (outside business hours!)? How would the behaviour of a peer in
the PPSP layer change if this information were known? I posit not at all.
4.1 CONNECT Request

The CONNECT request with LEECH or SEED suggests a binary mode of operation
for peers. This will not be the case; for example, a live-streaming peer
may never reach a full SEED state, having dropped initial packets. This is
not BitTorrent ;-).
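Rather than a LEECH/SEED flag, a peer could report what it actually holds, e.g. as a list of chunk ranges, and "seed" becomes a derived property instead of a mode. A sketch, assuming inclusive `(start, end)` chunk ranges (my assumption, not the draft's format):

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive chunk range (start, end)


def merge_ranges(ranges: List[Range]) -> List[Range]:
    """Normalise overlapping or adjacent ranges into a sorted minimal list."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def is_complete(ranges: List[Range], total_chunks: int) -> bool:
    """A peer is a 'seed' only if it holds chunks 0..total_chunks-1."""
    return merge_ranges(ranges) == [(0, total_chunks - 1)]


# A live-streaming peer that joined late holds only a window of chunks,
# so it never becomes a seed even though it serves chunks to others.
live_peer = [(1200, 1299), (1300, 1340)]
print(merge_ranges(live_peer))                    # [(1200, 1340)]
print(is_complete(live_peer, total_chunks=1341))  # False
```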
CONNECT LEAVE seems weird to a native English speaker. It's like clicking on
Start button in Windows to Shut Down the computer.
"When a peer plans to leave a previously joined swarm, it should set"
Please use normative language here (MAY/SHOULD/MUST). IMO this information
is already available in the peers themselves, as a list of active peers.
Also, a peer that loses connectivity may not be able to send a CONNECT
LEAVE message at all.
4.2 FIND Request

The draft must be specific about how a well-formed FIND request is
validated, and about what happens if it is NOT valid. Do we ignore it?
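As an illustration of the level of detail I'd like to see, here is a hypothetical validator that returns either a parsed request or an explicit error. It implements one possible policy (answer with an error rather than silently ignoring the request, so the peer can decide what to do next). The field names, the hex swarm ID format, the limits and the 400 status code are all assumptions, not the draft's.

```python
import re
from dataclasses import dataclass
from typing import Dict, Union

# Hypothetical constraints, for illustration only.
SWARM_ID_RE = re.compile(r"[0-9a-f]{40}")  # e.g. a 20-byte ID, hex-encoded
MAX_PEERS_RE = re.compile(r"[0-9]{1,4}")
MAX_PEERS_CAP = 200


@dataclass
class FindRequest:
    swarm_id: str
    max_peers: int


@dataclass
class FindError:
    code: int    # hypothetical status code sent back to the peer
    reason: str


def validate_find(params: Dict[str, str]) -> Union[FindRequest, FindError]:
    """Return a parsed request, or an explicit error instead of silently dropping."""
    swarm_id = params.get("swarm_id", "")
    if not SWARM_ID_RE.fullmatch(swarm_id):
        return FindError(400, "malformed or missing swarm_id")
    raw = params.get("max_peers", "50")
    if not MAX_PEERS_RE.fullmatch(raw):
        return FindError(400, "max_peers must be an integer from 0 to 9999")
    # Clamp a well-formed but over-large value rather than rejecting it.
    return FindRequest(swarm_id, min(int(raw), MAX_PEERS_CAP))
```

Whichever policy the WG picks (reject, clamp or ignore), it should be stated per field like this, so two independent implementations behave the same way.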
4.3 STAT_REPORT

The keep-alive duplicates functionality within the peer protocol itself.
4.4 Error and Recovery conditions

"If a peer fails to read": does this mean the peer receives the response
but cannot parse it? If so, re-sending the same response will not help.
The peer should re-send its tracker request, rather than requiring the
tracker to decide whether the condition is transient. Let's push state
management out to peers and not try to manage it in the tracker, where
possible.
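Pushing recovery out to the peer can be as simple as the peer retrying its own request with capped exponential backoff and jitter, so the tracker keeps no retry state at all. A sketch, where `send_request` is a hypothetical stand-in for whatever transport the TP ends up using:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def request_with_retry(
    send_request: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send_request until it succeeds or the attempts run out.

    Any exception (timeout, unparseable response, ...) is treated as a
    reason to re-send the *request*; the tracker keeps no retry state.
    """
    for attempt in range(attempts):
        try:
            return send_request()
        except Exception:
            if attempt == attempts - 1:
                raise  # give up and surface the last error to the caller
            # Full jitter: a random delay up to the exponential cap, so many
            # peers failing at once do not all retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise ValueError("attempts must be at least 1")
```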
5 Operations and Manageability

As section 5 is mainly boilerplate, I'll make only a few general comments.
Why do we "propose" syslog and SNMP? These are surely internal management
tools, and of no relevance to an internet-facing tracker protocol. Is this
simply an IETF expected norm for RFCs?
6 Security Considerations

I'm concerned that the requirements referred to here need significant
improvement; many subsections (e.g. 6.3) simply speculate on possible
options.
6.1 Authentication between Tracker and Peers

Please, please, please spare me from the hell that is implementing OAuth
2.0, just to use a simple tracker protocol. Do we have any idea what the
impact of this is on the fast-start ideals put forward in PPSP itself?
6.4 Pro-incentive parameter trustfulness

This discussion really belongs in the peer protocol. IMO we've made an
effort there to avoid specifying incentive schemes and methods; I don't
believe they should be included in the tracker protocol.
Alternatives - using (m)DNS for location

NB: this doesn't meet all the requirements for the TP at this point;
however, I believe the information is already available within the peer
protocol.


obtaining a list of peers (leecher mode) could be supported via plain
DNS, or via mDNS within a link-local network (e.g. a single site)


proposing a new content stream (seeder mode) can also be supported via
DNS...


in practice, I think a peer in either mode will not participate in
multiple swarms from a tracker's point of view: for a given hash, a peer
(even a high-end datacentre-based peer) will likely not participate in
more than one swarm, although the same hash may be provided through
multiple peers


a short TTL, plus the option of returning different DNS responses to
different peers (by IP address), has worked well in other geo-aware
services I've worked on.


this could be supported functionally by using a ppsp:// URL:
ppsp://peer.id/hash?number&key1=value1&key2=value2&key3=value3


where the key/value pairs represent the desired swarm parameters, such as
chunk size. When the peer.id DNS name is looked up, the DNS server returns
the list of peers in the usual DNS format. TTLs let the operator trade off
stability against accuracy. We could debate using PTR records or other
approaches, but this is definitely an option.
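A sketch of the client side, under stated assumptions: it parses the `ppsp://` URL with the standard library, and caches the resolved peer list for the record's TTL. The meaning of the bare `number` token is not defined above, so it is kept as a key with an empty value. Python's `socket.getaddrinfo` does not expose record TTLs, so `resolve` is a hypothetical callback that a real implementation would back with a DNS library returning `(addresses, ttl)`.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple
from urllib.parse import parse_qsl, urlsplit


@dataclass
class SwarmLocator:
    host: str               # DNS name that resolves to the peer list
    swarm_hash: str
    params: Dict[str, str]  # desired swarm parameters, e.g. chunk size


def parse_ppsp_url(url: str) -> SwarmLocator:
    parts = urlsplit(url)
    if parts.scheme != "ppsp" or not parts.hostname:
        raise ValueError(f"not a ppsp:// URL: {url!r}")
    swarm_hash = parts.path.lstrip("/")
    if not swarm_hash:
        raise ValueError("missing swarm hash")
    # keep_blank_values=True keeps bare tokens such as "number" (value "").
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return SwarmLocator(parts.hostname, swarm_hash, params)


@dataclass
class PeerCache:
    """Reuse a resolved peer list until its DNS TTL expires."""

    resolve: Callable[[str], Tuple[List[str], float]]  # hypothetical: (addrs, ttl)
    clock: Callable[[], float] = time.monotonic
    _entries: Dict[str, Tuple[float, List[str]]] = field(default_factory=dict, init=False)

    def peers(self, host: str) -> List[str]:
        now = self.clock()
        cached = self._entries.get(host)
        if cached and cached[0] > now:
            return cached[1]  # still fresh: no new lookup
        addrs, ttl = self.resolve(host)
        self._entries[host] = (now + ttl, addrs)
        return addrs
```

A short TTL makes the operator's changes visible quickly at the cost of more lookups; a long one does the opposite, which is exactly the stability/accuracy knob described above.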
BTW, I'd also like to see a defined, standard port for the PPSP layer; I'm
not sure whether the TP also needs its own.
I'd see this as very similar to a simple web page with a variety of links
with different peer options. In practice I think peer streams will be
defined once by an operator, and the user e.g. in a web browser
Nits


"Then the leecher starts to initiate"
-> "Then the leecher initiates"
"If the leecher plan to switch to another straw, it will initiate "
s/straw/swarm/
"How the portal learn the encoding type" s/learn/learns/
"peer obtain above information" s/obtain/obtains/

"Content-Lenght: <ContentLenght>" -> "Content-Length: <ContentLength>"


"Priority: the preference of IP address on which the requesting peer
get the swarm." -> ?? Sorry, I don't understand this at all.