Possible way to provide POSTable URI in RDF
<http://www.amazon.com/gp/product/B000QECL4I>
  eg:reviews <http://www.amazon.com/product-reviews/B000QECL4I> ;
  eg:order "http://www.amazon.com/gp/product/B000QECL4I{?copies}" ;
  .
 
and then the definition of eg:reviews would say "the object of this property
provides reviews of the subject of this property" and the definition of
eg:order would say "POST to the URI generated by expanding the URI template
value of this property where the copies variable is the number of copies to
be ordered"
 
dunno on the question of whether the URI template should have its own datatype
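For concreteness, here's a minimal sketch of how a client might expand such a template (hand-rolled handling of a single RFC 6570 `{?name}` query expansion, stdlib only; a real client would use a full URI template library):

```python
from urllib.parse import urlencode

def expand_query_template(template, variables):
    """Expand a URI template containing a single {?name,...} query
    expression -- a small subset of RFC 6570."""
    start = template.find("{?")
    if start == -1:
        return template  # nothing to expand
    end = template.index("}", start)
    names = template[start + 2:end].split(",")
    params = [(name, variables[name]) for name in names if name in variables]
    query = "?" + urlencode(params) if params else ""
    return template[:start] + query + template[end + 1:]

order_uri = expand_query_template(
    "http://www.amazon.com/gp/product/B000QECL4I{?copies}", {"copies": 3})
# order_uri == "http://www.amazon.com/gp/product/B000QECL4I?copies=3"
```

The client then POSTs to the expanded URI, per the definition of eg:order.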

if we look at the order and assume you have to POST payment information (as an entity, not as URI parameters), you would need to communicate the payment scheme either in the "order" definition, or ideally, clients could do an OPTIONS on the payment URI and see what media types (i.e., payment types) the order service accepts. how would you see this scenario playing out in the context shown here?

assuming the POST URI is not parametrized, it would become a regular URI and not a literal, right? assuming that, how would that change the overall processing model of RDF (or maybe it's more appropriate to say "Linked Data" here), where the assumption is that you can always GET things from URIs? would you assume that the GET then GETs a description of what needs to be POSTed and how? that might even make some sort of sense, logically, but is probably not something you'd like to implement that way, for performance reasons to start with.

but is that how interactions would logically work in RDF? the interaction resource is identified by URI and describes itself in terms of how you interact, and what that means? to some extent, that is exactly what media types do: they specify for hypermedia links what it means for a client to traverse that link, and how it is supposed to do that.

so would RESTful RDF clients make interaction-describing RDF GETable at each hypermedia URI? if that were the case, that would need to seep deep down into the processing model of RDF clients, right? if you GET such a behavior description, it's not data to treat as static data, it's a recipe for how to behave when interacting with that URI according to the service's model.

As discussed on twitter, I think I'd put the definition of what you would POST etc in the definition of the property for the vocabulary, because an application that uses the data to take action really needs to understand what happens in the real world when they do that POST, and that's all completely vocabulary specific. I'd expect the application to have hard-wired understanding of the vocabulary (based on documentation/specification of the vocabulary) to use such data. So I guess that auto-discovery of the capabilities of the URI might be useful, but only to the extent that it is in any other RESTful protocol; some of it can be discovered as you suggest through an OPTIONS, does an application really need to know more?

i think now we're really getting somewhere. you're saying "i'd expect the application to have hard-wired understanding of the vocabulary (based on documentation/specification of the vocabulary) to use such data", and that makes a lot of sense to me. because that's how it is in REST, as you say. but there's one more thing that's an important part of REST: you package the assumptions around what to expect in terms of vocabulary and actionable links (and what all of that means) in media types, because there is an application scenario for which that media type is defined, and clients need to understand that context and support it.

when you make that step, then i think you're pretty much where REST is, only that you've decided on an encoding of the behavior in RDF (instead of going the more informal route that most media types go). but that does not really achieve all that much, because you need specific implementations for that application anyway, so developers need to read the documentation so that they can implement the client. at that point, the RDF might be a nice starting point for generating documentation, if it contains human-readable descriptions, but that would be about it, right?

if a client encounters a "link type" it hasn't been coded for, it would need to stop and say "i've found a link i could follow, because there's a property that tells me that it is an actionable link, and i can also find part of the required signature for how to follow the link, but i don't really know what happens when i follow it, so i won't follow it."

one last thing: in RESTland, how would those hard-wired capabilities (i.e., supported interaction protocols and vocabularies exchanged in the context of these protocols) be communicated between clients and servers? by using media types, so that instead of GETting application/xml and then being presented with a bunch of XML where i don't know the meaning of anything, there's a shared understanding of the application-level protocols (in terms of media type interactions, not in terms of HTTP as the foundation for them), and clients and servers can make their support explicit by using media type identifiers in HTTP conversations. [[ and i think by now pretty much everybody would agree that media type identifiers should be URIs instead of being what they are, but that is probably not going to change anytime soon, even though it would be a better design: http://chmod777self.blogspot.com/2012/04/on-future-of-mime-media-types.html ]]

An alternate approach to this is to model the service that does the ordering, and then relate a resource to that service. This has the advantage of describing more of the interaction, i.e. the HTTP method, acceptable media types and parameters. This basically mirrors HTML forms into RDF. This is what Mark Baker suggested as part of RDF Forms.

E.g.

<http://www.amazon.com/gp/product/B000QECL4I>
  a ex:Product ;
  ex:reviews <http://www.amazon.com/product-reviews/B000QECL4I> ;
  ex:order <http://example.org/order-service> ;
  .

<http://example.org/order-service>
  rdf:type ex:Service, ex:OrderService ;
  ex:method "POST" ;
  ex:mediaType "application/x-www-form-urlencoded" ;
  ex:param [
    rdfs:label "item" ;
    ex:type ex:Product ;
  ] ;
  ex:param [
    rdfs:label "copies" ;
    ex:type xsd:integer ;
  ] ;
  .

We can do a GET on the order service to retrieve a description of how to interact with it. Clients can implement support for the generic form interaction. This reduces the amount of specification needed for each new property and provides some hope that a client could still offer some useful behaviour for a newly encountered service (e.g. rendering a form).
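To make "generic form interaction" concrete, here's a sketch of a client consuming such a description; the `service` dict is a hypothetical in-memory stand-in for the triples above, with keys mirroring the ex: terms:

```python
from urllib.parse import urlencode

# Hypothetical client-side view of the service description, after parsing.
service = {
    "uri": "http://example.org/order-service",
    "method": "POST",
    "mediaType": "application/x-www-form-urlencoded",
    "params": ["item", "copies"],
}

def build_request(service, values):
    """Generic form interaction: check that every declared parameter is
    supplied, then serialise the body in the advertised media type."""
    missing = [p for p in service["params"] if p not in values]
    if missing:
        raise ValueError("missing params: " + ", ".join(missing))
    body = urlencode({p: values[p] for p in service["params"]})
    return service["method"], service["uri"], service["mediaType"], body
```

A newly encountered service that follows the same description vocabulary could be driven by exactly this code, or rendered as a form for a human.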

It's worth noting that we currently do refer to services in Linked Data. VoID allows us to refer to a SPARQL endpoint, an item description service or a search endpoint as URIs associated with a dataset. But it doesn't say how to interact with them; clients have instead implemented operational semantics for each of the specific predicates. This approach can cleanly extend that.

I had hoped that SPARQL Service Description might help achieve that for SPARQL endpoints, but the WG has avoided specifying any relationship to VoID. It has also unfortunately decided to avoid describing the endpoint service explicitly, instead using an "endpoint" property to refer to its URL, which is a mistake I think. (See example at http://www.w3.org/TR/sparql11-service-description/#example)

@dret re: "the RDF might be a nice starting point for generating documentation, if it contains human-readable descriptions, but that would be about it, right"

Well if there's a richer description of the service, as I outlined in my other comment, then we can do more than render behaviour, we could query for services that can offer particular types of interaction, or can operate on particular types of resources (as described by input parameters). E.g: find me a list of annotation or review services; find me a list of services that operate on foaf:Person resources.

@ldodds, thanks :)

Having slept on it, my thought was that the right place to define the operational semantics of a given URI (eg whether and what you can POST to it) was through the class of a resource rather than on the property itself, but of course you could get to the relevant class from the range/domain of the property. So I'm thinking of a pattern like this:

A vocabulary author publishes:

ex:order a rdf:Property ;
  rdfs:comment "A resource used to order a product." ;
  rdfs:domain ex:Product ;
  rdfs:range ex:OrderingResource ;
  .

ex:OrderingResource a rdfs:Class ;
  rdfs:comment "A resource that can be POSTed to, to order something; the POSTed details of the order must be provided through an RDF graph that includes ex:Order individuals" ;
  rdfs:subClassOf rest:POSTableResource ;
  .

Then Amazon would publish:

<http://www.amazon.com/gp/product/B000QECL4I>
  a ex:Product ;
  ex:reviews <http://www.amazon.com/product-reviews/B000QECL4I> ;
  ex:order <http://www.amazon.com/order> ;
  .

<http://www.amazon.com/order>
  a ex:OrderingResource ;
  rest:acceptable "application/rdf+xml" , "text/turtle" ;
  .

and a Linked Data Platform spec would define rest:POSTableResource as "a resource that can be POSTed to" and rest:acceptable as "a media type that can be accepted in a POST or PUT to a given POSTable or PUTtable resource". Or something.
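A generic client's decision procedure under this pattern might look like the following sketch; the class and property names mirror the examples above, and the client is assumed to have already applied the rdfs:subClassOf axiom when collecting the resource's classes:

```python
def can_post(resource_classes, acceptable, body_media_type):
    """Generic client check: POST only if the resource is declared POSTable
    and the body uses one of the advertised rest:acceptable media types."""
    return ("rest:POSTableResource" in resource_classes
            and body_media_type in acceptable)

# The ordering resource from the example above, as a client might see it
# after expanding ex:OrderingResource rdfs:subClassOf rest:POSTableResource.
order_classes = {"ex:OrderingResource", "rest:POSTableResource"}
order_acceptable = {"application/rdf+xml", "text/turtle"}
```

What the POSTed graph must contain (the ex:Order individuals) would still come from the hard-wired understanding of the ex: vocabulary.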

Through OWL, the ex: vocabulary could define additional constraints on ex:OrderingResource such as that it must have at least one value for rest:acceptable, or that one of the values must be "application/rdf+xml" or whatever.

When it comes to parameters, I think the Linked Data Platform WG needs to explore the pros and cons of a URI template approach vs a service description approach: it's certainly not obvious to me whether one or other approach is better, or whether they can work together somehow. Perhaps if the URI were parameterised rather than POSTing a graph, an approach like this might work:

<http://www.amazon.com/order>
  rdf:type ex:OrderingResource ;
  rest:uriPattern "http://www.amazon.com/order{?copies}" ;
  .

ex:OrderingResource
  rest:uriParam [
    a rest:URIparameter ;
    rdfs:label "copies" ;
    rdfs:comment "The number of copies to be ordered." ;
  ] ;
  .

I'm not sure. It needs some thinking about: to what extent is this useful for auto-discovery, generated documentation, generated code and so on, and to what extent will applications simply hard-code knowledge about the vocabularies they are interested in? My feeling is that we should learn from experience with the XML stack (WSDL, UDDI etc) and not over-engineer.

+1 to not over-engineering and also building on previous work as much
as possible.

I like the approach of including some service description as part of
the class but, as your example shows, a particular instance might have
different opinions on, e.g. acceptable media types.

L.


btw, with a URI template approach unless you want to build all of the operational semantics into the property (e.g. definitions of the variables, http method, media types, etc) then you need to describe at least some of the service.

As a data point, the Google Discovery API uses URI templates to aid URL construction, but still has definitions of the parameters including type indications:

https://developers.google.com/discovery/v1/using#build-compose

@dret regarding media types: I just want to make sure that you're not suggesting something like inventing application/order+rdf+xml or application/order+rdfs+foaf+dct+rdf+xml or something as you start using more vocabularies in your RDF...

Of course the way in which some data is processed needs to be "defined by the media type", but that only means that there should be a follow-your-nose method of working out how to process the data based on the media type. In the case of RDF media types, the follow-your-nose method is that the media type (eg application/rdf+xml or text/turtle) says how to build an RDF graph and then indicates how that is interpreted at an RDF level by reference to the RDF specs. The RDF specs then say that the semantics (operational/interactional or otherwise) of a property/class etc is defined by the RDF assertions about that property/class (ie by the vocabulary, which you can discover by resolving the URI for the property/class).

So you don't need to have separate media types for each combination of vocabularies used in a particular message: you just say that the message is of an RDF media type and locating the definition of the meaning of that particular message then follows naturally.

What's interesting as you know is what level of standardisation (classes/properties that can be used as superclasses/properties or to annotate classes/properties) you need to get useful generic behaviour (eg having a class for "Collection" that implies operational semantics similar to that provided in Atom feeds)...

@JeniT what i am not understanding is how that approach could possibly work in scenarios beyond reading data. i see it working in reading, where like in XML you just GET application/xml and hope for the best, searching for namespaces and either finding stuff you understand or just stop when there's nothing. but for scenarios beyond read, there must be a way to communicate expectations, and that's where media types play a crucial role. how would i know what i am supposed to PUT/POST somewhere? the web's flavor of REST tells you that this kind of expectation is communicated via HTTP, so that clients know what to transfer. state transfer means that there is an agreement between peers what to transfer in the context of an application scenario (which is covered by one or more media types), and how the flow of state between clients and servers works. you can always add on at runtime if that's within the design space of the application ("profile" is where we're trying to make this explicit as well), but you need to set the baseline (and make that visible at the protocol level) of what the state transfer has to look like to make the application work.

@dret I really don't understand what you're trying to say I'm afraid. Can you outline (preferably with this example) what you think would work, or is needed?

i'll try to keep it short and you can let me know where i should elaborate. i am making up an example, hoping it connects the dots.

  • let's say a service has an "order" link where the client is supposed to submit payment information (i.e., transfer payment state from client to server).
  • there's an expectation on the server side what a client POSTs, it must follow the payment schema supported by the server.
  • let's assume the server also supports payment according to three other payment schemes, which allow payment state to be transferred as well, two of them XML-based, one RDF-based.
  • the server should indicate what is acceptable as payment through the "order" link by using the media types of the payment protocols (similar to AtomPub's <accept> http://tools.ietf.org/html/rfc5023#section-8.3.4).
  • if a client attempts to use an unsupported payment protocol (read: it submits something not according to the payment schemas acceptable by the server), the server responds with a 415, repeating the accepted media types in the HTTP response (ideally).
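A server-side sketch of that last bullet; all payment media type names are invented for the example, and the "Accept-Post" response header is a made-up convention, not standard HTTP:

```python
# Hypothetical payment scheme media types the server supports.
ACCEPTED_PAYMENT_TYPES = [
    "application/x-payment-a+xml",
    "application/x-payment-b+xml",
    "application/x-payment+rdf+xml",
]

def handle_payment_post(content_type):
    """Accept a supported payment representation, otherwise answer 415 and
    (by convention -- HTTP defines no standard response header for this)
    repeat the acceptable media types."""
    if content_type in ACCEPTED_PAYMENT_TYPES:
        return 202, {}  # accepted for processing
    return 415, {"Accept-Post": ", ".join(ACCEPTED_PAYMENT_TYPES)}
```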

media types are necessary to communicate expectations wrt state transfers (they answer the simple question: "how do you represent state, and how do you find the interaction links"), and application/rdf has the same problem application/xml has: it does not identify a model, it identifies a metamodel. it took a while for the XML and JSON communities to start minting meaningful media types instead of the generic ones, but now we're getting there. we have to expose a service's interaction semantics within the fabric of the web, not in a framework that requires clients to reach within specific representations.

@JeniT, i am just repeating my question here, because i really would like to see how your scenario would work. you say that "you don't need to have separate media types for each combination of vocabularies used in a particular message: you just say that the message is of an RDF media type and locating the definition of the meaning of that particular message then follows naturally." once again, for GETting RDF that might work, but how does that work when a client is supposed to PUT something? the client has application state (a rather abstract "order intent") and needs to get that to the server. representing that order intent needs a framework where the client knows which order representation is acceptable to the server. there could be various order representations in RDF that a client knows about (because it may be capable of talking to different services, for example), so if the "order" link simply is described by "send me some RDF", how does the client decide which order vocabulary to use? please explain to me how you see this working, i really want to understand this part of the puzzle.

@dret I think I see the point you're making: you need to express what you can POST or PUT to a given URI using some method that makes those restrictions discoverable by HTTP machinery. I've looked at HTTPbis and I can't immediately see where that machinery is. You imply that there's something like an "Accept-Content-Types" header that enables a server to list acceptable content types for POST/PUTting, but I can't see it. Can you point me at it or if there isn't one, explain how a server that supported POSTing using a particular +json media type would express that constraint (I guess in response to an OPTIONS request on the URI)?

i'm afraid you're right that for a reason i cannot think of right now, Accept is a request header only, which means you could put the list in a 415 error document, but that would be convention and not the standard. so no, there is no machinery communicating the list of acceptable media types back to the client in case of an error. but looking at the first part of the question, how to even know what to submit, there is a way how that state is transferred to the client (here's the book info, there's the order link, and here's what to POST to it as payment info), and my question still is how you would communicate that expectation to the client. the client needs to know what it is expected to submit, so just telling it "submit RDF" is not sufficient.
asking the same question in a different way: in existing RDF services, how does that work? does a client just "know" what it is supposed to POST/PUT to a given URI if it wants to interact with the service in the context of an application? how does it acquire that knowledge? in media types, that knowledge would be coupled to the link relation, either implicitly (submit something using this vocabulary when traversing such a link), or explicitly (often using link/@type or link/@accept attributes in XML vocabularies). this allows clients to choose according to their capabilities and preferences, if servers provide alternatives, and those alternatives are communicated through media types. new capabilities may show up when a server starts supporting additional interactions, but clients often need to be updated (learning about the new media types) to be able to take advantage of these new capabilities.
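The client-side choice described here could be as simple as this sketch; both inputs are just lists of media type identifiers, however they were communicated (link/@type, link/@accept, or similar):

```python
def choose_media_type(server_alternatives, client_preferences):
    """Pick the first media type, in the client's order of preference,
    that the server advertises for the link."""
    for media_type in client_preferences:
        if media_type in server_alternatives:
            return media_type
    return None  # no shared representation: the client should not follow the link
```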

Hi,

My understanding is as follows. I'll make reference to @mamund's H Factors. I'm keen to see where my understanding breaks down:

  • A media type can provide templated queries (LT) and support for updates (LN, LI). Importantly, it can also provide control factors to indicate meaning of a link (CL), and also what media types should be used for an interaction (CR, CU). This allows the communication of the necessary information from a server to a client (which has requested an instance of that format) about how to construct and format requests, including what it is happy to accept.

  • A media type also captures the expected content of a request. E.g. the structure of an order element. This allows clients and servers to have a shared understanding of how to exchange some state

Atom uses link elements, in the body of a message in order to indicate edit links, etc. HTML uses form elements. In both cases we don't communicate at the HTTP level anything other than the media type. But we do use media types in the content to provide the necessary control information.
RDF doesn't provide a way to annotate links, e.g. to add a media type. But we can annotate properties in a schema, or model interactions more explicitly, i.e. similar to HTML forms. I think this would let us add in the missing H Factors. Now, let's assume we took the approach I described earlier in the discussion -- i.e. forms-based service description.

This does give us the ability to describe, for example, that a service could accept a SPARQL query (application/sparql-query) e.g. to execute it or maybe to store it for later execution. We could also describe services that accept application/atom+xml or whatever.

We can also say that a service supports submission of application/ld+json or application/rdf+xml or text/turtle. This is fine in the case where we're advertising the capabilities of a storage service. E.g. a graph store to which I can send some data for later querying. As no further knowledge of the graph contents is required by the server.

But if we want to describe the format of a graph that describes an order, then according to REST over HTTP, we really need a media type: e.g. x-example/order+turtle (or something). That allows us to document the required graph structure in a media type and achieve some shared understanding between client and server.

An RDF Schema or OWL ontology doesn't let us describe the structure of a graph in the same way that we could define a schema for an order document in XML. So that doesn't give us enough leverage.

The question is: are people creating services that exchange RDF documents in this way, that do more than just store or update data? It might be worth looking for examples of what services people are creating?

The services that I most commonly see referenced in Linked Data aren't RDF consuming services: they're SPARQL endpoints, search endpoints, item description endpoints, all of which use other media types or simple link construction. The other kinds of services that are being used are SPARQL Update, HTTP Uniform Protocol, and services that simply store or patch RDF. Others might exist, but they're not yet common AFAICT.

But having said that, I'm not sure I see a big problem with creating, say, a media type for an "order graph" expressed in Turtle, if that helps document what a service expects. So long as there is a way to capture the media types in an RDF description, there will be sufficient support to enable that, I think.

If we don't use media types then we could plug the gap by offering other methods of describing the necessary input to a service, but this would be bespoke to RDF.

| Atom uses link elements, in the body of a message in order to indicate
| edit links, etc. HTML uses form elements. In both cases we don't
| communicate at the HTTP level anything other than the media type. But
| we do use media types in the content to provide the necessary control
| information.

yes, from this point of view, Atom and HTML are at the exact same level.
fwiw, we're thinking about adding query capabilities to Atom
(http://geofeeds.org/earthquakes/query_schema.xml is what we've
experimented with so far, and at http://geofeeds.org/client/map_app you
can see how these declarative queries drive runtime form generation, and
the fact that all the data is spatial is just an implementation detail;
all data is feed-based in this scenario), and with the parameter
specification and URI template (we're a bit richer than HTML when it
comes to parameter types), Atom then is pretty much exactly where HTML is.

and just as a side note: our feed queries of course easily could be
mapped to a SPARQL query in the back-end, should the back-end be
implemented in a way that manages data in an RDF store.

| RDF doesn't provide a way to annotate links, e.g. to add media type.
| But we can annotate properties in a schema, or model interactions more
| explicitly, i.e. similar to HTML forms. I think this would let us add
| in the missing H Factors.

yes it would. the difference would be that in many other scenarios, you
specify the media type and that's almost always human-readable (schema,
interactions, processing model, and so forth, a lot of prose, usually),
because there's a limit to what you can do in machine-readable formats
anyway. and like @JeniT mentioned, clients will have to be hand-coded
for supporting these scenarios anyway.

interestingly, neither XML nor JSON ever made the step to add links to
their general model. XML and JSON have no idea about links, it's only
the vocabulary and its semantics that allow a client to understand that
something is a link. i don't think that this was a conscious decision,
but i think it demonstrates that overformalizing at least in those cases
("let's build an abstract layer for representing the link concept and
then we build media types on that") was not what people were doing. Atom
sort of did that, but in a way that is directly and immediately useful,
and also has a good extension model.

| Now, lets assume we took the approach I described earlier in the
| discussion -- i.e. forms based service description.
| This does give us the ability to describe, for example, that a service
| could accept a SPARQL query (application/sparql-query) e.g. to execute
| it or maybe to store it for later execution. We could also describe
| services that accept application/atom+xml or whatever.

just to note: SPARQL is always a slippery slope here because it usually
builds on the assumption that you're SPARQLing into the back-end data, right?
in most other information management scenarios today, a lot of machinery
has been developed to avoid this; decoupling the service logic (i call
it the service surface) and the service's data model in the back-end has
proven to be a good idea for a variety of reasons, ranging from security
to performance issues. multi-tier is what pretty much everybody does.

| We can also say that a service supports submission of
| application/ld+json or application/rdf+xml or text/turtle. This is
| fine in the case where we're advertising the capabilities of a storage
| service. E.g. a graph store to which I can send some data for later
| querying. As no further knowledge of the graph contents is required by
| the server.

absolutely. if your service is a generic "store and query RDF" service,
then generic media types are the way to go.

same for XML: https://twitter.com/dret/status/213363704803241984

even though for some additional concepts exposed by such a generic
storage service (collections, users, all kinds of management such as
service load), you would also have a specific "service surface" how
these concepts are exposed ("i want to buy more storage space, here's my
payment info and the order").

| But if we want to describe the format of a graph that describes an
| order, then according to REST over HTTP, we really need a media type:
| e.g. x-example/order+turtle (or something). That allows us to document
| the required graph structure in a media type and achieve some shared
| understanding between client and server.

yes, i absolutely agree with that, but i think @JeniT disagrees with that.

| An RDF Schema or OWL ontology don't let us describe the structure of a
| graph in the same way that we could define a schema for an order
| document in XML. So that doesn't give us enough leverage.

the problem is that RDF has no concept of validation. but there should
be something similar, right? checking a graph against expectations can
surely be done somehow, is there some framework for that?
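lacking a standard framework, "checking a graph against expectations" can at least be sketched: treat the graph as a set of triples and check that the required properties are present (a real implementation might compile such expectations into SPARQL ASK queries; all names below are made up):

```python
def missing_properties(graph, subject, required):
    """Check a graph (a set of (s, p, o) triples) against an expectation:
    every required property must have at least one value for the subject."""
    present = {p for (s, p, o) in graph if s == subject}
    return [p for p in required if p not in present]

# A hypothetical order graph a client might POST.
order_graph = {
    ("_:o1", "rdf:type", "ex:Order"),
    ("_:o1", "ex:copies", "3"),
}
```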

| The question is: are people creating services that exchange RDF
| documents in this way, that do more than just store or update data? It
| might be worth looking for examples of what services people are
| creating?

that is a very good question and a very important one. and it seems to
me that as soon as you move out of the fairly tightly coupled scenarios,
where people freely query each others stores, there's no way you can
continue doing it. the security implications are enormous. if i do BI
and aggregate all kinds of company data using linked data, will i expose
a SPARQL endpoint over that dataset to 3rd-party analytics components?
in many cases, that might be immediately illegal in terms of data
privacy laws, and any risk analysis would immediately flag these things.
you need a service surface that only exposes what you want to expose
("here's how many health insurance applications we received in the last
24hrs"), and then you map that to some canned SPARQL. anything else just
doesn't fly in a decentralized and open environment.
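
a minimal sketch of what such a canned-SPARQL surface could look like
(the operation name and the query text are made up for illustration):

```python
# Sketch of a "service surface" over a triple store: clients can only
# invoke named operations; the canned SPARQL behind them never leaves
# the server. Operation names and query text are hypothetical.

CANNED_QUERIES = {
    "applications-last-24h": """
        SELECT (COUNT(?app) AS ?n) WHERE {
            ?app a ex:HealthInsuranceApplication ;
                 ex:received ?when .
        }
    """,
}

def run_operation(name, execute_sparql):
    """Dispatch a named operation; reject anything not on the surface."""
    if name not in CANNED_QUERIES:
        raise KeyError("operation not exposed: %s" % name)
    return execute_sparql(CANNED_QUERIES[name])

# a fake executor stands in for the real store here
result = run_operation("applications-last-24h", lambda q: {"n": 42})
```

clients never see a generic query interface; everything that is not on
the surface is rejected before it ever reaches the store.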

and once you've come to the conclusion that in such a world, you need
service surfaces, then the question of whether you design these in RDF,
XML, or JSON becomes a question of how you can realize the biggest
value, or maybe you design two or all three of them, if there's demand.

| The services that I most commonly see referenced in Linked Data aren't
| RDF consuming services: they're SPARQL endpoints, search endpoints,
| item description endpoints, all of which use other media types or
| simple link construction. The other kinds of services that are being
| used are based on SPARQL Update, HTTP Uniform Protocol, and services
| that simply store or patch RDF.

you're correct, and that is the reason why uptake in the enterprise
world has been close to zero. when you tell people you're going to
expose generic query capabilities to a potentially vast collection of
enterprise data, they get "SQL injection" and similar flashbacks, and
rightly so. enterprise data needs to be protected, and like i said
above, for health and similar data, you'll actually end up in jail if
you happily expose all the data you have.

EMC has mostly very large customers, and our biggest selling point for
many products is compliance: when you buy our stuff, we give you many
controls over how you can make sure that the right things get exposed to
the right people and services. this becomes mind-bogglingly complex when
you're a company that has many thousands of suppliers (we have cases
like this). we would like to leverage linked data's capabilities to
aggregate data from many sources and make sense out of them, but we
absolutely need services that we can customize and control (we need a
services platform to build on, that's why we joined LDP). otherwise,
nobody will buy the things we make, and rightly so. the vast majority of
data out there is not open, so instead of trying to do LODP, our working
group really should be LDP. (hey, i like that. maybe that will become my
new slogan!)

| But having said that I'm not sure I see a big problem with creating,
| say, a media type for an "order graph" expressed in turtle, if that
| helps document what a service expects. So long as there is a way to
| capture the media types in an RDF description, then there will be
| sufficient support to enable that, I think.

ok, i think we're getting somewhere. how we represent the service
surface is a question we have to discuss, and like i said, it should be
driven by how much value we create based on possible consumers (ours
are, since we're very cross-platform, mostly XML). the really important
aspect is that we are creating RESTful services based on media types.

thanks a lot for taking the time to go through this, your comments about
"everybody is just remote-SPARQLing anyway" helped me a lot to
understand why different people see different problems and solutions.

On 15 Jun 2012, at 23:17, Erik Wilde wrote:

But if we want to describe the format of a graph that describes an
order, then according to REST over HTTP, we really need a media type:
e.g. x-example/order+turtle (or something). That allows us to document
the required graph structure in a media type and achieve some shared
understanding between client and server.

yes, i absolutely agree with that, but i think @JeniT disagrees with that.

I don't disagree that a custom media type is useful, I am questioning whether it is the only thing that works.

At one level, I have a pragmatic concern that there are multiple syntaxes for RDF, and people who operate LDP-based services will find it burdensome to define specific media types for each of the different flavours: text/vnd.amazon.order+turtle, application/vnd.amazon.order+xml, application/vnd.amazon.order+json and so on. (Note that I'm assuming that the +xml variant is RDF/XML and the +json variant is JSON-LD. If we went down this path, we should work with IETF to define a structured syntax suffix registration for at least +turtle.)

At another level, I want there to be specification-level clarity that states that a custom media type is required for each service that accepts a POST/PUT, and guidance on how to use them. It is not clear to me, when someone says "according to REST over HTTP we must...", which specification they are referring to, and where this constraint is specified. It could be:

  1. that the HTTP specification states that on an OPTIONS request, the server
    MUST provide a response with a (eg) Accept-Content-Types header that lists
    acceptable media types, and further that all POST/PUT requests that include
    content that is valid according to that media type MUST be successful (ie
    that the media type given in Accept-Content-Types must be defined at
    a granular level, so you can't just say Accept-Content-Types: application/xml
    unless you really do accept all XML)

  2. that the need for a specific media type is only actually at the level of a
    REST best practice rather than a constraint at the HTTP specification
    level, but we want to make it a tighter constraint in LDP because we want
    LDP to follow all REST best practices

#1 does not seem to be the case. I'm totally fine with #2 as long as we are honest that this is what we are doing and provide sufficient detail such that developers writing servers and clients know what they need to do to satisfy it.
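
To make #2 concrete, here is a sketch of a server advertising, via OPTIONS, exactly which media types a POST will accept. Note that the Accept-Content-Types header is the hypothetical one from #1, not something any HTTP specification defines, and the media type is made up:

```python
# Sketch only: "Accept-Content-Types" is a hypothetical header from the
# discussion above, not part of any HTTP specification.

ACCEPTED = ["application/vnd.amazon.order+xml"]  # hypothetical media type

def options_response():
    """Headers a server could return for OPTIONS on the order URI."""
    return {
        "Allow": "OPTIONS, POST",
        "Accept-Content-Types": ", ".join(ACCEPTED),
    }

def acceptable(content_type):
    """Granular check: the exact media type must be listed."""
    return content_type in ACCEPTED
```

A client that does OPTIONS first would then know not to POST plain application/xml, because only the granular type is advertised.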

I think that the REST best practice is not so much "the constraints on a POST/PUT should be identified through a media type" as "the constraints on a POST/PUT should be discoverable". @dret said:

in media types, that knowledge would be coupled to the link relation, either implicitly (submit something using this vocabulary when traversing such a link), or explicitly (often using link/@type or link/@accept attributes in XML vocabularies). this allows clients to choose according to their capabilities and preferences, if servers provide alternatives, and those alternatives are communicated through media types. new capabilities may show up when a server starts supporting additional interactions, but clients often need to be updated (learning about the new media types) to be able to take advantage of these new capabilities.

Taking Atom as an example of good RESTful practice, I note that its link element has a @type attribute, but it is only defined in terms of the media type of the response to a GET on the @href, not on limiting what can be submitted when POSTing to that URI. The edit link relation defined in the Atom publishing protocol doesn't say anything about the interpretation of the @type attribute in this context either. The Atom service descriptions do have an accept element, but it's not specified how these are located. It would be really good to have an example of a RESTful API that is actually doing this right, that we could follow. Presumably you have an example in mind, @dret?

But anyway, let's explore some possible patterns in an XML world. As @dret said, the first possibility would be for the link relation to implicitly describe what is expected by the endpoint, so when you GET information about a product, it includes a link like:

<link rel="http://example.com/relation/order"
      href="http://amazon.com/order" />

and by knowledge of the link relation http://example.com/relation/order (which is presumably defined at that URI, although there's no constraint to make that so within Atom so far as I can tell), an application can work out what it can send to the endpoint. This only works if there aren't endpoint-specific constraints on what's acceptable.

A second pattern is that the owner of the web service specifies a media type application/vnd.order+xml and that's used in the @type attribute, with the link relation http://example.com/relation/order specifying that the @type attribute indicates the media type of what can be POSTed to the URI in the @href:

<link rel="http://example.com/relation/order"
      href="http://amazon.com/order"
      type="application/vnd.amazon.order+xml" />

A third possible pattern would be to have the application/xml media type specify some media type parameters that enabled people to specify a schema location for and document element of some XML (there are multiple ways to cut that of course; I'm more interested in the pattern of using media type parameters than the niceties of what that would mean for XML). In that case, the link would look like:

<link rel="http://example.com/relation/order"
      href="http://amazon.com/order"
      type='application/xml;schema="http://amazon.com/schema/order.xsd";root="{http://amazon.com/schema/order}Order"' />

A fourth possible pattern would be to add @x:schema and @x:root attributes to Atom's link element to provide equivalent information, like this:

<link rel="http://example.com/relation/order"
      href="http://amazon.com/order"
      type="application/xml"
      x:schema="http://amazon.com/schema/order.xsd"
      x:root="{http://amazon.com/schema/order}Order" />

A fifth pattern would be to not define anything on the link element itself, but for the documentation of the link relation http://example.com/relation/order to state that applications can query on the @href URI using the OPTIONS method, and what should be returned in that case, and for that response to specify the constraints. So the document containing the link would have:

<link rel="http://example.com/relation/order"
      href="http://amazon.com/order" />

just like in the first example, but doing an OPTIONS request on http://amazon.com/order would result in something like:

<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Amazon</atom:title>
    <collection href="http://amazon.com/order">
      <atom:title>Orders</atom:title>
      <accept>application/vnd.amazon.order+xml</accept>
    </collection>
  </workspace>
</service>

with of course also the possibility for the accept element in this case to follow any of the patterns above.

There may be other plausible patterns. All these patterns are possible for RDF-based services too.

My hypothesis is that it's impossible in the general case to specify all possible constraints on acceptable POST/PUT entities. Some constraints are going to be unknowable because they depend on the state of the world at submission time (eg are there sufficient items in stock to fulfil the order). Other constraints are going to be endpoint specific (eg is the item of a type that the vendor sells).

So you have to draw the line somewhere. I think as a developer the crucial thing is discoverability: I would prefer to have the link relation/link/endpoint specify application/xml plus the schema and document element of the expected XML than for it to specify an unregistered media type of application/vnd.amazon.order+xml. But I may have missed some REST theory that states that this is not a good way of specifying constraints?
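
If the line is drawn at media type parameters, those are at least mechanically easy for clients to recover. A sketch using only the Python standard library; the schema and root parameter names are the hypothetical ones from the third pattern above, not registered parameters of application/xml:

```python
# Parse a parameterized media type; "schema" and "root" are hypothetical
# parameters used for illustration only.
from email.message import Message

def parse_media_type(value):
    msg = Message()
    msg["Content-Type"] = value
    params = dict(msg.get_params()[1:])  # first entry is the type itself
    return msg.get_content_type(), params

mtype, params = parse_media_type(
    'application/xml;schema="http://amazon.com/schema/order.xsd";'
    'root="{http://amazon.com/schema/order}Order"'
)
```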

The equivalent for RDF would be for the property/endpoint metadata/endpoint itself to specify an RDF serialisation (application/rdf+xml, text/turtle etc) plus something that defines acceptable RDF graphs. As @ldodds said:

An RDF Schema or OWL ontology don't let us describe the structure of a
graph in the same way that we could define a schema for an order
document in XML. So that doesn't give us enough leverage.

the problem is that RDF has no concept of validation. but there should
be something similar, right? checking a graph against expectations can
surely be done somehow, is there some framework for that?

This is a gap in the RDF stack (and one that's come up a few times during TAG discussions over the last few days). OWL inference can be run in a "closed world" mode that does a kind of validation. We have SPARQL graph patterns, but using them as a means of validating RDF would be like doing XML validation solely through XPath expressions. It would be nice to have a grammar more like RELAX NG for RDF graphs; I think that Eric Prud'hommeaux is interested in doing something like that, but it would surprise me if there weren't something similar around already from which we could learn.
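
Until such a grammar exists, checking a graph against expectations can at least be approximated in a few lines. A sketch (all terms hypothetical) that treats a graph as a set of triples and an expectation as a list of patterns with wildcards, which is roughly what a SPARQL ASK over a basic graph pattern gives you:

```python
# Toy graph validation: triples are 3-tuples, None is a wildcard.
# A real basic-graph-pattern matcher would also keep variable bindings
# consistent across patterns; this checks each pattern independently.

def matches(pattern, triple):
    return all(p is None or p == t for p, t in zip(pattern, triple))

def ask(graph, patterns):
    """True iff every pattern matches at least one triple."""
    return all(any(matches(p, t) for t in graph) for p in patterns)

order = {
    ("_:order", "rdf:type", "ex:Order"),
    ("_:order", "ex:product", "asin:B000QECL4I"),
    ("_:order", "ex:copies", "2"),
}
# expectation: an order must name a product and a number of copies
valid = ask(order, [
    ("_:order", "ex:product", None),
    ("_:order", "ex:copies", None),
])
```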

We should really be on the LDP mailing list to discuss this rather than here...

| At one level, I have a pragmatic concern that there are multiple syntaxes for RDF, and people who operate LDP-based services will find it burdensome to define specific media types for each of the different flavours: text/vnd.amazon.order+turtle, application/vnd.amazon.order+xml, application/vnd.amazon.order+json and so on. (Note that I'm assuming that the +xml variant is RDF/XML and the +json variant is JSON-LD. If we went down this path, we should work with IETF to define a structured syntax suffix registration for at least +turtle.)

i absolutely agree that this is not nice on a variety of levels. i
wouldn't get my hopes too high on fixing the media types spec, though.
there's a lot of history to it, it's even bigger than the web, so making
any changes is a very sensitive thing to do. regarding the suffixes,
maybe that's something that could be done, but you'd end up answering a
lot of questions that are very hard to answer.

| 1. that the HTTP specification states that on an OPTIONS request, the server
| MUST provide a response with a (eg) Accept-Content-Types header that lists
| acceptable media types, and further that all POST/PUT requests that include
| content that is valid according to that media type MUST be successful (ie
| that the media type given in Accept-Content-Types must be defined at
| a granular level, so you can't just say Accept-Content-Types: application/xml
| unless you really do accept all XML)

i agree that this is not written down in these absolute terms anywhere.
and as you know, 99.99% of application/xml services then would have to
be application/xdm anyway (if there were such a media type).

| I think that the REST best practice is not so much "the constraints on a POST/PUT should be identified through a media type" as "the constraints on a POST/PUT should be discoverable". @dret said:

i like the term discoverable here, but then again the question remains
through what means. it could be HTTP (even though it's not mandatory),
it could be registrations somewhere (that's the media type route), or it
could be through runtime mechanisms (which then need machinery that is
capable of using them).

| Taking Atom as an example of good RESTful practice, I note that its link element has a @type attribute, but it is only defined in terms of the media type of the response to a GET on the @href, not on limiting what can be submitted when POSTing to that URI. The edit link relation defined in the Atom publishing protocol doesn't say anything about the interpretation of the @type attribute in this context either. The Atom service descriptions do have an accept element, but it's not specified how these are located. It would be really good to have an example of an RESTful API that is actually doing this right, that we could follow. Presumably you have an example in mind, @dret?

atompub does specify the expected media types in the media type
registration itself (defining the link relations and what clients are
supposed to do when they follow these links). i haven't written the spec,
but i assume the idea was to only specify those media types which are
dynamic at runtime (@accept). service descriptions are discoverable
through "service", which for some reason i still don't understand is
listed in
http://www.iana.org/assignments/link-relations/link-relations.xml as
specified in RFC 5023, when it very clearly isn't. @jasnell may have the
background on this, but i think it became apparent that making service
documents discoverable was a good idea, and adds very little overhead
(just one link relation).

i think overall, atompub gets it right. like you mentioned earlier,
clients need to be coded to support these interaction patterns of a
media type anyway, and because of that, it does not hurt that not all
expectations about media types in link interactions are discoverable at
runtime. only if they are variable should there be a runtime mechanism.
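
for illustration, locating a collection's accept element in a service
document is mechanically simple; a sketch (the service document is a
made-up one for the order example):

```python
# Find the media types a collection in an atompub service document
# accepts. The document below is hypothetical.
import xml.etree.ElementTree as ET

APP = "{http://www.w3.org/2007/app}"

SERVICE_DOC = """
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Amazon</atom:title>
    <collection href="http://amazon.com/order">
      <atom:title>Orders</atom:title>
      <accept>application/vnd.amazon.order+xml</accept>
    </collection>
  </workspace>
</service>
"""

def accepted_types(service_xml, collection_href):
    root = ET.fromstring(service_xml)
    for coll in root.iter(APP + "collection"):
        if coll.get("href") == collection_href:
            return [a.text for a in coll.findall(APP + "accept")]
    return []

types = accepted_types(SERVICE_DOC, "http://amazon.com/order")
```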

| But anyway, let's explore some possible patterns in an XML world. As @dret said, the first possibility would be for the link relation to implicitly describe what is expected by the endpoint, so when you GET information about a product, it includes a link like:
| <link rel="http://example.com/relation/order"
|       href="http://amazon.com/order" />
| and by knowledge of the link relation http://example.com/relation/order (which is presumably defined at that URI, although there's no constraint to make that so within Atom so far as I can tell), an application can work out what it can send to the endpoint. This only works if there aren't endpoint-specific constraints on what's acceptable.

"http://example.com/relation/order" is not a link, it's an identifier
(http://tools.ietf.org/html/rfc5988#section-4.2). clients have knowledge
of the link relations they can traverse (because they implement them),
and other links are meaningless to them. i am not 100% sure what you
mean by "endpoint-specific constraints". if the media type or the
registered link relation specify a media type that is expected when
following that link, then that's what a server should accept. of course
it might reject it because of service aspects (invalid product number in
order), is that what you're referring to?

| A second pattern is that the owner of the web service specifies a media type application/vnd.order+xml and that's used in the @type attribute, with the link relation http://example.com/relation/order specifying that the @type attribute indicates the media type of what can be POSTed to the URI in the @href:
| <link rel="http://example.com/relation/order"
|       href="http://amazon.com/order"
|       type="application/vnd.amazon.order+xml" />

i've seen that quite a bit for GET, but not for POST, i think. but it
would work for POSTs as well, as long as the link relation (either in
the media type or in the link relation registration) makes it clear that
@type refers to the request, and not to the response.

| A third possible pattern would be to have the application/xml media type specify some media type parameters that enabled people to specify a schema location for and document element of some XML (there are multiple ways to cut that of course; I'm more interested in the pattern of using media type parameters than the niceties of what that would mean for XML). In that case, the link would look like:
| <link rel="http://example.com/relation/order"
|       href="http://amazon.com/order"
|       type='application/xml;schema="http://amazon.com/schema/order.xsd";root="{http://amazon.com/schema/order}Order"' />
| A fourth possible pattern would be to add @x:schema and @x:root attributes to Atom's link element to provide equivalent information, like this:
| A fourth possible pattern would be to add @x:schema and @x:root attributes to Atom's link element to provide equivalent information, like this:
| <link rel="http://example.com/relation/order"
|       href="http://amazon.com/order"
|       type="application/xml"
|       x:schema="http://amazon.com/schema/order.xsd"
|       x:root="{http://amazon.com/schema/order}Order" />

that i don't like that much, because in many cases media types don't
just specify a schema, but also a processing model for the client (how
to handle extensions of the base schema, for example). if all you can
specify is a schema, then you cannot specify a processing model.

| A fifth pattern would be to not define anything on the link element itself, but for the documentation of the link relation http://example.com/relation/order to state that applications can query on the @href URI using the OPTIONS method, and what should be returned in that case, and for that response to specify the constraints. So the document containing the link would have:
| <link rel="http://example.com/relation/order"
|       href="http://amazon.com/order" />
| just like in the first example, but doing an OPTIONS request on http://amazon.com/order would result in something like:
| <service xmlns="http://www.w3.org/2007/app"
|          xmlns:atom="http://www.w3.org/2005/Atom">
|   <workspace>
|     <atom:title>Amazon</atom:title>
|     <collection href="http://amazon.com/order">
|       <atom:title>Orders</atom:title>
|       <accept>application/vnd.amazon.order+xml</accept>
|     </collection>
|   </workspace>
| </service>
|
| with of course also the possibility for the accept element in this case to follow any of the patterns above.

that would be perfectly legitimate behavior for a media type, making as
many things as possible discoverable at runtime. the question is what
you're buying with this pattern, i.e. are you really expecting that
clients will support different order media types and can then specify
their supported media types in the request via accept when they follow
the order link. it's doable, but i have not seen that level of radical
openness. i'd say that typically, media types encode an application
scenario and assume that clients are interacting within that framework.
they might define extension points and places where clients can find
additional links, but within the media type scenario, things are
typically designed by making some decisions at design time, and only
deferring decisions to runtime where there's a specific goal for doing
that.

| My hypothesis is that it's impossible in the general case to specify all possible constraints on acceptable POST/PUT entities. Some constraints are going to be unknowable because they depend on the state of the world at submission time (eg are there sufficient items in stock to fulfil the order). Other constraints are going to be endpoint specific (eg is the item of a type that the vendor sells).

of course you should not hardcode the available products into the media
type, that would be a pretty terrible design. but you can hardcode all
the things that make sense for your application scenario, strategically
leaving up to runtime those things that can change at runtime. that's
how you usually design services that are as easy to use as possible, at
least from the SOA point of view.

| So you have to draw the line somewhere. I think as a developer the crucial thing is discoverability: I would prefer to have the link relation/link/endpoint specify application/xml plus the schema and document element of the expected XML than for it to specify an unregistered media type of application/vnd.amazon.order+xml. But I may have missed some REST theory that states that this is not a good way of specifying constraints?

the crucial thing is "understandability", which might be a little bit
different. media types are supposed to be "self-describing" (not in the
semweb sense of the word) in the sense that when you see an instance,
there is a way to find information that helps you understand what it
means. the media type is the label you start with, and then you go to
the registry and find the definition.

application/vnd.amazon.order+xml should be documented somewhere, and
there's google. that will get you to a document that tells you the
conversational context. if you're just linked to the schema, you can
auto-generate an instance, but you don't understand the conversational
scenario (get a shopping card id, add items to it, get your customer id,
and then submit an order with your shopping cart and customer id). a
service has almost always more context than just one isolated
interaction, and the media type establishes that context. that's why the
important part about atompub is the protocol, and not the schemas (which
are fairly minimal, as a diff with atom).

| This is a gap in the RDF stack (and one that's come up a few times during TAG discussions over the last few days). OWL inference can be run in a "closed world" mode that does a kind of validation. We have SPARQL graph patterns, but using them as a means of validating RDF would be like doing XML validation solely through XPath expressions. It would be nice to have a grammar more like RELAX NG for RDF graphs; I think that Eric Prud'hommeaux is interested in doing something like that, but it would surprise me if there weren't something similar around already from which we could learn.

i think that for any kind of service scenario, validation is essential.
it's the first line of defense, effective when backed by a good schema
language, and thus takes load off the actual service implementation. and
even for the "just POST some RDF graph to an RDF database", i would
guess that in all settings with loose coupling, you would want to have
some control over what people are POSTing.

| We should really be on the LDP mailing list to discuss this rather than here...

now that you're mentioning it ;-) feel free to link to the gist, maybe
for tomorrow's meeting people would like to read some of that. and as
usual, thanks a lot for your great comments!

Hi,

why not use SPARQL (or rather, graph patterns) to describe inputs, outputs and the relation between input and output? Most of the current approaches on http://linkedservices.org/ use that type of description.

HATEOAS URIs could just be embedded into the RDF that's returned.

Best regards,
Andreas.

Interesting discussion! Thanks @JeniT, @dret and @ldodds! And no, I'm still not convinced that there's value in defining new media types for every application. That's an anti-pattern needed to cope with formats that don't have hypermedia capabilities and can't be extended in a standard way.
