@yawnston
Last active November 9, 2022 07:18
First proposal for the querycat query language (inspired by SPARQL) for querying categorical data

Query Example

For each customer, return their name and surname, together with the list of all products that the customer bought and that cost at most 150 EUR.

SELECT {
  ?customer cheapItems _:cheapItems ;
            name       ?name        ;
            surname    ?surname     .

  _:cheapItems item ?item .

  ?item name  ?itemName  ;
        price ?itemPrice .
}
WHERE {
  ?customer -12/14 ?order .
  ?order -42/40 ?item .
  ?item 56 ?itemName  ;
        57 ?itemPrice .
        
  FILTER (?itemPrice <= 150)
}

Syntax Explanation

  • The WHERE clause pattern-matches against the schema category and generates the solutions that match the pattern. Patterns are written as triples of the form subject predicate object .
    • ?customer is a variable named customer
    • 56 is the ID of a morphism, in this case the morphism from Product to Name. A morphism prefixed with - is a dual, i.e. the same morphism traversed in the opposite direction.
      • Note: I assigned the IDs arbitrarily to the morphisms from Pavel Koupil's categorical approach PDF.
    • Morphisms can be chained into paths: -12/14 means traverse the dual of 12, then traverse 14.
    • ; is syntactic sugar for multiple triples with the same subject; it is equivalent to repeating the subject in the next triple.
    • FILTER discards solutions: only solutions that satisfy the FILTER condition are returned.
  • The SELECT clause describes the returned data: it implicitly defines a schema category, and the returned data are instances of that category.
    • Using an alphanumeric string in the predicate position in SELECT defines a new morphism (and, implicitly, its dual).
    • Using a bound variable substitutes that variable's data.
    • _:cheapItems is the syntax for a blank node named cheapItems. Blank nodes are needed to add new objects to the returned schema; without them we could only use bound variables.
      • In this case, because _:cheapItems first appears in the object position, a new instance is created for each ?customer instance.
      • If _:cheapItems first appeared in the subject position instead, a single object would be shared between all customers.
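
To make the path and ; rules above concrete, here is a minimal Python sketch of how they could be read. It is not part of the proposal: the names Step, parse_path and expand_semicolons are hypothetical, and the sketch assumes that tokens (including the ; and . terminators) are separated by whitespace, as in the query example. It does not handle FILTER or the SELECT/WHERE keywords.

```python
from typing import List, NamedTuple, Optional, Tuple


class Step(NamedTuple):
    morphism_id: int  # numeric ID of the morphism, e.g. 12
    dual: bool        # True when the step was written with a leading "-"


def parse_path(path: str) -> List[Step]:
    """Split a path such as "-12/14" into its steps, in traversal order."""
    steps = []
    for part in path.split("/"):
        dual = part.startswith("-")
        steps.append(Step(int(part.lstrip("-")), dual))
    return steps


def expand_semicolons(block: str) -> List[Tuple[str, str, str]]:
    """Rewrite "s p1 o1 ; p2 o2 ." into explicit (s, p, o) triples.

    Assumes every term and terminator is whitespace-separated.
    """
    triples = []
    subject: Optional[str] = None
    pending: List[str] = []
    for token in block.split():
        if token not in (";", "."):
            pending.append(token)
            continue
        if subject is None:
            # First triple of a statement: subject predicate object
            subject, predicate, obj = pending
        else:
            # After ";": the subject is carried over from the previous triple
            predicate, obj = pending
        triples.append((subject, predicate, obj))
        pending = []
        if token == ".":
            subject = None  # "." ends the statement, so the next one needs a subject
    return triples


if __name__ == "__main__":
    print(parse_path("-12/14"))
    print(expand_semicolons("?item 56 ?itemName ;\n      57 ?itemPrice ."))
```

Under these assumptions, parse_path("-12/14") yields the dual of 12 followed by 14, and the two-line ?item pattern from the WHERE clause expands to two triples that both have ?item as their subject.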