Welcome to the sixth episode in a series of short screencasts about the Ruby neo4j gem. Previously we learned how to chain associations together to create queries which can hop across entities and use any part of the entire path to query records of interest. In this episode we dig deeper into the tools behind the magic. A familiarity with the Cypher query language is advised. See the show notes for a link to a Cypher tutorial.
As discussed previously, when we call associations we get a proxy object:
user.created_assets
There are two kinds of proxy objects: AssociationProxy and QueryProxy. The details aren't important for the moment, but behind the scenes both cause a Query object to be built. We can see this Query object by calling the query method at any point:
user.created_assets.query
Query objects are what ActiveNode uses to build queries incrementally and so, not surprisingly, they are also built using method chaining. We can build our own query from scratch:
Neo4j::Session.current.query.match(a: :Asset).match('a<-[:CREATED]-(u:User)').where(a: {public: true}).limit(2).pluck(:u)
But often it is more convenient to start from a proxy object. Note that when we pass values into our where method, the Query class automatically uses a parameter. Parameters are important for performance and for preventing injection attacks. They should be used whenever possible which is why the Query class uses them wherever it can.
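To make the point about parameters concrete, here is a minimal sketch in plain Ruby. The toy_where helper is hypothetical and is not part of the neo4j gem; it only illustrates the idea that the query text carries a placeholder while the value travels separately. The $name placeholder style is an assumption; the gem version used in this screencast may render placeholders differently.

```ruby
# Toy model only: NOT the neo4j gem's implementation.
# Turns a hash like { a: { public: true } } into a WHERE fragment
# plus a separate hash of parameter values.
def toy_where(conditions)
  params = {}
  fragments = conditions.flat_map do |variable, properties|
    properties.map do |property, value|
      param_name = :"#{variable}_#{property}"
      params[param_name] = value                  # the value is kept out of the query text
      "#{variable}.#{property} = $#{param_name}"  # the text only holds a placeholder
    end
  end
  ["WHERE #{fragments.join(' AND ')}", params]
end

cypher, params = toy_where(a: { public: true })
puts cypher          # WHERE a.public = $a_public
puts params.inspect

# A hostile value never becomes part of the query text, so it cannot change its structure.
hostile_cypher, hostile_params = toy_where(u: { name: "x' OR 1=1 //" })
puts hostile_cypher  # WHERE u.name = $u_name
```

Because the query text stays the same for every value, the server can also reuse its plan for the query, which is where the performance benefit comes from.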
Here we should clarify the difference between proxy objects and Query objects. Proxy objects are always scoped to a particular model and a set of that model's nodes. In the case of an association the scope will be the model which is being targeted and the nodes that the association represents. For this association call, the scope is the Asset model and the assets which the user has created:
user.created_assets
Here the scope is the Category model and the categories for the assets which have been created by the user:
user.created_assets.categories
When we get a Query object from a proxy chain, however, we leave the world of ActiveNode. Query objects are a way for us to define our own Cypher query. So while we must specify more detail about what we want, we have a lot more control over it. We also can do things with Query objects which we can't do with proxies, like using a Cypher WITH clause:
query = user.created_assets(:asset).categories.query_as(:category)
query.with(:asset, 'count(category) AS count').where('count > 2').pluck(:asset)
Query objects also don't care about the order in which clause methods are called. When the query is executed the clauses are grouped together and ordered in the way that Cypher expects:
query = user.created_assets(:asset).categories.where(name: 'Graph theory').query_as(:category)
query.limit(2).where(asset: {public: true}).pluck(:asset)
Note how the second WHERE clause combines the condition from the association chain, which limits categories by name, with the check on the public property from the Query chain. Of course, sometimes you don't want clauses to combine like this. In these cases it can be helpful to use the break method:
query.break.optional_match('asset<-[:VIEWED]-(viewer:User)').limit(2).pluck(:viewer)
Note that since Cypher's WITH clause acts as a natural boundary in queries, the with method will automatically use break to start fresh.
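The grouping and the effect of break can be pictured with a small stand-alone sketch. ToyQuery below is a hypothetical class written for this illustration, not the gem's Query class, and its clause ordering is an assumption that follows the usual layout of a Cypher read query. Clauses are collected in call order, grouped by type when rendered, and break starts a new group so that later clauses are not merged into earlier ones.

```ruby
# Toy model only: a hypothetical ToyQuery, far simpler than the gem's Query class.
class ToyQuery
  # Hash order doubles as the emission order within one partition (an assumption).
  KEYWORDS = {
    match: 'MATCH', optional_match: 'OPTIONAL MATCH',
    where: 'WHERE', return: 'RETURN', limit: 'LIMIT'
  }.freeze

  def initialize(partitions = [[]])
    @partitions = partitions
  end

  # Each clause method returns a new object, so earlier queries are never mutated.
  KEYWORDS.each_key do |type|
    define_method(type) do |text|
      *done, current = @partitions
      ToyQuery.new(done + [current + [[type, text.to_s]]])
    end
  end

  # Start a new partition; later clauses are not merged with earlier ones.
  def break
    ToyQuery.new(@partitions + [[]])
  end

  def to_cypher
    @partitions.reject(&:empty?).map { |clauses| render(clauses) }.join(' ')
  end

  private

  # Group one partition's clauses by type and emit them in KEYWORDS order.
  def render(clauses)
    KEYWORDS.filter_map do |type, keyword|
      texts = clauses.select { |t, _| t == type }.map(&:last)
      next if texts.empty?

      body = case type
             when :where then texts.join(' AND ') # conditions combine
             when :limit then texts.last          # the last limit wins
             else texts.join(', ')
             end
      "#{keyword} #{body}"
    end.join(' ')
  end
end

base = ToyQuery.new.where('asset.public = true').match('(asset:Asset)<-[:CREATED]-(user:User)')

# Call order does not matter: clauses are regrouped when rendered.
merged = base.limit(2).where("user.name = 'Sally'").return('asset').to_cypher
puts merged

# Without break the OPTIONAL MATCH is pulled ahead of the existing WHERE...
viewers = '(asset)<-[:VIEWED]-(viewer:User)'
unbroken = base.optional_match(viewers).return('viewer').limit(2).to_cypher
puts unbroken

# ...with break the earlier MATCH and WHERE stay together as written.
split = base.break.optional_match(viewers).return('viewer').limit(2).to_cypher
puts split
```

In the unbroken version the WHERE ends up attached to the OPTIONAL MATCH, which changes what the query means; that is the kind of unwanted combination break is meant to avoid.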
Another way to build your queries is to use the ampersand operator which will apply the chain from the second query to the first. Here we get the Query object defining a path from a user to their assets to the assets' categories. We combine that with a query from all assets to users who have viewed those assets:
query = user.created_assets(:asset).categories(:category).query &
  Asset.as(:asset).allowed_users(:user, :rel).query
query.limit(1).pluck(:asset, 'collect(category)[0..2]', 'collect(user)[0..2]')
Since we gave the same variable for the assets in both parts, Cypher will take them as the same, giving us a forked path. Of course we're not required to use the same variable. Once you have a query object it is up to you to define queries as you please. The Query class makes no attempt to parse queries for correctness, leaving that responsibility to the Neo4j server.
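As a rough picture of why a shared variable produces a forked path, here is a stand-alone sketch. ToyChain is a hypothetical Struct made up for this illustration and is much simpler than the gem's Query class; the patterns reuse the CREATED and VIEWED relationships seen earlier. Its & method simply applies the second chain's patterns after the first, and the variables helper shows that asset appears in both patterns but names a single node.

```ruby
# Toy model only: ToyChain is hypothetical, not the gem's Query class.
ToyChain = Struct.new(:matches) do
  # Apply the other chain's patterns after our own.
  def &(other)
    ToyChain.new(matches + other.matches)
  end

  def to_cypher
    "MATCH #{matches.join(', ')}"
  end

  # Variable names used in the patterns; a repeated name refers to the same node.
  def variables
    matches.flat_map { |pattern| pattern.scan(/\((\w+)/).flatten }.uniq
  end
end

created = ToyChain.new(['(creator:User)-[:CREATED]->(asset:Asset)'])
viewed  = ToyChain.new(['(asset)<-[:VIEWED]-(viewer:User)'])

forked = created & viewed
puts forked.to_cypher
puts forked.variables.inspect
```

Had the second chain used a different variable name for its assets, the two patterns would no longer be tied together, and Cypher would treat them as independent matches.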
Sometimes it is also desirable to return to a proxy chain once you have made a Query. This can be done using the proxy_as method:
User.as(:user).created_assets.query_as(:asset).with(:user, 'count(asset) AS count').where('count > 3').proxy_as(User, :user)
This can be particularly handy when you want to define a class method which can be called and then further chained upon.
Now that we have some ideas on how to build queries, let's look at some of the smaller details. We'll start with a simple query getting all users with a specified name:
User.where(name: 'Sally').to_cypher
If we give a nil value the Query class will automatically convert it to an IS NULL check.
User.where(name: nil).to_cypher
The same intelligence is built in for Arrays:
User.where(name: ['Sally', 'Jim']).to_cypher
Regular expressions:
User.where(name: /.*Sally.*/i).to_cypher
And ranges:
User.where(age: 18..25).to_cypher
You can even specify Neo4j IDs:
User.where(neo_id: 1).to_cypher
All of these apply to query objects as well:
User.query_as(:user).where(user: {neo_id: 1}).to_cypher
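The conversions above can be pictured as a dispatch on the type of the value. The sketch below is a toy in plain Ruby: toy_condition is a hypothetical helper, and the Cypher text it produces is an assumption chosen for readability. The gem's actual output (placeholder style, parameter names, how ranges are expressed) may well differ, so check to_cypher on a real query for the exact form.

```ruby
# Toy model only: mirrors the behaviours described above, NOT the gem's exact Cypher.
# Returns [condition_text, params] for one property check.
def toy_condition(variable, property, value)
  # neo_id is the internal node ID rather than a stored property.
  lhs = property == :neo_id ? "ID(#{variable})" : "#{variable}.#{property}"
  param = "#{variable}_#{property}"

  case value
  when nil
    ["#{lhs} IS NULL", {}]
  when Array
    ["#{lhs} IN $#{param}", { param.to_sym => value }]
  when Regexp
    # Cypher regular expressions take flags inline, so /i becomes a (?i) prefix.
    prefix = value.casefold? ? '(?i)' : ''
    ["#{lhs} =~ $#{param}", { param.to_sym => prefix + value.source }]
  when Range
    upper = value.exclude_end? ? '<' : '<='
    ["#{lhs} >= $#{param}_min AND #{lhs} #{upper} $#{param}_max",
     { :"#{param}_min" => value.begin, :"#{param}_max" => value.end }]
  else
    ["#{lhs} = $#{param}", { param.to_sym => value }]
  end
end

examples = {
  name_nil:   toy_condition(:n, :name, nil),
  name_list:  toy_condition(:n, :name, %w[Sally Jim]),
  name_regex: toy_condition(:n, :name, /.*Sally.*/i),
  age_range:  toy_condition(:n, :age, 18..25),
  neo_id:     toy_condition(:n, :neo_id, 1)
}
examples.each { |label, (text, params)| puts "#{label}: #{text} #{params.inspect}" }
```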
One item that only applies to proxy chains is specifying the model's ID property:
User.where(id: 'fd8815a6-eec9-45a6-8f79-c038389a7803').to_cypher
Here the id condition is transformed to a condition on the uuid property since that is the default unique ID property for ActiveNode models. If you specify your own ID property then that is what will be used here.
There are a lot of other things that you can do with Query objects and I strongly recommend that you look at the link to the gem's documentation in the show notes. For now that's all the time that we have. Happy graphing!
Show notes:
Cypher tutorial: http://neo4j.com/developer/cypher-query-language/
Query methods documentation: http://neo4jrb.readthedocs.org/en/stable/QueryClauseMethods.html