Skip to content

Instantly share code, notes, and snippets.

@rjurney
Created June 5, 2011 09:25
Show Gist options
  • Save rjurney/1008821 to your computer and use it in GitHub Desktop.
Save rjurney/1008821 to your computer and use it in GitHub Desktop.
Solution to slow subgraph problem :)
require 'graph_client'
require 'pacer'
client = GraphClient.new
k = 16
graph = client.get "louise.kitchen@enron.com"
#k_nodes = graph.v.filter{|v| v.out_e.count > k}.result
#subgraph = k_nodes.out_e.in_v.only(k_nodes).subgraph
puts graph
d = []
graph.v.each do |v|
if v.out_e.count < k
d << v
end
end
d.each do |v|
graph.remove_vertex v
end
puts graph
@pangloss
Copy link

As of Pacer 0.8.1 (to be released shortly) you could replace lines 12 to 21 with the following which would be significantly faster.

graph.v.select { |v| v.out_edges.count < k }.delete!

v.out_edges is much faster than v.out_e because out_edges is the native method for an individual vertex which returns an Iterable that then simply gets counted. On the other hand, v.out_e wraps the vertex in a 4 step route and turns that to a pipeline (IdentityPipe -> OutEdgesPipe -> CountSideEffectPipe -> SideEffectCapPipe) that then must be fully iterated. That construction of the route and pipeline isn't terribly heavyweight but inside a loop it is best to avoid it if possible.

Using .delete! on a route is much faster than looping through each of its vertices and calling remove_vertex on each one because it wraps the job using bulk_job so that the entire process can be done in a minimal number of transactions rather than a separate transaction for every vertex plus (potentially) each connected edge for each deleted vertex.

Finally, doing the entire job within the context of a route is much more memory efficient because each vertex is processed as it is emitted from the iterator, so no more than 1 vertex actually needs to be in memory at a time. Much better than collecting everything into an array in advance of processing!

Cheers,
Darrick

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment