@mudge
Last active July 28, 2017 21:32
Applying bin packing techniques to GNIP PowerTrack rules
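# Best-fit decreasing: place each query in the fullest rule that still has room.
domains = ARGF.readlines.map(&:chomp)
queries = domains.map { |domain| %[url_contains:"/#{domain}" OR url_contains:".#{domain}"] }
queries.sort_by!(&:size)

rule_limit = 1024    # maximum characters allowed in a single rule
operator_limit = 30  # maximum number of " OR " operators allowed in a single rule

# Seed the first rule with the longest query, then place the rest longest-first.
rules = [queries.pop]

queries.reverse_each do |query|
  # Of all rules that can still take this query, pick the longest one.
  best_fit = rules.select { |rule|
    proposed_rule = "#{rule} OR #{query}"
    proposed_rule.size <= rule_limit && proposed_rule.scan(' OR ').size <= operator_limit
  }.max_by(&:size)

  if best_fit
    best_fit << " OR #{query}"
  else
    rules << query
  end
end

puts rules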
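# First-fit decreasing: place each query in the first rule that still has room.
domains = ARGF.readlines.map(&:chomp)
queries = domains.map { |domain| %[url_contains:"/#{domain}" OR url_contains:".#{domain}"] }
queries.sort_by!(&:size)

rule_limit = 1024    # maximum characters allowed in a single rule
operator_limit = 30  # maximum number of " OR " operators allowed in a single rule

# Seed the first rule with the longest query, then place the rest longest-first.
rules = [queries.pop]

queries.reverse_each do |query|
  # Take the earliest rule that can still accept this query.
  first_fit = rules.find { |rule|
    proposed_rule = "#{rule} OR #{query}"
    proposed_rule.size <= rule_limit && proposed_rule.scan(' OR ').size <= operator_limit
  }

  if first_fit
    first_fit << " OR #{query}"
  else
    rules << query
  end
end

puts rules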
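
The two scripts above differ only in how they choose the rule that receives each query: the fullest rule that still fits (best fit) or the first rule that still fits (first fit). The sketch below is one way to run both strategies side by side on the same input. The names `build_queries`, `fits?` and `pack`, the constants, and the generated `exampleN.com` domains are illustrative assumptions and do not appear in the original gist. Ties between equally long queries may also be ordered differently than in the scripts above.

```ruby
# Hypothetical refactoring for comparison only; helper names are not from the gist.
RULE_LIMIT = 1024     # same character limit the scripts above enforce
OPERATOR_LIMIT = 30   # same " OR " count limit the scripts above enforce

# Build one two-clause query per domain, as the original scripts do.
def build_queries(domains)
  domains.map { |domain| %[url_contains:"/#{domain}" OR url_contains:".#{domain}"] }
end

# True if appending query to rule keeps the rule within both limits.
def fits?(rule, query)
  proposed = "#{rule} OR #{query}"
  proposed.size <= RULE_LIMIT && proposed.scan(' OR ').size <= OPERATOR_LIMIT
end

# Pack queries longest-first using either :best_fit or :first_fit.
def pack(queries, strategy)
  sorted = queries.sort_by(&:size).reverse
  sorted.each_with_object([]) do |query, rules|
    candidates = rules.select { |rule| fits?(rule, query) }
    target = strategy == :best_fit ? candidates.max_by(&:size) : candidates.first
    if target
      target << " OR #{query}"
    else
      rules << query.dup # copy so the caller's strings are never mutated
    end
  end
end

# Made-up sample input standing in for the domain list read from ARGF.
domains = (1..40).map { |i| "example#{i}.com" }
queries = build_queries(domains)

best = pack(queries, :best_fit)
first = pack(queries, :first_fit)
puts "best fit: #{best.size} rules, first fit: #{first.size} rules"
```

With uniformly sized queries like these, both strategies are likely to produce the same number of rules. Differences tend to show up only when domain lengths vary enough for the character limit, rather than the operator limit, to decide where a query lands.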