Created July 14, 2010 08:53
Try to scrape formula information from @formula.homepage.
# Small utility which uses the homepage and nokogiri to get a description from the formula's homepage.
# As written in the homebrew wiki:
# > Homebrew doesn’t have a description field because the homepage is always up to date,
# > and Homebrew is not. Thus it’s less maintenance for us. To satisfy the description
# > we’re going to invent a new packaging microformat and persuade everyone to publish
# > it on their homepage.
# Too bad no packaging microformat has yet been invented, but brew-more just first looks for a
# `<meta name="description">` tag, then for an `a#project_summary_link` tag (which is used in
# While this does not lead to a good description for all formulas, it works for quite a few,
# try e.g. `brew more rubinius`.
# Note: this command depends on `nokogiri`, `json` and `rubygems`
# Edit: non-sudo gem install works fine (by adamv)
# Edit: use google search instead of title as fallback - returns pretty good results :)
# Edit: ensure error contains json & nokogiri gem
require 'formula'
require 'uri'
require 'open-uri'

begin
  require 'rubygems'
  require 'json'
  require 'nokogiri'
rescue LoadError
  onoe "command requires 'json' and 'nokogiri' gem..."
  exit 2
end

# split description at 80 chars
MAX_CHARS = 80

class Object
  # Define try() method to simplify Nokogiri scraping
  def try(method, *args); self.nil? ? nil : self.send(method, *args) end
end

# Print usage
def usage(code = 0)
  puts "Usage: brew more [formula] ... (formula description scraper)"
  exit code
end

def scrape_info(formula)
  more = "<No description>"
  google = false
  if doc = Nokogiri::HTML(open(formula.homepage))
    part = doc.xpath('/html/head/meta[@name="description"]').first.try(:[], 'content') || doc.css('a#project_summary_link').first.try(:text)
    unless part
      # try a google search :)
      if hash = JSON.load(open("{URI.escape(formula.homepage)}&v=1.0")).try(:[], 'responseData').try(:[], 'results').try(:first)
        part = Nokogiri::HTML(hash['title']).text + ' ' + Nokogiri::HTML(hash['content']).text
        google = true
      end
    end
    more,c = part.split(/ +/).inject([' ',1]) do |res, i|
      if (i.length + 1 + res[1]) > MAX_CHARS
        res[0] << "\n "
        res[1] = 1
      end
      [res[0] << " " << i, res[1] + 1 + i.length]
    end if part
  end
  more = "(Description via Google) \n" << more if google
  morebody = formula.homepage.to_s, more, "\n"
  ohai "#{} #{formula.version}" + ( ? " (installed)" : ""), morebody
end

if ARGV.include?('-h') || ARGV.include?('--help')
  usage
elsif ARGV.named.empty?
  onoe "please specify a formula"
  usage(1)
end

ARGV.formulae.each { |formula| scrape_info(formula) }
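
The description is wrapped with a compact `inject` call that is not easy to follow at a glance. Below is a minimal standalone sketch of the same idea, a greedy word wrap with a fixed indent. It is not the script's own code: the name `wrap_words` and its parameters are made up for illustration, and its character counting differs slightly from the original.

```ruby
# Greedy word wrap: keep adding words to the current line and start a new
# line as soon as the next word would push it past max_chars.
# `wrap_words`, `max_chars` and `indent` are illustrative names, not part
# of brew-more.
def wrap_words(text, max_chars = 80, indent = "  ")
  lines = []
  current = ""
  text.split(/\s+/).each do |word|
    next if word.empty? # leading whitespace yields an empty first token
    candidate = current.empty? ? word : "#{current} #{word}"
    if !current.empty? && indent.length + candidate.length > max_chars
      lines << indent + current # line is full, flush it
      current = word
    else
      current = candidate
    end
  end
  lines << indent + current unless current.empty?
  lines.join("\n")
end

puts wrap_words("Rubinius is an implementation of the Ruby programming language", 30)
```

A word longer than `max_chars` is placed on its own line rather than being split.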

Zearin commented May 28, 2011

w00t! Thank you.

Zearin commented May 28, 2011

Hey, have you seen the CommonJS package format? It’s basically the spec for the package.json files used by npm.

I know, I know…that’s not so helpful for Homebrew, but there are some packages that are registered in both Homebrew and npm.

Under the section Required Field, it lists fields for both “description” and “keywords”.
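
As a rough sketch of what reading those two fields could look like, the snippet below parses a `package.json` document with Ruby's standard `json` library. The package content and the helper name `package_summary` are invented for illustration only. Where such a file would be found for a given formula is left open.

```ruby
require 'json'

# Hypothetical package.json content, made up for this sketch.
PACKAGE_JSON = <<~JSON
  {
    "name": "example-package",
    "version": "0.1.0",
    "description": "An example package used only for this sketch",
    "keywords": ["example", "sketch"]
  }
JSON

# Extract the "description" and "keywords" fields; keywords default to [].
def package_summary(json_text)
  data = JSON.parse(json_text)
  { description: data["description"], keywords: Array(data["keywords"]) }
end

summary = package_summary(PACKAGE_JSON)
puts summary[:description]
puts summary[:keywords].join(", ")
```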

There’s also DOAP for project metadata. It’s surprisingly rare on GitHub, but it is nevertheless in wide use elsewhere. Maybe you don’t think it’s worth it, but I’m trying to at least raise awareness of DOAP in order to encourage the use of existing standards. A handy page for quickly learning what you can scrape from a DOAP file is the DOAP a Matic. (The DOAP Homepage is available as well, but not really structured for being quickly usable.)

lwe commented May 29, 2011

Mhh, sounds interesting, though the main issue is probably discovery of these resources. DOAP is sometimes linked with a <link rel="meta" title="DOAP" .../>, but no idea if that's standard :)

Zearin commented Sep 16, 2011

(…I didn’t know you can’t make pull requests for Gists!)

Hey, I made a couple of changes to the output for brew more. Basically it makes each formula stand out better (using ohai()), and puts the “via Google” disclaimer up front.

I made the changes for readability, and also because when using brew more on multiple formulae, the “via Google” disclaimer made it harder to read because it was always in a different place. Now, when the disclaimer appears, it is in a predictable place, and also lets you know before reading the description that it’s the second-choice source for grabbing a formula’s description.

Would you consider copying it? It’s here:

lwe commented Sep 19, 2011

Hey, jep, too bad it's not possible to bring in changes from other gists... well anyway, copied your changes, thx!

Hnasar commented Mar 30, 2012

I just spent 20 minutes trying to figure out why brew-more.rb couldn't find nokogiri, though I had installed it... It turns out that I didn't have json installed (line 26), and THIS was falsely causing the nokogiri error. Annoying.
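
One way to avoid this kind of misleading message is to require each library separately and report only the ones that actually fail to load. This is just a sketch of that idea, not what the gist does. The helper name `missing_libraries` is made up, and it uses plain `warn` where the script would use Homebrew's `onoe`.

```ruby
# Require each library on its own and return the names that fail to load,
# so the error message names exactly what is missing.
# `missing_libraries` is an illustrative helper, not part of brew-more.
def missing_libraries(names)
  names.reject do |name|
    begin
      require name
      true
    rescue LoadError
      false
    end
  end
end

missing = missing_libraries(%w[json nokogiri])
if missing.empty?
  puts "all required gems are available"
else
  warn "command requires the following gem(s): #{missing.join(', ')}"
end
```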

lwe commented Mar 31, 2012

good catch, thanks :)

How does one install this command?

Sorry if this is a stupid question, but could someone explain how to install this formula? Thank you!

tjnycum commented Jun 4, 2015

May I suggest storing this in a regular repo ("homebrew-more") so it can be tapped and installed easily?

Note that brew-desc is now part of the core and all formulae have a desc field (at least in the core).

@Drewshg312 Assuming ~/bin exists and is in your PATH, this should work:

cd ~/bin
chmod u+x brew-more.rb