@woodie
Created January 17, 2010 10:44
Mechanize with Hpricot on AppEngine

Here is a Rack app that does a Google Search using Mechanize:

mechanize-hpricot.appspot.com

We are using Mechanize 0.8.5 and Hpricot 0.8.2. The most recent version of Mechanize uses Nokogiri, which requires native libraries and therefore does not work on AppEngine. There is an effort to finish the pure-Java Nokogiri; maybe YOU can help.

www.serabe.com/2009/12/31/helping-nokogiri-take-ii

Special thanks to _Why, Ola Bini and Nick Sieger for creating, porting and maintaining Hpricot. Mechanize is my favorite gem of all time, so thanks to Aaron Patterson and Mike Dalessio for creating such an awesome tool. Here is a nice screencast.

www.bestechvideos.com/2009/12/07/railscasts-191-mechanize

When using gems with Java extensions, appengine-tools drops the appropriate jars into WEB-INF/lib for you.

$ find .gems -name "*.jar"
.gems/bundler_gems/jruby/1.8/gems/hpricot-0.8.2-java/lib/fast_xs.jar
.gems/bundler_gems/jruby/1.8/gems/hpricot-0.8.2-java/lib/hpricot_scan.jar
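
The same check can be sketched in Ruby. This is a minimal illustration and not part of appengine-tools: `bundled_jars` is a hypothetical helper, and the `.gems` default simply mirrors the bundle path used in this gist. Unlike `find`, `Dir.glob` skips hidden subdirectories below the root.

```ruby
# Hypothetical helper (not an appengine-tools API): list every jar that
# ships inside the bundled gems, roughly like `find .gems -name "*.jar"`.
def bundled_jars(root = '.gems')
  Dir.glob(File.join(root, '**', '*.jar')).sort
end

# Print whatever is found under the default bundle path.
bundled_jars.each { |jar| puts jar }
```

Run from the application directory, this should list the Hpricot jars that later get copied into WEB-INF/lib.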

require 'appengine-rack'
require 'appengine-apis/urlfetch'
require 'mechanize'

AppEngine::Rack.configure_app(
  :application => "mechanize-hpricot",
  :precompilation_enabled => true,
  :version => "1")

def my_search(query)
  out = []
  a = WWW::Mechanize.new { |agent| agent.user_agent_alias = 'Mac Safari' }
  a.get('http://google.com/') do |page|
    search_result = page.form_with(:name => 'f') do |search|
      search.q = query
    end.submit
    search_result.links.each do |link|
      out << link.text if link.text =~ /#{query}/i
    end
  end
  out
end

def my_html(query)
  title = "Mechanize with Hpricot on AppEngine"
  html = <<HTML
<html><head><title>#{title}</title></head><body>
<h2>#{title}</h2><p>This is a Google Search for: #{query}</p>
<ul><li>#{my_search(query).join("</li><li>")}</li></ul>
</body></html>
HTML
  html
end

run lambda { |env| [200, {}, my_html('appengine-jruby') ] }
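
The link filter and the Rack response can also be exercised without touching the network. The sketch below is not the gist's code: `matching_links`, `render_page`, and `rack_app` are hypothetical names, and the link texts are made-up sample data. It adds `Regexp.escape` and `ERB::Util.html_escape` so that a query containing regex or HTML metacharacters is treated literally, sets an explicit Content-Type header, and wraps the body in an array. The header casing follows Rack 1.x; newer Rack releases expect lowercase keys.

```ruby
require 'erb'

# Hypothetical helper: keep link texts that mention the query,
# treating the query as a literal string rather than a pattern.
def matching_links(link_texts, query)
  pattern = /#{Regexp.escape(query)}/i
  link_texts.select { |text| text =~ pattern }
end

# Hypothetical helper: render the result list with escaped text.
def render_page(query, results)
  title = 'Mechanize with Hpricot on AppEngine'
  items = results.map { |r| "<li>#{ERB::Util.html_escape(r)}</li>" }.join
  "<html><head><title>#{title}</title></head><body>" \
    "<h2>#{title}</h2>" \
    "<p>This is a Google Search for: #{ERB::Util.html_escape(query)}</p>" \
    "<ul>#{items}</ul></body></html>"
end

# A Rack app responds to call(env) and returns [status, headers, body],
# where body responds to each. The link texts here are sample data.
def rack_app
  lambda do |env|
    query = 'appengine-jruby'
    results = matching_links(['AppEngine-JRuby wiki', 'Unrelated'], query)
    [200, { 'Content-Type' => 'text/html' }, [render_page(query, results)]]
  end
end
```

In a config.ru this would end with `run rack_app`, with the sample list replaced by the texts collected from Mechanize.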

# Critical default settings:
disable_system_gems
disable_rubygems
bundle_path ".gems/bundler_gems"

# List gems to bundle here:
gem "appengine-rack"
gem "appengine-apis"
gem 'hpricot', '0.8.2'
gem 'mechanize', '0.8.5'

$ dev_appserver.rb .
=> Booting DevAppServer
=> Press Ctrl-C to shutdown server
=> Bundling gems
Calculating dependencies...
Updating source: http://gems.rubyforge.org
Caching: appengine-apis-0.0.12.gem
Caching: appengine-rack-0.0.6.gem
Downloading hpricot-0.8.2-java.gem
Downloading mechanize-0.8.5.gem
Downloading rack-1.1.0.gem
Installing hpricot (0.8.2)
Installing rack (1.1.0)
Installing appengine-rack (0.0.6)
Installing appengine-apis (0.0.12)
Installing mechanize (0.8.5)
Done.
=> Packaging gems
Installing fast_xs.jar
Installing hpricot_scan.jar
The server is running at http://localhost:8080/

yjx723 commented May 25, 2010

"no such file to load -- appengine-apis/urlfetch from config.ru:3 "