@rojotek
Created March 2, 2012 05:36
Ruby code to detect the language of a webpage.
# gem install nokogiri
# gem install cld
# Use CLD, a wrapper around Google's Compact Language Detector, to detect the language
# a webpage is written in. Use Nokogiri to pull out the text of the page.
require 'cld'
require 'open-uri'
require 'nokogiri'
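
# Fetches the page at +url+ and returns CLD's detection result for its text.
def detect_language(url)
  # URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs.
  html = URI.open(url, &:read)
  doc = Nokogiri::HTML.parse(html)
  CLD.detect_language(doc.text)
end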
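
One caveat with passing the whole page text to the detector: Nokogiri's `text` also returns the contents of `<script>` and `<style>` elements, which can skew detection on script-heavy pages. The following is a minimal, stdlib-only sketch of filtering that noise out first. `visible_text` is a hypothetical helper, not part of the gist or of any library, and the regex approach is only a rough approximation; for real pages, removing those nodes with an HTML parser before calling `text` would be more robust.

```ruby
# Hypothetical helper (not part of the gist): roughly extract the visible text
# of an HTML string using only the Ruby standard library.
def visible_text(html)
  html
    .gsub(%r{<(script|style)\b[^>]*>.*?</\1\s*>}mi, ' ') # drop script/style blocks and their contents
    .gsub(/<[^>]+>/, ' ')                                 # drop the remaining tags
    .gsub(/\s+/, ' ')                                     # collapse runs of whitespace
    .strip
end

html = '<html><head><style>p { color: red }</style></head>' \
       '<body><p>Bonjour le monde</p><script>var x = 1;</script></body></html>'

puts visible_text(html) # prints "Bonjour le monde"
```

The cleaned string could then be handed to `CLD.detect_language` in place of `doc.text`.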