@twada
Created February 17, 2014 11:14
# -*- coding: utf-8 -*-
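# Find the Japanese entry for each en.wikipedia.org link in TARGET_URL (quick & dirty hack)
# USAGE: ruby wikipedia_ja.rb TARGET_URL
require 'nokogiri'
require 'open-uri'

abort 'USAGE: ruby wikipedia_ja.rb TARGET_URL' unless ARGV[0]

# Protocol-relative links ("//en.wikipedia.org/...") cannot be fetched as-is; give them https.
def absolute_url(url)
  url.start_with?('//') ? "https:#{url}" : url
end

# Return the href of the Japanese interlanguage link on an English Wikipedia page, or nil.
# URI.open is used because Kernel#open no longer fetches URLs as of Ruby 3.0.
def ja_url_for(url)
  link_ja = Nokogiri::HTML(URI.open(absolute_url(url))).css('li.interlanguage-link.interwiki-ja a').first
  link_ja ? link_ja[:href] : nil
end

# Collect every href on TARGET_URL (skipping <a> tags without one) and
# print "URL<TAB>Japanese URL" for each en.wikipedia.org link.
Nokogiri::HTML(URI.open(ARGV[0])).css('a').map { |link| link[:href] }.compact.each do |url|
  next unless url =~ %r{\A(?:https?:)?//en\.wikipedia\.org/}
  begin
    puts "#{url}\t#{ja_url_for(url) || 'NOT_FOUND'}"
  rescue OpenURI::HTTPError
    puts "#{url}\tERROR"
  end
end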
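The script's `next unless` link filter decides which hrefs get fetched, so it is worth checking offline. Below is a minimal sketch that pulls that filter and the protocol-relative fix into two standalone functions. The names `EN_WIKI`, `en_wikipedia_link?` and `fetchable_url` are hypothetical, not part of the original gist. The regex escapes the dots in `en.wikipedia.org`. An unescaped `.` matches any character, so it would also accept look-alike hosts such as `enxwikipedia.org`.

```ruby
# Hypothetical helpers mirroring the script's link filter; no network access needed.
EN_WIKI = %r{\A(?:https?:)?//en\.wikipedia\.org/}

# True if href points at en.wikipedia.org, either absolute or protocol-relative.
def en_wikipedia_link?(href)
  !href.nil? && EN_WIKI.match?(href)
end

# Prefix protocol-relative hrefs with "https:" so URI.open can fetch them.
def fetchable_url(href)
  href.start_with?('//') ? "https:#{href}" : href
end

hrefs = [
  '//en.wikipedia.org/wiki/Ruby_(programming_language)',
  'https://en.wikipedia.org/wiki/Nokogiri',
  'http://en.wikipedia.org/wiki/Matz',
  'https://ja.wikipedia.org/wiki/Ruby',  # other language: skipped
  '/wiki/Relative',                      # site-relative: skipped
  'https://enxwikipedia.org/wiki/Fake',  # look-alike host: skipped
  nil                                    # <a> without href: skipped
]

hrefs.select { |h| en_wikipedia_link?(h) }.map { |h| fetchable_url(h) }.each { |u| puts u }
```

Running it prints the three en.wikipedia.org URLs, each with an explicit scheme. Site-relative links are rejected here, as they are by the script's own regex.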