# -*- coding: utf-8 -*-
# Find the Japanese entry for each en.wikipedia.org link in TARGET_URL (quick & dirty hack)
# USAGE: ruby wikipedia_ja.rb TARGET_URL
require 'rubygems'
require 'nokogiri'
require 'open-uri'
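# Return the href of the Japanese interlanguage link on the page at url, or nil if there is none.
# URI.open (Ruby 2.5+) is used because Kernel#open stopped accepting URLs in Ruby 3.0.
def ja_url_for(url)
  link_ja = Nokogiri::HTML(URI.open(url)).at_css('li.interlanguage-link.interwiki-ja a')
  link_ja ? link_ja[:href] : nil
end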
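# URI.open (Ruby 2.5+) is used because Kernel#open stopped accepting URLs in Ruby 3.0.
Nokogiri::HTML(URI.open(ARGV[0])).css('a').map { |link| link[:href] }.compact.each do |url|
  # Accept http:, https:, and protocol-relative links to en.wikipedia.org.
  next unless url =~ %r{\A(?:https?:)?//en\.wikipedia\.org/}
  # open-uri cannot fetch protocol-relative URLs, so give them a scheme.
  url = "https:#{url}" if url.start_with?('//')
  begin
    puts "#{url}\t#{ja_url_for(url) || 'NOT_FOUND'}"
  rescue OpenURI::HTTPError
    puts "#{url}\tERROR"
  end
end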