Excerpts from the official news of the OpenOffice.org Japanese project, with RSS generation
#!/usr/bin/ruby1.8 -Ku
# -*- coding: utf-8 -*-
=begin
= jaooo news list with rss
== usage
  $ mkdir cache
  $ mkdir out
  $ ruby update-ooopackages-jaooo-recent-announce.rb
or, to run from cron (the cache directory is looked up relative to the
current directory, and cron normally starts jobs in the home directory):
  $ mkdir ~/cache
  $ sudo mkdir -p /var/www/ooopackages/jaooo
  $ sudo chown $USER /var/www/ooopackages/jaooo
with a crontab command of:
  /path/to/update-ooopackages-jaooo-recent-announce.rb /var/www/ooopackages/jaooo
when debugging:
  $ wget "http://ja.openoffice.org/servlets/BrowseList?listName=announce" -O list
  $ ruby update-ooopackages-jaooo-recent-announce.rb out list
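When debugging the list parsing, it can help to run the row pattern on its
own. The following is a minimal sketch, not part of the script: the sample
row, the address and the ROW constant are assumptions made up for
illustration, and the real list page markup may differ.

```ruby
require 'cgi'

# Hypothetical sample row (assumption); the real page markup may differ.
sample = <<HTML
<tr class="a">
<td>alice@example.org</td>
<td><a href="ReadMsg?listName=announce&amp;msgNo=1">Release &amp; notes</a></td>
<td>2009-05-18</td>
</tr>
HTML

# Same pattern as the script: author, whole <a> element, href, title, date.
ROW = %r{<tr class="[ab]">\s*<td>([^<>]+)</td>\s*<td>(<a href="([^\"]+)">([^<>]+)</a>)</td>\s*<td>([^<>]*)</td>\s*</tr>}

entries = sample.scan(ROW).map do |author, link, href, title, date|
  {
    :author => author.sub(/@.*/, ''),          # drop the mail domain
    :link => link.sub('>', ' target="_new">'), # add target to the <a> tag
    :href => CGI.unescapeHTML(href),
    :title => CGI.unescapeHTML(title),
    :date => date,
  }
end

p entries
```

Only rows whose tr class is "a" or "b" match, so an empty result usually
means the page layout has changed.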
== License (MIT License)
Copyright (c) 2009 Kazuhiro NISHIYAMA
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
=end
require 'cgi'
require 'digest'
require 'digest/sha1'
require 'open-uri'
require 'pathname'
require 'rss'
require 'time' # Time.parse, used for the Date: header of each message
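# Compatibility shims for older versions of the rss library; see [ruby-list:46019].
module RSS
  module Maker
    class ChannelBase
      # Define channel.author= only when the library does not provide it.
      attr_writer :author unless method_defined?(:author=)
    end
  end
end
# Define RSS::Maker.supported? only when the library lacks it.
unless RSS::Maker.respond_to?(:supported?)
  def (RSS::Maker).supported?(version)
    RSS::Maker::MAKERS.include?(version)
  end
end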
out_dir = Pathname.new(ARGV.shift || "out")
uri = ARGV.shift || "http://ja.openoffice.org/servlets/BrowseList?listName=announce"
unless out_dir.exist?
  raise "directory does not exist: #{out_dir}"
end
@cache_dir = Pathname.new("cache")
unless @cache_dir.exist?
  raise "directory does not exist: #{@cache_dir}"
end
file_path = out_dir + "recent-announce.html"
tmp_file_path = out_dir + "recent-announce.tmp.html"
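# Return the content of uri, fetching it only once. The cache file name is
# the SHA-1 hex digest of the URI, so cached messages are never refreshed.
# Note: Kernel#open on a URL relies on open-uri as used with Ruby 1.8;
# Ruby 3.0 and later need URI.open instead.
def cached_read(uri)
  path = @cache_dir + Digest::SHA1.hexdigest(uri)
  if path.exist?
    path.read
  else
    content = open(uri){|f| f.read }
    path.open("w") {|f| f.write content }
    content
  end
end
# The list page itself is fetched on every run, without the cache.
# uri may also be a local file name (see "when debug" in the usage).
html = open(uri){|f| f.read }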
entries = []
# Take the first five table rows: author, <a> element, href, title, date.
html.scan(%r{<tr class="[ab]">\s*<td>([^<>]+)</td>\s*<td>(<a href="([^\"]+)">([^<>]+)</a>)</td>\s*<td>([^<>]*)</td>\s*</tr>})[0,5].each do |m|
  h = {
    :author => m[0],
    :link => m[1],
    :href => CGI.unescapeHTML(m[2]),
    :title => CGI.unescapeHTML(m[3]),
    :date => m[4],
  }
  h[:author].sub!(/@.*/, '')              # drop the mail domain
  h[:link].sub!('>', ' target="_new">')   # add target to the opening <a> tag
  entries << h
end
exit if entries.empty?
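# Replace file_path with tmp_file_path only when the content differs;
# otherwise delete the temporary file and leave the existing file untouched.
def rename_if_updated(tmp_file_path, file_path)
  if tmp_file_path.read == (file_path.read rescue nil)
    tmp_file_path.unlink
  else
    File.rename(tmp_file_path, file_path)
  end
end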
# Fetch each message page; the first <pre> holds the mail headers and the
# second holds the body.
entries.each do |h|
  content = cached_read(h[:href])
  header, body = content.scan(/<pre>[^<>]+<\/pre>/)
  if header
    h[:subject] = CGI.unescapeHTML(header[/^Subject: (.+)/, 1])
    h[:date] = Time.parse(header[/^Date: (.+)/, 1])
  end
  if body
    h[:body] = CGI.unescapeHTML(body)
  end
end
link_elements = []
# Output file extension => feed version; commented-out entries are disabled.
{
  #"rss091.xml" => "0.91",
  "rss10.xml" => "1.0",
  "rss20.xml" => "2.0",
  #"atom" => "atom",
}.each do |ext, version|
  rss_path = out_dir + "recent-announce.#{ext}"
  tmp_rss_path = out_dir + "recent-announce.tmp.#{ext}"
  rss_uri = "http://ooopackages.good-day.net/jaooo/recent-announce.#{ext}"
  next unless RSS::Maker.supported?(version)
  rss = RSS::Maker.make(version) do |maker|
    maker.channel.about = rss_uri
    maker.channel.title = "jaooo news list"
    maker.channel.description = "ja: OpenOffice.org日本語プロジェクトのお知らせ"
    maker.channel.link = "http://ja.openoffice.org/"
    maker.channel.language = "ja"
    maker.channel.author = "good-day"
    maker.channel.date = Time.now
    if version == "atom"
      # The stray "#" before <link was removed; it would have been emitted literally.
      link_elements << %Q!<link rel="alternate" type="application/atom+xml" title="Atom" href="#{CGI.escapeHTML(rss_uri)}" />!
    else
      link_elements << %Q!<link rel="alternate" type="application/rss+xml" title="RSS #{version}" href="#{CGI.escapeHTML(rss_uri)}" />!
    end
    # do_sort is an accessor; it has to be assigned to enable sorting by date.
    maker.items.do_sort = true
    entries.each do |h|
      maker.items.new_item do |item|
        item.link = h[:href]
        item.title = h[:subject]
        item.date = h[:date]
        item.description = h[:body]
      end
    end
  end
  tmp_rss_path.open("w") {|f| f.write rss.to_s }
  rename_if_updated(tmp_rss_path, rss_path)
end
link_elements.sort!
tmp_file_path.open("w") do |out|
  out.puts <<-HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>jaooo news list</title>
<style media="screen,tv" type="text/css">
li {font-size: small}
</style>
#{link_elements.join("\n").gsub(/^/, " ")}
</head>
<body>
<ul>
  HTML
  entries[0,5].each do |h|
    # h[:date] is still the raw table text when the message page had no
    # parsable header, so only call strftime on a Time.
    date = h[:date].respond_to?(:strftime) ? h[:date].strftime("%Y-%m-%d") : h[:date]
    out.puts %Q[<li>#{h[:link]} <i>#{h[:author]} - #{date}</i></li>]
  end
  out.puts <<-HTML
</ul>
</body>
</html>
  HTML
end
rename_if_updated(tmp_file_path, file_path)
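=begin
== example: publish only when the content changed
The write-to-temp-then-rename step used for the HTML and RSS outputs can be
tried in isolation. This is a minimal sketch under stated assumptions, not
the script's own code: publish_if_changed is a hypothetical variant of
rename_if_updated that also returns whether it replaced the file, and the
file contents are made up.

```ruby
require 'pathname'
require 'tmpdir'

# Hypothetical variant of rename_if_updated (assumption): same comparison,
# but it returns true when the final file was replaced.
def publish_if_changed(tmp_path, final_path)
  if tmp_path.read == (final_path.read rescue nil)
    tmp_path.unlink   # same content: drop the temp file, keep the old file
    false
  else
    # rename replaces the target in one step on the same filesystem
    File.rename(tmp_path.to_s, final_path.to_s)
    true
  end
end

results = []
Dir.mktmpdir do |dir|
  final = Pathname.new(dir) + "recent-announce.html"
  tmp = Pathname.new(dir) + "recent-announce.tmp.html"
  ["v1", "v1", "v2"].each do |content|
    tmp.open("w") {|f| f.write content }
    results << publish_if_changed(tmp, final)
  end
  results << final.read
end

p results
```

Leaving an unchanged file in place keeps its modification time stable, which
helps clients that revalidate with If-Modified-Since.
=end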