@zunda
Last active January 25, 2023 19:12
Extract tweets from Togetter pages saved from a browser and format them as a tab-separated file
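#!/usr/bin/env ruby
#
# Copyright 2023 zunda <zundan at gmail.com>
#
# Permission is granted for use, copying, modification, distribution,
# and distribution of modified versions of this work as long as the
# above copyright notice is included.
#
# Reads Togetter pages saved from a browser as togetter-*.html in the
# current directory and prints one tab-separated line per tweet:
# index, time (UTC, ISO 8601), username, text.
#
require 'time'

# HTML entities to decode. Decoding is done in a single pass, so
# "&amp;lt;" becomes the literal text "&lt;" rather than "<".
ENTITIES = {
  "&lt;" => "<", "&gt;" => ">", "&amp;" => "&",
  "&quot;" => '"', "&#39;" => "'", "&nbsp;" => " ",
}.freeze

Dir.glob("togetter-*.html").sort.each do |src|
  html = File.read(src, encoding: "UTF-8")
  # Each tweet is a <div class="list_box type_tweet impl_profile"> block.
  html.scan(%r|<div class="list_box type_tweet impl_profile" data-index="(\d+)">(.*?)</div>|m).each do |i, entry|
    idx = Integer(i)
    username = entry[%r|<span class="status_name">@(.*?)</span>|, 1]
    # Non-greedy, so the data-timestamp closest to class="link" is taken.
    time = Time.at(Integer(entry[%r|<a class="link" .*? data-timestamp="(\d+)"|, 1]))
    text = entry[%r|<p class="tweet">(.*?)</p>|m, 1]
    # Replace emoji images with their alt text (the emoji character).
    text.gsub!(%r|<img draggable="false" class="emoji" alt="(.*?)".*?>|, '\1')
    # Drop paired elements together with their content (e.g. <a> links).
    text.gsub!(%r|<(\w+).*?>.*?</\1>|m, "")
    # Decode entities only after tags are gone, so an escaped "&lt;b&gt;"
    # in a tweet is not mistaken for markup.
    text.gsub!(/&(?:lt|gt|amp|quot|#39|nbsp);/, ENTITIES)
    # Collapse tabs, newlines, and Unicode spaces (e.g. U+3000) to one
    # space so that each tweet stays on a single TSV line.
    text.gsub!(/[[:space:]]+/, " ")
    text.strip!
    puts [idx, time.utc.iso8601, username, text].join("\t")
  end
end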
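A self-contained way to check what the regular expressions expect is to run the same steps on an inline fragment instead of saved files. The sketch below mirrors the script's steps but is hypothetical in two ways: the HTML only imitates the class names the script matches (it is not copied from a real Togetter page), and `extract_tweets` is a helper name made up here. Its entity table (`&lt;`, `&gt;`, `&amp;`, `&quot;`, `&#39;`, `&nbsp;`) is also an assumption about which entities appear in saved pages.

```ruby
require 'time'

# Assumed set of entities, decoded in a single left-to-right pass.
ENTITIES = {
  "&lt;" => "<", "&gt;" => ">", "&amp;" => "&",
  "&quot;" => '"', "&#39;" => "'", "&nbsp;" => " ",
}.freeze

# Hypothetical helper: returns one TSV line per tweet block in +html+.
def extract_tweets(html)
  html.scan(%r|<div class="list_box type_tweet impl_profile" data-index="(\d+)">(.*?)</div>|m).map do |i, entry|
    username = entry[%r|<span class="status_name">@(.*?)</span>|, 1]
    time = Time.at(Integer(entry[%r|<a class="link" .*? data-timestamp="(\d+)"|, 1]))
    text = entry[%r|<p class="tweet">(.*?)</p>|m, 1]
    text = text.gsub(%r|<img draggable="false" class="emoji" alt="(.*?)".*?>|, '\1') # emoji -> alt text
               .gsub(%r|<(\w+).*?>.*?</\1>|m, "")                                    # drop paired elements
               .gsub(/&(?:lt|gt|amp|quot|#39|nbsp);/, ENTITIES)                      # decode entities
               .gsub(/[[:space:]]+/, " ").strip                                      # one line per tweet
    [Integer(i), time.utc.iso8601, username, text].join("\t")
  end
end

# Hypothetical markup imitating the class names the script looks for.
sample = <<~HTML
  <div class="list_box type_tweet impl_profile" data-index="0">
  <span class="status_name">@alice</span>
  <p class="tweet">Tom &amp; Jerry &lt;3 <img draggable="false" class="emoji" alt="😀" src="e.png"> <a href="https://example.com">example.com</a>
  </p>
  <a class="link" href="#" data-timestamp="1674604800">Jan 25</a>
  </div>
HTML

puts extract_tweets(sample)
```

Running it prints one tab-separated line: index `0`, timestamp `2023-01-25T00:00:00Z`, username `alice`, and the text `Tom & Jerry <3 😀`. The link text `example.com` disappears because paired elements are removed together with their content, which is worth keeping in mind if tweets contain links, mentions, or hashtags you want to keep.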
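Because every output line has four tab-separated fields and tabs inside the tweet text are replaced with spaces, the output can be read back without a CSV parser. A minimal sketch, assuming the output was saved to a file; the `Tweet` struct, the `parse_tsv` helper, and the sample lines are hypothetical. It sorts by time rather than by index, since whether `data-index` restarts on each saved page is not stated.

```ruby
require 'time'

# Hypothetical record type for one line of the script's TSV output.
Tweet = Struct.new(:index, :time, :username, :text)

# Parses lines of "index<TAB>ISO 8601 time<TAB>username<TAB>text".
# The limit of 4 keeps an empty trailing text field instead of dropping it.
def parse_tsv(lines)
  lines.map do |line|
    idx, time, username, text = line.chomp.split("\t", 4)
    Tweet.new(Integer(idx), Time.iso8601(time), username, text)
  end
end

# Hypothetical output lines.
lines = [
  "1\t2023-01-25T00:05:00Z\tbob\tsecond tweet\n",
  "0\t2023-01-25T00:00:00Z\talice\tfirst tweet\n",
]

# Print tweets in chronological order.
parse_tsv(lines).sort_by(&:time).each do |t|
  puts "#{t.time.utc.strftime('%Y-%m-%d %H:%M')} @#{t.username}: #{t.text}"
end
```

For a real file (the name `tweets.tsv` is only an example), `parse_tsv(File.foreach("tweets.tsv"))` works the same way, because `File.foreach` without a block returns an enumerator.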