Created January 26, 2016 13:50
Gist: zakuroishikuro/30c0f4ef8fd4bd63ae19
Shows the city names that an address-splitting regular expression fails on
require 'csv'

# Load the address data (reads KEN_ALL.CSV from the same folder as this script)
print "\nParsing KEN_ALL.CSV... "
address_list = []
csv_path = File.expand_path("../KEN_ALL.CSV", __FILE__)
CSV.foreach csv_path, encoding: "Shift_JIS:UTF-8" do |row|
  # Keep only the prefecture, municipality, and town-area name.
  # This CSV has an awful data layout and, strictly speaking, town-area names
  # split across rows should be joined, but skipping that is fine here.
  address_list << row[6..8]
end
puts "done"

# Loop forever (Ctrl+C to quit)
trap :INT, :exit
loop do
  # Read the regular expression
  puts "\nEnter a regular expression (Ctrl+C to quit):"
  begin
    rex = /#{gets.chomp}/
  rescue RegexpError
    puts "Failed to build the regular expression.", $!.message
    next
  end

  # Collect the captures for each address (empty when there is no match)
  result = address_list.map do |address|
    address.join.match(rex).captures rescue []
  end

  # Check: a city fails when the captures differ from
  # [prefecture, municipality, town area]
  cities = []
  address_list.zip(result).each do |address, match|
    city = address[1]
    cities << city if address != match && city =~ /.+市$/
  end
  # uniq! returns nil when nothing was removed, so nothing is chained onto it
  cities.uniq!

  # Output
  puts "\n★ Cities that failed:", cities
  puts
  # Drop the trailing 市 from each name and join the names into an alternation
  puts "(?:#{cities.map { |c| c[0..-2] }.join("|")})市"
end
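The core check in the script can be tried without KEN_ALL.CSV. The following is a minimal sketch, assuming a hand-written sample list in the same [prefecture, municipality, town area] shape as `row[6..8]`. The method name `failed_cities`, the constants `SAMPLE` and `NAIVE`, and the sample rows are illustrative and not part of the original script.

```ruby
# Hypothetical helper (not in the original script): returns the names of
# cities whose regex captures differ from the expected three parts.
def failed_cities(address_list, rex)
  address_list.each_with_object([]) do |address, cities|
    match = address.join.match(rex)
    captures = match ? match.captures : []
    city = address[1]
    # Same filter as the script: only report municipalities ending in 市
    cities << city if captures != address && city =~ /.+市$/
  end.uniq
end

# Illustrative sample rows: [prefecture, municipality, town area]
SAMPLE = [
  ["東京都", "千代田区", "丸の内"],
  ["千葉県", "市川市", "八幡"],
  ["三重県", "四日市市", "諏訪町"],
  ["広島県", "廿日市市", "下平良"],
]

# Deliberately naive pattern: the municipality ends at the first 市 or 区
NAIVE = /(.+?[都道府県])(.+?[市区])(.+)/

p failed_cities(SAMPLE, NAIVE)  # => ["四日市市", "廿日市市"]
```

Both reported names contain a 市 before their final 市, so the lazy match stops one character too early.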
★ Cities that failed:
旭川市
伊達市
石狩市
盛岡市
奥州市
田村市
南相馬市
那須塩原市
東村山市
武蔵村山市
羽村市
十日町市
上越市
富山市
野々市市
大町市
蒲郡市
四日市市
姫路市
大和郡山市
廿日市市
下松市
岩国市
田川市
大村市
(?:旭川|伊達|石狩|盛岡|奥州|田村|南相馬|那須塩原|東村山|武蔵村山|羽村|十日町|上越|富山|野々市|大町|蒲郡|四日市|姫路|大和郡山|廿日市|下松|岩国|田川|大村)市
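One plausible use of the generated alternation is to place it ahead of the generic municipality patterns, so that the listed cities are tried before a stray 市, 町, or 村 inside the name can end the match early. The sketch below assumes that usage, which the original does not spell out. `BASE`, `SPECIAL`, and `PATCHED` are illustrative names, and only a few names from the list are included to keep it short.

```ruby
# Generic pattern: prefecture, municipality, remainder
BASE = /(...??[都道府県])(.+?郡.+?[町村]|.+?市.+?区|.+?[市区町村])(.+)/

# A few names taken from the generated alternation (shortened for the example)
SPECIAL = "(?:東村山|十日町|野々市|四日市|廿日市)市"

# Assumption: try the special-case cities first, then fall back to the
# generic patterns. The (?:...) group does not add a capture.
PATCHED = /(...??[都道府県])(#{SPECIAL}|.+?郡.+?[町村]|.+?市.+?区|.+?[市区町村])(.+)/

address = "三重県四日市市諏訪町"
p address.match(BASE).captures     # => ["三重県", "四日市", "市諏訪町"]
p address.match(PATCHED).captures  # => ["三重県", "四日市市", "諏訪町"]
```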
(...??[都道府県])(.+?郡.+?[町村]|.+?市.+?区|.+?[市区町村])(.+)
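To see how the three capture groups behave, the pattern can be applied to a few addresses. This is a minimal sketch: `ADDRESS_REX` and `split_address` are hypothetical names, and the sample addresses are illustrative rather than taken from KEN_ALL.CSV.

```ruby
ADDRESS_REX = /(...??[都道府県])(.+?郡.+?[町村]|.+?市.+?区|.+?[市区町村])(.+)/

# Hypothetical helper: returns [prefecture, municipality, rest],
# or nil when the pattern does not match.
def split_address(address)
  match = address.match(ADDRESS_REX)
  match && match.captures
end

# First alternative: a district (郡) followed by a town or village
p split_address("神奈川県足柄下郡箱根町湯本")
# => ["神奈川県", "足柄下郡箱根町", "湯本"]

# Second alternative: a city followed by a ward. Group 1 takes two or
# three characters before the suffix, so 京都府 is not cut at 都.
p split_address("京都府京都市中京区二条城町")
# => ["京都府", "京都市中京区", "二条城町"]

# Third alternative: a plain city, ward, town, or village
p split_address("東京都千代田区丸の内")
# => ["東京都", "千代田区", "丸の内"]
```

The alternatives in the second group are tried in order, so the more specific shapes (郡 plus 町/村, 市 plus 区) get a chance before the generic fallback.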