The extreme sport of splitting an address into "prefecture (都道府県) / municipality (市区町村) / the rest" with as short a regex as possible. ref: http://qiita.com/zakuroishikuro/items/066421bce820e3c73ce9
Gist zakuroishikuro/33c7c8a6a6ed4bc141dd, last active February 23, 2020 07:12
rex = /ごにょごにょ/ # placeholder ("mumble mumble"): your regex goes here
p "東京都文京区後楽1丁目3−61".match(rex).captures
#=> ["東京都", "文京区", "後楽1丁目3−61"]
(.+?[都道府県])(.+?[市区町村])(.+)
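A minimal sketch of what this first attempt does, run on the example address from the snippet above plus one illustrative Kyoto string (made up in the prefecture + municipality + town layout, not copied from KEN_ALL.CSV). The lazy `.+?` in the first group stops at the first 都/道/府/県 it meets, so the 都 inside 京都府 ends the prefecture too early.

```ruby
# First attempt: lazily match up to the first 都/道/府/県, then up to the first 市/区/町/村
rex = /(.+?[都道府県])(.+?[市区町村])(.+)/

tokyo = "東京都文京区後楽1丁目3−61".match(rex).captures
p tokyo #=> ["東京都", "文京区", "後楽1丁目3−61"]

# The prefecture group stops at the 都 inside 京都府, and 府 leaks into the municipality
kyoto = "京都府宇治市宇治".match(rex).captures
p kyoto #=> ["京都", "府宇治市", "宇治"]
```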
(...??[都道府県])(.+?[市区町村])(.+)
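The second attempt only changes the prefecture group. `..` takes two arbitrary characters and the lazy `.??` adds a third one only when needed, so the group matches three or four characters ending in 都/道/府/県. Every prefecture name is three characters long except 神奈川県, 和歌山県 and 鹿児島県, which are four, so no lookup table is needed. A quick check on illustrative strings:

```ruby
# Second attempt: a prefecture is 2 chars + an optional (lazy) 3rd char + 都/道/府/県
rex = /(...??[都道府県])(.+?[市区町村])(.+)/

kyoto    = "京都府宇治市宇治".match(rex).captures
kanagawa = "神奈川県横浜市中区本町".match(rex).captures
p kyoto    #=> ["京都府", "宇治市", "宇治"]
# Four-character prefectures work too, but the ward of a designated city lands in the third group
p kanagawa #=> ["神奈川県", "横浜市", "中区本町"]
```

The Yokohama result shows the next problem: KEN_ALL.CSV lists the ward of a designated city as part of the municipality, e.g. 横浜市中区.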
(...??[都道府県])(.+?市.+?区|.+?[市区町村])(.+)
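The third attempt tries `.+?市.+?区` (a city followed by a ward) before the plain suffix match. Alternatives are tried left to right, so a string with no 市…区 sequence still falls through to `.+?[市区町村]`. An illustrative check:

```ruby
# Third attempt: "…市…区" (designated city + ward) is tried before the plain suffix match
rex = /(...??[都道府県])(.+?市.+?区|.+?[市区町村])(.+)/

kanagawa = "神奈川県横浜市中区本町".match(rex).captures
chiyoda  = "東京都千代田区千代田".match(rex).captures
p kanagawa #=> ["神奈川県", "横浜市中区", "本町"]
# There is no 市 after the prefecture, so the first alternative fails and the second one applies
p chiyoda  #=> ["東京都", "千代田区", "千代田"]
```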
(...??[都道府県])(.+?郡.+?[町村]|.+?市.+?区|.+?[市区町村])(.+)
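The fourth attempt handles counties (郡). Without it, a county whose own name contains 村 or 町 is cut at that character: 北村山郡 contains 村, so the suffix match stops at 北村. A sketch comparing the third and fourth regexes on an illustrative string (the town-area part is made up; `third` and `fourth` are names chosen for this example):

```ruby
third  = /(...??[都道府県])(.+?市.+?区|.+?[市区町村])(.+)/
# Fourth attempt: "…郡…町/村" (county + town/village) is tried first
fourth = /(...??[都道府県])(.+?郡.+?[町村]|.+?市.+?区|.+?[市区町村])(.+)/

addr = "山形県北村山郡大石田町大石田"
third_caps  = addr.match(third).captures
fourth_caps = addr.match(fourth).captures
p third_caps  #=> ["山形県", "北村", "山郡大石田町大石田"]
p fourth_caps #=> ["山形県", "北村山郡大石田町", "大石田"]
```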
(...??[都道府県])((?:旭川|伊達|石狩|盛岡|奥州|田村|南相馬|那須塩原|東村山|武蔵村山|羽村|十日町|上越|富山|野々市|大町|蒲郡|四日市|姫路|大和郡山|廿日市|下松|岩国|田川|大村)市|.+?郡.+?[町村]|.+?市.+?区|.+?[市区町村])(.+)
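The fifth attempt puts a fixed list of city names in front of every other alternative. Many of them contain 市, 町, 村 or 郡 themselves (東村山, 十日町, 四日市, 蒲郡, 大和郡山 and so on), which the generic alternatives would cut in the wrong place; the gist does not say why the remaining names are needed. The sketch below rebuilds the same regex from an array. `CITIES`, `before` and `after` are names made up for this example, and the addresses are illustrative strings.

```ruby
# City names matched literally before anything else (same order as in the regex above)
CITIES = %w[旭川 伊達 石狩 盛岡 奥州 田村 南相馬 那須塩原 東村山 武蔵村山 羽村 十日町
            上越 富山 野々市 大町 蒲郡 四日市 姫路 大和郡山 廿日市 下松 岩国 田川 大村]

before = /(...??[都道府県])(.+?郡.+?[町村]|.+?市.+?区|.+?[市区町村])(.+)/
# None of the names contain regex metacharacters, so a plain join builds the alternation
after  = /(...??[都道府県])((?:#{CITIES.join("|")})市|.+?郡.+?[町村]|.+?市.+?区|.+?[市区町村])(.+)/

addr = "東京都東村山市本町"
before_caps = addr.match(before).captures
after_caps  = addr.match(after).captures
p before_caps #=> ["東京都", "東村", "山市本町"]
p after_caps  #=> ["東京都", "東村山市", "本町"]

# Designated cities still reach the "…市…区" alternative
osaka = "大阪府大阪市北区梅田".match(after).captures
p osaka #=> ["大阪府", "大阪市北区", "梅田"]
```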
(...??[都道府県])((?:旭川|伊達|石狩|盛岡|奥州|田村|南相馬|那須塩原|東村山|武蔵村山|羽村|十日町|上越|富山|野々市|大町|蒲郡|四日市|姫路|大和郡山|廿日市|下松|岩国|田川|大村)市|.+?郡(?:玉村|大町|.).*?[町村]|.+?市.+?区|.+?[市区町村])(.+)
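The last change replaces the county alternative `.+?郡.+?[町村]` with `.+?郡(?:玉村|大町|.).*?[町村]`. For most counties `(?:玉村|大町|.).*?` behaves like the old `.+?` (one character, then lazily more), but for 佐波郡玉村町 and 杵島郡大町町 it first consumes a town name that itself contains 村 or 町. The sketch isolates just that alternative, anchored with `\A` and applied to the part after the prefecture; the trailing town-area parts are made up.

```ruby
# Only the county alternative, before and after the change
before = /\A(.+?郡.+?[町村])/
after  = /\A(.+?郡(?:玉村|大町|.).*?[町村])/

# String#[regexp, n] returns capture group n (or nil if there is no match)
tamamura_before = "佐波郡玉村町下新田"[before, 1]
tamamura_after  = "佐波郡玉村町下新田"[after, 1]
omachi_before   = "杵島郡大町町大町"[before, 1]
omachi_after    = "杵島郡大町町大町"[after, 1]

p tamamura_before #=> "佐波郡玉村"
p tamamura_after  #=> "佐波郡玉村町"
p omachi_before   #=> "杵島郡大町"
p omachi_after    #=> "杵島郡大町町"
```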
require 'csv'

# Load the address data (put Japan Post's KEN_ALL.CSV in the same folder as this script)
print "\nParsing KEN_ALL.CSV... "
address_list = []
csv_path = File.expand_path("../KEN_ALL.CSV", __FILE__)
CSV.foreach(csv_path, encoding: "Shift_JIS:UTF-8") do |row|
  # Keep only the prefecture, municipality and town-area columns (columns 7-9)
  # The data layout of this CSV is garbage: strictly speaking, town-area names split
  # across several rows should be joined back together, but skipping that is fine here
  address_list << row[6..8]
end
puts "done"

# Loop forever (quit with Control+C)
trap(:INT) { exit }
loop do
  # Read a regex from stdin
  puts "\nEnter a regex (Control+C to quit):"
  input = gets or break # stop on EOF (Control+D) instead of crashing on nil.chomp
  begin
    rex = /#{input.chomp}/
  rescue RegexpError
    puts "Could not compile the regex.", $!.message
    next
  end

  # Match every address; an address that does not match yields []
  result = address_list.map do |address|
    address.join.match(rex).captures rescue []
  end

  # Count the addresses whose captures differ from [prefecture, municipality, town area]
  fail_count = 0
  address_list.zip(result).each do |address, match|
    if address != match
      puts "Failed: #{address * ?|}\t(#{match * ?|})"
      fail_count += 1
    end
  end

  # Print the summary
  all = address_list.count
  pct = (fail_count.to_f / all * 100).to_i
  puts "\nRegex: ", rex.source
  puts "\nFailures: ", "#{fail_count}/#{all} (#{pct}%)"
end
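The loop above needs KEN_ALL.CSV and an interactive terminal. To score a regex from a script or a test instead, the matching and comparison steps can be pulled out into a plain function. A minimal sketch: `failures` is a hypothetical helper name, and the three rows are hand-written in KEN_ALL's prefecture / municipality / town-area layout, not copied from the file.

```ruby
# Hypothetical helper: returns the rows whose captures differ from [prefecture, municipality, rest]
def failures(rex, rows)
  rows.reject do |row|
    captures = row.join.match(rex)&.captures || [] # no match counts as a failure
    captures == row
  end
end

rows = [
  %w[東京都 千代田区 千代田],
  %w[神奈川県 横浜市中区 本町],
  %w[山形県 北村山郡大石田町 大石田],
]

naive  = /(.+?[都道府県])(.+?[市区町村])(.+)/
failed = failures(naive, rows)
puts "#{failed.size}/#{rows.size} failed"
failed.each { |row| puts row * "|" }
```

With the naive first regex, the Yokohama row (ward split off) and the Yamagata row (county cut at 村) fail, while the Tokyo row passes.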