@junara
Last active May 15, 2018 03:22
Estimate the prefecture from an address. ref: https://qiita.com/junara/items/dc5ab1ef4fb7c8872330
gem install levenshtein
> keyword_prediction = KeywordPrediction.new(path: "./name2prefecture.csv")
=> #<KeywordPrediction:0x00007ff9ca2afb68 @path="./name2prefecture.csv", @keyword_col="keyword", @prediction_col="prefecture_name">
> keyword_prediction.load
=> (output omitted)
> keyword_prediction.predict('高松市')
=> ["香川県"] # Shows the prefecture that matches exactly.
> keyword_prediction.predict('府中市')
=> ["東京都", "広島県"] # When there are several candidates, all of them are shown.
> keyword_prediction.predict('高 松市')
=> ["香川県", "高知県"] # When the input is ambiguous, candidates are shown in descending order of similarity.
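The session above lists prefecture names only, while `predict` as written in the class below builds one hash per match. A minimal sketch of reducing such hashes to a list of names follows. The helper `prefecture_names` and the sample rows are hypothetical and are not part of the gist.

```ruby
# Hypothetical rows shaped like the hashes that KeywordPrediction#predict builds.
SAMPLE_RESULTS = [
  { 'name' => '府中市', 'prefecture_name' => '東京都', 'similarity' => 1.0, 'keyword' => '府中市' },
  { 'name' => '府中市', 'prefecture_name' => '広島県', 'similarity' => 1.0, 'keyword' => '府中市' }
].freeze

# Illustrative helper: keep only the predicted column, dropping duplicates.
def prefecture_names(results, prediction_col = 'prefecture_name')
  results.map { |row| row[prediction_col] }.uniq
end

puts prefecture_names(SAMPLE_RESULTS).inspect
```

With the real class, the same helper could be applied to the return value of `keyword_prediction.predict('府中市')`.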
require 'csv'
require 'levenshtein'

# Predicts a value (a prefecture name by default) for a keyword by
# comparing it against an index of known keywords.
class KeywordPrediction
  DEFAULT_KEYWORD_COL = 'keyword'
  DEFAULT_PREDICTION_COL = 'prefecture_name'

  attr_accessor :path, :index, :keyword_col, :prediction_col

  def initialize(args = {})
    @path = args[:path] || 'lib/name2prefecture.csv'
    @keyword_col = args[:keyword_col] || DEFAULT_KEYWORD_COL
    @prediction_col = args[:prediction_col] || DEFAULT_PREDICTION_COL
  end

  # Builds the index from index_array when one is given, otherwise from the CSV at @path.
  def load(index_array: [])
    @index = index_array.empty? ? load_csv : load_array(index_array)
  end

  # Identical strings score 1.0; lower values mean less similar.
  def similarity(str1, str2)
    1 - Levenshtein.normalized_distance(str1, str2)
  end

  # Scores every index row against str, most similar first (ties: shorter keyword first).
  def compare_all(str)
    list = @index.map do |row|
      # Use the configured keyword column rather than a hard-coded 'keyword'.
      row.to_hash.merge('similarity' => similarity(row[@keyword_col], str))
    end
    list.sort_by { |h| [-h['similarity'], h['length']] }
  end

  def match(str, similarity = 0)
    # Keep the best-scoring row for each prediction value (prefecture).
    similarities = {}
    compare_all(str).select do |row|
      if (row['similarity'] >= similarity) && (similarities[row[@prediction_col]].nil? || row['similarity'] > similarities[row[@prediction_col]])
        similarities[row[@prediction_col]] = row['similarity']
        true
      else
        false
      end
    end
  end

  # Returns the matches that share the highest similarity score.
  def predict(str, similarity = 0)
    results = match(str, similarity)
    max_similarity = results.map { |result| result['similarity'] }.max
    results.select { |result| result['similarity'] == max_similarity }.map do |result|
      { 'name' => str,
        @prediction_col => result[@prediction_col],
        'similarity' => result['similarity'],
        @keyword_col => result[@keyword_col] }
    end
  end

  private

  def load_csv
    index = []
    csv = CSV.read(@path, headers: true)
    csv.each do |row|
      index << row.to_hash.merge('length' => row[@keyword_col].length)
    end
    index
  end

  def load_array(array)
    # Instead of reading a CSV file, you can build an array in Rails with
    # Model.all.map {|d| d.attributes} and pass it in as
    # load(index_array: Model.all.map {|d| d.attributes}).
    array
  end
end
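To make the `similarity` score concrete, here is a plain-Ruby sketch of a normalized Levenshtein similarity. It assumes that normalization divides the edit distance by the length of the longer string. The `levenshtein` gem may behave differently in detail, and `edit_distance` and `sketch_similarity` are illustrative names that belong neither to the gem nor to the class.

```ruby
# Classic dynamic-programming edit distance, computed one row at a time.
def edit_distance(a, b)
  b_chars = b.chars
  prev = (0..b_chars.length).to_a          # distances from the empty prefix of a
  a.chars.each_with_index do |ca, i|
    curr = [i + 1]                         # distance to the empty prefix of b
    b_chars.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j + 1] + 1,            # deletion
               curr[j] + 1,                # insertion
               prev[j] + cost].min         # substitution or match
    end
    prev = curr
  end
  prev.last
end

# Assumption: normalize by the longer string's length, then invert.
def sketch_similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?
  1.0 - edit_distance(a, b).to_f / longest
end

puts sketch_similarity('高松市', '高 松市')
puts sketch_similarity('高知市', '高 松市')
```

Under this assumption, the input '高 松市' scores 0.75 against '高松市' (one inserted space over four characters) and 0.5 against '高知市'.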
keyword,prefecture_name
高松市,香川県
府中市,東京都
府中市,広島県
高知市,高知県
(and so on...)
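As a rough illustration of how rows like these become an index, the sketch below parses an inline copy of the sample with Ruby's standard `CSV` library and attaches the keyword length, in the same spirit as `load_csv`. The constant `SAMPLE_ROWS` and the helper `build_index` are hypothetical names used only for this example.

```ruby
require 'csv'

# Inline copy of the sample rows, so the example runs without a file on disk.
SAMPLE_ROWS = <<~ROWS
  keyword,prefecture_name
  高松市,香川県
  府中市,東京都
  府中市,広島県
  高知市,高知県
ROWS

# Illustrative helper: one hash per CSV row, plus the keyword's length.
def build_index(csv_text, keyword_col = 'keyword')
  CSV.parse(csv_text, headers: true).map do |row|
    row.to_h.merge('length' => row[keyword_col].length)
  end
end

INDEX = build_index(SAMPLE_ROWS)
puts INDEX.first.inspect
```

An array built this way has the same shape as the one `load_csv` returns, so it could also be passed to `load(index_array: ...)`.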