ryohji/making-dict-without-loop.md Secret

## making-dict-without-loop.md

      
For example, 『ゼロから作るDeep Learning (2)』 (Deep Learning from Scratch 2) contains the following code snippet (p. 66):
```python
import numpy as np

def preprocess(text):
  text = text.lower()
  text = text.replace('.', ' .')
  words = text.split(' ')

  word_to_id = {}
  id_to_word = {}
  # Number each word the first time it appears.
  for word in words:
    if word not in word_to_id:
      new_id = len(word_to_id)
      word_to_id[word] = new_id
      id_to_word[new_id] = word

  corpus = np.array([word_to_id[w] for w in words])

  return corpus, word_to_id, id_to_word
```
Personally, I would rather write it like this:
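```python
from typing import Dict, Tuple

import numpy as np

def preprocess(text: str) -> Tuple[np.ndarray, Dict[str, int], Dict[int, str]]:
  words = text.lower().replace('.', ' .').split(' ')
  # Deduplicate with set, then number the unique words.
  id_to_word = {k: v for k, v in enumerate(set(words))}
  # Invert the mapping to look up IDs by word.
  word_to_id = {v: k for k, v in id_to_word.items()}
  corpus = np.array([word_to_id[w] for w in words])
  return corpus, word_to_id, id_to_word
```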
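`id_to_word` could also be computed as `dict(enumerate(set(words)))`.

In any case, my attitude is more or less "write a `for` statement and you lose."

`set` removes the duplicate words from `words`, and `enumerate` attaches an index to each of the remaining elements.

Turn the result into a dictionary and call it `id_to_word`. Then `word_to_id` is the dictionary you get by swapping the keys and values of `id_to_word`.

In short: give each word a unique number, and make it possible to look up the number from the word as well as the word from the number.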
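
One difference from the loop version is worth keeping in mind: `set` has no guaranteed iteration order, so the IDs it produces need not follow the order in which words first appear. If that order matters, the sketch below is one possible way to keep it while still avoiding a `for` statement. It relies on `dict.fromkeys`, which drops duplicates but preserves insertion order (guaranteed since Python 3.7). The name `build_vocab` and the sample sentence are made up for illustration.

```python
from typing import Dict, List, Tuple


def build_vocab(words: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    # dict.fromkeys keeps only the first occurrence of each word,
    # in first-appearance order; enumerate then numbers them from 0.
    id_to_word = dict(enumerate(dict.fromkeys(words)))
    # Swap keys and values for the reverse lookup.
    word_to_id = {word: i for i, word in id_to_word.items()}
    return word_to_id, id_to_word


# Hypothetical input, already lower-cased and split as in preprocess().
words = 'you say goodbye and i say hello .'.split(' ')
word_to_id, id_to_word = build_vocab(words)
print(id_to_word)
# {0: 'you', 1: 'say', 2: 'goodbye', 3: 'and', 4: 'i', 5: 'hello', 6: '.'}
print([word_to_id[w] for w in words])
# [0, 1, 2, 3, 4, 1, 5, 6]
```

With this variant the IDs come out the same as in the explicit loop, at the cost of using a dictionary as an ordered set.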

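Or, to take something I happened to come across: as part of new-hire training, one of the submitted answers for code that handles a variant of the INI file format looked like this: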
```python
from typing import Dict, List

def line_to_dict(lines: List[str]) -> Dict[str, str]:
  reading = False  # True while between the START and END markers
  result = {}
  for line in lines:
    if line == '[PARAM START]':
      reading = True
    elif line == '[PARAM END]':
      reading = False
    elif reading:
      words = line.split('=')
      result[words[0]] = words[1]
  return result
```
I apologize for the handful of implicit assumptions, but here too this is how I would write it (and how I would recommend writing it):
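```python
from typing import Dict, List, Tuple

def line_to_dict(lines: List[str]) -> Dict[str, str]:
  # Assumes both markers are present, with START before END.
  beg, end = (lines.index(s) for s in ('[PARAM START]', '[PARAM END]'))
  return dict(_mk_kv_pair(line) for line in lines[beg + 1:end] if '=' in line)

def _mk_kv_pair(line: str) -> Tuple[str, str]:
  # Split at the first '=' only, so the value may itself contain '='.
  i = line.index('=')
  return (line[:i], line[i + 1:])
```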
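
The two versions also differ in how they treat a value that itself contains `=`. The following small check illustrates this on a made-up parameter line (the `url=...` string is purely hypothetical): `split('=')` cuts at every `=`, while slicing around the first `=` keeps the rest of the line as the value.

```python
# Hypothetical parameter line whose value itself contains '='.
line = 'url=http://example.com/?q=1'

# split('=') cuts at every '=', so words[1] stops at the second '='.
words = line.split('=')
print(words[0], words[1])  # url http://example.com/?q

# Slicing around the first '=' keeps the rest of the line as the value.
i = line.index('=')
print(line[:i], line[i + 1:])  # url http://example.com/?q=1
```
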
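Writing a `for` statement is a loss, and for that matter I am fighting a battle in which "even writing an `if` branch is a loss."

(I ended up using one as a filter in the generator expression passed to the `dict` constructor. I lost. It's a losing battle...)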