@kzkn
Created November 14, 2017 15:41
A monkey patch for Prawn that tokenizes text while respecting Japanese kinsoku (line-breaking) rules
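```ruby
# Load Prawn first so this reopens its LineWrap instead of defining a new
# class that Prawn would overwrite when it is loaded later.
require 'prawn'

module Prawn
  module Text
    module Formatted #:nodoc:
      # @private
      class LineWrap #:nodoc:
        def scan_pattern(encoding = ::Encoding::UTF_8)
          ebc = break_chars(encoding)
          eshy = soft_hyphen(encoding)
          ehy = hyphen(encoding)
          ews = whitespace(encoding)

          # Character sets follow the Japanese Wikipedia article on kinsoku shori:
          # https://ja.wikipedia.org/wiki/%E7%A6%81%E5%89%87%E5%87%A6%E7%90%86
          # Characters that must not start a line (gyoto kinsoku moji)
          gyoto_kinsoku_moji = ',)\]}、〕〉》」』】〙〗〟’”⦆»ゝゞーァィゥェォッャュョヮヵヶぁぃぅぇぉっゃゅょゎゕゖㇰㇱㇲㇳㇴㇵㇶㇷㇸㇹㇷ゚ㇺㇻㇼㇽㇾㇿ々〻‐゠–〜~ ?!‼⁇⁈⁉・:;/。.'
          # Characters that must not end a line (gyomatsu kinsoku moji)
          gyomatu_kinsoku_moji = '(\[{〔〈《「『【〘〖〝‘“⦅«'
          # Characters that must not be separated across lines (bunri kinshi moji)
          bunri_kinsi_moji = '—…‥〳〴〵'
          cjk = '\p{Han}\p{Hiragana}\p{Katakana}'

          # The CJK rules only apply to UTF-8 text. With Prawn's built-in AFM
          # fonts the text arrives as Windows-1252, which cannot represent these
          # characters, so encoding the patterns below would raise.
          cjk_patterns =
            if encoding == ::Encoding::UTF_8
              [
                # A CJK character glued to the line-start-prohibited characters after it
                "[#{cjk}][#{gyoto_kinsoku_moji}#{bunri_kinsi_moji}]+",
                # Line-end-prohibited characters glued to the CJK character after them
                "[#{gyomatu_kinsoku_moji}]+[#{cjk}]",
                # Any other CJK character is its own token, so a line may break after it
                "[#{cjk}]"
              ]
            else
              []
            end

          # Prawn's stock alternatives follow. Regexp alternation is tried left
          # to right, so the CJK rules above take precedence.
          patterns = cjk_patterns + [
            "[^#{ebc}]+#{eshy}",
            "[^#{ebc}]+#{ehy}+",
            "[^#{ebc}]+",
            "[#{ews}]+",
            "#{ehy}+[^#{ebc}]*",
            eshy.to_s
          ]

          pattern = patterns
            .map { |p| p.encode(encoding) }
            .join('|')

          Regexp.new(pattern)
        end
      end
    end
  end
end
```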
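The effect of the three CJK alternatives can be checked without Prawn. The sketch below is a simplified stand-in, not the patch itself: it uses a trimmed subset of the kinsoku character sets, replaces Prawn's `break_chars`/`whitespace` helpers with a plain ASCII space and hyphen, and adds a hypothetical `greedy_wrap` helper that counts one width unit per character. It wraps `これは「テスト」です。` at five characters per line, first with one token per character and then with the kinsoku-aware tokens.

```ruby
# Stand-alone sketch of the kinsoku-aware tokenization (no Prawn needed).
# ASSUMPTION: trimmed character sets; ASCII space and hyphen stand in for
# Prawn's break_chars/whitespace helpers.

GYOTO    = ',)\]}、」』。.?!ー々ぁぃぅぇぉっゃゅょ' # must not start a line (subset)
GYOMATSU = '(\[{「『'                             # must not end a line (subset)
CJK      = '\p{Han}\p{Hiragana}\p{Katakana}'

KINSOKU_PATTERN = Regexp.new([
  "[#{CJK}][#{GYOTO}]+",    # CJK char glued to following closers/small kana
  "[#{GYOMATSU}]+[#{CJK}]", # openers glued to the following CJK char
  "[#{CJK}]",               # any other CJK char is its own breakable token
  "[^ \\-]+",               # simplified stand-in for Prawn's word rule
  " +"                      # simplified stand-in for Prawn's whitespace rule
].join('|'))

def kinsoku_tokens(text)
  text.scan(KINSOKU_PATTERN)
end

# Hypothetical greedy wrapper: one width unit per character; starts a new
# line whenever appending the next token would exceed the width.
def greedy_wrap(tokens, width)
  tokens.each_with_object([]) do |tok, lines|
    if lines.empty? || lines.last.length + tok.length > width
      lines << tok
    else
      lines[-1] = lines.last + tok
    end
  end
end

text = 'これは「テスト」です。'
puts greedy_wrap(text.chars, 5).join(' | ')           # one token per character
puts greedy_wrap(kinsoku_tokens(text), 5).join(' | ') # kinsoku-aware tokens
```

With one token per character, `。` ends up alone at the start of the last line. With the kinsoku-aware tokens, `す。` moves down as a unit, giving `これは「テ`, `スト」で`, `す。`. Prawn itself measures glyph widths rather than counting characters, so real break points will differ.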