Created November 3, 2010 15:29
UTF-8 aware string chop
# UTF-8 aware string chop. Returns a two-element array: the first element
# is the given string without its last character, and the second is that
# last character. Returns nil when given nil.
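def chop_utf8(s)
  return unless s
  a = s.unpack('C*') # bytes as unsigned integers
  c, w = 0, 0        # c: offset of the current lead byte, w: sequence width
  lead, last = '', ''
  while c < a.length
    # Width of the sequence, judged by its first byte.
    case a[c]
    when 0x00..0x7F then w = 1 # ASCII
    when 0xC2..0xDF then w = 2
    when 0xE0..0xEF then w = 3
    when 0xF0..0xF4 then w = 4
    else w = 1 # continuation or invalid byte: pass through as one unit
    end
    if (c + w) >= a.length
      # Final character (possibly a truncated sequence).
      last = a[c..c+(w-1)].pack('C*')
    else
      lead << a[c..c+(w-1)].pack('C*')
    end
    c += w
  end
  # pack returns binary strings on Ruby 1.9+; force_encoding('UTF-8') if needed.
  [lead, last]
end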
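
For comparison, here is a minimal sketch of the same split done with Ruby's encoding-aware string indexing instead of a manual byte scan. It assumes Ruby 1.9 or later and that the input bytes are meant to be UTF-8; the name chop_utf8_chars is hypothetical and not part of the original gist.

```ruby
# Hypothetical alternative to chop_utf8 (illustrative name), assuming
# Ruby 1.9+ where every String carries an encoding.
def chop_utf8_chars(s)
  return unless s
  # Tag a copy as UTF-8 so that indexing counts characters, not bytes.
  u = s.dup.force_encoding(Encoding::UTF_8)
  return ['', ''] if u.empty?
  # Everything but the last character, then the last character itself.
  [u[0...-1], u[-1]]
end

lead, last = chop_utf8_chars("caf\u00E9")
puts lead
puts last
```

Unlike the byte-scanning version, this returns strings already tagged as UTF-8, at the cost of not running on Ruby 1.8.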