
MatchData#begin(n) always returns 0 on JRuby when the string contains Chinese/Japanese characters

result with jruby-1.6.7.2 1.9
$ jruby --1.9 -S test.rb
==== Text in English ====
#<MatchData "@chichi dog dog" 1:"@" 2:"chichi" 3:" dog dog">
1. 11
2. 12
3. 18
==== Text in Chinese ====
#<MatchData "@chichi 狗狗" 1:"@" 2:"chichi" 3:" 狗狗">
1. 0
2. 0
3. 0
==== Text in Japanese ====
#<MatchData "@chichi ドッグ" 1:"@" 2:"chichi" 3:" ドッグ">
1. 0
2. 0
3. 0
result with ruby-1.9.2-p290
$ ruby test.rb
==== Text in English ====
#<MatchData "@chichi dog dog" 1:"@" 2:"chichi" 3:" dog dog">
1. 11
2. 12
3. 18
==== Text in Chinese ====
#<MatchData "@chichi 狗狗" 1:"@" 2:"chichi" 3:" 狗狗">
1. 6
2. 7
3. 13
==== Text in Japanese ====
#<MatchData "@chichi ドッグ" 1:"@" 2:"chichi" 3:" ドッグ">
1. 12
2. 13
3. 19
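
The MRI numbers above are character offsets, not byte offsets. As a minimal sketch (the variable names `prefix`, `char_offset` and `byte_offset` are illustrative and not part of the gist), counting what precedes the `@` in the Chinese sample reproduces the value MRI reports for group 1:

```ruby
# encoding: utf-8

text_cn = '我爱你狗狗 @chichi 狗狗'

# Everything before the '@' sign: five CJK characters plus one space.
prefix = text_cn[0...text_cn.index('@')]

char_offset = prefix.length    # => 6, matches "1. 6" in the MRI output
byte_offset = prefix.bytesize  # => 16 (5 characters * 3 bytes + 1 space in UTF-8)

puts "chars before '@': #{char_offset}"
puts "bytes before '@': #{byte_offset}"
```

So the expected result for the Chinese text is 6, and the 0 printed by JRuby matches neither the character count nor the byte count.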
test.rb
Ruby
# encoding: utf-8

text = 'i love dog @chichi dog dog'
text_cn = '我爱你狗狗 @chichi 狗狗'
text_jp = '私はあなたの犬を愛して @chichi ドッグ'
# Three capture groups: the at sign, the screen name, and the rest of the line.
reg = /([@@])([a-zA-Z0-9_]{1,20})(.*)/o

puts "==== Text in English ===="
text.scan(reg) do |at, screen_name, rest|
  puts $~.inspect

  # Start offset of each capture group within the scanned string.
  puts "1. #{$~.begin(1)}"
  puts "2. #{$~.begin(2)}"
  puts "3. #{$~.begin(3)}"
end

puts "==== Text in Chinese ===="
text_cn.scan(reg) do |at, screen_name, rest|
  puts $~.inspect

  puts "1. #{$~.begin(1)}"
  puts "2. #{$~.begin(2)}"
  puts "3. #{$~.begin(3)}"
end

puts "==== Text in Japanese ===="
text_jp.scan(reg) do |at, screen_name, rest|
  puts $~.inspect

  puts "1. #{$~.begin(1)}"
  puts "2. #{$~.begin(2)}"
  puts "3. #{$~.begin(3)}"
end
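
A compact way to check for the same problem without reading the numbers by eye is to verify that each reported offset really points at the captured text. The sketch below is an assumption-laden illustration, not part of the original gist: `offsets_consistent?` is a hypothetical helper name, and it only tests whether `begin(n)` agrees with the capture, which is the property the JRuby output above violates.

```ruby
# encoding: utf-8

# Hypothetical helper (not from the gist): returns true when, for every
# match found by scan, each capture group's begin(n) points at that
# group's text inside the original string. Returns true if nothing matches.
def offsets_consistent?(text, regex)
  ok = true
  text.scan(regex) do
    m = Regexp.last_match # same object as $~ inside the scan block
    (1...m.size).each do |n|
      next if m[n].nil? # group did not participate in the match
      ok = false unless text[m.begin(n), m[n].length] == m[n]
    end
  end
  ok
end

reg = /([@@])([a-zA-Z0-9_]{1,20})(.*)/

samples = {
  'English'  => 'i love dog @chichi dog dog',
  'Chinese'  => '我爱你狗狗 @chichi 狗狗',
  'Japanese' => '私はあなたの犬を愛して @chichi ドッグ'
}

samples.each do |label, text|
  puts "#{label}: #{offsets_consistent?(text, reg)}"
end
```

Given the outputs recorded above, this should print `true` for all three samples on ruby-1.9.2-p290, while offsets of 0 like those in the JRuby run would make the Chinese and Japanese samples report `false`.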
