Created
July 5, 2012 09:28
MatchData#begin(n) always returns 0 on JRuby when the string contains Chinese/Japanese characters
$ jruby --1.9 -S test.rb
==== Text in English ====
#<MatchData "@chichi dog dog" 1:"@" 2:"chichi" 3:" dog dog">
1. 11
2. 12
3. 18
==== Text in Chinese ====
#<MatchData "@chichi 狗狗" 1:"@" 2:"chichi" 3:" 狗狗">
1. 0
2. 0
3. 0
==== Text in Japanese ====
#<MatchData "@chichi ドッグ" 1:"@" 2:"chichi" 3:" ドッグ">
1. 0
2. 0
3. 0
$ ruby test.rb
==== Text in English ====
#<MatchData "@chichi dog dog" 1:"@" 2:"chichi" 3:" dog dog">
1. 11
2. 12
3. 18
==== Text in Chinese ====
#<MatchData "@chichi 狗狗" 1:"@" 2:"chichi" 3:" 狗狗">
1. 6
2. 7
3. 13
==== Text in Japanese ====
#<MatchData "@chichi ドッグ" 1:"@" 2:"chichi" 3:" ドッグ">
1. 12
2. 13
3. 19
# encoding: utf-8
text = 'i love dog @chichi dog dog'
text_cn = '我爱你狗狗 @chichi 狗狗'
text_jp = '私はあなたの犬を愛して @chichi ドッグ'
# Three capture groups: the "@" sign, the screen name, and the rest of the line.
reg = /([@@])([a-zA-Z0-9_]{1,20})(.*)/o
puts "==== Text in English ===="
text.scan(reg) do |at, screen_name, list_slug|
  puts $~.inspect
  puts "1. #{$~.begin(1)}"
  puts "2. #{$~.begin(2)}"
  puts "3. #{$~.begin(3)}"
end
puts "==== Text in Chinese ===="
text_cn.scan(reg) do |at, screen_name, list_slug|
  puts $~.inspect
  puts "1. #{$~.begin(1)}"
  puts "2. #{$~.begin(2)}"
  puts "3. #{$~.begin(3)}"
end
puts "==== Text in Japanese ===="
text_jp.scan(reg) do |at, screen_name, list_slug|
  puts $~.inspect
  puts "1. #{$~.begin(1)}"
  puts "2. #{$~.begin(2)}"
  puts "3. #{$~.begin(3)}"
end
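
The three blocks above differ only in their input string, so the reproduction can be condensed into a loop that compares each result with the offsets MRI printed. This is only a sketch: the names `REG`, `SAMPLES`, and `group_offsets` are made up for illustration, the pattern is copied unchanged from test.rb, and the expected values are taken from the `ruby test.rb` output above rather than from any specification.

```ruby
# encoding: utf-8

# Pattern copied as-is from test.rb.
REG = /([@@])([a-zA-Z0-9_]{1,20})(.*)/o

# Expected character offsets for groups 1..3, copied from the MRI output.
SAMPLES = {
  'i love dog @chichi dog dog' => [11, 12, 18],
  '我爱你狗狗 @chichi 狗狗' => [6, 7, 13],
  '私はあなたの犬を愛して @chichi ドッグ' => [12, 13, 19]
}

# Hypothetical helper: returns the begin offsets of groups 1..3 for the
# first match found by String#scan, read from $~ inside the scan block
# exactly as test.rb does.
def group_offsets(text, reg = REG)
  offsets = nil
  text.scan(reg) do
    offsets ||= (1..3).map { |n| $~.begin(n) }
  end
  offsets
end

SAMPLES.each do |text, expected|
  actual = group_offsets(text)
  status = actual == expected ? 'ok' : 'MISMATCH'
  puts "#{status}: #{text.inspect} expected=#{expected.inspect} actual=#{actual.inspect}"
end
```

Run under both interpreters, an affected JRuby should report MISMATCH for the Chinese and Japanese samples, which narrows the report down to one line per input.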