@jordansissel
Created December 17, 2011 21:20
Confused unescaping happening when doing String#sub ?
string = "hello world"
expected = DATA.read.chomp
# Replace the entire string with the expected value.
actual = string.sub(string, expected)
puts "Input: #{string}"
puts "Expected result: #{expected}"
puts "Actual result: #{actual}"
puts "Equal: #{actual == expected}"
__END__
Hurray for slashes! \\testing//

While debugging some Ruby grok bugs, I found something very strange: String#sub replaces every double backslash in the replacement string with a single backslash.

Input: hello world
Expected result: Hurray for slashes! \\testing//
Actual result: Hurray for slashes! \testing//
Equal: false

This is quite confusing. I said "replace this string with that string" and it took the 'that string' and replaced all double backslashes with single backslashes? This is very strange and certainly a bug.

This is probably an integration problem caused by String#sub supporting captured groups when used with regexps. From the Ruby docs (http://www.ruby-doc.org/core-1.9.3/String.html):

If replacement is a String it will be substituted for the matched text. It may contain back-references to the pattern’s capture groups of the form \d, where d is a group number, or \k<n>, where n is a group name. If it is a double-quoted string, both back-references must be preceded by an additional backslash. However, within replacement the special match variables, such as &$, will not refer to the current match.

I think this is a bug. If the first argument to String#sub() is a string, then there will be no capturing so there is no need to process backslashes and capture groups in the replacement string.

@headius

headius commented Dec 17, 2011

I don't think this is a bug so much as a peculiar edge case. Keep in mind that even with a String there's still a group available: https://gist.github.com/1491509

I think it would be odd to have backrefs only be processed when passing in a Regexp, since there's almost certainly cases out there using the \0 pattern with a String.
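
A minimal sketch of the \0 case, using made-up strings (this is not the content of the linked gist). With a String pattern there are no capture groups, but \0 still stands for the whole match, and a doubled backslash in the replacement collapses to a single one:

```ruby
greeting = "hello world"

# Single quotes keep the backslash, so sub receives the two characters
# \ and 0, which it expands to the whole match.
wrapped = greeting.sub("world", '[\0]')
puts wrapped

# Build the replacement from an explicit backslash to avoid quoting confusion.
bs = "\\"                                  # one backslash character
collapsed = greeting.sub("world", "a" + bs + bs + "b")
puts collapsed                             # the backslash pair comes out as one
```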

@meineerde

I noticed the same effect but hadn't tracked it down that far. However, I noticed that it works with the block syntax as done here: https://github.com/chiliproject/chiliproject/blob/unstable/lib/chili_project/liquid/template.rb#L88

The following two lines look like they should be equivalent, but in fact they behave differently (with key and value both being strings):

result.sub!(key, value)

vs.

result.sub!(key) { |match| value }
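
A small sketch of the difference, assuming a hypothetical placeholder key and a value that contains two consecutive backslashes:

```ruby
bs = "\\"                            # one backslash character
key = "{{path}}"                     # hypothetical placeholder
value = "C:" + bs + bs + "temp"      # contains two consecutive backslashes

# String replacement: sub scans the replacement for backslash sequences.
with_string = "dir: {{path}}".sub(key, value)

# Block replacement: the block's return value is inserted as-is.
with_block = "dir: {{path}}".sub(key) { |_match| value }

puts with_string   # the backslash pair is collapsed to one
puts with_block    # value is inserted verbatim
```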

@jordansissel
Author

@meineerde - I hadn't tried the block method, but I will now.

Previously I was doing this:

hack = string.sub(string, expected.sub("\\", "\\\\\\\\"))

The block mechanism works and I'll use that instead of the string escaping madness.
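
Note that the one-liner above calls sub, so it only touches the first backslash in the replacement. If escaping is preferred over the block form, a sketch that doubles every backslash might look like the following; escape_replacement is a hypothetical helper name, not part of Ruby:

```ruby
# Hypothetical helper: double every backslash so that sub's replacement
# processing turns each pair back into the original single backslash.
# gsub is called with a block so the doubling itself is not re-processed.
def escape_replacement(text)
  text.gsub("\\") { "\\\\" }
end

bs = "\\"                                  # one backslash character
expected = "Hurray for slashes! " + bs + bs + "testing//"
string = "hello world"

actual = string.sub(string, escape_replacement(expected))
puts actual == expected
```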

@nicklewis

I can see why this is somewhat surprising, but it doesn't seem incorrect. It would be more surprising if it were inconsistent between the String case and the Regexp case, I think.

And @meineerde, per https://github.com/rubyspec/rubyspec/blob/master/core/string/sub_spec.rb#L274, it seems \ sequences are indeed ignored for the block. Presumably this is because inside the block, you have $1, etc available and don't need other special rules.

@jordansissel
Author

@headius - I buy the \0 usage, but not fully. If you do:

foo.sub(bar, 'fizzle \0')

It would be just as trivial to, say, do this:

foo.sub(bar, "fizzle #{bar}")

And probably wouldn't be as confusing to me. Still, I suppose I am glad that all rubies do this and not just one ruby ;)

@jordansissel
Author

The reason I noticed this is because both arguments to my String#sub() call are from user input, and the user, pleasantly unknowing of the implementation oddities, has no reason to expect that two consecutive backslashes should be transformed into a single backslash.
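
For that situation, one possible approach is a small wrapper around the block form so that user-supplied text is never scanned for back-references. This is only a sketch; literal_sub is a hypothetical name:

```ruby
# Hypothetical helper: replace the first literal occurrence of `pattern`
# in `text` with `replacement`, without any back-reference or backslash
# processing. A String pattern is matched literally, not as a regexp.
def literal_sub(text, pattern, replacement)
  text.sub(pattern) { replacement }
end

bs = "\\"                                            # one backslash character
user_replacement = bs + bs + "server" + bs + "0"     # what a user might type
result = literal_sub("path=X", "X", user_replacement)
puts result                                          # user text appears unchanged
```

With plain `"path=X".sub("X", user_replacement)`, the leading backslash pair would collapse and the trailing \0 would be expanded to the matched text.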
