@kernelsmith
Created August 5, 2021 21:08
# This was probably encountered and solved long ago, but I ran into it in my
# own Ruby work and thought it might be an issue elsewhere.
# Ruby string encoding defaults to UTF-8, but String#strip does not adapt its
# definition of whitespace to the string's encoding.
# https://ruby-doc.org/core-2.6.8/String.html#method-i-strip
# String#strip removes leading and trailing whitespace, defined as
# '\x00\t\n\v\f\r ': null, horizontal tab, line feed, vertical tab, form feed,
# carriage return, and space.
# That list excludes Unicode whitespace regardless of the string's encoding.
# See the Regexp character classes for more info:
# https://ruby-doc.org/core-2.6.8/Regexp.html#class-Regexp-label-Character+Classes
# The POSIX bracket expression [[:space:]] does appear to match Unicode
# whitespace, at least when the string encoding is UTF-8. Unicode has all sorts
# of whitespace characters you've probably never heard of, such as the Ogham
# space mark (U+1680):
# https://en.wikipedia.org/wiki/Whitespace_character#Unicode
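# A minimal sketch of the difference described above. The sample string and
# the variable names are hypothetical, chosen only for illustration.

```ruby
nbsp  = "\u00a0" # NO-BREAK SPACE
ogham = "\u1680" # OGHAM SPACE MARK
sample = "#{ogham}padded#{nbsp}"

# String#strip ignores both characters, so the string comes back unchanged.
stripped = sample.strip

# The POSIX bracket class matches both characters in a UTF-8 string.
space_class_hits = sample.scan(/[[:space:]]/)
```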
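# Matches a leading or trailing run of [[:space:]] characters.
# \z anchors at the very end of the string.
STRIP_REGEX = /\A[[:space:]]+|[[:space:]]+\z/

# In-place variant; like String#strip!, returns nil when nothing was removed.
def strip_harder!(str)
  str.gsub!(STRIP_REGEX, '')
end

# Returns a stripped copy and leaves str untouched.
def strip_harder(str)
  str.gsub(STRIP_REGEX, '')
end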
# The URL below ends with U+00A0 (no-break space), which String#strip leaves in place.
s = "https://www.app.moc/support/security-bulletins.html\u00a0"
#=> "https://www.app.moc/support/security-bulletins.html "
strip_harder(s)
#=> "https://www.app.moc/support/security-bulletins.html"
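
# One caveat, sketched under assumptions: [[:space:]] does not match the null
# character, which String#strip's whitespace list includes. If nulls should be
# stripped too, the character class can be widened. WIDE_STRIP_REGEX and
# strip_widest are hypothetical names, not part of the original snippet.

```ruby
# Leading or trailing runs of Unicode whitespace and/or null characters.
WIDE_STRIP_REGEX = /\A[[:space:]\x00]+|[[:space:]\x00]+\z/

# Returns a copy with whitespace and nulls removed from both ends.
def strip_widest(str)
  str.gsub(WIDE_STRIP_REGEX, '')
end

nul = "\x00"
padded = "\u1680#{nul} value\t\u00a0#{nul}"

# A [[:space:]]-only pattern stops at the first null on either side.
space_only = padded.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')

# The widened pattern strips through the nulls as well.
widest = strip_widest(padded)
```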