@dginev
Created September 29, 2015 04:10
UTF8 safe truncate for Rust strings
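/// Truncates `input` to at most `maxsize` bytes without splitting a UTF-8 character.
fn utf8_truncate(input: &mut String, maxsize: usize) {
    let mut utf8_maxsize = input.len();
    // Strict `>`: a string of exactly `maxsize` bytes already fits.
    if utf8_maxsize > maxsize {
        {
            let mut char_iter = input.char_indices();
            // Walk back one character at a time until the cut point fits.
            while utf8_maxsize > maxsize {
                utf8_maxsize = match char_iter.next_back() {
                    Some((index, _)) => index, // byte offset where this char starts
                    _ => 0,
                };
            }
        } // Extra {} wrap to limit the immutable borrow of char_indices()
        input.truncate(utf8_maxsize);
    }
}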
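
A shorter variant is possible with `str::is_char_boundary` from the standard library. This is only a sketch, not the approach the gist takes: the name `truncate_to_boundary` is hypothetical, and it assumes `max_bytes` is a byte budget the result must not exceed.

```rust
/// Hypothetical helper: shorten `input` to at most `max_bytes` bytes,
/// never cutting through a multi-byte UTF-8 character.
fn truncate_to_boundary(input: &mut String, max_bytes: usize) {
    if input.len() <= max_bytes {
        return; // already fits, nothing to do
    }
    let mut cut = max_bytes;
    // Offset 0 is always a char boundary, so this loop terminates.
    while !input.is_char_boundary(cut) {
        cut -= 1;
    }
    input.truncate(cut);
}
```

For instance, truncating "a☺b" (5 bytes) to a budget of 3 bytes leaves "a", because the smiley occupies bytes 1 through 3 and cannot be kept whole; a budget of 4 leaves "a☺".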

dginev commented Sep 29, 2015

Thanks to aatch from IRC for the minimal UTF8 smiley mutilation example demonstrating why this is needed:

playbot: let mut s = String::from("☺");  s.truncate(2);
playbot: thread '<main>' panicked at 'assertion failed: 
             self.is_char_boundary(new_len)', ../src/libcollections/string.rs:530

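.truncate() and .len() both work on raw bytes, but a UTF8-encoded character can take up to four bytes. Truncating at an arbitrary byte offset can therefore land in the middle of a character, and String::truncate panics in that case rather than leave a broken encoding behind.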
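
To make the byte/character mismatch concrete, here is a small sketch that uses only standard-library calls. The function name `boundary_report` is made up for illustration.

```rust
/// Hypothetical helper: for a string, return its byte length, its
/// character count, and every byte offset that is a valid cut point.
fn boundary_report(s: &str) -> (usize, usize, Vec<usize>) {
    let boundaries: Vec<usize> = (0..=s.len())
        .filter(|&i| s.is_char_boundary(i))
        .collect();
    (s.len(), s.chars().count(), boundaries)
}
```

For "☺" this yields a byte length of 3, a character count of 1, and valid cut points at offsets 0 and 3 only. Offset 2 is not among them, which is why s.truncate(2) trips the assertion above.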

P.S. Ironically, I couldn't use the original example's smiley char, because GitHub refuses Unicode characters above 0xffff:

Sorry! We couldn't save your comment — your comment contains unicode characters above 0xffff. Please try again.
