Skip to content

Instantly share code, notes, and snippets.

@k12ish
Created July 15, 2021 21:31
Show Gist options
  • Save k12ish/ba9bdb36fe2011186e08c23a1d137f10 to your computer and use it in GitHub Desktop.
Save k12ish/ba9bdb36fe2011186e08c23a1d137f10 to your computer and use it in GitHub Desktop.
Find the byte index of substrings, unambiguously.
use core::ops::Range;
fn main() {
let quote = String::from("I scream, you scream, we all scream for ice cream");
assert_eq!(&quote[2..8], "scream");
assert_eq!(&quote[14..20], "scream");
assert_eq!(&quote[29..35], "scream");
// substring_index(&quote, "scream")
assert_eq!(substring_index(&quote, &quote[2..8]), 2..8);
assert_eq!(substring_index(&quote, &quote[14..20]), 14..20);
assert_eq!(substring_index(&quote, &quote[29..35]), 29..35);
// quote.find( "scream" )
assert_eq!(quote.find(&quote[2..8]), Some(2));
assert_eq!(quote.find(&quote[14..20]), Some(2));
assert_eq!(quote.find(&quote[29..35]), Some(2));
}
/// Return the range of bytes where substring is in string
fn substring_index(string: &str, substring: &str) -> Range<usize> {
let start_ptr: *const u8 = string.as_ptr();
let substring_ptr: *const u8 = substring.as_ptr();
let end_ptr: *const u8;
// SAFETY:
//
// “Boxed types never allocate more than isize::MAX bytes, so
// vec.as_ptr().add(vec.len()) is always safe.”
//
// --- https://doc.rust-lang.org/std/primitive.pointer.html#safety-5
unsafe {
end_ptr = start_ptr.add(string.len());
}
// Check if we have a 'true' substring, ie start_ptr <= substring_ptr <= end_ptr
assert!(start_ptr <= substring_ptr && substring_ptr <= end_ptr);
let start;
// SAFETY:
// The pointers are in bounds, derived from same object and a multiple
// of 8 bits apart.
// This operation is equivalent to `ptr_into_vec.offset_from(vec.as_ptr())`,
// so it will never "wrap around" the address space.
unsafe { start = substring_ptr.offset_from(start_ptr) as usize }
Range {
start,
end: start + substring.len(),
}
}
@k12ish
Copy link
Author

k12ish commented Jul 16, 2021

What is left is to write a few tests:

#[test]
fn test_all() {
    let text = String::from("abcdefghijklmnopqrstuvwxyz1234567890");
    let all_slices = (0..text.len())
        .into_iter()
        .flat_map(|x| (x..text.len()).map(move |y| x.clone()..y));
    for range in all_slices {
        assert_eq!(substring_index(&text, &text[range.clone()]), range);
    }
}

Then check for undefined behaviour using miri:

krish@Krish-PC:~/Desktop/rust/substring_index/src$ cargo miri test --release
    Finished test [optimized] target(s) in 0.00s
     Running unittests (/home/krish/Desktop/rust/substring_index/target/miri/x86_64-
unknown-linux-gnu/debug/deps/substring_index-d85b85580ca31c27)

running 1 test
test test_all ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment