Skip to content

Instantly share code, notes, and snippets.

@mjmusante
Last active July 13, 2023 01:14
Show Gist options
  • Save mjmusante/ec1e5a6df11e28b7be069bb11b5ef230 to your computer and use it in GitHub Desktop.
Save mjmusante/ec1e5a6df11e28b7be069bb11b5ef230 to your computer and use it in GitHub Desktop.

Fun with loops

Let's say you have a function which reads in a file, converts it into a script, and returns a Result. It takes one argument: the name of the file to load.

struct Script;
type ParseError = Box<dyn std::error::Error>;

fn load_script(_filename: &str) -> Result<Script, ParseError> {
  // open file, parse data, return Err(ParseError) on failure
  // then...
  Ok(Script {})
}

And now let's say you have a list of scripts to parse. For simplicity, let's make a global we can use for the rest of this gist:

static SCRIPT_LIST: [&str; 2] = ["script_foo.ext", "script_bar.ext"];

The simplest approach is to loop over the list, and call load_script() on each one. Whether we stop after the first failure, or process all of the scripts and then return, is a decision to be made by the programmer or the customer requirements. For now, let's say we abort on the first failure.

fn basic_loop() -> Result<Vec<Script>, ParseError> {
    let mut scripts = Vec::new();

    for filename in &SCRIPT_LIST {
        let script = load_script(filename)?;
        scripts.push(script);
    }

    Ok(scripts)
}

Very straightfowrard, very readable: loop over all names, call load_script() on each one, and collect the results in a mutable vector. We use the ? operator on the function call line to abort the loop on the first failure.

But we're Rustacians! We should be doing this functionally! We should avoid having mutable variables! Let's see what that looks like:

fn basic_map() -> Result<Vec<Script>, ParseError> {
    SCRIPT_LIST
        .iter()
        .map(|filename| load_script(filename))
        .collect()
}

This has the same effect - loop over the entries in the list, and abort on the first failure. It is a bit on the verbose side, though. One of the features of Rust is the ability to just pass in a function pointer to the map, if the conversion is simple. So instead of map(|filename| load_script(filename)) we should be able to write map(load_script). Let's try it!

fn fancy_basic_map() -> Result<Vec<Script>, ParseError> {
    SCRIPT_LIST
        .iter()
        .map(load_script)
        .collect()
}

Uh oh, we get an error message:

error[E0631]: type mismatch in function arguments
   --> src/main.rs:35:14
    |
6   | fn load_script(filename: &str) -> Result<Script, ParseError> {
    | ------------------------------------------------------------ found signature defined here
...
35  |         .map(load_script)
    |          --- ^^^^^^^^^^^ expected due to this
    |          |
    |          required by a bound introduced by this call
    |
    = note: expected function signature `fn(&&str) -> _`
               found function signature `for<'a> fn(&'a str) -> _`
note: required by a bound in `map`
   --> /Users/mmusante/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/iter/traits/iterator.rs:799:12
    |
799 |         F: FnMut(Self::Item) -> B,
    |            ^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `Iterator::map`
help: consider borrowing the argument
    |
6   | fn load_script(filename: &&str) -> Result<Script, ParseError> {
    |                          +

Fortunately, the error message tells us how to fix it. Just add another & to the function signature. Wait, really? That's pretty ugly. Let's try it:

fn load_script(filename: &&str) -> Result<Script, ParseError> {
  // ...
}

Yep, that builds and runs. But there should be a better way, right? Fortunately, there is. We just need to use a type that can be converted from a &&str to something usable. Rust provides the Path type in std::path::Path which implements AsRef<Path> for str. And rust will deref those for us as needed

fn load_script<T: AsRef<std::path::Path>>(filename: T) -> Result<Script, ParseError> {
  // ...
}

With this updated function signature, we can now do:

fn basic_fancy_map() -> Result<Vec<Script>, ParseError> {
  SCRIPT_LIST.iter().map(load_script).collect()
}

Great! Is that easier to read than the for-loop that we started out with? Possibly. It's subjective, right? If you're used to reading that style, it's easy enough to grok. If not, then the for-loop works.

But what if we want to go one step further? Instead of aborting on the first failure, let's collect the failures into their own separate vector.

Let's start with the for-loop version:

fn full_loop() -> (Vec<Script>, Vec<ParseError>) {
    let mut scripts = Vec::new();
    let mut errors = Vec::new();

    for filename in &SCRIPT_LIST {
        match load_script(filename) {
            Ok(script) => scripts.push(script),
            Err(e) => errors.push(e),
        }
    }

    (scripts, errors)
}

OK, now can we make this into something rust-functional-ish? There is an iterator function called partition() which will do this for us, sort of. It takes a closure which returns true or false in order to split the values into two separate vectors.

Let's give it a shot:

fn full_map() -> (Vec<Script>, Vec<ParseError>) {
    SCRIPT_LIST
        .iter()
        .map(load_script)
        .partition(Result::is_ok)
}

The Result::is_ok is a function on Result which will return true/false, so that's perfect for our needs, right? Not quite:

error[E0308]: mismatched types
  --> src/main.rs:63:5
   |
62 |   fn full_map() -> (Vec<Script>, Vec<ParseError>) {
   |                    ------------------------------ expected `(Vec<Script>, Vec<Box<(dyn std::error::Error + 'static)>>)` because of return type
63 | /     SCRIPT_LIST
64 | |         .iter()
65 | |         .map(load_script)
66 | |         .partition(Result::is_ok)
   | |_________________________________^ expected `(Vec<Script>, Vec<...>)`, found `(_, _)`
   |
   = note: expected tuple `(Vec<Script>, Vec<Box<(dyn std::error::Error + 'static)>>)`
              found tuple `(_, _)`

What's happening here is that the partition is dividing up the Ok from the Err but they're still both Result types. We want to extract the ok and error values separately as their own types. It means doing a bit more work:

fn full_map() -> (Vec<Script>, Vec<ParseError>) {
    let (scripts, errs): (Vec<_>, Vec<_>) = SCRIPT_LIST
        .iter()
        .map(load_script)
        .partition(Result::is_ok);
    (
        scripts.iter().map(Result::unwrap).collect(),
        errs.iter().map(Result::unwrap_err).collect(),
    )
}

So now we extract the two Vec<Result<...>> objects and then iterate over them, mapping them either to an unwrap() or an unwrap_err() function. We know those calls will succeed because the partition guarantees that. However, we still didn't get it right:

error[E0631]: type mismatch in function arguments
   --> src/main.rs:78:28
    |
78  |         scripts.iter().map(Result::unwrap).collect(),
    |                        --- ^^^^^^^^^^^^^^
    |                        |   |
    |                        |   expected due to this
    |                        |   found signature defined here
    |                        required by a bound introduced by this call
    |
    = note: expected function signature `fn(&Result<Script, Box<dyn std::error::Error>>) -> _`
               found function signature `fn(Result<_, _>) -> _`
note: required by a bound in `map`

The issue here is that the map was given a &Result and we can't pass that into a function that's expecting a Result. We need to give the map the right thing, and we can do that with into_iter(). We have to consume the iterator instead of just iterating by reference. Once we do that, we have:

fn full_map() -> (Vec<Script>, Vec<ParseError>) {
    let (scripts, errs): (Vec<_>, Vec<_>) = SCRIPT_LIST
        .iter()
        .map(load_script)
        .partition(Result::is_ok);
    (
        scripts.into_iter().map(Result::unwrap).collect(),
        errs.into_iter().map(Result::unwrap_err).collect(),
    )
}

And this builds. Is it better than the for-loop? I don't think so. This is a situation in which the "functional" or "rusty" way of doing something actually makes readability worse.

And, before you ask about the O(2n) issue here, I don't know whether there is one. Rust's optimiser is very good and may figure out how to make it O(n) on its own. I'll leave that as an exercise for the reader, and instead I'll just use the for-loop variant of this solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment