@wldhx
Last active February 7, 2018 15:30
A few trivial code samples contrasting Python and Rust

Python vs. Rust

(For reference: I know why it is the way it is; the stark difference is just fun to contrast on trivial examples.)

Reading from stdin

Python

a = input()
print(a)

Rust

fn main() {
    // make a scope
    let a = {
        // we'll need a buffer to read into
        let mut buf = String::new();

        // won't bother with error handling here - just unwrap the Result
        std::io::stdin().read_line(&mut buf).unwrap();

        // trim Unicode whitespace (including the trailing newline) from both ends;
        // trim() returns a &str borrowed from buf, so copy it into an owned String
        // that can outlive this scope
        buf.trim().to_string()
    };
    println!("{}", &a);
}

String indexing

ASCII

Python

a = "abc"
print(a[1])

Rust

fn main() {
    let a = "abc";

    // get the nth element of the characters iterator
    // nth() returns None if the index is out of bounds,
    // so we unwrap the Option (which panics on None)
    let b = a.chars().nth(1).unwrap();

    println!("{}", &b);
}

Unicode

Python

from uniseg import graphemecluster

a = "a̐éö̲\r\n"
b = list(graphemecluster.grapheme_clusters(a))
print(b[1])

Rust

extern crate unicode_segmentation;

// UnicodeSegmentation is implemented for str
// therefore, using it gives our strs additional methods
use unicode_segmentation::UnicodeSegmentation;

fn main() {
    let a = "a̐éö̲\r\n";
    let b = &a.graphemes(true)   // returns an iterator over extended grapheme clusters
        .collect::<Vec<&str>>(); // collect it, naming the target collection type
    println!("{}", &b[1]);
}
@Omrigan

Omrigan commented Jun 15, 2016

According to those examples, Rust doesn't seem easy 😆

@wldhx
Author

wldhx commented Jun 15, 2016

@Omrigan I just fixed #string-indexing: actually, it's not that easy in Python either.

Rust in fact has good reasons to have things the way they are.

Yes, reading from stdin might be kind of complex, but it leaves out any magic. read_line's implementation is literally one line, and we have the text_io crate, which hides all those buffers under pretty macros, should you not want them in your way.
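For comparison, the explicit steps from the Rust sample (read a line, then trim it) can be spelled out in Python as well. This is only a minimal sketch: `read_trimmed_line` is a hypothetical helper name, and it accepts any text stream so that it can be tried without a terminal.

```python
import io
import sys


def read_trimmed_line(stream=sys.stdin):
    """Read one line from a text stream and strip whitespace from both ends.

    Hypothetical helper mirroring the read_line + trim steps of the Rust sample.
    """
    # readline() keeps the trailing newline, much like Rust's read_line
    line = stream.readline()
    # strip() removes leading and trailing whitespace, newline included
    return line.strip()


# Try it on an in-memory stream instead of real stdin
print(read_trimmed_line(io.StringIO("  hello world \n")))
```

Calling `read_trimmed_line()` with no argument reads from the real stdin, which is roughly what `input()` does, minus the whitespace trimming.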

String indexing is a more fun topic. The thing is, strings are Complicated. As in, there are tons of ways to encode what looks like the same text, even within a single encoding (yes, UTF-8, I'm looking at you): bytes, characters (code points) and graphemes in Unicode do not correspond one-to-one.
So, by default Python indexes by chars, which is fine and all until one visible character spans several code points. Then everything breaks. Rust, however, makes whatever you're doing explicit: you take either a .chars() iterator, a .bytes() iterator, or a .graphemes() one (also, note how beautifully just importing a library extends our str type).
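To make the bytes / chars / graphemes mismatch concrete, here is a small sketch using only the Python standard library. The sample string is my own pick for illustration (an "e" followed by U+0301 COMBINING ACUTE ACCENT), not one of the strings from the gist.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: renders as one visible character
decomposed = "e\u0301"
# the same visible character as a single precomposed code point, U+00E9
composed = unicodedata.normalize("NFC", decomposed)

code_points = len(decomposed)                  # str is indexed by code point
utf8_bytes = len(decomposed.encode("utf-8"))   # length once encoded as UTF-8

print(code_points, utf8_bytes)       # 2 code points, 3 bytes, 1 grapheme
print(decomposed[1] == "\u0301")     # indexing lands on the bare combining mark
print(len(composed))                 # the precomposed form is 1 code point
print(composed == decomposed)        # look the same, compare unequal
```

So one grapheme can be one or two code points and two or three bytes here, depending on which form the text happens to be in.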
So, that was mostly about explicitness, but Python could do better with internal storage / operations too: details here, I'll quote a bit:

For the vast majority of programs there is no encoding/decoding necessary because they accept UTF-8, just need to run a cheap validation check, process on UTF-8 strings and then don't need an encode on the way out. If they need to integrate with Windows Unicode APIs they internally use the WTF-8 encoding which quite cheaply can convert to UCS2 like UTF-16 and back.

At any point can you convert between Unicode and bytes and munch with the bytes as you need. Then you can later run a validation step and ensure that everything went as intended. This makes writing protocols both really fast and really convenient. Compare this to the constant encoding and decoding you have to deal with in Python just to support O(1) string indexing.
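The "work on bytes first, validate afterwards" workflow from the quote can be approximated in Python with bytes objects. This is a rough sketch under my own assumptions: `parse_request_line` and the sample request line are made up for illustration, and the decode at the end is the explicit step the quote is complaining about.

```python
def parse_request_line(raw):
    """Split a request line while it is still bytes, then validate the path.

    Hypothetical helper; raises UnicodeDecodeError if the path is not UTF-8.
    """
    # all the splitting happens on raw bytes, no decoding needed yet
    method, path, version = raw.rstrip(b"\r\n").split(b" ")
    # validation step: decoding fails loudly on invalid UTF-8
    return method, path.decode("utf-8"), version


method, path, version = parse_request_line(b"GET /caf\xc3\xa9 HTTP/1.1\r\n")
print(method, path, version)

try:
    # 0xE9 alone is Latin-1 for "é", which is not valid UTF-8
    parse_request_line(b"GET /caf\xe9 HTTP/1.1\r\n")
except UnicodeDecodeError:
    print("invalid UTF-8 in path")
```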
