@hadronized
Created October 16, 2016 12:10
mmap explanation
14:07 < phaazon> is there a way to have access to mmap in rust?
14:08 < kimundi> phaazon: Sure, and there are crates for it too
14:08 < phaazon> kimundi: I thought it’d be in libc
14:08 < kimundi> phaazon: The main issue is just that it is unsafe to create a safe slice from the memory region returned by it if it backs a file
14:08 < phaazon> https://docs.rs/libc/0.2.17/libc/fn.mmap.html
14:08 < phaazon> ok, it is :D
14:09 < phaazon> kimundi: I’m actually discovering the use of mmap
14:09 < phaazon> I don’t quite really get the point
14:09 < phaazon> if I understand correctly, it creates a mapping from a file into RAM
14:09 < kimundi> yes
14:09 < phaazon> so, why are we not always using mmap instead of read() ?
14:09 < phaazon> it seems more performant
14:10 < phaazon> and also, how does it work?
14:10 < phaazon> I saw page related issues
14:10 < bleeding_stars> if the file is truncated under your feet, you'll get a segfault
14:10 < phaazon> like, it loads the file page per page
14:10 < bleeding_stars> moreover, faulting on a page that's not already in RAM is slow, significantly slower than a syscall
14:10 < bleeding_stars> mmap is useful in some cases but not all
14:11 < bleeding_stars> the way it works is the OS internally records that the page corresponds to a file, and marks the page as nonexistent
14:11 < phaazon> bleeding_stars: I’ve been advised to use mmap to parse a huge file
14:11 < phaazon> (15 MB)
14:11 < phaazon> instead of loading / streaming it to memory
14:11 < phaazon> I don’t really get why
14:11 < bleeding_stars> when your process tries to access it, it causes a page fault, the OS sees that the page is mapped to a file, and loads the page from disk directly into that page
14:11 < phaazon> to me, a BufReader is exactly there for that very purpose
14:12 < phaazon> bleeding_stars: ok, I see
14:12 < phaazon> so if you “advance” into the memory
14:12 < phaazon> it will unload previous pages and load a new page from the file
14:12 < bleeding_stars> indeed, streaming from the disk via mmap() can easily be slower than BufReader
14:12 < phaazon> hence, the memory consumption is low
14:12 -!- jrmuizel [jrmuizel@moz-kp4912.fibre.fibrestream.ca] has quit [Connection closed]
14:12 < bleeding_stars> in general, if you're accessing something once, mmap() is almost certainly slower than using read()
14:13 < phaazon> bleeding_stars: it’s for an IRC bot
14:13 < phaazon> when someone types in !q nick
14:13 < phaazon> I need to go through the log file
14:13 -!- jrmuizel [jrmuizel@moz-kp4912.fibre.fibrestream.ca] has joined #rust
14:13 < phaazon> looking for nick talking
14:13 < bleeding_stars> the benefit of mmap() is that if you are repeatedly accessing a few places in the file, it's extremely cheap, and also the OS can unmap the pages when it's under memory pressure
14:13 < bleeding_stars> for an IRC bot it will be inconsequential
14:13 < phaazon> ok I see
14:13 < phaazon> so
14:13 < phaazon> mmap is good for random access
14:13 < bleeding_stars> don't bother. the bugs aren't worth it, and any speedup will be eaten by IO latency
14:13 < phaazon> (that’s why it loads into RAM, like, random access memory :D)
14:14 < bleeding_stars> yes. mmap() makes sense if you have, say, a database.
14:14 < phaazon> oook
14:14 < phaazon> thank you very much for the crystal clear explanation!
14:14 < bleeding_stars> yw
14:14 < phaazon> I’ll go with a BufReader + streaming + regex :)
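
The streaming approach phaazon settles on could look roughly like the sketch below. It is only an illustration under assumptions: the log is taken to use the same `HH:MM < nick> message` layout as this transcript, the names `speaker` and `quotes_by` are hypothetical, and plain string splitting from the standard library stands in for the regex mentioned above.

```rust
use std::io::{self, BufRead};

/// Extracts the nick from a line shaped like `14:07 < nick> message`.
/// Returns `None` for anything else, such as join/quit notices (`-!- ...`).
/// (Hypothetical helper; the line layout is an assumption.)
pub fn speaker(line: &str) -> Option<&str> {
    // Drop the timestamp, which ends at the first space.
    let (_, rest) = line.split_once(' ')?;
    // A chat message starts with `<` right after the timestamp.
    let rest = rest.strip_prefix('<')?;
    // The nick runs up to the closing `>`.
    let (nick, _) = rest.split_once('>')?;
    Some(nick.trim())
}

/// Streams the log line by line and collects every line said by `nick`.
/// Only one line is read at a time, so the whole file is never loaded at
/// once; memory use grows only with the number of matching lines.
pub fn quotes_by<R: BufRead>(reader: R, nick: &str) -> io::Result<Vec<String>> {
    let mut found = Vec::new();
    for line in reader.lines() {
        let line = line?;
        if speaker(&line) == Some(nick) {
            found.push(line);
        }
    }
    Ok(found)
}
```

For a log on disk, the reader would be something like `BufReader::new(File::open(path)?)`, where `path` is whatever log file the bot writes. Because `quotes_by` accepts any `BufRead`, it can also be exercised on an in-memory byte slice without touching the filesystem.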