Created
October 16, 2016 12:10
mmap explanation
14:07 < phaazon> is there a way to have access to mmap in rust?
14:08 < kimundi> phaazon: Sure, and there are crates for it too
14:08 < phaazon> kimundi: I thought it’d be in libc
14:08 < kimundi> phaazon: The main issue is just that it is unsafe to create a safe slice from the memory region returned by it if it backs a file
14:08 < phaazon> https://docs.rs/libc/0.2.17/libc/fn.mmap.html
14:08 < phaazon> ok, it is :D
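
[Editor's sketch] To make kimundi's point about the unsafe slice concrete, here is a minimal example of mapping a file read-only and scanning it. It is only a sketch under stated assumptions: it targets 64-bit Linux or macOS, it declares `mmap` and `munmap` by hand so that it needs no external crate (real code would more likely use the `libc` crate linked above), and the function name `count_newlines_mmap` is hypothetical.

```rust
use std::ffi::c_void;
use std::fs::File;
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::AsRawFd;

// Hand-written declarations (assumption: 64-bit Linux or macOS, where
// off_t is 64 bits wide). The `libc` crate provides the same items.
extern "C" {
    fn mmap(
        addr: *mut c_void,
        len: usize,
        prot: c_int,
        flags: c_int,
        fd: c_int,
        offset: i64,
    ) -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

const PROT_READ: c_int = 1; // value on Linux and macOS
const MAP_PRIVATE: c_int = 2; // value on Linux and macOS

/// Counts the newline bytes in `path` by reading it through a mapping.
fn count_newlines_mmap(path: &str) -> io::Result<usize> {
    let file = File::open(path)?;
    let len = file.metadata()?.len() as usize;
    if len == 0 {
        return Ok(0); // mmap rejects zero-length mappings
    }

    let ptr = unsafe {
        mmap(std::ptr::null_mut(), len, PROT_READ, MAP_PRIVATE, file.as_raw_fd(), 0)
    };
    if ptr as isize == -1 {
        // MAP_FAILED is (void *) -1
        return Err(io::Error::last_os_error());
    }

    // This is the unsafe part: the compiler assumes the slice contents
    // cannot change or vanish, but another process may still modify or
    // truncate the underlying file while the slice is alive.
    let bytes = unsafe { std::slice::from_raw_parts(ptr as *const u8, len) };
    let count = bytes.iter().filter(|&&b| b == b'\n').count();

    // The slice must not be used after this point.
    let _ = unsafe { munmap(ptr, len) };
    Ok(count)
}
```

Calling `count_newlines_mmap("irc.log")` (a hypothetical path) returns the number of lines without an explicit `read()` call; pages are pulled in by the OS as the iterator touches them.
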
14:09 < phaazon> kimundi: I’m actually discovering the use of mmap
14:09 < phaazon> I don’t quite really get the point
14:09 < phaazon> if I understand correctly, it creates a mapping from a file into RAM
14:09 < kimundi> yes
14:09 < phaazon> so, why are we not always using mmap instead of read() ?
14:09 < phaazon> it seems more performant
14:10 < phaazon> and also, how does it work?
14:10 < phaazon> I saw page related issues
14:10 < bleeding_stars> if the file is truncated under your feet, you'll get a segfault
14:10 < phaazon> like, it loads the file page per page
14:10 < bleeding_stars> moreover, faulting on a page that's not already in RAM is slow, significantly slower than a syscall
14:10 < bleeding_stars> mmap is useful in some cases but not all
14:11 < bleeding_stars> the way it works is the OS internally records that the page corresponds to a file, and marks the page as nonexistent
14:11 < phaazon> bleeding_stars: I’ve been advised to use mmap to parse a huge file
14:11 < phaazon> (15 MB)
14:11 < phaazon> instead of loading / streaming it to memory
14:11 < phaazon> I don’t really get why
14:11 < bleeding_stars> when your process tries to access it, it causes a page fault, the OS sees that the page is mapped to a file, and loads the page from disk directly into that page
14:11 < phaazon> to me, a BufReader is exactly there for that very purpose
14:12 < phaazon> bleeding_stars: ok, I see
14:12 < phaazon> so if you “advance” into the memory
14:12 < phaazon> it will unload previous pages and load a new page from the file
14:12 < bleeding_stars> indeed, streaming from the disk via mmap() can easily be slower than BufReader
14:12 < phaazon> hence, the memory consumption is low
14:12 -!- jrmuizel [jrmuizel@moz-kp4912.fibre.fibrestream.ca] has quit [Connection closed]
14:12 < bleeding_stars> in general, if you're accessing something once, mmap() is almost certainly slower than using read()
14:13 < phaazon> bleeding_stars: it’s for an IRC bot
14:13 < phaazon> when someone types in !q nick
14:13 < phaazon> I need to go through the log file
14:13 -!- jrmuizel [jrmuizel@moz-kp4912.fibre.fibrestream.ca] has joined #rust
14:13 < phaazon> looking for nick talking
14:13 < bleeding_stars> the benefit of mmap() is that if you are repeatedly accessing a few places in the file, it's extremely cheap, and also the OS can unmap the pages when it's under memory pressure
14:13 < bleeding_stars> for an IRC bot it will be inconsequential
14:13 < phaazon> ok I see
14:13 < phaazon> so
14:13 < phaazon> mmap is good for random access
14:13 < bleeding_stars> don't bother. the bugs aren't worth it, and any speedup will be eaten by IO latency
14:13 < phaazon> (that’s why it loads into, RAM, like, random access memory :D)
14:14 < bleeding_stars> yes. mmap() makes sense if you have, say, a database.
14:14 < phaazon> oook
14:14 < phaazon> thank you very much for the crystal clear explanation!
14:14 < bleeding_stars> yw
14:14 < phaazon> I’ll go with a BufReader + streaming + regex :)
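
[Editor's sketch] The streaming approach phaazon settles on might look roughly like the following. This is a hedged illustration, not the bot's actual code: it assumes log lines shaped like the ones in this transcript (`HH:MM < nick> message`), it uses plain prefix matching from the standard library in place of the `regex` crate so that it has no dependencies, and the function name `last_message_from` is hypothetical.

```rust
use std::io::{self, BufRead};

/// Streams a log line by line and returns the last message said by `nick`.
/// Only one line is held in memory at a time, whatever the file size.
fn last_message_from<R: BufRead>(reader: R, nick: &str) -> io::Result<Option<String>> {
    // Assumed line shape: "14:07 < nick> message".
    let marker = format!("< {}> ", nick);
    let mut last = None;

    for line in reader.lines() {
        let line = line?;
        // Drop the timestamp, then check who is speaking. Join/quit lines
        // ("-!- ...") and other nicks simply fail the prefix test.
        if let Some((_time, rest)) = line.split_once(' ') {
            if let Some(message) = rest.strip_prefix(marker.as_str()) {
                last = Some(message.to_string());
            }
        }
    }
    Ok(last)
}
```

For a file on disk, the reader would be `BufReader::new(File::open("irc.log")?)` (hypothetical path); any other `BufRead` source, such as an in-memory byte slice, works the same way.
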