@maciej
Last active November 3, 2022 08:33
Impact of buffer size on line-splitting bufio.Scanner in Go

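Go sets the initial buffer size of a new bufio.Scanner to 4096 bytes. The buffer only grows, up to bufio.MaxScanTokenSize (64 KiB) by default, when a single token does not fit. I scanned a ~155 MB file with an average line length of about 62 bytes, using smaller and larger initial buffers set via Scanner.Buffer. Here are the results on a 2019 x86 MacBook Pro: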

file size: 155605069
lines: 2501619
avg line length: 62.20
  1024: 182.103886ms
  2048: 116.351501ms
  4096: 85.373947ms
  8192: 69.776855ms
 16384: 62.339557ms
 32768: 56.198547ms
 65536: 53.285957ms

The results were fairly consistent between multiple runs.

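package main

import (
	"bufio"
	"flag"
	"fmt"
	"io"
	"os"
	"time"
)

var input = flag.String("input", "", "Input file")

func main() {
	flag.Parse()
	if *input == "" {
		panic("no input provided")
	}
	f, err := os.Open(*input)
	if err != nil {
		panic(fmt.Sprintf("error opening input: %v", err))
	}
	defer f.Close()
	stat, err := f.Stat()
	if err != nil {
		panic(fmt.Sprintf("stat failed: %v", err))
	}

	// Read the whole file once as a warm-up (e.g. so later runs read from
	// the OS page cache) and to count lines for the summary below.
	linesRead, _ := timeScan(f, 4096)
	fmt.Printf("file size: %d\n", stat.Size())
	fmt.Printf("lines: %d\n", linesRead)
	// The average includes each line's terminator.
	fmt.Printf("avg line length: %.2f\n", float64(stat.Size())/float64(linesRead))

	// Initial buffer sizes from 1 KiB to 64 KiB, doubling each time.
	for i := 1; i <= 64; i *= 2 {
		bufSize := i * 1024
		_, duration := timeScan(f, bufSize)
		fmt.Printf("%6d: %v\n", bufSize, duration)
	}
}

// timeScan rewinds f, scans it line by line with an initial buffer of
// bufSize bytes, and returns the line count and the elapsed time.
func timeScan(f *os.File, bufSize int) (linesRead int, d time.Duration) {
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		panic(fmt.Sprintf("seek failed: %v", err))
	}
	start := time.Now()
	defer func() {
		d = time.Since(start)
	}()
	s := bufio.NewScanner(f)
	s.Split(bufio.ScanLines) // the default split function, set explicitly
	// The scanner starts with this buffer and only grows it (up to
	// MaxScanTokenSize) when a single line does not fit.
	s.Buffer(make([]byte, bufSize), bufio.MaxScanTokenSize)
	for s.Scan() {
		linesRead++
		_ = s.Bytes() // touch the token without copying it into a string
	}
	if err := s.Err(); err != nil {
		panic(fmt.Sprintf("scan failed: %v", err))
	}
	return
}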