Skip to content

Instantly share code, notes, and snippets.

@skeeto
Last active August 29, 2021 22:22
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save skeeto/d8b18fb4663f4a282e8c0db252a70faf to your computer and use it in GitHub Desktop.
Save skeeto/d8b18fb4663f4a282e8c0db252a70faf to your computer and use it in GitHub Desktop.
quniq
@jftuga
Copy link

jftuga commented May 5, 2020

Hi skeeto,

I wonder if you remember your post:

I just learned how to compile C code under Windows using the command line version of the C compiler, cl.
This was nice to be able to compile within a Docker container and thus not having to install VS. Since I just
figured out how to do this, I thought I would compare them and share the results.

I was testing quniq against my Go program found here:

Benchmarks

Here are some benchmarks under Linux and Windows 10.

  • File size: of test-file 'c' is 4,969,612,772 bytes

  • It has 259,373,437 lines, 372,762 are unique

  • File created under Windows with this command:

    • dir /s/b c:\ | sed "s,\\,\n,g" > c
  • Under Windows, test-file was located on a RAM disk

  • Under AWS Linux EC2, test-file was located on a NVMe disk

  • freq was compiled on both platforms with: go build -ldflags="-s -w"

  • quniq was compiled under Linux with: gcc -O3 quniq.c -o quniq

  • Under Windows, it was compiled and linked with the same options that zlib uses:

  • cl -c -nologo -MD -W3 -O2 -Oy- -Zi quniq.c

  • link -nologo -debug -incremental:no -opt:ref quniq.obj

Results

(mumbai1)(~) uname -a
Linux ip-172-31-28-145.ap-south-1.compute.internal 4.14.173-137.229.amzn2.x86_64
#1 SMP Wed Apr 1 18:06:08 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

(ec2)(~) time freq -b < /tmp/c > /dev/null

real    0m40.389s
user    0m40.698s
sys     0m2.418s

(ec2)(~) time quniq < /tmp/c > /dev/null

real    0m36.011s
user    0m9.322s
sys     0m2.534s

R:\>osver.exe
10.0.17763.1158

R:\>gdate & freq -b < c > nul & gdate
Tue May  5 06:13:06 EDT 2020
Tue May  5 06:13:26 EDT 2020

R:\>set /a 26 - 6
20 seconds

R:\>gdate & quniq < c > nul & gdate
Tue May  5 06:10:03 EDT 2020
Tue May  5 06:10:24 EDT 2020

R:\>set /a 24 - 3
21 (seconds)

I have not done enough experimentation to draw any conclusions other than they run about the same speed, while quniq uses less memory than freq.

If you have the time, I would be interested in knowing if there is any way that you see to speed up my Go program.

Direct Link:

https://github.com/jftuga/freq/blob/master/freq.go

Thanks,

-John

@skeeto
Copy link
Author

skeeto commented May 5, 2020 via email

@jftuga
Copy link

jftuga commented May 5, 2020

Hi Chris,

This is some really great feedback! I really appreciate it. I have a habit of switching between vim and VSCode, which automatically uses gofmt. I need to get into the habit of running it when using vim or find some type of plug-in which does this on save.

I am going to experiment with buffered output and using byte slices instead of strings -- those are excellent ideas.

Could you please give me a good GitHub example that is not too complex but implements a configuration struct and a "filter" interface? I am not too familiar with these paradigms. You also mentioned good design is generally preferred over a small performance boost. My code is definitely ugly (but works). I would love to make it cleaner and more maintainable.

Thanks again,

-John

@skeeto
Copy link
Author

skeeto commented May 5, 2020

Could you please give me a good GitHub example that is not too complex but implements a configuration struct

https://github.com/skeeto/passphrase2pgp/blob/870b0a4/passphrase2pgp.go#L83

That struct is filled out by the option parser and gets passed around the program so that the different components can access the parts of the configuration they need. If I add or remove an option I don't have to change all the call sites to add/remove parameters/arguments.

and a "filter" interface?

Here are two approaches: functional and interface. I'll start with the one I prefer:

package main

import (
	"fmt"
	"strings"
)

type filter func(string) string

func identity(s string) string {
	return s
}

func chain(a, b filter) filter {
	return func(s string) string {
		return b(a(s))
	}
}

func substring(beg, end int) filter {
	return func(s string) string {
		return s[beg:end]
	}
}

func reverse(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < len(r)/2; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

func main() {
	f := filter(identity)
	f = chain(f, strings.ToLower)
	f = chain(f, substring(0, 4))
	f = chain(f, reverse)
	fmt.Println(f("FooBar"))
}

Note how I start with the identity filter, then chain on new filters dynamically. Your option parser could append new filters as it parses arguments, so that the order of filters depends on the order the of arguments. Also notice how I could use strings.ToLower directly because it was already a filter.

Here's the interface version:

package main

import (
	"fmt"
	"strings"
)

type filter interface {
	Transform(string) string
}

type identity struct{}

func (_ *identity) Transform(s string) string {
	return s
}

type chain [2]filter

func (c *chain) Transform(s string) string {
	return c[1].Transform(c[0].Transform(s))
}

type lower struct{}

func (_ *lower) Transform(s string) string {
	return strings.ToLower(s)
}

type substring struct{ beg, end int }

func (x *substring) Transform(s string) string {
	return s[x.beg:x.end]
}

type reverse struct{}

func (_ *reverse) Transform(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < len(r)/2; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

func main() {
	f := filter(&identity{})
	f = &chain{f, &lower{}}
	f = &chain{f, &substring{0, 4}}
	f = &chain{f, &reverse{}}
	fmt.Println(f.Transform("FooBar"))
}

Quite a bit more verbose, but potentially more flexible if you filters become more complex. In the function version the substring filter captured its configuration in a closure. In the interface version it's stored on a struct.

@jftuga
Copy link

jftuga commented May 6, 2020

Thanks again for another great reply - and what a cool reverse() function! I also noticed that your passphrase code imports your own optparse library. The configuration struct idea seems pretty straight forward.

The functional version it easier for me to follow. The chaining concept looks really useful. I am going to experiment with it. OTOH, the interface version seems syntactically cumbersome and is harder for me to follow. I think ease of use is going to win out for me vs potential flexibility.

@skeeto
Copy link
Author

skeeto commented May 6, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment