@steveruizok
Last active March 31, 2023 14:43
weak map gist
export class Cache<T extends object, K> {
  items = new WeakMap<T, K>()

  get<P extends T>(item: P, cb: (item: P) => K) {
    if (!this.items.has(item)) {
      this.items.set(item, cb(item))
    }
    return this.items.get(item)!
  }

  access(item: T) {
    return this.items.get(item)
  }

  set(item: T, value: K) {
    this.items.set(item, value)
  }

  has(item: T) {
    return this.items.has(item)
  }

  invalidate(item: T) {
    this.items.delete(item)
  }

  bust() {
    this.items = new WeakMap()
  }
}
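For context, a standalone sketch of the memoization semantics this class provides (the `Shape` type and the perimeter callback are hypothetical, not from the gist; the cache class is copied in minimal form so the snippet runs on its own):

```typescript
// Minimal standalone copy of the weak-map cache pattern above.
class WeakCache<T extends object, K> {
  items = new WeakMap<T, K>()
  get<P extends T>(item: P, cb: (item: P) => K): K {
    if (!this.items.has(item)) this.items.set(item, cb(item))
    return this.items.get(item)!
  }
}

// Hypothetical shape type for illustration.
interface Shape { w: number; h: number }

const perimeters = new WeakCache<Shape, number>()
const rect: Shape = { w: 3, h: 4 }

let calls = 0
const compute = (s: Shape) => { calls++; return 2 * (s.w + s.h) }

const p1 = perimeters.get(rect, compute) // computes: 14
const p2 = perimeters.get(rect, compute) // cache hit: callback not re-run
```

Because the keys live in a `WeakMap`, an entry disappears when its shape is garbage-collected, so the cache never keeps a deleted shape alive.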
// Usage sketch. `Shape` and `Vec2d` come from the surrounding app (not
// shown). PI2 is a full turn; TAU here is a quarter turn, as in tldraw's
// constants.
const PI2 = Math.PI * 2
const TAU = Math.PI / 2

// One shared cache of computed outlines, keyed weakly by shape.
const outlines = new Cache<Shape, Vec2d[]>()

class Example {
  // Builds a five-pointed star outline (10 vertices) from the shape's
  // width and height.
  getOutline(shape: Shape) {
    const { w, h } = shape
    const sides = 5
    const ratio = 1
    const cx = w / 2
    const cy = h / 2
    const ix = (cx * ratio) / 2
    const iy = (cy * ratio) / 2
    const step = PI2 / sides / 2
    return Array.from(Array(sides * 2)).map((_, i) => {
      const theta = -TAU + i * step
      return new Vec2d(
        cx + (i % 2 ? ix : cx) * Math.cos(theta),
        cy + (i % 2 ? iy : cy) * Math.sin(theta)
      )
    })
  }

  // Memoized: computed once per shape, then served from the cache.
  outline(shape: Shape) {
    return outlines.get(shape, (shape) => this.getOutline(shape))
  }
}
@chenglou

chenglou commented Sep 14, 2022

TL;DR: modern performance is mostly about shuffling your memory into the right representation*. Which is a nice endeavor, because starting from data structures is important anyway.

But yes, if it's not in the hot path, then don't bother! Though there might be an argument that this is the better data layout even just for ergonomics. I wouldn't know without trying it in perfect-freehand, though.

* Again, my rough approximation: if you're doing the sort of optimization that spends memory to save compute (e.g. caching), then you might have already lost, especially on latency.

@steveruizok
Author

Quick update: this is the fastest version I've gotten so far. Surprisingly, the destructuring trick is fast!

function pointsOnEllipse4(cx, cy, rx, ry, n) {
  const xs = new Float32Array(n);
  const ys = new Float32Array(n);

  const step = Math.PI / (n / 2);
  let sin = 0;
  let cos = 1;

  const a = Math.cos(step);
  const b = Math.sin(step);

  for (let i = 0; i < n; i++) {
    xs[i] = cx + rx * cos;
    ys[i] = cy + ry * sin;
    ;[sin, cos] = [b * cos + a * sin, a * cos - b * sin];
  }

  return { xs, ys }
}

@chenglou

chenglou commented Sep 14, 2022

I wouldn't pay too much attention to that destructuring; it's more microbench-y, really. It invokes the array iterator, but here the engine clearly sees there's nothing to be invoked. That won't necessarily stay true as the code shifts around. Plus, readability. Aim for simple and fast code that looks like C (this has been solid advice for the past 20 years, no matter what magic the JIT has done); C doesn't have a JIT, so coding as if there's no JIT helps too.
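In that spirit, here's the same rotation recurrence written with a plain temporary instead of the destructuring swap (a sketch of an equivalent variant, not code from the gist; the function name is made up):

```typescript
// Same incremental-rotation trick as pointsOnEllipse4, but the (sin, cos)
// update uses an explicit temporary rather than array destructuring.
function pointsOnEllipseNoDestructure(
  cx: number, cy: number, rx: number, ry: number, n: number
) {
  const xs = new Float32Array(n)
  const ys = new Float32Array(n)

  const step = (2 * Math.PI) / n
  const a = Math.cos(step)
  const b = Math.sin(step)
  let sin = 0
  let cos = 1

  for (let i = 0; i < n; i++) {
    xs[i] = cx + rx * cos
    ys[i] = cy + ry * sin
    // Rotate (cos, sin) by `step` using a temporary: no iterator protocol.
    const nextSin = b * cos + a * sin
    cos = a * cos - b * sin
    sin = nextSin
  }

  return { xs, ys }
}
```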

Regarding Float32Array: it might be faster, but the reason is likely that it's 32 bits instead of 64. So here again it's about having a smaller memory footprint. On the other hand, you've now traded away number precision (which we just fixed) that you might need in the future. Imo this is especially relevant for SVGs (as opposed to raster graphics and 3D), since e.g. if you have two identical shapes with different colors perfectly on top of each other, you should see a single outline, not colors from the background shape bleeding out (aka numerical imprecision). I don't know, though. Just a general observation.
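A tiny illustration of that precision trade-off (a sketch; `0.1` has no exact binary representation, and a 32-bit float keeps only about 7 significant decimal digits versus roughly 15 to 17 for a 64-bit double):

```typescript
const f64 = new Float64Array(1)
const f32 = new Float32Array(1)

f64[0] = 0.1
f32[0] = 0.1

// Reading back as a JS number (a double) exposes the float32 rounding.
const d = f64[0] // round-trips exactly: 0.1
const f = f32[0] // 0.10000000149011612
```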

@chenglou

chenglou commented Sep 15, 2022

The structure-of-array transform we did is fast for 2 reasons:

  • Way fewer tiny allocations. Tiny objects are still relatively fast to create because they're usually allocated around the same places ("arenas"), so you get good memory locality, thus perf, and their drawback of being more allocate-y doesn't show up as much. But when readability is the same, I try not to rely on that assumption too much and just do the predictably less allocate-y thing. Plus there's also deallocation: GC pauses etc.
  • The transform makes iteration and the branch predictor very happy. Iterating over a plain array of numbers is the best-case scenario for modern CPUs (and GPUs, and, eh, TPUs or whatever comes next, I guess. This will always be relevant).

So that's why you see that 25% perf boost at basically no cost.
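A sketch of what those two bullets mean in code (a toy reduction, not from the gist): the arithmetic is identical, only the memory layout differs.

```typescript
// Array-of-structures: n tiny objects, scattered allocations.
interface Vec2d { x: number; y: number }

function sumAoS(points: Vec2d[]): number {
  let sum = 0
  for (const p of points) sum += p.x + p.y
  return sum
}

// Structure-of-arrays: two flat typed arrays, contiguous memory,
// a predictable linear scan the CPU can prefetch.
function sumSoA(xs: Float64Array, ys: Float64Array): number {
  let sum = 0
  for (let i = 0; i < xs.length; i++) sum += xs[i] + ys[i]
  return sum
}

const aos = sumAoS([{ x: 1, y: 2 }, { x: 3, y: 4 }])             // 10
const soa = sumSoA(Float64Array.of(1, 3), Float64Array.of(2, 4)) // 10
```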

One last note on your initial problem: I think if we have to cache the outlines, we should at least lift the cache to a higher level, to a single big one. Having so many little disparate caches is a bit scary.

Update: final follow-up tweet at https://twitter.com/_chenglou/status/1571430079829528577

@robpalme

It's mentioned that the cache is slow. This is to be expected when using a WeakMap.

Was inverting the cache considered? That is, storing the cached x,y properties inside each associated original shape (as either public or private fields). This ensures no increase in the object count and makes cache access very fast. I'm wondering whether fast caching changes the balance here.
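A sketch of that inversion (all names hypothetical, and the outline computation is a placeholder): the memoized result lives on the shape itself as a private field, and any transform clears it.

```typescript
class CachedShape {
  // The "inverted cache": the memo is a field on the shape, not a WeakMap.
  #outline: { xs: Float64Array; ys: Float64Array } | null = null

  constructor(public w: number, public h: number) {}

  // Any transform invalidates the locally stored result.
  resize(w: number, h: number) {
    this.w = w
    this.h = h
    this.#outline = null
  }

  outline() {
    if (this.#outline === null) {
      // Placeholder computation; the real one would build the star outline.
      this.#outline = {
        xs: Float64Array.of(0, this.w, this.w, 0),
        ys: Float64Array.of(0, 0, this.h, this.h),
      }
    }
    return this.#outline
  }
}
```

Access is a plain field read, there are no extra key objects, and the cached data dies with the shape, which is the lifetime the WeakMap was approximating anyway.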

@chenglou

Good point. For Steve's use case, it seems that storing the entire outline array on the instance every time the shape transforms might be viable. It does nothing for latency and uses more memory, but it still seems better than a WeakMap.
