@dsharletg
Last active August 18, 2020 03:56
// Compute the sum of f in [0, extent_x) x [0, extent_y)
Func s("s");
RDom r(0, extent_x, 0, extent_y);
s() += f(r.x, r.y);
// Schedule: rfactor the reduction into vector_width x 4 tiles.
// This gives 4-way instruction-level parallelism with SIMD.
// Requires that the extents are a multiple of the tile size.
RVar rxo, rxi, ryo, ryi;
Var x, y;
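s.compute_root().update(0)
    // Split the reduction domain into tiles of vector_width x 4.
    .tile(r.x, r.y, rxo, ryo, rxi, ryi,
          target.natural_vector_size<float>(), 4)
    // Factor the inner tile into an intermediate Func with pure vars x, y.
    .rfactor({{rxi, x}, {ryi, y}})
    // Schedule the intermediate's pure definition.
    .compute_at(s, Var::outermost())
    .vectorize(x)
    // Schedule the intermediate's update definition.
    .update(0)
    .vectorize(x)
    .unroll(y);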
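
The loop nest that the schedule above describes can be roughly sketched in plain C++, without Halide. This is only an illustration, not the code Halide generates: `tiled_sum`, `kVectorWidth`, and `kTileHeight` are hypothetical names, `kVectorWidth = 8` is an assumed stand-in for `target.natural_vector_size<float>()`, and `f` is assumed to be a row-major buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed stand-in for target.natural_vector_size<float>(); the real value
// depends on the target.
constexpr int kVectorWidth = 8;
constexpr int kTileHeight = 4;

// Hypothetical helper: sums a row-major extent_x * extent_y buffer using a
// tile of partial sums, mirroring the structure of the rfactor schedule.
float tiled_sum(const std::vector<float>& f, int extent_x, int extent_y) {
    // Same requirement as the schedule: extents are multiples of the tile size.
    assert(extent_x % kVectorWidth == 0 && extent_y % kTileHeight == 0);
    assert(f.size() == static_cast<std::size_t>(extent_x) * extent_y);

    // Intermediate: one partial sum per (x, y) position within a tile,
    // initialized to zero.
    float partial[kTileHeight][kVectorWidth] = {};

    // Update of the intermediate: the outer loops walk over tiles (ryo, rxo),
    // the inner loops cover one tile.
    for (int ryo = 0; ryo < extent_y / kTileHeight; ++ryo) {
        for (int rxo = 0; rxo < extent_x / kVectorWidth; ++rxo) {
            for (int y = 0; y < kTileHeight; ++y) {        // unrolled in the schedule
                for (int x = 0; x < kVectorWidth; ++x) {   // vectorized in the schedule
                    const int fx = rxo * kVectorWidth + x;
                    const int fy = ryo * kTileHeight + y;
                    partial[y][x] += f[static_cast<std::size_t>(fy) * extent_x + fx];
                }
            }
        }
    }

    // Final stage: reduce the tile of partial sums to a single scalar.
    float sum = 0.0f;
    for (int y = 0; y < kTileHeight; ++y) {
        for (int x = 0; x < kVectorWidth; ++x) {
            sum += partial[y][x];
        }
    }
    return sum;
}
```

Each of the 4 rows of `partial` is an independent accumulator, so the additions into different rows do not depend on one another. That independence is what allows the 4-way instruction-level parallelism mentioned in the schedule comment.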