Taylor Robie robieta

  • Lightning AI
  • Menlo Park, CA
import collections
import timeit
import numpy as np
import torch
def loop_expand(values, repeats):
    output = []
    for v, r in zip(values, repeats):
Percent of index_select baseline. Lower is better.
[0] use gather (SmallVector)
[1] use gather (std::vector)
[2] sharded loop (shard_size = 2048)
Quadratic spacing. (Sparse)
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Size (output) | Baseline (us)     [0]      [1]      [2]
          841 |           8.8   61.1%    60.7%    30.4%
         2430 |          18.2   35.0%    34.2%    24.3%
Percent of index_select baseline. Lower is better.
[0] use gather (SmallVector)
[1] use gather (std::vector)
[2] sharded loop (shard_size = 2048)
Quadratic spacing. (Sparse)
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Size (output) | Baseline (us)     [0]      [1]      [2]
          841 |           4.8  115.8%   113.9%    46.7%
         2430 |           5.7  119.4%   117.3%    88.9%
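The columns [0] to [2] above report each variant's time as a percentage of the index_select baseline. A minimal sketch of that arithmetic follows, assuming the percentage is simply variant time divided by baseline time. The function names are hypothetical and are not taken from the gist, whose measurement code is not shown here.

```python
def percent_of_baseline(variant_us: float, baseline_us: float) -> float:
    """Express a variant's time as a percentage of the baseline time.

    Values below 100 mean the variant is faster than the baseline.
    Assumption: this is how the percentages in the table were derived.
    """
    if baseline_us <= 0:
        raise ValueError("baseline_us must be positive")
    return 100.0 * variant_us / baseline_us


def variant_time_us(percent: float, baseline_us: float) -> float:
    """Invert a percentage to recover the variant's absolute time in us."""
    return baseline_us * percent / 100.0


if __name__ == "__main__":
    # First row of the first table: baseline 8.8 us, column [2] at 30.4%.
    print(f"{variant_time_us(30.4, 8.8):.2f} us")
```

Read this way, a percentage above 100 (as in columns [0] and [1] of the second table) means the variant was slower than the baseline for that size.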
import collections
import json
import sys
import time
import timeit
import numpy as np
import torch
torch.set_num_threads(1)
#!/bin/bash
set -e
source ~/miniconda3/etc/profile.d/conda.sh
RESULTS="/tmp/${USER}/results.txt"
> "${RESULTS}"
measure () {
  local conda_env=$1
  conda activate "${conda_env}"
Improved (>10%): 188 (17%)
Regressed (>10%): 344 (32%)
Within 10%: 553 (51%)
Improvement  Absolute | dtype    numel  mask_reuse  mask_true_pct  x_layout    mask_layout
==========================================================================================
       -98%    1.6 us | float64     33          33            67%  contiguous  contiguous
       -97%    1.6 us | float32     33          33            67%  contiguous  contiguous
       -97%    1.4 us | int8        10           1            92%  contiguous  contiguous
       -91%    1.5 us | float32      6           6            36%  contiguous  contiguous
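The Improved / Regressed / Within summary can be reproduced by bucketing each configuration's relative change against a threshold. The sketch below is one way to do that. The sign convention (negative change means faster) and the name `summarize` are assumptions, since the gist's own analysis code is not visible in this listing.

```python
from collections import Counter


def summarize(rel_changes, threshold=0.10):
    """Bucket relative changes into Improved / Regressed / Within.

    rel_changes: iterable of (new - old) / old values.
    Assumption: a negative change means the new run is faster.
    Returns {label: (count, rounded percent of total)}.
    """
    counts = Counter()
    for delta in rel_changes:
        if delta < -threshold:
            counts["Improved"] += 1
        elif delta > threshold:
            counts["Regressed"] += 1
        else:
            counts["Within"] += 1

    total = sum(counts.values())
    summary = {}
    for label in ("Improved", "Regressed", "Within"):
        pct = round(100 * counts[label] / total) if total else 0
        summary[label] = (counts[label], pct)
    return summary


if __name__ == "__main__":
    # Synthetic data with the same bucket sizes as the summary above.
    data = [-0.5] * 188 + [0.5] * 344 + [0.0] * 553
    for label, (count, pct) in summarize(data).items():
        print(f"{label}: {count} ({pct}%)")
```

With the bucket sizes 188, 344, and 553, the rounded shares come out to 17%, 32%, and 51%, matching the summary lines.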
gpu: 128 / 128
27 samples were culled, 1893 remain
========================================
== GPU =================================
========================================
Improved (>5%): 462 ( 24%)
Regressed (>5%): 266 ( 14%)
Within 5%: 1165 ( 62%)
gpu: 128 / 128 cpu: 128 / 128
814 samples were culled, 1746 remain
========================================
== CPU =================================
========================================
Improved (>5%): 36 ( 5%)
Regressed (>5%): 626 ( 93%)
Within 5%: 8 ( 1%)
gpu: 128 / 128 cpu: 128 / 128
1066 samples were culled, 2774 remain
========================================
== CPU =================================
========================================
Improved (>5%): 385 ( 40%)
Regressed (>5%): 294 ( 31%)
Within 5%: 272 ( 29%)