NOTE: This is a question I found on StackOverflow which I’ve archived here, because the answer is so effing phenomenal.
If you are not into long explanations, see [Paolo Bergantino’s answer][2].
""" | |
SmoothQuant implementation. See: https://arxiv.org/pdf/2211.10438.pdf | |
Some details are model-specific, so the code may need tweaking. | |
""" | |
import functools | |
import torch | |
from torch import nn, Tensor | |
from typing import Dict, Iterable, Tuple | |
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") |
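The docstring points at the SmoothQuant paper, whose core is a per-input-channel scale migration: s_j = max|X_j|^α / max|W_j|^(1-α), with activations divided by s and weights multiplied by s so the product is unchanged. A minimal sketch of that step; the function names and the calibration input `act_absmax` are mine, not from this file:

```python
import torch

def smooth_scales(act_absmax: torch.Tensor,
                  weight: torch.Tensor,
                  alpha: float = 0.5) -> torch.Tensor:
    """Per-input-channel smoothing factors s_j = max|X_j|^a / max|W_j|^(1-a).

    `act_absmax` holds per-channel activation abs-max values gathered
    during calibration; `weight` is a Linear weight of shape [out, in].
    """
    w_absmax = weight.abs().max(dim=0).values.clamp(min=1e-5)
    a_absmax = act_absmax.clamp(min=1e-5)
    return a_absmax.pow(alpha) / w_absmax.pow(1.0 - alpha)

def smooth_linear(act_absmax: torch.Tensor,
                  linear: torch.nn.Linear,
                  alpha: float = 0.5) -> torch.Tensor:
    """Fold s into the weights; in the paper the matching division of the
    activations is folded into the preceding LayerNorm."""
    s = smooth_scales(act_absmax, linear.weight, alpha)
    linear.weight.data.mul_(s)  # broadcasts over the in_features dim
    return s
```

With α = 0.5 the difficulty is split evenly between activations and weights; the paper tunes α per model.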
def convert_tb_data(root_dir, sort_by=None):
    """Convert local TensorBoard event data into a pandas DataFrame.

    Takes the root directory path and recursively parses all events
    data under it.

    If `sort_by` is provided, the result is sorted by that column;
    typically `wall_time` or `step`.

    *Note* that all of the data is loaded into a single DataFrame, so
    depending on its size this may take a while.
    """
# encoding: utf-8
import bokeh.models as bkm
import bokeh.core as bkc
from bokeh.util.compiler import JavaScript

class AudioPlayerModel(bkm.layouts.Column):
    """
    Audio player using https://howlerjs.com/.
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
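The headline ratios can be read straight off the table with a little arithmetic. A small sketch that encodes the numbers (the dict keys are mine) and derives relative costs:

```python
# Latencies from the table above, in nanoseconds.
NS = {
    "l1_ref": 0.5,
    "branch_miss": 5,
    "l2_ref": 7,
    "mutex": 25,
    "main_mem": 100,
    "compress_1k_zippy": 3_000,
    "send_1k_1gbps": 10_000,
    "ssd_4k_read": 150_000,
}

def times_slower(a: str, b: str) -> float:
    """How many times slower operation `a` is than operation `b`."""
    return NS[a] / NS[b]
```

For example, a main-memory reference costs 200 L1 hits, and a random 4K SSD read costs 1500 main-memory references.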