Camel Coder camel-cdr

camel-cdr / Dockerfile
Created August 27, 2023 08:10
Simulating Tenstorrent Ocelot (now Bobcat?) RVV 1.0 core based on SonicBOOM
FROM continuumio/miniconda3
RUN apt-get update \
&& apt-get install -y build-essential wget git unzip python3 sudo file python3-vcstools libboost-dev vim cpio binutils \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
RUN conda install conda-lock=1.4
RUN git clone https://github.com/tenstorrent/chipyard/
camel-cdr / README.md
Last active May 5, 2024 14:03
Implement LMUL=8 vcompress.vm using existing LMUL=1 RVV primitives

The implementation complexity of vcompress.vm for large vector lengths and higher LMUL has been somewhat debated.

Existing implementations exhibit very poor scaling when dealing with larger operands:

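|        | VLEN | e8m1 | e8m2 | e8m4 | e8m8  |
|--------|------|------|------|------|-------|
| c906   | 128  | 4    | 10   | 32   | 136   |
| c908   | 128  | 4    | 10   | 32   | 139.4 |
| c920   | 128  | 0.5  | 2.4  | 5.4  | 20.0  |
| bobcat | 256  | 32   | 64   | 132  | 260   |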
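
To make the operation under discussion concrete, here is a minimal scalar model of `vcompress.vm` and of one possible way to build a wide compress from narrower pieces. This is a sketch under stated assumptions, not the implementation from this gist: the function names and the `chunk` parameter are hypothetical, Python lists stand in for vector registers, and the chunked version only mirrors the general idea of compressing each LMUL=1 piece, counting its set mask bits (what `vcpop.m` does), and placing the packed elements at a running offset (what a slide would do).

```python
def vcompress(src, mask, dest):
    """Scalar model of vcompress.vm.

    Elements of src whose mask bit is set are packed, in order, into the
    front of the result. The remaining elements keep the old dest values
    (tail-undisturbed behaviour is assumed here).
    """
    out = list(dest)
    k = 0
    for value, keep in zip(src, mask):
        if keep:
            out[k] = value
            k += 1
    return out


def vcompress_chunked(src, mask, dest, chunk):
    """Same result, built from chunk-sized pieces (hypothetical helper).

    Each piece stands in for one LMUL=1 register of a larger register group.
    """
    out = list(dest)
    offset = 0  # number of elements already packed into the output
    for start in range(0, len(src), chunk):
        part_src = src[start:start + chunk]
        part_mask = mask[start:start + chunk]
        count = sum(1 for m in part_mask if m)  # models vcpop.m on this piece
        packed = vcompress(part_src, part_mask, [0] * len(part_src))[:count]
        out[offset:offset + count] = packed     # models sliding into place
        offset += count
    return out
```

For example, with `src = [0, 1, 2, 3, 4, 5, 6, 7]`, `mask = [1, 0, 1, 1, 0, 0, 1, 0]` and a destination filled with `-1`, both functions return `[0, 2, 3, 6, -1, -1, -1, -1]`. In real hardware the packed output of one piece can straddle two destination registers, which the flat list hides.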
camel-cdr / README.md
Created May 21, 2024 16:24
RISC-V benchmark: spilling GPRs to different locations

Cycles for 128 iterations of a spilling function (see complex_reduction):

                  XiangShan            XuanTie C908           SpacemiT X60
             5 spills | 14 spills | 5 spills | 14 spills | 5 spills | 14 spills
stack:           2309 |      3439 |     6898 |     18220 |     6693 |     17734
fp:              3193 |      7037 |     8483 |     32325 |     8248 |     31434
rvv_best:        3210 |      7095 |     8459 |     32343 |     8250 |     31448
rvv_zvl128b:      N/A |      7837 |     9532 |     36685 |     9290 |     35550
rvv_worst_merge: 4572 |     23013 |    12042 |     50894 |    11722 |     49232
rvv_worst_slide:  N/A |     36385 |    12975 |     55166 |    12379 |     53113
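
The raw cycle counts are easier to compare when normalized to the `stack` row. The following is a small sketch that does only that arithmetic on the numbers from the table above. The names `CYCLES`, `COLUMNS` and `slowdown_vs_stack` are made up for illustration, and `N/A` entries are carried through as `None`.

```python
# Cycle counts copied from the table; column order matches COLUMNS.
COLUMNS = [
    ("XiangShan", 5), ("XiangShan", 14),
    ("XuanTie C908", 5), ("XuanTie C908", 14),
    ("SpacemiT X60", 5), ("SpacemiT X60", 14),
]
CYCLES = {
    "stack":           [2309, 3439, 6898, 18220, 6693, 17734],
    "fp":              [3193, 7037, 8483, 32325, 8248, 31434],
    "rvv_best":        [3210, 7095, 8459, 32343, 8250, 31448],
    "rvv_zvl128b":     [None, 7837, 9532, 36685, 9290, 35550],
    "rvv_worst_merge": [4572, 23013, 12042, 50894, 11722, 49232],
    "rvv_worst_slide": [None, 36385, 12975, 55166, 12379, 53113],
}


def slowdown_vs_stack(cycles):
    """Return each row divided by the stack row, rounded to 2 decimals."""
    base = cycles["stack"]
    return {
        name: [None if v is None else round(v / b, 2) for v, b in zip(row, base)]
        for name, row in cycles.items()
    }


if __name__ == "__main__":
    ratios = slowdown_vs_stack(CYCLES)
    for name, row in ratios.items():
        cells = " | ".join("N/A".rjust(6) if r is None else f"{r:6.2f}" for r in row)
        print(f"{name + ':':<17}{cells}")
```

Read this way, spilling 14 registers to `fp` on XiangShan costs roughly 2.05x the stack variant, and `rvv_worst_slide` roughly 10.58x, going purely by the figures in the table.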