@jbush001
jbush001 / QuartusVerilatorError.md
Last active July 18, 2020 20:25
Things that Quartus flags as an error that Verilator does not

(Tested with Quartus 16 and Verilator 3.912)

  • Assign a value to an enumerated type without specifying width:

    typedef enum logic[3:0] {
       FOO = 0,
       BAR = 1
    } my_enum_t;
function cubic_bezier(control_points, steps) = [
    for (t = [0 : 1 / steps : 1]) [
        for (i = [0:1])
            pow(1 - t, 3) * control_points[0][i]
            + 3 * pow(1 - t, 2) * t * control_points[1][i]
            + 3 * (1 - t) * pow(t, 2) * control_points[2][i]
            + pow(t, 3) * control_points[3][i]
    ]
];
function bezier_path(points, steps) = [

Problem Description

Currently, the primitive for inter-thread/core synchronization is a spinlock, which is supported on this processor using the sync_load and sync_store instructions. This requires busy waiting, as the processor checks the variable in a tight loop until it changes.

On a multithreaded processor, this steals cycles that other threads could use. In the worst case, several threads spin on a lock held by another thread on the same core, which slows the lock holder down and increases contention.
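
To make the cost of busy waiting concrete, here is a minimal Python sketch of a spinlock built on a toy model of the two instructions. `SyncMemory`, `try_acquire`, `release`, and `spin_count` are hypothetical names, and the reservation behavior is an assumed load-linked/store-conditional style semantics, not a description of the actual hardware.

```python
class SyncMemory:
    """Toy one-word memory modeling sync_load/sync_store.

    Assumption: semantics similar to load-linked/store-conditional.
    """

    def __init__(self):
        self.value = 0
        self.reservation = {}  # thread id -> True while its reservation is valid

    def sync_load(self, thread_id):
        # Read the word and remember that this thread is watching it.
        self.reservation[thread_id] = True
        return self.value

    def sync_store(self, thread_id, new_value):
        # The store only succeeds if nobody wrote since this thread's sync_load.
        if not self.reservation.get(thread_id, False):
            return False
        self.value = new_value
        self.reservation = {}  # a successful store invalidates all reservations
        return True


def try_acquire(mem, thread_id):
    """One iteration of the spin loop; returns True if the lock was taken."""
    if mem.sync_load(thread_id) != 0:
        return False  # lock is held, the caller has to spin again
    return mem.sync_store(thread_id, 1)


def release(mem):
    """Clear the lock word; any outstanding reservations become invalid."""
    mem.value = 0
    mem.reservation = {}


def spin_count(mem, thread_id, holder_releases_after):
    """Count the iterations a waiter wastes before it acquires the lock."""
    wasted = 0
    while not try_acquire(mem, thread_id):
        wasted += 1
        if wasted == holder_releases_after:
            release(mem)  # simulate the holder finally releasing the lock
    return wasted
```

Every iteration counted by `spin_count` is an issue slot that another thread on the same core could have used.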

Problem Description

The current Nyuzi TLB implementation caches virtual-to-physical translations at page granularity, where pages have a fixed size of 4 KB. However, the performance of programs that touch a large area of memory can be limited by the overhead of handling TLB misses. One way to mitigate this is to allow mapping pages larger than 4 KB, usually a power-of-two multiple such as 4 MB. Example use cases include mapping a physical memory alias into the kernel and mapping a graphics framebuffer (which is often contiguous).
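
As a rough illustration of why larger pages help, the following Python sketch counts how many TLB entries a contiguous region needs at each page size and shows a lookup parameterized by page size. The function names and the dictionary-based TLB are hypothetical; this is arithmetic under stated assumptions, not the hardware design.

```python
PAGE_4K = 4 * 1024
PAGE_4M = 4 * 1024 * 1024


def entries_needed(region_bytes, page_size):
    """Number of TLB entries needed to map a contiguous region (ceiling division)."""
    return (region_bytes + page_size - 1) // page_size


def translate(vaddr, tlb, page_size):
    """Translate vaddr using a dict of virtual page number -> physical page number.

    Returns None on a TLB miss.
    """
    vpn, offset = divmod(vaddr, page_size)
    ppn = tlb.get(vpn)
    if ppn is None:
        return None
    return ppn * page_size + offset
```

For a 16 MB region, `entries_needed` gives 4096 entries with 4 KB pages but only 4 with 4 MB pages, so far fewer misses are possible while walking the region.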

Implementation

Although the TLB implementation is software managed and thus technically could use any encoding for page translations,

Problem Description

Writes to high physical memory addresses, instead of going through the normal cache hierarchy, use a special I/O bus. While this is useful for relatively low-speed peripherals, it has performance limitations when used with coprocessors such as texture fetch units:

  • It can only write a single 32-bit word at a time. For vectorized compute code, this requires 32 instructions to copy a vector value into or out of it (getlane/write).
  • It takes a rollback for every read or write, since the I/O bus is shared by all cores and transactions need to globally arbitrate.
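
The 32-instruction figure in the first bullet can be reproduced with a short Python sketch. It assumes 16 lanes of 32 bits per vector register (the text does not state the vector width), and `vector_to_io_ops` is a hypothetical helper that only lists the operations, it does not model timing.

```python
NUM_LANES = 16  # assumption: 16 32-bit lanes per vector register


def vector_to_io_ops(num_lanes=NUM_LANES):
    """List the scalar operations needed to push one vector over the I/O bus."""
    ops = []
    for lane in range(num_lanes):
        ops.append(("getlane", lane))  # extract one 32-bit lane into a scalar
        ops.append(("write", lane))    # write that scalar to the I/O address
    return ops
```

Under that assumption, `len(vector_to_io_ops())` is 32, and each `write` is additionally subject to the rollback described in the second bullet.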
@jbush001
jbush001 / conventions.md
Last active June 4, 2017 22:27
Documentation Conventions

Prefer present tense

No:

This function will update the value

Yes:

This function updates the value

Motivation: To be consistent and avoid switching tenses. Present tense is slightly more concise to write. It also avoids implying something hasn't been implemented yet.

@jbush001
jbush001 / testfloat.txt
Last active May 28, 2017 22:55
Generating floating point test cases with Berkeley testfloat
git clone https://github.com/ucb-bar/berkeley-testfloat-3.git
git clone https://github.com/ucb-bar/berkeley-softfloat-3.git
cd berkeley-softfloat-3/build/Linux-386-GCC/
make
cd ../../../berkeley-testfloat-3/build/Linux-386-GCC/
make
./testfloat_gen -precision32 f32 2 f32_add | awk '{ print "{ FADD, 0x" $1 ", 0x" $2 ", 0x" $3 " }," }' > test_cases.inc
./testfloat_gen -precision32 f32 2 f32_sub | awk '{ print "{ FSUB, 0x" $1 ", 0x" $2 ", 0x" $3 " }," }' >> test_cases.inc
./testfloat_gen -precision32 f32 2 f32_mul | awk '{ print "{ FMUL, 0x" $1 ", 0x" $2 ", 0x" $3 " }," }' >> test_cases.inc
./testfloat_gen -precision32 f32 f32_to_i32 | awk '{ print "{ ITOF, 0x" $1 ", 0x" $2 ", 0x" $3 " }," }' >> test_cases.inc
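
For readers who would rather post-process the generator output in Python than awk, here is a minimal sketch of the same formatting step. `format_test_case` is a hypothetical helper, and the sample line in the usage note is made up for illustration rather than captured from testfloat_gen.

```python
def format_test_case(op_name, line):
    """Turn one whitespace-separated line of hex fields into a C initializer row.

    Mirrors the awk scripts above: only the first three fields are used,
    and a missing field becomes an empty string, as it would in awk.
    """
    fields = line.split()
    fields = (fields + ["", "", ""])[:3]
    return "{ " + op_name + ", " + ", ".join("0x" + f for f in fields) + " },"
```

For example, `format_test_case("FADD", "3F800000 40000000 40400000 00")` returns `{ FADD, 0x3F800000, 0x40000000, 0x40400000 },`.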
@jbush001
jbush001 / mktorus.py
Created May 14, 2017 13:41
Create 3D model of a Torus
#!/usr/bin/env python
#
# Copyright 2011-2015 Jeff Bush
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
@jbush001
jbush001 / jekll_html2text_wrapper.py
Created January 1, 2017 01:00
Convert HTML to markup for Jekyll
#
# The Jekyll import tool (http://import.jekyllrb.com/docs/blogger/)
# creates HTML files. I'd like to use html2text
# (https://github.com/aaronsw/html2text) to convert those to Markdown.
# The challenge is that Jekyll files have a YAML header at the top that gets
# mangled by the conversion. This strips the header, passes the remainder
# of the body into html2text, then adds the header back to the result.
#
import os
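
The strip, convert, and re-attach approach described in the comment above can be sketched as follows. This is not the gist's actual code: `split_front_matter` and `convert_preserving_header` are hypothetical names, and the converter is passed in as a parameter so the sketch runs without html2text installed.

```python
def split_front_matter(text):
    """Split a Jekyll file into (header, body).

    The header includes both '---' delimiter lines. If the file has no
    complete front matter block, the header is empty and the body is the
    whole text.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            return "".join(lines[: index + 1]), "".join(lines[index + 1:])
    return "", text  # no closing delimiter; treat everything as body


def convert_preserving_header(text, convert):
    """Run convert() on the body only and put the YAML header back on top."""
    header, body = split_front_matter(text)
    return header + convert(body)
```

With html2text installed, passing `html2text.html2text` as `convert` should give the Markdown body with the original header left untouched.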
@jbush001
jbush001 / import_blogger_images.py
Created December 31, 2016 16:48
Import blogger images into Jekyll
# Jekyll will import posts from Blogger, but they still contain image
# references to Blogger's CDN. This script:
# - Finds all image references in an imported blogger page
# - Downloads the images into the assets/ directory
# - Rewrites the page with the appropriate image link
import re
import sys
import urllib
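
The find and rewrite steps listed in the comment above can be sketched like this. It is a hedged illustration, not the gist's code: `rewrite_image_links`, the `IMG_SRC` pattern, and the `/assets` default are assumptions, only double-quoted `src` attributes are handled, and the download step is left out.

```python
import os
import re
from urllib.parse import urlparse

# Assumption: image references look like <img ... src="URL" ...>.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def rewrite_image_links(page, assets_dir="/assets"):
    """Point every img src at assets_dir.

    Returns (new_page, urls), where urls lists the original image URLs
    in the order they were found so the caller can download them.
    """
    found = []

    def replace(match):
        url = match.group(2)
        found.append(url)
        filename = os.path.basename(urlparse(url).path)
        return match.group(1) + assets_dir + "/" + filename + match.group(3)

    return IMG_SRC.sub(replace, page), found
```

Each URL in the returned list could then be fetched into the assets directory, for instance with `urllib.request.urlretrieve` on Python 3.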