Sophia Wisdom sophiawisdom
* Many different layers of hierarchy -- 128 SMs/GPU, 4 SMSPs/SM, 32 threads/lanes per SMSP
* Coordination is cheaper the lower you go in the hierarchy, but you can scale more the higher you go, so it's key to think about how to trade these off
* Lots of registers
* Many functional units, each can operate independently
* Combine these: you can do super-hyperthreading, which gets you great utilization of your functional units at the cost of latency
* Every (vector) operation happens in units of 32
* Doing the same thing/vectorization/loss of control matters, but only on the scale of 32 things -- it's free to have different cores do different things
* Memory coalescing is an important optimization
  * density, not locality
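A rough way to see the coalescing point is to count how many aligned memory segments a single 32-lane load touches. The sketch below is a toy model, not a measurement: the 128-byte segment size, the 4-byte element size, and the helper name `segments_touched` are all assumptions chosen for illustration. It also reflects one reading of "density, not locality": the order in which lanes pick their elements doesn't matter here, only how tightly the addresses pack.

```python
import random

WARP_SIZE = 32        # lanes per warp, as in the notes above
SEGMENT_BYTES = 128   # assumed size of one aligned memory segment (illustrative)
ELEM_BYTES = 4        # assumed element size, e.g. a float32


def segments_touched(element_indices):
    """Count distinct aligned segments covered by one warp-wide load (toy model)."""
    return len({(i * ELEM_BYTES) // SEGMENT_BYTES for i in element_indices})


# Lane k reads element k: 32 * 4 bytes fit in a single segment.
contiguous = list(range(WARP_SIZE))

# Same 32 elements, but assigned to lanes in a shuffled order: still dense.
permuted = contiguous[:]
random.Random(0).shuffle(permuted)

# Lane k reads element 32 * k: every lane lands in its own segment.
strided = [lane * 32 for lane in range(WARP_SIZE)]

print(segments_touched(contiguous))  # 1
print(segments_touched(permuted))    # 1
print(segments_touched(strided))     # 32
```

In this model the strided pattern needs 32 segment fetches for the same 128 bytes of useful data, which is the cost coalescing avoids.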
@sophiawisdom
sophiawisdom / index.html
Last active July 12, 2024 02:20
Screenshot hoster. Download these two files, put your screenshots in the same directory, run lister.py, and you have a fun website that shows all your screenshots.
<html>
<!-- change the title below to whatever you want! -->
<title>vibe images</title>
<!-- Add any text here if you so desire; the images won't be drawn over it. -->
<div id="container">
</div>
<script>
function shuffleArray(array) {
    for (let i = array.length - 1; i > 0; i--) {
        const j = Math.floor(Math.random() * (i + 1));
422YpCbCr8: False
AV1: False
Animation: False
AppleProRes422: True
AppleProRes422HQ: True
AppleProRes422LT: True
AppleProRes422Proxy: True
AppleProRes4444: True
AppleProRes4444XQ: True
AppleProResRAW: True
sophiawisdom / hw_accel.py
Last active March 17, 2024 13:30
First, `pip3 install PyObjc`, then run this script!
import VideoToolbox

for variable in dir(VideoToolbox):
    # https://developer.apple.com/documentation/coremedia/1564239-video_codec_constants?language=objc
    if not variable.startswith("kCMVideoCodecType_"): continue
    num = getattr(VideoToolbox, variable)
    # https://twitter.com/never_released/status/1769353248484258158
    VideoToolbox.VTRegisterSupplementalVideoDecoderIfAvailable(num)
    # https://developer.apple.com/documentation/videotoolbox/2887343-vtishardwaredecodesupported?language=objc
    print(f"{variable[18:]}:\t{VideoToolbox.VTIsHardwareDecodeSupported(num)}")
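The `kCMVideoCodecType_` constants are four-character codes packed into 32-bit integers, so the `num` values in the script can be turned back into readable tags. The helpers below are a small sketch that runs without PyObjC; the names `fourcc_to_str` and `str_to_fourcc` are hypothetical, and it assumes the constants arrive in Python as plain integers.

```python
def fourcc_to_str(code: int) -> str:
    """Unpack a 32-bit four-character code into its 4-character tag."""
    # Big-endian: the first character lives in the most significant byte.
    return code.to_bytes(4, "big").decode("ascii", errors="replace")


def str_to_fourcc(tag: str) -> int:
    """Pack a 4-character ASCII tag into a 32-bit integer."""
    if len(tag) != 4:
        raise ValueError("a four-character code needs exactly 4 characters")
    return int.from_bytes(tag.encode("ascii"), "big")


# 'avc1' is the tag behind kCMVideoCodecType_H264.
h264 = str_to_fourcc("avc1")
print(hex(h264))            # 0x61766331
print(fourcc_to_str(h264))  # avc1
```

Printing `fourcc_to_str(num)` next to each result in the loop above would show the tag alongside the constant name.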
[Wilson] #Texans center Scott Quessenberry has torn his ACL and MCL, per a league source
https://twitter.com/AaronWilson_NFL/status/1687188173451870209?s=20
/u/BungoPlease
Good thing we drafted two Centers lol
/u/JPAnalyst
Between all of your centers, you still have 5 ACLs and 5 MCLs remaining...for a total of 10 CLs. That’s probably more than most teams.
/u/BungoPlease
This is the kind of hard hitting analysis that makes me love you
/u/atworkjohnny
He says what we're all thinking.
// I'm assuming these are 8 bytes for the moment because C doesn't have generic functions.
#include <stdlib.h>  // malloc

struct list {
    int capacity;
    int start_i;
    int end_i;
};

struct list *allocate(int size) {
    int list_size = size * sizeof(void *);
    struct list *stuff = malloc(list_size + sizeof(struct list));
#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#define NUM_RUNS 100000
int simulate(int iterations, int bankroll, int bet, int increment) {
int rand;
Analysis of sampling speed_test (pid 23830) every 1 millisecond
Process: speed_test [23830]
Path: /Users/USER/Library/Developer/Xcode/DerivedData/Stimulator-byqvmoledzphfpbaqwnyahjuqrgn/Build/Products/Release/speed_test
Load Address: 0x100c30000
Identifier: speed_test
Version: 0
Code Type: X86-64
Parent Process: Xcode [22998]
Date/Time: 2021-08-27 22:33:24.213 -0700
Need a more recent version of /usr/lib/system/introspection/libdispatch.dylib -- missing suppport for the queue_item_complete hook function.
Location is /var/root/Library/Developer/Xcode/DerivedData/mods-hiqpvfikerrvwrbgoskpjqwmglif/Build/Products/Debug/daemon
Input PID to pause: 36907
Injecting into PID 36907
Magic number is -17958193
got a command of type 25
got a command of type 25
got a command of type 25
got a command of type 25