Skip to content

Instantly share code, notes, and snippets.

why doesn't radfft support AVX on PC?

So there's two separate issues here: using instructions added in AVX and using 256-bit wide vectors. The former turns out to be much easier than the latter for our use case.

Problem number 1 was that you positively need to put AVX code in a separate file with different compiler settings (/arch:AVX for VC++, -mavx for GCC/Clang) that make all SSE code emitted also use VEX encoding, and at the time radfft was written there was no way in CDep to set compiler flags for just one file, just for the overall build.

[There's the GCC "target" annotations on individual funcs, which in principle fix this, but I ran into nasty problems with this for several compiler versions, and VC++ has no equivalent, so we're not currently using that and just sticking with different compilation units.]

The other issue is to do with CPU power management.

@acolyer
acolyer / jessfraz.md
Created November 19, 2017 13:39
Containers, operating systems and other fun things from The Morning Paper

FWIW: I (@rondy) am not the creator of the content shared here, which is an excerpt from Edmond Lau's book. I simply copied and pasted it from another location and saved it as a personal note, before it gained popularity on news.ycombinator.com. Unfortunately, I cannot recall the exact origin of the original source, nor was I able to find the author's name, so I am can't provide the appropriate credits.


Effective Engineer - Notes

What's an Effective Engineer?

@harlow
harlow / golang_job_queue.md
Last active April 24, 2024 10:21
Job queues in Golang
@thebucknerlife
thebucknerlife / authentication_with_bcrypt_in_rails_4.md
Last active January 17, 2024 23:54
Simple Authentication in Rail 4 Using Bcrypt

#Simple Authentication with Bcrypt

This tutorial is for adding authentication to a vanilla Ruby on Rails app using Bcrypt and has_secure_password.

The steps below are based on Ryan Bates's approach from Railscast #250 Authentication from Scratch (revised).

You can see the final source code here: repo. I began with a stock rails app using rails new gif_vault

##Steps

@1wErt3r
1wErt3r / SMBDIS.ASM
Created November 9, 2012 22:27
A Comprehensive Super Mario Bros. Disassembly
;SMBDIS.ASM - A COMPREHENSIVE SUPER MARIO BROS. DISASSEMBLY
;by doppelganger (doppelheathen@gmail.com)
;This file is provided for your own use as-is. It will require the character rom data
;and an iNES file header to get it to work.
;There are so many people I have to thank for this, that taking all the credit for
;myself would be an unforgivable act of arrogance. Without their help this would
;probably not be possible. So I thank all the peeps in the nesdev scene whose insight into
;the 6502 and the NES helped me learn how it works (you guys know who you are, there's no
@hellerbarde
hellerbarde / latency.markdown
Created May 31, 2012 13:16 — forked from jboner/latency.txt
Latency numbers every programmer should know

Latency numbers every programmer should know

L1 cache reference ......................... 0.5 ns
Branch mispredict ............................ 5 ns
L2 cache reference ........................... 7 ns
Mutex lock/unlock ........................... 25 ns
Main memory reference ...................... 100 ns             
Compress 1K bytes with Zippy ............. 3,000 ns  =   3 µs
Send 2K bytes over 1 Gbps network ....... 20,000 ns  =  20 µs
SSD random read ........................ 150,000 ns  = 150 µs

Read 1 MB sequentially from memory ..... 250,000 ns = 250 µs

@jboner
jboner / latency.txt
Last active June 8, 2024 14:56
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns 14x L1 cache
Mutex lock/unlock 25 ns
Main memory reference 100 ns 20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy 3,000 ns 3 us
Send 1K bytes over 1 Gbps network 10,000 ns 10 us
Read 4K randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD