Skip to content

Instantly share code, notes, and snippets.

View RestartToCursorCommand.cs
// This is "restart to cursor", a complement to VS's "run to cursor" which will first restart (and recompile if you have
// it set up like that) the currently running program (if any) and then execute the "run to cursor" command.
// To make an extension on your own, create a C# VSIX template, then Add Item -> Extensibility -> Command and replace the
// file with the contents of this Gist. I bind it to Ctrl-Shift-F10 since Ctrl-F10 is "run to cursor".
using EnvDTE80;
using EnvDTE;
using Microsoft.VisualStudio.Shell;
using Microsoft.VisualStudio.Shell.Interop;
using System;
pervognsen /
Last active Jan 9, 2022
My Visual Studio setup for C++ development

I recently upgraded to Windows 11 and had to set up Visual Studio C++ again. A few things have changed in Windows 11 with the tiled window management and deeper Windows Terminal integration, so I ended up revisiting my usual VS setup and made a bunch of changes. I'm documenting them here for my own future benefit (yes, I do know about VS import/export settings) and on the off chance that anyone else might get some ideas for their own setup.

It should be mentioned that I'm a single monitor user (multimon gives me neck pain as I've gotten older) and the monitor I currently use is 25". A laptop screen can't really accommodate the 2x2 full screen layout I'm using as my default here, so when I'm on a laptop I only use the vertical layout out of necessity.


x64 addss:

VADDSS (EVEX encoded versions)
    IF (EVEX.b = 1) AND SRC2 *is a register*

IF k1[0] or no writemask

View multistep.asm
multistep: # @multistep
push r15
push r14
push r13
push r12
push rbx
mov rcx, qword ptr [rsi]
mov r10, qword ptr [rsi + 8]
mov rdx, qword ptr [rsi + 24]
mov r11, qword ptr [rsi + 32]
View puzzle.c
// For N = 11, a call to perms(p, N, 0) only takes 0.36 seconds.
// Why would a debugger seemingly hang forever when you try to
// step over the outermost recursive call to perms in the loop?
// Hint: How is "step over" ("next" in gdb/lldb) implemented?
u64 perms(u64 p[N], int n, u64 s) {
if (n == 1) {
s += signature(p);
} else {
for (int i = 0; i < n; i++) {
View rh_grow.c
// This can grow a Robin Hood linear probing hash table near word-at-a-time memcpy speeds. If you're confused why I use 'keys'
// to describe the hash values, it's because my favorite perspective on Robin Hood (which I learned from Paul Khuong)
// is that it's just a sorted gap array which is MSB bucketed and insertion sorted per chain:
// The more widely known "max displacement" picture of Robin Hood hashing also has strengths since the max displacement
// can be stored very compactly. You can see a micro-optimized example of that here for small tables where the max displacement
// can fit in 4 bits: Sub-nanosecond Searches Using Vector Instructions,
void grow(Table *table) {
u64 exp = 64 - table->shift;
// We grow the table downward in place by a factor of 2 (not counting the overflow area at table->end).
View tagged_prefetched_eval.c
u32 index(Ref r) {
return r >> 4;
u32 tag(Ref r) {
return r & 15;
View value_speculation.c
// Estimating CPU frequency...
// CPU frequency: 4.52 GHz
// sum1: value = 15182118497126522709, 0.31 secs, 5.14 cycles/elem
// sum2: value = 15182118497126522709, 0.17 secs, 2.93 cycles/elem
#define RW(x) asm("" : "+r"(x))
typedef struct Node {
u64 value;
struct Node *next;
View mispredict_penalty.c
#define NOP ({ int x; asm("nop" : "=r"(x)); asm("" : : "r"(x)); })
double calc_mispredict_penalty(void) {
i64 n = 1ull << 28;
u64 rnglen = (n + 63) / 64;
u64 *rng = malloc(rnglen * sizeof(u64));
for (u64 i = 0; i < rnglen; i++)
rng[i] = random_u64();
View vm_bench.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#define BREAK asm("int3")
#error Implement macros for your CPU.