@mmozeiko
Last active March 8, 2026 18:47
How to avoid linking to CRT with MSVC in modern way

Compile & link like this:

cl.exe main.c /nologo /W3 /WX /O2 /GS- /link /fixed /incremental:no /opt:icf /opt:ref /subsystem:windows libvcruntime.lib

and put your code in:

  • int WinMainCRTStartup() if you use /subsystem:windows
  • int mainCRTStartup() if you use /subsystem:console

For C++ code add extern "C" in front of these functions.

Instead of relying on various hacks and manually declaring implicit symbols for functions or globals that the compiler needs, just let it use them from libvcruntime.lib - there are very few things it will take from there. The linker will take only the needed symbols and remove unreferenced ones (due to /opt:ref).

Explanation of arguments:

  • /nologo - do not display cl.exe/link.exe headers in output, less garbage to look at
  • /W3 /WX - enable level 3 warnings and treat warnings as errors, to better catch errors in your code
  • /O2 - optimizes for speed; use /O1 instead if you prefer smaller code size over speed
  • /GS- - prevents adding buffer check code that requires extra CRT runtime functionality
  • /fixed - does not add relocation section to .exe file, it's not really needed for .exe (so you get a bit smaller size), but do NOT use this when building .dll files, those need relocations
  • /incremental:no - does not generate extra code needed for incremental linking
  • /opt:icf - perform identical code folding: when multiple functions compile to identical code bytes, merge them into one copy in the output binary
  • /opt:ref - remove unreferenced functions/globals
  • /subsystem:windows - create "gui" executable without console attached

Don't forget to add any extra .lib files you might need - like kernel32.lib, user32.lib, etc...

Details

cl.exe likes to generate calls to memcpy and memset when you initialize & use larger arrays or structures. It also expects the _chkstk function when local variables on the stack exceed the page size (4KB), and for legacy reasons it expects an initialized global _fltused variable when you use floating point. All of this can be handled manually in your code, but instead you can simply let the linker take the code for this from the CRT (libvcruntime.lib) and not deal with it at all. These functions are small and standalone, so they won't pull in the rest of the CRT. As long as you don't call any CRT function yourself, no CRT code will run other than these few implicit calls to memcpy/memset/_chkstk.
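
To make the implicit calls more concrete, here is a small sketch of ordinary-looking C that typically triggers them. All names (Particles, make_empty, copy_particles, sum_scratch, half) are hypothetical, and whether the compiler emits a real call or inlines the operation depends on the compiler version and optimization level, so treat it as an illustration rather than a guarantee.

```c
#include <stddef.h>

/* Hypothetical structure, large enough (about 2KB) that the compiler
   prefers library calls over inline stores. */
typedef struct {
    float position[256];
    float velocity[256];
    int   alive;
} Particles;

/* Zero-initializing a large struct usually becomes a memset call. */
Particles make_empty(void)
{
    Particles p = {0};
    p.alive = 1;
    return p;
}

/* Assigning one large struct to another usually becomes a memcpy call. */
void copy_particles(Particles* dst, const Particles* src)
{
    *dst = *src;
}

/* More than one page (4KB) of locals makes the compiler insert a
   _chkstk stack probe at the start of the function. */
int sum_scratch(int seed)
{
    unsigned char scratch[8192];
    int total = 0;
    for (size_t i = 0; i < sizeof(scratch); i++)
        scratch[i] = (unsigned char)(seed + (int)i);
    for (size_t i = 0; i < sizeof(scratch); i++)
        total += scratch[i];
    return total;
}

/* Floating point arithmetic makes the compiler reference _fltused. */
float half(float x)
{
    return x * 0.5f;
}
```

None of these functions mentions memset, memcpy, _chkstk or _fltused in source, which is why the missing symbols only show up at link time.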

Limitations

There is a limited set of headers you can include and use. A few common ones that are OK to use:

  • stddef.h - if you want size_t and NULL
  • stdint.h - various u/intXX_t typedefs
  • stdarg.h - va_list, va_start, va_arg, va_end things
  • intrin.h and few other headers for intrinsic functions (rdtsc, cpuid, SSE, SSE2, etc..)
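
As a rough illustration of why these headers are safe, the sketch below uses only stddef.h, stdint.h and stdarg.h. They provide typedefs and compiler built-in macros, so nothing here should need CRT code at link time. The function name sum_values is hypothetical.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums `count` 32-bit integers passed as varargs.
   va_list/va_start/va_arg/va_end are handled by the compiler itself,
   so this function does not call into the CRT. */
int64_t sum_values(size_t count, ...)
{
    va_list args;
    int64_t total = 0;

    va_start(args, count);
    for (size_t i = 0; i < count; i++)
        total += va_arg(args, int32_t);
    va_end(args);

    return total;
}
```

For example, sum_values(3, 1, 2, 3) adds the three trailing arguments.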

Because no CRT initialization code runs on startup anymore, you cannot use several compiler features that depend on this global initialization, for example:

  • no thread local storage (TLS)
  • no global constructors & destructors in C++
  • no RTTI in C++
  • no pure virtual member functions in C++
  • no exceptions

It would be possible to add extra code to handle these, but alternative is simply to manually handle these things - TlsAlloc/TlsGetValue & other functions for TLS, manually calling functions for initialization instead of global constructors, etc..
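
A minimal sketch of the "call your initialization manually" idea, written in C: a global table that a C++ global constructor might otherwise fill is set up by an explicit function, called as the first thing in the entry point. The names g_crc_table, init_globals, crc32_bytes and app_entry are hypothetical; app_entry stands in for the body of WinMainCRTStartup or mainCRTStartup.

```c
#include <stdint.h>

/* Global state that needs run-time initialization (hypothetical example:
   the standard CRC-32 lookup table). */
static uint32_t g_crc_table[256];

/* Explicit replacement for a global constructor. */
void init_globals(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        g_crc_table[i] = c;
    }
}

/* Uses the table, so it is only valid after init_globals() has run. */
uint32_t crc32_bytes(const void* data, uint32_t size)
{
    const unsigned char* bytes = (const unsigned char*)data;
    uint32_t crc = 0xFFFFFFFFu;
    for (uint32_t i = 0; i < size; i++)
        crc = g_crc_table[(crc ^ bytes[i]) & 0xFF] ^ (crc >> 8);
    return ~crc;
}

/* Stand-in for the body of the real entry point. */
int app_entry(void)
{
    init_globals();   /* must happen before anything touches the globals */
    return crc32_bytes("123456789", 9) == 0xCBF43926u ? 0 : 1;
}
```

The price of this approach is that the order of initialization is now your responsibility, which is also its advantage: it is explicit and visible in one place.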

Debug builds & address sanitizer

Unfortunately, to use address sanitizer you should use the normal CRT. Without proper CRT startup it will catch only some of the errors, not all of them. So to run properly with the debug CRT, use the normal entry point main or WinMain in your code, and compile as:

cl.exe main.c /nologo /W3 /WX /MTd /Od /Zi /fsanitize=address /link /incremental:no /subsystem:windows

The different arguments from before:

  • /MTd - request debug runtime
  • /Od - disables optimizations
  • /Zi - enables debug info generation (automatically passes /debug to linker to create .pdb file)
  • /fsanitize=address - enables address sanitizer
  • /RTC1 - not in the command above: it enables extra run-time error checks for simple variables/arrays on stack, but Microsoft's documentation lists the /RTC options as incompatible with /fsanitize=address, so add it only to debug builds that do not use the sanitizer

No additional libraries are needed, because the compiler selects the debug CRT libraries automatically based on the /MTd argument.
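
One possible way to keep a single source file for both kinds of builds is to put the program body in its own function and pick the entry point with a preprocessor macro. This is only a sketch: the macro names APP_NO_CRT and APP_WITH_CRT and the function app_run are hypothetical, they would be passed with cl.exe's /D option, and the example assumes /subsystem:console so that the CRT entry point is plain main.

```c
/* Hypothetical application body shared by both build flavors. */
int app_run(void)
{
    /* ... real work goes here ... */
    return 0;
}

#if defined(APP_NO_CRT)
/* Optimized build without CRT startup, e.g. compiled with /DAPP_NO_CRT:
   the OS loader jumps straight here. */
int mainCRTStartup(void)
{
    return app_run();
}
#elif defined(APP_WITH_CRT)
/* Debug / address sanitizer build, e.g. compiled with /DAPP_WITH_CRT:
   the CRT's own startup code runs first and then calls main. */
int main(void)
{
    return app_run();
}
#endif
```

With this layout only the last few lines differ between the two builds, so bugs found under the sanitizer are bugs in the same code that ships.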

Formatting strings, the easy way

Sometimes you want to use snprintf or similar C formatting functions. There are various options for this:

  • use wsprintfA/W from user32.dll - limits functionality, max 1024 output, no floats, and no C99 formatters (like %zu)
  • use wnsprintfA/W from shlwapi.dll - similar limits to wsprintfA/W as above
  • use _snprintf/_snwprintf from msvcrt.dll - this dll is present on Windows since forever, but it does not support C99 formatters
  • use stb_sprintf.h or c99-snprintf (alternative location) or nanoprintf.h standalone single-file libraries, just adds a bit more code to your executable
  • use stdio functions from Universal CRT dll

In the UCRT case we can let the linker link to the dynamic UCRT dll, which is always present on Windows since Windows 10. If you don't need to support anything older than Win10 then you get all the goodies like C99 formatters for free. Just add ucrt.lib to the linker arguments and you can call sprintf and similar functions. Same as above - it won't add much of the CRT to your executable, just an imported function call that takes almost no space.

In general you can use extra functions from UCRT if you don't feel like implementing them yourself. For example, sinf or cosf from math.h - they will be linked to the UCRT .dll files. Be careful with the debug dll runtime - it may not work if you don't use the proper entry point with the regular CRT setup (see the section on debug builds above).

Example

#include <windows.h>
#include <stdio.h>

#pragma comment (lib, "kernel32")
#pragma comment (lib, "user32")

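int WinMainCRTStartup(void)
{
	// 8KB of locals exceeds the 4KB page size, so the compiler emits a _chkstk call (taken from libvcruntime.lib)
	char str[8192];
	// snprintf resolves to an import from the UCRT dll (via ucrt.lib)
	snprintf(str, sizeof(str), "Hello %s!\n", "World");
	MessageBoxA(0, str, "Example", 0);
	return 0;
}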

Run the following:

cl.exe main.c /nologo /W3 /WX /O2 /GS- /link /fixed /incremental:no /opt:icf /opt:ref /subsystem:windows libvcruntime.lib ucrt.lib

It will produce a ~3KB exe with only two imported functions:

  • __stdio_common_vsprintf from api-ms-win-crt-stdio-l1-1-0.dll
  • MessageBoxA from user32.dll

You can use dumpbin /nologo /imports main.exe to check for this.

Bonus round - clang-cl

If you're using the clang-cl compiler, you can add the extra /clang:-fno-asynchronous-unwind-tables argument to omit generation of unwind tables in optimized binaries. This will drop a few extra KBs off your executable size. But it will prevent the debugger from working properly in your code - also no exceptions, no profiler call stacks, and no other tools that rely on call stack unwinding.

@uucidl

uucidl commented Jan 29, 2025

With the libvcruntime.lib solution, I'm getting linking errors due to memset and memcpy needing these symbols: __favor and __isa_available to choose optimized implementations!

@mmozeiko
Author

Yeah, nowadays memcpy/memset have AVX-optimized versions, and they expect CRT startup to initialize the variables that tell them which AVX or SSE level they can use.
You can make it work by linking to libcmt.lib and calling the extern function void __isa_available_init(); at the start of your program. But it will bring more code in.

If you really want to optimize for size, then you can declare your own memcpy/set functions and redirect them to kernel/ntdll routines:

#undef RtlFillMemory
#undef RtlMoveMemory
DECLSPEC_IMPORT void WINAPI RtlFillMemory(void* Destination, size_t Length, char Fill);
DECLSPEC_IMPORT void WINAPI RtlMoveMemory(void* Destination, const void* Source, size_t Length);

#pragma function (memset)
void* memset(void* dst, int value, size_t count)
{
    RtlFillMemory(dst, count, value);
    return dst;
}

#pragma function (memcpy)
void* memcpy(void* dst, const void* src, size_t count)
{
    RtlMoveMemory(dst, src, count);
    return dst;
}

But they won't be as fast as the MSVC CRT memcpy. And this will prevent the code from being compiled with the /GL argument for link time optimizations.

Another option is to use movsb/stosb intrinsics:

#include <intrin.h>

#pragma function (memset)
void* memset(void* dst, int value, size_t count)
{
    __stosb(dst, value, count);
    return dst;
}

#pragma function (memcpy)
void* memcpy(void* dst, const void* src, size_t count)
{
    __movsb(dst, src, count);
    return dst;
}

Similar caveats.

A third option, which is probably the best: produce an import library for the memcpy/memset functions in ucrtbase.dll and link to that .lib.

@uucidl

uucidl commented Jan 31, 2025

My use case was trying to produce an executable that's as compatible as can be across various versions of Windows and x64 CPUs, so I actually exported __favor and __isa_available symbols, setting them to the base ISA. That way I get non optimized memcpy but that's alright for my use case!

@joakimskoog

joakimskoog commented Jun 11, 2025

Out of curiosity: Why aren't you using /nodefaultlib? Is it because you pass libvcruntime and use that instead of the old hacks?

@mmozeiko
Author

Because that prevents using #pragma comment (lib, "user32") in your code. Which I like to use.

/nodefaultlib does not really matter for this approach, because the linker will find all the symbols in the .lib files passed as arguments. So it won't care about code in the "default" libs.

@joakimskoog

joakimskoog commented Jun 11, 2025

Yeah I ran into that issue as well when adding /NODEFAULTLIB to my build script. I assumed the flag would act as a guard to prevent you from accidentally adding CRT stuff to your code, like malloc for example, or doing stuff that makes the compiler try to sneak CRT stuff in?

But you're saying I can skip it as long as I'm careful?

@mmozeiko
Author

/nodefaultlib won't help you catch accidental malloc or similar imports if you link with ucrt.lib. It can help only if you avoid ucrt completely. Nowadays I prefer linking with ucrt.lib and not worrying much about it, as ucrt is present by default on all Windows versions you realistically want to ship for.

@joakimskoog

Ah, now I understand! I'm trying to avoid ucrt hence the confusion. Thanks for taking the time to explain, I really appreciate it 👍

@nedclimax

It seems that adding /merge:.CRT=.rdata to the linker flags makes thread local storage work properly when not linking with the CRT

@mmozeiko
Author

Compiler TLS will not work properly without using the CRT, or at least replicating the code the CRT uses to initialize TLS. You can check how it is implemented in the CRT source code that ships with MSVC. I cannot check right now, but if I'm not mistaken that is inside the tlssup.cpp file. And for dll's it was in a couple of other tls*.cpp files.

You can always ignore that and just use OS provided TlsAlloc/Get/SetValue API calls that will work without any CRT support.

@nedclimax

nedclimax commented Jul 1, 2025

It actually seems that compiler TLS only works when using POD (Plain-Old-Data) structs, since that part is handled by the OS loader and not the C runtime. It's just that when you declare something with a constructor you actually need the C runtime to call the constructor.

So with C (where all structs are POD) you can actually not link with the C runtime and use TLS because the OS already initializes it for you. It's when using C++ that you have to be careful to not put an object with a constructor at the global scope.

I have tested this myself and put the code here: https://github.com/nedclimax/tls_test

@fxrstor

fxrstor commented Aug 8, 2025

@mmozeiko What if I'm using Visual Studio 2022? How would I do this? Add to linker options? Also you're not specifying /NODEFAULTLIB so how can you say that CRT is not being linked? (I just can't make sense of it).

@mmozeiko
Author

mmozeiko commented Aug 8, 2025

There is nothing special about 2022 version of VS. Everything works the same. I don't need to use /nodefaultlib because in optimized builds linker strips out everything that's not used when walking symbols starting from entrypoint. So if you don't call any CRT functionality, you won't get it. Only exceptions are those memcpy/set/_chkstk/_fltused things that compiler implicitly generates.

@purton

purton commented Nov 2, 2025

Great info! I wanted to test compiling with Clang with only the Windows SDK installed, so no libvcruntime.lib. After including Windows.h there are errors about missing files and definitions. Those can be easily fixed by adding a couple of empty files and a few definitions. Here is an example. After those, a simple application that opens a window already works!

For memset, using __stosb or __builtin_memset gives me a stack overflow from infinite recursion. But calling stosb from assembly works well! w64devkit/libmemory has assembly versions of memset, memcpy, memmove, memcmp, and strlen. w64devkit/libchkstk has a __chkstk implementation, which will be required when allocating over 4kb on stack. Here is more about the __chkstk implementation, w64devkit is available under public domain.

For math, when using __builtin_sinf, __builtin_cosf they are linked to UCRT (api-ms-win-crt-math-l1-1-0.dll). For snprintf I had to define the prototype for __stdio_common_vsprintf and then that was also correctly linked to UCRT (api-ms-win-crt-stdio-l1-1-0.dll). I couldn't include math.h or stdio.h from UCRT as those would try to include vcruntime.h.

@erziltis

It seems like /fsanitize=address and /RTC1 are not compatible. (see msdn)
