In this article I'm going to attempt to explain most of what I know about Windows file paths and also some of the weird DOSisms that keep things interesting.
I'll start with NT kernel paths. These aren't usually used directly from user space but I promise they're important to fully understanding Win32 paths.
In Windows everything is an object. And if the object has a name it can be accessed via the kernel's object manager. The kernel uses paths to query the object manager. These look similar to a UNIX path. For example:
\Device\HarddiskVolume2\directory\file.ext
As you're likely aware, a path is made up of "components", separated by a `\`. Each component represents a directory name or a file name. In NT, components are arrays of UTF-16 code units. Any character except `\` (0x005C) is allowed in component names. Even NULL (0x0000) is allowed.
If a directory is opened, kernel APIs allow you to open sub paths based on that directory. For example if you open the directory:
\Device\HarddiskVolume2\directory
You can then open a relative path, like so:
subdir\file.ext
So the absolute path of the file will be:
\Device\HarddiskVolume2\directory\subdir\file.ext
This is the only type of relative path understood by the kernel. In the NT kernel `.` and `..` have no special meaning and can be regular files or directories (but almost certainly shouldn't be).
Device paths such as `\Device\HarddiskVolume2` are all very well but often you want a more meaningful or consistent name. To this end NT supports symbolically linking from one path to another. Many of these meaningful names are collected into a single NT folder: `\??`.
For example, to access a drive by its GUID you can use:
\??\Volume{a2f2fe4e-fb6b-4442-9244-1342c61c4067}
Or you can use a friendly drive name:
\??\C:
The `:` here has no special meaning. It's just part of the symlink name.
While the kernel allows almost anything in component names, filesystems may be more restrictive. For example, an NT path can include a component called `C:` but a filesystem may not allow you to create a directory with that name.
Microsoft's filesystem drivers will not allow the following characters in component names:
Disallowed | Description |
---|---|
\ / | Path separators |
: | DOS drive and NTFS file stream separator |
* ? | Wildcards |
< > " | DOS wildcards |
\| | Pipe |
NUL to US | ASCII control codes; aka Unicode C0 control codes (U+0000 to U+001F inclusive). Note that DEL (U+007F) is allowed. |
Each component in a path is currently limited to 255 UTF-16 code units.
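The table and the length limit above can be turned into a small check. This is only a sketch: `is_valid_component` is a hypothetical helper, not a Windows API, and rejecting an empty name is an assumption of the sketch rather than something stated above. Because it takes a Rust `&str`, it also cannot represent names containing isolated surrogates.

```rust
/// Checks one path component against the restrictions listed above.
/// Hypothetical helper for illustration only.
fn is_valid_component(name: &str) -> bool {
    // The limit is counted in UTF-16 code units, not bytes or chars.
    let units = name.encode_utf16().count();
    // Assumption of this sketch: an empty name is rejected.
    if units == 0 || units > 255 {
        return false;
    }
    !name.chars().any(|c| {
        // Separators, stream separator, wildcards, DOS wildcards, pipe.
        matches!(c, '\\' | '/' | ':' | '*' | '?' | '<' | '>' | '"' | '|')
            // C0 control codes U+0000 to U+001F; DEL (U+007F) is allowed.
            || c <= '\u{1F}'
    })
}
```

Note that a character outside the Basic Multilingual Plane counts as two code units towards the limit.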
Filesystem paths may or may not be case sensitive. In Windows they are typically case insensitive but this cannot always be assumed. In some circumstances case sensitivity can even differ on a per directory basis.
The disallowed characters above apply to component names, but NTFS understands an additional syntax: file streams. Each file (including directories) can have multiple streams of data. You can address them like so:
file.ext:stream_name
Which is also equivalent to:
file.ext:stream_name:$DATA
The stream name cannot contain a NULL (0x0000) or any of the characters `\`, `/`, `:`. Like path components, it's limited to 255 UTF-16 code units.
The `$DATA` part of the stream identifier is a stream type. Valid types are assigned by Microsoft and always start with a `$`. If not specified, the type defaults to `$DATA`.
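To make the stream syntax concrete, here is a rough sketch of how a final component could be split into its parts. `split_stream` is a hypothetical helper; the real parsing is done by the filesystem driver. Returning an empty stream name when no stream is given is a choice made for this sketch.

```rust
/// Splits a final path component into (file name, stream name, stream type).
/// Returns None when the stream syntax breaks the rules described above.
/// Hypothetical helper for illustration only.
fn split_stream(component: &str) -> Option<(&str, &str, &str)> {
    let mut parts = component.split(':');
    let file = parts.next()?;
    // No stream given: this sketch reports an empty stream name.
    let stream = parts.next().unwrap_or("");
    // The type defaults to $DATA when it is not specified.
    let stream_type = parts.next().unwrap_or("$DATA");
    // A fourth part would mean a ':' inside the stream name or type.
    if parts.next().is_some() {
        return None;
    }
    // Stream types always start with '$'.
    if !stream_type.starts_with('$') {
        return None;
    }
    // Stream names are limited to 255 UTF-16 code units.
    if stream.encode_utf16().count() > 255 {
        return None;
    }
    // NULL, '\' and '/' are not allowed in stream names.
    if stream.chars().any(|c| matches!(c, '\0' | '\\' | '/')) {
        return None;
    }
    Some((file, stream, stream_type))
}
```

With this sketch, `file.ext:stream_name` and `file.ext:stream_name:$DATA` produce the same result.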
The Win32 API is built as a layer on top of the NT kernel. It implements an API that was originally built for those familiar with Win16 and DOS so it doesn't directly deal with NT paths. Instead it converts Win32 paths to NT paths before calling the kernel.
Essentially Win32 paths are a user-space compatibility layer.
In Windows, all paths are treated as Unicode. However, the Win32 API provides convenience functions to automatically convert the system encoding to UTF-16 (and vice versa). This helps to avoid the Mojibake problem by only having one canonical encoding. The UTF-16 conversion happens before everything else, so interpreting paths only needs to operate on UTF-16 strings. The rest of this section assumes such a conversion has been done, if necessary.
For caveats and further information see Appendix A.
All absolute paths start with a root. On *nix the root is `/`. For the NT kernel it's `\`. In contrast, Win32 has four types of root and they're all longer than one character.
- `C:\`, `D:\`, `E:\`, etc. The first letter is a (case insensitive) drive letter that can be any ASCII letter from `A` to `Z`.
- `\\server\share\` where `server` is the name of the server and `share` is the name of the shared directory. It is used to access a shared directory on a server, so you must always specify both a server name and a share name.
- `\\.\`. These are typically used to access devices other than drives or server shares (e.g. named pipes), so they are not usually filesystem paths.
- `\\?\`. These can be used to access any type of device.
The following table shows each type and an example of how the Win32 root is converted to a kernel path.
Type | Win32 path | Kernel path |
---|---|---|
DOS | C:\Windows | \??\C:\Windows |
UNC | \\server\share\ | \??\UNC\server\share\ |
Device | \\.\PIPE\name | \??\PIPE\name |
Verbatim | \\?\C:\Windows | \??\C:\Windows |
Verbatim | \\?\UNC\server\share\ | \??\UNC\server\share\ |
Verbatim | \\?\PIPE\name | \??\PIPE\name |
From the table above it looks like device paths and verbatim paths work the same way. However, that's only because I left off a column: the namespace. The namespace determines what happens to the part of the path after the root.
Type | Namespace | Example |
---|---|---|
DOS | Win32 | C:\Windows |
UNC | Win32 | \\server\share\ |
Device | Win32 | \\.\PIPE\name |
Verbatim | NT | \\?\C:\Windows |
Verbatim | NT | \\?\UNC\server\share\ |
Verbatim | NT | \\?\PIPE\name |
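The root type and namespace can be told apart from the prefix alone. The following is a minimal sketch, assuming backslash separators only; `Root`, `Namespace` and `classify` are names made up for this example and it does not check that a UNC path actually names both a server and a share.

```rust
#[derive(Debug, PartialEq)]
enum Root {
    Dos,
    Unc,
    Device,
    Verbatim,
}

#[derive(Debug, PartialEq)]
enum Namespace {
    Win32,
    Nt,
}

/// Classifies an absolute Win32 path by its root.
/// Returns None for anything else (e.g. relative paths).
fn classify(path: &str) -> Option<(Root, Namespace)> {
    let b = path.as_bytes();
    // Order matters: `\\?\` and `\\.\` must be tested before plain `\\`.
    if path.starts_with(r"\\?\") {
        Some((Root::Verbatim, Namespace::Nt))
    } else if path.starts_with(r"\\.\") {
        Some((Root::Device, Namespace::Win32))
    } else if path.starts_with(r"\\") {
        Some((Root::Unc, Namespace::Win32))
    } else if b.len() >= 3 && b[0].is_ascii_alphabetic() && b[1] == b':' && b[2] == b'\\' {
        Some((Root::Dos, Namespace::Win32))
    } else {
        None
    }
}
```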
The next two sections will explain the effects the namespace has.
Paths in the NT namespace are passed almost directly to the kernel without any transformations or substitutions.
The only Win32 paths in the NT namespace are verbatim paths (i.e. those that start with `\\?\`). When converting a verbatim path to a kernel path, all that happens is the root `\\?\` is changed to the kernel path `\??\`. The rest of the path is left untouched.
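As a sketch, the verbatim conversion amounts to a prefix swap. `verbatim_to_kernel` is a hypothetical name used only for this example.

```rust
/// Converts a verbatim Win32 path to a kernel path by swapping the root.
/// Returns None if the path is not a verbatim path.
fn verbatim_to_kernel(path: &str) -> Option<String> {
    // Only the `\\?\` root is replaced; the rest is copied as it is.
    let rest = path.strip_prefix(r"\\?\")?;
    Some(format!(r"\??\{}", rest))
}
```

Nothing after the root is normalized, so dots, spaces and `..` components reach the kernel unchanged.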
Note that this is the only way to use kernel paths in the Win32 API. If you start a path with `\??\` or `\Device\` then it can have very different results.
This section applies to all Win32 paths except for verbatim paths (those that start with `\\?\`).
When converting a Win32 path to a kernel path there are additional transformations and restrictions that are applied to DOS drive paths, UNC paths and Device paths. Some of these transformations are useful while others are an unfortunate holdover from DOS or early Windows.
Win32 namespaced paths are restricted to a length less than 260 UTF-16 code units. This restriction can be lifted on newer versions of Windows 10 but it requires both the user and the application to opt in.
When paths are in this namespace, one of two transformations may happen:
- If the path is a drive or relative path and the file name (the final component without the extension) is a special device name then it will be interpreted as a DOS device path. So `C:\Windows\COM1` gets turned into the kernel path `\??\COM1`. See Appendix B for more details.
- Otherwise the following transformations are applied:
  - First, all occurrences of `/` are changed to `\`.
  - All path components consisting of only a single `.` are removed.
  - A sequence containing more than one `\` is replaced with a single `\`. E.g. `\\\` is collapsed to `\`.
  - All `..` path components will be removed along with their parent component. The Win32 root (e.g. `C:\`, `\\server\share`, `\\.\`) will never be removed.
  - If a component name ends with a `.` then the final `.` is removed, unless another `.` comes before it. So `dir.` becomes `dir` but `dir..` remains as it is. I'm sure there's a reason for this.
  - For the filename only (aka the last component), all trailing dots and spaces are stripped.
For example, this:
C:/path////../../../to/.////file.. ..
Is changed to:
C:\to\file
Which becomes the kernel path:
\??\C:\to\file
All of these transformations happen without touching the filesystem.
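Since it is all string manipulation, the second set of transformations can be sketched in a few lines of Rust. This is a simplified illustration, not what Windows actually runs: `normalize_drive_path` is a hypothetical helper that only handles absolute drive paths, skips the special device name check, and drops a trailing separator.

```rust
/// Sketch of the Win32 namespace normalization for an absolute drive path.
/// Returns None for any path that does not start with `X:\` or `X:/`.
fn normalize_drive_path(path: &str) -> Option<String> {
    // Forward slashes become backslashes.
    let path = path.replace('/', "\\");
    let b = path.as_bytes();
    if b.len() < 3 || !b[0].is_ascii_alphabetic() || b[1] != b':' || b[2] != b'\\' {
        return None;
    }
    let (root, rest) = path.split_at(3);

    let mut parts: Vec<&str> = Vec::new();
    for comp in rest.split('\\') {
        match comp {
            // Empty components come from repeated separators; `.` is dropped.
            "" | "." => {}
            // `..` removes its parent. Popping an empty list does nothing,
            // so the root is never removed.
            ".." => {
                parts.pop();
            }
            other => parts.push(other),
        }
    }

    let last = parts.len().saturating_sub(1);
    let cleaned: Vec<&str> = parts
        .iter()
        .copied()
        .enumerate()
        .map(|(i, comp)| {
            if i == last {
                // Final component: strip all trailing dots and spaces.
                comp.trim_end_matches(|c: char| c == '.' || c == ' ')
            } else if comp.ends_with('.') && !comp.ends_with("..") {
                // A single trailing dot is removed; `dir..` is left alone.
                &comp[..comp.len() - 1]
            } else {
                comp
            }
        })
        .collect();

    Some(format!("{}{}", root, cleaned.join("\\")))
}
```

Feeding it the example above, `C:/path////../../../to/.////file.. ..`, yields `C:\to\file`.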
Relative paths are usually resolved relative to the current directory. The current directory is a global mutable value that stores an absolute Win32 path to an existing directory. The current directory only supports DOS drive paths (e.g. `C:\`) and UNC paths (e.g. `\\server\share`). Using any other path type when setting the current directory is liable to break relative paths, so verbatim paths (`\\?\`) should not be used.
There are three categories of relative Win32 paths.
Type | Examples |
---|---|
Path Relative | file.ext .\file.ext ..\file.ext |
Root Relative | \file.ext |
Drive Relative | D:file.ext |
Although Path Relative forms come in three flavours there are really only two. `file.ext` is interpreted exactly the same way as `.\file.ext` (see Win32 namespace). However, the `.\` prefix can help to avoid ambiguities introduced by drive relative paths.
Drive Relative paths are interpreted as being relative to the specified drive's current directory (note: usually only the command prompt has per drive current directories). Root Relative paths are relative to the root of the current directory.
Drive Relative and Root Relative paths should be avoided whenever possible. Developers and users rarely understand how they're resolved, so their results can be surprising. Additionally, the Drive Relative path syntax introduces ambiguity with file streams.
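The three categories can be distinguished by looking at the first few characters. Here is a small sketch; `Relative`, `is_sep` and `classify_relative` are names invented for this example, and both `\` and `/` are treated as separators since these paths are in the Win32 namespace.

```rust
#[derive(Debug, PartialEq)]
enum Relative {
    Path,
    Root,
    Drive,
}

fn is_sep(b: u8) -> bool {
    b == b'\\' || b == b'/'
}

/// Classifies a relative Win32 path. Returns None for absolute paths.
fn classify_relative(path: &str) -> Option<Relative> {
    let b = path.as_bytes();
    if b.len() >= 2 && b[0].is_ascii_alphabetic() && b[1] == b':' {
        // `D:\file.ext` is absolute, `D:file.ext` is drive relative.
        if b.len() >= 3 && is_sep(b[2]) {
            None
        } else {
            Some(Relative::Drive)
        }
    } else if b.len() >= 2 && is_sep(b[0]) && is_sep(b[1]) {
        // UNC, device and verbatim paths all start with two separators.
        None
    } else if !b.is_empty() && is_sep(b[0]) {
        Some(Relative::Root)
    } else {
        Some(Relative::Path)
    }
}
```

The ambiguity with file streams shows up here: `D:file.ext` is classified as Drive Relative, whereas `.\D:file.ext` is Path Relative and so refers to the stream `file.ext` of a file called `D`.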
If you would like more detailed descriptions of Windows paths, see these articles:
- Naming Files, Paths, and Namespaces
- File path formats on Windows systems
- The Definitive Guide on Win32 to NT Path Conversion
I've tried to keep this document short(ish) and focused on the most relevant information but in doing so details fell by the wayside. For now I've collected some of them into this appendix.
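Internally the Windows NT kernel uses UTF-16 strings. Their definition is conceptually similar to Rust's `Vec<u16>`:

    struct UnicodeString {
        length: u16,      // bytes currently used by the string in `buffer`
        capacity: u16,    // bytes allocated for `buffer`
        buffer: *mut u16, // UTF-16 code units
    }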
In the Win32 API there are generally two types of strings that applications can choose to use. Both are NULL terminated.
- Multibyte: `*mut u8`
- Wide: `*mut u16`
Multibyte strings can be in any encoding supported by the OS. Windows will automatically convert to and from a UTF-16 `UnicodeString` as needed. If a Multibyte string contains bytes that are invalid for that encoding then they may be replaced when converting to UTF-16.
Recent versions of Windows also have the UTF-8 local encoding which, like other local encodings, is lossily converted to and from UTF-16.
Wide strings are UTF-16 and are put into a `UnicodeString` struct without being checked, except to get the length. This means that, unlike Rust's `String`, Windows does not check if a wide string is valid UTF-16. So it's possible for malicious applications to create file names with isolated surrogates (i.e. invalid Unicode).
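To see what an isolated surrogate looks like from the Rust side, here is a short sketch using only the standard library. `is_valid_utf16` is a hypothetical helper name.

```rust
/// Returns true when the wide string contains no isolated surrogates.
fn is_valid_utf16(wide: &[u16]) -> bool {
    // decode_utf16 yields Err for every unpaired surrogate it meets.
    std::char::decode_utf16(wide.iter().copied()).all(|r| r.is_ok())
}

/// Converts a wide string for display, replacing any isolated
/// surrogate with U+FFFD. The result no longer round-trips.
fn display_lossy(wide: &[u16]) -> String {
    String::from_utf16_lossy(wide)
}
```

A name such as `[0x0061, 0xD800, 0x0062]` ("a", a lone high surrogate, "b") fails the check, and `String::from_utf16` returns an error for it. This is why code that must handle every possible file name generally keeps the `u16` units around instead of converting to `String`.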
In the Win32 namespace, if a path is an absolute DOS drive or a relative path and if a filename (aka the final component) matches a special DOS device name then the path is ignored and replaced with that DOS device. For example:
C:\directory\subdir\COM1
Gets translated to:
\\.\COM1
Which becomes the kernel path
\??\COM1
These are the DOS device names that get the path replaced:
- `AUX`
- `CON`
- `CONIN$`
- `CONOUT$`
- `COM1`, `COM2`, `COM3`, `COM4`, `COM5`, `COM6`, `COM7`, `COM8`, `COM9`, `COM²`, `COM³`, `COM¹`
- `LPT1`, `LPT2`, `LPT3`, `LPT4`, `LPT5`, `LPT6`, `LPT7`, `LPT8`, `LPT9`, `LPT²`, `LPT³`, `LPT¹`
- `NUL`
- `PRN`
However, the algorithm for matching device names is not as simple as a direct comparison. When comparing file names to special DOS device names, it's as if the following steps were applied to the file name:
- ASCII letters are uppercased
- anything after a `.` and the `.` itself are removed
- any trailing spaces are removed
For example, these filenames are all interpreted as `\\.\COM1`:
- "`COM1.ext`"
- "`COM1`"
- "`COM1 . .ext`"
One final note: when opening a file path such as `C:\Test\COM1`, it will only resolve to `\\.\COM1` if the parent directory `C:\Test` exists. Otherwise opening the file will fail with an invalid path error.
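The matching steps described in this appendix can be sketched as follows. `dos_device` is a hypothetical helper that models only the name comparison as described here; it does not check that the parent directory exists, and the trailing space step is read off the examples above.

```rust
/// The special DOS device names listed in this appendix.
const DEVICE_NAMES: &[&str] = &[
    "AUX", "CON", "CONIN$", "CONOUT$",
    "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
    "COM²", "COM³", "COM¹",
    "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9",
    "LPT²", "LPT³", "LPT¹",
    "NUL", "PRN",
];

/// Returns the device a file name would be replaced with, if any.
/// Hypothetical helper for illustration only.
fn dos_device(file_name: &str) -> Option<&'static str> {
    // Drop the first `.` and everything after it.
    let stem = file_name.split('.').next().unwrap_or("");
    // Drop trailing spaces.
    let stem = stem.trim_end_matches(' ');
    // Uppercase ASCII letters only; other characters are left alone.
    let upper = stem.to_ascii_uppercase();
    DEVICE_NAMES.iter().copied().find(|name| *name == upper)
}
```

For instance, `COM1 . .ext` is first cut down to `COM1 `, then to `COM1`, which matches. A name like `COM10` does not match because the comparison is against the whole stem.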
One form of path I've only briefly mentioned is GUID paths. These aren't used as much and are essentially just Verbatim or Device paths which aren't handled any differently. Still, it can be useful to be aware of paths such as:
\\?\Volume{79D3A0DE-481C-4D52-A70B-F06A16C020C2}\file.ext
This addresses a volume according to its GUID instead of a drive letter. It is useful for partitions that don't have an assigned letter or for when you need to be sure you're addressing a specific volume, regardless of where it is mounted.
If you read the kernel section you've probably guessed that these GUID paths are just symlinks to, for example, `\Device\HarddiskVolume2`. In this way a Drive path like `C:` will be exactly equivalent to a Volume path if they are both symlinked to the same volume.
There are other such symlinks but they are used even more rarely and are possibly considered an implementation detail.
There's a change in Windows 11 regarding DOS device names: extensions aren't ignored any more, so `aux.c` is now a valid Win32 file and not a device name any more.