miyaokamarina/desktop-entry-parsers.md

    Desktop Entry Parsers

KConfig


Only recognizes \n newlines.
Accepts [ \t\n\r] as whitespace:

At the start of line.
At the end of line.
Before =.
After =.


Does not care about UTF-8.
Sections:

Allows subsections '[Section][Subsection]'.
The actual name may be preceded by any number of empty segments '[]'.
Each segment starts with '[' and ends with the nearest ']'.
If the segment '[$i]' occurs at the very end of line:

If there is no name yet, it is the file-level flag.
Otherwise, it is the section-level flag.


If the next char after a segment is not '[', the rest of line is
ignored.
If '[' has no matching ']', the whole line is ignored.
Segment names can contain anything except ']'.
Segments are joined with \x1d.
But leading empty segments '[]' are ignored.
Segment names are unescaped.
The '[$i]' token has no special meaning when followed by junk or
another segment.
When the '[$i]' token occurs at the file level:

Switches the rest of file to the immutable mode.


When the '[$i]' token occurs at the section level:

Switches the section to the immutable mode.
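
The segment rules above can be sketched in Python as follows. This is only an illustration of the notes, not KConfig's code: the function name `parse_section_line` is made up, the line is assumed to be already trimmed, segment names are not unescaped here, and keeping empty segments that appear after the first non-empty one is an assumption.

```python
GROUP_SEP = "\x1d"  # the separator the notes say is used to join segments


def parse_section_line(line):
    """Return (name, immutable), or None if the whole line is ignored.

    name is None when the line carries no name (for example a bare '[$i]').
    """
    if not line.startswith("["):
        return None  # not a section line at all
    segments = []
    pos = 0
    while pos < len(line) and line[pos] == "[":
        end = line.find("]", pos + 1)  # a segment ends at the nearest ']'
        if end == -1:
            return None  # '[' without a matching ']': ignore the line
        segments.append(line[pos + 1:end])
        pos = end + 1
    # Whatever follows the last segment is junk and is ignored, but it
    # stops a trailing '[$i]' from being treated as a flag.
    immutable = False
    if pos == len(line) and segments[-1] == "$i":
        segments.pop()
        immutable = True
    # Leading empty segments '[]' are dropped.
    while segments and segments[0] == "":
        segments.pop(0)
    name = GROUP_SEP.join(segments) if segments else None
    return name, immutable
```

With this sketch, `[$i]` alone yields `(None, True)`, which the caller would treat as the file-level flag, while `[Section][$i]` yields `("Section", True)`.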


Immutable mode:

New properties cannot override existing or deleted ones.
Existing properties cannot be deleted with the d flag.


Props:

Props outside of sections belong to the special section <default>.
Must contain =.

But if the key has the d flag, it is not required.


Names may include anything except [=\[].
Keys may have multiple bracketed metadata fields.
Fields may include anything except [=\[\]].
Junk between fields is ignored.
Junk between the last field and = is ignored.
If the prop has no =, junk between the last field and the end of line
is ignored.
Junk may include anything except [=\[].
Space between the name and the first field is part of the name.
If the field starts with '$', it is the flags field.

Flag fields may include any allowed characters.
The recognized characters are:

d — delete prop.
i — immutable prop.
e — allow environment variable expansion.


Multiple locale tags are not allowed.
Empty fields are handled like locale tags.
Names are unescaped (after the fields parsing).
Names that have a locale tag for a locale that is not of interest are not
unescaped.
Values are always unescaped.
Values are QVariant or IDK.


Unescaping:

Trailing '\' is ignored or IDK.
The standard escapes [strn\\] are parsed as expected.
The escaped [;,] are untouched.
Hex escapes:

If '\x' is followed by exactly two ASCII hexadecimal digits,
evaluates to the corresponding byte.
Otherwise, consumes '\x' and two characters, or '\x' and one
character (near the end of string), or just '\x' (at the end of
string) and produces x.


Anything else is just '\' (ignoring the escaped char).
Strings are unescaped before UTF-8 decoding, and thus, the hex escapes
may be used to represent UTF-8 sequences.
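
A minimal Python sketch of the unescaping rules as listed above. It works on bytes because decoding happens afterwards. The function name `unescape` is hypothetical, and dropping a trailing backslash is a guess, since the notes are unsure about that case.

```python
SIMPLE = {
    ord("s"): b" ",
    ord("t"): b"\t",
    ord("r"): b"\r",
    ord("n"): b"\n",
    ord("\\"): b"\\",
}
HEX_DIGITS = b"0123456789abcdefABCDEF"


def unescape(raw: bytes) -> bytes:
    out = bytearray()
    i, n = 0, len(raw)
    while i < n:
        c = raw[i]
        if c != 0x5C:  # not a backslash: copy through
            out.append(c)
            i += 1
            continue
        i += 1
        if i == n:
            break  # trailing backslash: dropped (assumption)
        e = raw[i]
        if e in SIMPLE:
            out += SIMPLE[e]
            i += 1
        elif e in b";,":
            out += b"\\" + bytes([e])  # left untouched for the list splitter
            i += 1
        elif e == ord("x"):
            digits = raw[i + 1:i + 3]
            if len(digits) == 2 and all(d in HEX_DIGITS for d in digits):
                out.append(int(digits, 16))
            else:
                out += b"x"  # invalid hex escape produces a literal 'x'
            i += 1 + len(digits)  # consume 'x' plus up to two characters
        else:
            out += b"\\"  # unknown escape: keep '\', drop the escaped char
            i += 1
    return bytes(out)
```

Since hex escapes are resolved before decoding, `unescape(rb"\xe2\x82\xac").decode("utf-8")` gives the single character `€`.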


Strings:

Unescaped and then decoded.
If the prop has the e flag, and the variable expansion is allowed by
other means, the environment variables are expanded as follows:

First, unescape the string.
'$$' is literal '$'.
'${' to '}' or to the end of string is the variable name.
'$' consumes as many identifier chars as possible:

Unicode alphanumerics.
ASCII underscores _.


If the var name is empty, it expands to nothing.
If the var is undefined or empty, and the name is one of the special
names, use a predefined value:

QT_DATA_HOME → QStandardPaths::GenericDataLocation.
QT_CONFIG_HOME → QStandardPaths::GenericConfigLocation.
QT_CACHE_HOME → QStandardPaths::GenericCacheLocation.
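
The expansion steps above might look like this in Python. It is a sketch under assumptions: the input is already unescaped and decoded, `expand`, `env` and `fallbacks` are made-up names, `str.isalnum()` stands in for "Unicode alphanumerics", and an undefined variable with no predefined value is assumed to expand to an empty string.

```python
def expand(value: str, env: dict, fallbacks: dict) -> str:
    out = []
    i, n = 0, len(value)
    while i < n:
        if value[i] != "$":
            out.append(value[i])
            i += 1
            continue
        i += 1  # skip the '$'
        if i < n and value[i] == "$":
            out.append("$")  # '$$' is a literal '$'
            i += 1
            continue
        if i < n and value[i] == "{":
            end = value.find("}", i + 1)
            if end == -1:
                name, i = value[i + 1:], n  # runs to the end of string
            else:
                name, i = value[i + 1:end], end + 1
        else:
            start = i
            while i < n and (value[i].isalnum() or value[i] == "_"):
                i += 1
            name = value[start:i]
        if not name:
            continue  # an empty name expands to nothing
        # Undefined or empty variables fall back to the predefined value
        # for special names, else to an empty string (assumption).
        out.append(env.get(name) or fallbacks.get(name, ""))
    return "".join(out)
```

A caller would fill `fallbacks` with the three special names listed above, mapped to whatever paths `QStandardPaths` reports on the system.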


Lists:

Prop values are first unescaped, and then split by still unescaped
separators.
For example, the value 'a;b\x3bc\\;d\;e' evaluates to the list {'a',
'b', 'c;d;e'}.
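
To make the example concrete, here is a small Python sketch of the second step only: splitting a value that has already gone through unescaping, so `a;b\x3bc\\;d\;e` has become `a;b;c\;d\;e`. The name `split_list` is hypothetical, the separator is assumed to be a single byte, undecodable bytes are replaced rather than rejected, and the handling of a trailing separator (an empty last item) is a guess.

```python
def split_list(unescaped: bytes, sep: bytes = b";") -> list:
    items = []
    cur = bytearray()
    i, n = 0, len(unescaped)
    while i < n:
        c = unescaped[i:i + 1]
        if c == b"\\" and unescaped[i + 1:i + 2] == sep:
            cur += sep  # a still-escaped separator becomes a literal one
            i += 2
        elif c == sep:
            items.append(cur.decode("utf-8", errors="replace"))
            cur = bytearray()
            i += 1
        else:
            cur += c
            i += 1
    items.append(cur.decode("utf-8", errors="replace"))
    return items
```

Note how `\x3b` acts as a real separator here, because it was turned into a bare `;` before splitting, while `\\;` ends up as an escaped separator.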


Exec: IDK.
Booleans: IDK.
Numbers: IDK.

Patterns:

TODO

GLib

Used in GNOME, LXDE, XFCE, etc.

Only recognizes \n newlines.
Accepts [ \t\n\f\r] as whitespace:

At the start of line.
At the end of line, except section header lines.
Before =.
After =.


Sections:

A line that starts with '[' and ends with ']'.
Everything between the first '[' and the first ']' is the name.
The name cannot be empty.
The name cannot contain [\0-\x1f\[\]\x7f].
Only U+0020 SPACE and U+0009 TAB are trimmed at the end of line.


Props:

Props outside of sections are not allowed.
Must contain =.
Ignored when before the first section.
Ignored when the key is empty.
Names must not contain [=\[\]].
Locale tags start with '[' and end with ']'.
Locale tags must be valid UTF-8.
Locale tags must only include Unicode alphanumerics and [-_.@].
Locale tags may be empty.
Multiple locale tags are not allowed.
No U+0020 SPACE between name and locale tag.


Strings:

Escaped list separators (configurable) are unescaped to themselves:

If the separator is ;, the '\;' sequence is interpreted as ;.


Invalid escapes are untouched.
Trailing '\' is not allowed.


Exec: IDK.
Lists:

Strings are first unescaped, and then split by a configurable list
separator.
Escaped separators do not work:

List items ending with the separator are not possible.


Booleans:

ASCII case is ignored.
True is true or 1.
False is false or 0.
Anything else is false.


Numbers:

Looks like GLib uses strtod to parse numbers (seems OK).


Patterns:

TODO

systemd

Uses a similar format for unit files, and also uses the same parser to
interpret the Desktop Entry files in xdg-autostart-generator.

Accepts BOM.
Max line length is 1 MiB.
Accepts [#;] as comment start.
Accepts [ \t\n\r] as whitespace:

At the start of line.
At the end of line.
Before =.
After =.


Accepts the following newlines:

\r\n\0,
\n\r\0,
\r\n,
\n\r,
\r\0,
\n\0,
\r,
\n,
\0.


Accepts line continuations with trailing '\'.
Does not accept invalid UTF-8.
Sections:

A whole line that starts with '[' and ends with ']'.
Everything between the first '[' and the last ']' is the name.
The name may be empty.
The name cannot contain [\0-\x1f"'\\\x7f].
KConfig-like subsections syntax works, but differently than in KDE.


Props:

Must contain =.
Ignored when before the first section.
Ignored when the key is empty.
Locale tags syntax is not supported.
The first assignment wins (at least in *.desktop files).
Key names are not unescaped.


Booleans:

Case-insensitive.
True is 1, true, t, yes, y, on.
False is 0, false, f, no, n, off.
Anything else is an error.
It seems the case mapping is locale-sensitive, and I hope there are no
locales that map the related characters to something unexpected or vice
versa.
Ideally, the parser should be changed to only use ASCII case mapping.
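
As an illustration of the ASCII-only case mapping suggested above (not of what systemd currently does), a boolean parser could fold only the letters A to Z before comparing. `parse_bool` and the table names are made up for this sketch.

```python
TRUE_WORDS = {"1", "true", "t", "yes", "y", "on"}
FALSE_WORDS = {"0", "false", "f", "no", "n", "off"}

# Maps only ASCII uppercase letters; every other character is left alone,
# so the result never depends on the process locale.
ASCII_LOWER = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "abcdefghijklmnopqrstuvwxyz",
)


def parse_bool(value: str) -> bool:
    folded = value.translate(ASCII_LOWER)
    if folded in TRUE_WORDS:
        return True
    if folded in FALSE_WORDS:
        return False
    raise ValueError(f"not a boolean: {value!r}")
```

With this approach, look-alike non-ASCII input such as fullwidth letters is rejected instead of being case-mapped into a valid word.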


Strings:

Do not allow invalid escapes.


Exec: IDK.
Lists:

Only allow semicolons as separators.
Do not allow invalid escapes.
The '\;' escapes are handled correctly.


Numbers:

Not implemented.


Other lines are ignored.

Patterns:

split_lines_by = /\r\n?\0?|\n\r?\0?|[\r\n\0]/;
trim_line      = /^[ \t\n\r]*(?<line>.*?)[ \t\n\r]*$/;
comment        = /^[#;].*$/;
section_header = /^\[(?<name>[^\0-\x1f"'\\\x7f]*)\]$/;
prop           = /^(?<key>[^=]+?)[ \t\n\r]*=[ \t\n\r]*(?<val>.*)$/;

xdg_bool       = /^(?:true|false|yes|no|on|off|[tfyn01])$/i;
xdg_string     = /^(?:[^\\]|\\[strn\\])*$/;
xdg_list       = /^(?:[^\\]|\\[strn\\;])*$/;

Examples:
; comment
# comment

ignored=...

[error
[error];
[error]@
['error\x21']
[ok]
[also][ok]
[[no problem]]

=ignored
ignored

ok=ok
doesn't look wrong\x21=
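
For experimenting with the examples above, here is a rough Python port of the line-level patterns. Python's `re` spells named groups as `(?P<name>...)`. The `classify` function and its tuple output are made up for this sketch. Reporting a malformed `[` line as an error and ignoring props before the first section follow the examples, not a reading of the systemd source.

```python
import re

SPLIT_LINES = re.compile(r"\r\n?\0?|\n\r?\0?|[\r\n\0]")
COMMENT = re.compile(r"^[#;]")
SECTION_HEADER = re.compile(r"""^\[(?P<name>[^\0-\x1f"'\\\x7f]*)\]$""")
# The key excludes '=' so that a line starting with '=' has an empty key
# and never matches.
PROP = re.compile(r"^(?P<key>[^=]+?)[ \t\n\r]*=[ \t\n\r]*(?P<val>.*)$")


def classify(text):
    """Return a list of ('section', name), ('prop', key, val) or ('error', line)."""
    result = []
    in_section = False
    for raw in SPLIT_LINES.split(text):
        line = raw.strip(" \t\n\r")
        if not line or COMMENT.match(line):
            continue  # blank lines and comments
        if line.startswith("["):
            m = SECTION_HEADER.match(line)
            if m:
                in_section = True
                result.append(("section", m["name"]))
            else:
                result.append(("error", line))
            continue
        m = PROP.match(line)
        if m and in_section:
            result.append(("prop", m["key"], m["val"]))
        # Everything else (no '=', empty key, prop before the first
        # section) is silently ignored.
    return result
```

Feeding it the example lines shows `[also][ok]` and `[[no problem]]` accepted as sections, because brackets are not among the restricted characters.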

Compatibility:

The ; comments syntax:

Technically, it is incompatible with GLib and KConfig implementations,
but in the context of xdg-autostart-generator, it’s OK.
When using GLib parser, the prop-like ; comments are still OK, as they
are most probably ignored.
When using KConfig parser, the prop-like ; comments may interfere
with normal props that start with the escape sequence \x3b.
I’ve examined all the *.desktop files on my machine, and haven’t found
any that have props starting with ;.


Restrictions on section names:

GLib and KConfig parsers do allow restricted characters in section
names, but xdg-autostart-generator doesn’t care about sections other
than [Desktop Entry].
I’ve examined all the *.desktop files on my machine, and haven’t found
any that have systemd-incompatible section names.


1 MiB per line:

I’m unsure about GLib, but KConfig doesn’t seem to limit the line size.
I didn’t examine existing files, but I’m pretty sure they are OK.


CR and NUL newlines:

CR and NUL must not occur even in comments.
KConfig allows CR and NUL bytes in the middle of line.
GLib seems to allow at least CR bytes in the middle of line.
IDK if GLib allows NUL bytes in the middle of line or not (TODO).
I didn’t examine existing files, but I’m pretty sure they are OK.


Line continuations:

The spec doesn’t allow trailing \, so it’s OK.
KConfig allows trailing \.
I’m not sure if GLib allows trailing \ or not (TODO).
I didn’t examine existing files, but I’m pretty sure they are OK.


xdg-utils

xdg-open


Interpreted using /bin/sh.
Uses read(1p) with no options to scan lines:

Unix newlines only.
Trailing line is ignored.
Unescapes certain sequences on its own.


Does not trim whitespace.
Does not seem to unescape strings, except what read(1p) does.
Does not unescape key names, but read(1p) does.
Does not unescape section names, but read(1p) does.
Does not support subsections.
Does not tolerate junk past the section header.
The only recognized section is the exact /^\[Desktop Entry\]$/ line.
Actually doesn’t care about the spec and compatibility.
Exec: IDK.

xdg-mime


Uses AWK to parse files.
I’m not sure how exactly it handles all the edge cases.
I’m not sure if it cares about the spec or compatibility.

xdg-email, xdg-settings


Use grep to find the Exec key.
Don’t seem to unescape it.
Don’t seem to care about the spec or compatibility.

xdg-user-dirs

Uses a custom syntax that can be parsed as a subset of the Desktop Entry format
(assuming props outside of sections are allowed).

libqtxdg

Used in Deepin(?), LXQt.

Recognizes [\t\n\v\f\r ] as whitespace:

At the start of line.
At the end of line.
Before =.
After =.


Lines are scanned using QTextStream::readLine().

TODO: What line endings does it recognize?
TODO: What about NUL bytes?


Sections:

Similar to systemd.
But have no restricted chars.


Props:

Empty keys are ignored.
Void values (no =) seem to be OK.
Values are QVariant or IDK.
A file has a single table indexed by <section>/<prop> concatenations.
So, the b/c prop in the a section is the same as the c prop in the
a/b section.
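
The collision described above is easy to reproduce with a plain dictionary. This is only a model of the flat `<section>/<prop>` indexing, with a hypothetical helper `flat_key`; it is not libqtxdg code.

```python
def flat_key(section: str, key: str) -> str:
    # Model of the single flat table: the section and the key are simply
    # joined with '/'.
    return f"{section}/{key}"


table = {}
table[flat_key("a", "b/c")] = "first"
table[flat_key("a/b", "c")] = "second"  # same slot as the line above
```

Both assignments land on the key `a/b/c`, so the table ends up with one entry and the later write wins in this model.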


Strings:

Only the [strn\\] escapes are supported.


Exec: IDK.
Lists:

Split after unescaping?
Empty segments are ignored.
Escaped separators do not work?


Booleans: IDK.
Numbers: IDK.

Patterns:

TODO

D-Bus

The Desktop Entry file parser from the D-Bus reference implementation.
Looks like it’s the only parser that follows the spec literally.

Only allows U+0020 SPACE before and after =.
Section names cannot include [\0-\x1f\[\]\x7f] and non-ASCII chars.
Recognizes the following newlines:

\r\n,
\r,
\n.


Blank lines can include [ \t\n\f\r].
Doesn’t allow invalid UTF-8.
Lines are not trimmed.
Only allows valid escapes.
Key names can only contain [-0-9A-Za-z].
Localized props are ignored.
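
A strict key-line check in the spirit of the rules above could be sketched like this. The pattern and the name `parse_entry_line` are assumptions made for illustration: a localized prop is assumed to look like `Key[locale]`, escape validation is left out, and the value is returned without trimming.

```python
import re

# Assumed shape of a key line: key, optional [locale] tag, and '=' padded
# by U+0020 SPACE only.
ENTRY = re.compile(r"([-0-9A-Za-z]+)(\[[^\]]*\])? *= *(.*)", re.DOTALL)


def parse_entry_line(line: str):
    """Return (key, value), or None for a localized prop."""
    m = ENTRY.fullmatch(line)
    if m is None:
        raise ValueError(f"invalid entry line: {line!r}")
    key, locale_tag, value = m.groups()
    if locale_tag is not None:
        return None  # localized props are ignored
    return key, value
```

A tab before `=` or an underscore in the key makes the line invalid here, which is where this sketch differs most from the lenient parsers above.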