
@miyaokamarina
Last active February 19, 2023 12:41
Desktop Entry Parsers Comparison

Desktop Entry Parsers

KConfig

  • Only recognizes \n newlines.
  • Accepts [ \t\n\r] as whitespace:
    • At the start of line.
    • At the end of line.
    • Before =.
    • After =.
  • Does not validate UTF-8.
  • Sections:
    • Allows subsections '[Section][Subsection]'.
    • The actual name may be preceded by any number of empty segments '[]'.
    • Each segment starts with '[' and ends with the nearest ']'.
    • If the segment '[$i]' occurs at the very end of line:
      • If there is no name yet, it is the file-level flag.
      • Otherwise, it is the section-level flag.
    • If the next char after a segment is not '[', the rest of line is ignored.
    • If '[' has no matching ']', the whole line is ignored.
    • Segment names can contain anything except ']'.
    • Segments are joined with \x1d (the ASCII group separator).
    • But leading empty segments '[]' are ignored.
    • Segment names are unescaped.
    • The '[$i]' token has no special meaning when followed by junk or another segment.
    • When the '[$i]' token occurs at the file level:
      • Switches the rest of file to the immutable mode.
    • When the '[$i]' token occurs at the section level:
      • Switches the section to the immutable mode.
    • Immutable mode:
      • New properties cannot override existing or deleted ones.
      • Existing properties cannot be deleted with the d flag.
  • Props:
    • Props outside of sections belong to the special section <default>.
    • Must contain =.
      • But if the key has the d flag, it is not required.
    • Names may include anything except [=\[].
    • Keys may have multiple bracketed metadata fields.
    • Fields may include anything except [=\[\]].
    • Junk between fields is ignored.
    • Junk between the last field and = is ignored.
    • If the prop has no =, junk between the last field and the end of line is ignored.
    • Junk may include anything except [=\[].
    • Space between the name and the first field is part of the name.
    • If the field starts with '$', it is the flags field.
      • Flag fields may include any allowed characters.
      • The recognized characters are:
        • d — delete prop.
        • i — immutable prop.
        • e — allow environment variable expansion.
    • Multiple locale tags are not allowed.
    • Empty fields are handled like locale tags.
    • Names are unescaped (after the fields parsing).
    • Names that have a locale tag whose locale is not of interest are not unescaped.
    • Values are always unescaped.
    • Values are QVariant or IDK.
  • Unescaping:
    • Trailing '\' is ignored or IDK.
    • The standard escapes [strn\\] are parsed as expected.
    • The escaped [;,] are untouched.
    • Hex escapes:
      • If '\x' is followed by exactly two ASCII hexadecimal digits, evaluates to the corresponding byte.
      • Otherwise, consumes '\x' and the next two characters, or '\x' and one character (near the end of string), or just '\x' (at the end of string), and produces a literal 'x'.
    • Any other escape produces a single '\', and the escaped char is dropped.
    • Strings are unescaped before UTF-8 decoding, and thus, the hex escapes may be used to represent UTF-8 sequences.
  • Strings:
    • Unescaped and then decoded.
    • If the prop has the e flag, and the variable expansion is allowed by other means, the environment variables are expanded as follows:
      • First, unescape the string.
      • '$$' is literal '$'.
      • '${' to '}' or to the end of string is the variable name.
      • '$' consumes as many identifier chars as possible:
        • Unicode alphanumerics.
        • ASCII underscores _.
      • If the var name is empty, it expands to nothing.
      • If the var is undefined or empty, and the name is one of the special names, a predefined value is used:
        • QT_DATA_HOME → QStandardPaths::GenericDataLocation.
        • QT_CONFIG_HOME → QStandardPaths::GenericConfigLocation.
        • QT_CACHE_HOME → QStandardPaths::GenericCacheLocation.
  • Lists:
    • Prop values are first unescaped, and then split at separators that are not preceded by '\' in the unescaped string; each remaining '\;' becomes a literal ;.
    • For example, the value 'a;b\x3bc\\;d\;e' evaluates to the list {'a', 'b', 'c;d;e'}.
  • Exec: IDK.
  • Booleans: IDK.
  • Numbers: IDK.
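The section header rules above are easier to follow as code. Below is a minimal Python sketch that models them exactly as these notes describe them; it is not KConfig's implementation. It assumes the line is already trimmed and starts with '[', the names `kconfig_section_header` and `GROUP_SEP` are hypothetical, and segment unescaping is left out.

```python
GROUP_SEP = "\x1d"  # nested segments are joined with the ASCII group separator


def kconfig_section_header(line: str):
    """Model of the KConfig header rules above (hypothetical helper).

    Expects a trimmed line starting with '['. Returns None if the whole line
    is ignored, otherwise (name, flag), where flag is "file" or "section"
    for a trailing '[$i]', or None. Segment unescaping is omitted.
    """
    segments = []
    i = 0
    while i < len(line) and line[i] == "[":
        end = line.find("]", i + 1)  # a segment ends at the nearest ']'
        if end == -1:
            return None  # '[' without a matching ']': whole line ignored
        segments.append(line[i + 1:end])
        i = end + 1
    # The loop stops at the first char after a segment that is not '[';
    # the rest of the line is ignored.

    flag = None
    if segments and segments[-1] == "$i" and i == len(line):
        segments.pop()  # '[$i]' is special only at the very end of line
        flag = "section"

    while segments and segments[0] == "":
        segments.pop(0)  # leading empty segments '[]' are ignored

    if flag is not None and not segments:
        flag = "file"  # no name yet: file-level immutability flag
    return GROUP_SEP.join(segments), flag
```

For example, `[][a][$i]` yields `("a", "section")`, while `[a][$i]junk` keeps `$i` as an ordinary segment because junk follows it.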

Patterns:

TODO
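Until real patterns exist, the unescaping, variable expansion and list rules from the KConfig notes can be sketched in Python. This follows the notes, not KConfig's source; all function names are hypothetical, the trailing '\' and trailing-separator behaviour are labelled assumptions, and the special-name fallbacks are passed in as a plain dict because the real values come from QStandardPaths.

```python
import os
from typing import List

# Sketch of the KConfig rules above; every function name here is hypothetical.
SIMPLE_ESCAPES = {ord("s"): b" ", ord("t"): b"\t", ord("n"): b"\n",
                  ord("r"): b"\r", ord("\\"): b"\\"}
HEX_DIGITS = b"0123456789abcdefABCDEF"


def kconfig_unescape(raw: bytes) -> bytes:
    """Unescape a raw value; works on bytes because decoding happens afterwards."""
    out = bytearray()
    i, n = 0, len(raw)
    while i < n:
        c = raw[i]
        if c != ord("\\"):
            out.append(c)
            i += 1
            continue
        if i + 1 == n:
            break  # assumption: a trailing '\' is dropped
        e = raw[i + 1]
        i += 2
        if e in SIMPLE_ESCAPES:
            out += SIMPLE_ESCAPES[e]
        elif e in b";,":
            out += bytes((ord("\\"), e))  # '\;' and '\,' are left untouched
        elif e == ord("x"):
            digits = raw[i:i + 2]  # up to two following chars are consumed
            if len(digits) == 2 and all(d in HEX_DIGITS for d in digits):
                out.append(int(digits, 16))  # exactly two hex digits: one byte
            else:
                out += b"x"
            i += len(digits)
        else:
            out += b"\\"  # any other escape: keep '\', drop the escaped char
    return bytes(out)


def kconfig_split_list(value: str, sep: str = ";") -> List[str]:
    """Split an unescaped value at separators not preceded by a backslash."""
    items: List[str] = []
    cur: List[str] = []
    escaped = False
    for ch in value:
        if escaped:
            cur.append(ch)  # '\;' becomes ';' inside the current item
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == sep:
            items.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
    if cur:  # assumption: a trailing separator does not add an empty item
        items.append("".join(cur))
    return items


def kconfig_expand(s: str, env=os.environ, special=None) -> str:
    """Expand $$, ${NAME} and $NAME in an already unescaped string."""
    special = special or {}  # e.g. {"QT_DATA_HOME": ...}, normally from QStandardPaths
    out: List[str] = []
    i, n = 0, len(s)
    while i < n:
        if s[i] != "$":
            out.append(s[i])
            i += 1
        elif s.startswith("$$", i):
            out.append("$")  # '$$' is a literal '$'
            i += 2
        else:
            if s.startswith("${", i):
                end = s.find("}", i + 2)
                end = n if end == -1 else end  # up to '}' or the end of string
                name, i = s[i + 2:end], end + 1
            else:
                j = i + 1
                while j < n and (s[j].isalnum() or s[j] == "_"):
                    j += 1  # Unicode alphanumerics and '_'
                name, i = s[i + 1:j], j
            if name:  # an empty name expands to nothing
                out.append(env.get(name, "") or special.get(name, ""))
    return "".join(out)
```

Chaining `kconfig_unescape`, a UTF-8 decode and `kconfig_split_list` on the value `a;b\x3bc\\;d\;e` reproduces the list {'a', 'b', 'c;d;e'} from the example in the notes.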

GLib

Used in GNOME, LXDE, XFCE, etc.

  • Only recognizes \n newlines.
  • Accepts [ \t\n\f\r] as whitespace:
    • At the start of line.
    • At the end of line, except section header lines.
    • Before =.
    • After =.
  • Sections:
    • A line that starts with '[' and ends with ']'.
    • Everything between the first '[' and the first ']' is the name.
    • The name cannot be empty.
    • The name cannot contain [\0-\x1f\[\]\x7f].
    • Only U+0020 SPACE and U+0009 TAB are trimmed at the end of line.
  • Props:
    • Props outside of sections are not allowed.
    • Must contain =.
    • Ignored when before the first section.
    • Ignored when the key is empty.
    • Names must not contain [=\[\]].
    • Locale tags start with '[' and end with ']'.
    • Locale tags must be valid UTF-8.
    • Locale tags must only include Unicode alphanumerics and [-_.@].
    • Locale tags may be empty.
    • Multiple locale tags are not allowed.
    • No U+0020 SPACE between name and locale tag.
  • Strings:
    • Escaped list separators (configurable) are unescaped to themselves:
      • If the separator is ;, the '\;' sequence is interpreted as ;.
    • Invalid escapes are untouched.
    • Trailing '\' is not allowed.
  • Exec: IDK.
  • Lists:
    • Strings are first unescaped, and then split by a configurable list separator.
    • Escaped separators do not work:
      • List items ending with the separator are not possible.
  • Booleans:
    • ASCII case is ignored.
    • True is true or 1.
    • False is false or 0.
    • Anything else is false (g_key_file_get_boolean() also sets G_KEY_FILE_ERROR_INVALID_VALUE).
  • Numbers:
    • Looks like GLib uses strtod to parse numbers (seems OK).
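The GLib key and boolean rules can be sketched the same way. This is a Python model of the notes above, not GKeyFile itself: `glib_split_key` and `glib_bool` are hypothetical names, `str.isalnum()` stands in for "Unicode alphanumerics", and UTF-8 validity is assumed to be checked already by decoding.

```python
import string
from typing import Optional, Tuple

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def glib_split_key(key: str) -> Optional[Tuple[str, Optional[str]]]:
    """Split 'Name[locale]' per the notes above; None means the key is rejected."""
    name, locale = key, None
    if "[" in key:
        if not key.endswith("]"):
            return None
        name, _, locale = key[:-1].partition("[")
        if name.endswith(" "):
            return None  # no U+0020 SPACE before the locale tag
        if not all(ch.isalnum() or ch in "-_.@" for ch in locale):
            return None  # also rejects a second tag: '[' and ']' are not allowed
    if not name or any(ch in "=[]" for ch in name):
        return None
    return name, locale  # locale may be "" (empty tags are allowed)


def glib_bool(value: str) -> bool:
    """ASCII case-insensitive: 'true' or '1' is True, anything else is False."""
    return value.translate(_ASCII_LOWER) in ("true", "1")
```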

Patterns:

TODO

systemd

Uses a similar format for unit files, and uses the same parser to interpret Desktop Entry files in xdg-autostart-generator.

  • Accepts BOM.
  • Max line length is 1 MiB.
  • Accepts [#;] as comment start.
  • Accepts [ \t\n\r] as whitespace:
    • At the start of line.
    • At the end of line.
    • Before =.
    • After =.
  • Accepts the following newlines:
    • \r\n\0,
    • \n\r\0,
    • \r\n,
    • \n\r,
    • \r\0,
    • \n\0,
    • \r,
    • \n,
    • \0.
  • Accepts line continuations with trailing '\'.
  • Does not accept invalid UTF-8.
  • Sections:
    • A whole line that starts with '[' and ends with ']'.
    • Everything between the first '[' and the last ']' is the name.
    • The name may be empty.
    • The name cannot contain [\0-\x1f"'\\\x7f].
    • KConfig-like subsection syntax is accepted, but interpreted differently than in KDE: '[a][b]' is a single section named 'a][b'.
  • Props:
    • Must contain =.
    • Ignored when before the first section.
    • Ignored when the key is empty.
    • Locale tags syntax is not supported.
    • The first assignment wins (at least in *.desktop files).
    • Key names are not unescaped.
  • Booleans:
    • Case-insensitive.
    • True is 1, true, t, yes, y, on.
    • False is 0, false, f, no, n, off.
    • Anything else is error.
    • It seems the case mapping is locale-sensitive, and I hope there are no locales that map the related characters to something unexpected or vice versa.
    • Ideally, the parser should be changed to only use ASCII case mapping.
  • Strings:
    • Do not allow invalid escapes.
  • Exec: IDK.
  • Lists:
    • Only allow semicolons as separators.
    • Do not allow invalid escapes.
    • The '\;' escapes are handled correctly.
  • Numbers:
    • Not implemented.
  • Other lines are ignored.
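The boolean notes suggest restricting the case mapping to ASCII. A minimal Python sketch of that idea follows; it is not systemd's C code, and `parse_boolean_ascii` is a hypothetical name. It accepts the same words listed above and raises an error for anything else.

```python
import string

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
TRUE_WORDS = frozenset({"1", "true", "t", "yes", "y", "on"})
FALSE_WORDS = frozenset({"0", "false", "f", "no", "n", "off"})


def parse_boolean_ascii(value: str) -> bool:
    """Accept systemd's boolean words, folding case for A-Z only."""
    folded = value.translate(_ASCII_LOWER)  # non-ASCII chars are never case-mapped
    if folded in TRUE_WORDS:
        return True
    if folded in FALSE_WORDS:
        return False
    raise ValueError(f"invalid boolean: {value!r}")
```

An explicit A-Z translation table sidesteps locale and Unicode case rules entirely, so the result cannot depend on the current locale.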

Patterns:

split_lines_by = /\r\n?\0?|\n\r?\0?|[\r\n\0]/;
trim_line      = /^[ \t\n\r]*(?<line>.*?)[ \t\n\r]*$/;
comment        = /^[#;].*$/;
section_header = /^\[(?<name>[^\0-\x1f"'\\\x7f]*)\]$/;
prop           = /^(?<key>[^=]+?)[ \t\n\r]*=[ \t\n\r]*(?<val>.*)$/;

xdg_bool       = /^(?:true|false|yes|no|on|off|[tfyn01])$/i;
xdg_string     = /^(?:[^\\]|\\[strn\\])*$/;
xdg_list       = /^(?:(?:[^\\;]|\\[strn\\;])*;)*(?:[^\\;]|\\[strn\\;])*$/;

Examples:

; comment
# comment

ignored=...

[error
[error];
[error]@
['error\x21']
[ok]
[also][ok]
[[no problem]]

=ignored
ignored

ok=ok
doesn't look wrong\x21=
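To check the patterns against the examples, here is a Python sketch that applies equivalent regexes line by line. It is a model of the notes, not systemd's parser: `parse_unit_like` is a hypothetical name, JS named groups `(?<x>...)` become Python's `(?P<x>...)`, line continuations and the 1 MiB limit are not modelled, and invalid headers are collected rather than reported the way systemd does.

```python
import re

# Python versions of the patterns above.
SPLIT_LINES = re.compile(r"\r\n?\x00?|\n\r?\x00?|\x00")
SECTION = re.compile(r"""\[(?P<name>[^\x00-\x1f"'\\\x7f]*)\]""")
PROP = re.compile(r"(?P<key>[^=]+?)[ \t\n\r]*=[ \t\n\r]*(?P<val>.*)")


def parse_unit_like(text: str):
    """Return (sections, error_line_numbers) for the systemd-style rules above."""
    if text.startswith("\ufeff"):
        text = text[1:]  # a BOM is accepted
    sections, errors, current = {}, [], None
    for lineno, raw in enumerate(SPLIT_LINES.split(text), start=1):
        line = raw.strip(" \t\n\r")
        if not line or line[0] in "#;":
            continue  # blank line or comment
        if line[0] == "[":
            m = SECTION.fullmatch(line)
            if m is None:
                errors.append(lineno)  # invalid section header
            else:
                current = sections.setdefault(m["name"], {})
            continue
        m = PROP.fullmatch(line)
        if m is None or current is None:
            continue  # no '=', empty key, or before the first section: ignored
        current.setdefault(m["key"], m["val"])  # the first assignment wins
    return sections, errors
```

Run on the Examples block above, it should flag lines 6 to 9 (the four `error` headers) and produce the sections `ok`, `also][ok` and `[no problem]`, the last one holding `ok=ok` and the literal key `doesn't look wrong\x21`.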

Compatibility:

  • The ; comments syntax:
    • Technically, it is incompatible with GLib and KConfig implementations, but in the context of xdg-autostart-generator, it’s OK.
    • When using the GLib parser, prop-like ; comments are still OK, as they are most probably ignored.
    • When using the KConfig parser, prop-like ; comments may interfere with normal props that start with the escape sequence \x3b.
    • I’ve examined all the *.desktop files on my machine, and haven’t found any that have props starting with ;.
  • Restrictions on section names:
    • GLib and KConfig parsers do allow restricted characters in section names, but xdg-autostart-generator doesn’t care about sections other than [Desktop Entry].
    • I’ve examined all the *.desktop files on my machine, and haven’t found any that have systemd-incompatible section names.
  • 1 MiB per line:
    • I’m unsure about GLib, but KConfig doesn’t seem to limit the line size.
    • I didn’t examine existing files, but I’m pretty sure they are OK.
  • CR and NUL newlines:
    • Since systemd treats CR and NUL as line breaks, they must not occur anywhere, even in comments.
    • KConfig allows CR and NUL bytes in the middle of line.
    • GLib seems to allow at least CR bytes in the middle of line.
    • IDK if GLib allows NUL bytes in the middle of line or not (TODO).
    • I didn’t examine existing files, but I’m pretty sure they are OK.
  • Line continuations:
    • The spec doesn’t allow trailing \, so it’s OK.
    • KConfig allows trailing \.
    • I’m not sure if GLib allows trailing \ or not (TODO).
    • I didn’t examine existing files, but I’m pretty sure they are OK.

xdg-utils

xdg-open

  • Interpreted using /bin/sh.
  • Uses read(1p) with no options to scan lines:
    • Unix newlines only.
    • A final line without a trailing newline is ignored.
    • Unescapes certain sequences on its own.
  • Does not trim whitespace.
  • Does not seem to unescape strings, except what read(1p) does.
  • Does not unescape key names, but read(1p) does.
  • Does not unescape section names, but read(1p) does.
  • Does not support subsections.
  • Does not tolerate junk past the section header.
  • The only recognized section is the exact /^\[Desktop Entry\]$/ line.
  • Actually doesn’t care about the spec and compatibility.
  • Exec: IDK.
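To show what `read` without `-r` does to Desktop Entry escapes, here is a Python emulation of the line scan. It follows POSIX `read` semantics and assumes the loop runs with `IFS` set to the empty string, which matches the "Does not trim whitespace" note; `sh_read_lines` is a hypothetical name and this is not xdg-open's code.

```python
from typing import List


def sh_read_lines(data: str) -> List[str]:
    """Emulate `while IFS= read line; do ...; done` (no -r) over data.

    Without -r, a backslash escapes the next char (the backslash is removed,
    the char kept), and backslash + newline is a line continuation.
    """
    lines: List[str] = []
    cur: List[str] = []
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == "\\" and i + 1 < len(data):
            if data[i + 1] != "\n":
                cur.append(data[i + 1])  # backslash dropped, next char kept
            i += 2  # for backslash + newline, both chars are dropped
            continue
        if ch == "\n":
            lines.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
        i += 1
    # A final line without '\n' makes `read` return non-zero, so the loop
    # body never runs for it: it is effectively ignored.
    return lines
```

So `Exec=foo\sbar` reaches xdg-open as `Exec=foosbar`, and `\\` collapses to a single `\`.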

xdg-mime

  • Uses AWK to parse files.
  • I’m not sure how exactly it handles all the edge cases.
  • I’m not sure if it cares about the spec or compatibility.

xdg-email, xdg-settings

  • Use grep to find the Exec key.
  • Don’t seem to unescape it.
  • Don’t seem to care about the spec or compatibility.

xdg-user-dirs

Uses a custom syntax that can be parsed as a subset of the Desktop Entry format (assuming props outside of sections are allowed).

libqtxdg

Used in Deepin(?), LXQt.

  • Recognizes [\t\n\v\f\r ] as whitespace:
    • At the start of line.
    • At the end of line.
    • Before =.
    • After =.
  • Lines are scanned using QTextStream::readLine().
    • TODO: What line endings it recognizes?
    • TODO: What about NUL bytes?
  • Sections:
    • Similar to systemd.
    • But have no restricted chars.
  • Props:
    • Empty keys are ignored.
    • Void values (no =) seem to be OK.
    • Values are QVariant or IDK.
    • A file has a single table indexed by <section>/<prop> concatenations.
    • So, the b/c prop in the a section is the same as the c prop in the a/b section.
  • Strings:
    • Only the [strn\\] escapes are supported.
  • Exec: IDK.
  • Lists:
    • Split after unescaping?
    • Empty segments are ignored.
    • Escaped separators do not work?
  • Booleans: IDK.
  • Numbers: IDK.
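The key collision described in the libqtxdg notes can be illustrated with a tiny Python model of a single flat table. It is not libqtxdg code; `flat_key` is a hypothetical helper that only builds the `<section>/<prop>` string.

```python
def flat_key(section: str, prop: str) -> str:
    """Hypothetical model of a single table keyed by '<section>/<prop>'."""
    return f"{section}/{prop}"


table = {}
table[flat_key("a", "b/c")] = "b/c in [a]"
table[flat_key("a/b", "c")] = "c in [a/b]"  # same key "a/b/c": overwrites the entry above
```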

Patterns:

TODO

D-Bus

The Desktop Entry file parser from the D-Bus reference implementation.

Looks like it’s the only parser that follows the spec literally.

  • Only allows U+0020 SPACE before and after =.
  • Section names cannot include [\0-\x1f\[\]\x7f] or non-ASCII chars.
  • Recognizes the following newlines:
    • \r\n,
    • \r,
    • \n.
  • Blank lines can include [ \t\n\f\r].
  • Doesn’t allow invalid UTF-8.
  • Lines are not trimmed.
  • Only allows valid escapes.
  • Key names can only contain [-0-9A-Za-z].
  • Localized props are ignored.
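The strict D-Bus rules above fit in a few lines of Python. This sketch models the notes only (it is not the D-Bus C parser): the function names are hypothetical, escape validation is not modelled, and whether an empty section name is accepted is left open because the notes do not cover it.

```python
import re
from typing import Optional, Tuple

KEY_NAME = re.compile(r"[-0-9A-Za-z]+")


def dbus_section_name_ok(name: str) -> bool:
    """Printable ASCII only, without '[' and ']'."""
    return all(0x20 <= ord(ch) <= 0x7E and ch not in "[]" for ch in name)


def dbus_parse_prop(line: str) -> Optional[Tuple[str, str]]:
    """Return (key, value), None for an ignored localized prop; raise ValueError if invalid.

    The line is not trimmed; only U+0020 SPACE is stripped around '='.
    """
    key, sep, value = line.partition("=")
    if not sep:
        raise ValueError("missing '='")
    key = key.rstrip(" ")
    value = value.lstrip(" ")
    if "[" in key and key.endswith("]"):
        return None  # localized props are ignored
    if KEY_NAME.fullmatch(key) is None:
        raise ValueError(f"invalid key name: {key!r}")
    return key, value
```

A leading space or a tab before `=` is enough to make a line invalid here, which is what "Lines are not trimmed" implies.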