@francoishill
Forked from notheotherben/README.md
Created February 13, 2017 05:45
PEG Grammar for Command Line Parsing

Command Line Parsing Grammar

This grammar parses a command line for program execution into its components: the environment variables, the executable itself, and any arguments passed to the executable.

It will take an input like the following:

ENV_X=true ENV_Y="yes please" ./test/my_exec arg1 -f1 "arg with spaces" 'another arg' --flag2 yet\ another\ arg --flag=10

And transform it into an object which can be used to start the specified program in the correct environment, with the correct arguments.

{
   "env": [
      "ENV_X=true",
      "ENV_Y=yes please"
   ],
   "exec": "./test/my_exec",
   "args": [
      "arg1",
      "-f1",
      "arg with spaces",
      "another arg",
      "--flag2",
      "yet another arg",
      "--flag=10"
   ]
}
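As a rough sketch of how such an object might be consumed, the helper below converts it into the command, argument list and environment map that Node's child_process.spawn(command, args, options) accepts. The name toSpawnInput and the baseEnv parameter are hypothetical and not part of the grammar; the sketch also assumes every "env" entry contains an "=", which the grammar guarantees.

```javascript
// Hypothetical helper: reshapes the parser's result for child_process.spawn.
function toSpawnInput(parsed, baseEnv = {}) {
  const env = { ...baseEnv };
  for (const entry of parsed.env) {
    // Split on the first "=" only, so values may themselves contain "=".
    const eq = entry.indexOf("=");
    env[entry.slice(0, eq)] = entry.slice(eq + 1);
  }
  return { command: parsed.exec, args: parsed.args, options: { env } };
}

// The object from the example above, trimmed to a few arguments.
const exampleResult = {
  env: ["ENV_X=true", "ENV_Y=yes please"],
  exec: "./test/my_exec",
  args: ["arg1", "-f1", "arg with spaces"],
};

// In Node, process.env would typically be passed as the base environment.
const spawnInput = toSpawnInput(exampleResult, { PATH: "/usr/bin" });
```

In Node, require("child_process").spawn(spawnInput.command, spawnInput.args, spawnInput.options) would then start the program.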

This grammar places particular emphasis on matching Bash's escaping behaviour, so that users familiar with invoking commands from a command line can use systems built on this grammar.

/*
 * Command Line Execution Grammar
 */
Expression
  = env:(EnvironmentVariable _)* exec:Executable args:(_ Argument)* _ Comment? {
      return {
        "env": env.map(function(t) { return t[0]; }),
        "exec": exec,
        "args": args.map(function(t) { return t[1]; })
      };
    }
EnvironmentVariable "environment variable"
  = name:EnvironmentVariableKey "=" value:String {
      return name + "=" + value;
    }
EnvironmentVariableKey "environment variable key"
  = [A-Z][A-Z0-9_]* {
      return text();
    }
Executable "executable"
  = exe:String {
      return exe;
    }
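Argument "argument"
  // A word that starts with "#" begins the trailing Comment, so it must not
  // be consumed as an argument (otherwise the Comment rule is never reached).
  = !"#" arg:String {
      return arg;
    }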
String "string"
  = qqs:DoubleQuotedString { return qqs; }
  / qs:SingleQuotedString { return qs; }
  / s:UnquotedString { return s; }
UnquotedString "unquoted string"
  = chars:UnquotedChars+ {
      return chars.join("");
    }
UnquotedChars
  = [^\0-\x20\x22\x27\x5C]
  / Escape sequence:(
        " "
      / "\\"
      / "/"
      / "b" { return "\b"; }
      / "f" { return "\f"; }
      / "n" { return "\n"; }
      / "r" { return "\r"; }
      / "t" { return "\t"; }
      / "u" digits:$(HEXDIG HEXDIG HEXDIG HEXDIG) {
          return String.fromCharCode(parseInt(digits, 16));
        }
    ) { return sequence; }
Escape
  = '\\'
SingleQuotedString "single quoted string"
  = "'" chars:SingleQuotedStringChars* "'" {
      return chars.join("");
    }
SingleQuotedStringChars
  = UnescapedSingleQuoteChar
  / Escape sequence:(
        "'"
      / "\\"
      / "/"
      / "b" { return "\b"; }
      / "f" { return "\f"; }
      / "n" { return "\n"; }
      / "r" { return "\r"; }
      / "t" { return "\t"; }
      / "u" digits:$(HEXDIG HEXDIG HEXDIG HEXDIG) {
          return String.fromCharCode(parseInt(digits, 16));
        }
    ) { return sequence; }
UnescapedSingleQuoteChar
  = [^\0-\x1F\x27\x5C]
DoubleQuotedString "double quoted string"
  = '"' chars:DoubleQuotedStringChars* '"' {
      return chars.join("");
    }
DoubleQuotedStringChars
  = UnescapedDoubleQuoteChar
  / Escape sequence:(
        '"'
      / "\\"
      / "/"
      / "b" { return "\b"; }
      / "f" { return "\f"; }
      / "n" { return "\n"; }
      / "r" { return "\r"; }
      / "t" { return "\t"; }
      / "u" digits:$(HEXDIG HEXDIG HEXDIG HEXDIG) {
          return String.fromCharCode(parseInt(digits, 16));
        }
    ) { return sequence; }
UnescapedDoubleQuoteChar
  = [^\0-\x1F\x22\x5C]
Comment
  = "#" comment:.* {
      return comment.join("");
    }
DIGIT = [0-9]
HEXDIG = [0-9a-f]i
_ "whitespace"
  = [ \t\n\r]*
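To make the escape handling in the UnquotedChars rule easier to follow, here is a minimal plain-JavaScript sketch of the same decoding step. The function name unescapeUnquoted is hypothetical, and the sketch only decodes backslash escapes; it does not reject characters (such as bare spaces or quotes) that the grammar would not accept in an unquoted string.

```javascript
// Hypothetical helper, not part of the grammar: decodes the backslash
// escapes that the UnquotedChars rule accepts.
const SIMPLE_ESCAPES = new Map([
  ["b", "\b"], ["f", "\f"], ["n", "\n"], ["r", "\r"], ["t", "\t"],
]);

function unescapeUnquoted(raw) {
  let out = "";
  for (let i = 0; i < raw.length; i++) {
    if (raw[i] !== "\\") {
      out += raw[i]; // ordinary character, copied through
      continue;
    }
    const next = raw[++i]; // character following the backslash
    if (next === " " || next === "\\" || next === "/") {
      out += next; // escaped space, backslash or slash stands for itself
    } else if (SIMPLE_ESCAPES.has(next)) {
      out += SIMPLE_ESCAPES.get(next);
    } else if (next === "u" && /^[0-9a-f]{4}$/i.test(raw.slice(i + 1, i + 5))) {
      // \uXXXX: four hex digits naming a UTF-16 code unit
      out += String.fromCharCode(parseInt(raw.slice(i + 1, i + 5), 16));
      i += 4;
    } else {
      throw new SyntaxError("Unsupported escape at index " + (i - 1));
    }
  }
  return out;
}
```

For instance, the raw text yet\ another\ arg from the README example decodes to a single argument containing two spaces.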
@prozacgod

I really like the idea of this. I played around with it and quickly thought of some nice features; I like the idea of getting a really good Bash-syntax PEG parser.

Examples:

foo='test' program

should be valid; uppercase environment variable names are a convention, not a requirement.
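One way to picture the relaxed rule is as a regular expression. The sketch below assumes the POSIX shell definition of a name (letters, digits and underscores, not starting with a digit); the corresponding PEG character classes would be [A-Za-z_][A-Za-z0-9_]*. The names ENV_NAME and isEnvName are made up for illustration.

```javascript
// Assumed pattern: POSIX shell names, which allow lower case and underscores
// but may not begin with a digit.
const ENV_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Hypothetical helper: true when `name` could be an environment variable key.
function isEnvName(name) {
  return ENV_NAME.test(name);
}
```

With this pattern both foo and ENV_X are accepted, while a name such as 1abc is still rejected.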

foo=test test | grep 'test'

It's valid, but does not parse correctly. Parsing that pattern may be a bit harder, mostly because it now creates an "execution chain" along with execution expansion.

I was thinking this could be done with an execution chain: a list of executions that must occur.

X=3 foo --a -a 3

[{ "env": [ "X=3" ], "exec": "foo", "args": [ "--a", "-a", "3" ], "stdout": "stdout", "stdin": "stdin", "stderr": "stderr" }]

(I started to write more of this idea down and realized... that's a lot of work... but I did write the comment, so feel free to call me a loon and carry on with life.)

X=test foo --a -a 3 | grep -i "foo bar"

[
   { "env": [], "exec": "test", "args": [], "stdout": { "env": "X" }, "stdin": "stdin", "stderr": "stderr" },
   { "env": [{ "name": "X" }], "exec": "foo", "args": [ "--a", "-a", "3" ], "stdout": 2, "stdin": "stdin", "stderr": "stderr" },
   { "env": [{ "name": "X" }], "exec": "grep", "args": [ "-i", "foo bar" ], "stdout": "stdout", "stdin": 1, "stderr": "stderr" }
]

This shows that index 2 cannot execute without index 1 executing, while index 0 has no dependencies and should execute first, etc.
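A minimal sketch of the pipe part of this idea, assuming the command line has already been split at each | and every stage parsed into the { env, exec, args } shape. The helper name buildChain is hypothetical, and execution expansion (the step at index 0 above) is left out, so the indices here start at the first piped command.

```javascript
// Hypothetical helper: links parsed pipeline stages into an execution chain.
// A numeric stdin/stdout refers to the index of another entry in the chain.
function buildChain(stages) {
  return stages.map((stage, i) => ({
    env: stage.env,
    exec: stage.exec,
    args: stage.args,
    // The first stage reads the real stdin; later stages read the previous one.
    stdin: i === 0 ? "stdin" : i - 1,
    // The last stage writes the real stdout; earlier stages feed the next one.
    stdout: i === stages.length - 1 ? "stdout" : i + 1,
    stderr: "stderr",
  }));
}

const chain = buildChain([
  { env: ["X=3"], exec: "foo", args: ["--a", "-a", "3"] },
  { env: [], exec: "grep", args: ["-i", "foo bar"] },
]);
```

Each entry then records which other entry it depends on, which is enough to decide the order in which the processes have to be wired together.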
