@bbtdev
Last active May 28, 2020 03:06
In the man page, we find out that word splitting is an expansion:
"There are seven kinds of expansion performed: brace expansion, tilde expansion, parameter and variable expansion,
command substitution, arithmetic expansion, WORD SPLITTING, and pathname expansion."
It also states that this expansion acts on the "results of parameter expansion, command substitution, and arithmetic
expansion that did not occur within double quotes".
The BashGuide confirms this: "Word splitting is performed on the results of almost all unquoted expansions."
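A small sketch of what "did not occur within double quotes" means in practice. count_args is a made-up helper (not something from the man page or the guide) that just prints how many arguments it received, and the default IFS is assumed:

```bash
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() { printf '%s\n' "$#"; }

files='myfile myotherfile'

unquoted=$(count_args $files)      # unquoted expansion: the result is word split
quoted=$(count_args "$files")      # double quotes: no word splitting happens

printf 'unquoted: %s, quoted: %s\n' "$unquoted" "$quoted"
# prints: unquoted: 2, quoted: 1
```

The value of files is identical in both calls; only the quoting decides whether the expansion result is split.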
Both the guide and the man page specify that word splitting depends on the IFS variable:
"The result of the expansion is broken into separate words based on the characters of the IFS variable."
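To see the IFS dependence for myself, something like the following should do. Again count_args and the variable names are only illustrative, and IFS is changed inside a command substitution so the new value does not leak into the rest of the shell:

```bash
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() { printf '%s\n' "$#"; }

path_like='usr:local:bin'

# Default IFS (space, tab, newline): the value contains none of them, so one word.
default_ifs=$(count_args $path_like)

# IFS=: only inside the subshell: the same unquoted expansion now yields three words.
colon_ifs=$(IFS=:; count_args $path_like)

printf 'default IFS: %s, IFS=":": %s\n' "$default_ifs" "$colon_ifs"
# prints: default IFS: 1, IFS=":": 3
```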
So far everything is well explained and makes sense.
But the BashGuide also repeatedly uses the term "word splitting" for the step in which the command line is
initially split into words on whitespace, for example:
"The shell takes your line of code and cuts it up into bits wherever there are sequences of syntactical whitespace.
The command above would be split up into the following:
    rm myfile myotherfile
      ^      ^
    [rm] [myfile] [myotherfile]
As you can see, all syntactical whitespace has been removed. There is no more whitespace left after word splitting
is done with your line."
Many describe this process as tokenization, and rightfully so, since the IFS variable is not involved here and
the process does not match the man page's definition of word splitting.
I wish this term (tokenization) were used instead, because giving two different processes the same name,
when the difference between them is not obvious, can be confusing.
At least for me it was; it took me a few hours to sort out.
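A quick way to convince yourself that tokenization does not look at IFS (a sketch only; count_args is a hypothetical helper, and the odd IFS value is confined to a subshell):

```bash
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() { printf '%s\n' "$#"; }

# The line is still cut into tokens on blanks, even though IFS has no blank in it.
literal_spaces=$(IFS=:; count_args rm myfile myotherfile)

# Literal text is not the result of an expansion, so the colons in IFS do nothing.
literal_colons=$(IFS=:; count_args rm:myfile:myotherfile)

printf 'spaces: %s, colons: %s\n' "$literal_spaces" "$literal_colons"
# prints: spaces: 3, colons: 1
```

If the initial cutting of the line were really word splitting in the man page sense, both numbers would come out the other way around.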
FROM IRC:
the behaviour of word splitting is well documented, but the
tokenisation of the shell's input - which is not "word splitting"
in any formal sense and certainly does not hinge upon the value of
IFS - occurs much earlier and is not so well documented.
even the info pages gloss over the details of tokenisation. it's
briefly touched upon here:
https://www.gnu.org/software/bash/manual/html_node/Shell-Syntax.html#Shell-Syntax
that's the "cuts it up into bits" stage, well before expansions may occur.
and word splitting, for that matter.
on the upside, the node that talks about word splitting is very
specific as to how it works.
[!!! my note: he is talking about http://mywiki.wooledge.org/WordSplitting]
the key thing to remember is that, if no other forms of documented
expansion have occurred up to the point at which word splitting is
on the cards, then no splitting will occur. if it does then the
value of IFS is, of course, relevant.
as an aside, I think it's perhaps not ideal that word splitting is
initially presented as if it were a form of expansion. if you look
at the explanation of it, it becomes clear that it's something
that acts on other expansions and the tone shifts.
if you haven't read the documentation for the Shell
Command Language, you probably should. it explains the
tokenisation process in a satisfactory manner and should apply to
bash, for the most part.
they also use the term "field splitting" rather than "word
splitting" (shrug).
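The key point from the IRC log (no expansion, no splitting; with an expansion, IFS matters) can be sketched in a single command that mixes literal text, an unquoted expansion and a quoted one. show_words is a made-up helper that prints each argument in brackets, in the style of the BashGuide diagram:

```bash
# show_words is a hypothetical helper: it prints every argument wrapped in brackets.
show_words() {
    out=''
    for w in "$@"; do
        out="${out}[${w}] "
    done
    printf '%s\n' "${out% }"      # drop the trailing space
}

value='c:d'

# IFS=: applies only inside the command substitution.
result=$(IFS=:; show_words a:b $value "$value")

printf '%s\n' "$result"
# prints: [a:b] [c] [d] [c:d]
```

The literal a:b and the quoted "$value" pass through untouched; only the unquoted $value is split, and only because IFS contains a colon at that point.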