bbtdev/bash-word-splitting-confusion

## bash-word-splitting-confusion
In the man page, we find out that word splitting is an expansion:

"There are seven kinds of expansion performed: brace expansion, tilde expansion, parameter and variable expansion,
command substitution, arithmetic expansion, WORD SPLITTING, and pathname expansion."

It also says that this expansion acts on the "results of parameter expansion, command substitution, and arithmetic
expansion that did not occur within double quotes".
The BashGuide confirms this: "Word splitting is performed on the results of almost all unquoted expansions."
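
To see the quoted/unquoted difference for myself, a small sketch like the one below helps. `count_args` is a hypothetical helper I made up for illustration (it just prints how many arguments it received), and the example assumes IFS still has its default value:

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

var='one two   three'

# Unquoted: the result of the expansion is split into three words.
unquoted=$(count_args $var)

# Double-quoted: no word splitting, so the value stays a single argument.
quoted=$(count_args "$var")

# The same holds for the result of a command substitution.
subst_unquoted=$(count_args $(echo one two))
subst_quoted=$(count_args "$(echo one two)")

echo "$unquoted $quoted $subst_unquoted $subst_quoted"   # prints: 3 1 2 1
```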

Both the guide and the man page specify that word splitting depends on the IFS variable:
"The result of the expansion is broken into separate words based on the characters of the IFS variable."

So far everything is well explained and makes sense.
But the BashGuide repeatedly uses "word splitting" for the stage where the command line is initially
split into words at whitespace, for example:

"The shell takes your line of code and cuts it up into bits wherever there are sequences of syntactical whitespace.
The command above would be split up into the following:

rm myfile myotherfile
   ^      ^
[rm] [myfile] [myotherfile]

As you can see, all syntactical whitespace has been removed. There is no more whitespace left after word splitting
is done with your line."

Many describe this process as tokenization, and rightfully so, since the IFS variable is not involved here and
it does not follow the definition of word splitting from the man page.
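
A rough way to check that tokenization ignores IFS while word splitting uses it: the sketch below changes IFS inside a subshell and counts arguments. `count_args` is a hypothetical helper, not something from the man page or the guide.

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

# Literal words on the command line: the parser separates them at whitespace.
literal_default=$(count_args one two three)

# Changing IFS does not change how the literal text is tokenized.
literal_comma=$(IFS=,; count_args one two three)

# A literal comma is not a token separator, whatever IFS says.
comma_literal=$(IFS=,; count_args one,two,three)

# Only the result of an unquoted expansion is split using IFS.
list='one,two,three'
comma_expanded=$(IFS=,; count_args $list)

echo "$literal_default $literal_comma $comma_literal $comma_expanded"   # prints: 3 3 1 3
```

The third and fourth cases contain the same characters; the only difference is whether they came from an expansion.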

I wish this terminology (tokenization) had been used instead, because giving two different processes
the same name, when the difference between them is not obvious, can be confusing.
At least it was for me; it took me a few hours to sort out.


FROM IRC:
the behaviour of word splitting is well documented, but the
tokenisation of the shell's input - which is not "word splitting"
in any formal sense and certainly does not hinge upon the value of
IFS - occurs much earlier and is not so well documented.

even the info pages gloss over the details of tokenisation. it's
briefly touched upon here:

https://www.gnu.org/software/bash/manual/html_node/Shell-Syntax.html#Shell-Syntax

that's the "cuts it up into bits" stage, well before expansions may occur.
and word splitting, for that matter.

on the upside, the node that talks about word splitting is very
specific as to how it works.
(my note: he is talking about http://mywiki.wooledge.org/WordSplitting)

the key thing to remember is that, if no other forms of documented
expansion have occurred up to the point at which word splitting is
on the cards, then no splitting will occur. if it does then the
value of IFS is, of course, relevant.

as an aside, I think it's perhaps not ideal that word splitting is
initially presented as if it were a form of expansion. if you look
at the explanation of it, it becomes clear that it's something
that acts on other expansions and the tone shifts.

if you haven't read the documentation for the Shell
Command Language, you probably should. it explains the
tokenisation process in a satisfactory manner and should apply to
bash, for the most part.
they also use the term "field splitting" rather than "word
splitting" (shrug).
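
The key point from the IRC excerpt (no expansion, no splitting) can be tried out with a sketch like this one. It assumes an odd IFS value of `0`, chosen only to make the effect visible, and again uses the hypothetical `count_args` helper:

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

num=101

# Literal text: no expansion happened, so nothing is split even though IFS is "0".
literal=$(IFS=0; count_args 101)

# Unquoted parameter expansion: the result "101" is split at the "0".
param=$(IFS=0; count_args $num)

# Unquoted arithmetic expansion: its result "101" is split the same way.
arith=$(IFS=0; count_args $((100 + 1)))

# Quoted arithmetic expansion: double quotes suppress the splitting.
arith_quoted=$(IFS=0; count_args "$((100 + 1))")

echo "$literal $param $arith $arith_quoted"   # prints: 1 2 2 1
```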