@cben
Last active January 8, 2024 00:28
{
"cells": [
{
"cell_type": "markdown",
"id": "9f252697",
"metadata": {},
"source": [
"Answering https://stackoverflow.com/questions/21011655/declare-array-but-not-define-the-array-shell-scripting ..."
]
},
{
"cell_type": "markdown",
"id": "4a796653",
"metadata": {},
"source": [
"_[answering about `bash`; I'm not sure how portable arrays are across other Bourne shells]_\n",
"\n",
"The `arr[index]=value` assignment syntax can only define arrays of 1 or more elements (and is tiresome when building long arrays). I generally recommend the `arr=(...)` syntax.\n",
"\n",
" * Do you want an empty array of 0 elements? \n",
" => Use `empty_array=()` syntax, or `local empty_array=()` inside shell functions. \n",
" \n",
"\n",
" * Do you really care about declaring but not initializing it? \n",
" => That's possible with `declare -a uninitialized_array` aka `local -a uninitialized_array`. \n",
" \n",
" But AFAICT it's not very useful, especially with arrays: it behaves almost the same as `empty_array`, and won't even catch use-before-initialize bugs.\n",
" \n",
" ----"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b653b795",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"GNU bash, version 5.2.15(1)-release (x86_64-redhat-linux-gnu)\n"
]
},
{
"ename": "",
"evalue": "1",
"output_type": "error",
"traceback": []
}
],
"source": [
"export LC_ALL=C\n",
"bash --version | head -n1"
]
},
{
"cell_type": "markdown",
"id": "a87d1a0b",
"metadata": {},
"source": [
"There are several aspects here and to separate them we'll need multiple test variables:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "e0b351ab",
"metadata": {
"collapsed": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
]
},
{
"ename": "",
"evalue": "1",
"output_type": "error",
"traceback": []
}
],
"source": [
"empty_array=()\n",
"declare -a declared_empty_array=()\n",
"declare declared_empty_array2=()\n",
"declare -a uninitialized_array\n",
"array_with_empty_string=('')\n",
"empty_string=''\n",
"declare uninitialized_string"
]
},
{
"cell_type": "markdown",
"id": "c33d0072",
"metadata": {},
"source": [
"#### debug tips\n",
"\n",
"`declare -p` is useful to get full¹ metadata on variables. It dumps the commands you could use to re-create the variables with the same attributes (at least if you start from a blank shell). "
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "d96cd7ae",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"declare -a empty_array=()\n",
"declare -a declared_empty_array=()\n",
"declare -a declared_empty_array2=()\n",
"declare -a uninitialized_array\n",
"declare -a array_with_empty_string=([0]=\"\")\n",
"declare -- empty_string=\"\"\n",
"declare -- uninitialized_string\n",
"bash: declare: undefined: not found\n"
]
},
{
"ename": "",
"evalue": "1",
"output_type": "error",
"traceback": []
}
],
"source": [
"declare -p empty_array declared_empty_array declared_empty_array2 uninitialized_array array_with_empty_string empty_string uninitialized_string undefined"
]
},
{
"cell_type": "markdown",
"id": "be5f3c15",
"metadata": {},
"source": [
"So the first thing we can confirm is that `-a` is entirely optional when assigning an array value with `=(...)`. \n",
"`declared_empty_array2` is the same as `declared_empty_array`: both are empty arrays.\n",
"\n",
"¹ One thing it won't tell: whether it's local. But `local -p` would."
]
},
{
"cell_type": "markdown",
"id": "3d3b5244",
"metadata": {},
"source": [
"But you'll see that some distinctions bash makes won't matter in practice. \n",
"I'll mostly want to check how variable expansion _behaves_. I'll be using ruby to print the arguments a command receives:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "28b0801e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[]\n"
]
},
{
"ename": "",
"evalue": "1",
"output_type": "error",
"traceback": []
}
],
"source": [
"ruby -e 'p ARGV' # just to show how it reports 0 arguments..."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "e749e5f7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[\"\"]\n"
]
},
{
"ename": "",
"evalue": "1",
"output_type": "error",
"traceback": []
}
],
"source": [
"ruby -e 'p ARGV' '' # ...vs 1 argument which is an empty string"
]
},
{
"cell_type": "markdown",
"id": "b33bdc03",
"metadata": {},
"source": [
"## `declare` makes variables local!\n",
"\n",
"I'll warn you right away not to think of the `declare` builtin as just \"declaration\". \n",
"Yes, it does that, and it has advanced uses like readonly vars, associative arrays (out of scope here), etc. \n",
"But for our needs here, its most visible effect is making the listed names [local inside a shell function](https://www.gnu.org/software/bash/manual/html_node/Shell-Functions.html) (unless you use the `-g` flag). \n",
"Inside functions, it's clearer if you use the (mostly equivalent) `local` spelling instead of `declare`."
]
},
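{
"cell_type": "markdown",
"id": "7a1c3e01",
"metadata": {},
"source": [
"A minimal sketch of that scoping effect, assuming bash 4.2 or later (for `-g`). The names `demo_scope`, `scoped` and `leaked` are made up for illustration: plain `declare` inside the function creates a local that disappears on return, while `declare -g` creates a global."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7a1c3e02",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical names, for illustration only.\n",
"demo_scope() {\n",
"  declare scoped=inside       # local to demo_scope, same as `local scoped=inside`\n",
"  declare -g leaked=inside    # -g forces a global variable\n",
"}\n",
"demo_scope\n",
"echo \"scoped=${scoped-<unset>}\"   # prints scoped=<unset> (the local is gone)\n",
"echo \"leaked=${leaked-<unset>}\"   # prints leaked=inside"
]
},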
{
"cell_type": "markdown",
"id": "bece4bb6",
"metadata": {},
"source": [
"## arrays vs. regular string variables: quite weakly typed\n",
"\n",
"We're talking here about \"indexed arrays\". bash also has \"associative arrays\" (which really do need a `-A` declaration), but they're off-topic for this question.\n",
"\n",
"Reminder: bash has essentially **3 expansion modes**:\n",
"- unquoted `$var`, `${array[3]}`, `${array[*]}`, `$*` all do [word splitting](https://www.gnu.org/software/bash/manual/html_node/Word-Splitting.html) resulting in 0 to many arguments, even from a regular non-array var.\n",
"- quoted `\"$var\"`, `\"${array[3]}\"`, `\"${array[*]}\"`, `\"$*\"` always result in 1 argument (which might be empty string), even from an array.\n",
"- [one-to-one](https://www.gnu.org/software/bash/manual/html_node/Arrays.html) `\"${array[@]}\"`, `\"$@\"` which results in exactly 1 argument per array element. This is the only mode that can distinguish single \"foo bar\" vs two \"foo\" \"bar\", and is the **whole point of using real arrays**. A pity it's so verbose ☹\n",
"\n",
"*You can think of all regular string variables as \"wannabe\" arrays of 1 element*. \n",
"\n",
"Both the `declare -a` and `=()` syntaxes mark a variable as an array, and `declare -p` will confirm that, but it almost doesn't matter! \n",
"You _can_ freely apply array expansions to string variables and vice versa! And string vars get promoted to arrays as soon as you assign to another index (`var[1]=...`), use the `+=(...)` operator, etc."
]
},
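{
"cell_type": "markdown",
"id": "7a1c3e03",
"metadata": {},
"source": [
"To make the three modes concrete, here is a small sketch that assumes the default `IFS`. `count_args` is a hypothetical helper (not a bash builtin) that only reports how many arguments it received, and `words` is a made-up array:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7a1c3e04",
"metadata": {},
"outputs": [],
"source": [
"count_args() { echo \"$#\"; }   # hypothetical helper: prints the number of arguments\n",
"words=('foo bar' baz)         # 2 elements, the first one contains a space\n",
"count_args ${words[*]}        # 3: unquoted, word splitting breaks up 'foo bar'\n",
"count_args \"${words[*]}\"      # 1: quoted [*] joins everything into one argument\n",
"count_args \"${words[@]}\"      # 2: exactly one argument per element"
]
},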
{
"cell_type": "code",
"execution_count": 6,
"id": "b5dab017",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[\"foo\"]\n",
"[\"foo\", \"bar\"]\n"
]
},
{
"ename": "",
"evalue": "1",
"output_type": "error",
"traceback": []
}
],
"source": [
"string1=foo\n",
"array2=(foo bar)\n",
"# Using string in array expansion:\n",
"ruby -e 'p ARGV' \"${string1[@]}\"\n",
"ruby -e 'p ARGV' \"${array2[@]}\""
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "1b748ba3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[\"\"]\n",
"[\"\"]\n",
"[\"foo\"]\n",
"[\"foo\"]\n",
"[\"foo\"]\n"
]
},
{
"ename": "",
"evalue": "1",
"output_type": "error",
"traceback": []
}
],
"source": [
"# Using arrays in string expansion behaves as if you accessed the first element [0]\n",
"ruby -e 'p ARGV' \"$empty_array\"\n",
"ruby -e 'p ARGV' \"${empty_array[0]}\"\n",
"ruby -e 'p ARGV' \"$string1\"\n",
"ruby -e 'p ARGV' \"$array2\"\n",
"ruby -e 'p ARGV' \"${array2[0]}\""
]
},
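{
"cell_type": "markdown",
"id": "7a1c3e05",
"metadata": {},
"source": [
"The promotion from string to array can be watched with `declare -p`. A short sketch; `promoted` is a made-up variable name, and the outputs noted in the comments use the bash 5.x format seen earlier in this notebook:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7a1c3e06",
"metadata": {},
"outputs": [],
"source": [
"promoted=foo           # hypothetical name; starts out as a plain string\n",
"declare -p promoted    # declare -- promoted=\"foo\"\n",
"promoted[1]=bar        # assigning to another index turns it into an array\n",
"declare -p promoted    # declare -a promoted=([0]=\"foo\" [1]=\"bar\")\n",
"promoted+=(baz)        # appending works as on any other array\n",
"declare -p promoted    # declare -a promoted=([0]=\"foo\" [1]=\"bar\" [2]=\"baz\")"
]
},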
{
"cell_type": "markdown",
"id": "299a0bd4",
"metadata": {},
"source": [
"## uninitialized/unset/unbound variables\n",
"\n",
"For historical reasons, Bourne-derived shells default to silently tolerating expansion of unknown variables ☹\n",
"\n",
"* When treated as a regular variable, by default they silently expand to an empty string, but you can enable [stricter `set -u` mode](https://gist.github.com/robin-a-meade/58d60124b88b60816e8349d1e3938615) to complain.\n",
"* When expanded as an array, they behave as empty arrays, and `set -u` doesn't complain about them (bash 4.4 and later exempt `[@]` and `[*]` expansions, much like `$@` and `$*`)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "471cc150",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[\"\"]\n",
"[]\n",
"bash: undefined: unbound variable\n",
"[]\n"
]
},
{
"ename": "",
"evalue": "1",
"output_type": "error",
"traceback": []
}
],
"source": [
"ruby -e 'p ARGV' \"$undefined\"\n",
"ruby -e 'p ARGV' \"${undefined[@]}\"\n",
"\n",
"set -u\n",
"\n",
"ruby -e 'p ARGV' \"$undefined\"\n",
"ruby -e 'p ARGV' \"${undefined[@]}\"\n",
"\n",
"set +u"
]
},
{
"cell_type": "markdown",
"id": "e3413d5e",
"metadata": {},
"source": [
"#### declare without value keeps it unset! Is that useful??\n",
"\n",
"You've asked about declaring like `int bar[100];` would in C, without initializing a value. Yes, you can do that with `declare -a arr` (but you can't set the size to 100). What does that achieve?\n",
"- it makes it local, but so would `declare arr=()`.\n",
"- it marks it as an array, but as we said it's a pretty weak type.\n",
"- 💡 `set -u` mode could catch bugs where it's set conditionally and some code path forgot to set it before use! \n",
" ALAS, array expansions are still allowed ☹ \n",
" AND AFAICT it doesn't matter whether we declare it with `-a`, without `-a`, or don't declare it at all. It only matters which expansion syntax we use: when accessed as an array, it'll never cause errors.\n",
" \n",
"**=> I don't see any benefit to \"declare array without defining\". You can always initialize it as empty with `=()`.**"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "1b81e1ac",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[\"\"]\n",
"[]\n",
"bash: uninitialized_string: unbound variable\n",
"[]\n"
]
},
{
"ename": "",
"evalue": "1",
"output_type": "error",
"traceback": []
}
],
"source": [
"ruby -e 'p ARGV' \"$uninitialized_string\"\n",
"ruby -e 'p ARGV' \"${uninitialized_string[@]}\"\n",
"\n",
"set -u\n",
"\n",
"ruby -e 'p ARGV' \"$uninitialized_string\"\n",
"ruby -e 'p ARGV' \"${uninitialized_string[@]}\"\n",
"\n",
"set +u"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "9dd45bde",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[\"\"]\n",
"[]\n",
"bash: uninitialized_array: unbound variable\n",
"[]\n"
]
},
{
"ename": "",
"evalue": "1",
"output_type": "error",
"traceback": []
}
],
"source": [
"ruby -e 'p ARGV' \"$uninitialized_array\"\n",
"ruby -e 'p ARGV' \"${uninitialized_array[@]}\"\n",
"\n",
"set -u\n",
"\n",
"ruby -e 'p ARGV' \"$uninitialized_array\"\n",
"ruby -e 'p ARGV' \"${uninitialized_array[@]}\"\n",
"\n",
"set +u"
]
}
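,
{
"cell_type": "markdown",
"id": "7a1c3e07",
"metadata": {},
"source": [
"Putting the recommendation together: a sketch of the \"always initialize as empty\" pattern, assuming bash 4.4 or later so that expanding an empty array under `set -u` is not an error. The names `opts` and `verbose` are made up for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7a1c3e08",
"metadata": {},
"outputs": [],
"source": [
"set -u\n",
"opts=()                        # hypothetical name; initialized as empty instead of only declared\n",
"verbose=yes                    # stand-in condition, for illustration only\n",
"if [[ $verbose == yes ]]; then\n",
"  opts+=(--verbose)            # append conditionally\n",
"fi\n",
"echo \"${#opts[@]} option(s)\"   # prints 1 option(s); the length check also works when empty\n",
"set +u"
]
}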
],
"metadata": {
"kernelspec": {
"display_name": "Bash",
"language": "bash",
"name": "bash"
},
"language_info": {
"codemirror_mode": "shell",
"file_extension": ".sh",
"mimetype": "text/x-sh",
"name": "bash"
}
},
"nbformat": 4,
"nbformat_minor": 5
}