Jeff-Russ/Study: Multi-File Bash Scripting and Paths.md

## Study: Multi-File Bash Scripting and Paths.md

      
    Bash Scripting:

Managing Pathnames in Multi-file Bash Projects

Paths can get muddled when you execute a Bash script that has dependencies. Every script runs with a working directory, which you can see by echoing pwd from inside the script. Relative paths are resolved against this directory, including the paths you pass to source (aka .) from that script.
This location is subject to change because it traces back to the ORIGINAL caller. By "ORIGINAL" I don't mean the location of the script that called it, or even the location of the script that called that one. I mean the working directory of whoever initiated the chain of calls, which might be the user in the terminal or the location of the clicked executable.
Things to consider:

The caller's location
the entry point: the location of the first script to be called
locations of all other scripts called by that entry point

Getting the Caller's Absolute Location

Of course the user can get whatever directory they are in as an absolute path by typing pwd, but so can any script. Check it with this script:
# /Users/Jeff/dir/main.sh

echo "/Users/Jeff/dir/main.sh called pwd:"
echo `pwd`
touch ./file_at_script_working_dir
touch "$1"
It's helpful to have a global variable for the absolute path of the first caller, which will be the same for all involved scripts.
  CALLER_DIR="$(pwd)"
At your terminal:
$ pwd # shows: /Users/Jeff
$ ./dir/main.sh ./file_at_caller_working_dir
# shows:
# /Users/Jeff/dir/main.sh called pwd:
# /Users/Jeff
$ ls | grep dir 
# shows:  
# dir file_at_script_working_dir file_at_caller_working_dir
You can see the script's working directory is the same as the caller's: both files were created in /Users/Jeff. Therefore, the script file's location is not the same as its working directory. Its working directory changes depending on the caller.
Now let's add a second script:
# /Users/Jeff/dir/lib/funcs.sh

echo "/Users/Jeff/dir/lib/funcs.sh called pwd:"
echo `pwd`
And modify the first to have source ./lib/funcs.sh. Called from /Users/Jeff, you will get an error that the file doesn't exist, because ./ refers to the caller's working directory, not the directory main.sh lives in. We could have the main script know the absolute path of each script it sources, but if /Users/Jeff/dir/ gets moved we would have to modify our code, even if the two files stay in the same relative locations.
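To see the failure without touching your real files, here is a minimal sketch that rebuilds the layout in a temporary directory. The sandbox variable and the one-line file contents are stand-ins made up for the demo; the temporary directory plays the role of /Users/Jeff.

```bash
#!/usr/bin/env bash
# Hypothetical sandbox: a temp directory stands in for /Users/Jeff.
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/dir/lib"

# lib/funcs.sh just announces that it was loaded.
printf '%s\n' 'echo "funcs.sh loaded"' > "$sandbox/dir/lib/funcs.sh"

# main.sh sources the library relative to the working directory.
printf '%s\n' 'source ./lib/funcs.sh' > "$sandbox/dir/main.sh"

# Called from inside dir/, ./lib/funcs.sh resolves and the source succeeds.
from_dir="$(cd "$sandbox/dir" && bash ./main.sh 2>&1)"

# Called from the parent, ./lib/funcs.sh is looked up under the parent
# directory, so the source fails and bash reports an error instead.
from_parent="$(cd "$sandbox" && bash ./dir/main.sh 2>&1)"

echo "from dir/:   $from_dir"
echo "from parent: $from_parent"

rm -rf "$sandbox"
```

The same main.sh works or fails purely because of where it was called from, which is the problem the rest of this study tries to solve.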
Getting a Script's Relative Location

${BASH_SOURCE[0]} gives us the path of the script it sits in, USUALLY relative to the original caller's working directory (more on "usually" later). Put echo "${BASH_SOURCE[0]}" in main.sh and call it:
$ pwd                # shows: /Users/Jeff
$ ./dir/main.sh      # shows: ./dir/main.sh
$ cd ../
$ pwd                # shows: /Users
$ ./Jeff/dir/main.sh # shows: ./Jeff/dir/main.sh
$0

By the way, there is also $0. In the first-called script it is the same as ${BASH_SOURCE[0]}, but in every script subsequently sourced from it $0 still returns the entry script's path, so it's effectively a way for any sourced script to find out which script was originally called. (A script that is executed as its own process, rather than sourced, gets its own $0.) The output uses the same format as ${BASH_SOURCE[0]}.
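A small sketch of the difference, assuming a throwaway copy of the dir/main.sh and dir/lib/funcs.sh layout in a temporary directory (the sandbox variable and the zero=/source= labels are invented for the demo):

```bash
#!/usr/bin/env bash
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/dir/lib"

# Both files print $0 and BASH_SOURCE[0] so the two can be compared.
cat > "$sandbox/dir/lib/funcs.sh" <<'EOF'
echo "funcs.sh zero=$0 source=${BASH_SOURCE[0]}"
EOF

cat > "$sandbox/dir/main.sh" <<'EOF'
echo "main.sh  zero=$0 source=${BASH_SOURCE[0]}"
source "$(dirname "${BASH_SOURCE[0]}")/lib/funcs.sh"
EOF

# Run main.sh by a relative path from the sandbox root.
out="$(cd "$sandbox" && bash ./dir/main.sh)"
echo "$out"
# main.sh  zero=./dir/main.sh source=./dir/main.sh
# funcs.sh zero=./dir/main.sh source=./dir/lib/funcs.sh

rm -rf "$sandbox"
```

In the sourced file $0 still names the entry script, while ${BASH_SOURCE[0]} names the file that is actually being read.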
Summary of Issue

Think back to this:

The caller's location
the entry point: the location of the first script to be called
locations of all other scripts called by that entry point

There are two things that would make our including of dir/lib/funcs.sh in dir/main.sh always work with source ./lib/funcs.sh. The first would be to move there before calling it:
$ cd /Users/Jeff/dir
$ ./main.sh
The problem with this is that it effectively changes your script's working directory too. Usually a script is used to manipulate something at its caller's location, or relative to it. If the user passes a relative path as an argument, the script can use it directly only while the working directory is still the caller's. It's also just annoying to move there each time, and if the script is in $PATH you would call it directly, making this a non-option.
The second would be to have the script itself cd from the caller's current directory to its own location, but this really produces the same result.
# /Users/Jeff/dir/main.sh

# dirname takes off '/main.sh' from the end
# but it's still a relative (from caller) path
MAIN_SCRIPT_DIR=$(dirname "${BASH_SOURCE[0]}")

cd "$MAIN_SCRIPT_DIR"

source ./lib/funcs.sh
Either way you are making ./ mean the same thing in the main script just so that source ./lib/funcs.sh always works. You could change back afterwards with cd -, but things can get risky if you ever diverge from the plan, e.g. if the sourced script runs cd somewhere else.
If we knew the absolute paths of everything involved we would have a lot of flexibility to move around freely and safely.
Testing Paths: The ${BASH_SOURCE[0]}

So far we haven't had our first script call the second one, so we still don't know the details of the paths seen by the second script. Let's do some temporary things to test that. First let's just source the second file by its full path:
 # /Users/Jeff/dir/main.sh
echo 
echo "inside dir/main.sh"
echo "pwd #->"`pwd`
echo '${BASH_SOURCE[0]} #->'"${BASH_SOURCE[0]}"

source /Users/Jeff/dir/lib/funcs.sh
And put something similar in the other:
 # /Users/Jeff/dir/lib/funcs.sh
echo 
echo "inside dir/lib/funcs.sh"
echo "pwd #->"`pwd`
echo '${BASH_SOURCE[0]} #->'"${BASH_SOURCE[0]}"
Now we can call main.sh from anywhere and it will be able to source the other. Try calling it from /Users/Jeff and here is the output:
inside dir/main.sh
pwd #->/Users/Jeff
${BASH_SOURCE[0]} #->./dir/main.sh

inside dir/lib/funcs.sh
pwd #->/Users/Jeff
${BASH_SOURCE[0]} #->/Users/Jeff/dir/lib/funcs.sh
Well that's odd, isn't it? All three parties involved show the same working directory, yet ${BASH_SOURCE[0]} is relative to it in the main script and absolute in the other. Try running cd dir and calling main from there...
inside dir/main.sh
pwd #->/Users/Jeff/dir
${BASH_SOURCE[0]} #->./main.sh

inside dir/lib/funcs.sh
pwd #->/Users/Jeff/dir
${BASH_SOURCE[0]} #->/Users/Jeff/dir/lib/funcs.sh
This might have something to do with the way we included it with: source /Users/Jeff/dir/lib/funcs.sh so let's change it to: source ./lib/funcs.sh and call main from dir/ and we get:
inside dir/main.sh
pwd #->/Users/Jeff/dir
${BASH_SOURCE[0]} #->./main.sh

inside dir/lib/funcs.sh
pwd #->/Users/Jeff/dir
${BASH_SOURCE[0]} #->./lib/funcs.sh
So it IS due to the way it's sourced, which might make you wonder if the same is true of the main file and the way we call it. If you change back to source /Users/Jeff/dir/lib/funcs.sh and call main with /Users/Jeff/dir/main.sh, you will see its output as:
${BASH_SOURCE[0]} #->/Users/Jeff/dir/main.sh
This proves that ${BASH_SOURCE[0]} stores a path according to the way the script was called or sourced! This still doesn't change the fact that everything reflects the same working directory, it just means that paths will be stored in ${BASH_SOURCE[0]} as absolute or relative depending on the way the script was called.
So what if the script called by main is in a directory listed in the user's $PATH environment variable and is called directly as funcs.sh, which is neither absolute NOR relative? Well, I'll save you the trouble of testing this...
Scripts in PATH, called DIRECTLY, will still have the same working directory as the original caller but will store to ${BASH_SOURCE[0]} the PATH entry they were found in followed by the file name. Since PATH entries are normally absolute, ${BASH_SOURCE[0]} shows the script's absolute path, as if it had been called with an ABSOLUTE PATH.
Scripts in PATH, called by their RELATIVE PATH, will store to ${BASH_SOURCE[0]} their RELATIVE PATH.
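If you would rather check these three cases yourself, here is a rough sketch. It assumes a throwaway executable called show.sh (a made-up name) in a temporary directory, and that the temporary directory allows executing files.

```bash
#!/usr/bin/env bash
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/dir"

# show.sh only reports how bash recorded its own path.
cat > "$sandbox/dir/show.sh" <<'EOF'
#!/usr/bin/env bash
echo "${BASH_SOURCE[0]}"
EOF
chmod +x "$sandbox/dir/show.sh"

# 1. Called by a relative path: stored as typed.
via_relative="$(cd "$sandbox" && ./dir/show.sh)"

# 2. Called by an absolute path: stored as typed.
via_absolute="$("$sandbox/dir/show.sh")"

# 3. Found through PATH: stored as the PATH entry plus the file name.
via_path="$(PATH="$sandbox/dir:$PATH"; show.sh)"

echo "relative: $via_relative"
echo "absolute: $via_absolute"
echo "PATH:     $via_path"

rm -rf "$sandbox"
```

The PATH case and the absolute case print the same thing here because the PATH entry added for the demo is itself an absolute directory.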
Testing cd

We now know that all three parties: the caller, the first called script, and the script called by it, have the same working directory. But what happens if any script runs cd? Does cd in the main script affect the other? Does it affect the user's shell? What about if the other script runs cd? Then there is the complication of the order of events.
Let's just start with one script and see what it does to the caller's shell. Just put cd /; pwd in a script and call it (not from /!). The script prints / from pwd, then completes. Run pwd in your shell and you see nothing has changed. So that's that. On to two scripts, with main running cd / before calling funcs:
inside main: cd /; pwd
/
now calling funcs

inside funcs: pwd;
/
back in main: pwd;
/
Let's throw in a function:
# main.sh
echo "inside main: calling funcs"
source /Users/Jeff/dir/lib/funcs.sh

echo "back in main: cd /; pwd"
cd /
pwd
echo "Still in main calling a function defined in funcs"
other_func

echo "Back in main: pwd;"
pwd
# funcs.sh
echo "inside funcs: nothing to do"

other_func () {
  echo "inside other_func() defined in funcs: pwd;"
  pwd
  cd /usr/local/bin
  echo "still in other_func(): ran cd /usr/local/bin; now: pwd"
  pwd
}
The output:
inside main: calling funcs

inside funcs: nothing to do
back in main: cd /; pwd
/
Still in main calling a function defined in funcs

inside other_func() defined in funcs: pwd;
/
still in other_func(): ran cd /usr/local/bin; now: pwd
/usr/local/bin
Back in main: pwd;
/usr/local/bin
So it does propagate back up, even from a function, because a sourced file and its functions run in the same shell process as main. The only way to prevent it is to start a subshell. The reason why no cd sticks once all execution completes and the user runs pwd again is that the whole script ran in its own shell, which is a child process of the user's session. Anything you wrap in back-ticks, parentheses or $() is also executed in a subshell.
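Here is a short sketch of that containment. It compares parentheses (a subshell) with braces (a group that runs in the current shell); the variable names are only for illustration.

```bash
#!/usr/bin/env bash
start_dir="$(pwd)"

# Command substitution is a subshell: the cd never reaches this shell.
inside="$(cd / && pwd)"

# Parentheses are a subshell too: the cd is discarded when it exits.
( cd / && echo "in subshell: $(pwd)" )
after_subshell="$(pwd)"

# Braces group commands in the CURRENT shell, so this cd persists.
{ cd / && echo "in group: $(pwd)"; }
after_group="$(pwd)"

echo "after subshell: $after_subshell"
echo "after group:    $after_group"

# Go back to where we started.
cd "$start_dir"
```

So wrapping a risky call in parentheses keeps its cd contained, at the cost that variables and functions set inside the parentheses are lost as well.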
Re-Summary of Issue

Everything has the same working directory and follows each other's changes.

The first problem is that the working directory is adaptive but a line of code doesn't change when called differently, and we have to type something next to source that will always work. We would be able to use ${BASH_SOURCE[0]} to dynamically build our source statement but:
${BASH_SOURCE[0]} might return a relative path and might return an absolute one, depending on how the script was called.

Getting an Absolute Path from a Relative Path

One trick to get an absolute path from a relative one is to cd to a directory in a subshell, run pwd and then leave the subshell. Here is a re-usable function that does that:
absPath () {
  # use double quotes for paths with spaces etc.
  # && keeps pwd from printing the wrong directory if cd fails
  local ABS_DIR
  ABS_DIR=$(cd "$1" && pwd) || return 1
  echo "$ABS_DIR"
}
# Usage: some_dir=$(absPath "rel/path")
Remember that ${BASH_SOURCE[0]} does not give us a directory, it gives us the path of the file, and we can't cd to a file, only a directory. We can strip the filename off the end with dirname "${BASH_SOURCE[0]}". Assembled into one line, it looks like this:
ABSOLUTE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
You can make a re-usable function out of it like this:
absdirname () {
  echo "$(cd "$(dirname "$1")" && pwd)"
}
Then in our script we can do:
THIS_DIR=$(absdirname "${BASH_SOURCE[0]}")
We probably won't need the full path with the file name but here is how you can add it back:
ABSOLUTE_PATH="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/$(basename "${BASH_SOURCE[0]}")"
We can always get the file name separately and append it to the absolute directory if we need to. Getting the file name is quite easy:
THIS_FILENAME=$(basename "${BASH_SOURCE[0]}") 
The Plan

Basically what we want is two global constants we can use like this:
 # /Users/Jeff/dir/main.sh

CALLER_DIR="$(pwd)"
SRC_ROOT= #??? This will be derived from BASH_SOURCE

source "$SRC_ROOT/helper.sh"
source "$SRC_ROOT/lib/funcs.sh"
source "$SRC_ROOT/lib/strings.sh"
This only works if all of those other files are somewhere inside the same directory as main.sh, not if they are part of some other source tree. It would also be wise to never use main.sh in another script that has a different "source root". Your globals could get mixed up unless named differently and, since we are trying to come up with a naming convention, that just might happen.
We will be able to move dir/ and everything in it and this will still work. We will be able to call main.sh from anywhere, in any way, and it will still work. All inner scripts will be able to use $SRC_ROOT to include any other files anywhere in that directory, and scripts will be able to cd anywhere without breaking that. All scripts will be able to cd back to the caller's location since it's stored in $CALLER_DIR. This can act as home-base, and scripts should be expected to hand control back to each other in this state unless some other location is the focal point of the script running.
To make a long story short, our way of converting a relative path to an absolute path won't mess anything up if we run it on a path that's already absolute. If we instead tested the path first and only converted it conditionally, that test had better be one that always works. We might as well just always run the conversion.
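As a quick check of that claim, here is a sketch that feeds the same file to the conversion once as a relative path and once as an absolute one. The sandbox layout is made up for the demo, and this absdirname is a slightly trimmed variant that uses a bare subshell instead of echo.

```bash
#!/usr/bin/env bash
# Print the absolute directory that contains the path given as $1.
absdirname () {
  (cd "$(dirname "$1")" && pwd)
}

sandbox="$(mktemp -d)"
mkdir -p "$sandbox/dir/lib"
touch "$sandbox/dir/lib/funcs.sh"

# Same file, named relative to the sandbox and named absolutely.
from_relative="$(cd "$sandbox" && absdirname "./dir/lib/funcs.sh")"
from_absolute="$(absdirname "$sandbox/dir/lib/funcs.sh")"

echo "relative input: $from_relative"
echo "absolute input: $from_absolute"

rm -rf "$sandbox"
```

Both calls print the same absolute directory, so there is no need to test the input first. One caveat worth knowing: pwd reports the logical path here, so a symlinked directory is not resolved unless you use pwd -P.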
Deciding on Convention

The Main Entry Script
# main.sh

absdirname () { echo "$(cd "$(dirname "$1")" && pwd)"; }

# GLOBAL CONSTANTS:
CALLER_DIR="$(pwd)"
SRC_ROOT="$(absdirname "${BASH_SOURCE}")"

source "$SRC_ROOT/lib/funcs.sh"
source "$SRC_ROOT/lib/strings.sh"

main () {
  local var="some string${string_from_lib}"
  funcFromFuncs "$var"
  parseArgs "$@"
  # the argument goes outside the quotes, otherwise it becomes part of the file name
  source "$SRC_ROOT/helper.sh" "$3"
}
parseArgs () {
  cd "$1"
  funcFromFuncs "$2" "$var"
  cd "$CALLER_DIR"
}
#####################################################################
# execution starts here:
main "$@"

Sourcing from the Sourced
# helper.sh 
# the argument goes outside the quotes, otherwise it becomes part of the file name
[ -z "$1" ] && source "$SRC_ROOT/other.sh" "$3"
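To try the convention end to end, here is a cut-down sketch that builds a tiny project in a temporary directory and calls its main.sh from two different places. The greet function, the greeting variable and the elsewhere directory are invented for the demo and are not part of the convention itself.

```bash
#!/usr/bin/env bash
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/dir/lib" "$sandbox/elsewhere"

cat > "$sandbox/dir/lib/funcs.sh" <<'EOF'
greet () { echo "$greeting from $SRC_ROOT"; }
EOF

cat > "$sandbox/dir/lib/strings.sh" <<'EOF'
greeting="hello"
EOF

cat > "$sandbox/dir/main.sh" <<'EOF'
absdirname () { (cd "$(dirname "$1")" && pwd); }
CALLER_DIR="$(pwd)"
SRC_ROOT="$(absdirname "${BASH_SOURCE[0]}")"
source "$SRC_ROOT/lib/funcs.sh"
cd /                                # wander off...
source "$SRC_ROOT/lib/strings.sh"   # ...an absolute SRC_ROOT still resolves
greet
cd "$CALLER_DIR"                    # back to home base
echo "caller dir: $(pwd)"
EOF

# Same script, two different callers and two different relative paths.
run_1="$(cd "$sandbox" && bash ./dir/main.sh)"
run_2="$(cd "$sandbox/elsewhere" && bash ../dir/main.sh)"

printf '%s\n' "$run_1" "$run_2"

rm -rf "$sandbox"
```

The greet line is identical in both runs because $SRC_ROOT does not depend on the caller, while the last line follows the caller because $CALLER_DIR does.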
Closing Thoughts

We could actually keep to relative paths for our source statements by just using:
SRC_ROOT="$(dirname "${BASH_SOURCE}")"
source "$SRC_ROOT/lib/funcs.sh"
source "$SRC_ROOT/lib/strings.sh"
But if we ever cd, we would break the now relative path $SRC_ROOT, so there would be no easy way to access things at the source root after that occurs. For that reason it's probably best to at least have absolute paths stored.
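A minimal sketch of that breakage, again in a throwaway temporary directory (the file contents and the "cannot find" message are made up for the demo, and it assumes there is no /dir/lib/strings.sh on your machine):

```bash
#!/usr/bin/env bash
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/dir/lib"
printf '%s\n' 'echo "strings.sh loaded"' > "$sandbox/dir/lib/strings.sh"

cat > "$sandbox/dir/main.sh" <<'EOF'
SRC_ROOT="$(dirname "${BASH_SOURCE[0]}")"   # stays relative: ./dir
cd /
# ./dir is now resolved against /, not against the caller's directory
source "$SRC_ROOT/lib/strings.sh" 2>/dev/null ||
  echo "cannot find $SRC_ROOT/lib/strings.sh"
EOF

result="$(cd "$sandbox" && bash ./dir/main.sh)"
echo "$result"
# cannot find ./dir/lib/strings.sh

rm -rf "$sandbox"
```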