The PATH train between NYC and New Jersey. Photo attribution here.

One of the most important environment variables that we use in shell scripting is the "PATH" variable. Whenever we type a command into the terminal, the shell will use the list of directories contained in $PATH to locate the file associated with that command. So it's worth spending some time understanding how $PATH works.

Background- env and $PATH

When we talked about shebangs in the walk-through of RBENV, we saw that one commonly-used shebang is #!/usr/bin/env bash. We learned that this means the shell runs the /usr/bin/env command, passing bash as an argument. If we look up the manual entry for the env command, we see the following:

ENV(1)                                                            General Commands Manual                                                           ENV(1)

  NAME
       env – set environment and execute command, or print environment
  
  SYNOPSIS
       env [-0iv] [-u name] [name=value ...]
       env [-iv] [-P altpath] [-S string] [-u name] [name=value ...] utility [argument ...]
  
  DESCRIPTION
       The env utility executes another utility after modifying the environment as specified on the command line.  Each name=value option specifies the
       setting of an environment variable, name, with a value of value.  All such environment variables are set before the utility is executed.
  
       The options are as follows:

...
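As a quick illustration of the name=value form described in the man page excerpt, here is a minimal sketch. The variable name GREETING is hypothetical and chosen only for this example. It suggests that env changes the environment of the command it launches, not the environment of the shell that called it:

```shell
#!/usr/bin/env bash

# GREETING is a made-up variable name used only for this illustration.
unset GREETING

# env sets GREETING in the environment of the command it launches.
child_output="$(env GREETING=hello bash -c 'echo "$GREETING"')"
echo "child sees: $child_output"

# The calling shell's own environment is left untouched.
echo "parent sees: ${GREETING:-<unset>}"
```

The child bash process should print hello, while the parent shell should still report the variable as unset.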

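So when we type /usr/bin/env bash, we're running the env command, which applies any name=value settings we pass (here, none), looks up bash in the directories listed in PATH, and runs it with the resulting environment. If I run env or /usr/bin/env with no arguments in my terminal, it prints the current environment: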

$ /usr/bin/env
TERM_PROGRAM=Apple_Terminal
SHELL=/bin/zsh
TERM=xterm-256color
TMPDIR=/var/folders/n9/35wcp_ps2l919c07czwh504c0000gn/T/
TERM_PROGRAM_VERSION=452
TERM_SESSION_ID=5E26DC6A-6460-4F51-9F67-BA46EFF35574
USER=richiethomas
SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.mrN3AnwD1B/Listeners
PATH=/Users/richiethomas/.rbenv/bin:/Users/richiethomas/.rbenv/shims:/Users/richiethomas/.rbenv/bin:/usr/local/lib/ruby/gems/2.6.0/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/Library/Apple/usr/bin:/Library/TeX/texbin:/Users/richiethomas/Library/Python/3.7/bin/
__CFBundleIdentifier=com.apple.Terminal
PWD=/Users/richiethomas/Desktop/Workspace/impostorsguides.github.io
XPC_FLAGS=0x0
XPC_SERVICE_NAME=0
SHLVL=1
HOME=/Users/richiethomas
LOGNAME=richiethomas
OLDPWD=/Users/richiethomas/Desktop/Workspace/impostorsguides.github.io
HOMEBREW_PREFIX=/opt/homebrew
HOMEBREW_CELLAR=/opt/homebrew/Cellar
HOMEBREW_REPOSITORY=/opt/homebrew
MANPATH=/opt/homebrew/share/man::
INFOPATH=/opt/homebrew/share/info:
EDITOR=/usr/local/bin/code
NVM_DIR=/Users/richiethomas/.nvm
RBENV_SHELL=zsh
LANG=en_US.UTF-8
_=/usr/bin/env
$ 

One of the environment variables that my env command prints out is PATH, which contains a list of directories that UNIX will search through, in order, when it looks for the command we include in our shebang (whether that command is bash, ruby, or something else).

Because UNIX will search $PATH in the order in which directories appear, $PATH also determines which version of a given executable takes precedence over others, if multiple versions are found.
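One way to check which file wins for a given command name is sketched below. It uses command -v, which prints the first match, and bash's type -a, which lists every match in PATH order. The choice of bash as the command to look up is just an example, and the output will differ from machine to machine:

```shell
#!/usr/bin/env bash

# command -v prints the path the shell would actually run for this name.
first_match="$(command -v bash)"
echo "first match: $first_match"

# type -a lists every match, in the order the PATH directories are searched.
type -a bash
```

If type -a prints more than one line, the first one is the version that takes precedence.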

Because the above PATH value is long and hard-to-parse, in this post I'm going to use a simpler, shorter version of the above:

/Users/richiethomas/.rbenv/shims
/usr/local/bin
/usr/bin
/bin

I split the above into one path per line for readability, but note that $PATH normally prints as a single, long string with directories separated by the : character. Later in this post, we will write a script to do this splitting for us.

If my shebang is #!/usr/bin/env ruby, and my PATH variable is the above value, then UNIX will check for the ruby command in those directories, in the same order listed above. For example, if /Users/richiethomas/.rbenv/shims contains no Ruby versions, /usr/local/bin contains Ruby version 2.7.5, and /usr/bin contains Ruby version 1.9.3, then a Ruby script which contains the shebang #!/usr/bin/env ruby will tell UNIX to run that script using Ruby version 2.7.5.

Let's do an experiment to verify this is true.

Experiment- setting our own $PATH

In my scratch/ directory, I create two sub-directories: foo/ and bar/. Next, I create a file in each directory. These two files print different strings, but they share the same name: baz.

The foo/baz file looks like this:

#!/usr/bin/env bash

echo 'Inside foo/baz'

And bar/baz looks like this:

#!/usr/bin/env bash

echo 'Inside bar/baz'

I run chmod +x on each file, to make sure they're executable.
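The setup steps so far can also be scripted. This is a minimal sketch which assumes a throwaway directory from mktemp in place of my scratch/ directory; the variable name scratch is my own choice for the example:

```shell
#!/usr/bin/env bash

# Assumption: a temporary directory stands in for the scratch/ directory.
scratch="$(mktemp -d)"
mkdir -p "$scratch/foo" "$scratch/bar"

for dir in foo bar; do
  # Write a small script whose output names the directory it lives in.
  printf '%s\n\n%s\n' '#!/usr/bin/env bash' "echo 'Inside $dir/baz'" > "$scratch/$dir/baz"
  # Make the file executable, as with the manual chmod +x step.
  chmod +x "$scratch/$dir/baz"
done

# Run each file by its full path to confirm both work.
"$scratch/foo/baz"
"$scratch/bar/baz"
```

Running each file by its full path sidesteps $PATH entirely, which makes it a handy sanity check before the lookup experiment.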

When I run ls -lRa, I see the following:

$ ls -lRa                 
total 0
drwxr-xr-x  4 richiethomas  staff  128 Jan 30 10:14 .
drwxr-xr-x  6 richiethomas  staff  192 Jan 11 10:35 ..
drwxr-xr-x  3 richiethomas  staff   96 Jan 30 10:14 bar
drwxr-xr-x  3 richiethomas  staff   96 Jan 30 10:14 foo

./bar:
total 8
drwxr-xr-x  3 richiethomas  staff   96 Jan 30 10:14 .
drwxr-xr-x  4 richiethomas  staff  128 Jan 30 10:14 ..
-rwxr-xr-x  1 richiethomas  staff   38 Jan 30 10:14 baz

./foo:
total 8
drwxr-xr-x  3 richiethomas  staff   96 Jan 30 10:14 .
drwxr-xr-x  4 richiethomas  staff  128 Jan 30 10:14 ..
-rwxr-xr-x  1 richiethomas  staff   38 Jan 30 10:14 baz
$ 

The -R flag tells ls to recursively list the contents of the current directory and any sub-directories.

Next, I update my PATH variable to the simplified version I mentioned above, with just 4 directories in it:

$ export PATH="/Users/richiethomas/.rbenv/shims:/usr/local/bin:/usr/bin:/bin"
$ echo $PATH
/Users/richiethomas/.rbenv/shims:/usr/local/bin:/usr/bin:/bin
$ 

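Next, I update my $PATH variable a second time, for my current terminal tab only, so that it begins with the bar/ and foo/ directories. In my case, I run:

$ export PATH="/Users/richiethomas/Desktop/Workspace/scratch/bar/:/Users/richiethomas/Desktop/Workspace/scratch/foo/:$PATH"
$ echo $PATH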
/Users/richiethomas/Desktop/Workspace/scratch/bar/:/Users/richiethomas/Desktop/Workspace/scratch/foo/:/Users/richiethomas/.rbenv/shims:/usr/local/bin:/usr/bin:/bin
$ 

Notice that the above string contains two absolute paths (one for /Users/richiethomas/Desktop/Workspace/scratch/bar/ and one for /Users/richiethomas/Desktop/Workspace/scratch/foo/), followed by the original value of $PATH. This means we're prepending $PATH with our two new absolute paths, with bar coming before foo.

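By the way, when adding these paths to PATH inside double quotes, it's important to write out the absolute path (i.e. /Users/(username)/..., or $HOME/...) rather than using the ~/ shorthand prefix. The shell does not expand ~ inside double quotes, so PATH would end up containing a literal ~ character, and the lookup can then fail because that entry does not name a real directory.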
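One likely reason the ~/ shorthand fails here is that the shell leaves ~ untouched inside double quotes. The sketch below compares it with $HOME, which is expanded inside double quotes. The bin directory and both variable names are hypothetical:

```shell
#!/usr/bin/env bash

# Inside double quotes, the tilde is kept as a literal character.
quoted_tilde="~/bin"

# $HOME is expanded inside double quotes, giving an absolute path.
with_home="$HOME/bin"

echo "$quoted_tilde"
echo "$with_home"
```

The first line printed should still start with ~, while the second should start with the absolute path of the home directory.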

Lastly, I run the newly-created baz command in my current terminal:

$ baz
Inside bar/baz
$ 

We see "Inside bar/baz", not "Inside foo/baz". That's because, even though we have two different "baz" files, and they're each executable, the bar/ directory comes before the foo/ directory in our $PATH environment variable.
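To check that directory order is really what decides the winner, we can swap the two directories and run baz again. This is a self-contained sketch which recreates the two files in a temporary directory; the names demo, original_path, bar_first, and foo_first are my own and not part of the experiment above:

```shell
#!/usr/bin/env bash

# Recreate foo/baz and bar/baz in a throwaway directory.
demo="$(mktemp -d)"
mkdir -p "$demo/foo" "$demo/bar"
for dir in foo bar; do
  printf '%s\n%s\n' '#!/usr/bin/env bash' "echo 'Inside $dir/baz'" > "$demo/$dir/baz"
  chmod +x "$demo/$dir/baz"
done

original_path="$PATH"

# bar/ listed first: bar/baz should win.
PATH="$demo/bar:$demo/foo:$original_path"
hash -r   # forget any remembered command locations
bar_first="$(baz)"
echo "$bar_first"

# foo/ listed first: foo/baz should win.
PATH="$demo/foo:$demo/bar:$original_path"
hash -r
foo_first="$(baz)"
echo "$foo_first"

# Restore the original value.
PATH="$original_path"
```

Because the script only changes PATH inside its own process, the terminal that launched it keeps its original PATH.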

$IFS and delimiters

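As previously mentioned, the directories in PATH are concatenated together into a single string, with the : character used as a delimiter. The shell has its own notion of a delimiter, which it uses when splitting a string into separate words. This is called the "internal field separator", and it is stored in the shell variable $IFS. By default, $IFS does not include the : character, but we can set it to : so that the shell splits $PATH into its individual directories.

The page linked above contains an experiment, which I've modified slightly below to show how $IFS is used.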

Experiment- printing $PATH in a readable way, using $IFS

The $PATH variable can be pretty hard-to-read, with all those directories concatenated together. Let's write a script to make it more legible:

#!/usr/bin/env bash

string="$PATH"

for path in $string
do
  echo "$path"
done

When I chmod +x and then run the script, I see:

$ ./foo
/Users/richiethomas/Desktop/Workspace/scratch/bar/:/Users/richiethomas/Desktop/Workspace/scratch/foo/:/Users/richiethomas/.rbenv/shims:/usr/local/bin:/usr/bin:/bin
$ 

No improvement in readability so far.

Then, I add the following to the top of the script:

#!/usr/bin/env bash

IFS=":"   # <= I added this

string="$PATH"

for path in $string
do
  echo "$path"
done

When I re-run the script, I get:

$ ./foo
/Users/richiethomas/Desktop/Workspace/scratch/bar/
/Users/richiethomas/Desktop/Workspace/scratch/foo/
/Users/richiethomas/.rbenv/shims
/usr/local/bin
/usr/bin
/bin
$ 

This experiment also shows that you can iterate over a string in the same way you can iterate over an array, so long as:

  • you include a delimiter in-between each part of the string that you want to treat as a discrete array item, and
  • you update $IFS to tell the shell what delimiter you used for this purpose.
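Setting IFS at the top of a script changes word splitting for everything that follows. As a hedged alternative, the sketch below sets IFS for a single read command only and stores the pieces in a real bash array. The sample_path value and the dirs array name are hypothetical, used so the output does not depend on the real $PATH:

```shell
#!/usr/bin/env bash

# Hypothetical sample value standing in for $PATH.
sample_path="/usr/local/bin:/usr/bin:/bin"

# The IFS assignment applies only to this one read command.
IFS=: read -r -a dirs <<< "$sample_path"

for dir in "${dirs[@]}"; do
  echo "$dir"
done

echo "count: ${#dirs[@]}"
```

This should print the three directories on separate lines, followed by count: 3, and the rest of the script keeps the default $IFS.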

Note that the default value for $IFS is a combination of 3 characters:

  • the "space" character
  • the "tab" character
  • the "newline" character

So a string like "foo bar baz" (with 3 words and two spaces) will be separated into 3 separate strings ("foo", "bar", and "baz") if you iterate over the string and print each item.
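A short sketch of that default behavior, assuming $IFS has not been changed earlier in the script (the variable names words and count are mine):

```shell
#!/usr/bin/env bash

words="foo bar baz"
count=0

# $words is left unquoted on purpose, so the shell splits it on $IFS.
for word in $words; do
  echo "$word"
  count=$((count + 1))
done

echo "items: $count"

# Print the current value of IFS in an escaped, readable form.
printf 'IFS is: %q\n' "$IFS"
```

With the default $IFS, the loop should run three times, once per word.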

Photo Attribution

Title: PATH Train, New York

Description: The Port Authority Trans-Hudson Corporation (PATH) was established in 1962 as a subsidiary of The Port Authority of New York and New Jersey. The heavy rail rapid transit system serves as the primary transit link between Manhattan and neighboring New Jersey urban communities and suburban commuter railroads. PATH presently carries 244,000 passenger trips each weekday. This volume is expected to continue to increase with the anticipated growth in regional residential, commercial, and business development.

Author: P. L. Tandon

Source: Flickr

License: CC BY-NC-SA 2.0 DEED Attribution-NonCommercial-ShareAlike 2.0 Generic