Before we start diving into the code for this file, let’s look at its Github history.

Github History

According to the PR which introduced this file (link above):

The most time spent in rbenv execution is resolving paths to their absolute representations without symlinks. Manually doing this in bash is slow.

By dynamically loading a compiled bash builtin, we can access the realpath POSIX C function which does exactly what we need and is fast.

If dynamic loading fails, rbenv will still continue working as before (will fall back to shell implementation).

And if we look at some of the commit names in this PR, we the following:

Speed up realpath() with dynamically loaded C extension

So the goal here is to replace the pre-installed realpath command on our machine with one which is potentially faster. If we recall, the job of the realpath command is to continuously follow any symlinks of a filename until we have the canonical filepath.

Let’s now move on to the code.

Code

We’ve seen the Bash shebang and the setting of “exit-on-first-error” mode before, so we won’t dwell on those two lines of code.

Storing the current directory

the first line of code is:

src_dir="${0%/*}"

Here we’re creating a variable named src_dir, and setting it equal to the directory containing the current configure file.

Detecting the user’s C compiler

Next block of code:

if [ -z "$CC" ]; then
  if type -p gcc >/dev/null; then
    CC=gcc
  else
    echo "warning: gcc not found; using CC=cc" >&2
    CC=cc
  fi
fi

If the value of the CC environment variable is empty, then we execute the code inside the block.

The first thing inside that block is another if check, this time to see whether type -p gcc succeeds (i.e. returns a 0 exit code). We recognize type -p as a way to check whether a path to the named program exists, which by implication tells us whether that program is installed on the machine. So we check whether gcc is installed on the machine. If it is, we do one thing. If it’s not, we do another.

FYI, when I run type -p gcc in my terminal, I see:

$ type -p gcc

gcc is /usr/bin/gcc

Your output may be different, depending on the machine you run this command on.

Since gcc is installed on my machine, we can expect the CC variable to equal the string “gcc” when I run the configure script.

What is GCC?

According to GNU’s homepage, GCC stands for “GNU Compiler Collection”:

GCC was originally written as the compiler for the GNU operating system.

In other words, it takes a program written in C (or C++, or Objective-C, etc.) and translates it into a binary file that your computer can execute.

Aborting if no C compiler is found

Next block of code:

if ! type -p "$CC" >/dev/null; then
  echo "aborted: compiler not found: $CC" >&2
  exit 1
fi

Here we check if our $CC variable corresponds to a program that’s installed on our machine. The value will either be gcc, cc, or whatever value the user passed into the configure script themselves.

If no compiler program is found, we print an error message to STDERR and exit.

Detecting the host operating system

Next block of code:

case "$(uname -s)" in
Darwin* )
  host_os="darwin$(uname -r)"
  ;;
FreeBSD* )
  host_os="freebsd$(uname -r)"
  ;;
OpenBSD* )
  host_os="openbsd$(uname -r)"
  ;;
* )
  host_os="linux-gnu"
esac

We perform a case statement based on the output of a command named uname. Here’s its man entry:

UNAME(1)                                                                     General Commands Manual                                                                    UNAME(1)

NAME
     uname – display information about the system

SYNOPSIS
     uname [-amnoprsv]

DESCRIPTION
     The uname command writes the name of the operating system implementation to standard output.  When options are specified, strings representing one or more system
     characteristics are written to standard output.

     The options are as follows:

     -a      Behave as though the options -m, -n, -r, -s, and -v were specified.

     -m      Write the type of the current hardware platform to standard output.  (make(1) uses it to set the MACHINE variable.)

     -n      Write the name of the system to standard output.

     -o      This is a synonym for the -s option, for compatibility with other systems.

     -p      Write the type of the machine processor architecture to standard output.  (make(1) uses it to set the MACHINE_ARCH variable.)

     -r      Write the current release level of the operating system to standard output.

     -s      Write the name of the operating system implementation to standard output.

     -v      Write the version level of this release of the operating system to standard output.

     If the -a flag is specified, or multiple flags are specified, all output is written on a single line, separated by spaces.

...

We see that uname prints some operating system info. I try uname -s in my terminal, and get the following:

$ uname -s

Darwin

OK cool, pretty simple. So our case statement switches depending on the name of the user’s OS.

If the os name starts with “Darwin” (as it does in my case), then we run the command host_os="darwin$(uname -r)". This creates a variable named host_os and sets it equal to darwin$(uname -r), which in my case resolves to darwin21.6.0. So just concatenating the OS name to its version number.

The other two non-default branches of the case statement look almost identical, except for the name of the OS that we prepend to the version number:

  • If the user’s OS is “FreeBSD”, we prepend “freebsd”.
  • If it’s “OpenBSD”, we prepend “openbsd”.

The default behavior in the catch-all case at the end is to set host_os to a hard-coded value, "linux-gnu".

Populating more variables

Next line of code:

eval "$("$src_dir"/shobj-conf -C "$CC" -o "$host_os")"

On my machine, this will translate to:

eval "$(./src/shobj-conf -C gcc -o darwin21.6.0)"

So we’re running the script ./src/shobj-conf via command substitution, passing it some flags (i.e. -C gcc and -o darwin21.6.0), and running eval on whatever comes back from running that script.

What is shobj-conf? If we open it up, at the top we see:

# shobj-conf -- output a series of variable assignments to be substituted
#		into a Makefile by configure which specify system-dependent
#		information for creating shared objects that may be loaded
#		into bash with `enable -f'

So it prints out a bunch of variable assignments, which are then plugged into our Makefile.in file to produce a file named Makefile. Then later (specifically here), we’ll use the enable -f command to load the result of the Makefile (i.e. a file named libexec/rbenv-realpath.dylib) into our code.

Reading the above description at the top of shobj-conf is enough for now. We see just below the above description that the author of this file is Chet Ramey, so I’m reasonably sure that RBENV’s copy of this file was borrowed wholesale from another source. Reading it line-by-line would require a lot of time and effort, so let’s skip that script for now.

But what does that script actually print out when we run it? What is it that we’re evaling inside the parentheses? To find out, let’s actually run ./src/shobj-conf -C gcc -o darwin21.6.0:

$ ./src/shobj-conf -C gcc -o darwin21.6.0
SHOBJ_CC='gcc'
SHOBJ_CFLAGS='-fno-common'
SHOBJ_LD='${CC}'
SHOBJ_LDFLAGS='-dynamiclib -dynamic -undefined dynamic_lookup '
SHOBJ_XLDFLAGS=''
SHOBJ_LIBS=''
SHLIB_XLDFLAGS='-dynamiclib  -install_name $(libdir)/`echo $@ | sed "s:\..*::"`.$(SHLIB_MAJOR).$(SHLIB_LIBSUFF) -current_version $(SHLIB_MAJOR)$(SHLIB_MINOR) -compatibility_version $(SHLIB_MAJOR) -v'
SHLIB_LIBS='-lncurses'
SHLIB_DOT='.'
SHLIB_LIBPREF='lib'
SHLIB_LIBSUFF='dylib'
SHLIB_LIBVERSION='$(SHLIB_MAJOR)$(SHLIB_MINOR).$(SHLIB_LIBSUFF)'
SHLIB_DLLVERSION='$(SHLIB_MAJOR)'
SHOBJ_STATUS='supported'
SHLIB_STATUS='supported'

So the above output is what we pass to eval. It sets a bunch of variables, which (if we skip ahead a bit) we see are referenced in the subsequent sed command (see below), as well as in Makefile.in. We’ll find out later that the purpose of the sed command is to insert the above variable values into Makefile.in, and then use that updated Makefile.in to generate a file named Makefile. From there, we can run the make command and build our faster realpath function.

We’re also going to skip the discussion of what each variable is responsible for in the Makefile. If you’re curious about that, check out this link, which defines what most of the variables do.

Generating the Makefile

The last block of code in configure is that sed command we mentioned earlier:

sed "
  s#@CC@#${CC}#
  s#@CFLAGS@#${CFLAGS}#
  s#@LOCAL_CFLAGS@#${LOCAL_CFLAGS}#
  s#@DEFS@#${DEFS}#
  s#@LOCAL_DEFS@#${LOCAL_DEFS}#
  s#@SHOBJ_CC@#${SHOBJ_CC}#
  s#@SHOBJ_CFLAGS@#${SHOBJ_CFLAGS}#
  s#@SHOBJ_LD@#${SHOBJ_LD}#
  s#@SHOBJ_LDFLAGS@#${SHOBJ_LDFLAGS}#
  s#@SHOBJ_XLDFLAGS@#${SHOBJ_XLDFLAGS}#
  s#@SHOBJ_LIBS@#${SHOBJ_LIBS}#
  s#@SHOBJ_STATUS@#${SHOBJ_STATUS}#
" "$src_dir"/Makefile.in > "$src_dir"/Makefile

It looks intimidating, but it’s actually pretty straightforward:

  • The sed command, which we talked about before. It’s a command to read in a certain file, perform actions on each line of that file, and output the results to a new file.
  • A single (although rather long) string with a bunch of nearly-identical commands in it. These commands are called “scripts”.
  • A filename representing the input to run these commands against. In this case, the filename is Makefile.in.
  • The > symbol to redirect the output from STDOUT to another destination
  • A 2nd filename to act as the destination which the output gets redirected to. In this case, that’s a file named Makefile.

What does each of those sed scripts do? One clue is that they all share nearly the exact same format:

  • an s# at the beginning.
  • a reference to a variable, surrounded by ampersands on each side.
  • another #.
  • a parameter expansion operation (ex.- ${CC} or ${CFLAGS})
  • a final #.

Each command is a search-and-replace operation, except instead of using / syntax (ex.- s/@CC@/${CC}/), we use # as a delimiter (ex.- s#@CC@#${CC}#). Using a non-slash character as a delimiter is permitted, according to the docs for sed:

\%regexp%

(The % may be replaced by any other single character.)

This also matches the regular expression regexp, but allows one to use a different delimiter than /. This is particularly useful if the regexp itself contains a lot of slashes, since it avoids the tedious escaping of every /.

Additionally, the man page for sed adds the following examples:

EXAMPLES
     Replace 'bar' with 'baz' when piped from another command:

           echo "An alternate word, like bar, is sometimes used in examples." | sed 's/bar/baz/'

     Using backlashes can sometimes be hard to read and follow:

           echo "/home/example" | sed  's/\/home\/example/\/usr\/local\/example/'

     Using a different separator can be handy when working with paths:

           echo "/home/example" | sed 's#/home/example#/usr/local/example#'

     Replace all occurances of 'foo' with 'bar' in the file test.txt, without creating a backup of the file:

           sed -i '' -e 's/foo/bar/g' test.txt

The above implies that we expect that our variables may contain the / character, and are therefore using # as a delimiter to avoid any conflicts.

Let’s quickly test out with an experiment.

Experiment- non-traditional delimiters in sed

I have a string:

Hello/world

I want to replace it with:

Hola/mundo

In my terminal, I try the following:

bash-3.2$ echo "Hello/world" | sed "s/Hello/world/Hola/mundo/"

sed: 1: "s/Hello/world/Hola/mundo/": bad flag in substitute command: 'H'

Here, sed can’t tell the difference between the use of / as a delimiter, vs. the use of the literal / character in my string.

If I tell sed to use # as a delimiter instead, I get the following:

bash-3.2$ echo "Hello/world" | sed "s#Hello/world#Hola/mundo#"

Hola/mundo

This time, it works correctly.

Will this replace all instances of Hello/world, or just the first one it finds? I run the following to find out:

bash-3.2$ echo "Hello/world Hello/world" | sed "s#Hello/world#Hola/mundo#"

Hola/mundo Hello/world

Looks like it just replaces the first example it finds.

Let’s take the first of the sed scripts as an example:

s#@CC@#${CC}#

Here we’re replacing the first instance of @CC@ that we find with the value that $CC resolves to. If we look at Makefile.in, the first line we see is:

CC = @CC@

Given we already know that $CC variable evaluates to "gcc" on my machine, we can expect the first line of our real Makefile to be:

CC = gcc

And in fact, that’s what we see when we run the ./src/configure script.

The $CC variable was instantiated by us, at the beginning of the configure script. But many of the variables in the sed search-and-replace scripts will be instantiated by the eval line we ran just prior to sed:

eval "$("$src_dir"/shobj-conf -C "$CC" -o "$host_os")"

Any variable which is not populated by either us or by shobj-conf will be blank.

If we look at the first 16 lines of Makefile.in, it looks like this:

CC = @CC@

CFLAGS = @CFLAGS@
LOCAL_CFLAGS = @LOCAL_CFLAGS@
DEFS = @DEFS@
LOCAL_DEFS = @LOCAL_DEFS@

CCFLAGS = $(DEFS) $(LOCAL_DEFS) $(LOCAL_CFLAGS) $(CFLAGS)

SHOBJ_CC = @SHOBJ_CC@
SHOBJ_CFLAGS = @SHOBJ_CFLAGS@
SHOBJ_LD = @SHOBJ_LD@
SHOBJ_LDFLAGS = @SHOBJ_LDFLAGS@
SHOBJ_XLDFLAGS = @SHOBJ_XLDFLAGS@
SHOBJ_LIBS = @SHOBJ_LIBS@
SHOBJ_STATUS = @SHOBJ_STATUS@

And when we run configure and inspect the resulting Makefile, we see we’ve transformed these lines into this:

CC = gcc

CFLAGS =
LOCAL_CFLAGS =
DEFS =
LOCAL_DEFS =

CCFLAGS = $(DEFS) $(LOCAL_DEFS) $(LOCAL_CFLAGS) $(CFLAGS)

SHOBJ_CC = gcc
SHOBJ_CFLAGS = -fno-common
SHOBJ_LD = ${CC}
SHOBJ_LDFLAGS = -dynamiclib -dynamic -undefined dynamic_lookup
SHOBJ_XLDFLAGS =
SHOBJ_LIBS =
SHOBJ_STATUS = supported

Again, your results may look different if you run the script on a machine with different architecture from mine.

Let’s move on to the next file.