What Are Processes?

Picture of a food processor making pesto, because it has the word 'process' in it, and because I'm hungry right now. Photo attribution here.

In my post about RBENV's use of the exec command, I briefly touched on the concept of processes. I didn't go deep into what those are, because there's a lot to talk about on that topic and I wanted to keep the focus on RBENV and its shim file.

In this blog post, I circle back and address the questions that I still had after writing the earlier post. These include questions like:

What is a process, in plain English?
What's the difference between a process and a program?
How are processes created and destroyed?
When should we use exec vs. forking?

There's a lot to learn about processes, much more than we can or should cover here. This is meant to be a beginner's primer on processes, nothing more.

In answering question #2 above, we'll implicitly answer question #1 as well, so let's start with that.

What's the difference between a process and a program?

In the last post of my walk-through of the shim file, I linked to this post from an associate professor at the University of Cincinnati. Let's walk through it in detail together.

The post starts off with:

A process can be simply defined as an instance of a running program. It should be understood that a program is part of the file system that resides on a non-volatile media (such as disk), and a process is an entity that is being executed (with at least some portion, i.e. segment/page) in RAM.

So a program is (for example) the /Users/richiethomas/.rbenv/libexec/rbenv file on my machine, containing the code which gets executed when I run the rbenv command. And a process is, in this example, the specific instance of RBENV that I'm running.
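Here's a minimal sketch of that distinction, seen from inside a Ruby script. The variable names are my own, purely for illustration. The file path stays the same on every run, while the PID will normally differ each time the script is started.

```ruby
#!/usr/bin/env ruby

# The program: the file on disk which holds this code.
program_path = File.expand_path(__FILE__)

# The process: the running instance of that program, identified by its PID.
process_id = Process.pid

puts "Program (file on disk):     #{program_path}"
puts "Process (running instance): PID #{process_id}"
```

If you run this script from two terminal tabs, you should see the same path in both but two different PIDs.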

To see this in action, let's do an experiment.

Experiment: Processes vs. Programs

In my scratch/ directory, I make a file named foo, containing the following code:

#!/usr/bin/env ruby

while true
  puts 'Hi'
  sleep 20
end

It's a Ruby script which continuously prints the string 'Hi' every 20 seconds.

I then open up two new terminal tabs, and run this script in each tab. After a while, I see the following in each tab:

~/Desktop/Workspace/scratch ()  $ ./foo
Hi
Hi
Hi
Hi
...

Then I open up a 3rd tab, and run ps -afx | grep foo. There I see the following:

$ ps -afx | grep foo
UID   PID  PPID   C  STIME   TTY           TIME   CMD
501 29994 88303   0  1:59PM  ttys004    0:00.05   ruby ./foo
501 30036 29735   0  1:59PM  ttys005    0:00.04   ruby ./foo
501 30052 29800   0  2:00PM  ttys010    0:00.00   grep foo
$

OK, I cheated a little: I actually don't see the first row above (i.e. UID PID PPID C STIME TTY TIME CMD). I added it so we could more clearly read the output. If I had left off the | grep foo part, we would have seen this header as our first row, but we would have also seen every single process running on my machine, which would have been difficult to parse.

We see the first two processes listed are running the command ruby ./foo. These are the two processes that we just kicked off. We also see the process corresponding to our running of the grep command, which we can ignore.

So the "program" in this case is the file /Users/richiethomas/Desktop/Workspace/scratch/foo, and the "processes" in this case are the two running instances of that file, which we see above.

Properties of a Process

Each line of the above output from ps -afx has:

a UID (The numeric ID of the user running the process, i.e. us)
a PID (The unique ID of the process)
a PPID (The ID of the process's parent, i.e. the process which kicked off this process)
a C (The process's recent CPU usage)
a STIME (The start time of the process)
a TTY (The terminal from which the process or command is executed)
a TIME (CPU time, including both user and system time)
a CMD (The command which is executed)
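Several of these columns can also be read from inside a running Ruby process. The sketch below is only an illustration (the info hash and its keys are names I made up), but the Process methods it calls are part of Ruby's standard library.

```ruby
# Each value below corresponds to one of the ps columns listed above.
info = {
  uid:  Process.uid,    # UID: numeric ID of the user who owns this process
  pid:  Process.pid,    # PID: ID of this process
  ppid: Process.ppid,   # PPID: ID of the process which started this one
  cmd:  $PROGRAM_NAME   # CMD: name of the script being run
}

# TIME: user plus system CPU seconds consumed so far by this process.
cpu = Process.times
info[:time] = cpu.utime + cpu.stime

info.each do |column, value|
  puts "#{column.to_s.upcase.ljust(5)} #{value}"
end
```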

Many of these columns are described by the Cincinnati post above:

(E)ach process will have a unique numeric identifier associated with it, referred to as its process identification number, or PID

All processes (except the very first one) have a parent which created them. Similarly, when a process is created, it is created as a child process, with the process responsible for its creation being its parent...

(P)rocesses have ownership attributes associated with them, both from the user level and from the group level

Parent and child processes

The article mentions the following relationship between parent and child processes:

All processes (except the very first one) have a parent which created them. Similarly, when a process is created, it is created as a child process, with the process responsible for its creation being its parent. When a process creates a child process, it is said to have spawned the child. Every process on a Unix system must have a parent (again, except the very first one), since "orphaned" processes are not (normally) allowed. Also, all processes on a Unix system can be linked to the one initial process. As you will see, processes have a similar hierarchical structure to that of the file system.

Let's see this parent-child relationship in action.

Experiment: Creating and Destroying Child Processes

I write the following script in a scratch directory:

#!/usr/bin/env ruby

def say_hello
  while true
    puts 'Hi'
    sleep 20
  end
end

5.times do
  Process.fork { say_hello }
end

Process.wait

This is similar code to the earlier example, but wrapped inside a method called say_hello(). In addition to printing 'Hi' inside a while loop, I'm also calling this method inside an invocation of Process.fork.

I do this 5 times, in order to get 5 different forked processes. Since I call Process.wait at the end, our parent process will live on instead of exiting immediately (which would leave 5 orphaned processes behind). This not only lets us compare the PPIDs of these child processes with the PID of the parent process, but also makes cleaning up our experiment easier.

When I run the above, I see:

$ ./foo   
Hi
Hi
Hi
Hi
Hi

The terminal in which I run ./foo hangs, while it waits for me to either kill the parent process or for the parent process to end on its own. Because of the while loop, the latter will never happen, so we'll need to do that ourselves. Before we do that, however, let's again call ps like so:

$ ps -afx | grep foo 
UID   PID  PPID   C  STIME   TTY           TIME   CMD
501 31015 79086   0  2:32PM ttys000    0:00.05    ruby ./foo
501 31030 31015   0  2:32PM ttys000    0:00.00    ruby ./foo
501 31031 31015   0  2:32PM ttys000    0:00.00    ruby ./foo
501 31032 31015   0  2:32PM ttys000    0:00.00    ruby ./foo
501 31033 31015   0  2:32PM ttys000    0:00.00    ruby ./foo
501 31034 31015   0  2:32PM ttys000    0:00.00    ruby ./foo
501 31093 21483   0  2:34PM ttys003    0:00.00    grep foo
$

Here we see the PID of the first process is 31015, and this matches the PPID of the 5 processes beneath that.
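We can check the same relationship without ps. In this sketch (variable names are mine), the child sends its own PID and PPID back to the parent through a pipe, so the parent can compare them with what it already knows.

```ruby
parent_pid = Process.pid
reader, writer = IO.pipe

child_pid = Process.fork do
  reader.close
  # This block runs only in the child: report our PID and our parent's PID.
  writer.puts "#{Process.pid} #{Process.ppid}"
  writer.close
end

# Back in the parent: close our copy of the write end, then read the report.
writer.close
reported_pid, reported_ppid = reader.read.split.map(&:to_i)
reader.close
Process.wait(child_pid)

puts "parent PID:             #{parent_pid}"
puts "child PID (from fork):  #{child_pid}"
puts "child PID (from child): #{reported_pid}"
puts "child's PPID:           #{reported_ppid}"
```

The value which fork returns in the parent is the child's PID, and the child's PPID is the parent's PID.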

Lastly, let's kill the parent process. I navigate to the terminal tab running the parent process, and I hit Ctrl+C. This sends an interrupt signal to every process in that terminal's foreground process group, so the child processes are interrupted along with the parent:

^C./foo:6:in `sleep': Interrupt
	from ./foo:6:in 'say_hello'
	from ./foo:11:in 'block (2 levels) in main'
	from ./foo:11:in 'fork'
	from ./foo:11:in 'block in main'
	from ./foo:10:in 'times'
	from ./foo:10:in 'main'
./foo:6:in 'sleep': Interrupt
	from ./foo:6:in 'say_hello'
	from ./foo:11:in 'block (2 levels) in main'
	from ./foo:11:in 'fork'
	from ./foo:11:in 'block in main'
	from ./foo:10:in 'times'
	from ./foo:10:in 'main'
./foo:6:in 'sleep': Interrupt
	from ./foo:6:in 'say_hello'
	from ./foo:11:in 'block (2 levels) in main'
	from ./foo:11:in 'fork'
	from ./foo:11:in 'block in main'
	from ./foo:10:in 'times'
	from ./foo:10:in 'main'
./foo:14:in 'wait': Interrupt
	from ./foo:14:in 'main'

$

Now, if we re-run our ps command, we see:

$ ps -afx | grep foo
501 31295 21483   0  2:38PM ttys003    0:00.00 grep foo
$

Now we only see our grep command, which we know we can ignore. This is why we included Process.wait at the end of our file. The printing of 'Hi' would still have happened if we had left this out, but then these child processes would have had no parent. We'd then have to manually kill each process by using the kill command. Let's try this below.
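One caveat worth knowing: Process.wait with no arguments returns as soon as any one child exits. That's fine in our script, because the children loop forever. If the children were short-lived and we wanted to wait for all of them, Process.waitall would be the tool to reach for. Here's a rough sketch, with short sleeps standing in for real work:

```ruby
# Fork three short-lived children; each exits with a distinct status code.
child_pids = 3.times.map do |i|
  Process.fork do
    sleep 0.1 * (i + 1)
    exit i
  end
end

# Process.waitall blocks until every child has exited. It returns an
# array of [pid, Process::Status] pairs, one per child.
results = Process.waitall

results.each do |pid, status|
  puts "child #{pid} exited with status #{status.exitstatus}"
end
```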

The "kill" command

As mentioned above, hitting Ctrl+C on our keyboard is not the only way to kill a process. If we have a terminal which is frozen, and we know the PID of the process which it's running, we can use the kill command to terminate that process. Let's remove the call to Process.wait from our script, and see what happens. I comment it out:

#!/usr/bin/env ruby

def say_hello
  while true
    puts 'Hi'
    sleep 20
  end
end

5.times do
  Process.fork { say_hello }
end

# Process.wait

When I re-run this script, I see:

$ ./foo
Hi
Hi
Hi
Hi
Hi
$

My prompt is now available again, because the parent process has finished running. However, if I wait a while, I start to see 'Hi' still being printed in this terminal:

$ ./foo
Hi
Hi
Hi
Hi
Hi
$ Hi
Hi
Hi
Hi
Hi
Hi
Hi
Hi
Hi
Hi
...

And if I re-run my ps command, I see:

$ ps -afx | grep foo
UID   PID  PPID   C  STIME   TTY           TIME   CMD
501 31649     1   0  2:49PM ttys000    0:00.00    ruby ./foo
501 31650     1   0  2:49PM ttys000    0:00.00    ruby ./foo
501 31651     1   0  2:49PM ttys000    0:00.00    ruby ./foo
501 31652     1   0  2:49PM ttys000    0:00.00    ruby ./foo
501 31653     1   0  2:49PM ttys000    0:00.00    ruby ./foo
501 31669 21483   0  2:50PM ttys003    0:00.00    grep foo
$

Now, instead of seeing a PPID which is close in value to the children's PIDs, we just see 1 as the PPID. That indicates these child processes have been adopted by the very first process on this machine (PID 1), since their original parent is no longer around.
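This re-parenting can be observed from Ruby too. In the sketch below (all names are mine), a "middle" process forks a grandchild and exits straight away. The grandchild then polls its own PPID until it changes, and reports the new value through a pipe. On my machine I'd expect the new parent to be PID 1, but that's an assumption: some systems hand orphans to a different "reaper" process, so the sketch only checks that the parent changed.

```ruby
reader, writer = IO.pipe

middle_pid = Process.fork do
  reader.close
  my_pid = Process.pid

  Process.fork do
    # Grandchild: wait (up to ~5 seconds) until we have been adopted.
    50.times do
      break if Process.ppid != my_pid
      sleep 0.1
    end
    writer.puts Process.ppid
    writer.close
  end

  # The middle process exits here without waiting for the grandchild.
end

writer.close
Process.wait(middle_pid)
adopted_by = reader.read.to_i
reader.close

puts "original parent PID: #{middle_pid}"
puts "new parent PID:      #{adopted_by}"
```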

To end these processes, we'll need to use the kill command, since there's no parent process to use Ctrl+C on:

$ kill -9 31649 31650 31651 31652 31653
$ ps -afx | grep foo                   
UID   PID  PPID   C  STIME   TTY           TIME   CMD
501 31716 21483   0  2:52PM ttys003    0:00.00    grep foo
$

It's considered good hygiene to make sure your child processes always have a parent process which waits on them. Otherwise you may end up with orphaned processes like the ones above, which keep running and taking up resources on your machine until someone kills them.
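The same cleanup can be done from Ruby with Process.kill, as long as we still know the child's PID. This is a sketch, not a recommendation for any particular signal: I use TERM here, which is what a plain kill sends, while KILL would be the equivalent of the kill -9 above.

```ruby
# Start a child which would otherwise sleep for a long time.
child_pid = Process.fork { sleep 600 }

# Ask the child to terminate.
Process.kill("TERM", child_pid)

# Wait on the child so the OS can discard its entry in the process table.
_, status = Process.wait2(child_pid)

puts "child #{child_pid} ended by a signal? #{status.signaled?}"
```

Calling Process.wait2 after the kill matters. Until the parent waits on it, a terminated child lingers as an entry in the process table.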

When to use exec vs. forking

To answer our last question, let's tweak our script so that it calls Process.exec instead of Process.fork. According to the docs for Process.exec, we need to pass it a string representing a terminal command, like so:

#!/usr/bin/env ruby

5.times do
  Process.exec('echo "Hi"')
end

Process.wait

Here I'm simply using a shell command (echo) instead of our previous Ruby command (puts), because the docs told us to pass a shell command. Since we're no longer calling the say_hello() method, I removed it from our foo program.

When I run this, I see:

$ ./foo
Hi
$

We only see one instance of "Hi", despite our call to Process.exec being inside a call to 5.times. Why is this?

It's because only the first call to exec actually gets executed. exec replaces the program running in the current process (the one which called 5.times) with the echo command. Once echo has finished its job (i.e. printing "Hi"), the process terminates. There is no Ruby code left to return to, so we never perform any of the subsequent iterations of times, only the first one.
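A small sketch can make this "replacement" more concrete. Below, a forked child calls exec to turn itself into sh, and the shell prints its own PID ($$) into a pipe. The variable names are mine. The interesting part is that the PID reported after exec is the same one which fork gave us: same process, different program.

```ruby
reader, writer = IO.pipe

child_pid = Process.fork do
  reader.close
  # Replace the Ruby program in this child with sh. The shell writes its own
  # PID to the pipe. Single quotes keep Ruby from interpreting "$$".
  Process.exec("sh", "-c", 'echo $$', out: writer)
  puts "never printed, because exec does not return on success"
end

writer.close
pid_after_exec = reader.read.to_i
reader.close
Process.wait(child_pid)

puts "PID returned by fork:    #{child_pid}"
puts "PID reported after exec: #{pid_after_exec}"
```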

This implies that, if we still have more work left to do after our first iteration is finished, exec is not the right tool for the job. Instead, we should use fork.

Again: if we still have more work left to do after our call to exec is complete, we shouldn't be exec'ing at all. We should be forking.
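If we really did want to run echo five times from our script, one option is to combine the two: fork a child on each iteration, and call exec only inside that child. This is a sketch of that pattern rather than the one true way to do it. Ruby's system method is a more convenient way to run a command and wait for it to finish.

```ruby
statuses = 5.times.map do
  # fork creates a child, and exec replaces only that child with echo.
  pid = Process.fork { Process.exec("echo", "Hi") }

  # The parent is untouched, so it can wait for the child and then
  # carry on with the next iteration.
  _, status = Process.wait2(pid)
  status
end

puts "Parent #{Process.pid} is still running after #{statuses.size} exec calls"
```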

Wrapping Up

On the other hand, if we don't have any more work left to do, then exec is a perfectly acceptable tool for the job. It's for this reason that RBENV's shim ends with a call to exec: it's the last line in the script. There's nothing left to do afterward.

Photo Attribution

Title of Photo: Kale Pesto

Description: "I tried to use the food processor to chop the kale but it didn't do it evenly so I ended up with big leaves mixed in with small pieces. I can do more with my knife more quickly. But the little leaves made a lovely kale pesto with garlic and parmesan cheese drizzled with olive oil. The whole mixture went onto roasted garlic bread slices and topped with olives. Into the convection oven to get toasty and warm."

Author: Karen and Bob Richardson

Source: Flickr

License: CC BY-NC-SA 2.0 DEED Attribution-NonCommercial-ShareAlike 2.0 Generic