On Processes and Threads

Submitted by PurpleAlien on Tue, 07/01/2014 - 02:05

Tags: C, Computer Architecture, Linux, Programming

Some time ago I followed an interesting discussion on a board where people were discussing multi-core software development. During the course of the discussion it became apparent that there is a lot of confusion and misconceptions about a 'process' and a 'thread' as they exist on e.g., a Linux system. Both are applicable to make use of multi-core systems, but they do so in different ways. Even though the exact distinction while compared to early definitions of the terms has perhaps become somewhat blurred, the two remain separate entities which can complement each other perfectly. In this post I'm going to try and illustrate the similarities and differences, and show you some real life scenarios of both. Keep in mind that we will be making some generalizations - and there are lots of examples where these generalizations do not directly apply, or where there are other possible implementations of the cited examples. Going into these would turn this blog entry into an entire book...

Multiprocess

Let's start with some definitions. When we are talking about processes and threads, we are talking about independent sequences of execution. In other words, both provide mechanisms to run code in parallel to each other. On multi-core systems, this code can run physically in parallel while on single core systems the scheduler makes it look like they do by switching back and forth between them: each sequence is allotted a time slice in which it runs. I will interject here immediately and point out that both processes and threads are operating system concepts, not hardware concepts. Hardware can however be used to improve performance of these, known under the name Simultaneous Multithreading (SMT). Have a look at Intel's hyper-threading for an example of this.

The main distinction between a process and a thread is that each process on a system runs in its own address space. That is, one process can not alter data present in another process directly. This is not the case with threads: multiple threads run within the same address space and can (and will) share variables and resources amongst each other. This means that a process can actually contain multiple threads. In the (good?) old days, a thread was called a lightweight process for this exact reason. Creating a process meant setting up an entire new address space, while a thread was much easier to create (just allocate some stack space). With the advent of copy-on-write among others, process creation has become a lot faster. However, the definition is still a good one to make the distinction since it alludes to a process being something 'bigger', which it usually is. Sometimes, a process is defined as a "program in execution". This program in execution is a reference to the executable that is being run, while at the same time this running executable can contain multiple threads. An example of this can be a program that contains a thread to update the GUI running independently from some other thread dealing with I/O - yet all contained within the same process.

Let's go a little deeper into these distinctions. When we say that each process has its own address space, we imply that in order to share data between them we need something special. We can not just share a variable between the two. Let's demonstrate that with some code which creates a new process, using the fork() system call. The full source code that compiles is attached to this post (fork_1.c).

  pid = fork();
  if (pid < 0)
    printf("Error on fork");
  else if (pid == 0)
    printf("Child process. \n");
  else
    printf("Parent process. \n");

It is important to note that as soon as we call fork() we don't have a single process anymore, we will have two. The original one is called the parent process, the newly created one the child process. In essence, fork() will create a copy of our original process. This is perhaps best illustrated with the pstree command on a Linux system. In Linux, once we get to the user side of things, different processes will get started by users executing programs and the system creating and destroying web server processes etc. The pstree command shows the hierarchy of all those processes on a running system. For example, below is part of the pstree output of a random server of mine:

init-+-acpid
|-atd
|-cron
|-dbus-daemon
|-dovecot-+-anvil
| |-2*[auth]
| |-config
| |-9*[imap]
| |-25*[imap-login]
| |-log
| |-16*[pop3-login]
| `-ssl-params
|-exim
|-6*[getty]
|-httpd---10*[httpd]
|-mysqld_safe---mysqld---29*[{mysqld}]

You can see the process hierarchy, starting with the root process on a typical system: init. All other processes can be traced back to it. They are all parent/child processes with 'init' being the grandparent to all of them. In essence, fork() is being used to create all these processes, and those processes in turn will use fork() to create even more processes. You can see this in action by starting a program in a terminal and see the changes using pstree. For example, let's start with a terminal, this will look something like this (on my laptop):

...
├─xfce4-terminal─┬─bash───pstree
│ ├─bash
...

Now, running e.g. 'vim' in one of the terminals:

...
├─xfce4-terminal─┬─bash───pstree
│ ├─bash───vim
...

You can also see the 'pstree' process in there which I ran in a different terminal. That one too came into being after a fork() call by the terminal (with some extra exec() stuff to replace the newly created process, but that's not for now). Also note that these processes can be run on any processor core, meaning that we can now make use of the multiple cores our system provides.

When we want to use the same principle in our code, we then have to distinguish between the two, since we now have two identical processes running while we might want to do different things within each of them. This is why we check the return value of fork(). If it is greater than 0, we're in the parent; if it is equal to 0, we're in the child process. Anything less than 0 is an error condition, i.e., something went wrong.

Now assume we have a couple of variables defined all the way at the beginning of our program and we're going to manipulate this variable in our child process (fork_2.c):

  /* Global variable to manipulate */
  int globalVariable = 2;
 
  int main(int argc, char *argv[]){
 
    /* One to see where we are, one to wait for the child in our parent */
    pid_t ws, pid;
    /* Needed for waitpid() */
    int  childExitStatus;
 
    /* We'll manipulate these too */
    int localVariable = 20;
    char identifier[20];
 
    pid = fork();
  
    /* Child process */
    if(pid == 0){
      strcpy(identifier,"Child Process: ");
      globalVariable++;
      localVariable++;
      sleep(5);
    }
 
    /* Error condition */
    else if(pid < 0){
      printf("Failed to fork \n");
      return -1;
    
    }
 
    /* Parent process, pid > 0 */
    else{
      strcpy(identifier,"Parent Process: ");
      ws = waitpid(pid,&childExitStatus,WUNTRACED);
      printf("Child process ended. pid was: %d\n",ws);
    }
 
    /* Code executed by both parent and child. */
    printf("%s \n",identifier);
    printf(" Global variable: %d \n",globalVariable);
    printf(" Local variable: %d \n",localVariable);
 
    return 0;
  }

The output of this program:

Child Process:
Global variable: 3
Local variable: 21
Child process ended. pid was: 1169
Parent Process:
Global variable: 2
Local variable: 20

As you can see, the child process changes the values of the local and global variable, but these changes are not reflected in the parent process once the child process is finished. Now, if you were to print the addresses of these variables in both child and parent processes, they would appear the same. The reason for this is that you are reading a virtual address, not the physical address. Every process has its own virtual address space, and the memory management unit maps this virtual address to a physical one. In this case, they will even point to the same physical address until until there is a copy-on-write.

One more thing about the program above: waitpid(). This system call is used to make the parent process wait for the child process at that point. It is good form to have a parent wait for a child process. What now if the child exits before the parent? Then we get so called zombie processes. These processes only have an entry in the operating system's process table left with the exit status of said process. The following program will show this (zombie.c):

  int main(int argc, char *argv[]) {
    pid_t pid;
  
    pid = fork();
  
    if (pid == 0) {
      printf("Hello from the child process!\n");
      printf("The child is exiting now.\n");
    } else if (pid > 0) {
      printf("Hello from the parent, pid %d.\n", getpid());
      printf("The parent has forked process %d.\n", pid);
      sleep(60);
      printf("The parent is exiting now.\n");
    } else {
      printf("There was an error with forking.\n");
    }
  return 0;
  }

When using a tool such as 'ps' to find these processes, you should see something like this:

$ ps aux | grep zombieprog
2669 0.0 0.0 4076 352 pts/0 S+ 02:49 0:00 ./zombieprog
2670 0.0 0.0 0 0 pts/0 Z+ 02:49 0:00 [zombieprog] <defunct>

The <defunct> part indicates this is a zombie child process (PID 2670) to its parent (PID 2669).

Because the very principle of seperate address spaces means that special constructs are necessary to share data between them, processes are excellent to perform multiple tasks in parallel where there is no need (or indeed, where it is imperative) that data from one process does ot leak into the other. The use of processes is thus very well suited to for example in web servers where each connection is handled by its own process (where a number of processes could be in existance before handling connections, not just fork() on a new connection). Should it be required to share data between processes, mechanisms such as shared memory and pipes can be used. If there is a lot of shared data, threads might be a better solution. However, it is important to note that using threads will mean that you will open yourself up to an entirely new set of possible bugs just because threads share each other's memory space. In addition, a new set of tools will be required to properly manage the different variables, bufferes etc. on which those threads operate.

Threads were called lightweight processes in the past because they were less intensive to set up. Threads did not need a complete copy of the parent process to a child process (before the advent of copy-on-write and others, this was the case). There have been several implementations of threads, but we will use POSIX threads, as provided by the pthread library (GCC: -lpthread), for the examples in this text.

Let's have a look at a simple program that uses threads (thread_1.c).

  /* Struct to hold data to be passed to a thread */
  typedef struct 
  {
      int thread_no;
      char message[100];
  }tdata;
 
  /* Print_thread is used for both threads */
  void print_thread(void *ptr)
  {
      tdata *data;            
      data = (tdata *)ptr;  /* Type cast to a pointer to tdata */
    
      /* Do the work */
      printf("Thread %d has the message %s \n", data->thread_no, data->message);
    
      pthread_exit(0); /* Exit */
  }
 
  int main(int argc, char *argv[])
  {
      pthread_t thread1, thread2;  /* Thread variables */
      tdata data1, data2;          /* Structs to be passed to threads */
    
      /* Initialize data to pass to thread 1 */
      data1.thread_no = 1;
      strcpy(data1.message, "Ping");
 
      /* Initialize data to pass to thread 2 */
      data2.thread_no = 2;
      strcpy(data2.message, "Pong");
    
      /* Create threads 1 and 2 */    
      pthread_create (&thread1, NULL, (void *) &print_thread, (void *)&data1);
      pthread_create (&thread2, NULL, (void *) &print_thread, (void *)&data2);
 
      /* Main blocks now and waits for both threads to finish before it exits */ 
      pthread_join(thread1, NULL);
      pthread_join(thread2, NULL);
              
 
  return 0;
  }

When you compile this program and run it several times, you will see that sometimes thread 1 finishes first, while on other occasions thread 2 will - independent of which one was created first in the code. This depends purely on the scheduling of the underlying operating system. This also means that we will have to be very careful when accessing data: we can not assume that one particular thread will always finish first. All threads created in the above program run in the same address space, which means that should we have a resource or e.g., a global variable in this program, both threads will be able to access the same variable and perform operations on it. This can lead to some hard to trace bugs; consider the following:

Let's assume we have two threads, T1 and T2. We have a global variable called 'counter' which is a variable that hods and event count as counted by T1 and T2. Say that T1 increments 'counter' every time a key is pressed. Let's also say that T2 increments 'counter' every time a bird flies by. What happens now if both events happen at the same time? Let's have a look at some pseudo assembly code to investigate.

    T1                       T2 
    move counter,r1          move counter,r2
    inc r1                   inc r2
    move r1, counter         move r2, counter

Imagine the scheduler interrupting thread T1 just after it finished incrementing register r1 to move to T2. T2 finishes completely finishes and then the scheduler moves back to T1 to finish. Let's assume 'counter' was at 5 at the beginning. What will it be at the end of T1 and T2? Right, 6 - not the correct 7. This is called a race condition, where the final result depends on the precise interleaving of both instruction streams. To prevent these kinds of errors, we have to use a mechanism such as a semaphore, which will lock access to the shared variable for as long as there are operations ongoing on this variable. This is called mutual exclusion. The program thread_2.c illustrates the use of a mutex in the pthread library to do this.

These aren't the only kinds of issues you run in to when dealing with threads. When you have multiple threads using the same resources, you could end up having one thread using the resource for too long, while the others can't get to it (this is called resource starvation) or where multiple threads each hold a single mutex while they need two to access a certain resource (this is called a deadlock). The following website illustrates this using the Dining Philosophers problem: http://www.doc.ic.ac.uk/~jnm/concurrency/classes/Diners/Diners.html

While debugging issues with threads isn't trivial, they are however very useful and pretty straight forward to solve those problems commonly known as embarrassingly parallel. One of those is raytracing whereby each pixel of the image can be generated independently from one another. This means we can split the target into several regions and use separate threads to compute each region. You can find a simple implementation of a threaded raytracer (the end result of a tutorial I have on this blog) on my Github page: https://github.com/PurpleAlien/Raytracer.

As always, there is a lot more to it, but this would become a very, very long post if we go into the details...