Low-Level Process Hunting on macOS

Parent/child relationships are one of the simplest and most effective ways to detect malicious activity at the host level. On Unix, there are multiple ways to create a process, and each one behaves differently at the operating system level. These days, the majority of host-based endpoint technologies provide ways to view process trees and write detections based on them. However, many security analysts never take the time to learn the fundamentals of how processes are spawned. This is unfortunate because, even after identifying a malicious process tree, some of the data within it can be overlooked or misunderstood. Most programming languages in use today have built-in functions for process creation, but high-level languages wrap a number of steps that actually take place at a lower level. Perhaps those who understand processes best are the dedicated coders still writing C with the fork and exec system calls. With the new Endpoint Security Framework being all the hype, you’re likely to see a number of tools telling you something “forked” or “exec’ed.” So with that in mind, let’s get started by looking at some of the basics of macOS process creation techniques and how they affect threat hunting.

If you’re using a tool that allows you to collect processes as they’re created you’ve probably stumbled upon a process tree that looks like this…

[Image: mystery]

The confusing part of the above process tree is that, logically, we expect the command “sh -c whoami” to create whoami as its child process. However, whoami ends up appearing as a sibling process instead. The goal of this post is to solve the mystery of this confusing behavior.

Forks

One of the most basic ways to create a process is by using the fork system call, but keep reading. From a threat hunting perspective it might not be doing exactly what you think it’s doing. Take a look at the man page (also known as using your man fork):

FORK(2)                     BSD System Calls Manual                    FORK(2)

NAME
     fork -- create a new process

SYNOPSIS
     #include <unistd.h>

     pid_t
     fork(void);

DESCRIPTION
     fork() causes creation of a new process.  The new process (child process) is an exact copy of the calling process (parent process) except for the following:

           o   The child process has a unique process ID.

           o   The child process has a different parent process ID (i.e., the process ID of the parent process).

           o   The child process has its own copy of the parent's descriptors.  These descriptors reference the same underlying objects, so that, for instance, file pointers in file objects are shared between the child and the parent, so that an lseek(2) on a descriptor in the child process can affect a subsequent read or write by the parent.  This descriptor
               copying is also used by the shell to establish standard input and output for newly created processes as well as to set up pipes.

           o   The child processes resource utilizations are set to 0; see setrlimit(2). 

The most important note is right at the top of the description:

     “fork() causes creation of a new process.  The new process (child process) is an exact copy of the calling process (parent process)…”


Let’s quickly put together some code that runs the fork function so we can see what this looks like from a process tree perspective.

/* iFork.c */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(void) {
    pid_t pid = fork();

    if (pid == -1) {
        // fork failed, so no child process was created
        perror("fork");
        return 1;
    } else if (pid == 0) {
        printf("Hello from the child. PID -> %d : PPID -> %d\n", getpid(), getppid());
    } else {
        printf("Hello from the parent. PID -> %d\n", getpid());
        wait(NULL);
    }

    return 0;
}

Next we compile the program and run it…

gcc iFork.c -o iFork 
>>> ./iFork
Hello from the parent. PID -> 50281
Hello from the child. PID -> 50282 : PPID -> 50281
 

Let’s break down what’s happening here. When our program starts, we call the fork() function. As you can see, we don’t specify a path to any binary that we want to execute. This is because fork is not designed to execute a new binary. Instead, it is designed to create an exact copy of the process that is already running. Some of you are probably looking at the output and wondering how in the world we got two lines when the only two print calls sit in opposite branches of the if/else statement. Remember, when we called fork we cloned this process. The program does not start over from the top. Instead, both the original and the clone continue from the point where fork() returns, so the code after the fork runs twice: once in the parent and once in the child.

So if this process is a clone, how does it know if it’s the child or the parent? The fork function returns 0 in the child and the child’s pid in the parent (or -1 if the fork failed). Finally, notice that we called the wait() function inside the parent process. wait() suspends the parent until the child finishes executing, which ensures that the parent doesn’t exit while the child is still running. It’s worth noting that without the wait the order is not guaranteed: sometimes the child process may finish before the parent, and sometimes it may finish after.

On my system, the above code caused the following process tree to be created…

[Image: test_program]

Right now you might be wondering what on Earth this has to do with threat hunting. We’ll get into some additional reasons why it’s important to understand fork() later, but for now, when you encounter a program that appears to execute itself, you have a basic understanding of what is happening. It’s also important to note that programs sometimes fork inside of loops. This leads to massive process trees full of what look like duplicate processes. For this reason a lot of endpoint security solutions don’t collect or display forks. In reality, the fact that a program forked doesn’t tell us threat hunters much on its own. It’s almost as if a process opened a new thread (but it didn’t!). We’ll talk a bit more about forks later in this blog post because they are still important, but, for now, let’s take a look at another way to execute processes.

Execs

There are a number of functions within the exec family that allow you to load a new process image. However, if you read the documentation you’ll notice that what’s happening here is very different from the fork function. With exec, the current process image is overwritten by a new one, and no new process or pid is created. As you can imagine, this can get very confusing when analyzing a process tree because it means a process that used to exist has now been overwritten by whatever program the author decided to execute.

Let’s take a look at how this works. All exec functions are pretty similar. We will demonstrate by using the execvp() function. We will use it to run the dash shell. I’ve chosen dash for multiple reasons. Mainly, because nothing else uses it, so it’s easy to spot in a process tree, and also because it’s a shell so it will remain open until we manually close it. This allows us all the time we want to dig into the resulting process tree. We will call this code dash_wrapper.c

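/* dash_wrapper.c */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char *argv[] = { "dash", NULL };
    execvp(argv[0], argv);

    // Only reached if execvp() failed; on success dash has replaced this program
    perror("execvp");
    return 1;
}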

Next, we’ll compile this into a binary called “dash_wrapper”

gcc dash_wrapper.c -o dash_wrapper 

If we run our newly compiled executable called dash_wrapper and then take a look at the resulting process tree when it’s executed from the terminal we see…

[Image: exec]

Here you can see that even though the program we executed was called “dash_wrapper,” we see no such program in the process tree. Since I’m inside the terminal, it seems logical that zsh would execute dash_wrapper and dash_wrapper would then go on to execute dash. Instead what happened is zsh executed dash_wrapper as pid 303, and then dash_wrapper used execvp() to run the dash shell, resulting in dash overtaking the pid 303 process image.

[Image: dash_wrapper_example]

So why is this relevant? It’s relevant because if you’re using a tool that records processes in real time, you’re bound to eventually see two process events recorded around the same time that share the same process id and the same parent process. From a developer perspective, note that once exec succeeds, execution never returns to your original program. As soon as exec runs, the new program takes over and the old one disappears. This gets us a step closer to understanding the “sh -c whoami” scenario described in the first section of this post.

Fork + Exec

So at this point we’ve covered two separate process creation functions: fork(), which creates a new process by cloning the currently running one, and exec(), which runs a program by overwriting the current process image. This brings us to the most common pattern in process creation, which is a combination of both fork and exec. As mentioned above, a successful exec never returns control to the program that called it. This is a problem for malware because it often uses built-in commands to collect recon data on the system. For example, if malware execs the “uname -a” command and then wants to parse the output to get the kernel version, it won’t be able to. After the malware execs “uname -a” the malware will be dead. So instead, the malware author will first fork (duplicate) the malware process, and then exec the “uname -a” program within that fork so that the duplicated process is overtaken by the uname command. The original malware process keeps running the whole time, so it can read uname’s output and wait for uname to terminate before carrying on. This is all functionality that we take for granted nowadays thanks to high-level programming languages that do it all for us. C code to perform the fork and exec of uname would look something like the following (thank you, stackoverflow):

/* getUname.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define die(e) do { fprintf(stderr, "%s\n", e); exit(EXIT_FAILURE); } while (0)

int main(void) {
    char output[4096];

    // Create a pipe so we can get the output of the uname command
    int link[2];
    if (pipe(link) == -1)
      die("Pipe failure");

    // Fork this process
    pid_t pid;
    if ((pid = fork()) == -1)
      die("Fork failed");

    if (pid == 0) {
      // fork() returned 0, so this branch runs in the child process
      // Point the child's stdout at the pipe, then exec uname over this forked copy
      dup2(link[1], STDOUT_FILENO);
      close(link[0]);
      close(link[1]);
      char *argv[] = { "uname", "-a", NULL };
      execvp(argv[0], argv);
      die("Exec failed");   // only reached if execvp() failed

    } else {
      // Parent: close the unused write end, then read what uname wrote
      close(link[1]);
      ssize_t nbytes = read(link[0], output, sizeof(output));
      if (nbytes == -1)
        die("Read failed");

      // Print the output of the uname command to the terminal
      printf("%.*s", (int)nbytes, output);
      wait(NULL);

      /* Do whatever else you want to do with the uname output here */

    }
    return 0;
}
 

If you follow the comments you should get a good idea of what’s going on here. In the above code we first fork the current process which creates a clone of this process with a new pid. We then call exec while inside of that forked process which means the forked process will then be taken over by a new process image (uname in our case). We can of course compile this code to an executable called getUname like so…

gcc getUname.c -o getUname 

If you are using a tool to capture processes as they run, you should notice that the getUname command would create a tree that looks like so…

[Image: forkexec]

Ah, at last. A process tree that simply makes sense. If only they could all work this way. This tells a clear story that the getUname executable was run and when it ran it executed the uname executable. We can make an easy assumption that the getUname executable requires the output of the uname command and that’s why it chose to run it.

Obviously, the code we compiled is not malicious. It’s just an example of how malware performing recon might look. In fact, malware often creates many different child processes in this same manner. Multiple executables that are already on the system are often executed to gather data about the system because malware authors don’t want to reinvent the wheel when writing code.

Solving the Mystery

So at last, this brings us back to the question I opened this blog post with. If you’re collecting process creation in real time, what in the world is up with a process tree that looks like this?

[Image: mystery]

The short answer is this: sh -c whoami execs twice without forking, and that’s why we get three different process names running as pid 303. For those that want the long explanation, hold on to your butts…

For those who like to be hands on, I will first show that it’s very easy to reproduce this process tree with C code by using the system() function. The system() function is a quick and dirty way to run a program. It’s perfect for when we want to execute a command and don’t care about the output. It accomplishes this by running a new program using a “sh -c” call (as you can see above).

#include <stdlib.h>

int main( void ) {
	system("/usr/bin/whoami");
	return 0;
} 

The system() function uses fork and exec to create the “sh -c whoami” process, which is why we see it as a child process of some_program. But why do we see bash and whoami as child processes of some_program instead of children of the “sh -c whoami” command? To answer this question we must first take a peek at the sh man page (also known as shmaning).

SH(1)                     BSD General Commands Manual                    SH(1)

NAME
     sh -- POSIX-compliant command interpreter

SYNOPSIS
     sh [options]

DESCRIPTION
     sh is a POSIX-compliant command interpreter (shell).  It is implemented by re-execing as either bash(1), dash(1), or zsh(1) as determined by the symbolic link located at
     /private/var/select/sh.  If /private/var/select/sh does not exist or does not point to a valid shell, sh will use one of the supported shells.

... 

Apparently, to qualify as a POSIX-compliant command interpreter you only need the ability to pass arguments to another shell, because by the looks of it that’s all that sh does! It finds that shell by following the symbolic link located at /private/var/select/sh. (Also, a quick interesting tidbit: as of macOS 10.15.5 this symbolic link points at bash by default, even if you’ve set your default shell to something else.) Anyway, we’ve now discovered the next piece of the puzzle. The sh shell is designed to take the arguments supplied to it and then turn around and pass those exact same arguments to the bash shell using exec().

If you scroll back up and take a quick look at the command line used for the “bash” process you’ll see that it is “sh -c whoami”. You might be wondering how it’s possible for bash’s first argument to be “sh.” That’s a great question. As a threat hunter it’s strange to see this, but in reality the first argument (argv[0]) is just a string chosen by whoever calls exec. It is the program name only by convention, and a program, especially one written in C, is free to pass something else. As it turns out, this technique is a special way to run the bash shell. If you look at the behemoth that is the bash man page (Bash Man! The lesser-known Marvel superhero) you’ll see what I’m talking about under the “invocation” section.

BASH(1)                                                                                                                                                                      BASH(1)

NAME
       bash - GNU Bourne-Again SHell

SYNOPSIS
       bash [options] [file]

COPYRIGHT
       Bash is Copyright (C) 1989-2005 by the Free Software Foundation, Inc.

DESCRIPTION
       Bash  is  an  sh-compatible command language interpreter that executes commands read from the standard input or from a file.  Bash also incorporates useful features from the
       Korn and C shells (ksh and csh).
...

INVOCATION
...
       If bash is invoked with the name sh, it tries to mimic the startup behavior of historical versions of sh as closely as possible, while conforming to the  POSIX  standard  as
       well.
... 

Did you catch that? “If bash is invoked with the name sh, it tries to mimic the startup behavior of historical versions of sh as closely as possible.” If you find that confusing, you’re not alone. It’s fairly vague. However, since the bash instance we see was created with “sh” as its first argument, I think we can assume it means that when bash is executed with sh as argv[0], it behaves a bit differently.

This finally leads to bash exec’ing the whoami command. Altogether, if we look at the actions that have occurred in a non-tree format, this is the order of events we see.

[Image: corrected_mystery]

Notice here that pid 303 exec’ed twice without any forks. In other words, this pid has been associated with three different process images. Ah, the wonderful world of Unix.

If we take all of these events and arrange them in a process tree format we get…

[Image: mystery_solved]

And there you have it. The extremely long winded answer to the question you were asking…or maybe you weren’t asking? Regardless, you can rest assured that your computer is behaving as expected when you see such events.

Conclusion

Some security solutions will try to display this data to you in a format that makes more sense to the standard user. As stated before, a lot of solutions already drop forks and just try to show you when different items exec. Not everybody cares about exactly what’s going on under the hood so long as threat analysts are provided a tool that’s useful (and we appreciate it). However, with the release of Apple’s Endpoint Security Framework we’re bound to see some new tools that show us exactly what’s going on with processes on our systems. ProcessMonitor by Objective-See and Crescendo by FireEye are two great examples of this already. The future looks promising in terms of Mac security tools, and you’ve probably picked up by now that I think process creation is one of the most critical things for a Mac threat hunter to understand.