Sandboxing: a dig into building your security pit

Introduction

Sandboxes are a good idea. Whether it's improving kids’ immune systems, or isolating your apps from the rest of the system, sandboxes just make sense. Despite their obvious benefits, they are still relatively uncommon. We think this is because they are still relatively obscure for most developers and hope this post will fix that.

Sandboxes? What’s that?

Software sandboxes isolate a process from the rest of the system, constraining the process’ access to the parts of the system that it needs and denying access to everything else. A simple example of this would be opening a PDF in (a modern version of) Adobe Reader. Since Adobe Reader now makes use of a sandbox, the document is opened in a process running in its own constrained world so that it is isolated from the rest of the system. This limits the harm that a malicious document can cause and is one of the reasons why malicious PDFs have dropped from being the number-1 attack vector seen in the wild as more and more users updated to sandbox-enabled versions of Adobe-Reader.

It's worth noting that sandboxes aren't magic, they simply limit the tools available to an attacker and limit an exploit’s immediate blast-radius. Bugs in the sandboxing process can still yield full access to key parts of the system rendering the sandbox almost useless.

Sandboxes in Canary

Long time readers will know that Canary is our well-loved honeypot solution. (If you are interested in breach detection that’s quick to deploy and works, check us out at https://canary.tools/)


A Canary is a high quality, mixed interaction honeypot. It’s a small device that you plug into your network which is then able to imitate a large range of machines (a printer/ your CEO's laptop/ a file server, etc). Once configured it will run zero or more services such as SSH, Telnet, a database or Windows File Sharing. When people interact with these fake hosts and fake services, you get an alert (and a high quality signal that you should cancel your weekend plans).

Almost all of our services are implemented in a memory safe language, but in the event that customers want a Windows File Share, we rely on the venerable Samba project (before settling on Samba, we examined other SMB possibilities, like the excellent impacket library, but Samba won since our Canaries (and their file shares) can be enrolled into Active Directory too). Since Samba is running as its own service and we don't have complete control over its internal workings, it becomes a prime candidate for sandboxing: we wanted to be able to restrict it's access to the rest of the system in case it is ever compromised.

Sandboxing 101

As a very brief introduction to sandboxing we'll explain some key parts of what Linux has to offer (a quick Google search will yield far more comprehensive articles, but one interesting resource, although not Linux focused, is this video about Microsoft Sandbox Mitigations).

Linux offers several ways to limit processes which we took into consideration when deciding on a solution that would suit us. When implementing a sandbox solution you would chose a combination of these depending on your environment and what makes sense.


Control groups

Control groups (cgroups) look at limiting and controlling access and usage of resources such as CPU, memory, disk, network, etc.


Chroot

This involves changing the apparent root directory on a file-system that the process can see. It ensures that the process does not have access to the whole file system, but only parts that it should be able to see. Chroot was one of the first attempts at sandboxes in the Unix world, but it was quickly determined that it wasn’t enough to constrain attackers.


Seccomp

Standing for "secure computing mode", this lets you limit the syscalls that a process can make. Limiting syscalls means that a process will only be able to perform system operations that you expect to be able to perform so if an attacker compromises your application, they won't be able to run wild.


Capabilities

These are the set of privileged operations that can be performed on the Linux system. Some capabilities include setuid, chroot and chown. For a full list you can take a look at the source here. However, they’re also not a panacea and spender has shown (frequently) how Capabilities can be leveraged into full Capabilities.


Namespaces

Without namespaces, any processes would be able to see all processes' system resource information. Namespaces virtualise resources like hostnames, user IDs or network resources so that a process cannot see information from other processes.

Adding sandboxing to your application in the past meant using some of these primitives natively (which probably seemed hairy for most developers). Fortunately, these days, there are a number of projects that wrap them up in easy-to-use packages.



Choosing our solution

We needed to find a solution that would work well for us now, but would also allow us to easily expand once the need arises without requiring a rebuild from the ground up.

The solution we wanted would need to at least address Seccomp filtering and a form of chroot/pivot_root. Filtering syscalls is easy to control if you can get the full profile of a service and once filtered you can sleep a little safer knowing the service can't perform syscalls that it shouldn't. Limiting the view of the filesystem was another easy choice for us. Samba only needs access to specific directories and files, and lots of those files can also be set to read-only.

We evaluated a number of options, and decided that the final solution should:

  • Isolate the process (Samba)
  • Retain the real hostname
  • Still be able to interact with a non-isolated process
Another process had to be able to intercept Samba network traffic which meant we couldn’t put it in a network namespace without bringing that extra process in.

This ruled out something like Docker, as although it provided an out-of-the-box high level of isolation (which is perfect for many situations), we would have had to turn off a lot of the features to get our app to play nicely.

Systemd and nsroot (which looks abandoned) both focused more on specific isolation techniques (seccomp filtering for Systemd and namespace isolation for nsroot) but weren’t sufficient for our use case.

We then looked at NsJail and Firejail (Google vs Mozilla, although that played no part in our decision). Both were fairly similar and provided us with flexibility in terms of what we could limit, putting them a cut above the rest.

In the end, we decided on NsJail, but since they were so similar, we could have easily gone the other way, i.e. YMMV


NsJail
NsJail, as simply stated in its overview, "is a process isolation tool for Linux" developed by the team at Google (though it's not officially recognised as a Google product). It provides isolation for namespaces, file-system constraints, resource limits, seccomp filters, cloned/isolated ethernet interfaces and control groups.

Furthermore, it uses kafel (another non-official Google product) which allows you to define syscall filtering policies in a config file, making it easy to manage/maintain/reuse/expand your configuration.

A simple example of using NsJail to isolate processes would be:

./nsjail -Mo --chroot /var/safe_directory --user 99999 --group 99999 -- /bin/sh -i
Here we are telling NsJail to:
-Mo:               launch a single process using clone/execve
 
--chroot:          set /var/safe_directory as the new root directory for the process

--user/--group:    set the uid and gid to 99999 inside the jail

-- /bin/sh -i:     our sandboxed process (in this case, launch an interactive shell)
We are setting our chroot to /var/safe_directory. It is a valid chroot that we have created beforehand. You can instead use  --chroot / for your testing purposes (in which case you really aren’t using the chroot at all).

If you launch this and run ps aux and id you’ll see something like the below:
$ ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
99999        1  0.0  0.1   1824  1080 ?        SNs  12:26   0:00 /bin/sh -i
99999       11  0.0  0.1   3392  1852 ?        RN   12:32   0:00 ps ux
$ id
uid=99999 gid=99999 groups=99999
What you can see is that you are only able to view processes initiated inside the jail.

Now lets try adding a filter to this:

./nsjail --chroot /var/safe_directory  --user 99999 --group 99999 --seccomp_string 'POLICY a { ALLOW { write, execve, brk, access, mmap, open, newfstat, close, read, mprotect, arch_prctl, munmap, getuid, getgid, getpid, rt_sigaction, geteuid, getppid, getcwd, getegid, ioctl, fcntl, newstat, clone, wait4, rt_sigreturn, exit_group } } USE a DEFAULT KILL' -- /bin/sh -i
Here we are telling NsJail to:
-Mo:               launch a single process using clone/execve
 
--chroot:          set /var/safe_directory as the new root directory for the process

--user/--group:    set the uid and gid to 99999 inside the jail

--seccomp_string:  use the provided seccomp policy

-- /bin/sh -i:     our sandboxed process (in this case, launch an interactive shell)
If you try run id now you should see it fail. This is because we have not given it permission to use the required syscalls:
$ id
Bad system call
The idea for us then would be to use NsJail to execute smbd as well as nmbd (both are needed for our Samba setup) and only allow expected syscalls.

Building our solution
Starting with a blank config file, and focusing on smbd, we began adding restrictions to lock down the service.

First we built the the seccomp filter list to ensure the process only had access to syscalls that were needed. This was easily obtained using perf:

perf record -e 'raw_syscalls:sys_enter' -- /usr/sbin/smbd -F
This recorded all syscalls used by smbd into perf's format. To output the syscalls in a readable list format we used:
perf script | grep -oP "(?<= NR )[0-9]+" | sort -nu
One thing to mention here is that syscall numbers can be named differently depending where you look. Even just between `strace` and `nsjail`, a few syscall names have slight variations from the names in the Linux source. This means that if you use the syscall names you won't be able to directly use the exact same list between different tools, but may need to rename a few of them. If you are worried about this, you can opt instead to use the syscall numbers. These are a robust, tool-independent way of identifying syscalls.

After we had our list in place, we set about limiting FS access as well as fiddling with some final settings in our policy to ensure it was locked down as tight as possible.

A rather interesting way to test that the config file was working as expected was to launch a shell using the config and test the protections manually:

./nsjail --config smb.cfg -- /bin/sh -i
Once the policy was tested and we were happy that smbd was running as expected, we did the same for nmbd.

With both services sandboxed we performed a couple of long running tests to ensure we hadn't missed anything. This included leaving the services running over the weekend as well as testing them out by connecting to them from different systems. After all the testing and not finding anything broken, we were happy to sign off.

What does this mean for us?

Most canned exploits against Samba expect a stock system with access to system resources. At some point in the future, when the next Samba 0-day surfaces, there’s a good chance that generic exploits against our Samba will fail as it tries to exercise syscalls we haven’t expressly permitted. But even if an attacker were to compromise Samba, and spawn himself a shell, this shell would be of limited utility with a constrained view of the filesystem and the system in general.

What does this mean for you?
We stepped you through our process of implementing a sandbox for our Samba service. The aim was to get you thinking about your own environment and how sandboxing could play a role in securing your applications. We wanted to show you that it isn't an expensive or overly complicated task. You should try it, and if you do, drop us a note to let us know how it went!