Stack Smashing Protection Bypass via Pthreads - Aleksandar Nikolic

Name: Stack Smashing Protection Bypass via Pthreads - Aleksandar Nikolic
Uploaded: 2019-02-18
Duration: 35 min 21 s
Description: The full title of this talk was too long for the title field: Stack Smashing Protection Bypass via Pthreads - A Case of MiniUPNP Buffer Overflow In 2015 Talos identified and reported a buffer overflow vulnerability in client side code of the popular MiniUPnP library. The vulnerability was promptly

BSides Knoxville · 35:21 · 210 views · Published 2019-02

Mentioned in this talk

Tools used

GDB, netcat, Wireshark

About this talk

The full title of this talk was too long for the title field: Stack Smashing Protection Bypass via Pthreads - A Case of MiniUPNP Buffer Overflow. In 2015, Talos identified and reported a buffer overflow vulnerability in the client-side code of the popular MiniUPnP library. The vulnerability was promptly fixed by the vendor and was assigned TALOS-CAN-0035 as well as CVE-2015-6031. Martin Zeiser and Aleksandar Nikolic subsequently gave a talk at PacSec 2015 ("Universal Pwn n Play") about the client-side attack surface of UPnP, and this vulnerability was part of it. Talos has developed a working exploit against the Bitcoin-Qt wallet, which uses this library. The exploit includes a novel Stack Smashing Protection (SSP) bypass. Because the bypass technique lies in the way pthreads work, it illustrates how a seemingly hard-to-exploit issue can still be exploited due to unforeseen consequences arising from the complexity of the modern process execution chain. In this talk, we will introduce the details of the stack smashing protection implementation, discuss the relevant libc and pthread mechanisms, introduce the steps required for a successful bypass, and conclude with a demonstration. https://bsidesknoxville2016.sched.org/event/6tCp/stack-smashing-protection-bypass-via-pthreads-a-case-of-miniupnp-buffer-overflow
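As a rough illustration of the client-side UPnP discovery exchange the talk builds on, the sketch below builds an SSDP M-SEARCH request and pulls the description URL out of a reply's Location header. It is an offline sketch under stated assumptions: the helper names `build_msearch` and `parse_location` are made up for this example, the sample reply (IP address, port, and XML path) is hypothetical, and the destination is the standard SSDP multicast group, which the talk loosely refers to as broadcast.

```python
from urllib.parse import urlparse

SSDP_ADDR = ("239.255.255.250", 1900)  # standard SSDP multicast group and port


def build_msearch(search_target="upnp:rootdevice", mx=2):
    """Return the bytes of an SSDP M-SEARCH discovery request."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_ADDR[0]}:{SSDP_ADDR[1]}",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",
        f"ST: {search_target}",
        "",
        "",  # two empty entries produce the terminating blank line
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_location(response):
    """Extract (host, port, path) from the LOCATION header of a reply."""
    text = response.decode("ascii", errors="replace")
    for line in text.split("\r\n")[1:]:  # skip the HTTP status line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            url = urlparse(value.strip())
            return url.hostname, url.port, url.path
    return None


# Hypothetical reply; a real or fake device would send something similar.
sample_reply = (
    b"HTTP/1.1 200 OK\r\n"
    b"CACHE-CONTROL: max-age=120\r\n"
    b"ST: upnp:rootdevice\r\n"
    b"LOCATION: http://192.168.1.1:5000/rootDesc.xml\r\n"
    b"\r\n"
)

print(build_msearch().decode("ascii"))
print(parse_location(sample_reply))
```

The client would next fetch the XML at that URL and hand it to its parser, which is the step where attacker-controlled data reaches the library.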

Transcript [en]

[Applause] [Host] So, we get that there's ninjas, right? We get that much. It's not English, but there's ninjas. Maybe you want to start out with that a little bit, and where that came from? [Speaker] Well, as I said, ninjas are popular, ninjas are good. For some reason they were hugely popular in Serbia during the 80s. This is from a movie called How Rock and Roll Died; some guy came up with the idea of making a band that combines ninjas, the mysticism surrounding ninjas, and turbo-folk, or rather folk-pop music. It's a hilarious movie, I would recommend you check it out, but I'm not sure it's even available

somewhere or if there's a translation, but it's a piece of local lore. So, I'm Alex, I come from Serbia. I'm part of the Talos vulnerability development team. My team at Cisco Talos does zero-day research; we are tasked with finding vulnerabilities in commodity software. You can see some of our work on our website under vulnerability reports, or some of our open-source software and some tools that we've developed at this GitHub link. So what I'm going to talk about today is basically a walkthrough of this neat little vulnerability that we found in a UPnP library implementation, some of the things that make this vulnerability interesting

and some of the things that make the exploitation itself interesting. Exploiting this vulnerability required a stack smashing protection bypass, which makes it somewhat interesting as a showcase. So really quickly I'll give an overview of UPnP and how the protocol works, just as a refresher. UPnP is generally designed to be run on residential networks, usually as a set of protocols for network discovery, for doing port forwarding, media streaming and things like that. It's built on top of SSDP and HTTP, it uses SOAP with XML of course, and a whole bunch of fancy words there. The most

used feature of UPnP is by applications that want to accept incoming connections inside a LAN, like applications behind a NAT that want to set up port forwarding on a gateway or whatever. So it's mostly popular with peer-to-peer applications like torrent clients, cryptocurrencies and stuff like that; wherever you need to accept incoming connections into a local network, there would probably be some form of UPnP support for port forwarding. How does that work? Basically, the connecting application, usually on startup, sends out a broadcast UDP packet called M-SEARCH. It just says: I'm new on this network, can you please tell me what kind of UPnP functionality do you

support. It sends it on broadcast and receives replies; the client parses the replies and then chooses which of the UPnP endpoints to connect to. Briefly, in a few Wireshark screenshots, it looks like this: in the first packet the client sends out the M-SEARCH request to this, obviously broadcast, address over UDP. Next the server replies; the reply contains one header called the Location header, which just has an IP address, a port and the location of an XML file that actually contains the description of the device's capabilities. The client then proceeds to fetch that XML and to parse it to see what it can actually do. So this is basically how the XML

works; there's a bunch of information there. My colleague Martin Zeiser did some interesting research with UPnP scanning over the Internet, presented at this year's PacSec conference, so if that's of interest to you, you can look it up. We're not going to deal with the details of the XML. So obviously this poses an attack surface to the client application: the client fetches the XML and starts parsing, and the parser is exposed to external XML files. And we've come to the fun part here, because most UPnP library implementations, for various reasons, come with their own XML parser implementations. They're not using libxml, they have their homegrown XML parsers, and we know for some reason XML is hard

to parse. [Laughter] So on one hand, for developers it means fewer dependencies: because these libraries are intended for embedded devices, you don't want to pull in a lot of dependencies, so they're small, complete libraries. On the other hand, we as attackers know that this means less audited, less trusted code. So I took a look, especially at this MiniUPnP client, which is, if not the first, the second most popular UPnP implementation out there. It's a relatively small library, self-contained, and it compiles on basically everything. So what's actually using it? A couple of examples here: as I mentioned, torrent clients; almost all torrent clients are using it. Here I give

Transmission as an example. On the other hand, cryptocurrencies: all Bitcoin implementations are using MiniUPnP to do their bidding. And Tor used to, and there's a fun fact about Tor's use of it. Right around the time I started looking at MiniUPnP, I took a look at what's actually using it, and Tor had this tool called Tor firewall helper, which does what it says on the box: it pierces through firewalls and sets up port forwarding, and it did that using MiniUPnP. When I found the vulnerability, I went back to check how Tor uses it and found this message waiting for me there: they basically removed the tool and left the

message that they didn't trust the underlying libraries enough to have them run in their software. This wasn't used by default, but it would have been interesting either way. So they rewrote their firewall helper tool in Go and went on from there. [Audience] Do you know which... [Speaker] I honestly didn't look it up; for some reason it didn't occur to me to check. It could be interesting to see. That was definitely a good decision, because they have some critical code running there. Even though it wasn't run by default, it's a good decision to have something like that rewritten in a manageable language. So you already see where this is all

going. The vulnerability that I'm showcasing is just a vanilla stack buffer overflow in the XML parser. It's trivially triggered by just sending an overly long XML element name. It's been patched since October the 1st last year and has this CVE. A little bit about the vulnerability itself: here we see the parser initialization, where we set up a couple of callback functions. Here we're interested in the IGD start element function, which is called back whenever it starts parsing XML element names, and then the parser is invoked. The next piece of the puzzle is the IGD data structure, which is used while parsing the element

names, and you can see that it has a couple of string buffers with fixed sizes; MINIUPNPC_URL_MAXSIZE is limited to 256 bytes, so we have a fixed-size buffer. And finally, in this function we have an unchecked memcpy call that just copies whatever we send to it into that small buffer, which is located on the stack. This clearly results in a stack buffer overflow, and we want to try and exploit this. Reaching the vulnerability is relatively straightforward: you run a fake UPnP server on your local LAN; some victim connects to the LAN, fires up an application

that uses MiniUPnP, sends out the M-SEARCH discovery, receives a reply, starts parsing the XML and gets owned. Not much in the way. The problem with exploiting this vulnerability was that we were trying to showcase it by exploiting a Bitcoin client, because I thought it would be an interesting situation if I put a fake UPnP server on a local LAN, you join the LAN, fire up your Bitcoin client, and I get access to your box without ever specifically targeting you, just by virtue of network discovery. The thing is, Bitcoin folks, being a bit paranoid, as they should be, have built their binaries with all sorts of proactive

anti-exploitation protections. One of those is obviously stack smashing protection, which kind of gets in our way. So, just as a recap, a few slides about how stack smashing protection actually works. On this slide we have a usual function frame on the stack. At the bottom we have local variables, in this case a character buffer, then we have a stack canary, then afterwards we have a saved frame pointer and finally the saved return address; above it are the function arguments. When we want to exploit a stack buffer overflow, what we usually want to do is overwrite the saved return address, but in this case, since

it's protected with stack smashing protection, we have this stack canary between the buffer and the return address. If we overwrite the stack canary with anything other than what it currently is, it can be detected, and that would stop the exploitation. In a bit of assembly, this is how it looks. On the left-hand side we have the function prologue, which takes a value from the GS segment register at offset 0x14 and saves it in a local variable, here called var_C. In Linux implementations the GS segment register usually points to the thread control block, which is where the stack cookie is saved. Each time a

process is restarted, a random stack cookie is generated and saved in the thread control block. Here we are copying it to the local function stack at the beginning of the function, on the left-hand side. I'm kind of mixing up my hands because I keep turning, but on the other side we have the function epilogue: at the end of the function, just before the function returns, we have a check for stack smashing. Here we see the XOR instruction comparing the value from var_C with the original value in the thread control block, XORing them. If they're the same, the result will be zero and we continue to return from

the function; otherwise we know that something went wrong, and the __stack_chk_fail function will be called. It usually results in a message, something like this: you get "stack smashing detected", your process is terminated, and you get a core dump or not, but it doesn't matter. So we get this message. What's actually happening when __stack_chk_fail gets called? It in turn calls __fortify_fail, so it's just a wrapper for __fortify_fail. __fortify_fail in turn eventually calls __libc_message, which outputs that string. Here we start seeing a bit of a problem, because __libc_message has some complicated code: it parses the environment, trying to find environment variables to know

what kind of output you want; it interacts with sockets, with file descriptors, writes to the screen, obviously. The problem is that this function kicks in once you've detected that the process is not to be trusted, and you're executing more code in the process that you detected is not to be trusted. I'm obviously not the first one to notice this. Adam Zabrocki has a nice paper that goes deep into stack smashing protection internals, outlining some of these ideas about it being complex: instead of just invoking abort the instant it detects something is wrong, it just keeps executing more code. One interesting example of that was from Dan

Rosenberg, who posted this example where he did a stack smash, but instead of using it for hijacking the execution of the process, he used it as an info leak. In this example we have an overflow which overflows a buffer and then overwrites the main function's argument, argv, on the main function's stack frame. Then, when the stack smashing protection kicks in and tries to output that message, he gets the memory leak, because, if you remember how __libc_message looks, the %s argument will be the name of your binary, which comes from argv. He overwrote that, controls the

pointer, and points it to something that he wants leaked. So we are going to go a step further and subvert SSP for code execution. A little bit about what's actually going on once we smash the stack with different lengths: in the first run with GDB we see "stack smashing detected", the regular message, printed out. The first one was with 200 bytes; the second time we had four more bytes and we got a different crash. If we take a look at the call stack, we see that the values of argv are mangled, so we overwrote this part of the process memory and we get a

different type of crash. Now, to get to the next part, I need to make a little detour and get into how syscalls work on Linux and something called ELF auxiliary vectors. In modern times, somewhere after Linux kernel version 2.4, all system calls on Linux are executed using this __kernel_vsyscall method. Before that, the regular way to invoke system calls was by invoking interrupt 0x80. That wasn't particularly efficient, so now libc's standard way of doing that is by using __kernel_vsyscall. Here we have an example of a system call being invoked. This is a

write system call wrapper. It sets up registers as you regularly would, but instead of invoking an interrupt it calls this function pointer that's again in the GS segment, but this time at offset 0x10. As I mentioned earlier, GS points to the thread control block, and the thread control block is this memory structure that holds this information. On the other side, at offset 0x10, we can see that there's a sysinfo pointer. The sysinfo pointer actually points to the __kernel_vsyscall function, which then proceeds to execute a system call. So there has to be a way for the loader or the kernel itself to pass the location of the __kernel_vsyscall

function to the process itself, and that's done through the loader to libc via ELF auxiliary vectors. ELF auxiliary vectors are just an array of integers that's located somewhere up the stack. The usual stack of a Linux process has function stacks; above those you have the environment, and above the environment there are the ELF auxiliary vectors. Some of those vectors are AT_ENTRY, which points to the entry point of the binary, AT_PHDR, which points to the ELF headers, page size information, and of course the AT_SYSINFO pointer, which points to the __kernel_vsyscall function. What happens is that at process startup, after the loader is

finished and the process actually starts, some of the first code that gets executed copies the pointer from the AT_SYSINFO auxiliary vector into the thread control block for later use. Just a little disclaimer: this is x86 specific. The mechanism is a bit different on x86-64, but similar tricks would apply; it's obviously somewhat more different on other platforms. So where am I going with this? As I mentioned earlier, the __stack_chk_fail function is fairly complicated, and it will eventually have to write something on the screen: it outputs that warning message, and it will do so eventually by invoking a write system call. So the idea here is that if I can

somehow take control and overwrite that __kernel_vsyscall function pointer, then when the stack smashing protection kicks in and tries to output something to the screen, it will inadvertently call write, but since I have control of the function pointer, it will jump to my code. That's not going to be so straightforward, for a couple of reasons. First, if we try to make a huge buffer overflow to reach the ELF auxiliary vectors on the stack and overwrite them, we would inadvertently smash the whole environment, which would actually make the __libc_message code crash, because it's parsing the environment. Of course we could make sure that the environment stays sane and looks good, but then another

problem is that the pointer itself is already copied into the thread control block, and the thread control block in most cases won't be reachable with a reasonable stack overflow. Either way, even if we had a huge overflow and it was reachable, it's so far off from our current location that we would overwrite basically half the system memory and kill the process anyway. So we need another avenue. [Audience question about the environment, partly inaudible] [Speaker] They have some environment variable which you can set that would make it output a different message, or log it to a certain file descriptor, or something like that. So it just looks for different options, which you control through the

environment variable. So, as I said, we need another avenue of attack. Interestingly, the application that we're attacking in this case is Bitcoin-Qt, which is a graphical application. It uses the Qt library, which means it's a threaded application, and it's using POSIX threads for that. What's interesting about threads is that they have thread-local stacks. Thread-local stacks mean that all of the information that's kept for the execution of system calls gets copied onto the thread-local stack; each thread has its own local copy, and that initialization is done somewhere while creating the thread, in this allocate_stack function in the POSIX thread

implementation. So each thread will have its own copy of the thread control block which we've seen earlier, which then gets referenced when executing system calls, meaning that we could maybe use this as an avenue for exploitation. Also working in our favor is the fact that the thread-local TCB would usually be located somewhere above our current ESP, relatively close, so we can relatively easily reach it, overflow it and take control of it. As an illustration of this, I have a small example program that has a vanilla buffer overflow in it, in this f1 function. What's a bit different from the other examples is that

it's creating a thread that's actually calling that function. So in this f1 function we have a strcpy buffer overflow which just smashes the stack with no problems; it's just a demonstration. On the other side we have a GDB session showcasing what's actually happening when we do the stack smashing. You can see that we run the program with a buffer of 292 plus 4 bytes: 292 bytes to make the overflow, plus four bytes to write what we want. When we run the program with this buffer, it crashes with a segmentation fault, and we can see it crashed with EIP being set to the value which we

set up there when we ran the program. So we've obviously taken control over the execution by overwriting the thread-local thread control block and its sysinfo pointer. Notice that no message from stack smashing protection is printed out, because the process crashed while it tried to invoke the write system call to actually create output on the screen. There's one problem with this: now that we've gained initial control over the process, our __kernel_vsyscall function pointer is smashed. It points to some of our code, so none of the system calls that we ourselves try to execute will work, so our shellcode

would break. So the first thing we need to do for a successful exploit is to actually repair this function pointer, to write back the original value so we can actually execute shellcode afterwards. Another problem is that we need to use ROP gadgets to bypass the non-executable stack in this case. To solve those problems, in my exploit I use just two relatively simple and useful gadgets. The first one is your usual stack pivot gadget, something that you would use to get the stack pointer pointing back to the area of your buffer, the area of memory that you control and where you know what's there. This first gadget helpfully pops four registers off the

stack, so we get control over four registers from it for free. The second gadget is what you would call a write-what-where gadget, which uses those previously set registers to write four bytes of data anywhere we want. Using these two gadgets, we repair the __kernel_vsyscall pointer, again enabling us to execute system calls of our own, and then just proceed to execute more code. So the actual exploit breaks down to these few steps. First, the new thread gets initialized by the application and the TCB info is copied to the local thread stack. Then the overflow happens: we overwrite everything up the stack. We smash the canary, because we don't care about it; we

smash the return address, because we don't care about it; we smash everything up the stack past the environment and overwrite the thread-local thread control block with its sysinfo pointer, pointing it to our first ROP gadget. This first gadget sets up the stack pointer and the registers and then jumps to the second gadget, which repairs the __kernel_vsyscall function pointer. Then, as ROP has already kicked in and we have restored the pointer, we just call mprotect to set memory protections to executable and jump to our shellcode. Not much to it. And what would an exploitation talk be without an actual

vulnerability demonstration? So I'm just going to start up a VM, and hopefully it will work; I never know with these things. Okay, just... [Music] See, my IP address gets reset every time, and my exploit is hardcoded with an IP address, so I have to reset it. On this screen I run the exploit, so it listens on a network interface for those broadcast packets. On another screen I'm going to run netcat, listening for incoming connect-back shells, and then obviously I'm going to start up the vulnerable application. So the application starts up, we get a request, the fake server sends out the reply and sends out the description

XML. We don't see the application anymore, but in the reverse shell we have actually gotten code execution and gotten a

shell. So that has been a quick demo. Just a small conclusion to the whole talk. What we did here: we abused SSP error reporting, which is too complicated for its own good. As I said earlier, you've detected that the process is misbehaving and you keep executing more code in it; you shouldn't do that, obviously. Additionally, in case there was some syscall executed before the function returns, we wouldn't even have to abuse the stack smashing protection to get code execution; we could just reuse the same thing. So patching SSP not to do this complicated output wouldn't necessarily fix this issue. And I just think that this whole

vulnerability and the way it's exploited showcases how the complexities of the different components in the modern process run chain have unforeseen consequences, which can always be abused in one way or the other. And that's basically it from me today. If you have any questions, go ahead. [Audience] So what made pthreads... I mean, my understanding is that in pthreads the thread control block, the TCB, right, is way too close to the rest of the stack. [Speaker] Yes. [Audience] What made them not put in a guard page? [Speaker] Not sure. That would probably be a good idea. It's just the way the thread-local stack is set up in

that allocate_stack function; it just doesn't have anything in the way of doing that. [Audience] Maybe putting in a guard page would be too much to subtract from the overall stack area? [Speaker] Could be, yeah, it would take too much space. [Audience] And so is there anything in between the top of the stack and the TCB? [Speaker] I'm not sure if there are any other structures, but as I was going along smashing stuff, trying to see what's going on, the only interesting things that I saw were previous function frames and then the local TCB. So, not sure. [Audience] So overwriting the ELF auxiliary vectors would not really help much, because by

the time you need it, it's already... [Speaker] I just used that detour to introduce how the __kernel_vsyscall pointer gets passed from the loader to the process. [Audience] So my question is: is there anything in the aux vector that does get used, that gets reused? [Speaker] We know that there is stuff that gets used there, at relocation. [Audience] That was our ELF... [Speaker] Yeah, right. But after that, I haven't looked into it, I have to admit. Thank you for the questions. [Audience] Awesome stuff. [Speaker] Yeah, the full exploit... and there's a blog post on our blog explaining all the stuff there too, so if

you want to look it up later, feel free to do so. Thank you.
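For readers who want the stack canary check from the talk in concrete form, here is a toy model of a protected stack frame. It is only a sketch: the frame is a Python `bytearray`, the buffer size, saved frame pointer and return address are hypothetical values, and the function names are made up for this example. What it keeps from the talk is the layout (buffer, canary, saved frame pointer, return address) and the epilogue check that XORs the stored cookie with the reference value.

```python
import os
import struct

BUF_SIZE = 16  # hypothetical size of the local character buffer


def make_frame(canary, saved_ebp=0xBFFFF000, ret_addr=0x08048ABC):
    """Lay out buffer | canary | saved frame pointer | return address."""
    return bytearray(b"\x00" * BUF_SIZE + canary
                     + struct.pack("<II", saved_ebp, ret_addr))


def unchecked_copy(frame, data):
    """Copy data to the start of the frame with no bounds check (the bug)."""
    frame[:len(data)] = data[:len(frame)]


def canary_intact(frame, reference):
    """Epilogue check: XOR stored cookie with the reference; zero means OK."""
    stored = struct.unpack_from("<I", frame, BUF_SIZE)[0]
    return (stored ^ struct.unpack("<I", reference)[0]) == 0


reference = os.urandom(4)  # a fresh random cookie on every process start
frame = make_frame(reference)

unchecked_copy(frame, b"A" * BUF_SIZE)        # fits: cookie untouched
print("in-bounds write, canary intact:", canary_intact(frame, reference))

unchecked_copy(frame, b"A" * (BUF_SIZE + 8))  # runs over the cookie
print("overflow, canary intact:", canary_intact(frame, reference))
```

In the real mechanism a failed check calls `__stack_chk_fail`; the point of the talk is what that failure path goes on to execute.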

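The core of the bypass described in the talk, overwriting the per-thread sysinfo pointer that syscall wrappers call through, can be modelled in a few lines. This is a toy memory model and not an exploit: the pointer and cookie values are hypothetical, the function names are made up for this example, and the TCB is reduced to the two fields the talk mentions (sysinfo at offset 0x10, stack cookie at offset 0x14). The 292-byte distance is taken from the talk's small demonstration program.

```python
import struct

SYSINFO_OFFSET = 0x10  # sysinfo pointer offset in the TCB, as given in the talk
CANARY_OFFSET = 0x14   # stack cookie offset in the TCB, as given in the talk


def make_thread_memory(distance_to_sysinfo, vsyscall=0xB7FDD414,
                       cookie=0x5A17C000):
    """Model a thread stack whose TCB sits above the vulnerable buffer.

    Index 0 is the start of the buffer. The pointer and cookie values are
    hypothetical placeholders.
    """
    tcb_base = distance_to_sysinfo - SYSINFO_OFFSET
    mem = bytearray(tcb_base + 0x20)
    struct.pack_into("<I", mem, tcb_base + SYSINFO_OFFSET, vsyscall)
    struct.pack_into("<I", mem, tcb_base + CANARY_OFFSET, cookie)
    return mem, tcb_base


def build_overflow(distance_to_sysinfo, new_sysinfo):
    """Padding up to the sysinfo field, then the 4-byte replacement pointer."""
    return b"A" * distance_to_sysinfo + struct.pack("<I", new_sysinfo)


def read_sysinfo(mem, tcb_base):
    """The value a syscall wrapper would load from gs:0x10 and call."""
    return struct.unpack_from("<I", mem, tcb_base + SYSINFO_OFFSET)[0]


mem, tcb = make_thread_memory(292)
print("before:", hex(read_sysinfo(mem, tcb)))

payload = build_overflow(292, 0x41414141)
mem[:len(payload)] = payload  # stands in for the unchecked copy
print("payload length:", len(payload))
print("after:", hex(read_sysinfo(mem, tcb)))
```

Every byte between the buffer and the TCB is overwritten on the way, including the frame's canary, so the check fails and the error path's write call is what ends up going through the replaced pointer.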
