← All talks

Matt Davis - Utilizing High Entropy Stack Canaries for Locating Function Return Addresses

BSides PDX · 201820:2983 viewsPublished 2018-03Watch on YouTube ↗
Speakers
Tags
CategoryTechnical
DifficultyAdvanced
StyleTalk
About this talk
enferex (enferex) Canaries, threads, and split-stacks! This presentation will describe a technique that I stumbled into for identifying a return address within an opaque memory space. Stack canaries of high entropy can be used to locate stack information, thus rendering a security mechanism as a tool for compromising a memory space. Of course, once a return address is located, it can be overwritten to allow for the execution of malicious code. This return address identification technique can be used to compromise the stack environment in a multi-threaded Linux environment. While the operating system and compiler are mere specificities, the logic discussed here can be considered for other execution environments. Bit flipper, coffee guzzler, horrible fixie rider.
Show transcript [en]

all right good afternoon everybody house hello zombicide so far a lot of really interesting talks all right so my talk today is a little bit just about some research that I was doing recently and I think it was some pretty interesting stuff so I want to tell you guys what I discovered and through doing this research I also found a potential weakness in a security mechanism so to get started with this presentation is just a narrative of the research that I was doing recently I was looking into split stacks and threading during program execution and some of the security mechanisms behind these the main driver of this talk is something called a split stack a split stack is an

alternative to a traditional program execution stack and I'll go into all the details in just a minute but I really want to understand how split stacks work so that formed the basis of this talk and all right well a little bit about myself you know it doesn't matter all right okay so as I mentioned once you understand how a stack works during execution time you can then start plugging it with various types of attacks and in particular I'll just go ahead and give you the TLDR at the for the whole talk here is that a security mechanism known as they stack canary which if you're not familiar with I'll go into how they are implemented in

a minute they actually provide a fingerprint into a opaque memory space so when you launch a program it belongs to this giant blob of memory well I can now find you cannot find return addresses if you have stack Canaries enabled because stack Canaries have a certain fingerprint that sticks out of the memory space so if you're an attacker and you're scanning the system's memory well if they're running with the security mechanism enabled you can you can find where the return addresses are and you know that if you can compromise a return address and you can compromise a program or I best make the program crash alright so the background material I'm sure many you guys are familiar with

stacks stacks that's the stack of stacks okay so I'm just gonna kind of cover everything which will help paint the story about the attack okay so stacks if you're not aware are a data structure that contained things it just contains information in this data structure they're used all throughout computer science but the idea here is that they're to operations you can perform to a stack you can push items or push data onto the top of the stack or you can remove items from the top of the stack there's only two operations a push or pop and they only occur to the top of the stack you cannot manipulate the middle of the stack or the bottom of the

stack as I said this is used all through our computer science but what's important here our execution stacks so an execution stack is what a program uses during runtime so if you think about what a program does it starts off in the main function then you call subsequent functions so main calls foo foo calls bar and the whole series of functions are getting executed well you have to store information about that function the local variables in the function the return address of the function and the parameters or the arguments passed to the function they all belong in this temporary storage area of the process known as the execution stack every frame has is an item that is pushed onto the top of the

stack when you call a function and when you return from the function you pop that stack frame off or that activation record or whatever you want to call it it looks something like this I have stole this from the Wikipedia so that's completely legit this is two functions here these are two separate stack frames the blue stack frame and the green stack frame because you can only do things to the top of the stack the currently executing function is that item at the top of the stack as you as I said earlier it's got the local locals or the automatics it's got the return address and it's got the parameters that were passed to it so this is kind of what it looks

like in memory during runtime and I've done just a little bit disassembly here just to give you an idea of what the code actually looks like when your compiler generates stack frames so I'm going to use this I play rock band I know how to use a mic all right okay so in this very simple example all I'm trying to do is illustrate what a stack frame looks like during Excellus em bleed for it looks like because this is very useful because this is this helps identify what we need to do to actually attack a running process so I've generated a three function program foo foo calls bar bar calls Baz now you've seen in the function bar I've defined

two local variables okay so it just looks like that little fancy diagram there here is the assembly dump of that as you can see foo and Baz look pretty similar this is the portion of the code that is setting up the stack here you're pushing the stack base pointer you're pushing a value on to the stack there if we look at the function bar you notice there is a subtraction here because the stack actually grows downward on an x86 machine my this this is focused on x86 program execution and that subtraction is enough memory to actually hold the local values of that we're defined in the in the user's code okay so that's just kind of what we're what we're

looking at here for a regular function without any security mechanisms in place so all of this discussion so far has led up to just singly threaded programs now remember a per my process has one stack space that works fine for single threaded programs but what about multi-threaded programs because you have now multiple threads executing functions independently of each other and a non-deterministic way a compiler cannot decide at compile time what functions only be executed in what order due to this multi-threaded problem here so the solution is that every thread gets his own stack space but this presents a problem because as I said stacks contain information they contain the the process stack contains information about the callers and the

kaali's of that thread a compiler doesn't know how big to make the thread when it's building out your program when it's making that assembly code that I just showed you so there's a very simple solution to this and I think it's probably not the best solution but this is what POSIX does they hard code the size of threads so when you start a thread if you do pthread create in your program or if you're using some kind of threading library these threads are created with a hard-coded size that means two things either you're going to be wasting that memory if you don't actually need it all or you might actually blow out the stack of your thread and terminate the process

prematurely and on the system that I was investigating it was hard-coded to eight megabytes now think about that for a minute if you're spawning threads chances are you might be doing a lot of these all over the place think about web servers something like that eight megabytes is an awful lot of memory for every thread think about the overhead this is also not a very scalable solution okay so this is just your traditional POSIX thread now let's think about other programming languages let's think about the go programming language if you're not familiar with go they really boast about being this highly concurrent you know we can spawn tons of threads well it turns out they

do a different trick and this is what led up to the investigation of a split stack instead of using the traditional stacks that I just showed you go uses a concept call it a split stack and what that does is it solves this hard problem in the function prologue oh sorry in the function prologue so the very beginning of the function of a go program so where all the stock initialization goes down for that function go makes a check go says okay I might be running on a thread how big is the stack area of my thread if it's too big then what happens I'm sorry if it's not it's not large enough to hold that

next function that's getting ready to be executed goes runtime will actually allocate dynamically a brand new stack just to run that function so this solves the a megabyte problem or the hard-coded thread size problem by allocating only when it needs to this also means go can use really tiny threads so they can do this in a much more efficient manner so I've just kind of outlined what happens during a split stack and I'll I'll share the slides later I don't want to blast you with fun words but it turns out that when you actually create a thread they are actually dynamically allocated as well so what this means is that thread memory whether it be a split stack from like a

go thread or your traditional POSIX thread they all belong to dynamic memory this is very interesting because on a linux machine which is where I dump this I can now identify what parts of my process memory space are threads okay so I've trimmed out some of the output here but I've highlighted there at the bottom part where the thread belongs you can see that it is not assigned to any any in the object file there on the right column it's just an empty slot and there's read and write permissions on it before that wasn't there but then when I sort of allocating threads you'll notice that these additional pages keep just growing so my process gets larger and larger so I can

look at any memory space and find potential pages of memory where a thread lives now remember threads have stacks stacks have returned addresses so now this is narrowing my search to try to attack a return address of a running process so we're gonna have a little bit of fun all right it's my boy Alex there and so what's that the bottom question marks there it looks very similar to what I just showed there that's probably a thread maybe not but we can make a rough you know make a make it a sumption here I'm not gonna ask you guys what the answers are here probably a thread or stack here your first thread there but

what about the other one the other page there the one right above it that's kind of interesting notice the permissions on it there are none there's no read write or execute it's just - - - well that's what's called a guard page basically if you do in protect and you protect a page with the protect none flag it sets absolutely no permissions and if anything reads or writes to that page of memory your program terminates because it throws an exception the CPU says there is a segmentation fault that's a security mechanism so that if you're overwriting pages or your scanning memory you're trying to touch that your bum the program would just immediately turn the

name okay so as I've let hopefully convinced you all now that stacks are obviously critical pieces of information they control the flow of a program because return addresses live on the stack so of course it makes a lot of sense to protect the stack right well there's a concept called e-stat canary this is you know ancient technologies but the idea is you place a known word you being the cap islur when it creates the program it places a known word at the top of the stack right as the function starts and then when the function completes it checks the stack to see if that same value is there if the value has changed at all

then the stack is assumed to be compromised and the program is immediately terminated so here's a little more information like I said I'll share the slides later but here's what a canary looks like the Assembly of a canary it looks like so here's a function foo I've compiled it using GCC stack protection mechanism and you can see right here and this is going to be fun all right you can see right here there's a little bit of additional information if I were to compare this slide to the ones I showed previously you'll notice that there's a few extra instructions I'm reaching into the TLS or the thread memory for this this process and I'm putting some value into

the RA X and I'm moving Ari X 2 8 bytes offset of the the stack which in this area right here is the actual canary value so the canary changes every time you start the program but it should stay the same during probe during the program the reason is if I were an attacker and I knew what the canary value was a priori then I can write my malware to actually just use that canary the solution is to use a random word instead that's what Canoe does so when new or if you're running Linux a random word will be generated in the kernel past all the way up to user land your program is launched

and that random value is the canary that is stored here so if I were writing malware I have no idea what to actually use as a canary value if I were trying to compromise the stack just one one lovely little little way that kanou tries to keep things safe and then right here is the the epilogue of the function and it checks the the stack canary set there you can see it's putting that value into RDX and checking it if the value changes then the program terminates

all right okay so that stock Canaries and that code there at the bottom was ripped straight from the kernel Linux kernel and that is what it's using to generate the the Canaries you can see it's just using the the random device on your CPU or your on your system to actually generate some entropy and then store it and pass it up to user land this all is actually used as part of the auxilary part of an elf file or of your executable so now the question comes can a canary of maximum entropy used to be feds used to find and find addresses in an opaque memory space well yes of course again that's the band yes you

didn't get the joke nobody left Wow at least I'm not a comedian because that would suck all right so this is this is called Shannon the man who coined the term Walden he didn't coin the term entropy but he tried to measure entropy and his definition is it's just the average amount of information given an event so what we're trying to do is look into memory space look for the highest entropy word because I'm pretty sure that's going to be the canary and then I know that the canary value lives a certain way for a certain distance from the return address so all this means is I scan to memory or you can scan a

memory memory dump look for the highest entropy word and then knows that 16 bytes from that it's going to be where the return address lives if it's not well you you might break a program or two who cares there's false positives it's not it's not a perfect science here but it does work I thought he looked cooler with the glut know he looks cooler without glasses like Sears these this guy's awesome so I just kind of defined my concepts of maximum entropy is based off of Shannon's enterpise not entirely it's not a total it's not not total but it is it is very close and what I do is I look at a word size and address

sighs and I look for any address or any value where every byte is a different value and that's what I call maximum entropy I'm not looking at the business I'm looking at every byte just saying if there's a in the word if every bite is different I call it maximum entropy and that seems to work so how frequently does this occur on a system if it happens all the time they don't have a lot of false positives so I scan my entire system it took a long time but I was drinking coffee and I don't really care and it actually turns out it's actually pretty rare look at that I mean I'm not gonna read the slides to you but

it's this doesn't mean a lot let's draw pictures so here's a memory I decided to write a heat map instead of the highest entropy words now if you're color blind I deeply apologize this is gonna not make any sense but the read items there I think there's four diagonal in the middle five six I see seven red items out of this entire memory space I removed all the zero bytes because this would have been enormous otherwise but this is looking at just one single process so as you can see the highest entropy words are pretty rare so this leads to the attack where you first you scan the memory area of a process you look for the highest entropy word and

then you look for a value 16 bytes from it because that's probably in a return address if that value is in fact a value that lives within the memory space if it looks like an address and it smells like an address it's probably an address so there's a little bit of dock logic there or platypus logic and it actually turns to work so that's that's basically the hack there is quite simple using the security mechanism provided by a stack canary we can now identify return addresses so I've got a lot of information here I'll post the code I have written which I haven't posted yet because it's gonna be really close I doing like a live like I'm releasing now

but I'll just do it later no big deal and you can look at it you can make comments you've got my email addresses at the top so feel free to hit me up with any questions

you

[ feedback ]