Shellcoding basics

Name: Shellcoding basics
Uploaded: 2016-11-14
Duration: 43 min 49 s

BSides Charleston · 2016 · 43:49 · 4.6K views · Published 2016-11

Speakers

Max Harley

Tags

Category: Technical

Topic: CTF, Reverse Engineering, Vulnerability Research

Difficulty: Intermediary

Team: Red

Style: Talk

Mentioned in this talk

Tools used

objdump

Frameworks

Metasploit

About this talk

An introduction to shellcoding and binary exploitation through buffer overflows. The talk demystifies assembly language, x86 registers, and system calls, then demonstrates how to craft and inject shellcode into vulnerable binaries to gain shell access. Practical for CTF competitors and developers learning secure programming practices.

Original YouTube description

Title: "Shellcoding basics" Speaker(s): Max Harley (@max_68) Max Harley is a freshman in college who loves security. Max worked for Soteria, a Charleston-local security firm during his senior year in high school. Security is Max's passion, so he strives to become better at it.

Transcript [en]

Give a hand for our next speaker.

Thank you. Hello, my name is Max Harley, and I'm going to be talking about the art of shellcoding. Shellcoding is truly a dying art. Newer people who come to security tend to gravitate towards web exploits, because that's much more approachable than running through binaries, which is kind of scary stuff. So most people don't really get to experience shellcoding or binary exploitation. In this talk I'm going to try to prove that shellcoding isn't as hard as people make it out to be, and that it's also still extremely useful.

A little bit about myself: I really enjoy CTFs, and that's actually how I got onto this topic. That also means I don't do this for a living, so don't take anything I say as fact, but use it to formulate your own strategies for shellcoding. If you want to follow along, you can go to the presentation site [URL unclear: "max HIO /l code"]. It has really good mobile support, and this is actually the master presentation, so it'll follow me throughout the talk, which is kind of cool if you want to do that.

In this presentation we're going to learn how this string of totally random-looking characters becomes this. As security researchers we really like this, because it means we have a shell. We get a root shell, we can run whatever we want, and that's pretty awesome.

A quick question: is anyone here a C developer, or a part-time C developer? So we have one... two, more than I thought there would be. And does anyone like CTFs? Everyone, yay. These are the groups this is going to be most applicable to. For developers, it's obviously about teaching secure programming practices. For CTFs, if you look at the DEF CON qualifiers, there's always at least a little bit of shellcode required in every problem, so that's kind of a cool thing.

I keep throwing this term around, but what is shellcoding, really? Very generally, shellcoding refers to assembly instructions that allow us to run arbitrary code, typically in a C or C++ program. Servers, and a lot of the time even games, are built in C or C++, so that's really where you're going to be able to use this.

Why is shellcoding relevant? This is a big one, because a lot of the time we look at web exploits and disregard binaries. But Dirty COW came out recently, which was really topical, and the specific version I have linked here uses shellcode to generate a root shell on whatever device you run it on, which is really bad. Technically it uses a tool called msfvenom to generate the payload, but it's still shellcode and it's still cool. If you look at any of the Metasploit modules, they all use shellcode to get their shell. You also get hacker style points, because shellcoding is kind of crazy and kind of cool.

We need to learn a couple of things before we actually start shellcoding. I'm not going to spend too much time on this, but I'll give an overview. The first thing is seemingly obvious, but it gives good perspective on what we're going to be talking about. We have a couple of definitions, vulnerability and exploit. A vulnerability I'm just going to define as a

flaw in the system that leaves it open to attack, and an exploit as an attack on a computer system, especially one that takes advantage of a particular vulnerability. So an exploit takes advantage of a vulnerability to do malicious stuff. For people who do web app pentesting, cross-site scripting is your vulnerability and session hijacking is your exploit. Very similarly, in binary exploitation the buffer overflow, which we'll talk about in a second, is our vulnerability, and getting some sort of shell through shellcode is our exploit.

So, buffer overflow in ten minutes. This is the preface, what you have to know beforehand, and I want to try to keep it down to ten minutes. Buffer overflows are good for training people and good for tutorials, and they're also extremely practical: a lot of newer developers, especially ones programming in C, tend to fall into the trap of leaving a buffer overflow vulnerability open.

Here we go. The first thing we need to know is assembly. This is the lowest-level language that can reasonably be programmed in, and C is actually converted into assembly, so any C code you write gets converted to assembly. We have these things called registers, which are highlighted in red: EAX, EBX, there's an ECX, ESP. There are a ton of different names for these.

We also have a couple of simple commands. MOV moves data, so on the first line we move the number 5 into EAX, and on the second line we move 10 into EBX. On the third line, just like the plus operator in whatever language is your favorite, ADD adds EAX and EBX and stores the result in EAX. So we add EAX and EBX, 5 plus 10, we get 15, and then we print.

One specific register I want to talk about is EIP. EIP is our instruction pointer, and it's really necessary for understanding how the buffer overflow vulnerability works. EIP holds a memory location, and it tells the program where to go next. I'm going to get into a demo to help you understand this.

So I have this C program. We start off in main. We create an integer called a and set it equal to 5, and we have an integer called b, set equal to 9. We say a is equal to a plus b, we call a function, and then we print. So we print hello world, then a new line, and then 14, because a is equal to 5 plus 9, and that's it. We'll compile this and run our program, and we see we get hello world and 14.

We're going to use a tool called objdump. What objdump will do is take our compiled C program and convert it back to assembly. This is going to help us understand how C converts to assembly.

You don't need to understand all of this right now, but we can see some things that look familiar, like this [unclear], these numbers over here, 5 and 9. Actually, it may be easier with some [unclear]: you can see 5 and 9 over here getting placed into areas. We see add, so we're adding two things. We call the new function like we saw in the C program, so it calls something-something FB. If we scroll up, we can see this function at something-something FB, and all it does is print something. The rest of it is kind of unnecessary.

The first column over here on the side shows the locations, where we are. The second column is what's called opcodes, and this is what your processor actually reads. These are the values that your processor reads. The last column just converts the opcodes into something more human-readable so that we can understand it.

Next, how is data stored in assembly? We need to think about how some sort of compiler will convert our C code into assembly. We start off arriving at main, so we need to do something with main. We create an integer, then we create another integer, and then we exit. In assembly there's actually an internal stack that's created and used, and we push and pop things from the stack to tell us where we are and what kind of values are being inserted.

For this piece of code, this is how the stack would look. We first push the main return address at the bottom of the stack, then we push the variable a, then we push the variable b, and that's how it's built. If we lay the stack out horizontally, we can see that we start off with the topmost part and go on from there, over a equals 4, and then the return address for main.

Functions on the stack operate a little bit differently. We start off in main, so we first push the return address for main, and then we create the integer foo and set it equal to 2. Then we call new function. When new function is called, we push the return address onto the stack so we know where to come back to, which is kind of important, because if we didn't know where to come back to, the program would just keep executing and nothing good happens from that. Next we push things onto the stack, bar and then the variable name, and then this function exits. We return to the return pointer, and the things that were on the stack get popped off because we left the function. So that's how functions work on the stack, the internal stack that assembly uses.

Now that we have that prerequisite knowledge, how do we actually exploit this buffer overflow vulnerability? If we can figure out a way to overwrite the return address, which is where EIP is going to continue executing from, we can actually control the execution of the program.

Here we have the source for some program we made. We create a function called new func, we create a buffer, and then we use gets. gets is a function in C that reads from standard input and stores whatever it gets into a buffer. The only problem with gets is that it lets us put in however many characters we want. It doesn't respect the restriction that our buffer is only five characters long. We can put in whatever we want.

In this stack, if we were to input hello on standard input, we would have a stack that looks like this. We first push the return address for main, that's this one down here. Then we go to new func, so we store the return address for new func, and then we create a variable that has the value hello. It's five characters long, so it works out perfectly.

The only problem is that if we put in a string that is longer than five characters, we completely overwrite the value that was once there. In this example we're overwriting 0804 2B, which is the second thing on the stack, with 0804 84, because we pushed more than five characters into this buffer. That means we're actually able to control program execution just by writing more characters than what's allowed. This is really important for C developers, because they need to make sure that no one is writing more characters than the buffer size. So I'm going to do a little demo.

As I said, I really enjoy CTFs, and this was a really good example I found in one of the CTFs I did. Let me make that smaller. There. This is a really good visual representation of what I'm talking about. We have this source right here. We go to main, we take one of the arguments, and we pass it into the vuln function. We create a character array called buff that's only 16 characters long, and then we use strcpy, which just copies our input into that buffer. Just like gets, strcpy will not check bounds, so we're allowed to write however many characters we want into this function.

Here's what our actual stack looks like. We have our buffer, which is four by four, so 16 characters. We have 8 characters right here of sort of null space, and then we have EBP, which is pushed. You don't really need to know about that. Then we have EIP. So if we write enough characters [unclear], we can actually start writing values into EIP, and that's definitely a problem. But we can see this really cool function over here called give shell, which actually gives us a /bin/sh shell. That's telling me how to do it.

So let's look at what we have. We check out the makefile, and it's saying it's going to compile that overflow2.c file that we just looked at and name it overflow2. We're going to use the objdump tool that we just talked about to find the location of that give shell function so that we can hopefully call it.

We can see we get a memory address over here, 0804-something-ad. I'm quickly going to show you that this works really well.

We can see in this saved EIP, this green space up here, I have just written the 080484ad address. If we run it, it prints this out, adds our location, and then it lets me get a flag. Yeah, CTFs are fun. That's pretty cool. So that's a really basic buffer overflow vulnerability. It takes advantage of overflowing some sort of buffer to let us execute wherever we want.

The last thing I'm going to talk about is Linux system calls. These are really important for learning how to write your shellcode, because a lot of the time we'll need to call functions in the kernel to let us do cool stuff. In assembly, int 0x80 just means: let's call something in the kernel. You set up your registers, you call int 0x80, and then it executes whatever system call you told it to.

Specifically, we're going to be talking about system call 1. Syscall 1 is the exit syscall. Any time a program exits, you actually have to call something in the kernel to tell it to exit the program cleanly. I'm actually going to show an example of this.

[Music] So this is just a way that we can program in assembly. We move 1 into EAX, so we're using system call 1. We move 0 into EBX, which just means we're going to be returning 0. In C this is a way to say the program exited cleanly. Then we use this call, int 0x80, so it exits. We created a binary from that called exit, and if we run exit, it exits. How cool is that? We were able to exit a program. I don't know if you can see that, but it says 0 over here, and that just means it returned 0, so it did exactly what we told it to.

We'll get a little bit more complicated here and use system call 4. System call 4 is write, which tells your kernel to write to wherever you want it to write. We start off up here by creating a string called msg. We say we want it to say hello world, followed by 10, which is just the newline character. Down in the code, we move 4 into EAX to denote that we're using system call 4. We move 1 into EBX, which just means we're going to write to standard output. You can specify other file descriptors if you want to write somewhere else. In ECX we put the pointer to the message, which we declared above. In EDX we put how many characters we're going to write, and we're going to write 12 characters. Then we call int 0x80, so that's going to execute. Next we exit the program, like our previous example showed. I'll make this and run it, and we see we get hello world. If we echo the return code, we get zero. So again it did exactly what we wanted it to, and everything went perfectly. Great, good.

So now for the actual shellcoding. How are we going to insert our code? This is a good question. Calling arbitrary functions is pretty cool, but what we really want is to get a shell. So we're going to go back to our previous example, where we have this long string of A's. Instead of A's, what we can do is insert the opcodes that we saw when we did the disassembly. If we insert opcodes here, and then we overflow into EIP and set EIP to the beginning of our buffer, our program won't know the difference between the code it's trying to run and our buffer, so it'll actually start executing what we put in our buffer. There are actually tools to mitigate this, where the program will know the difference between the stack and our shellcode, but that...

So if I can get to... here we go.

So if, instead of those A's, we were to insert that middle column, the B8 01 00 00 00, and then point EIP to the beginning of that, it'll actually execute exactly that.

Making shellcode: typically, pretty much always, we'll write our shellcode in assembly. We'll use a tool called NASM to assemble it, and then use objdump, the tool we talked about, to disassemble the binary that we just wrote and figure out the opcodes that we want. The reason we don't use something like GCC or any other compiler is that GCC does really spooky stuff. The code is so optimized that typically you won't really know what you're feeding into the buffer. GCC is optimized for speed and not size, so we typically just don't want to use it.

This leads us into making good shellcode, which is very different from hacky shellcode. Good shellcode is what shows the difference between hackers and security researchers. Hackers will try to get the shellcode as small as they can, as a sort of game. If you've ever seen people play code golf, it's the same thing: smaller is better. The reason is that if we have smaller shellcode, we can fit the same code into a smaller buffer. So if some programmer gives us a buffer size of, like, thirty [unclear], we need to make more efficient shellcode to use that buffer.

The next thing: shellcode cannot contain a zero byte, two zeros next to each other in the opcodes. The reason is that [Music] strings in C are terminated by this character, so if it shows up in our string, it'll just cut the string off where it is. It's a rule that you have to follow because of C. So I'm actually going to do an example.

This is going to look really familiar to you, because we just did this. This is our old exit assembly. We move 1 into EAX, we move 0 into EBX, and we call int 0x80. When we do objdump, we can see how many characters this is: we have 1, 2, 3, 4, 5... and we have 12. This is 12 characters long.

I also wrote a new one that does a little bit of magic. We start off by XORing EAX with itself. If you've ever done binary operations, XOR puts a 0 wherever the bits are the same, so if you XOR something with itself, it's just going to set it to 0. Then we increment EAX by 1, and I'll show you why we do this. Then for EBX, we XOR EBX with EBX so that EBX is set to 0, and then we call int 0x80. This is extremely helpful, because look at how much smaller our shellcode is now. We went from 12 characters to 1, 2, 3, 4, 5, 6, 7, just because we didn't have these crazy amounts of zeros here. So we can see that the easiest way is not always going to be the most efficient way to do this.

So now we want to insert our shellcode. We insert the shellcode into the buffer, we point EIP to the beginning of our buffer, magic happens, and then we get a shell. That's pretty sweet. I did have a demo of this, but I left my lab stuff on a USB drive four hours away, so that sucks. But here's a really good representation of it. We have this very familiar code now. We call get input, we create a character array of size 80, and then we call gets on that buffer. So we have this 80-character-long place that we can throw stuff into.

If we assume that 0xB776 is the beginning of our buffer, we can insert the shellcode that we previously wrote, the new exit function, at the beginning of that buffer. Then we write a ton of A's, or whatever filler we have, and then we overwrite EIP to set it to the beginning of that buffer, in this case B7676. What will happen, and obviously this would be cooler if I could show you, is that it would go to the top, it would do gets, we overwrite EIP, and then it just exits cleanly and nothing else happens. Pretty cool.

So here are a few tricks. There are a lot of tricks to shellcoding. Again, it's kind of an art: you're trying to make it as small and as cool as you can get it. There are a couple of tricks for shellcoding, and for exploiting buffer overflows in general, that are extremely helpful.

Our first one is the NOP sled. NOP is a really cool assembly instruction. It stands for no-op, it's 0x90, so \x90, and all it does is nothing, which is the beautiful thing about it. All it will do is increment EIP by one and do literally nothing. So why is this useful? The problem is that when we overwrite EIP, we need to set it exactly equal to the beginning of our shellcode. We need to set it to 0xB776, and if we don't set it to that exact address, it'll do whatever it wants. It'll just flail around with no idea what it's doing. But if we put NOPs before the shellcode and have our shellcode at the very end, this gives us a much larger region to aim for.

So here, instead of all these A's, we start off with a NOP sled. We can set EIP to anywhere between B776 and the end of our blank space before EIP, and it will NOP, NOP, NOP, NOP, find our shellcode, and execute it. That's extremely useful, because it's really hard to figure out where the beginning of our buffer is, so this gives us a lot of room.

Another thing, not really a trick but another tool we have available to us, is msfvenom. msfvenom is in the same tool package as Metasploit, and it'll generate shellcode for us. It's not really a trick in assembly, but it'll help you do your research, or whatever you're doing it for, because you don't have to go through the process of writing an execve of /bin/sh to generate your shell. While it's a good thing to know how to do that, we do have tools that will do it for us, and it'll even do things like the NOP sled for you. You just tell it the shellcode that you want to generate, how big your buffer is, and whether you want a NOP sled or not, and it'll generate that whole string for you, so you don't really have to do anything. It's definitely good to have.

The last thing is recommended reading. You can get very deep into this, and there's a lot to it, so reading and practicing are the easiest ways to get better at it. The Shellcoder's Handbook is a super great book that steps you through the buffer overflow, shows you how to exploit it to get a shell, and shows you how to setuid to 0 to get a root shell, which is pretty awesome. It also talks about vulnerabilities other than buffer overflows. Really just cool stuff.

Next, an ASM tutorial. If you want to generate your own shellcode, learning assembly is how you're going to do that. If you can write it really efficiently, you'll get better shellcode. That's just how it's going to work.

And then learning about binary exploitation. Honestly, this link isn't even about binary exploitation, it's more about CTFs [unclear]: ctftime.org. CTFtime has lists of all the CTFs that are coming up. Honestly, the best way to practice is to find out whatever CTF is coming up next and just do it, because doing it is how you learn. You can read a lot, but doing it is how you get better. I would definitely recommend checking out CTFtime and participating in as many as you can, because you will benefit greatly from it.

So thank you for listening. If you need to contact me, here are some avenues to do that, and that's it.

[Music] Does anyone have any questions? Chris?

Yes. I have never needed to obfuscate shellcode. Okay, yeah, so shellcode obfuscation would be useful, I guess, for evading detection. There are definitely programs that will check whether malicious code is being run, so if you can obfuscate it, you'll bypass a lot of that.

That's actually a very good point. If you just want to write shellcode, writing it as small as possible is going to be best. If you want to write shellcode that will bypass a firewall or whatever antivirus, you're going to want to obfuscate it. Pros and cons of different things.

Hey, what's up? Yeah, exactly. You need to find some way to compile your ARM code, but if you were trying to exploit an ARM system, it's the exact same way. You write your ARM shellcode, do your objdump or whatever your tool of preference is, and then throw that shellcode at it. Exactly the same way, you just have to use a different compiler.

Especially with ARM, yeah.

Yeah, [unclear]. There's definitely some stuff that's dated. If you're going to do the labs in it, you need to download an old version of Red Hat or something, because things like stack canaries and ASLR are newer ways of stopping shellcoding attempts. So it dates it a bit, but The Shellcoder's Handbook does explain a lot of these technologies that were newer at the time, like bypassing ASLR and bypassing stack canaries, so it still works. But use all of your resources: looking at that book is going to be good, but doing CTFs is going to show you how we do it in the modern day [unclear].

Yeah, so in the lab I had set up, I had actually disabled ASLR in the kernel, because you have to use some more advanced techniques to bypass ASLR. But yeah, you can just disable it if you don't want to have to deal with it.

Hey. Actually, I really prefer radare2. It's a really awesome tool, and it's kind of meant for this kind of stuff, like binary exploitation, so I would definitely recommend checking it out. But objdump works. You can do some awk stuff to get it to output the shellcode you want. radare2 is definitely a good tool to check out for debugging.

I mean, there's a process to how you bypass each one of them. After you learn the process... it's just a process. [unclear] So that's a lizard. It's a gangster lizard who's actually in space, yeah.

Thank you very much. [Applause]
