
Automating disk and memory evidence collection in AWS

BSides SATX · 2020 · 41:13 · Published 2020-08
Speakers
Tags
Category: Technical
Team: Purple
Style: Talk
Mentioned in this talk
About this talk
Title: Automating disk and memory evidence collection in AWS
Presenters: Ryan Tick & Vaishnav Murthy
Track: In The Weeds
Time: 0900, BSides San Antonio 2020, July 11th, San Antonio, Texas
Abstract: During an incident, answers are needed quickly. Often this starts with evidence collection and log correlation. At Goldman Sachs, we have automated an event-driven cloud response platform that uses AWS native services to successfully collect disk and memory from compromised EC2 instances.
Speaker Bio: Ryan Tick and Vaishnav Murthy are cloud security architects for Goldman Sachs, responsible for automating the detection, analysis, and reporting of security incidents in Goldman's public cloud environment. They work with the firmwide Security Incident Response Team to design and conduct purple team exercises and respond to tier 3 security incidents in the cloud. Prior to working at Goldman, they were digital forensics and incident response (DFIR) consultants who led high-profile cybercrime investigations for Fortune 100 clients across the globe. They both hold various AWS and GIAC certifications and are GIAC advisory board members.
Transcript [en]

tells all the commands run, their status, whether they succeeded or failed, and the timestamps when they were attempted. Again, this will be a pretty technical talk, but before we deep dive we wanted to provide some main takeaways to keep in mind as we go through the technical details. The first point we have here emphasizes that there's a lot more going on behind the scenes than you may originally think. As Vaishnav stated before, we're dealing with three different AWS organizations that total over 3,000 AWS accounts. Not only that, but in each account we have various types of EC2 instances running. These instances all have varying

forms of encryption, and may or may not have some level of logging or network connectivity. So when we were designing our solution, it really had to consider cross-account access, EBS and snapshot encryption, sharing KMS keys for example, and varying levels of access logging. That's point one. Point two is pretty straightforward: if you automate a task correctly, your mean time to respond, your cost, and your error rates should all go down. This one again is pretty self-explanatory, but even though we invested a lot of time in building this solution and getting it off the ground, we believe that our return on investment is high and that we're saving the firm money and time in the long run.

Next, we wanted something that could scale easily. Given our footprint in AWS, we're dealing with a large number of accounts, and that's why we chose to use a lot of native AWS services. All of our resources are spun up and used only when needed, and when we don't need them they're not running, which really helps reduce cost and increase efficiency. Lastly, we wanted collection to be done independently and in a parallel manner, so a collector VM instance is spun up per volume to be collected, meaning the volumes we collect are processed in parallel. The

second thing we want to talk about is full disk versus triage collection. We focused on a full dd image for this process, mainly due to the regulatory requirements in our industry; however, triage collection could be appropriate for your use case. We're basically doing the best of both worlds: we do a full dd capture, but we also stream relevant log files via the CloudWatch Logs agent from production EC2 instances, which basically fulfills the use case of a triage collection, so we have access to the logs before we collect the full disk. And then thirdly, streaming evidence directly to S3: we stream the dc3dd output directly

to S3 without any intermediary storage. This reduces the resource consumption of our collector VM instance. Moving on to point number four: our entire process is auditable with CloudTrail and done with VPC endpoints, to ensure that everything is transmitted over the AWS backbone network. Having an internal audit trail of our actions is always very important, so you can tell someone specifically what you did and when you did it; our legal and risk teams loved us for that. The next point here is basically how fast
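The streaming approach described above can be sketched in a few lines. This is a hypothetical illustration, not the speakers' implementation: the helper and its names are invented for the example. In practice, `stream` would be the stdout pipe of a `dc3dd` process reading the attached volume, and `upload_part` would wrap the S3 multipart-upload API, so the image never touches the collector's disk.

```python
import io

# S3 multipart parts must be at least 5 MiB (except the last);
# 8 MiB is a reasonable, hypothetical part size.
CHUNK_SIZE = 8 * 1024 * 1024

def stream_to_s3_parts(stream, upload_part, chunk_size=CHUNK_SIZE):
    """Read a raw image stream chunk by chunk and hand each chunk to
    upload_part(part_number, data), never buffering the whole image.

    Returns the number of parts uploaded."""
    part_number = 1
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        upload_part(part_number, chunk)
        part_number += 1
    return part_number - 1
```

Streaming part by part keeps the collector instance's memory and disk footprint flat regardless of volume size, which is what makes spinning up a small, ephemeral collector VM per volume practical.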

everybody else for being patient with us. All right, I will let you take it away. All right, hello everyone. Today we'll talk about automating binary de-obfuscation processes using dynamic taint analysis and symbolic code execution; we'll be focusing solely on code virtualization and how we can de-obfuscate that. My name is Berk, I work as a threat intel analyst; I'm a theater graduate and I have over five years' experience in pen testing and red teaming. My friend Osama here is an R&D engineer at Trapmine.


Okay, hopefully you can still see my screen. This is wrong... yeah, all right. So we will focus on code virtualization.

Everyone, thanks for waiting. We still have some significant AV issues from the presenter and we are still working through them as best we can. Sorry about this.

then everything is fine, so you don't have to deal with words and bytes and so on. Great. Do I go to the... all right, great. So

with 32-bit stack machines, we said that we can use fewer opcodes because of this, and you have minimal processor state to keep track of. An example: I think the JVM used to be a 32-bit stack machine, and then they switched to a stack-and-register machine. You also don't deal with registers and you don't deal with flags. So for our code virtualization we made this 32-bit stack machine, and it supports 11 instructions. Fairly simple: we have arithmetic operations, comparison, store and load, conditionals, and read and write instructions. In addition we have a rot instruction; this rot instruction allows us to

rotate things on the stack, which just makes it easier for us. You can actually use any... sorry, two instructions are enough to do whatever you want to do, but we were more comfortable with 11. To illustrate how this would work (it's basically like assembly), I'm just going to go over this one example that we worked on: getting the absolute value of an integer. Our input is -5; we get it using read, and then we duplicate it, using dup to copy it on top of the stack. Then we push zero. Now we will use this zero to check if -5 is greater

than or less than zero. We use our gt instruction, and it returns one if -5 is less than zero. So now we push nine on top of the stack; the reason we push nine is that we would actually jump to instruction nine, so this is where we're going to jump to after a conditional jump. Here we have our conditional jump. Since the comparison returned one, that means our input is less than zero, so it's a negative value. Now we push -1 on top of the stack, and then we multiply -1 with -5, thus getting the absolute value of our given input. And then of course we

would have to write that back out, onto standard output in this case. Our interpreter structure is fairly straightforward: we fetch, we decode (bytecode comes in and we understand where it's going to go), and then we handle it. A handler is basically a routine that handles your virtual instruction, and for every instruction you need a different handler. Then of course we terminate: we kill the process after all of the handlers, or routines, have been run. This is the diagram; as you can see, I'm speeding up a little bit because we've lost quite an amount of time.
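The fetch-decode-handle loop just described can be sketched concretely. This is a minimal Python illustration under stated assumptions: the instruction names, operand encoding, and jump convention below are invented for the example and are not the speakers' exact VM; the program implements the abs(-5) walkthrough from the talk.

```python
# Toy stack-machine interpreter: fetch, decode, handle, terminate.
# Instructions are (opcode, optional-operand) tuples; this encoding is
# illustrative only, not the VM presented in the talk.
def run(program, value):
    stack, pc, out = [], 0, []
    while pc < len(program):               # fetch
        op, *arg = program[pc]             # decode
        pc += 1
        if op == "READ":                   # one handler per instruction
            stack.append(value)
        elif op == "PUSH":
            stack.append(arg[0])
        elif op == "DUP":
            stack.append(stack[-1])
        elif op == "LT":                   # pops b, a; pushes 1 if a < b
            b, a = stack.pop(), stack.pop()
            stack.append(1 if a < b else 0)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "JZ":                   # jump to arg if popped value is 0
            if stack.pop() == 0:
                pc = arg[0]
        elif op == "WRITE":
            out.append(stack.pop())
    return out                             # terminate when the program ends

# abs(x): negate the input only when it is below zero.
ABS = [
    ("READ",),      # 0: push the input
    ("DUP",),       # 1: keep a copy for the comparison
    ("PUSH", 0),    # 2
    ("LT",),        # 3: 1 if input < 0, else 0
    ("JZ", 7),      # 4: a non-negative input skips the negation
    ("PUSH", -1),   # 5
    ("MUL",),       # 6: input * -1
    ("WRITE",),     # 7: emit the result
]
```

Running `run(ABS, -5)` yields `[5]`; a non-negative input such as 7 takes the jump at instruction 4 and is written out unchanged, mirroring the push-9-then-conditional-jump step in the talk's walkthrough.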

Here you can see our class named Interpreter: we have a pointer to the beginning of our stack, a pointer to the program we're going to run, the size of our program, and the program counter to see which instruction we're going to run next. There are also commercial virtualization providers; these do a much more complicated job than our implementation does, and they actually also work as a crypter and a packer. The Tigress C diversifier/obfuscator is a good one; as far as we've seen, it's generally used for academic research. VMProtect, on the other hand, you can actually see used by attackers, even APT

groups, and then there are of course many other alternatives. But please mind that these solutions are also used for digital rights management, and for basically any non-malicious purpose where you don't want people reverse engineering your code. Now back to the de-obfuscation techniques we mentioned earlier. The first one is dynamic taint analysis. We use taint analysis to track the flow of information in a program, specifically to track information between two certain points in a program. We do this by labeling certain memory locations as tainted, and then we propagate that state throughout different memory locations. So

if any value gets generated or derived from a tainted memory location, we also taint the memory location where that new value is being held. We do that for everything that derives from a taint source, and then we call the places that use the tainted values sinks; that is, the portions of the code that use that memory space are the sinks. The way we track information is through taint policies, and they're very important for getting actual results that you can use. Just to go over the terminology: if any value derives from a taint

source, that value is tainted, and any other value that derives from a tainted source or a tainted memory region is also tainted. When we talk about policy, we have three properties: taint introduction, taint propagation, and taint checking. Introduction is where we're going to introduce the taint; propagation is how it's going to spread out in memory; and taint checking is the action we're going to take once a certain condition is met. For taint introduction, generally we can introduce taint on user input, on library or syscall return values, or on values you read from a file, things like that. Here is a good example of taint propagation; it's

very simple. We have variable a, which is user input. We decide to taint user input in this example because we think that's good for our purpose; you choose what you're going to taint. So we take variable a, we put it in a memory region, and we taint that region manually. Then in routine 2, when variable b is generated using a, we also taint the memory region where variable b is being stored, and this goes on. Now, you can use this in exploitation prevention. Let's talk about a stack overwrite exploit example: somebody has just managed to push

shellcode onto the stack, and this malicious shellcode will run the moment control reaches it. So how can we stop this? If we taint all user input, and execution at some point in time ever goes to a place that is tainted, that means we are executing user-supplied code, a user-supplied region, so we can actually stop this with dynamic taint analysis engines. With ROP exploits you can do something very similar: you won't have shellcode, but you will have a function pointer that's overwritten, or a return address that's overwritten. If any of these are overwritten with tainted values, which are coming from

user input in this scenario, you can actually stop execution. Here's an example: we have variable a again, and in routine 2 we are copying variable a onto b using strcpy. But as you can see, a is twice as big as b, so we'll be overwriting the return address. When the execution flow comes to the return address, we will see that it's tainted, and the dynamic taint analysis engine, i.e. the dynamic binary instrumentation framework, will actually stop execution there. But if it were that easy, everybody would stop exploitation; the problem is that there is an expensive runtime overhead.
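The introduction, propagation, and checking steps described above can be sketched with a toy byte-level model. This is a hypothetical illustration, not a real DTA engine: `Memory`, `taint_strcpy`, and `check_return` are invented names, and propagation here covers only direct data flow.

```python
# Toy dynamic-taint-analysis model: every memory cell carries a taint bit.
class Memory:
    def __init__(self, size):
        self.val = [0] * size
        self.taint = [False] * size

    def write(self, addr, byte, tainted):
        # propagation rule: a cell inherits the taintedness of the
        # value written into it
        self.val[addr] = byte
        self.taint[addr] = tainted

def taint_strcpy(mem, dst, user_bytes):
    # copies user input (a taint source) with no bounds checking,
    # like the strcpy example: every written cell becomes tainted
    for i, b in enumerate(user_bytes):
        mem.write(dst + i, b, tainted=True)

def check_return(mem, ret_slot):
    # taint checking: refuse to follow a user-controlled return address
    if mem.taint[ret_slot]:
        raise RuntimeError("tainted return address: possible overflow exploit")
    return mem.val[ret_slot]

mem = Memory(16)
mem.write(8, 0x42, tainted=False)          # saved return address, clean
taint_strcpy(mem, 4, [7, 7, 7, 7, 7, 7])   # 6 bytes from addr 4 overrun slot 8
```

With the overrun above, `check_return(mem, 8)` raises instead of returning: that is the stop-execution decision a DBI-based taint engine makes, at the cost of instrumenting every write.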

As I said, we use dynamic binary instrumentation frameworks, and the way this works is that portions of the code are executed in the framework before they're actually executed on the computer itself, so this brings a very expensive runtime overhead. Another big challenge is how to define the taint propagation, because you'll be dealing with data dependency, control-flow dependency, and implicit flows. A quick example: you can use implicit flows as an anti-taint-analysis method. You have variable a, which is your tainted value, but the code basically generates variable b,

updating it using the tainted value a, but only indirectly: as you can see here, b gets incremented up to twice the size of a. Here you can choose to taint b as well, but then you will also be tainting other things that you wouldn't like to taint; that would be over-tainting, and you won't get good results. The same goes for under-tainting, and in this example there is under-tainting. This is a challenge you always have to deal with. All right, I will now give my screen to Osama, if that's possible, and he can continue with symbolic code execution.
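The implicit-flow example Berk described can be made concrete with a tiny function. This is a hypothetical sketch: under a data-flow-only propagation rule like the one modeled earlier, `b` never reads `a`'s memory, yet its final value reveals `a` exactly.

```python
def implicit_copy(a):
    # b is never assigned from a directly; it depends on a only through
    # the loop condition (a control-flow, i.e. implicit, dependency)
    b = 0
    for _ in range(a):
        b += 2          # data-flow taint rules see only the constant 2
    return b            # ...yet b ends up at twice the value of a
```

A data-flow-only tracker leaves `b` untainted (under-tainting); tainting everything control-dependent on `a` would instead taint unrelated state (over-tainting). Choosing between the two is exactly the policy problem described above.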

Ah, okay, I'm so sorry, I forgot to unmute myself. All right, I'll just repeat the few sentences that I spoke. So Berk covered the design and implementation of the 32-bit virtual machine, the stack machine that we will be using to demonstrate the application of our de-obfuscation procedure. Our de-obfuscation procedure comprises two techniques: dynamic taint analysis and symbolic code execution. Berk covered dynamic taint analysis, and now we're moving on to symbolic code execution. Fundamentally, the execution of a program is a series of computations performed on data held in memory: the data is read and manipulated by the instructions, and then it is stored back in memory.

Symbolic code execution works in the same way, except that instead of actual values being manipulated, we work with symbols, hence the name. The results of computations are stored as expressions involving these symbols; for example, "memory location one AND memory location two" would be stored explicitly like that. This allows us to consider logical formulas describing our program, and the advantage is that it gives us the ability to reason about our program and answer a few important questions that help with program analysis and binary analysis problems, such as whether a particular program state is reachable. I'll briefly go over a few use cases of

symbolic code execution. These include detecting infeasible paths and generating test inputs to maximize code coverage; symbolic code execution is also used in conjunction with SAT solvers to generate input for automatic exploit generation. However, in this presentation the application of symbolic code execution we will be focusing on is backward slicing. We will discuss in detail what it is later, but briefly for now: program slicing is used to find which set of instructions contribute to a value at a certain point in the execution of a program. All right, so in order to build an intuition for what symbolic execution is, we will compare and contrast it with concrete execution. Concrete execution

is how normal execution happens, i.e., execution on actual data. As you can see in the table illustrated on the slide, symbolic execution executes on symbolic values, whereas in concrete execution we use actual values. With symbolic execution we compute logical formulas over these symbols, whereas concrete execution determines exact values. In symbolic execution we also emulate all possible control flows; the reason is that when we reach a conditional statement, we just create a formula for it and then continue on all execution paths. In concrete execution, however, during one run we can only execute along one control-flow path.
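The contrast can be sketched with a toy symbolic evaluator over a small expression tree. This is an assumption-laden illustration, not Triton or any real engine: expressions are plain strings, the tuple node encoding is invented, and the `ite` node forks execution into both branches while accumulating path constraints.

```python
# Toy symbolic executor: nodes are tuples, symbolic values are strings.
# Each explored path yields (path_constraints, symbolic_expression).
def execute(node, constraints):
    kind = node[0]
    if kind == "sym":                    # a symbolic input, e.g. "alpha"
        return [(constraints, node[1])]
    if kind == "const":                  # a concrete value
        return [(constraints, str(node[1]))]
    if kind == "mul":                    # build a formula, don't compute
        paths = []
        for c1, e1 in execute(node[1], constraints):
            for c2, e2 in execute(node[2], c1):
                paths.append((c2, f"({e1} * {e2})"))
        return paths
    if kind == "ite":                    # a branch: follow BOTH paths
        _, cond, then_branch, else_branch = node
        return (execute(then_branch, constraints + [cond])
                + execute(else_branch, constraints + [f"!({cond})"]))
    raise ValueError(f"unknown node {kind}")

# abs over a symbolic input alpha: if alpha < 0 then -1 * alpha else alpha
ABS_TREE = ("ite", "alpha < 0",
            ("mul", ("const", -1), ("sym", "alpha")),
            ("sym", "alpha"))
```

`execute(ABS_TREE, [])` yields two paths, `(['alpha < 0'], '(-1 * alpha)')` and `(['!(alpha < 0)'], 'alpha')`; a concrete run with alpha = -5 would follow only the first. Each path's constraint list plays the role of the diamond leaf nodes in the symbolic-state example that follows.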

All right, so in order to explain properly what symbolic execution is, I will go over a small example, but before that we have to understand the concept of symbolic state. Symbolic state can be regarded as a parallel of the concrete state of the program at any point in the execution; however, in symbolic execution the state comprises two components: the symbolic expressions and the path constraints. A symbolic expression is either a symbolic value or a combination of other symbolic expressions, whereas path constraints encode the limitations on the symbolic expressions as determined by the conditional

statements. Briefly going over this example: what we see on the left is some code with multiple different control-flow paths, and on the right is a graphical illustration of what a symbolic state might look like. We have three symbolic variables, denoted by alpha, beta, and gamma, and three concrete values, x, y, and z. Concrete values actually take on real data during the execution, whereas the symbolic values are represented by their logical formulas. The important thing to note here is that the leaf nodes of the tree, denoted by the diamonds, contain certain logical formulas; these logical formulas describe

under what conditions a particular control-flow path will be taken. All right, so far we've briefly covered the theory behind dynamic taint analysis and symbolic code execution. Recapping: dynamic taint analysis allows us to track information flow between different sections of the program, whereas symbolic code execution gives us the ability to reason about the program by constructing logical formulas regarding the different control-flow paths. For the de-obfuscation procedure, the tool we will be using is Triton; Triton allows us to perform symbolic execution, backward slicing, and dynamic taint analysis. To contextualize our de-obfuscation procedure, what we will do

is run it over an example virtualized routine. In this case, the virtualized routine we'll be working with is a simple factorial algorithm; I'm sure everybody is familiar with factorial. Briefly, what that code does is begin by checking whether the input is greater than zero, since factorial of a negative input is obviously undefined. The factorial function then begins with the stack structured as illustrated on the slide: the top of the stack is the counter, the second value is the running product, and the third item on the stack is the input value. We then manipulate the stack according to the necessary logic.
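The stack discipline just described can be mirrored in a few lines. This is a reconstruction for illustration, not the speakers' bytecode: the list below plays the role of the VM stack, with the input at the bottom, the running product in the middle, and the counter on top, and the loop produces the alternating multiply-and-add pattern that shows up later in the sliced trace.

```python
def vm_factorial(n):
    # stack layout from the talk: [input, running product, counter]
    assert n >= 0, "factorial of a negative input is undefined"
    stack = [n, 1, 1]
    while stack[2] <= stack[0]:   # counter has not passed the input yet
        stack[1] *= stack[2]      # multiply the running product...
        stack[2] += 1             # ...then add one to the counter
    return stack[1]
```

Each loop iteration is one multiply followed by one add, which is why the backward slice of this routine comes out as a neat trace of alternating add and multiply instructions.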

We arrive at the result. So, the de-obfuscation procedure we will be presenting is a simplification of the approach described in the paper "Symbolic Deobfuscation: From Virtualized Code Back to the Original". The algorithm involves three steps. Step one (or step zero) is identifying the region to analyze; the next step is performing taint analysis to isolate the pertinent instructions; and the final step is reconstructing the virtualized routine. The first step requires you to manually reverse engineer the binary in order to identify whether a virtualized routine is present or not. Obviously this requires some experience handling and reverse engineering virtualized code. There are some

projects that attempt to do this automatically, such as VMHunt for example, but they are not generic: what they generally do is look for heuristics, basically artifacts of known virtualization solution providers, so they are pretty simple to circumvent just by designing your VM machinery so that those artifacts are not present. For dynamic taint analysis, which is the second step, we begin by identifying the source of input for the virtualized region. The main purpose of this step is to isolate the VM machinery, meaning we want the set of instructions that belong to the bytecode

interpreter which executes the virtualized routine at runtime. This step is conducted so we can pinpoint the region to emulate during symbolic execution. The virtualized routine is likely to be taking input from some external source, which determines its execution; for example, in the factorial algorithm we presented, the routine takes input from the user, but the input could come from any source in the external environment. Once the VM terminates, the results of its computations can be used by the rest of the program. So the approach is to identify where this input is taken from, then taint the memory location where this value is held. As the VM executes, the taint will

propagate until we reach a predefined sink where this input is used. The output of this step will be an instruction trace of the VM execution. By doing this, what we have managed to do is isolate, to some degree of accuracy, the instructions that define the execution trace of the virtual machine. I'll briefly discuss how one can go about finding taint sources. One approach would be to use strace or ltrace to identify possible library and system calls, and then view their use in the assembly to see whether the input is used anywhere in the suspected code portion. But this

approach is simple and may not scale very well to complicated programs, so a better solution might be to use something like a binary instrumentation framework, where we can automatically look at the values coming in from different sources, track their progress through the execution, and see whether they end up being used in the suspected portion of the code. This might be a better approach for a more complicated program. The final step in our de-obfuscation procedure is symbolic execution: we perform symbolic execution and then compute the backward slice from the VM output to the tainted input. I'll go over some of the details of these steps so we can understand what is

going on here. So, symbolically executing the code: we begin by symbolically executing the code from the taint source to the VM output. In order to do this, we need to provide a memory map, which is something like a snapshot of the memory before we enter the virtualized routine. The reason we do this is that we only want to symbolically emulate the virtualized routine and not the entire program; this is so that we can actually drive execution to the virtualized routine and begin to analyze it. This memory map

we build again using a DBI framework or a debugger and such things. As we symbolically execute each instruction, we build a symbolic expression for it; each instruction is defined as a function in bitvector logic. I will skip some of the details about how the symbolic construction of each instruction happens, but the main idea, as mentioned before, is that we construct a logical formula associated with the memory locations being operated on. So finally we reach the point where we're actually in a position to perform the backward slice. The pseudocode for performing the backward slice is straightforward: we start at a given address,

we continue symbolically executing each instruction until we reach the point from which we want to compute the backward slice. Once we reach this point, we can compute the backward slice, which gives us all of the previous instructions that contributed to the value of the instruction at that point. What this allows us to do is isolate the virtualized algorithm from all of the VM machinery, so that we can actually look at the data manipulations that happen for the de-obfuscated routine to generate its output.
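That slicing step can be sketched over a recorded trace. This is a toy illustration, not Triton's API: each trace entry is an invented (destination, sources) pair, and the slice is computed by walking the trace backwards, keeping only instructions whose destination is still needed, which is exactly the def-use reasoning described above.

```python
# Toy dynamic backward slice over an instruction trace.
# Each entry is (destination, [source operands]) for one executed instruction.
def backward_slice(trace, target):
    needed = {target}      # values we still have to explain
    keep = []              # indices of instructions in the slice
    for i in reversed(range(len(trace))):
        dest, sources = trace[i]
        if dest in needed:
            keep.append(i)
            needed.discard(dest)    # this instruction explains dest...
            needed.update(sources)  # ...but its inputs now need explaining
    return sorted(keep)

# Hypothetical mixed trace: "pc" updates are VM machinery, the rest is payload.
TRACE = [
    ("pc", ["pc"]),        # 0: bytecode dispatch (machinery)
    ("t1", ["x"]),         # 1: payload: load the input
    ("pc", ["pc"]),        # 2: machinery again
    ("y",  ["t1", "t1"]),  # 3: payload: compute the output from t1
]
```

`backward_slice(TRACE, "y")` returns `[1, 3]`: the dispatch instructions drop out, leaving only the data manipulations that produce the VM's output, which is how the factorial example reduces to its neat little trace.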

So basically, what we end up with is a trace representing all of the instructions that contribute to the value at the slice address. For our factorial function example, once we execute this program we get this neat little trace of alternating add and multiply instructions; working through these in the debugger and observing how the values actually change across the execution, it's pretty simple to conclude that it's a factorial function. For more complicated routines it might not be that simple; however, at the least, what we have done so far is to completely extract the virtualized routine from the virtualizing obfuscation. So, to recap what we've done so far:

by computing the backward slice, we have recovered our virtualized routine, which means we've effectively removed all instructions related to VM machinery and are left with the instructions that execute the logic of the virtualized program. At this point we've reached the final step: we've managed to extract the obfuscated routine, and now begins the grunt work of actually figuring out what that routine is trying to do. And that is the end of our talk. We are open to taking questions, and if somebody cannot get their question in right now, they can get in touch with us over Twitter; our handles should be on the screen right

now. Thank you. Thank you, Berk and Osama, for bearing with us and working through the technical difficulties and still being able to deliver your talk. Thank you for organizing this event, especially during times like these, and thank you for letting us present our work here.
