
BSides volunteers, and everybody that worked on and volunteered at Security Onion Con yesterday as well. This is, and continues to be, one of my favorite events, so thanks; it's been a good weekend so far, and hopefully everybody here is having a good time. So, I'm Paul Melson. I work for Target and hunt bad guys for them for a living. This talk isn't really so much about my day job; we're going to talk a little bit about one of my side projects. Last year, here, I shared for the first time kind of the inner workings of a project I have going called ScumBots, and it occurred to me I should just, you
know, do a real quick overview. ScumBots is a Pastebin scraper and Twitter bot. It looks at new Pastebin uploads in near real time. It started out looking for Windows PE files and then analyzing them to see if they were known malware families. It still does that, but last year, actually the morning of my talk, I committed to prod the first of what would become the PowerShell hunting tools. Now that that stuff's been up for almost a year, I want to talk about some of the lessons learned in hunting PowerShell on the open Internet, because it's been sort of an interesting year. But first, let's level set on hunting. Plus, I
don't think you can do a blue team talk without quoting David Bianco, so here's my obligatory David Bianco quote. It's not the Pyramid of Pain; you're welcome. Hunting is any manual or semi-automated method of finding new security incidents. I'd say David, when he came up with this, was thinking in terms of a blue team operation within the context of a particular network, but I would also encourage you to consider the possibility that, one, incidents are not always observable only within the victim network, and two, hunting is a thing you can do out on the open Internet, and the more of it that
blue teams do, the better we will get at wrecking bad guys' days. So let's talk a little bit about the desired outcomes we have for our hunting initiatives. First of all, in this particular case, I'm out to find malicious code. I'm looking at static artifacts staged on a particular site. These techniques and tactics could be reused in other venues, but we're looking at code that's going to be pulled down and run. In some cases we're looking to discover previously unknown malware, so we're casting a wider net. I don't want to go only for stuff I already know about at a detailed level; I want to cast a wider
net and find things that I maybe didn't know about previously. I also want to take a look at threat actors and their behaviors. We're catching them in an early phase of the kill chain, and we want to understand timing and particular tool marks, in particular because in open Internet hunting you don't always get to understand victimology or even the endgame. But we can get a sense of where common infrastructure exists and what behaviors they exhibit. People are creatures of habit: whether they mean to or not, they generate a signature, and they leave their mark on any operation they undertake. And then finally,
building new detection logic. So let's talk specifically about PowerShell and trying to find bad PowerShell. These are the 10 most commonly used obfuscation methods that I see with PowerShell code, and the reason they're the most commonly used is that they're natively supported. Are there others? Yes. Are there variations on these? Sure. But this makes up the majority of what you're going to see in the way of obfuscation in and around PowerShell. First, base64 and null-padded base64. I differentiate the two because they get used differently, but also because being able to tell whether you're looking at binary data versus text data, even
when it's in encoded format, has a certain amount of value. Compression. Hexadecimal and decimal encoding: because PowerShell can pull in and leverage native .NET capabilities, it can do some interesting things. It also has this horrible mode where it can create a byte array that mixes decimal and hexadecimal. You can read it, but good luck building an automated parser for it; it's like the meanest thing ever. XOR: native XOR happens in PowerShell, and fortunately people usually don't reimplement it; they do it inline with the little -bxor operator. That's beautiful, because single-byte XOR is highly detectable and reversible, as opposed to multi-byte XOR.
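To make the XOR point concrete, here is a minimal Python sketch (not ScumBots code) that parses a byte list mixing decimal and 0x-prefixed hex entries, the awkward form described above, and then brute-forces every single-byte key, keeping the keys whose output contains a known marker. The helper names and the "This program" marker are illustrative assumptions; multi-byte XOR would not fall to this 256-key loop.

```python
from typing import List, Tuple


def parse_ps_byte_array(literal: str) -> bytes:
    """Parse a comma-separated byte list that mixes decimal (77) and
    hexadecimal (0x5A) entries, as in a PowerShell [byte[]] literal."""
    values = []
    for token in literal.split(","):
        token = token.strip()
        if not token:
            continue
        # int(..., 16) accepts an optional 0x prefix; other tokens are decimal.
        base = 16 if token.lower().startswith("0x") else 10
        values.append(int(token, base))
    return bytes(values)  # raises ValueError if a value is outside 0..255


def single_byte_xor(data: bytes, key: int) -> bytes:
    """XOR every byte with the same one-byte key (the inline -bxor pattern)."""
    return bytes(b ^ key for b in data)


def brute_force_single_byte_xor(data: bytes, marker: bytes) -> List[Tuple[int, bytes]]:
    """Try all 256 keys and keep those whose output contains a known marker."""
    hits = []
    for key in range(256):
        plain = single_byte_xor(data, key)
        if marker in plain:
            hits.append((key, plain))
    return hits


if __name__ == "__main__":
    # Toy payload: a fake MZ header plus DOS-stub text, XORed with key 0x35.
    secret = b"MZ\x90\x00This program cannot be run in DOS mode."
    cipher = single_byte_xor(secret, 0x35)
    # Render it the "mean" way: alternate decimal and hex entries.
    literal = ", ".join(hex(b) if i % 2 else str(b) for i, b in enumerate(cipher))
    for key, plain in brute_force_single_byte_xor(parse_ps_byte_array(literal), b"This program"):
        print(hex(key), plain[:16])
```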
So those are the encodings, and then there's string manipulation, which includes things like Daniel Bohannon's obfuscation tactics and techniques: backticks and some of the other character escaping and spacing tricks you can do. String concatenation and reassembly can go from simple things, like taking a keyword we might be looking for, breaking it up, and putting it back together with quotes and pluses, to out-of-order reassembly using arrays and array ordering. String reversing is native functionality that PowerShell has. And there's character replacement: we can build a string that might contain a payload, then change this character value to that character value, and at the end put the right character back in place, so that string-based detection gets harder.
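A rough sketch of why these tricks hurt plain string matching, and how a little normalization claws some of it back. This is a hypothetical Python example, not Invoke-Obfuscation output or the speaker's code: it strips in-word backticks, joins quoted literals glued together with +, and also checks for the reversed keyword. Character replacement and array reordering are deliberately left unhandled.

```python
import re

KEYWORD = "invoke-expression"  # illustrative keyword


def normalize(script: str) -> str:
    """Undo two cheap string-manipulation tricks before matching."""
    # Backticks inside words: In`vo`ke -> Invoke. This also strips special
    # escapes like `n, which is fine for matching but not for execution.
    s = re.sub(r"`(?=[A-Za-z])", "", script)
    # Adjacent quoted literals joined with '+': 'Invoke-' + 'Expression' -> 'Invoke-Expression'.
    s = re.sub(r"""(['"])\s*\+\s*\1""", "", s)
    return s.lower()


def naive_match(script: str, keyword: str = KEYWORD) -> bool:
    """Plain case-insensitive substring match."""
    return keyword in script.lower()


def normalized_match(script: str, keyword: str = KEYWORD) -> bool:
    """Match after normalization; string reversal is native in PowerShell,
    so look for the keyword backwards as well."""
    s = normalize(script)
    return keyword in s or keyword[::-1] in s
```

Each of the three tricks defeats naive_match on its own, which is the fragility the rest of the talk builds on.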
This is just a level set; we'll get back to why this matters a little later. Now I want to talk about an analogy I came up with, one that I thought might land because I'm in the South and at a security conference. For those of you not familiar with spirit distillation, the process of distilling a spirit is actually a great aspirational model for how we ought to
think about hunting and detection development. You start with very large quantities of raw materials, like water and grain and so on, and you go through iterative processes that refine, extract, simplify, and enrich the raw materials until what comes out the other end is a small quantity of high-quality, highly prized output. If you're operating a still, that might be whiskey; if you're operating a hunting and detection pipeline, you're going from lots of bulk raw data into a model that hopefully results in high-integrity detections. So
here's a visualization that simplifies it and breaks it out into a taxonomy I like to use, and it actually aligns pretty well with how I've architected ScumBots. We start with a lot of raw data, which is literally everything that gets uploaded to Pastebin. We do a little bit of metadata selection, but then we apply keywords and pattern detection to the actual pastes as they come down. Now, that's highly prone to error, and that's okay, because what we end up with on the other side is sorted data, and we put that through type validation. In particular, if we're looking at an
encoding type, we're going to attempt to decode it. Probably one of the best deterministic ways to validate that we properly detected a piece of encoding is to ask: can we apply the decoding and actually get valid values out the other side? That's especially true for something like base64, whose padding and strict alphabet and length rules act a bit like a checksum; some encodings really lend themselves to this. And then finally, we've got a targeted data set, and we can apply more specific, targeted logic to understand the threat, or the particular attributes of that data that we want to analyze and pull out.
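The raw data, sorted data, and targeted data stages can be sketched in Python roughly like this, assuming a paste arrives as bytes. The regex length floor, the strict-decode check, and the magic-byte labels are illustrative choices, not ScumBots internals; the point is that a cheap, error-prone pattern match is followed by a deterministic decode that throws away anything that isn't really base64.

```python
import base64
import binascii
import re
from typing import List, Optional, Tuple

# Pattern detection: any long run of base64-alphabet characters.
# Cheap and error-prone on purpose; the floor of 16 is an arbitrary choice.
B64_RUN = re.compile(rb"[A-Za-z0-9+/]{16,}={0,2}")


def validate_base64(candidate: bytes) -> Optional[bytes]:
    """Type validation: decode strictly; None means it was not valid base64."""
    try:
        return base64.b64decode(candidate, validate=True)
    except binascii.Error:
        return None


def triage(paste: bytes) -> List[Tuple[str, int]]:
    """Targeted logic: label each decodable run by its leading magic bytes."""
    results = []
    for match in B64_RUN.finditer(paste):
        decoded = validate_base64(match.group())
        if decoded is None:
            continue  # matched the pattern but failed validation: discard
        if decoded.startswith(b"MZ"):
            kind = "pe"
        elif decoded.startswith(b"\x1f\x8b"):
            kind = "gzip"
        else:
            kind = "other"
        results.append((kind, len(decoded)))
    return results
```

A 21-character run such as ThisIsAVeryLongWordXY matches the pattern but fails the strict decode, while a 20-character word could still decode to garbage, which is why the last stage checks what the bytes actually are.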
Coming out of that, we can build some high-integrity, high-fidelity detections. But when we're dealing with obfuscation, you have to have this layered approach; you have to work in rounds. That's my premise, and at this point I'd like to pivot into a handful of case studies, things I found out on the Internet, on Pastebin. Let's talk about ransomware, because who's not talking about ransomware? Buran ransomware is a Russian ransomware kit. I don't know much about the actors or their intended victimology, and I don't think they know much about their victimology either, other than, hopefully, people with computers
that will give them money to get their computers back. The ransomware profit model is fairly simple. One of the things about Buran is that it's loaded via PowerShell, and at least one of the actors involved is staging their droppers on Pastebin. They have taken and modified Invoke-PSInject.ps1 from the PowerShell Empire kit. The payload is a base64-encoded PE: a combination of base64 encoding of the actual Buran ransomware payload and some simple string concatenation, and they randomized the variable names in Invoke-PSInject.ps1 in hopes of making it a little harder to detect. I would
give them about a 2 out of 10 in terms of obfuscation. This is the code: a random variable name all the way at the top, but then, well, that's a fail. It's fairly obvious that that's a big blob of base64. If you stare at base64 long enough, you can see TVqQ, so a PE header, and lots of A's, so lots of nulls, so it's probably a binary file. Down here is the actual loading code, which handles the conversion and does a few other interesting things, but this is all, you know, variable names and
everything in here is true to the original PowerShell Empire code. Unlike many other actors, they did strip the comments out, so there's that. But then down here, all the way at the bottom, you can see Invoke-, then the randomized variable name, and then the other variable name. So if your detection rule keyed in on exactly that PowerShell Empire default, they would bypass it. But let's talk about detection performance, because that's actually what we're here to talk about. For this family of malware, I found 34 samples over the last year, and across those 34 samples, here are the string
matches that we hit on. System.Reflection.AssemblyName hits 68 times, so that's perfectly two per sample. FromBase64String hits once per sample. Then 31 hit on one variation of TVqQ, and others on TVpQ, because it's a DLL that they're loading into memory. That bottom string there is a backstop I use; I'll talk a little more about it later. It's a substring of the DOS stub message: if you've ever looked at a Windows PE binary, you know the "This program cannot be run in DOS mode" message, except it's not always This program
cannot be run in DOS mode. Sometimes it's This program can only be run in Windows mode, and sometimes there are some excellent typos. That string doesn't even really matter; it's just always there because it's a convention. So "This program can" shows up in almost everything, and by making a couple of variations on the base64 encoding of that, we can roll stuff up. I'll talk a little more about why that's a nice piece of detection later. Moving on, but I'll come back to this: Sodinokibi ransomware. Basically the same thing, just a different family of ransomware,
different actor, staged on Pastebin using PowerShell, also based on Empire, but instead they picked Invoke-ReflectivePEInjection. They didn't bother with string concatenation, although we could argue, if you will, whether that's really string concatenation or just PEP 8 compliance. The function name is pseudo-random. Basically, the Sodinokibi folks did less work with their obfuscation: you can see up there it's still function Invoke-, and everything else is as it was. But that function name is one of the things that uniquely identifies this builder, so I guess I'm burning this piece of attribution: it's always Invoke- and then 15 uppercase letters,
no numbers, no lowercase, always uppercase letters. Here's some more of the actual code, and you can see down here it's not PEP 8 compliant: it's just one big long string, and they called it PEBytes32, just like in the original PowerShell Empire. It attempts an AMSI bypass and tries to turn off script block logging, so, you know, Empire things. Let's look at keyword performance for this one. Different actors, different piece of ransomware, different decisions, but they stole code from roughly the same project, and some interesting things persist. There were 161 samples of Sodinokibi as of about a week and a half ago, when I finished
this part of the deck; there are lots more now, I saw some just this morning. Across those 161 samples there's a two-to-one ratio on System.Reflection.AssemblyName and a one-to-one ratio on FromBase64String. In this case we see only two TVpQ DLL payloads, and then 159 of the regular TVqQ-style PE headers. So what we end up with is a ratio we can describe, and it persists between the loader used by actor A with ransomware A and its borrowed code, and actor B with ransomware B and its borrowed code. Even though I think they're operating independently, this pattern persists
because they stole code from PowerShell Empire, so that's awesome. What you can describe is that these loaders reference System.Reflection.AssemblyName twice per run and FromBase64String once per run, and they contain one base64-encoded PE file somewhere in them, so strings that describe that are also valuable. That repeating pattern means we could use something like a YARA rule to build detection for that loader. You wouldn't hunt with this, but it's actually pretty decent detection: it works on about three-quarters of the actual PowerShell loaders that come as part of Empire.
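The ratio described above (two AssemblyName references, one FromBase64String, one embedded base64 PE header) can be written as a condition. The published version is a YARA rule on the speaker's GitHub; the following is not that rule but a Python approximation of the same idea, with the exact counts and the TVqQAA/TVpQAA header prefixes chosen as assumptions.

```python
import re

# Patterns paraphrasing the ratios described in the talk; exact counts are an
# assumption, and the published YARA rule may use other strings or thresholds.
ASSEMBLY_NAME = re.compile(rb"System\.Reflection\.AssemblyName", re.IGNORECASE)
FROM_BASE64 = re.compile(rb"FromBase64String", re.IGNORECASE)
# Base64 is case-sensitive, so no IGNORECASE: TVqQ = b"MZ\x90", TVpQ = b"MZP".
B64_PE_HEADER = re.compile(rb"TV[pq]QAA")


def looks_like_empire_pe_loader(script: bytes) -> bool:
    """Two AssemblyName references, one FromBase64String call, and at least
    one base64-encoded PE header somewhere in the script."""
    return (
        len(ASSEMBLY_NAME.findall(script)) == 2
        and len(FROM_BASE64.findall(script)) == 1
        and B64_PE_HEADER.search(script) is not None
    )
```

Counting occurrences rather than testing for presence is what survives the variable renaming: the randomized names change, but the number of times these framework types are referenced does not.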
Feel free to take a picture or write this down; at the end of the talk I have a link to my GitHub, and this is out on GitHub as of this morning, just as a sample for the talk. It's a pretty simple strings-based approach, using a little bit of logic in the condition to dial it in. All right, let's talk about a few things that are a little less structured, a little less neat; well, let's just say they're not PEP 8 compliant. PowerShell keylogger: for whatever reason, there has been a glut of garbage PowerShell being used by attackers on Pastebin lately, and this is one. It started
out in 2016, when a blogger named coder girl published a sort of hypothetical: hey, what would it take to write a keylogger in PowerShell? And so she did, and the blog post is fine. Somebody took it and modified it so that when it's done writing to the local file, it exfiltrates the contents of that file as an email attachment over SMTP with TLS on port 587. The code itself is not obfuscated or encoded, and there are multiple variants, English and Spanish. A lot of them include her comments, and a typo that was made at one point persists, so there's some interesting lineage,
even though there's not much code. This is an example of the actual PowerShell keylogger that I found. Right there at the top is the config, where they set up the subject, the To and From addresses, and the password, and then you can see how it hooks different functions from user32.dll and builds an endless loop that captures keystrokes and just writes them to a text file in kind of a garbage format. So I know what you're thinking: he's collected over a hundred samples of these keyloggers that are intended to steal victims' information, and he has all of
the credentials that go with them, too. There's no way the next slide is just a list of all the email addresses and passwords belonging to the bad guys, right? Well, here we go. I have not, and you will not, because we would not violate the CFAA, logged in to any of these accounts, but there they are for the room, and eventually the entire Internet, to see. And I know what you're thinking now: okay, cool, but he's definitely not going to point out all of the pentester accounts that got rolled up in here. And there's absolutely no way he's going to point out all of the typos and do-overs
either. So that's fun; bad guys aren't always the best at what they do. Let's talk keyword performance for a minute. What I really rolled this up on was hunting for ConvertTo-SecureString, a simple encoding and encryption function that PowerShell includes natively to handle passwords securely. What's funny about this, of course, is that they put the password in clear text in the payload that gets to the victim, but they have to call ConvertTo-SecureString so that the SMTP message handler PowerShell uses can authenticate to Google. So they must handle
the password properly, after they're done with their attack. So we catch ConvertTo-SecureString pretty reliably. As for "PowerShell -", some implementations vary a little in how it gets called, powershell.exe versus PowerShell -. New-Object System.IO: I would have to go back and look at why that one hit only once, and I also don't know why System.Net fires. But you can see we rolled it up on basically a single keyword, and that reliably works because they have to have it for the exfiltration connection to happen. So this is good
hunting logic, but it's pretty poor detection logic. You would definitely not want to deploy ConvertTo-SecureString against all of the script block logging you have turned on in your environment, unless you don't have it turned on, in which case knock yourself out: zero false positives, zero true positives. Another similar piece of garbage that I see a lot is a Wi-Fi password extractor. This one says "coded by whisper grooves"; I don't know what that means, but it shows up in most of the samples. So, whisper grooves, congratulations: you are almost as good at writing code as I
was when I was 12, because what they wrote was a batch file that uses certutil to decode and run base64-encoded PowerShell. First giveaway question: why is it stupid that they use certutil to execute base64 PowerShell code? Yes, in the back. The answer, for those of you who couldn't hear it, is that PowerShell can natively execute base64-encoded PowerShell. That's correct. Would you like the tap or the Wi-Fi NIC? All right, thank you. Yes, that's 100% correct; there's zero reason to. certutil run from the command line is suspicious as all get-out, and they do it anyway, because they don't know how to
run -EncodedCommand with PowerShell. Same thing, by the way: it steals the passwords from the WLAN profile objects and then exfils them over port 587. This is what it looks like when it shows up: it echoes a base64 string that is valid PowerShell, so instead of one PowerShell execution, there's a cmd.exe with an echo, then certutil, followed by PowerShell again. It's pretty hilarious to watch that one run in Sysmon; it's really noisy. And this is the actual code itself, pretty simple. Anybody else want to point
out anything else that's potentially problematic here, like how rm doesn't work in batch files? Anyway, here we are again, and you can see we rolled them up on New-Object System.Net.NetworkCredential, and somewhere up there we've got the same ConvertTo-SecureString they have to use to generate valid credentials. So it works; the exfil method is basically the same between these two, even though they do something different. I've seen literally hundreds of these things, and I don't know how valuable it is to
steal the Wi-Fi creds off of a device, staged on Pastebin. I have a theory: my guess is that somebody is teaching it in a class. Anyway, here we are, and I know what you're thinking: there's no way the next slide is all the passwords. But of course it is. And there's no way I'm going to make fun of the red teamers. Just based on the names of a few of these Wi-Fi stealers, like exploiting ducky at gmail.com, it would appear that this is probably tied in to Bash Bunny and Rubber Ducky attacks. All right, oh, typos and redos. And then this one is for my
coworker Adam. I see you, Adam. All right, keyword performance on this. Interestingly enough, remember they added a layer of obfuscation by base64 encoding the payload, so this time we're hooking on base64-encoded variations of System.Net. A little bit about base64 real quick: if you take a substring inside a string that you're encoding with base64, then depending on the position of that substring relative to the other characters in the string, you'll end up with different values, because base64 works through the input in three-byte blocks, turning each block into four characters, as it goes.
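The alignment effect can be computed rather than memorized. This Python sketch (an illustration, not the detection library used in the talk) encodes the keyword behind 0, 1, and 2 filler bytes and trims any leading or trailing characters that would depend on neighboring bytes, leaving the three substrings that must appear in the encoded blob whatever surrounds the keyword.

```python
import base64
from typing import List


def base64_variants(keyword: bytes) -> List[bytes]:
    """Return the three base64 substrings that appear when `keyword` sits at
    byte offset 0, 1, or 2 (mod 3) inside a larger base64-encoded blob."""
    variants = []
    for offset in range(3):
        # Pad the front with filler so the keyword starts at this alignment.
        encoded = base64.b64encode(b"\x00" * offset + keyword)
        # Skip leading characters whose 6 bits include any filler bits:
        # ceil(offset * 8 / 6) == ceil(offset * 4 / 3).
        start = -(-offset * 4 // 3)
        # Stop before a trailing character that would also include bits
        # from whatever follows the keyword in the real blob.
        end = (offset + len(keyword)) * 8 // 6
        variants.append(encoded[start:end])
    return variants


if __name__ == "__main__":
    for variant in base64_variants(b"System.Net"):
        print(variant.decode())
```

The same function applies to the "This program can" backstop mentioned earlier. For payloads meant for -EncodedCommand, the keyword would need to be encoded as UTF-16LE before generating variants, and since matching is case-sensitive, camel-cased spellings need their own entries.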
So, anyway, there are three different possible ways to say System.Net in base64, and we have detection for all of them. If you add those up, they basically equal 2 × 82, so System.Net occurs twice in there in base64 encoding. Only three of the 82 were detected with "powershell.exe -", and this is probably another interesting tool mark, because in 79 of them it was "PowerShell  -" with two spaces. I don't know why, but I also don't know why they use certutil, and I also don't know why I don't have a
rule for certutil; I'll be fixing that. Okay, looking a little further ahead: PowerShell Empire agent. We talked earlier about PowerShell Empire droppers being used by ransomware. PowerShell Empire agent is the RAT; it's the implant portion of Empire, implemented fully in PowerShell, and I see a ton of it. Empire is an open-source post-exploitation framework, a favorite of red teams and actual criminals the world over. Actually, some of the stuff the bots tweeted out earlier this year was FIN6 implant activity related to a handoff between another threat group and their C2, in furtherance of a rollout of Ryuk
ransomware, so that's fun. Empire agent is implemented fully in PowerShell and dropped via a PowerShell one-liner that is obfuscated with base64 encoding, plus camel casing and a few other things. This is what it looks like when you see it for the first time: a lot of base64 encoding. This is null-padded base64, so if you decode it, every other character is a null. It prints out nice and normal-looking, but if you actually decode it into a bytes object in Python, it's twice the size it needs to be, and every other character within it is a null
pad. That's interesting, because it's not something other functions read easily. There are other applications that do it, but it's usually a pretty good tool mark for PowerShell, since those nulls are what ASCII text looks like in UTF-16LE, the encoding PowerShell expects for -EncodedCommand. It's super handy when it's just PowerShell -EncodedCommand, or -e, or -enc, or actually -en also works, it turns out. Anyway, if you find null-padded base64, chances are the intention was for it to be run by PowerShell, so before you even get to decoding, we can leverage that information.
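Here is a small Python sketch of both sides of that tool mark, assuming ASCII-range script text: building an -EncodedCommand style string (base64 over UTF-16LE), and scoring an unknown blob by how many of its odd-indexed decoded bytes are null. The function names are hypothetical, and the 0.9 threshold is an arbitrary assumption, not a value from the talk.

```python
import base64
import binascii


def encode_for_powershell(script: str) -> str:
    """Base64 over UTF-16LE bytes: the format -EncodedCommand expects."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


def null_padded_ratio(blob: str) -> float:
    """Fraction of odd-indexed decoded bytes that are NUL (0x00).

    ASCII text encoded as UTF-16LE scores 1.0; base64 of plain ASCII scores 0.0.
    """
    try:
        raw = base64.b64decode(blob, validate=True)
    except binascii.Error:
        return 0.0  # not valid base64 at all
    odd = raw[1::2]
    return odd.count(0) / len(odd) if odd else 0.0


def likely_encoded_command(blob: str, threshold: float = 0.9) -> bool:
    """Heuristic: mostly null-padded base64 was probably meant for PowerShell."""
    return null_padded_ratio(blob) >= threshold
```

The check runs on decoded bytes, but it needs no knowledge of what the script does, which is what makes it useful before any deeper analysis.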
Here's what it looks like when it's broken down, torn apart, and beautified. You can see the camel casing in here, and over here this base64-encoded block, again null-padded base64, contains the URL that points back to the C2, so you decode that and you get the C2 URL. There are a few other telltale signs: like other parts of Empire, it tries to disable AMSI and script block logging before it downloads the actual persistent payload. The bots rolled up 139 samples of this over the last year and change, 120 of them based almost exclusively on that "PowerShell -", so that's not great in terms of
detection performance. It's an okay hunting string, but we're not really getting into the meat of why Empire agent is there. Now, I have some stuff downstream of the initial funnel that does a nice job of pulling this apart, automating the analysis, and getting the actual details to the Twitter bot so that y'all can read it. But FromBase64String, System.Net, New-Object System.IO: these are all different pieces of PowerShell functionality. FromBase64String says I'm decoding some base64, which is always potentially interesting. System.Net begins to reference the network connections
libraries. System.Convert can be a decoder of types. New-Object System.IO means we're going to write an object, or at least a byte stream object; it might not be a permanent thing on a file system, but it's potentially useful. We've got a couple of -bxor hits: in a handful of these cases there was some single-byte XOR obfuscation that one of the actors applied, and we rolled that up this way as well. And then there are base64-encoded versions of PowerShell and FromBase64. All right, so again, the bottom line is, one, this is how my detection performed over the last year, and two, to make
the point that this is not as good as I would hope it to be. It's not high fidelity, and moreover, it's fairly fragile. Take "PowerShell -": it wouldn't take much for them to take the payload they need to deliver and just do something else, rename the executable on the other end, or include that in a different piece of the handoff that's not visible here, and all of a sudden my hunting starts to break down. So, Meterpreter and Cobalt Strike. In particular, I'm talking about a family of capability that involves PowerShell scripts that load shellcode into memory to execute a network callback. In most
cases it's a reverse shell; in some cases it's a payload download and delivery. Either way, I'm grouping these two together because Cobalt Strike and Meterpreter heavily leverage the same tactics. There's a lot of similarity in code, so differentiating them at a high level becomes complicated; it really requires lower-level analysis. I'll show you a few things from an analysis perspective as we go, but the actual PowerShell droppers are very closely related. Again, a favorite of red teams and actual criminals the world over. I kind of explained this: Cobalt Strike,
Metasploit, and Unicorn, plus a few things of unknown origin that I rolled up: base64 encoding, XOR, obfuscation, string concatenation, compression, all kinds of variations on this. And here's the thing: looking at the actual stager payload, would you be able to differentiate it from an Empire agent payload? Almost impossible, even mechanically, and certainly visually; they look pretty much identical. It's just more null-padded base64. When we decode this first round, we get this next round, and in this case that happens to be gzip-compressed data that has been base64 encoded.
That means trying to detect the actual payload code through the first round of base64 encoding is, well, not impossible, but it starts to become a physics problem, quite honestly, when we talk about taking a string whose presence we're interested in and iterating it through multiple rounds of encoding. I'll talk more about that in a minute. Okay, so now we've decompressed it, and here we actually get to see some code that starts reflective loading, and this is base64-encoded shellcode, x86 shellcode.
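Working in rounds, as described above, amounts to peeling one layer at a time until nothing decodable is left. The following Python sketch handles only the layers seen in this sample (null-padded base64, plain base64, and gzip); the 40-character minimum run and the round limit are assumptions, and a real stager may need XOR or other steps this loop does not try.

```python
import base64
import binascii
import gzip
import re
from typing import List, Tuple

# Long runs of base64-alphabet characters; 40 is an arbitrary floor.
B64_BLOB = re.compile(rb"[A-Za-z0-9+/]{40,}={0,2}")


def _maybe_utf16(raw: bytes) -> bytes:
    """If every odd-indexed byte is NUL, treat the data as UTF-16LE text."""
    if len(raw) >= 2 and len(raw) % 2 == 0 and raw[1::2].count(0) == len(raw) // 2:
        return raw.decode("utf-16-le").encode("latin-1")
    return raw


def peel(data: bytes, max_rounds: int = 8) -> List[Tuple[str, bytes]]:
    """Strip one layer per round (gzip or base64) and record every step."""
    layers = [("input", data)]
    for _ in range(max_rounds):
        if data[:2] == b"\x1f\x8b":  # gzip magic bytes
            data = gzip.decompress(data)
            layers.append(("gzip", data))
            continue
        match = B64_BLOB.search(data)
        if match is None:
            break  # nothing left that looks like base64
        try:
            decoded = base64.b64decode(match.group(), validate=True)
        except binascii.Error:
            break  # looked like base64 but failed strict decoding
        data = _maybe_utf16(decoded)
        layers.append(("base64", data))
    return layers
```

Each recorded layer is a fresh surface for ordinary keyword matching, which sidesteps the problem of pushing one keyword through several encodings at once.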
If we go as far as to actually decode this and put it through PowerShell X-Ray, these are the actual hex bytes, but you can see up there in the strings the tool marks: at the very end there's the IP address, the URL, and the user-agent string. The fact that it has a user-agent string is a pretty good tell that you're looking at Cobalt Strike and not Meterpreter's HTTP download. But almost all of the system calls, the preamble, and the rest of the opcodes are basically identical. The code I use to parse this out works the same and is about 98% accurate whether it's a Cobalt Strike or a Meterpreter payload, and I
suspect that's because Cobalt Strike borrowed from Meterpreter payloads, but that's just a theory. Keyword performance: 246 samples, and nothing is a definitive roll-up. By the way, the asterisks represent case variation; camel casing, especially in Empire agent and some of the Meterpreter stuff, is used to throw things off, so case-insensitive string detection is helpful in these instances. You can see we're getting most of the same kinds of things, and the only potentially unique one here is the use of CreateThread. We haven't seen that much previously; it seems to be kind of unique to the Cobalt Strike loader and to Unicorn.
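Keyword performance tables like these, total hits versus samples hit, matched case-insensitively so camel-cased variants still count, are straightforward to tabulate. A minimal Python sketch, with an illustrative keyword list and a hypothetical function name:

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Illustrative keyword list using strings named in the talk.
KEYWORDS = ["FromBase64String", "System.Net", "New-Object System.IO", "CreateThread", "-bxor"]


def keyword_performance(
    samples: Iterable[str], keywords: List[str] = KEYWORDS
) -> Dict[str, Tuple[int, int]]:
    """Map each keyword to (total hits, samples with at least one hit)."""
    # re.escape keeps '.' and '-' literal; IGNORECASE absorbs camel casing.
    patterns = {kw: re.compile(re.escape(kw), re.IGNORECASE) for kw in keywords}
    total_hits: Counter = Counter()
    samples_hit: Counter = Counter()
    for text in samples:
        for kw, pattern in patterns.items():
            n = len(pattern.findall(text))
            total_hits[kw] += n
            if n:
                samples_hit[kw] += 1
    return {kw: (total_hits[kw], samples_hit[kw]) for kw in keywords}
```

Dividing the total by the samples hit gives ratios like the two-per-sample System.Reflection.AssemblyName count seen in the ransomware loaders.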
All right, so again, not a lot of high-fidelity detection, and the stuff with the high hit rates at the top, the stuff that makes up the bulk of the detection that pushed this into the hunt pipeline in the first place, those string matches are super fragile. They're just the declaration that this is PowerShell code to begin with. That brings me to where I'd like to go next, which is making a case for advanced pattern detection: moving on from just string
based detection. String-based detection, by the way, is performing, and I don't want to throw the baby out with the bathwater. But more specifically, in the arena of PowerShell, all but one of these threats leverage some form of encoding or obfuscation, and multiple rounds of encoding or obfuscation make it very, very difficult for strings to remain performant. As you stack and iterate encoding and obfuscation in arbitrary quantities and orders of operation, and get deeper into the number of techniques applied simultaneously, it becomes almost impossible. And the graph, order of
Order-of-magnitude graphs suck, and this one sucks because with zero techniques applied we're at 17 strings, at one we're at 65 strings, at two we're at 300 strings, at three we're over a thousand strings, and at four we're over 20,000 strings. The thing is that we go into the millions of strings; it's more like a power of three or four in terms of the number of strings you would have to have in your detection library to cover all the possible use cases. So instead, I decided to propose a new piece of technology. It's a proof of concept, the code's actually out on my GitHub repo for you to play with, and I've started incorporating it into the Pastebin scraper. Part of that's been updated, and there are a few things to work out there to make it performant enough for Pastebin.

So MLSGM is the new technology that I'd like to roll out. And I know what you're thinking: machine learning signature generation mechanics? No. So what if I didn't need a PhD to generate a machine learning model? That's ultimately this talk.

So how does MLSGM work? Encoding is particularly easy to attack. If you've ever done any cryptanalysis, you understand that being able to map out all of the possible values a particular thing is going to have is super useful. Encoding is even more attackable than that, because in order to adhere to the rules of the already-built-in functionality, it has to do certain things. In the case of base64, it has to adhere to a particular character set. I calculate the similarity between the unknown file we're analyzing and the known character set, and with just a little bit of simple statistics we can look for simple variations on obfuscation.

One more thing, going back to PowerShell characteristics: the string manipulation obfuscation techniques, like string concatenation and string reversing, can't ultimately change the character set that's used. You can break up an encoded string and make it hard to understand or detect until you've actually executed the code and put it back together, but none of these things affect identifying the encoding that's being used. Character replacement does affect that, but we can manage for that too, because the beauty of base64 in particular is that it occupies so much of the valid ASCII character space that you can only do so many rounds of replacement before you've ultimately broken it, which happens quite a bit.
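The replacement argument is easy to make concrete. Base64's 64 symbols plus the `=` pad cover 65 of the 95 printable ASCII characters, so a substitute has to come from the remaining 30, and splitting or reversing the string leaves the set of characters untouched. The Python sketch below is a deliberately naive version of that idea, assuming a single one-for-one substitution in a blob long enough to use every base64 symbol: it lists characters base64 never emits and alphabet symbols that never appear, and swaps them back when there is exactly one of each. The function names are hypothetical, and this is not the MLSGM code.

```python
import base64
import string
from typing import Optional

B64_ALPHABET = frozenset(string.ascii_letters + string.digits + "+/")
IGNORED = frozenset("=\r\n")  # padding and line breaks are allowed in base64 blobs


def substitution_report(blob: str) -> tuple[list[str], list[str]]:
    """Characters base64 never emits, and alphabet symbols that never show up."""
    present = set(blob) - IGNORED
    foreign = sorted(present - B64_ALPHABET)
    missing = sorted(B64_ALPHABET - present)
    return foreign, missing


def undo_single_substitution(blob: str) -> Optional[str]:
    """Naive fix-up: one foreign character plus one missing symbol suggests a 1:1 swap."""
    foreign, missing = substitution_report(blob)
    if len(foreign) == 1 and len(missing) == 1:
        return blob.replace(foreign[0], missing[0])
    return None  # clean base64, or too ambiguous for this heuristic


if __name__ == "__main__":
    original = base64.b64encode(bytes(range(256))).decode()  # uses all 64 symbols
    tampered = original.replace("A", "#")  # the "pound sign" trick
    print(undo_single_substitution(tampered) == original)
```

For short blobs many symbols may legitimately never appear, so `missing` can hold several candidates; a fuller version would try each one and judge the decoded output.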
It's fairly hilarious: some of the adversaries out there have tried to do character replacement but don't understand that base64 is case-sensitive, and that really matters. They'll break their payloads on deployment because it decodes as something different. It's awesome. So anyway, it's not really magic; it's just math. And finally, we can use standard deviation to pick up on things like compression, encryption, padding, and some of those things.

So, live demo time, I guess. All right, this is the actual code here, and you can check it out on GitHub, but I'm gonna walk through it for a minute. What I have over here is a sample set of pastes, a thousand of them to be exact. These are just randomly selected paste files that the bot has already taken; they're all kind of raw. Here's an example of one. Then I built a little wrapper for the demo so that we could grind through them all easily; it's just a little bash script that loops through them and tells us how long it takes to run. Let's see if I can explain the code faster than the demo can run. The answer is probably not.

All right, so in the first function that we're really dependent on here, we define all the character sets that we're potentially interested in. We read the file in and look for a perfect match: if everything right out of the box matches the character set, then we're golden. That's a one-to-one match on encoding, and we can move on with our lives. I did that almost more for performance than anything else. This is the actual MLSGM function, all 20-some lines of it, so it's not a highly complicated TensorFlow model or anything like that. It's actually fairly performant. This box is not, well, for one, it's re-running all the Python: literally every file it scans, it's reloading all the libraries and everything, so it's not optimized at all. But this works fairly well.
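The perfect-match shortcut and the character-set similarity score can be sketched as follows. This is a minimal Python illustration under stated assumptions rather than the actual MLSGM function: whitespace is ignored, only hex and base64 alphabets are defined, and the 0.95 cut-off for a near match is an arbitrary placeholder. `score`, `best_guess` and `CharsetScore` are hypothetical names.

```python
import string
from dataclasses import dataclass
from typing import Optional

# Character sets to test, most specific first (hex is a subset of base64).
CHARSETS = {
    "hex": frozenset(string.hexdigits),
    "base64": frozenset(string.ascii_letters + string.digits + "+/="),
}
WHITESPACE = frozenset(" \t\r\n")


@dataclass
class CharsetScore:
    charset: str
    size: int        # non-whitespace characters examined
    fraction: float  # share of those characters that belong to the charset
    perfect: bool    # True when every character belongs to the charset


def score(text: str, charset: str) -> CharsetScore:
    allowed = CHARSETS[charset]
    chars = [c for c in text if c not in WHITESPACE]
    matched = sum(1 for c in chars if c in allowed)
    fraction = matched / len(chars) if chars else 0.0
    return CharsetScore(charset, len(chars), fraction, bool(chars) and matched == len(chars))


def best_guess(text: str, threshold: float = 0.95) -> Optional[str]:
    """Perfect match first (cheap and unambiguous), then the closest charset above `threshold`."""
    scores = [score(text, name) for name in CHARSETS]
    for s in scores:
        if s.perfect:
            return s.charset
    best = max(scores, key=lambda s: s.fraction)
    return best.charset if best.fraction >= threshold else None
```

The near-match path is what survives concatenation tricks: a base64 payload split up with `'+'` still scores about 0.985 against the base64 set even though it is no longer a perfect match.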
Mostly what we're doing is just counting: how big is the file we're looking at, how many characters does it have, and what percentage of the bytes in the file we're analyzing match the character set we're looking at. Then we make some judgments against that to determine how likely it is that the thing we're looking at is, in this particular case, base64 encoded. I came up with three use cases based on things I've seen. Is it a base64-encoded binary file, or is it likely to contain null padding? That one's particularly interesting because that's the...

You won't get a perfect match on one of those Empire hits, because you're gonna have spaces and periods and things like that in there, but this is a good detector for that. Compression or encryption, where we're looking at a relatively low standard deviation across the frequency of each of the characters within the set. Knowing that we're looking at something that's encrypted or compressed is also particularly helpful, especially because some of the gzip modes don't include a header. Decoding the data, you get a blob and you don't know what it is, but having some idea that you're probably dealing with compression gives you a couple of short steps to completing that decoding.
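The standard-deviation test works because compressed or encrypted bytes are close to uniformly random, so after base64 every symbol appears about equally often, while text or padded binaries leave a lopsided distribution. This Python sketch is one way to express that, not the MLSGM implementation. It assumes standard base64, and it normalizes the observed deviation by the deviation expected from purely random symbols so the score does not depend on blob length; any cut-off is left to the reader. It also tries decompression, first with zlib/gzip header auto-detection (`wbits=47`) and then as raw DEFLATE, which has no header at all. Function names are hypothetical.

```python
import base64
import math
import random
import statistics
import string
import zlib
from typing import Optional

B64_SYMBOLS = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def symbol_stdev(blob: str) -> float:
    """Population std dev of the relative frequency of each of the 64 base64 symbols."""
    counts = [blob.count(sym) for sym in B64_SYMBOLS]
    total = sum(counts)
    if total == 0:
        raise ValueError("no base64 symbols in blob")
    return statistics.pstdev([n / total for n in counts])


def uniformity_ratio(blob: str) -> float:
    """Observed std dev divided by the std dev expected if all 64 symbols were equally likely.

    Random-looking (compressed or encrypted) input lands near 1.0; structured
    input such as text or null padding lands much higher.
    """
    observed = symbol_stdev(blob)
    n = sum(blob.count(sym) for sym in B64_SYMBOLS)
    p = 1 / 64
    expected = math.sqrt(p * (1 - p) / n)
    return observed / expected


def try_inflate(data: bytes) -> Optional[bytes]:
    """Try zlib/gzip (header auto-detected), then headerless raw DEFLATE."""
    for wbits in (47, -15):  # 32 + 15 = auto-detect zlib or gzip; -15 = raw DEFLATE
        try:
            return zlib.decompress(data, wbits)
        except zlib.error:
            continue
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    noise = base64.b64encode(rng.randbytes(30000)).decode()
    text = base64.b64encode(b"Write-Host 'hello world'; " * 1200).decode()
    print(f"random bytes: {uniformity_ratio(noise):.2f}")
    print(f"plain text:   {uniformity_ratio(text):.2f}")
```

A ratio near 1.0 suggests compressed or encrypted content; plain text encoded the same way scores many times higher.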
And then finally, being able to detect character substitution: when somebody replaces all of the A's in a base64-encoded file with, say, the pound sign or something, still being able to pick that out without having to do perfect string matching on it. So that, in a nutshell, is MLSGM and the use case that I had. At this point I'd like to take any questions from the audience.

Okay, so the question is: is Pastebin the only place I'm seeing PowerShell malware? No, it's the only place I have a scraper deployed, but I can definitely give you examples on Pastie, Ghostbin, and GitHub. GitHub has kind of a problem; I think that's probably where I'll look next in terms of automating collection, because there's some stuff GitHub offers up. Yeah. I'm just interested in collecting and analyzing this stuff, one, for building a library, and two, for doing some indicator extraction and building some intelligence.

Oh, for blue team, to blocklist GitHub repos? No, no. What I would pose is the blue team play. Pastebin's another one: with Pastebin, maybe you could have a discussion that the threat's higher than the ROI to the business. GitHub, for basically every company, is upside down on that. There's some threat there, but the value to the company is just gonna be way too high, especially if you have any software developers. Instead, the argument I would make is that you want to get in line for all the files that are ingressing into your network as downloads or email attachments or any of those things, and start applying detection and analysis to those, because a lot of the bad code artifacts we talked about here are not gonna be things that show up in normal source code repositories. You'd probably catch the red team updating their Kali boxes, though. All right, other questions? Yes, sir.
Analysis for Go? I mean, maybe, if I saw more Go. I have a sample problem with Go. If you've ever compiled Go: Java said, "Hey, we're gonna make bloated JAR files when we roll out," and Go said, "Hold my beer." As a result, when you encode Go malware or payloads, they're literally too large: they exceed the max paste size for Pastebin, so I pretty much never see it there. I have seen and analyzed some Go malware in other contexts, but without maybe the GitHub expansion or some other source, those two things are not likely to ever cross, just because of the space limits on Pastebin.
It's a really good question, with Empire being out of development. So I think the first question was about Empire and tools like that coming out of development, and there are a couple of other packages that have been walked away from. Empire's out there and you can't pull it back, so bad guys get to keep using it. A lot of the techniques continue to work, and it'll be on Microsoft to kill those off individually, in particular the AMSI bypasses, and they've already done some stuff with later versions of PowerShell. I do see quite a few PowerShell 2.0 downgrade-style attacks, so the one thing I would say is: if you have a need for PowerShell in your operating environment, do your level best to treat PowerShell 2.0 like it was malware. Kill it, make everybody upgrade to 3 or later. Seriously, bad guys will fall in that moat. The second question is where I see PowerShell going as Microsoft does a better job of cutting down and curtailing its use. At the end of the day it's pretty powerful, so I think there's still a lot of meat on that bone.
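For downgrade-style attacks, one place to hunt is process command lines. The Python sketch below flags Windows PowerShell invocations that request the 2.0 engine with `-Version 2` or `-Version 2.0`. It assumes command-line telemetry is available and that powershell.exe accepts abbreviated forms such as `-v 2`, which detection rules commonly account for; it will not catch a downgrade requested any other way, and `is_downgrade` is a hypothetical name.

```python
import re

# powershell(.exe) followed, on the same command line, by -Version 2 / 2.0 or an
# abbreviation of -Version (-v, -ve, ... -versio). Accepting those prefixes is an
# assumption about powershell.exe argument parsing.
DOWNGRADE = re.compile(
    r"\bpowershell(?:\.exe)?\b.*?\s-v(?:e(?:r(?:s(?:i(?:o(?:n)?)?)?)?)?)?\s+2(?:\.0)?(?!\S)",
    re.IGNORECASE,
)


def is_downgrade(cmdline: str) -> bool:
    """True if a process command line asks Windows PowerShell for the 2.0 engine."""
    return DOWNGRADE.search(cmdline) is not None
```

The `(?!\S)` guard keeps `-Version 2` from matching inside values like `20` or `2.5`.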
So I don't think PowerShell is going away: one, because it's super handy for living off the land, and Windows is still kind of the big enterprise server and workstation environment. Having said that, the .NET and Visual Basic functional keywords that I use in hunting find a decent amount of Windows Script Host and VBScript stuff, too. That stuff all still runs, and so does JavaScript, so non-binary script attacks, or the script-loading preamble portions of attacks, are not going anywhere. I think I could probably talk on that topic for a solid decade, if not till I retire.
Okay, trivia question for the giveaway: if I'm looking at a piece of base64 and it's null-padded, what's the character it's gonna have the most of? A? Which A? The capital A. All right, that's right, capital A. All right, thank you very much. [Applause]
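The trivia answer follows from how base64 works: a null byte is all zero bits, and the all-zero 6-bit group encodes as `A`, so three null bytes become `AAAA`. A short Python check of that, plus a hypothetical `dominant_symbol` helper that reports the most frequent symbol in a blob, is below. The same effect shows up in PowerShell's `-EncodedCommand` strings, since UTF-16LE puts a null byte after every ASCII character: `IEX` encodes as `SQBFAFgA`.

```python
import base64
from collections import Counter


def dominant_symbol(blob: str) -> str:
    """Most frequent base64 symbol, ignoring padding and line breaks."""
    counts = Counter(ch for ch in blob if ch not in "=\r\n")
    if not counts:
        raise ValueError("empty blob")
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    header = b"MZ\x90\x00" + b"\x00" * 60  # PE-style header followed by null bytes
    print(dominant_symbol(base64.b64encode(header).decode()))  # prints A
```

The same header also shows why base64-encoded PE files tend to start with `TVqQ`: that is simply the encoding of `MZ\x90`.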