Julian Wayte - Resource Smart Detection with YARA and osquery

BSides Boston · 37:06 · 104 views · Published 2020-11
About this talk
Traditional file-hash malware detection is relatively easy to circumvent: threat actors easily morph code to create "new" variants, rendering old IOCs useless. YARA uses a different approach. Its rules match small segments of code within the malware, making traditional morphing techniques ineffective. The challenge can be knowing which files to scan with YARA, as scanning everything can be expensive. This is where osquery comes in: it can tell us exactly which files have been executed, and therefore which files to scan. Even if a file has not been executed, osquery can use an alternative approach, creating whitelists from golden images, to identify unrecognized binaries. This session will provide:
- An introduction to three open source tools: JA3, YARA, and osquery
- Benefits of using targeted osquery YARA scans (vs. full system YARA scans)
- Configuring and running YARA detections via osquery for changed files and processes that have run
- A look at how to use JA3 functionality to detect malware, then using targeted osquery-YARA scans to look for that malware across other systems
Transcript [en]

well welcome everyone good to have everyone join looks like we've got a pretty good audience so let's get started i'm gonna talk today a little bit about osquery the endpoint tool that's used to turn your operating system into a sql-like database runs on windows linux and mac and then a little bit about yara as well the swiss army knife for pattern matching that's often used in identifying malware families we'll talk a little bit about a couple of malwares that were found in the wild and how the traditional method of using the md5 sha-1 or sha-256 file hash is kind of obsolete and there are new

techniques coming out for things like imphash and the ja3 tls signature for example and using yara to better detect malwares and variants of malware so that when you get a zero day you have a better chance of detecting it based on variants that have been identified in the past so let's go ahead and dive in so we'll start off talking about the konni remote access trojan malware this was identified and associated with the advanced persistent threat group 37 and the way this malware worked is it started off with a phishing document it was a microsoft word document that had a macro in it and some hidden text fields inside of it so that when the

user downloaded the file and executed the macro this document would go out and download the payload from the command and control server then that payload would capture the data on the system and exfiltrate it so this malware family was observed at four different times in 2019 and the malware in each case was different but there were aspects of it that were the same and could be identified with yara so if we look here at some of the different variants of the malware these were dll files on the windows environment we can see that the sha-256 values of those malware dll files were different in all cases but because the same code was reused just

variations of that code there were things that were the same about it the imphash for example so on the portable executable file there's a method to try and identify that file by looking at the order of the dlls that are present and loaded and you can see here that the imphash was the same across some variants even though the sha-256 was different and this is important because it shows that the code reuse by the malware threat actors can be detected by methods such as imphash but also yara as well so yara is a method that's looking basically at individual strings or regular expressions to identify subsections within a file or malware so each rule in yara consists of

typically a set of strings or regular expressions and then the condition of how many of those strings have to match in order for the yara rule to fire so you can get really quite specific with these yara rules to identify a set of conditions that will help classify or trigger on lots of different variants of the same malware if they reuse the same code components or encoding algorithms custom base64 encoding keys and so on you'll be able to detect those variants with the same set of yara rules how does the yara rule work what does it look like here's an example from the read the docs yara website you basically specify the identifying strings whether they're

ascii strings or hexadecimal strings you can put some metadata in there to describe the rule and then the condition in this particular one it's saying any one of these strings a or b or c if any of those are found then this rule will evaluate to true
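The rule structure described above (named strings plus a condition) can be approximated in a few lines of Python. This is only a minimal sketch of the matching idea, not the YARA engine and not the documentation's rule: the three patterns, the `matched_strings` and `rule_fires` helpers, and the two sample "variants" are all made up for illustration.

```python
import hashlib

# Hypothetical patterns standing in for a rule's "strings" section.
STRINGS = {
    "$a": bytes.fromhex("6a40680030"),  # hex string
    "$b": b"example-c2-marker",         # ASCII string
    "$c": b"custom-b64-table",          # ASCII string
}


def matched_strings(data: bytes) -> list[str]:
    """Return the identifiers of every pattern found anywhere in data."""
    return [name for name, pattern in STRINGS.items() if pattern in data]


def rule_fires(data: bytes) -> bool:
    """Condition '$a or $b or $c': true if at least one pattern matches."""
    return len(matched_strings(data)) > 0


# Two made-up variants: different bytes overall, one shared code fragment.
variant_1 = b"\x90\x90" + STRINGS["$a"] + b" build-a"
variant_2 = b"\xcc" + STRINGS["$a"] + b" build-b custom-b64-table"
clean = b"nothing interesting here"

for label, blob in [("variant_1", variant_1), ("variant_2", variant_2), ("clean", clean)]:
    short_hash = hashlib.sha256(blob).hexdigest()[:16]
    print(label, short_hash, rule_fires(blob), matched_strings(blob))
```

The two variants have different SHA-256 values, so a hash-based IOC for one would miss the other, while the string-based condition fires on both.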

so you can get very complex in these yara rule specifications you can specify what the entry point is for the rule so you know your string may be fairly short and so there's the risk of getting false positives but if you specify the entry point then you can reduce those false positives by saying i must see this particular string at this particular point in the file you can have some quite complex conditions like i see string a six times and string b more than ten times and you can put regular expressions as well if you don't want to specify the string if you're finding in the malware that there's a lot of variants that

match to a particular regular expression then you can use regexes as well inside of your yara rules you can specify things like the size of the file to make sure that you can exclude maybe small files that are false positives and a whole host of different things so the malware researchers on the case of the konni virus were able to specify yara rules that were able to detect the different variations in all of the attacks in 2019 and what they found was even though the dlls that were used in each case were different as well as the lure documents had differences in them by analyzing those they could set up

yara rules that could detect across all of those variations so here's an example of the yara rule that was created to detect all the different variants of the lure document that word document with the macro it specified strings like shell command line and byron vbhide it was a visual basic macro and then some other strings like auto open and so on and then the condition was that it had to have four of the s strings and one of the a strings
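A condition of the kind just described (a file size limit, four of the `$s` strings and one of the `$a` strings) can be sketched as follows. The string lists below are hypothetical stand-ins, not the strings from the researchers' rule, and `n_of`, `occurrences`, and `lure_rule` are illustrative helper names rather than YARA functions.

```python
# Hypothetical string sets; the real rule's strings are not reproduced here.
S_STRINGS = [b"Shell", b"cmd /c", b"vbHide", b"CreateObject", b"certutil"]
A_STRINGS = [b"AutoOpen", b"Document_Open"]


def n_of(patterns: list[bytes], data: bytes, n: int) -> bool:
    """Like 'n of ($s*)': at least n distinct patterns occur in data."""
    return sum(1 for pattern in patterns if pattern in data) >= n


def occurrences(pattern: bytes, data: bytes) -> int:
    """Like '#a': how often one pattern occurs (non-overlapping count)."""
    return data.count(pattern)


def lure_rule(data: bytes, max_size: int = 5 * 1024 * 1024) -> bool:
    """Sketch of: filesize < max_size and 4 of ($s*) and 1 of ($a*)."""
    return (
        len(data) < max_size
        and n_of(S_STRINGS, data, 4)
        and n_of(A_STRINGS, data, 1)
    )


macro = b'Sub AutoOpen()\n  Shell "cmd /c certutil -decode a b", vbHide\nEnd Sub\n'
partial = b'Sub AutoOpen()\n  Shell "notepad", vbHide\nEnd Sub\n'

print(lure_rule(macro))    # four $s strings and one $a string are present
print(lure_rule(partial))  # only two $s strings are present
```

Requiring several strings at once is what keeps short, common strings such as `Shell` from producing a hit on their own.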

here's an example of the yara rule that was used to detect the dll and the payload that was downloaded so in this case they were looking for things like temp.zip post.text and other calls to the dlls all defined there in different strings some of them ascii strings and in other cases hexadecimal code strings and then there's a condition there that says which of those strings is required so in all the different variants of the malware this particular set of yara rules was able to fire and detect even though the variation was significant here's some additional resources for yara there's some cool tools out there like airbnb wrote this binaryalert tool so that you can

trigger yara scans on files as they're uploaded to s3 for example there's also a nice tool if you use cape where you can extract the payloads from malware and scan those and extract the different iocs indicators of compromise like ip address and domain associated with the command and control server so there's quite a few tools out there that you can use with yara and as well as the tools there's a whole bunch of rules that you'll see out on github and other sites so if you want to scan using existing yara rules there's quite a bit available so it's quite easy to install yara if you want to just test out some yara rules against some

files on your system it's just a matter of installing it and then when you run yara you specify the .yar file a set of yara rules and then the file that you want to scan or the directory that you want to scan so here's an example where we're using rules.1 yar file i think this had nine different yara rules in it you could have hundreds or thousands of yara rules but we'll see in a second that because we're doing some quite complex comparisons in yara as the number of files increases that you're scanning it's going to take longer to run and there's the number of rules as well as your rule set

increases it's going to take longer to scan but you can go ahead and run a simple yara scan using a command like this and it's available on all the different platforms mac windows or linux so if we look quickly at how the number of files affects the yara scanning because this is a significantly more complex operation than calculating an md5 hash or a sha-1 or sha-256 hash because it's looking at each of those strings in each of the files that you're looking to scan and then depending on the length of those strings if it's quite a large file and you're looking for short strings you might get a lot of hits a lot of

false positives or if you're using a regular expression and if you have a very generic regular expression then it might take a lot of time to do a scan so here we see an example where we scanned up to 500,000 or 600,000 files and it took about 120 seconds now this was only with nine yara rules so if you're using an extensive set of yara rules maybe it's in the thousands then it can take quite a lot of resources on your machine so here's a little bit of an analysis of how the scan time varies with the number of yara rules so we went up to about 250 yara rules

in this case and it took about oh sorry 120 yara rules took about 250 seconds on our test set so yara does come with some performance type metrics that are given to you when you run yara you can see if the rules that you've created to identify the malware are going to be slowing down your scanning so here we see there's a warning that a specific string contains dot star which can really slow things down it's also identifying here another string the f string it says it's slowing down your scanning and that says this particular one is critical so you might want to look at adjusting that string and then it will

tell you if it's slowing down the scanning and it's not critical and this is important because as you're using a larger set of rules and a larger number of files if you don't address these tuning aspects of your yara rules you can use a lot of resources so in our example we looked a little bit at the cpu this was a two cpu machine so it was pretty well pegging the machine during the yara scans you can see here 197 193 percent cpu usage so when you have a large number of files and complex yara rules and a large number of yara rules it can really use a lot of resources on your machine
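The kind of tuning warning mentioned above (an unbounded dot star, or a string that matches too often) can be imitated with a small pre-check over a rule's strings. This is a rough sketch with hypothetical heuristics: the `MIN_LITERAL_LENGTH` threshold, the `lint_string` helper, and the sample strings are assumptions, and YARA's own checks are more involved than a substring test.

```python
# Hypothetical threshold; YARA's real heuristics are more involved.
MIN_LITERAL_LENGTH = 4


def lint_string(identifier: str, pattern: str, is_regex: bool = False) -> list[str]:
    """Return human-readable warnings for one rule string."""
    warnings = []
    # Naive substring test: an escaped dot followed by '*' would also be flagged.
    if is_regex and (".*" in pattern or ".+" in pattern):
        warnings.append(
            f"{identifier}: unbounded wildcard, consider a bounded form such as .{{0,64}}"
        )
    if not is_regex and len(pattern) < MIN_LITERAL_LENGTH:
        warnings.append(
            f"{identifier}: literal shorter than {MIN_LITERAL_LENGTH} bytes will match very often"
        )
    return warnings


rule_strings = [
    ("$url", r"http://.*\.php", True),   # regex with an unbounded wildcard
    ("$f", "MZ", False),                 # very short literal
    ("$marker", "example-marker", False),
]

for identifier, pattern, is_regex in rule_strings:
    for warning in lint_string(identifier, pattern, is_regex):
        print(warning)
```

Running a check like this before deploying a large rule set is one way to catch the strings most likely to dominate scan time.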

so it's not really effective because of the complexity of the yara scanning engine particularly with complex or large sets of rules it's not really cost effective to scan every single file on your system so we're going to look at a more effective way to triage your scans and look at how you can identify a subset of files on your systems that you can scan so osquery is very nice for that it provides a way to identify all of the processes that are running on your system and also it has an ability to do file integrity monitoring which means look for files that change on your system so we're going to look at how we

can configure osquery so that when files change on your system like you get a new file downloaded on your user system that you can just scan that file with yara or you can set it up so that those processes that run on your system potentially the payload or the word document when the user opens it typically that set of processes that actually is running on your system is quite small and so we can configure osquery to use yara to scan just those files so what is osquery it's a nice endpoint agent that was originally developed by facebook and it's now an open source tool it's used for security compliance and devops use cases and here's an

example what it does is it turns the operating system into a database so we used to have to run commands like ps -ef and then parse out the content you know pipe it to grep and awk and sed and so on now you just do a sql statement so here's an example of a sql statement that's selecting some processes that are running on your machine and then you can add optional where clauses in this particular case they're looking for those processes that do not have any file anymore on disk so you know one of the defense evasion tactics of malware is once it executes to delete the on disk file so that it

can't be detected by an antivirus tool that's scanning the files so osquery is a nice tool it's fairly lightweight in terms of endpoint agents and now it's open source i think the linux foundation owns it now and it runs on mac windows and linux so it's easy to download and it has quite a lot of telemetry that's available you can see here this is from the osquery.io website and it's got 263 tables in this version of telemetry that's listed across windows linux and mac i think on windows it's close to 100 tables on mac it's close to 150 and on linux it's like 140 i think

but across all three operating systems there's 260 tables worth of telemetry so it's a really nice tool if you're interested in getting visibility into your assets you can look at things like the apps that are installed on mac the programs that are running on windows the registry entries the startup items on linux you look at the cron entries you can look at all of the processes that are run on your machine all of the network connections via process_open_sockets and then it has a separate eventing framework which is a pub sub framework so that no data is missed because in osquery there's really two methods of capturing the data one is through

regular scheduled queries and the other is through this eventing framework the pub sub framework so there'll be tables like process_events and socket_events where as every process runs or as every socket is opened to a remote machine like a command and control server maybe that is stored in a rocksdb backing store and then when you subscribe like if the daemon is running maybe every 30 seconds or every minute or so and sending that data out to some platform like maybe an elk stack or a splunk or something like that then the nice thing about the pub sub framework is that no data at all is missed so it's using

on linux the k audit framework for example so a lot of different applications for osquery and one of those applications is yara yara comes with osquery you don't have to download a separate version of yara like i showed earlier but osquery also comes with other tools like augeas to pull back as database rows the entries within configuration files like /etc/passwd or /etc/hosts or you know if you have apache config files or whatever it is you can pull that data back to return it into rows in osquery so there's really a lot of flexibility there with osquery so the nice thing about

this pub sub framework that i talked about where it's looking at every single process that is executing on your machine you can send that data out or you can use that information in conjunction with yara so as i mentioned earlier you could say let's use osquery to determine all of the processes that are running that would be the processes table or you could use the event based table which is the process_events table and then use that to trigger a scan using a set of yara rules that you have so instead of having to scan 2000 3000 files on your machine you could just scan the 10 processes that are running so you can read a little bit more about

osquery and its pub sub framework there on the read the docs website so in addition to the pub sub framework there's that file integrity monitoring feature so that osquery can tell you if you set up the config to say tell me when there's any changes to these files like the user's download directory or the ssh keys on your system or the /etc or binary files on your system you can configure osquery to say these are important directories for me and i want to know when they changed and then you can configure osquery to trigger yara scans of those files that were detected to have changed
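Since osquery exposes this telemetry as SQL tables, the targeting idea above can be tried out against an in-memory SQLite database that mimics a few of them. This is a sketch under stated assumptions: the tables below are simplified stand-ins with only a handful of columns, the rows are invented, and nothing here talks to a real osquery instance.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE processes (pid INTEGER, name TEXT, path TEXT, on_disk INTEGER);
CREATE TABLE process_events (pid INTEGER, path TEXT, time INTEGER);
CREATE TABLE file_events (target_path TEXT, action TEXT, time INTEGER);
""")
db.executemany("INSERT INTO processes VALUES (?, ?, ?, ?)", [
    (101, "sshd", "/usr/sbin/sshd", 1),
    (202, "dropper", "/tmp/.x/dropper", 0),  # binary deleted after launch
])
db.executemany("INSERT INTO process_events VALUES (?, ?, ?)", [
    (101, "/usr/sbin/sshd", 1000),
    (202, "/tmp/.x/dropper", 1005),
    (303, "/usr/bin/curl", 1010),
    (304, "/usr/bin/curl", 1020),
])
db.executemany("INSERT INTO file_events VALUES (?, ?, ?)", [
    ("/home/alice/Downloads/invoice.doc", "CREATED", 1003),
    ("/etc/hosts", "UPDATED", 1011),
])

# Running processes whose executable is no longer on disk.
missing = db.execute(
    "SELECT name, path, pid FROM processes WHERE on_disk = 0"
).fetchall()

# Scan targets: every path that executed plus every monitored file that changed.
targets = [row[0] for row in db.execute("""
    SELECT DISTINCT path FROM process_events
    UNION
    SELECT DISTINCT target_path FROM file_events
    ORDER BY 1
""")]

print(missing)
print(targets)
```

The resulting target list is what would be handed to a YARA scan, instead of walking every file on the system.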

so osquery it's very easy to download and install it comes on linux centos red hat as an rpm so you can just install it with rpm -i and then once you've got it installed you can configure it to activate the pub sub framework so that you can start getting process events recorded for every single process that runs so you have to run it with disable_audit equals false osquery loves the double negatives but you can run it and turn on the event based capture and start to get a record of every single process that runs in the process_events table once you turn on that pub sub framework

you can then go into the osquery configuration and specify a set of yara rules so here you can see this rule 60.sig file it's a signature file of 60 yara rules in this case and you specify that in your osquery configuration file and then you can use that to do scanning
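A configuration of this shape can be generated and sanity-checked with a short script. The sketch below assumes the layout used by osquery's YARA support (a top-level `file_paths` section for file integrity monitoring and a `yara` section with `signatures` and `file_paths`); the group name `sig_group_1`, the category names, and every path are hypothetical placeholders, and `check_config` is an illustrative helper, not part of osquery.

```python
import json

config = {
    # File integrity monitoring categories (% = one level, %% = recursive).
    "file_paths": {
        "homes": ["/home/%/.ssh/%%", "/root/.ssh/%%"],
        "etc": ["/etc/%%"],
    },
    "yara": {
        # Named groups of YARA rule files (hypothetical location).
        "signatures": {"sig_group_1": ["/etc/osquery/yara/rules.sig"]},
        # Which signature groups to run when a category reports a change.
        "file_paths": {"homes": ["sig_group_1"], "etc": ["sig_group_1"]},
    },
}


def check_config(cfg: dict) -> list[str]:
    """Return a list of inconsistencies between the FIM and YARA sections."""
    problems = []
    for category, groups in cfg["yara"]["file_paths"].items():
        if category not in cfg["file_paths"]:
            problems.append(f"category {category!r} has no file_paths entry")
        for group in groups:
            if group not in cfg["yara"]["signatures"]:
                problems.append(f"signature group {group!r} is not defined")
    return problems


print(json.dumps(config, indent=2))
print(check_config(config))
```

Checking the cross-references up front is useful because a category mapped to an undefined signature group would otherwise only show up as missing scan results.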

so here's an example where we've turned on the pub sub framework so we're starting to capture process events and then we can run a query like this where we can scan using yara for every process that runs on the system so we're just saying for every process that runs on our system select distinct path from process_events go ahead and run a yara scan select star from yara using a particular set of yara rules which is the sig group one and that's pointing to that yara rules file and you can see the result in this case it's got a hit on a certain file and it's going to

show you the rule name and in this particular case it was angler worm and then it'll show you the strings that you've defined in that yara rule that are matching so this is a nice way to be very targeted in how you're scanning your files using osquery and getting all of the advantages of the other security related pieces of telemetry that you can capture using osquery here's how you set up osquery so when you have a file that's changed to go ahead and do a yara scan on that file so you set up the file integrity monitoring piece so in this particular case you'll see that we've set up file paths

in osquery to do file monitoring in the user's home private directory in the root ssh keys all of the users ssh keys and in the /etc directory and once we set up those file paths then we also set up the yara set of rules the signature group which is that rule 60 yara file in this case that has the yara rules and you then map the fim path sets homes and etc in this case to the appropriate set of yara rules and once you have that set up inside of osquery whenever you get a change to one of those files osquery is going to report it as a fim event file integrity

monitoring event and it will also do a scan using yara against that file so you'll be able to see if there's any malware based on those yara rules that you've built and here's what that looks like with fim turned on as those files get changed osquery is going to go ahead and scan them with yara and you'll see if you get any hits so on the very bottom right you can see that we have the evilosx remote access trojan yara rule has found a match on this one particular launcher dot b5 bo d8 dot py so the nice thing about running yara with osquery is that the resource usage

is particularly lightweight in this particular example we saw that we have about a six percent resource usage or five percent sorry on osquery this was run through osqueryi which is the interactive version you can also run it in a daemon version osqueryd so that it's doing these kinds of processing automatically in the background but this approach was really lightweight so the amount of resources used because it was only triggered on files that changed or on processes that ran was really significantly reduced so if you were looking for a malware and you'd maybe sandboxed the malware and built a

set of yara rules or used a set of yara rules that were available already and you wanted to check across your fleet or all of your machines using this approach it's fairly lightweight because you're only going to be scanning those directories that you've set up with fim when the files change or you're only going to be scanning those files that have actually executed on your machines and were recorded with osquery via the process_events table so in conclusion yara is a really powerful tool but it can use a lot of resources and using osquery is a really nice way to reduce that resource usage by using the file monitoring feature as well as the

pub sub framework to monitor the processes that are run and that works really nicely in conjunction with yara so i kind of wanted to open it up to questions now if anyone had any questions about osquery or about yara feel free to fire them off

hey julian i need to double check are you in discord on the track two channel track two i am not there yet is that where the questions are coming through yes okay i haven't seen any questions just yet but okay i can read them to you yeah that would be great yeah okay so i do have one from mr clark have you seen any instances of malware specifically trying to detect osquery

malware trying to detect osquery i have not seen that are you thinking about it from a point of view of them trying to detect if osquery is running as a kind of an anti-malware engine and so trying to thereby avoid it he hasn't he's typing right now so we'll just give him a second

he said yes exactly right yeah i've not seen any malware looking for osquery specifically you know because it could be looking for yara as well but the fact that yara is run from osquery in this case that would be kind of hard to detect because it wouldn't matter whether you're running yara on a regular scan outside of osquery that could be a possibility as well okay i have another question from hams what permissions are needed to run on a network yeah osquery typically runs on an endpoint itself so different from you know running just on the network but the osquery endpoint

you can run it with regular permissions as a regular user or you can run it as root with elevated permissions the difference is that running regular osquery you'll be able to get access to most of the telemetry so for those 236 tables or 263 tables you'll be able to get i think there's about 250 plus of them that are available with regular permissions but you do need root permissions in order to activate the k audit framework for example the pub sub framework on linux so typically most people run the osquery daemon as root so that they're able to capture the audit-based events including all of the network socket

information and that's one thing that can be quite noisy on a lot of servers is the network activity so with osquery when you turn it on the amount of volume on the socket events can be quite significant and it's an interesting way that people use osquery in a malware detection scenario is capturing the socket events to see if their endpoints are connecting to a command and control server so they typically will have a list of iocs based on known bad ip addresses but also domain information so newly registered domains and lists of domains from dgas domain generation algorithms if their machines are connecting out

to those domains then they'll flag it as a possible malware because certain versions of osquery like the uptycs version of osquery can capture the domains that are being resolved by your machines and that requires the root permissions as well
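The IOC matching described in this answer comes down to comparing each captured connection against lists of known bad addresses and domains. Below is a minimal sketch of that comparison; the `Connection` record, the `matches_ioc` helper, the indicator lists, and the sample events are all hypothetical, and the addresses come from documentation-reserved ranges rather than real threat data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Connection:
    """Hypothetical record combining a socket event with a resolved domain."""
    pid: int
    path: str
    remote_address: str
    domain: Optional[str] = None


# Placeholder indicators, not real threat intelligence.
KNOWN_BAD_IPS = {"203.0.113.7"}
KNOWN_BAD_DOMAINS = {"c2.example"}


def matches_ioc(conn: Connection) -> bool:
    """True if the remote IP is listed or the domain is a listed name or subdomain."""
    if conn.remote_address in KNOWN_BAD_IPS:
        return True
    if conn.domain is None:
        return False
    name = conn.domain.lower().rstrip(".")
    return any(name == bad or name.endswith("." + bad) for bad in KNOWN_BAD_DOMAINS)


connections = [
    Connection(101, "/usr/bin/curl", "198.51.100.20", "updates.example.org"),
    Connection(202, "/tmp/.x/dropper", "203.0.113.7"),
    Connection(303, "/usr/bin/python3", "192.0.2.44", "beacon.c2.example"),
]

flagged = [conn.pid for conn in connections if matches_ioc(conn)]
print(flagged)
```

The executable paths of flagged connections are natural candidates for a targeted YARA scan on the same endpoint.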

okay i did see some people typing but i don't see any more questions in track two so i'll give them a moment okay yeah while we're waiting there you know that question kind of brought up some other ways that people are using osquery to detect malwares and that is by looking at the client server connection in the malware to the command and control server so sometimes malwares nowadays will encrypt the traffic with their own tls protocols and so salesforce developed a technique of fingerprinting tls communication called ja3 and what it does is it looks at the clear data in the client hello message and will take certain

key fields and hash those out to create a tls fingerprint or a ja3 fingerprint of the client server connection to the malware and some versions of osquery can calculate that tls fingerprint or the ja3 fingerprint so that even if the underlying code in the malware changes significantly you'll still have the same tls fingerprint for that malware as it's communicating with the command and control server so there's lots of nice ways that osquery can be used not just with yara but a lot of stuff around the networking side as well okay i don't see any more questions julian thank you so much for your presentation please don't leave the gotowebinar

i'll go ahead and switch you out oh actually it looks like oh mr clark said um awesome presentation and also thank you for your presentation so great well thanks everyone for joining appreciate it
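The JA3 fingerprint mentioned in the Q&A is, in its published form, an MD5 hash over five client hello fields written as decimal values: the TLS version, cipher suites, extensions, elliptic curves, and point formats, with dashes inside a field, commas between fields, and GREASE values skipped. The sketch below follows that description; the function names and the sample field values are illustrative, and parsing a real client hello is out of scope here.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE placeholders look like 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Build the comma-separated JA3 string from already-parsed fields."""
    fields = [ciphers, extensions, curves, point_formats]
    parts = [str(version)]
    for field in fields:
        parts.append("-".join(str(v) for v in field if not is_grease(v)))
    return ",".join(parts)


def ja3_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """MD5 of the JA3 string, as a 32-character hex digest."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Illustrative client hello values (decimal), not taken from any real sample.
hello = (769, [47, 53, 5, 10], [0, 10, 11], [23, 24, 25], [0])
with_grease = (769, [0x0A0A, 47, 53, 5, 10], [0x1A1A, 0, 10, 11], [23, 24, 25], [0])

print(ja3_string(*hello))
print(ja3_hash(*hello) == ja3_hash(*with_grease))
```

Because the hash depends only on how the client negotiates TLS, two builds of the same malware with different file hashes can still share one fingerprint.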