
Hi, hello! My name is Kuba, I work as a software engineer on the security team at Yelp, and I'm here today to tell you about how we automated our malware incident response at Yelp. First, a couple of words about me: I started at Yelp around two years ago, and since then I've been mostly involved in our malware incident response process, and in the meantime in automating and working on our security processes. Before that I worked at SAP in Sophia Antipolis, France, in a security and trust research group, and prior to that I was studying in Krakow while doing a joint degree with EURECOM, also in Sophia Antipolis.
So, just a quick recap: Yelp's mission is to connect people with great local businesses, and over the past 12 years since the company started, this has led to over 102 million reviews, as you can see on this slide. 90 million of our users come from the mobile website and applications, and around seventy percent of searches also come from the mobile apps and the mobile website. Yelp is now present in 32 countries worldwide. What I'm trying to say by showing you all these stats is basically that we have more than 4,000 employees by now, and most of them use MacBooks to do their daily job.

That brings us to our malware incident response process. It typically starts when we get an alert about one of our employees' machines: either a detection by our endpoint monitoring software, or a network monitoring tool that flags suspicious binaries on the user's machine or suspicious network traffic coming from a particular machine. But let me first introduce the people involved in this process. First, we have a Yelp employee, who typically uses a MacBook to perform their daily duties, and who from time to time will wander to some gray sides of the internet, for instance getting prompted to download the
latest update to Adobe Flash Player, or a free video converter app which comes packaged with malware, also for free. Then we have help desk engineers, who serve as an interface between users and us, the security team. Help desk engineers are the best people to perform this task: they have the best outreach to the users in terms of both time zones and locations, because Yelp's offices are in various locations around the globe. And we have the security team, which also includes malware analysts, the people who are in charge of analyzing the alerts about malware infections on someone's machine.

The job of a malware analyst is typically to answer these three questions: how did the malware get there in the first place, is the machine even infected or not, and how can we prevent and detect further infections, to stop malware spreading over our company infrastructure? There used to be lots of false positives in the alerts we were receiving, so we started involving malware analysts as early in the process as possible. We do this initial triage basically to establish whether an alert is real or a false positive, for instance whether it's a Windows threat on a macOS machine that we don't really care about. This is
to save us time on all the other tasks, like forensics collection and forensic analysis: if we can filter out as quickly as possible whether something is a real threat or not, it always saves us some time. Our traditional approach, after this initial triage, involved collecting forensics from the machine; I'll tell you about this a bit more in the next slides. This task used to be performed by our help desk ninjas. As I mentioned, they are the best people to do it because they live close to the users: they can just go grab someone's machine, take it off the network, run the necessary collection scripts, get the output back, and then the malware analysts can start analyzing the output to assess the risk related to the infection.

When it comes to the tools available for digital forensics collection on macOS, there is OS X Auditor. This is a script that has been more or less inactive for the past few months; it's open source on GitHub, and it lets you collect different properties from macOS machines. There is osquery, also an open source project, open-sourced by Facebook, which allows you to query different system properties as if you were querying a SQL database. KnockKnock is also quite a useful tool that lets you
figure out which processes are running on your macOS machine. This may give you more insight into whether these are known processes, like system processes, or something that was actually installed by the user, potentially packaged with malware. There is the Google Rapid Response (GRR) framework; this one is a bit more interesting, because it lets you collect file samples from the machines, which gives you even more insight into whether the machine is infected with something or not. I'd also like to mention a recently released book, OS X Incident Response: Scripting and Analysis by Jaron Bradley. The book comes with ideas for scripts you can use to collect various forensics from macOS machines, and it also gives you some ideas about how to analyze them.

At Yelp we use OSXCollector, a tool based on OS X Auditor. It's also open source on GitHub: a forensic evidence collection and analysis toolkit for OS X. We open-sourced it some two years ago; it was actually the first project I worked on when I joined Yelp, so I'm pretty proud that it's still up there and still used by people. Actually, let me get a quick show of hands: how many of you in the audience are familiar with OSXCollector?
A couple of hands, OK, cool. Basically, OSXCollector is a simple Python script that you run on a potentially infected machine. It collects various system properties, which you can see on this slide, and outputs them as a JSON file, so that analysts can take this file and, from all these properties, try to figure out whether the machine is infected or not. The way OSXCollector works is that it gathers all this information from plists, which are kind of like the Windows registry on macOS machines; from the various SQLite databases that macOS also uses to store system properties; and from other local file system information, for instance about applications installed on the system, browser history, browser extensions, things like that.

Here is an example of what a JSON entry collected by OSXCollector looks like. It comes with some common keys, like file paths, file hashes, and timestamps; there are also signature chains, for instance for binaries, which might be useful to figure out whether something is expected on the system or not.

What we used to do after collecting all these files from the potentially infected machine is that a malware analyst would sit down with some simple tools like grep or jq, which is actually quite a cool tool for working with JSON, and go through the output,
basically trying to find events that happened around a certain time frame. jq will also let you filter and show only the URLs related to the user's activity around a certain time frame, and based on that the analyst will try to figure out, for instance, where a file was downloaded from or when it was installed by the user, basically trying to answer the questions I mentioned earlier. This worked pretty well, but if you have 30,000 lines of JSON output it becomes a really tedious job of staring at a lot of JSON. And don't get me wrong, I like JSON; JSON is very pretty, it is simple.
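As a sketch of what that manual filtering looks like once you let code do it, here is a small Python function in the spirit of those grep/jq one-liners. The key names ("url", "last_visit_date") and the timestamp format are assumptions for illustration, not OSXCollector's exact schema.

```python
import json
from datetime import datetime

# A sketch of the time-window filtering an analyst would otherwise do by
# hand over OSXCollector output. We assume one JSON object per line; the
# keys "url" and "last_visit_date" are illustrative, not the real schema.

def urls_in_window(json_lines, start, end):
    """Return the URLs of entries whose timestamp falls in [start, end]."""
    hits = []
    for line in json_lines:
        entry = json.loads(line)
        url = entry.get("url")
        ts = entry.get("last_visit_date")
        if url is None or ts is None:
            continue  # not a browser-history-style entry, skip it
        visited = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        if start <= visited <= end:
            hits.append(url)
    return hits
```

Given a download alert at, say, 10:00, an analyst could call this with a window around that time and inspect only the handful of URLs it returns instead of all 30,000 lines.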
But it's also very easy to read and process from code, so why don't we let code process the JSON output? This is what we did as the next step. Early on, we automated the JSON analysis process with what we called OSXCollector Output Filters. What they do is augment the initial JSON with different properties, for instance information from our internal blacklists or from external threat intelligence APIs. They will also try to construct a list of files related to the potentially infected ones, and then produce a summary of findings. This is really cool; it automates the whole kind
of analysis process, but the tool itself, OSXCollector Output Filters, was quite tedious to maintain. Malware analysts had to get the tool installed on their machines when they started an analysis, and they also had to sit there and watch it running: if the machine went to sleep, or they closed the lid, or for whatever reason they lost internet connectivity, the whole process would halt and they would have to restart it, because the tool connects to various external threat intel APIs over HTTP and was not written in a way that allowed pausing and resuming at a certain point in time. Not to mention that whenever there was a new version of OSXCollector Output Filters, the malware analysts were in charge of updating the source code themselves and getting all the dependencies. It was a really tedious task, and not something we were looking to do in a process we were actually trying to automate.

We thought we could do better, so we turned OSXCollector Output Filters into a service, and we called the service AMIRA: Automated Malware Incident Response and Analysis. Right now, with AMIRA, all the analyst does is drop the OSXCollector output file into an S3 bucket, and AMIRA will automatically trigger the analysis of the new
object in the bucket. This is based on a feature called S3 event notifications: we have configured the S3 bucket to send a notification to an SQS queue whenever there is a new object in the bucket. This SQS queue is called, here on the slide, "AMIRA S3 event notifications". Whenever a new object is created, a notification is sent to this queue. AMIRA periodically checks for new messages in the SQS queue, and upon receiving one it fetches the related OSXCollector output file from the S3 bucket. As you can see, the OSXCollector output is packaged as a tar.gz file to save some space, because it's a lot of JSON, so we want to compress it as much as possible. AMIRA will first decompress the archive and extract the actual JSON file from it, and then it will run all the different analysis filters on the OSXCollector output file. After all this, it will send the results of the analysis, for instance, to another S3 bucket, so the malware analyst can fetch the results from that bucket, see whether the machine was infected, and basically read the whole summary of the analysis.

Here are examples of the analysis results that AMIRA produces. For instance, you'll see some domains and hashes that were found on the
blacklist that we curate. It will also give you an idea of the information found by contacting the external threat intelligence APIs, and it will provide suggestions: for things that were found via the external threat intelligence APIs but are not yet listed on your blacklist, it will suggest that you add them to the blacklist.

AMIRA doesn't require much configuration to run. Basically, all you need to do is set up the S3 event notifications yourself; it's well documented in the AWS documentation, so it's not really difficult. Then, to run AMIRA, you basically need to specify the SQS queue name and the AWS region where the queue was configured. There is also the possibility to specify results uploaders, and to add other results uploaders of your own. Results uploaders are basically a way for you to tell AMIRA what to do with the analysis results: you may want to add some other way of distributing the results, for instance sending the analysis results via email, or attaching them to your incident response platform if you have a more advanced system for triaging alerts. That's why I mentioned earlier that the results S3 bucket at the end is optional.
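To make that plumbing concrete, here is a minimal sketch, not AMIRA's actual code, of the step that turns an SQS message into the S3 objects to fetch. The nested field names follow AWS's documented S3 event notification format; the function name is made up for illustration.

```python
import json

# A sketch of handling one SQS message carrying an S3 event notification.
# AWS delivers notification bodies shaped roughly like:
#   {"Records": [{"s3": {"bucket": {"name": ...}, "object": {"key": ...}}}]}
# A service like AMIRA would poll SQS, run something like this on each
# message body, then download and analyze each (bucket, key) it yields.

def extract_s3_objects(message_body):
    """Yield (bucket, key) for each record in an S3 event notification."""
    event = json.loads(message_body)
    for record in event.get("Records", []):
        s3 = record["s3"]
        yield s3["bucket"]["name"], s3["object"]["key"]
```

Each yielded key would point at one uploaded tar.gz of OSXCollector output, which the service then downloads, unpacks, and runs the output filters on.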
Are there any questions so far related to AMIRA?

"Are you using it as a Lambda function, or are you running it on your own?"

So, the question is whether we are using it as a Lambda function or running it on our own instance. We are running it on our own internal instance. There were several factors: Lambda functions are really cool, but I found them quite tedious when it comes to importing external dependencies, like this whole OSXCollector Output Filters package. Our first idea, though, was actually to think about Lambda functions. Were there any other questions? OK, if not, let me continue.

You can actually go even a step further with all this and also
automate the forensics collection step. Instead of getting hold of the machine and running the OSXCollector script on it by hand, you could have a script that runs OSXCollector on the machine and, for instance, uploads the results to the S3 bucket. If you have a large installation in your company, like our roughly four thousand employees with macOS systems, you probably use some kind of inventory management system. You could think of just dropping in a script that runs the OSXCollector collection and then uploads the results to the S3 bucket, and that will trigger AMIRA to run the analysis on the results.

Here is an example of such a script. It's very basic, and I actually stole it from someone else; the only thing it does is calculate the signature for AWS S3 so it can send the file there and trigger the whole analysis process.

This whole automation saved us a lot of time: in certain cases it took us from several days down to hours. When we involved the help desk in the collection process, we had to wait for them; they were in different time zones, and sometimes they had to chase a user who was in yet another time zone, so the whole process could easily take
up to several days, and then the whole analysis, as I mentioned, could get interrupted. AMIRA takes all this effort away from you: you don't have to sit there and watch it collect information from the various threat intelligence APIs. It also makes heavy use of caching: OSXCollector Output Filters comes packaged with a basic cache that avoids issuing the same queries twice, for instance when users visited the same websites. Most users are actually visiting largely the same websites, probably 80 to 90 percent of them, so when the process was run by each individual malware analyst, all of them had to fetch pretty much the same information all over again. With AMIRA we are able to fetch this information once from the APIs and cache it, which saves us a lot of quota on the external APIs and makes the whole process even faster.

It also cut out all this interaction between the malware analysts and the help desk. Right now AMIRA takes care of the forensics collection and the analysis; obviously there is still a need for an analyst to review the results summary and provide remediations, which are then executed by the IT engineers on the help desk, but there is less human interaction, and fewer errors that could occur
along the way. Also, there is no need for physical collection, so there are even fewer problems with chasing users down the corridors and taking their machines off the network. We can just remotely run the script, collect all the forensics, get the analysis done, and then the malware analysts can sit down, look at the analysis, and figure out whether there are any false positives or any other problems.

It also allowed us to do more proactive forensics collection. Right now, even on machines we are not really sure are infected, where we haven't received any alerts but there is potentially some suspicious network activity visible from our DNS resolvers, we can proactively run AMIRA, collect all the forensics, analyze them, and figure out whether the machine is impacted or not.

The whole thing is open source, so go try it out. I'm really looking forward to any questions related to the project, or any issues that you've spotted; if you have any suggestions, don't be shy, create pull requests and I'll try to review them. And on that note, I'd like to thank you for coming, and I'm open to any questions.
"What kind of false positives do you see?"

I'd say it's probably way more than 80 percent, like this 80/20 rule, right? Some of them are clearly false positives, and some of them are "yes, it is a threat, but it's not applicable to this particular machine". AMIRA basically helps us figure this out, because even for certain threats that our endpoint monitoring says are Windows-only, we'll still get the alert and then have to figure out whether it is seriously Windows-only, for instance some browser extensions that aren't really tied to any particular system. So that's how we can also analyze it.

"Are you planning any integrations
with sandboxing technologies, Cuckoo or anything like that?"

So, regarding sandboxing: this is purely for forensics collection, so there is no sample collection. Obviously it would be very interesting to connect it with something that could process samples, but then we'd face the problem of how to transfer the samples along the way. The Google Rapid Response framework, for instance, does let you pick up a sample, and I guess at that point we would also be able to have more reliable analysis when it comes to the files themselves. So far we operate basically on file hashes and file names, things like that; URLs too. Sometimes we actually collect a sample from the original URL rather than from the machine. To give you a heads-up on the remediation process: apart from just getting rid of the threat on a particular machine, what we also try to do is block domains and block IP addresses serving malware. At this point it is actually more important for us to know where the threat came from, and if we are able to pinpoint it to a particular domain or URL and get a sample from there, then we know we have to block it. So it's actually more related to what we do later
in that step.

"Now that you're able to collect and analyze a lot more at scale, have you found particular indicators or particular types of data that were not worth collecting?"

That were not worth collecting, or were worth collecting? Yeah, so there are particular parts of the forensics collection done by OSXCollector: it tries to get as much information as possible, which is why you get 30,000 lines for a machine that has been running for several months, and the whole analysis process also takes longer because of that. We decided, for instance, not to look too much into cookies collected from the browser. There were several issues with that; for one, collecting cookie values from someone's machine is a security issue in the first place, because you're collecting a lot of information that should not leave the machine, or at least there is an assumption that it's not leaving the machine. Sometimes we also get noise from some of the filters: there are filters that try to extract domains from particular URLs, and filters that try to build a kind of graph of related files, because if some files are related to an infected one, it's potentially interesting to look at them as well. These parts are mostly too noisy to be taken seriously, and we very
seldom look at them.

"On the latest Mac hacks out there using GPU graphics attacks, are you looking at those?"

Not really; I'm not familiar with them.

"I was wondering if you keep your OSXCollector files and periodically re-run them as your threat feed updates, and whether that's proven of any value to you?"

So far we are keeping the malware forensics that we collect from the machines, but it's not really stateful yet; apart from the cache I mentioned earlier, it doesn't create any state. There was actually a project presented at SAINTCON in Utah last year where people were trying to put all this information into, I think it was MongoDB, but you may also think: OK, let's put it in something like an Elasticsearch cluster or Splunk and query it. These are potential next steps for this project: apart from looking at one machine individually, let's try to see how this machine differs from all the other machines on the network. This is actually something osquery is able to do, for instance: taking a whole fleet of machines and comparing one machine against all the others in the same infrastructure.

"What's the runtime of the collection and of AMIRA? I mean, how long does it take on each laptop, and how long
does it take to process?"

The whole collection process, just the pure OSXCollector script, depends on how long the machine has been in use. If the machine has been used for several years, the browser history is huge, and there are lots of apps installed, it might take quite some time.

"So worst case, up to a day, let's say?"

Yeah, we had cases when it ran for about a day, but that was still in the good old days of help desk engineers trying to get hold of the machine. Now that we run it through our inventory management system, to be honest we don't collect much insight about when the collection was started; we only know the time when it finished. So you don't know whether a collection took so long because, for instance, the user was not in the office for a day and their machine didn't check in to the central inventory management system. In the best case, when I was preparing this presentation, I had a run where the whole collection and analysis process, and the analysis alone can take up to several more hours, took eight minutes from collection to having the results
available for the malware analyst to look at.

"You mentioned you started doing proactive collection. Have you looked at collecting that data and trending on it over time in something like Elasticsearch or Kibana?"

This is something we are not doing yet, but it's definitely something I'd like to work on as a next step, basically now that we've automated all of this.

"Kind of similar, with respect to active defense: do you have a particular threshold that you have to meet before you deploy the script, or do you just have the script already available and run it across all your machines? Do you have to reach a threshold?"

Yes, we do have something like a threshold: it's basically the alerts we get from endpoint monitoring or network monitoring, which are the initial trigger for the whole malware incident response process. But we've had cases in the past when we suspected part of the fleet was infected with something, and then with just one click we were able to deploy the script to hundreds of machines, get the analysis done by the next day, and look at it. So it's way more scalable than our previous approach in this matter.

"What kind of file sizes does OSXCollector put out? Have you seen the file sizes?"

Yeah, in terms of the
collection: I'm not sure if it was actually on the slide, but the compressed tar.gz file is usually several megabytes, and if you decompress it, it's something like 60 or 80 megabytes.

Any other questions? OK, then, thank you!

[Host] Thank you, Kuba. Let's hear it for him!