Lessons learned in automating the incident Life Cycle

BSides Athens · 2022 · 19:03 · 122 views · Published 2022-06
Category: Technical
About this talk
Alexander Sinno and Walter Stinkens share hands-on experience building security operations centers with SOAR platforms. The talk covers designing an automated incident life-cycle framework that reduces alert workload by 97%, including triage, enrichment, incident management, analysis, and remediation. Key topics include platform selection, SOAR development methodologies using Scrum, and live demonstrations of automated workflows that achieve containment in under four minutes.
Original YouTube description
Abstract: Over the past several years the volume of alerts coming into the SOC has been untenable, especially considering the global deficit in cyber security personnel and the effort required to train an analyst. Creating and structuring an operating model based on SOAR with multiple layers of abstraction, including enrichment, incident management, analysis, notification and remediation, will not only drastically reduce the workload of the L1 efforts in your SOC, it will also help your team focus on the important events. A common misconception about a strong SOC is that it should focus purely on detections and threat hunting capabilities from the start; however, it is more important to first have an operational framework in which you are able to capture the threat actor expeditiously and react to the attack with aggression by cutting them off in the kill chain. In this talk we will explain how to build your Incident Life-Cycle. This is an important aspect of the operating model in your SOC, leveraging the power of automation in order to react quickly and decisively in the event of a real breach. There are multiple facets of this presentation that will help SOC professionals achieve a high level of maturity, through SOAR development methodologies, tying your automations into your operation and achieving extremely fast reaction times in your SOC. This talk will present real hands-on experience from the past several years of building SOCs and the lessons learned of what to do vs what not to do. It will cover choosing your platform, operating and maintaining it, and implementing your designed incident life-cycle, including a live demo of our current automated workflows.
Bio: Alexander Sinno is an expert in cyber security operations. He has experience in building SOCs around the world and started his security career in the US Military. All of the operations he has built have been on SOAR, with a focus on the overall Incident Life-Cycle framework for controlling the flow of an event from ingestion to remediation.
Transcript [en]

Today we're going to talk about lessons learned automating the incident life cycle within the context of a SOC. But first I want to make a quick introduction. My name is Alexander Sinno. I'm the head of the NVISO Fusion Center operations. I have some experience building security operation centers out in the Middle East. Previously I worked for the largest MSSP in the world, Dell SecureWorks, as a senior intrusion analyst, and before that I was in the United States military as a fire support specialist. I would also like to introduce my colleague, who is co-presenting with me today, Walter Stinkens.

Thank you, Alex. I'll also quickly introduce myself. My name is Walter Stinkens. I've been working at NVISO for four years. I have a background as a DevOps engineer and in cloud security. As a DevOps engineer, a big part of your work is automating stuff, and that's also what I've been doing as the SOAR engineering team leader at NVISO.

All right, thank you, Walter. Just before we dive into the cool stuff like the automations, we should break down why automation is so important. When you look at the traditional SOC, each analyst has about 20 minutes to handle a security event. Those numbers actually add up quite a bit. Most of the time a SOC

analyst can actually handle a maximum of 25 security events per day, and the big problem is that we have a huge lack of personnel worldwide. So how do we deal with that? We know from our experience that automation is key. We took a seven-day time span and calculated that we ingested 647 security events, and if you use the math displayed above, that would take about 26 analysts to handle. So you do the math on what type of cost that is. And just to have 24x7 coverage alone, without factoring in all the security events, you need about 12 analysts. When we saw those numbers we decided

immediately to start automating and assisting the analysts with automations. The first thing we saw is that our automations decreased the analytical workload by 97.42%. Of 5,790 alerts that we took in during that time period, only 145 were manually analyzed in seven days by the SOC. This reduced the cost to 1,483 euros, as opposed to 59,251. So the math is there, and it makes a lot of sense to handle it this way. Now, to explain a little about why this is important and why it's so effective, let's look at the traditional SOC. In a traditional SOC, this is what you would have: isolated components, none of them talking

to each other or working with each other, so everything sits in a different silo. That significantly increases the number of clicks for an analyst to resolve an incident and find the information they need, and it makes them less apt to go into all those individual devices at the end of the day to find that information. So what do we do to solve that? We use a security orchestration, automation, and response platform called XSOAR, and we do a lot of development on top of it. That way we can work with all the different components without actually having to go to those platforms. So we assemble all of the data

into one spot, where XSOAR becomes the central nervous system of your entire security operation center, and all of the capabilities available to your analysts are built into one platform: isolating machines, running antivirus scans, resetting passwords, revoking sessions, making a phone call, getting information on what ports are open on a specific device, getting a reputational analysis from VirusTotal and from your threat intelligence feeds. It is all in one place for the analyst, completely handled by automation and presented to them, so that it takes the analyst significantly less time to handle an incident because they have a hell of a lot more context to work with.
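The workload figures quoted earlier in the talk can be reproduced with a short calculation. This is a minimal sketch, not the speakers' own model: it assumes the cost scales linearly with the number of manually analyzed alerts, and the function names are illustrative.

```python
import math

# Figures quoted in the talk.
EVENTS_PER_ANALYST_DAY = 25   # maximum events one analyst handles per day
TOTAL_ALERTS = 5790           # alerts ingested in the seven-day window
MANUAL_ALERTS = 145           # alerts still analyzed by hand
FULL_MANUAL_COST_EUR = 59251  # quoted cost without automation


def analyst_days(events: int) -> int:
    """Analyst-days needed to handle `events` by hand, rounded up."""
    return math.ceil(events / EVENTS_PER_ANALYST_DAY)


def reduction_pct(total: int, manual: int) -> float:
    """Share of alerts that never reach an analyst, in percent."""
    return round(100 * (1 - manual / total), 2)


def manual_cost(full_cost: float, total: int, manual: int) -> float:
    """Cost of the manual remainder, ASSUMING cost is linear in alert count."""
    return round(full_cost * manual / total, 2)


print(analyst_days(647))                           # 647 events at 25 per day
print(reduction_pct(TOTAL_ALERTS, MANUAL_ALERTS))  # workload reduction in %
print(manual_cost(FULL_MANUAL_COST_EUR, TOTAL_ALERTS, MANUAL_ALERTS))
```

Under these assumptions the 647 events come to 26 analyst-days, the reduction works out to roughly 97.5%, and the remaining cost lands between 1,483 and 1,484 euros, in line with the numbers given in the talk.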

So these are all important factors: not only automating entire events, but also reducing the amount of time and effort it takes an analyst to reach a conclusion on a particular security event. Now, a bit about the incident life cycle and the way we manage it here at NVISO. We built something at a high level. The first step is triaging. We build our integrations into the XSOAR platform, and we only pull in from security products such as the EDR and the SIEM. When those are ingested, we immediately start enriching those events with various other types of data.
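A minimal sketch of what the first enrichment step might look like: pulling candidate indicators (here only IPv4 addresses and MD5/SHA-1/SHA-256 hashes) out of the raw text of an ingested event. The function name and the output shape are assumptions for illustration; this is not the XSOAR indicator-extraction API.

```python
import ipaddress
import re

# Candidate patterns; IPv4 matches are validated separately below.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
HASH_RE = re.compile(r"\b(?:[a-fA-F0-9]{64}|[a-fA-F0-9]{40}|[a-fA-F0-9]{32})\b")


def extract_indicators(text: str) -> dict[str, list[str]]:
    """Return de-duplicated IPv4 addresses and file hashes found in `text`."""
    ips: list[str] = []
    for candidate in IPV4_RE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects e.g. 999.1.1.1
        except ValueError:
            continue
        if candidate not in ips:
            ips.append(candidate)

    hashes: list[str] = []
    for candidate in HASH_RE.findall(text):
        normalized = candidate.lower()
        if normalized not in hashes:
            hashes.append(normalized)

    return {"ip": ips, "hash": hashes}


event = (
    "Connection from 10.0.0.5 to 999.1.1.1; "
    "dropped file D41D8CD98F00B204E9800998ECF8427E"
)
print(extract_indicators(event))
```

Each extracted value would then be looked up against reputation and threat intelligence sources before being shown to the analyst.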

So for example, we'll take all of the IP addresses, all of the hashes, all of the user accounts, and the hosts, and we'll start converting those into indicators of interest. Those indicators of interest are stored in a panel for the analyst, so they can see all the relevant information they need. In a lot of cases we even store data inside the indicators themselves. When we extract a phishing email, for example, that entire phishing email sample is laid out for the analyst inside XSOAR, so there's no need to jump into another tool just to see the layout of a phishing email. So it takes maybe one minute just to

analyze phishing events now, as opposed to 20 or 30 minutes per phishing event because you have to jump into one or two different platforms to do that. The second step is the incident management portion. This is where you classify events, where you decide how you want to notify people, and where you decide whether this should go to an analyst right now. It does a hell of a lot of filtering, and then it goes into the analysis phase. In the analysis phase we try to do as much automation as possible without adding additional risk to our customer base or to any of the customer environments, so

we don't automate where it's not practical or wise to do so. Basically, a lot of the full automations we do are checklists that analysts would do anyway, so they're manual steps that we fully automate. One great example is called the access anomaly classification, which we've built, and with it we're able to fully automate the analysis of an impossible travel activity. We can do it significantly faster than a human being, and most times it's either something interesting or a compromised account that comes in. When that is done, we notify the customer through ticketing, through phone calls, and through various other methods, and then we determine: do we need to

destroy the actor now? Is there something in here that we need to do? So again, do we want to revoke a session, do we want to isolate a machine, and how do we give those tools to a SOC analyst so that they can start taking action now? These components are what make the difference between a traditional SOC and a future SOC, which is built on orchestration and automation, and those are very key elements. From here I'm going to hand it back over to Walter, who's going to talk a little about SOAR development. Okay, thank you, Alexander. As Alex said, I'm going to dive in a bit deeper on how

to do the SOAR development. We've been using XSOAR for more than three years now, and we're currently on the third generation of the automation stack. We use Cortex XSOAR as an automation platform, and there you have two concepts to create your automations. One is playbooks, which is drag-and-drop programming, so you can define a complete automated workflow without writing a single line of code. The other is automations, where you can host your own Python code and do everything in there. In generation 1 and generation 2 of the automation stack, we had the principle to do everything within playbooks. Why? Because we wanted our engineers to be able to

create those playbooks without needing to know Python. But from experience we've learned this had some disadvantages, mainly performance. Here you can see a table with how long it would take us to do only enrichment with playbooks in generation 2 and in generation 3. By doing as much as possible in Python code, we have improved performance by almost 100%, which from a performance perspective is insane. Currently, running our entire playbooks can be done in under one minute, but we'll show you that in the demo at the end of the presentation. One of the consequences of actually

doing as much as possible in code is that it becomes a development project, and this also means that you have to implement some of the principles you would use when doing development. With my background as a DevOps engineer, I did a lot of these things. So the things we implemented are test-driven development, deployment pipelines, and so on. One of the important things we implemented to structure our development effort is the Scrum methodology. Scrum is an agile development methodology which allows you to structure your development efforts. In the beginning, when we started using XSOAR, we were just two guys developing

playbooks on a production environment, and the requirements weren't clear. We had daily calls with the SOC manager, Alexander, and we were just implementing stuff and going, going, going. When you do this with two people it's kind of manageable, but now we've grown our team to six SOAR engineers who are dedicated to writing automations the entire time. If you grow your team, you need a better methodology, and that's where Scrum comes in. Scrum requires you to first define all of the requirements for development as user stories, so that it is clear what you need to develop.

Once that's clear, you need to plan it. The nice thing about Scrum is that you can incrementally improve all of your processes. In Scrum you use a sprint of two weeks, where you plan all of your development tasks. At the end of the sprint you do a retrospective meeting where you reflect on the previous sprint to see what went well, what didn't go well, and what could go better. We've been doing this for almost a year now, and you really see that we have been able to produce much more. We have also been able to do planning, because in the beginning

it was impossible for us to commit to our customers on when a feature request would be delivered. That's something we can do now, because we have insight into how many development goals we can complete within one sprint. When you do development, some other things are also really important. First of all, put everything into version control to make sure that everybody is working on the same code base. Another very important one: don't develop on production. You would think that's logical, but it still happens. We have development, testing, QA, and production servers, and we use our

version control to move everything between those servers. All of our code is on GitHub, and we use pull requests between different branches. Those pull requests need to be reviewed by somebody other than the person who developed the change. So when you start doing this, treat it as a development project and implement the same principles as you would when developing software. That's a very important one. So now to the interesting part: we're going to show you a demo. The scenario is that the CEO opens a malicious document on their laptop. First we will show you the

end-user perspective, to show how long it takes for automated remediation actions to be applied, and then we will show you how this all looks in XSOAR and how exactly it works. This is the laptop of the CEO. We're going to show you exactly how long it takes, so let's start the stopwatch and execute the malicious document. We have Cortex XDR installed as an EDR tool, and after 39 seconds it has already detected that something malicious has happened on the laptop. This alert will now be ingested into XSOAR, and our analysis playbooks will

determine that the threat, the malicious document, has been allowed to run, and will automatically execute an AV scan. Here you can see that two minutes after the alert was generated, an AV scan has already been executed. The second immediate remediation action to be executed is a machine isolation. This is a semi-automated process which requires approval from a senior analyst, so how quickly the machine is isolated depends on how long the senior analyst takes to approve it. Here you can see that after four minutes in total, the threat has been detected, our remediation actions have been taken, and the threat has been

contained. So how does this look in XSOAR? Here you can see the XSOAR interface, and this is one of the dashboards. This is the dashboard that the analyst looks at every day, so if a new incident comes in, it will show up here. XSOAR pulls alerts every minute, so it can take up to one minute before the alert is in our SOAR platform. So let's refresh this dashboard until the alert is there.

Ah, there it is. This is the alert. Let's look at the playbooks that are being executed. Now it's already on the analysis playbook. Okay, now it's already on the notification playbook. Because this is the laptop of the CEO, which is defined as a critical asset, a different escalation path is followed, so our analyst will get a phone call because this is a critical alert: "Security alert from the Yeti security operations center. This is a severity alert for WildFire malware, laptop CEO. Press one at any time to acknowledge this alert. To hear more details, please press two. Thank you for acknowledging this." So that's the call that a senior intrusion analyst will get,

because this is a critical-severity alert, and this is a call chain: if somebody doesn't acknowledge it, it will go to the next person. Now we can already see that all of the remediation actions have been executed. But maybe first of all I'll show you the interface itself. This is the interface of XSOAR. On the left you can see all the case details: the severity, that it is a critical asset, that it's not aggregated, and that the incident outcome is "allowed", which means that the threat has not been blocked. Here you can see the investigation

data, that is, what the results of the analysis playbooks were. Here it says that threats were detected but not all were contained. Here you can see all of the indicators in the incident: all the hashes, the sample, which user account, which hosts, all of these things. Here you can see the actual malicious document that got executed. We also ingest alerts from different SIEMs, and we present a unified interface with just one layout to the analyst, so it doesn't matter which source product generated the alert. In the remediation tab, here you can see all of the

available remediation actions. Here is a list of the ones that have actually been executed. You can see that a scan is pending and an isolation is pending approval. This requires approval from a senior analyst, so what they need to do is

first click the load button to see the approvals, then approve it, and then submit the approval. Here you can see that the approval status is approved and it's now waiting to be executed. This runs as a job in the background. Okay, here you can already see that the machine has been isolated. So this is how it looks in XSOAR. XSOAR allows you to completely customize the interface, which is really nice. This is a completely custom tab that we created ourselves, and with the automations in the background you can run all of the Python code you want. So it's a very flexible platform, and this has

allowed us to automate as much as possible in our SOC. Okay, so that was the demo. I would like to thank you all for viewing this presentation. If you have any questions, you can find us on LinkedIn, and hopefully you found it very interesting.
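The remediation flow shown in the demo (an automatic AV scan, a machine isolation that waits for senior-analyst approval, and a phone call for critical assets) can be sketched in plain Python. This is only an illustration under stated assumptions: the class, function, and action names are hypothetical, the "ticket" path for non-critical assets is an assumption, and none of this is the XSOAR API or the speakers' code.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    host: str
    critical_asset: bool
    threat_blocked: bool
    actions: list[str] = field(default_factory=list)           # executed
    pending_approval: list[str] = field(default_factory=list)  # gated


def escalation_path(incident: Incident) -> str:
    """Critical assets trigger a phone call; 'ticket' is an assumed default."""
    return "phone_call" if incident.critical_asset else "ticket"


def plan_remediation(incident: Incident) -> Incident:
    """Run the automatic step and queue the approval-gated step."""
    if incident.threat_blocked:
        return incident  # nothing ran on the host, so nothing to remediate
    incident.actions.append("av_scan")                  # fully automated
    incident.pending_approval.append("isolate_machine")  # semi-automated
    return incident


def approve(incident: Incident, action: str, approver_is_senior: bool) -> bool:
    """Execute a queued action only when a senior analyst approves it."""
    if action not in incident.pending_approval or not approver_is_senior:
        return False
    incident.pending_approval.remove(action)
    incident.actions.append(action)
    return True


incident = plan_remediation(
    Incident(host="ceo-laptop", critical_asset=True, threat_blocked=False)
)
print(escalation_path(incident), incident.actions, incident.pending_approval)
approve(incident, "isolate_machine", approver_is_senior=True)
print(incident.actions)
```

Keeping the isolation behind an explicit approval mirrors the point made in the talk about not automating where it would add risk to the customer environment.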