Six Degrees of Infiltration: Using Graph to Understand your Infrastructure and Optimize Security Decision Making

BSidesSF · 2018 · 30:33 · 2.8K views · Published 2018-04
Category: Technical
Style: Talk
About this talk
Sacha Faust - Six Degrees of Infiltration: Using Graph to Understand your Infrastructure and Optimize Security Decision Making

Current infrastructures depend on multiple technologies and third-party infrastructures that increase security complexity and make it very difficult to have a clear end-to-end view of the overall state and possible risks. Existing approaches were good investments, but a few challenges were observed:

* Some duplication
  - Broad set of dedicated services that collect and visualize similar data.
* View of the environment relies on a broad set of tribal knowledge
* Recurrent questions are difficult to answer quickly
  - "What is my exposure?"
  - "Does this vulnerability affect us and in what way?"
  - "What priority should we allocate to this issue?"
* Moving target problem
  - Does the infrastructure match expectations at all times?
* Transitive risks or lateral movement exploration not possible across dependencies
* Overall state of the infrastructure hard to visualize and validate
* Difficult to apply internal context to external intelligence feeds

The talk will provide insight into graph solutions explored by the Lyft Security Intelligence team to tackle knowledge consolidation and improve decision making. Attendees of this session will be introduced to the methodologies and off-the-shelf tools, like Neo4j, that we use, along with the release of our open source graph-based security intelligence platform, which they can use to get started and collaborate.
Transcript [en]

[Music]

Thank you. Sorry, slight problem: I've been 10 years at Microsoft, so Mac is new to me. It's been quite an onboarding experience at Lyft. Hi, my name is Sacha Faust. I've been doing security for about 20 years. I don't feel too old; it just means I have a lot of experience, and I have made plenty of mistakes before. We're here to talk a little bit about building an intelligence graph: what's the value of this, why did I start building it, and what were some of the reasons why I looked into that problem.

In terms of agenda, first are some of the challenges I've seen in security over 20 years. I'm missing patches of hair here and there; I've definitely banged my head against common problems at different companies. I did the startup route in the late 90s, went into hiding in consulting after the dot-com boom, and then went back into hiding at Microsoft for 10 years, primarily focused on security. I ended my career there leading the Cloud and Enterprise red team, which was quite interesting, and I moved to Lyft last November. So I've seen a few challenges, some in more political environments than others; I would say Lyft is not that political yet. We'll go over some of the challenges, then an overview of the intel service, sort of the how, the what, and the why, and some of the use cases as well.

The use cases are going to be pretty basic, exploring a little bit of what I personally tried to do in Q1 since I just joined, but it's an overview of the direction Lyft is taking in terms of risk management, automation, and exploration as well.

A lot of companies use the carrot-and-stick approach. I've seen it work, or I used to think it worked; I've learned over the years that it doesn't really work. There's a great presentation called "The Puzzle of Motivation" from Dan Pink that basically said that at some level the carrot and stick doesn't work. It's better to enable people, and the really motivated people are going to go do the right thing as long as they feel enabled. So the carrot-and-stick approach can sometimes work. When I was interviewing last year I asked plenty of people what a true security carrot is, and the answers were a little bit unsettling. I thought I was talking to some of the leaders in the industry, some of the big companies, and the gist of it was that the answer was accountability: "Well, the carrot is we drive accountability towards engineering teams." I asked, what does that really mean? How do you enable them to actually have accountability? And they said, "Well, we're not quite sure." So it was a little bit of an unsettling conversation.

I've seen a lot of security teams use the stick too much, jumping right into conflict from the get-go and in some cases never getting out of conflict. Some of them get out of conflict and drive more towards compliance, where people are doing what they're told because they need to. That's not really an approach I believe in. I personally believe it works better with commitment and teamwork: how does the security team actually enable teams to make the right decisions and create this partnership so that we all work together? There's this constant fight between red and blue. At the end, in an enterprise, you're all blue, so you've got to work together; we're just separating the duties.

Another very common issue I've seen is communication. You see one person saying "this is high" because (it's in Korean, it says) "because I said so," and the engineer on the other side says "no, it's low," and the translation is basically "because my focus is making money." So there's this huge communication problem in the industry, and that's been a deterrent to actually making progress. Again, if we don't communicate well, right off the bat we're into a conflict problem.

The other issue is that a lot of engineers are looking at dependencies. They're looking at software dependencies, but they rarely look at other people or teams. If your team depends on another service, you're basically depending on that team. So building this sort of human connection, that we all depend on each other, is something I think we need to revisit in the industry. This leads into building empathy within your offensive teams: don't point fingers all the time, be a little more diplomatic. The overall point is that we're all dependent, and the reality is that we're completely blind about a lot of our dependencies. A lot of business owners are to some extent blind to the risks they're taking, and it's not intentional blindness. Hopefully your CISO is not intentionally blind. But the reality is that there's no way to visualize, no way to explore these things, so we're running with blinders on and hoping that one of the bad events won't happen.

The reality I've learned at Microsoft, and I learned it the hard way not once but multiple times, is: assume breach. You can invest as much money as you can into protection; the reality is, if there's a will, there is a way, and some of the nation-states are very well funded. You will have to live in a breach. So assume breach, and let's enable exploration so that you're not running with blinders on.

So how do we make sense of all that data? How do we actually get a representation of the state of our infrastructure at any point in time? This will be gradual; this is not something that you build overnight. How do we look at the overall system? I started looking at this problem maybe five or six years ago, when I started the red team in Azure, where I started having problems like: there are too many doors that I can open; which one leads me to where I need to go, and is there a route to my objective past that door? So it was this huge problem of scale. I was trying to make sense of all this data. Maybe I'm getting old, but the mental graph I was building in my head was just running out of battery; I could not figure out a way to digest that. I also felt that the recommendations I was making to people weren't very useful. They were kind of one-trick, and we didn't really enable them to explore further.

One important thing as well is to evaluate effort and reward. If you spend an amount of effort to resolve an issue, what's going to be the reward? Are you really fixing the problem, or are you playing a sort of whack-a-mole? We didn't really have systems that would enable us to get there. And then there's the whole concept of regression: what are my expectations, are they being met now, are they being met tomorrow, and so on.

So, a very quick slide on a basic tech stack. I do use a graph database; I think this is a more and more common theme in the industry. The main reason is that relational databases are not really good at multi-hop explorations and so on. We do use the cloud to automate the whole process. I built this for Azure in the past; moving to Lyft, I transferred to AWS, with somewhat similar functionality.

The idea is you have this uber graph database that maintains state, and then you have a basic cron job. This is, again, a very simplistic infrastructure. The cron job triggers a graph sync, so you're rebuilding the state every time the cron job runs, so that you have an updated graph. You can either take the approach of updating the graph, which is a little bit harder, or, if your environment is small enough, you can destroy and repave every time. The graph is a combination of multiple intelligence modules that are each responsible for updating the parts of the graph that they own. So you have AWS in this case, Google APIs, and GitHub. We do use JIRA (not "Jarra"; hopefully I got that name right) and the Lyft HR infrastructure: humans, org structures, and so on. All of it is accessible by Lyft employees, so Lyft employees are enabled to go and explore these things as they see fit, and there's a web UI and a binary protocol as well, enabled by Neo4j.

I personally like Neo4j just because of the query language. It translates very well and it's easy to understand. If you have larger data (at Microsoft, when I left, we had close to a billion nodes and about two and a half billion edges), Neo4j can scale to that level, but I'd rather you not invest too much in Neo4j clustering. That's all I'll say. There might be other platforms that are better for that. Right now all of what we have is running on their Community Edition, so it's a low-cost exploration to build a case, and as long as you're not making financial business decisions, the licensing allows you to do that.

So let's dive a little bit into the graph. I'm not using Visio, but it looks like I'm using a lot of it; I'm just representing basic graphs. If we look at AWS, a very small chunk of the schema, you can have AWS users that are members of groups, and each group and user is part of an AWS account. Very basic. And an AWS user has access keys. Always worry about those access keys: we tend to create them and forget about them, but they do exist. It's very simple. You can use the AWS APIs. I started using the AWS CLI just to go get the data; again, I'm very new to this whole Mac development and to AWS, so I kept it very simple. The engineering is more in the exploration phase than mature. So all I do is use list-groups, get-group, list-access-keys, and list-users. Very basic: take the JSON data and import it into Neo4j. So now we have some basic representation of users and group membership.

We can do the same with compute, or AWS EC2, where we can have representations of the instances that you have, as well as the EC2 reservations and the auto scaling groups, and everything is mapped back to an AWS account. At Lyft we deploy using auto scaling groups, so that's why I started looking into the mapping there. The mapping, again, is very easy: the AWS APIs or the AWS CLI will give you the data. In the case of reservations, instances, and membership of reservations, it's EC2 describe-instances; it will spit out everything that's there. For auto scaling groups it's describe-auto-scaling-groups, which will give you the membership. So, a very simple graph. Now we know the mapping between EC2 instances and auto scaling groups.

Then we can start augmenting that graph a little bit more. The question is: what's installed on those EC2 instances? If we can map that into the graph, we start having a little more knowledge of the dependencies we're taking on modules. What we ended up doing here was quite simple: list out all the packages installed on each instance. You do need some sort of agent on the instances themselves that can list out the packages and write back to the graph. So in a very low-cost way, you know all the packages that are installed on each instance. You can also use osquery, I think; we've been looking into that to have even more information, but just listing the packages is good enough. You end up with a mapping to each EC2 instance. You don't need any host-based scanning, for that matter, or the traditional network scanning and so on. You have everything that's installed, and right there you can enable exploration around, you know, what is the most common package that we have, or are our services running packages that we don't need?

You can start looking at trimming down the number of packages, and if you enable exploration for your engineering teams, they can go and do that themselves.

Now, more into the security aspect of it, what I'm interested in... okay, I always see a slide ahead; I think I was ahead of time. One other investment, or one question that I personally had, is: okay, we have the EC2 instances, we know the packages, what are the vulnerabilities out there? I'm not a huge fan of a certain company that does network scanning. I've seen them fail miserably over and over. I don't really like that approach; it doesn't scale very well for our environment. So I was kind of interested, and one of our engineers made us realize that Ubuntu actually has a CVE tracker that's accessible for free. All we had to do was map the packages that we already have. We already have the name of the package and the version number, so all we had to do was sync the Ubuntu CVE tracker and create those relationships. I think it took a day or so for that pull request to come in, and in about a day we had all packages and all CVEs. That was very, very cheap, and it allows us, again, to explore further: okay, where are the high risks? Okay, those high risks are related to packages; do we need those packages? And so on. So right off the bat, in a very effective way, I think in a matter of days, we had exploration that was actually pretty meaningful.

Now, the side effect of that exploration is that you start measuring. The beginning of change is awareness, so as an intelligence team you have to disseminate the knowledge. You have to make people aware of what we have, and it can have a kind of knee-jerk reaction effect, which we somewhat did have. I'm not sure if our CISO is here. He actually keeps pretty calm, so I like that he didn't freak out too much, but he was like, "Okay, this is kind of interesting data." In my previous jobs I've had some CISOs who would kind of jump in a room. Lyft's CISO is actually pretty calm. So now we had an awareness of the large debt that we have. We could actually get a pulse. Even though we had a good idea of the number of vulnerabilities we had from tribal knowledge and so on, kind of a gut feeling, it was very empowering to see those numbers pop up. But those numbers were sort of like, "Okay, there's an EC2 instance," so we didn't really know who owned that EC2 instance.

But at least we had some good data there. The reality is that when you're looking at that data, everything looks urgent. If you look at the number of highs or criticals and so on, they all look urgent. If everything is urgent, nothing is urgent. If your priority is binary, like either you do it or you don't, then at the end of the day you need to make a decision, but if at a higher level you're only looking at whether it needs to be fixed or not, you're not going to make that much progress. I think you need more granularity in there.

There's also the sense of avoidance as well: "We already knew this. That problem is too big. Let's just avoid the problem," or shove it under the rug. That's not our thinking, but I've definitely seen companies say, "Yeah, we're fine. We're not compromised, or at least we don't know that we are, therefore everything is running fine."

So how do you actually tackle this problem? How do you make sense out of it? I don't feel comfortable going back to an engineering team or an infrastructure team and saying, "You've got to go fix everything." I mean, I can take that approach. I've taken it a couple of times in my career. That just doesn't work; that doesn't get the needle moving.
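The package-to-CVE mapping described earlier, and the "everything looks urgent" totals it produces, can be sketched in a few lines. This is a minimal illustration in plain Python rather than Neo4j; the data shapes, instance IDs, package versions, and CVE identifiers are hypothetical placeholders and do not reflect the real Ubuntu CVE tracker format or Lyft's schema.

```python
from collections import Counter

# Hypothetical inventory: instance id -> installed (package, version) pairs,
# as an on-host agent might report them.
installed = {
    "i-aaa": [("openssl", "1.0.1"), ("bash", "4.3")],
    "i-bbb": [("openssl", "1.0.1")],
    "i-ccc": [("nginx", "1.10")],
}

# Hypothetical CVE feed keyed by (package, version); the IDs are placeholders.
cve_feed = {
    ("openssl", "1.0.1"): [("CVE-0000-0001", "critical")],
    ("bash", "4.3"): [("CVE-0000-0002", "high"), ("CVE-0000-0003", "low")],
}


def cves_per_instance(installed, cve_feed):
    """Join installed packages to CVEs, mirroring package -> CVE edges."""
    result = {}
    for instance, packages in installed.items():
        found = []
        for pkg in packages:
            found.extend(cve_feed.get(pkg, []))
        result[instance] = found
    return result


def severity_counts(per_instance):
    """Count findings by severity across all instances."""
    counts = Counter()
    for findings in per_instance.values():
        for _cve_id, severity in findings:
            counts[severity] += 1
    return counts


per_instance = cves_per_instance(installed, cve_feed)
counts = severity_counts(per_instance)
print(counts)
```

The totals show how much patching debt exists but say nothing about ownership, exposure, or environment, which is the missing context the speaker turns to next.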

So what we need is better insight in terms of prioritization, but also to apply more context, business context, to our decision making. Most of the tools out there will spit out a CVSS rating and say, "Okay, that's the truth." From a 10,000-foot view, maybe, but the context of our company may change that in different ways, depending on what we want to do, depending on the risk tolerance that we have, and depending on the exposure as well. So how do we answer those types of questions in terms of exposure, risk tolerance, and lateral movement as well? What I call transitive risk, which I'll dive into a little later, is the ability to take a vulnerability that appears low and have that vulnerability allow you to pivot towards a high-value asset. How do we enable that in our prioritization?

Going back to the EC2 instance: we have a bunch of CVEs right there, a basic example, and we want to know a very basic answer: is it internet exposed or not? If you break down the series of questions, one that comes to mind is, is that even exposed to the internet? Now, the reality is that you need to worry a little more about insider threats as well, but I think that's a good way of slicing it. How we infer internet exposure is, again, we go back to the knowledge base, to how our AWS is configured. It required us to augment the graph a little more, looking into group membership and network interfaces. (We're at ten minutes? All right, okay, I'll go a lot faster.) Network interfaces, EC2 security groups, load balancers as well, and the inbound connectivity. With that simple graph, if I simplify to just inbound, you can look from IP range down to inbound rule down to the exposure, and we can infer that if the IP range is 0.0.0.0/0, that EC2 instance is actually exposed. From a load balancer perspective, AWS has the scheme "internet-facing" label, which tells you that this load balancer is internet-facing, so every EC2 instance related to it will also be internet exposed. Right there you start slicing your risk level. I'll go fast, but I think these presentations will be made available. On the right side is the actual Neo4j query, so you can see that it's doable.

One other prioritization question is: is it production, is it staging, is it one-box as well? In that case, again, we go back to a very simple mechanism. We went back to the auto scaling group, and I realized that at Lyft we have a naming convention, which may not be the case for you. If you don't have a naming convention for your resources, maybe you have bigger problems; maybe you want to revisit that first. But in our case I was lucky enough to have a naming convention for auto scaling groups, which breaks the name down by service, role, [unclear]. With that information, if the EC2 instance has an auto scaling group that has "production" in it, I can infer that the asset is a production asset, very easily. So again, apply your internal knowledge of how you can infer these things. For us it was actually quite simple; I was pretty lucky there. Based on the service part, we can also now associate an auto scaling group with a given service. So now we're starting to make the route from an EC2 instance to its respective service to, potentially, owners. Let's go try to solve attribution problems.

So with a very simple graph, and again these are hours of work, not weeks of work, we were able to produce a patch prioritization by service based on production, internet exposure, and the overall patching debt. Now you're making progress. You can go back to your CISO and say, "Okay, we think we should prioritize these ones," which would be production and internet exposed. That allowed us to drive change pretty quickly.

Going pretty fast. The other problem I've seen very often is that security engineers are screaming "go fix your bug," but they keep the bug to themselves. They keep it in isolation, in their own TFS or JIRA or GitHub and so on. I wanted to route the issues and make the engineering team, again, accountable. I do want to drive accountability to engineering teams, but I want to enable them to see it. If the bug is filed within their range of vision, they have a better tendency of looking at it, versus, "Oh, I need to go look at this TFS or JIRA project that security owns." As an engineering manager, I would not look at that, not because I don't want to, but because I've got enough things on my plate. So make it visible in terms of my debt.

I was pretty fortunate: Lyft is microservices, so we have a manifest file that resides in our repository. I was able to infer quite a bit of data out of that, which allowed me to map to the team and map to on-call. We use PagerDuty, with PagerDuty policies, so I was able to extract that, interrogate PagerDuty, and start mapping to an actual human directly. So we ended up having an investment in our HR structure, in terms of who the humans are that we have in place. The human can also have a work type, in terms of full-time, contractor, and so on. So now you can explore the difference between full-time employee and contractor access, and the org structure: who fits in which bucket, and who reports to whom. For us, if you have G Suite, you can look at the reporting structure. I did this at Microsoft with AD and Exchange, which have that mapping as well. And again, the graph is a living thing.
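The exposure and environment inference described earlier in this part of the talk (an inbound range of 0.0.0.0/0 or an internet-facing load balancer, plus the auto scaling group name) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Neo4j queries shown on the slides: the data shapes are invented, the `<service>-<role>-<environment>` naming convention is hypothetical since the actual convention is not recoverable from the transcript, and the ranking is per instance rather than per service for brevity.

```python
import ipaddress

# Hypothetical snapshot of graph data; all names and shapes are illustrative.
instances = {
    "i-aaa": {"asg": "rides-api-production",
              "security_groups": ["sg-web"], "load_balancers": []},
    "i-bbb": {"asg": "rides-api-staging",
              "security_groups": ["sg-internal"], "load_balancers": ["lb-public"]},
    "i-ccc": {"asg": "billing-worker-production",
              "security_groups": ["sg-internal"], "load_balancers": []},
}
inbound_ranges = {"sg-web": ["0.0.0.0/0"], "sg-internal": ["10.0.0.0/8"]}
load_balancer_scheme = {"lb-public": "internet-facing"}
open_cves = {"i-aaa": 3, "i-bbb": 5, "i-ccc": 4}  # patching debt per instance


def is_internet_exposed(inst):
    """Exposed if any inbound rule allows 0.0.0.0/0 or a load balancer is internet-facing."""
    for sg in inst["security_groups"]:
        for cidr in inbound_ranges.get(sg, []):
            if ipaddress.ip_network(cidr).prefixlen == 0:
                return True
    return any(load_balancer_scheme.get(lb) == "internet-facing"
               for lb in inst["load_balancers"])


def parse_asg_name(name):
    """Split an auto scaling group name using the assumed convention."""
    service, role, environment = name.rsplit("-", 2)
    return service, role, environment


def prioritize(instances, open_cves):
    """Rank instances: production first, then internet exposed, then CVE count."""
    rows = []
    for instance_id, inst in instances.items():
        service, _role, environment = parse_asg_name(inst["asg"])
        rows.append({
            "instance": instance_id,
            "service": service,
            "production": environment == "production",
            "exposed": is_internet_exposed(inst),
            "cves": open_cves.get(instance_id, 0),
        })
    rows.sort(key=lambda r: (r["production"], r["exposed"], r["cves"]), reverse=True)
    return rows


ranking = prioritize(instances, open_cves)
for row in ranking:
    print(row)
```

The sort key is one possible encoding of "production and internet exposed first"; a different risk tolerance would simply reorder or reweight the tuple.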

When people reorg, if you're prone to reorgs, the graph updates itself as well. It actually allows you to uncover resources that have not been fully transferred. You can also keep track of ex-employees, basically terminated users, and verify with the graph whether they still have access, for example. That was pretty meaningful.

Now, how do you actually make sure that your bug sticks in the right place? I went ahead and interrogated JIRA for all our projects, and that was a little bit painful. I manually curated the mapping from team to JIRA project, and which attributes of each project would allow you to do the mapping. But I only had to do that once, so even though it was tedious, I'm thinking maybe a day of browsing around. I ended up using the graph to give me directions: okay, who is on call per service? Let me look in JIRA to see where that person had their bugs. That usually gave me an indicator of where the mapping is. So again, the graph will help you do this mapping. Now we had a nice way of mapping services and filing the right bugs with the right person, and making it visible to the engineering team, which is pretty impactful.

We'll go very quickly. I think I'm okay. That did not fail; I just don't see it here. Let's see it.

In Neo4j you now have a mapping: you take the CVE on the left side, all the way down to the human, including the JIRA project and on-call as well. So now you have a direct route that you can take and build automation around. You can build this sort of guided bug cannon that takes multiple inputs, from your vulnerability scanner, your web scanner, hygiene problems, or patching, and bundles and routes all these things to the right owners and the right engineering teams. Some of you are probably still curating these things manually; this will allow you to drive the automation hard.

Going very quickly: exploring isolation.
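The route from a finding down to an owner, and the bundling step of the "guided bug cannon," can be sketched as a chain of lookups. This is a minimal illustration in plain Python; every mapping, team name, project key, and person below is hypothetical, each dictionary stands in for one relationship type in the graph, and no real JIRA or PagerDuty API is called.

```python
from collections import defaultdict

# Hypothetical edges, each standing in for one relationship type in the graph.
instance_to_service = {"i-aaa": "rides", "i-bbb": "rides", "i-ccc": "billing"}
service_to_team = {"rides": "team-rides", "billing": "team-billing"}
team_to_jira_project = {"team-rides": "RIDES", "team-billing": "BILL"}
team_to_on_call = {"team-rides": "alice", "team-billing": "bob"}

# Findings from several hypothetical sources, all keyed by instance.
findings = [
    {"source": "patching", "instance": "i-aaa", "title": "CVE-0000-0001 in openssl"},
    {"source": "web-scanner", "instance": "i-bbb", "title": "Missing security header"},
    {"source": "patching", "instance": "i-ccc", "title": "CVE-0000-0002 in bash"},
    {"source": "patching", "instance": "i-zzz", "title": "CVE-0000-0003 in nginx"},
]


def resolve_owner(instance):
    """Walk instance -> service -> team -> (JIRA project, on-call human)."""
    service = instance_to_service.get(instance)
    team = service_to_team.get(service)
    if team is None:
        return None
    return {
        "team": team,
        "jira_project": team_to_jira_project.get(team),
        "on_call": team_to_on_call.get(team),
    }


def bundle_by_owner(findings):
    """Group findings per (project, on-call) and collect anything unroutable."""
    bundles = defaultdict(list)
    unrouted = []
    for finding in findings:
        owner = resolve_owner(finding["instance"])
        if owner is None or owner["jira_project"] is None:
            unrouted.append(finding)
        else:
            key = (owner["jira_project"], owner["on_call"])
            bundles[key].append(finding["title"])
    return dict(bundles), unrouted


bundles, unrouted = bundle_by_owner(findings)
for (project, person), titles in bundles.items():
    print(project, person, titles)
print("unrouted:", [f["instance"] for f in unrouted])
```

Keeping an explicit list of unroutable findings is a design choice in this sketch: a gap in the chain is itself an attribution problem worth surfacing rather than silently dropping.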

Most of us have isolation problems. We want to maintain isolation between infrastructure, or, from the attacker side, it's more about the pivoting opportunities for lateral movement. We can apply the same structure as well. If we have an EC2 instance that is deemed to be production, and we want to look at whether there is isolation between production EC2 instances and staging EC2 instances, we can look at the EC2 subnet, for example. If they share a subnet, then that subnet allows lateral movement. That becomes very easy to implement if you've already labeled your EC2 instances. If you have an EC2 instance that's production, that EC2 subnet should be labeled as production, but if you have an EC2 instance that is staging, the same subnet would also be labeled as staging. Now the question becomes very easy: give me all the EC2 subnets that have both the production and staging tags, and then you have your pivoting opportunities, or gaps in isolation. You can visualize it through Neo4j, or you can use Linkurious to visualize these things, but you can also return a count and actually measure that risk, for burndown for example. So that's basically it.

Furthermore, you can derive automation where you enable engineering teams that have their own assumptions, including security, IT, and so on. It says: if I can ask a question to the graph that validates my assumptions, meaning if I query this it should always return zero, and if it doesn't return zero I should have a bug, then you can start building this automation of enabling your engineering teams to ask those questions. That's something we're looking at right now. For any deviation from expectation, if there's a deviation, go leverage your auto bug filing to go clean it up.

I think I'm slowly getting kicked out. Okay, let's go real quick. You can also do blast radius exploration, or transitive risk: what is the impact of a low-value asset moving forward? If you look at it quickly, since you have all the CVEs and you have types of CVE, let's say RCE for example, that will give you an entry point into your environment, especially if it's internet exposed, and you can ask the graph: what are the steps needed to move to a high-value target that is not internet exposed? That can be easily formulated in the graph. Let's see... I'm struggling with these. So you can ask that question, and hopefully in the recording you can look back at the query I built. Now you have a CVE, and it happens that the same CVE affects both instances, and the instances are pivoting over the EC2 subnet that we talked about earlier. So now you can see lateral movement made possible by a simple CVE, and you can continue exploring.

This was heavily leveraged in red team operations in Azure, where we could model TTPs that we wanted to exercise and get quick visibility in terms of the routes, meaning: if we spend the effort of going through that route, what's the end result? We can anticipate that, and take that route all the way down to automating, sort of building the super botnet.

Moving forward, we're looking at applying this to data privacy (how do we map access and validation for data privacy), services and service dependencies, looking at the VPC logs, and so on.
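The transitive-risk question above (starting from an internet-exposed instance with an RCE-type CVE, what steps lead to a high-value target that is not exposed?) can be sketched as a breadth-first search. This is a simplified model in plain Python, not the query from the slides: the instance names, subnets, and CVE identifier are hypothetical, and it assumes an attacker can pivot to any instance that shares a subnet and is affected by an RCE-type CVE.

```python
from collections import deque

# Hypothetical instances; "subnets" and "cves" stand in for graph relationships.
instances = {
    "i-web": {"exposed": True, "high_value": False,
              "subnets": ["subnet-a"], "cves": ["CVE-0000-0001"]},
    "i-app": {"exposed": False, "high_value": False,
              "subnets": ["subnet-a", "subnet-b"], "cves": ["CVE-0000-0001"]},
    "i-db": {"exposed": False, "high_value": True,
             "subnets": ["subnet-b"], "cves": ["CVE-0000-0001"]},
    "i-ok": {"exposed": False, "high_value": True,
             "subnets": ["subnet-b"], "cves": []},
}
rce_cves = {"CVE-0000-0001"}  # placeholder ID treated as remote code execution


def exploitable(name):
    """True if the instance is affected by at least one RCE-type CVE."""
    return any(cve in rce_cves for cve in instances[name]["cves"])


def neighbors(name):
    """Instances reachable in one pivot: shared subnet and exploitable."""
    subnets = set(instances[name]["subnets"])
    for other, data in instances.items():
        if other != name and subnets & set(data["subnets"]) and exploitable(other):
            yield other


def pivot_path(start):
    """Shortest pivot path from an exposed entry point to a non-exposed high-value target."""
    if not (instances[start]["exposed"] and exploitable(start)):
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if instances[current]["high_value"] and not instances[current]["exposed"]:
            return path
        for nxt in neighbors(current):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(pivot_path("i-web"))
```

In a graph database the same question is a variable-length path match; the search here only makes the hop-by-hop reasoning explicit.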

If you're curious about how to use this approach with a more offensive, attacker mindset, I have a series of presentations I've given over the years that dive a lot more into my attacker mindset. I'm more on the defensive, risk-reduction side now. So thank you. Sorry, I went faster on the last few ones, but thank you.

[Applause]