
Building A Data Driven AppSec Programme With Kiln

BSides Leeds · 2020 · 37:04 · 43 views · Published 2020-07 · Watch on YouTube ↗
Category: Technical
Style: Talk
Transcript [en]

Morning everyone. As I said, my name is Dan, and I'm going to talk to you about building a data-driven AppSec programme with Kiln. This is an open source project I've been working on in work and spare time. So, what kind of security conference presentation would this be without a "Who am I?" slide? I'm Dan, I'm on the Twitters if that's your thing, and I'm an AppSec engineer at Simply Business; we're a small business insurance broker. Prior to that I was a quality engineer and a software engineer, and I'm also the father to a very sleepy boy. I figured I'd go for an easy win early on, because dogs: they're good. So, let me set the scene.

In most development environments you'll probably have some sort of CI setup, and typically those pipelines will do things like unit testing and maybe some UI testing (I used to be a quality engineer). You might also run security tools as part of a manual code review; that might be static code analysis, that might be dependency analysis. If you've got a particularly mature CI setup you might have security tools running as part of the pipeline itself: for our Ruby applications we have bundler-audit, RuboCop and Brakeman as part of our pipelines. But if a tool finds something and breaks the build, which in most cases it probably should, someone's going to need to log into Jenkins, go and find the build that's failed, find the step that's failed, dig through console logs and figure out why it failed.
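
As a rough sketch of the kind of pipeline step being described here (run a dependency scanner, fail the build if it finds anything), assuming the bundler-audit gem is installed; this is an illustration of the pattern, not Kiln's own code:

```python
# Hypothetical wrapper around bundler-audit for a CI step; not Kiln's code.
import subprocess
import sys

def run_dependency_scan(repo_path: str) -> int:
    # "bundle-audit check --update" refreshes the advisory database and
    # exits non-zero if any vulnerable gems are found.
    result = subprocess.run(
        ["bundle-audit", "check", "--update"],
        cwd=repo_path,
        capture_output=True,
        text=True,
    )
    # In a typical pipeline this output only ever lands in the CI console
    # log, which is exactly the problem being described here.
    print(result.stdout)
    return result.returncode

if __name__ == "__main__":
    sys.exit(run_dependency_scan("."))
```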

As a developer, that means I need to leave the environment I'm normally working in, whether that's my editor, Slack or GitHub issues; and as a security engineer, I'm not going to see any of this at all. I think we can do better, and I think in small teams and small companies in particular we have to do better. A friend of mine once told me that prioritisation in big companies is the difference between "do we do this in Q1" or "do we do this in Q3 or Q4"; in smaller teams it's the

difference between "do we do this now" or "do we not do this at all". So we've already got limited time and limited people, and I think we need to be better at prioritising where we focus that time and energy. Now, businesses have been using data to drive decision-making for years. Look at the disciplines of business intelligence, marketing, UX and product development: data really is king. That would be things like A/B testing conversion rates: you send some of your customers down one journey with one set of wording, you send the others down a different journey with slightly different wording, and you measure the conversion rates and use that to go, "that's the copy we're going to use".

It's clearly working for them, and I think that as security folks we should be doing that too. As an engineer, I want to be able to ask questions like: which projects have got which security tools in their CI pipelines? We've held some security training, say we got Troy Hunt in to do Hack Yourself First: have we seen any changes in our security posture in the three months since we did that? Which projects are our biggest source of risk? Now, we don't have the data to be able to answer these questions currently. We've just gone through Q1 prioritisation at work, and the data to answer these kinds of questions could have resulted in very different decisions being made in that prioritisation.

So hopefully you, like Willem Dafoe here, are screaming internally for me to tell you how. So I'll tell you how. The idea behind Kiln is that it should make it nice and easy for you to run security tools. Say you're a Node.js shop and you want to run NCC Scout Suite to audit your AWS environment: you shouldn't need to worry about the Python 2 deprecation, or which version of Python 3 Scout Suite runs on. You should just be able to run the tool, get your findings and carry on with your day.

Kiln also performs data collection on the output of these tools, then does some parsing, some normalisation and some data enrichment, which makes the output a bit easier to analyse, and you can then use the data it has generated to drive better decision-making about where to focus your time and energy. So, hopefully the demo gods are going to be kind to me, because we've got not one but two live demos. First off, I'm going to show you how users would typically interact with Kiln through the command-line tool, and I'm going to be using the Railsgoat vulnerable application for this, with the bundler-audit dependency analyser.

This is going to be in an interactive setting, but where this will really shine is if you put it in your CI pipelines, running on every single commit. So, where's my demo terminal... yep. I've got a clone of Railsgoat here, and I've dropped a quick configuration file in there that just says what the app name is and where to fire all this data off to. If I run "kiln ruby dependencies", it's going to fire up bundler-audit in a Docker container, mount my code in that container and run bundler-audit over it, and we can see that we've got vulnerabilities in Puma and Rack.

Then on the right you can see that's popped out in a Slack channel I've got set up, so we can see what package is affected, what version of that package, what app it's in, a brief description of the problem, what commit we saw this on, a CVSS score so you can get a rough idea of how severe it is, and a link to the advisory so you can go and find out what versions you need to upgrade to to patch the vulnerability. So, there's more than just the CLI tool; how did we get to Slack? There are a few bits in between that make the magic happen. We've got the command-line tool, which I've shown you.

We've got a data collection endpoint, which performs some validation and normalisation to get the data into a Kafka cluster, which is where we store all of our data. We've then got something that's parsing those reports, extracting the individual findings and turning them into normalised versions, so you shouldn't need to know whether a finding has come from a Python dependency analyser or a Ruby one; you should just know you've got a vulnerability in one of your dependencies. And then we've got one or more service connectors, which get Kiln talking to the outside world. So what does that look like? All of the moving parts are Docker containers; I realise not everywhere uses Docker, but

it makes things fairly portable. On the left we've got the Kiln security scanner; that would be the tool that you're trying to run. That will bundle up all the tool output with some metadata about the git repo and what environment you're running in, and fire it off to the data collection endpoint, which does that first pass of validation and makes sure that anything downstream is working with good quality data. That goes into a Kafka topic, and then we've got the service at the bottom, the report parser, which is looking for those raw events coming in and doing the heavy lifting of extracting findings and

looking up severity scores in the NIST NVD. It republishes those to a different Kafka topic, and then we've got the Slack connector, which is listening for those dependency events coming in, formulating a Slack message and firing it off to Slack's API endpoint. Now, Kiln is fairly young; we cut our first release a couple of weeks ago, so the Slack connector and bundler-audit are kind of our MVP, but we are going to be expanding in upcoming releases so there are more tools and more service connectors available.
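
For illustration, a minimal sketch of what a service connector like this might do, posting a normalised finding to a Slack incoming webhook. Kiln's actual connector is written in Rust; the webhook URL and the finding's field names here are assumptions:

```python
# Illustrative service connector: post a normalised finding to Slack.
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"  # hypothetical

def post_finding(finding: dict) -> None:
    text = (
        f"*{finding['package']} {finding['version']}* in {finding['application']}\n"
        f"{finding['description']}\n"
        f"CVSS {finding['cvss']}, advisory: {finding['advisory_url']}"
    )
    # Slack incoming webhooks accept a simple JSON payload with a "text" field.
    requests.post(SLACK_WEBHOOK_URL, json={"text": text}, timeout=10).raise_for_status()

post_finding({
    "package": "rack", "version": "2.0.6", "application": "railsgoat",
    "description": "Possible DoS vulnerability in Rack",  # example data
    "cvss": 7.5, "advisory_url": "https://example.com/advisory",
})
```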

It's a modular system built using an event sourcing architecture (if you're not familiar with the term "event sourcing", I'll get onto it in just a second), built around an Apache Kafka cluster for data storage. Everything is packaged up in Docker containers, like I said; everything is written in Rust, which I personally love; and it's all open source and MIT licensed, so you can use it for pretty much whatever you want. So, I mentioned event sourcing. A good way to think about this is to imagine a bank account: instead of directly updating the account's balance every time you have a debit or credit transaction, you create an event which represents "this is a new bank account with a zero balance", and then every transaction is just a new event in that stream saying this account has had this much debited, or this account has had this much credited.

Then, if you want to know the balance at any point in time, you just replay those events from the beginning of time up to right now, and that gives you your balance.
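
A minimal sketch of that bank account analogy: the balance is never stored, only derived by replaying the event stream:

```python
# The balance is never stored; it's derived by replaying the event stream.
from dataclasses import dataclass

@dataclass
class Event:
    kind: str    # "opened", "credited" or "debited"
    amount: int  # in pence; 0 for "opened"

def balance(events: list[Event]) -> int:
    total = 0
    for event in events:
        if event.kind == "credited":
            total += event.amount
        elif event.kind == "debited":
            total -= event.amount
    return total

stream = [Event("opened", 0), Event("credited", 1000), Event("debited", 250)]
print(balance(stream))  # 750: the balance "right now"
```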

Now, Kafka, which is what we're using for our data storage, is (and this is a bit of a mouthful) a distributed, fault-tolerant, append-only commit log. No, it's not a blockchain; I'm not here to talk to you about blockchains. Basically, you can spin up a cluster of servers that will automatically shard data across those servers, and then if you've got an outage in a data centre, say you're running in us-east-1 and all of your EC2 nodes suddenly lose network connectivity (just picking a hypothetical situation that never happens), then Kafka, up to a certain point, will carry on serving: it'll keep accepting data and keep allowing consumers to read data out, and once those nodes come back online the data will be rebalanced and everything goes back to normal. It also builds a record of changes over time, which works quite nicely with that event sourcing architecture I talked about, and it means that each component can build a view of the data that it needs, so you don't need to model your data in such a way that you have to compromise between all of the things that need to use it.

You model it as real-world events, and then each system can handle those events and build a view and a shape of that data that suits what it's trying to do. So, hopefully this diagram is visible to most of you; can you see that at the back? Yeah. This is a way of visualising what a Kafka cluster looks like. You have a number of topics, and there are partitions within those topics. A producer will send messages to a topic, and consumers on the right will be reading messages from a topic. Kafka also has these things called consumer groups.

If you need to deploy multiple replicas of a component, you can stick them in a consumer group, and Kafka will manage breaking up that data about as evenly as it can across those consumers. If one of them dies, its partition in that topic will get assigned to a different live consumer, and you don't need to worry about any of that.
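
A sketch of that consumer group behaviour, using the kafka-python library as one option; the topic name, group id and broker address are assumptions. Run several copies of this process and Kafka splits the topic's partitions between them:

```python
# Each replica shares the same group id; Kafka assigns each partition of the
# topic to exactly one live consumer in the group.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "dependency-events",                 # hypothetical topic name
    bootstrap_servers="localhost:9092",  # hypothetical broker
    group_id="slack-connector",
    auto_offset_reset="earliest",
)

for message in consumer:
    # If this process dies, its partitions are reassigned to the other
    # members of the group automatically.
    print(message.partition, message.offset, message.value[:60])
```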

Kafka also makes it really easy to react to events as they happen, so it lets you set up a pub/sub messaging broker model, and it allows you to design a system that can recover from data processing bugs. Say you're doing some calculations on events and you realise there's a bug in that calculation and you're getting the wrong results: you can fix the bug, deploy your patched version of that component, then rebuild the data you've calculated from those events, and everything's good again. And because you've got quite loose coupling between components (nothing talks to another component directly; it all goes via Kafka), you can plug in new components quite nicely. So, I promised you data-driven decisions. We're going to do a second live demo, and I really hope the demo gods are going to be kind to me this time. We're going to be using Jupyter notebooks in JupyterHub. If you've not played with Jupyter notebooks before, they're a wicked tool for doing interactive data analysis and exploration: they let you mix blocks of markdown and executable blocks of code.

It started with Python, but there are other language kernels available if you like things like Julia or R (I think you can even get one for Java; there's a whole bunch of them). You can then share that notebook with someone else and they can reproduce everything that you've done, so it's built with documentation in mind. We're using Spark Streaming to read data from our Kafka topics, and we're using the Python pandas library to do a lot of the heavy lifting. If you've not used pandas, it basically gives you a spreadsheet in Python, which means the analysis afterwards is quite nice.

What we're going to try and get to is a view of all of the CVEs that have landed in the master branch of three open source Ruby projects: OWASP Railsgoat again, the Mastodon codebase and the GitLab codebase. What we want to find out is their average time to remediate a vulnerability in one of their open source dependencies. So, demo time. If I pop over to the Jupyter server... there it is; and I'm quickly going to switch back to mirroring so I can actually see what I'm doing. Where's it gone... there it is. OK, so we're going to skip over the documentation at the start of this.

After today I'm hoping to publish some documentation on how to set up the demo stack I've got running here, and how to run this notebook and generate all the data I've used for this, so you can have a play with Kiln and get your feet wet. A lot of these blocks aren't going to produce output until we get into the really interesting stuff, and if you see an asterisk, that means a block is running. So, we're pulling in some dependencies up front: pygit2 for interacting with a git repo; PySpark for doing all the reading from Kafka; fastavro (I'll explain what Avro is in just a second, but that's going to be decoding the data that comes out of Kafka); and pandas for our data processing.

Then we spin up a Spark session. When I was building this, I was doing it on a t3.medium EC2 instance, which has got about 4 gig of RAM and one core, and I kept running into the Linux out-of-memory killer. It manifested as these cells just sitting and hanging forever, because I'd eaten up all my RAM and Linux was like "no RAM for you" and killed processes, including half of my demo infrastructure. So now we're running on a t3a.2xlarge, with about 32 gig of RAM and 8 cores, which it's much happier with.

We're going to read in all of our dependency events from Kafka; it's all compressed, so we're going to decompress it on the way, and we're going to spit it out into what's called a pandas DataFrame, which is that 2D spreadsheet I alluded to. Now, we're working with about a million events, last time I looked, so some of these steps take a few seconds, and the decoding in particular is a very CPU-intensive task, so we're parallelising it over seven cores (we've got one left over for the system to keep doing its thing), dividing those million events up into about 30 partitions and then decoding all that data.
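
A sketch of that batch read, assuming a hypothetical "dependency-events" topic and a local broker; Spark's Kafka source needs the spark-sql-kafka package on the classpath:

```python
# Batch-read a whole Kafka topic with Spark, then hand it to pandas.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kiln-analysis").getOrCreate()

raw = (
    spark.read.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # hypothetical broker
    .option("subscribe", "dependency-events")             # hypothetical topic
    .option("startingOffsets", "earliest")
    .load()
)

# "value" holds the raw (compressed, Avro-encoded) message bytes; collecting
# to pandas gives the 2D "spreadsheet" the analysis works on.
events = raw.selectExpr("CAST(key AS STRING)", "value").toPandas()
print(len(events), "events")
```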

I mentioned this is all encoded in Avro. That's basically a binary serialisation format that brings its schema along with the data, so you can evolve your data schema over time and still read older data, because each record carries its own schema. So we're decoding all of those events, and this takes a second. If there were fewer events, say you were just analysing Railsgoat, this would run in a couple of seconds, but we've got about a million here. This is what's known as an embarrassingly parallel problem: there are no data dependencies between any of these events, so you can parallelise it and throw as many cores as you've got at it.
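
A minimal sketch of that schema-with-the-data property using fastavro; the event schema here is a made-up stand-in, not Kiln's actual schema:

```python
# Avro's container format embeds the writer's schema, so a reader can decode
# records without being told the schema up front.
import io
from fastavro import parse_schema, reader, writer

schema = parse_schema({
    "type": "record",
    "name": "DependencyEvent",  # made-up schema for illustration
    "fields": [
        {"name": "application", "type": "string"},
        {"name": "cve_id", "type": "string"},
    ],
})

buf = io.BytesIO()
writer(buf, schema, [{"application": "railsgoat", "cve_id": "CVE-2015-9284"}])

buf.seek(0)
for record in reader(buf):  # the schema travels with the data
    print(record)
```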

At the moment we've got a roughly one-million-row, one-column spreadsheet, which is not very helpful, so we're going to break that out so that each object has its own row and each field in that object has its own column. Then we're going to group by application name and have a look at how many events we've got: GitLab has about six hundred and thirty thousand, Mastodon about a hundred and ninety thousand, and then Railsgoat is down at about twenty thousand; it's a much smaller project.
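
A sketch of that reshaping step in pandas, with made-up field names standing in for the real event fields:

```python
# From one column of decoded event dicts to one column per field, then a
# count of events per application.
import pandas as pd

decoded = pd.DataFrame({"event": [
    {"application": "gitlab", "cve_id": "CVE-2018-1000211"},
    {"application": "railsgoat", "cve_id": "CVE-2015-9284"},
    {"application": "gitlab", "cve_id": "CVE-2019-5420"},
]})

# json_normalize breaks each dict out into its own row and columns.
events = pd.json_normalize(decoded["event"])
print(events.groupby("application").size())
```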

And this is what a DataFrame looks like. You've got a number of rows, and then we've got the event version and IDs, what application this is, the git branch (which in this case is None, because I was running this sequentially over every commit in master, so we were in a detached HEAD state and git had no branch name available), the commit hash we saw this in, the timestamp of when we saw it, the package that's affected, its version, the CVE ID and an advisory URL. Unfortunately, some of these are dead links, because the Ruby advisory database which sits behind bundler-audit hasn't migrated its advisories to stop using OSVDB links; but if they don't update them soon, the report parser can take care of that for you and replace those dead links with live ones, which is an example of the sorts of things we can get it to do to make this data easier to work with.

There's also a rough description of the problem and a CVSS score. Now, obviously these are floats (remember kids, don't compare floats to floats; they're not quite as exact as you think). Next, we're going to group that data by application name, go through each of their git repos, and find the commits that we saw these CVEs in: the first one and the last one we saw a CVE in, and then the commit after the last one, because at that point the problem is fixed. We record the times those commits happened and track them in a dictionary keyed on the CVE ID.
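
A sketch of that git-walking step using pygit2, under the assumption that the event data has already been collapsed into a mapping of commit hash to the set of CVE IDs seen at that commit:

```python
# Walk master oldest-first, recording when each CVE is first and last seen
# and the commit that follows the last sighting (the fix). Assumes `seen_in`
# maps every scanned commit's hash to the full set of CVE IDs present there.
import pygit2

def remediation_commits(repo_path: str, seen_in: dict[str, set[str]]) -> dict:
    repo = pygit2.Repository(repo_path)
    timeline: dict[str, dict] = {}  # CVE ID -> first/last/fixed timestamps
    walker = repo.walk(
        repo.head.target,
        pygit2.GIT_SORT_TOPOLOGICAL | pygit2.GIT_SORT_REVERSE,  # oldest first
    )
    previously_open: set[str] = set()
    for commit in walker:
        cves_here = seen_in.get(str(commit.id), set())
        for cve in cves_here:
            entry = timeline.setdefault(cve, {"first": commit.commit_time})
            entry["last"] = commit.commit_time
        # A CVE present in the previous commit but not this one was fixed here.
        for cve in previously_open - cves_here:
            timeline[cve].setdefault("fixed", commit.commit_time)
        previously_open = cves_here
    return timeline  # CVEs with no "fixed" key are still open in master
```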

We can see that Railsgoat has had 74 CVEs land in its master branch (it's a deliberately vulnerable application, so not that surprising), Mastodon has had 57, and GitLab has had 139, which, I mean, it's a big project, there's a lot of code there and it's been going a while, so not that surprising; cut them some slack. Now we're going to read in all the data from NIST's National Vulnerability Database. This is just stored as compressed JSON on disk; it's about 20 years' worth of data, and they provide these nice data feeds with meta files, which let you download about 400 bytes and work out whether you need to go and download the several megabytes of JSON, if it's been updated since you last looked at it.

We're going to go through and parse out all the CVE IDs and when they were published, and again we're going to stick them in a dictionary, because O(1) lookups are really nice in this context. Then we grab the CVE publication date for each of these and stick them in the dictionary for each of these applications. Now, I've had to be kind of defensive about how I've written this code: some of the CVEs we see in the applications, that have come out of bundler-audit, you go and look up in this database and there's nothing there.
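
A sketch of that meta-file trick against the legacy NVD 1.1 JSON feeds; the year in the URL is just an example:

```python
# Each yearly feed has a ~400-byte .meta companion; check lastModifiedDate
# before re-downloading megabytes of JSON.
import requests

META_URL = "https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-2019.meta"  # example year

def feed_last_modified(meta_url: str) -> str:
    lines = requests.get(meta_url, timeout=10).text.splitlines()
    # The .meta file is "key:value" lines; the date value itself contains
    # colons, hence the maxsplit.
    fields = dict(line.split(":", 1) for line in lines if ":" in line)
    return fields["lastModifiedDate"]

# Compare against the value stored from the last run to decide whether to
# download the full feed again.
print(feed_last_modified(META_URL))
```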

It turns out that's because someone has reserved a CVE: they've got the number, they've published it somewhere, but NIST doesn't know they've published it, so it doesn't have an entry in the database. It's a good example of how, if you're pulling data in from an external source, it's never going to be 100% perfect. So, we've currently got CVEs across the top and publication dates and so on as rows, so we're going to transpose those, and we're going to drop every row where either the commit that fixed the vulnerability doesn't exist, because it's still vulnerable in master, or the publication date doesn't exist, because of the issue I mentioned earlier. Any data buffs in the room might recognise this as a pivot table.

Then we're going to do some date maths: we're going to subtract the CVE publication date from the date when it was fixed, and that should give us an idea of how long it took to fix that vulnerability. Now, some of these are going to be negative; that means the application upgraded away from the vulnerable version of the library before the vulnerability was published. We're not interested in those, so we're going to drop them. And now we can see that Railsgoat has had 45 actual vulnerabilities in master, Mastodon has had 24 and GitLab has had 67.
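
A sketch of that date maths in pandas, with illustrative dates; the column names are assumptions matching the pivot just described:

```python
# Fix date minus publication date, dropping rows where the upgrade landed
# before the CVE was even published.
import pandas as pd

df = pd.DataFrame({
    "cve_id": ["CVE-2015-9284", "CVE-2018-1000211", "CVE-2019-5420"],
    "published_at": pd.to_datetime(["2019-04-26", "2018-07-13", "2019-03-27"]),
    "fixed_at": pd.to_datetime(["2020-01-16", "2018-06-01", "2019-04-10"]),
})

df["days_to_remediate"] = (df["fixed_at"] - df["published_at"]).dt.days
df = df[df["days_to_remediate"] >= 0]  # negative = upgraded before publication

print(df["days_to_remediate"].mean())
```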

So, quick show of hands: who remembers The Price Is Right? Yeah, nice to see you, to see you... thank you. We're going to play a little bit of The Price Is Right and look at the average remediation times for these projects. We'll start with Railsgoat: Railsgoat has an average remediation time of 230 days. So, who thinks Mastodon's is going to be higher than that? Lower than that? Yeah. Mastodon has an average of 62 days, which I think might be being skewed by this one down here: their longest was 265 days, which is a bit of a whopper. Now, there might be a legit reason for that. They might have triaged the issue, realised they're not using the vulnerable part of the package that's affected, and gone "we've got bigger fish to fry" and moved on to something else, which is fair enough.

Or it might be that they didn't know about it; it might be that upgrading dependencies has been flaky for them in the past, they've not got good test coverage and they know that upgrading a dependency breaks things; or it might be that that part of the codebase didn't have an owner any more. If this was an internal project, say a big monolith where no one team is responsible for that part of it any more, everyone will just assume someone else will fix it and move on with the stuff they've been told to do. So, GitLab: do we think GitLab's remediation time is

going to be higher than Mastodon's? Put your hands up. Who thinks it's going to be lower? OK, about an even split. It's higher: 79 days, and again I think it's being skewed by this one, which is a whopping 18 months. That sat in master for 18 months; again, there could be a legit reason, maybe not. So let's go and have a look at those. We'll look at Mastodon's first: sort by remediation time, grab the last one, and it's CVE-2015-9284. We'll do a cheeky Google, and we can see this was a vulnerability in the OmniAuth Ruby gem.

It had a CSRF bug that allows you to link accounts without intent, permission, interaction or any feedback to the user: basically account takeover through CSRF when used in Rails, with a whopping 8.8 CVSS score. And that was there for about two hundred and sixty days. It might be that they weren't using it in Rails; they could have been using it in Sinatra, which is just a different Ruby web framework, in which case they might not have been affected, which is why it could have sat there for 260 days. GitLab's is CVE-2018-1000211 (there were a lot of CVEs in 2018). This one is a bug in Doorkeeper, version 4.2.0 and later.

It contains an access control issue where the token revocation API would sometimes not revoke tokens, which meant access was leaked until the token expired. Not quite as severe as account takeover through CSRF, but still a respectable 7.5. And again, it might have been that GitLab weren't using the vulnerable part, so they were like, "yeah, we've got other stuff to do, we'll upgrade that when we need to". So hopefully this gives you an idea of the sort of data you can get out of what's being generated by Kiln. I'm going to hop back to my slides quickly.

Yep. So, there are some other ways of analysing this data. I've done this as a batch query, but you could just as easily do it as a streaming application, reacting to those events as they come in, and you could have those stats being updated live, sat on a dashboard. Obviously I've been doing this interactively, which is how you might do data exploration, but you could just as easily script that process: you don't have to execute each code block in turn, you can just go boom, and data.
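
A sketch of that streaming variant, reusing the same hypothetical topic; here the running counts are just printed to the console, but a real dashboard would sit on the other end:

```python
# The same topic read as an unbounded stream, with counts updated as events
# arrive; a real dashboard would decode the Avro payload first.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kiln-live-stats").getOrCreate()

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # hypothetical broker
    .option("subscribe", "dependency-events")             # hypothetical topic
    .load()
)

# Group by message key (standing in for application name) and keep a
# running count.
counts = stream.groupBy("key").count()

query = (
    counts.writeStream
    .outputMode("complete")
    .format("console")  # print the updated table on every trigger
    .start()
)
query.awaitTermination()
```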

You can use this to identify common causes of bugs: in the upcoming release we're going to be adding support for static code analysis tools, so you could start looking at whether you're seeing lots of a particular class of vulnerability in particular parts of your application. You could also use it to help prioritise your vulnerability remediation. So, what does the future look like for Kiln? We want to build a reporting dashboard for common queries, like "which projects are running which tools, are they running them on master or on feature branches, and do they run locally or on CI servers?". We want to add support for tools for other languages, so Python, JavaScript, that sort of thing, and we want to support tools

outside of source code scanning: think NCC Scout Suite, like I mentioned earlier, for auditing your cloud environment, and things like Clair for Docker images, so you can find out if you've got vulnerable libraries installed or if you're running an application as root; that should hopefully pick it up. We want to add support for more service connectors: like I said, Slack is our MVP, but we want to add support for GitHub issues, Trello cards and JIRA tickets, if JIRA is what you use at work. And we want to release documentation, or better documentation; we've got some, but currently there's no documentation on how to deploy this. That will change, hopefully in the next two days, because

I've got notes on how I set up that big demo stack, and some sample Jupyter notebooks on how you can analyse this data and start getting some inspiration for the questions you might want to ask. Also, we deliberately cut some corners to get things ready for BSides today, but in the next release we want to go back and polish all of those off. I want to make sure that if you're deploying Kiln in production, it is going to be rock solid, so that you do not need to babysit it; it shouldn't make your life harder, it should make it easier. So, if you want to try Kiln out for yourself, it's all on GitHub, and if

you want to get involved, quite frankly I would be thrilled. Most of our development is coordinated in GitHub issues, and we've got a Gitter if you want to discuss something meatier; say you want to make some architectural changes, come and have a chat with me on Gitter. All of our contribution guidelines are in the README, as is documentation on the tools you need to build all the components. We also link to our code of conduct in there: if you take part in Kiln, you are expected to adhere to that code of conduct; I think that's only reasonable these days. And, like I said, everything is written in Rust, but if you don't know Rust there are still ways

you can get involved. By far the biggest would be trying Kiln out on your projects and submitting bug reports if you find anything. If you've got tools you want supported, or services you want Kiln to connect up to, raise an issue on our backlog; I'd love to hear about it. If you find issues with our documentation, say something's unclear, there's missing documentation, there are typos: I consider those bugs, and we've got a template for them, so please tell me about it so I can fix it. And if data analysis is your thing, if that's what gets you going in the morning, I would love to hear some ideas for how we could analyse this data in different

ways and answer different questions. So, I've just realised I've come in about 27 minutes in, a bit short; first-time-speaker nerves, I went a little bit quick. But with that, I've been Dan Murphy, this has been Building a Data-Driven AppSec Programme with Kiln, and thank you very much for your time. Any questions? [Music] [Applause]

[Audience question] You mentioned that you're mostly working on Ruby stuff at the moment, and you mentioned Python as well; do you have plans to support things like C# and Java projects?

Um, yeah. I don't currently have plans, because I've not used C# in about eight years, so I'm getting real rusty, and I've not used Java since I was at uni,

which was quite a while ago. But if you know about good dependency analysers and static analysis tools for either of those languages, preferably open source ones so that anyone can use them, I'd love to hear about it: please pop an issue on Kiln's GitHub repo. [In response to another question] Not yet, because there's no Rust tooling I've added support for just yet, which I should probably do. There is the cargo-audit project, which does something similar to bundler-audit: it audits your dependencies, looking for vulnerabilities. I have run cargo-audit over Kiln, and we aren't using any vulnerable dependencies, but I'm not aware of any Rust static analysis tools, and

that's definitely something I want to dig into a bit more, and hopefully find something. Yep?

[Inaudible audience question] Yeah, so for the record, the gentleman was asking if we could use Kiln to find vulnerabilities in a number of projects, particularly for supporting things like bug bounties. So, absolutely: if you deployed a Kiln stack and were running Kiln over commits to each project, so you could set up something that was watching their GitHub repo, cloning it and running the tools over it, then you could absolutely do that and look at stuff that's still open in their master branch. You could look at the Docker containers they're producing and look for possible vulnerabilities in those, and analyse the data to look for stuff that hasn't been

patched yet and that we're still seeing in master. Likewise, if you were using things like Scout Suite, once we have support for that, and you were running a pen testing job and wanted to audit the client's cloud environment and look for potential privesc routes, this would help orchestrate those tools and give you findings to say, "hey, you've got an IAM role that's overly permissive, and if you've got SSRF in this component and can talk to the cloud metadata service, you could go and talk to S3 buckets even though you shouldn't have permission to do that". [Audience question about deployment] Yeah, so most of the moving parts are Docker containers, so if you've got something that you can

throw Docker containers at, it's fairly straightforward. The one caveat is there's a Kafka cluster somewhere in the middle of this. For the demo stack I've got a Dockerised version of Kafka, and that depends on something called ZooKeeper for doing leadership elections inside the cluster. For the love of God, do not run your own Kafka cluster: it's really painful, it takes a lot of setup and it's very fiddly, so unless you've got a Kafka sysadmin full time, just use a managed offering like Amazon MSK or Confluent Platform; they're fairly cheap and it saves you a lot of headaches. It's multiple small containers, so you only have to deploy the bits you're using. For example, if you're

using Trello and not JIRA, you shouldn't need to run both of those components: just deploy the bits you need and pay for what you use. The only bit that's not a Docker container is the command-line tool itself, and we're about to release a Homebrew tap so you can install it on Macs, a deb repo and an RPM repo, and we'll be bundling up the source code as part of our release process. Oh, sorry; yeah, I'm running this from a MacBook. I don't want people to have to install the Rust compiler and compile things from source, because like I said, you shouldn't need to worry about Rust: you should just get

a container or a binary and carry on with your day. Oh, cool. If you do think of any questions later, I'm around all day, and I'll be at the bar later if you want to chat over a beer; I guess you've got 20 minutes before the next talk if you want to grab a coffee. Sorry for running a bit quick. [Applause]