
Cheers. Okay folks, I know how hard it is post-lunch, so we're going to try and perk you up a bit and make this as interactive as possible, because I know how sleepy I get after lunch. So, who here works for a security vendor? Okay. Infosec teams? Okay. Sysadmins? And what are the rest of you, aside from humans? No? You're very quiet. Okay, cool.

First, a bit about me. I like to scare people; that's another key post-lunch technique, scaring people with a giant face. I'm a lead data scientist at a security company called CensorNet. We're working on a unified platform for web, multi-factor and email, with more to come. I've been a long-time community organiser and presenter, mainly in the Microsoft and R worlds (don't hold it against me), and for my sins Microsoft made me a Data Platform MVP. For the people who came to the R session today: we're starting to organise worldwide conferences, kind of similar to BSides London and the whole BSides franchise. They'll be on Saturdays, and they're very cleverly named as a consequence. I put a lot of stuff on GitHub, so all the materials from today, including my workshop earlier, and
various other presentations can all be found on there. I blog, I tweet, so if you want to contact me afterwards you can, and my little business site is down there as well.

So, what we're going to cover today: I wanted to look at ways we can improve how we're doing things with our security data. We're going to look at the actual data problems we may have right now, and then look at people, processes and tools, the key things that determine how something gets handled.

Data first. We have an increasingly large number of devices; between BYOD and the Internet of Things, there are ever more bits of data travelling around, and with all those disparate streams it's getting quite tough to bring it all together and analyse it. That's one of the key challenges for me: how do we get stuff into a consistent format, and how do we even get things like consistent date-times together? This slide didn't come out too well, but the cyber security software market is growing massively, and we've got new vendors all the time. You're never just going to have one piece of security software; you're always going to have multiples, and you need to get the data out of those different systems and into something where you can combine it. Yes, many attacks are on a single front, but that's not always true, and being able to provide that broader viewpoint, supplementing the information from one front with another, is very handy, so we need to be able to access all of it from our software. And then crime: the World Economic Forum produced something like a Gartner magic quadrant of risks, and cyber attacks are one of them. It's a leader in world risk, so we need to be increasingly vigilant. I'm sure I don't need to evangelise on that point, otherwise you wouldn't all be here, learning how to do your jobs better.
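To make the date-time point concrete, here's a minimal Python sketch (field names and source formats are invented, not anything vendor-specific) of normalising timestamps from two differently formatted log streams into one canonical UTC ISO 8601 form:

```python
from datetime import datetime, timezone

def normalise_timestamp(raw, source):
    """Convert per-source timestamp formats to one canonical UTC ISO 8601 string."""
    if source == "proxy":          # hypothetical source using Unix epoch seconds
        dt = datetime.fromtimestamp(int(raw), tz=timezone.utc)
    elif source == "mail":         # hypothetical source using UK-style text, assumed UTC
        dt = datetime.strptime(raw, "%d/%m/%Y %H:%M:%S").replace(tzinfo=timezone.utc)
    else:
        raise ValueError(f"unknown source: {source}")
    return dt.isoformat()

# Two streams, two formats, one answer:
print(normalise_timestamp("1458570605", "proxy"))           # 2016-03-21T14:30:05+00:00
print(normalise_timestamp("21/03/2016 14:30:05", "mail"))   # 2016-03-21T14:30:05+00:00
```

Doing this conversion once, at ingestion time, is what makes a cross-stream chain of events possible later.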
All right. So, which people can be involved in helping your organisation get better with data? One of the key groups is actually BI analysts. Do people have business intelligence departments? Report chimps? You can call them any disparaging name you like; their job is to make numbers look pretty. These people get data from different sources for a living, put it into a consolidated format, build reports and get insight out of it. So if you're stuck and you want to get more value out of your data, it's worth feeding them some cakes, bribing them, and getting them to help you, because they are experts in exactly what you're trying to achieve: getting the data consolidated.

Then you've got data scientists like me. I'm missing my Mac and my blazer, and I'm not from San Francisco, but these are statisticians and data miners, people who can help you get unexpected insight out of your data. They generally need the data to be consolidated in some format first, because most data scientists are not data engineers (yet another hip term), and they find it difficult to bring that data together.

When we're working on being better at processing our data, it's always worth thinking about who the specific audiences are. The information I give to my sysadmin is very different from the information I give to my CEO about the latest denial-of-service attacks coming into the business. Again, this is where a BI person can help you get more practice at pitching notifications and content at the right level: a sysadmin needs things they can act upon to make the system safer; your infosec people usually need to verify that actions were taken, often working in an audit-type capacity; and your CxOs need high-level reassurance, or their pants scared off, so that they give you plenty of
budget.

So, processes. I really like coffee; I wish we had more of it, and you could all do with some right now, because like me you're feeling that slump that's trying to get you to snooze. To get better with data, we need processes in place that actually help us: (a) make sure we don't miss anything, and (b) do it in a compliant and auditable fashion, while ideally staying able to move fast, doing that whole agile BI, agile development type of thing.

The first thing is how you construct that wider view from a process perspective. The aim is to have a regular check that asks a few questions. Can I get data that supplements my existing information and gives me an added dimension? For example, if I can get Have I Been Pwned data via the API, can I combine it with my Active Directory to look at potentially compromised accounts and act on them? That way I supplement my existing data with another source and get something that would be really difficult otherwise. Can I get data that compensates for a flaw in my existing process? At CensorNet we have a proxy, and we have agents that sit on people's machines, but that doesn't help for machines that aren't on the network, or aren't known on the network, so we can compensate by adding in the access logs from those assets to identify the gaps in the agent deployment process. And can I get data that informs? If we've got lots of activity that a specific person is performing, they might look quite dodgy: you see this person going to Box, Dropbox and a whole load of other file-sharing sites, and you think, what the hell are they doing, why are they going to these sites? It's only when you bring in something like the AD schema and see that they work with partners or brokers that you realise they're just disseminating information as you'd expect. Being able to give that added dimension really helps.

When you're building that view, there are a few things you really need to focus on. One is the ETL pipeline: extract, transform, load; standard database jargon. Your process needs to take into account how you get the data, ideally programmatically; what transformations you have to perform to get it into an acceptable format; and where you're storing it.
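The "supplement" check above, pulling Have I Been Pwned results via the API and joining them against Active Directory accounts, might be sketched like this in Python. Note the assumptions: current versions of the HIBP API require an API key, and the account field names here are invented.

```python
import json
import urllib.error
import urllib.request

# Assumption: the v3 breachedaccount endpoint, which requires an hibp-api-key header.
HIBP = "https://haveibeenpwned.com/api/v3/breachedaccount/{}"

def breaches_for(email, api_key):
    """Ask Have I Been Pwned which breaches an address appears in (404 means none)."""
    req = urllib.request.Request(HIBP.format(email),
                                 headers={"hibp-api-key": api_key, "user-agent": "demo"})
    try:
        with urllib.request.urlopen(req) as resp:
            return [b["Name"] for b in json.load(resp)]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return []
        raise

def flag_accounts(ad_accounts, breach_lookup):
    """Join AD accounts against breach results; breach_lookup maps email -> breach names."""
    return {acct["mail"]: breach_lookup[acct["mail"]]
            for acct in ad_accounts if breach_lookup.get(acct["mail"])}

# Offline illustration with canned lookup results (no network call):
ad = [{"sam": "jdoe",   "mail": "jdoe@example.com"},
      {"sam": "asmith", "mail": "asmith@example.com"}]
canned = {"jdoe@example.com": ["LinkedIn"], "asmith@example.com": []}
print(flag_accounts(ad, canned))   # {'jdoe@example.com': ['LinkedIn']}
```

In practice you'd build `breach_lookup` by calling `breaches_for` for each AD mail attribute, rate-limited per the API's rules, and feed the flagged accounts into your incident process.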
Of course, when you're producing all this information, there's a lot of potential for personally identifiable information to pass through. With the CensorNet stuff we monitor application actions, so we actually capture a lot of email addresses and potentially sensitive comments between people in the detail, and we have to be very careful about how we secure that data; your own internal data sources need to be thought about in a similar way. Then there's time. It's really hard to get an accurate chain of events when all your servers are reporting different times; it's actually quite hard, and you should give some thought to how you make sure everything is reporting the right time.
Acquisition and prototyping. As I said earlier, there are a huge number of software vendors out there now, and there are more of us every day. What's going to become more important is avoiding vendor lock-in, while always being ready to pick up a new tool. Instead of saying "I am a Microsoft shop" and never buying another piece of software unless Microsoft shipped it, work out what actually works for you. I'd say the open-source solutions are always a great place to start: if you're worried about security, try the Metasploit open-source or community edition and use it to pentest your stuff. Open-source and community editions lower the cost of acquisition, so you can be more willing to try things out; that gives you more data, more insight, and hopefully better security, because you get to try a lot more. That goes hand in hand with prototyping: even big vendors, depending on your size and how much you can schmooze them, will usually give you a proof-of-concept period, so you can trial things and see what data and what insight you can get out of them. And if it doesn't work out, be willing to dispose of it; that's a good thing. I have seen a lot of companies, though, that feel almost obliged to stick with a piece of software because it took a long time to implement. That's the sunk cost fallacy. It's much better to jettison something, even if you've put a lot of effort into it, than it is to keep going with something
substandard.

Okay, on to some of the tools. You need a database. It doesn't have to be a relational one; if you're a NoSQL fan, it doesn't have to be tables. But when we're bringing data from lots of different streams together, you need a place to store it, and you need to be able to add the supplementary information. So if you have your web logs, and you're doing something pretty cool like keeping an eye out for vulnerabilities noted on Twitter with hashtags (or the ones that get a cool logo and a nice website), then you need that Twitter pipeline, an ETL, and a process that says: these are the things I need to look for in my logs, now go through and look for them. That's a lot easier if the data is all in one place, and ideally well thought about.

I'm still very dubious about one buzzword: has anybody heard of the data lake? There's this great new concept: dump it all in and index it later. Except the developer might have left three months after setting up the dump, and now nobody knows what on earth that data means, but it's okay, we'll work out the schema later. That's a bad idea. You need to know how your data is collected, how it gets stored and how it gets shipped to you. You need that provenance; you need that metadata. It's all very well having a non-relational database, but you should spend the time, when you set up the data inserts into it, to actually specify what's likely to come from the source that you're working with.
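As a sketch of that "specify it up front" idea (the source name and fields here are made up), you can declare the shape you expect from each source and reject surprises at insert time, whatever the backing store:

```python
# Declare what each source should send, rather than dumping blind into a "data lake".
EXPECTED = {
    "proxy_log": {"ts": str, "user": str, "url": str, "status": int},
}

def validate(source, record):
    """Return a list of problems with a record against its source's declared schema."""
    schema = EXPECTED[source]
    problems = [f"missing field: {f}" for f in schema if f not in record]
    problems += [f"bad type for {f}" for f, t in schema.items()
                 if f in record and not isinstance(record[f], t)]
    problems += [f"unexpected field: {f}" for f in record if f not in schema]
    return problems

ok  = {"ts": "2016-03-21T14:30:05Z", "user": "jdoe", "url": "https://example.com", "status": 200}
bad = {"ts": "2016-03-21T14:30:05Z", "user": "jdoe", "status": "200"}
print(validate("proxy_log", ok))    # []
print(validate("proxy_log", bad))   # ['missing field: url', 'bad type for status']
```

Records that fail the check go to a quarantine queue rather than silently becoming the mystery data nobody can interpret later.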
Dashboarding. Dashboarding is really important for getting stuff out. Part of it is being able to distil your alerts down, especially if you have a consolidated system; but even with things like your SIEM and Splunk, or Kibana over Elasticsearch, you have the ability to integrate alerts from a load of different systems, and picking which charts to show is quite important. Again, this is where a BI analyst can come in handy if you have one lying around, because a good BI analyst should know solid data visualisation principles and should be able to help you design for the dashboard's audience. If it's a dashboard for you, and you're a techie who doesn't like charts and really likes tables of data, then make it that way. If it's for your infosec team, what do they need, and how do they need it represented? And if it's for your CEO, you usually need to strip away as much detail as possible and just leave some gauges (I'm not too cynical about them, I hope), so that you can buy them in.

I'm actually quite a fan of the Microsoft Power BI stack at the moment. It has this fantastic Q&A capability: once you set up your data and you have some visualisations, people can write natural-language queries against the dashboard, like "how many 500 errors did I get on this page yesterday?". You can do a lot of exploratory analysis just from the dashboard, so you widen the audience of people who can perform the analysis, and if something is of interest and worth tracking, you can pin it to the dashboard. You can evolve dashboards without making code changes. On the flip side, I also quite like code-based dashboards. I use a lot of R, and there's a package called shiny which builds interactive applications without you ever having to write HTML or CSS or configure Apache. It runs on pixie dust; I don't understand how it works, but it's super impressive, and all you need to be able to code to use it is R. You can set up your data sources, bring them into memory or dump CSVs, produce regular reports, and add a whole lot of parameters to make life easier, so people can do exploratory analysis and configuration and things like that. That's great if you want to specifically target a view of your
data.

Okay, ELK. I felt like instead of showing the web page I should have shown an actual picture of an elk. Is anybody working with Elasticsearch, Logstash and Kibana at the moment? Okay. For those of you who aren't: if you need to get log data in, the ELK stack is really handy; it can handle streaming data and it can do bulk uploads. Logstash is your log-shipping tool, the bit that gets your data from A to B in the format you need. So if you have a weird timestamp, because some developer decided not to use the Unix epoch and went with seconds since 1900 for some crazy reason, you can specify that, and solve a lot of issues while the data is in transit. Then you've got Elasticsearch, which basically builds indexes over your data (you can tell it which) to make search very fast, so it's great for exploratory work. And Kibana is the dashboarding system that sits over the top, pulls data in real time, and makes it quite easy for people to build and configure their own dashboards once you've set up the right connections to Elasticsearch.

Is anybody using Docker? I'm working on hitting all the buzzwords, by the way. Docker is a great tool: it means I never break my machine, I just break my Docker container, and that's okay, because I break it and start again. Docker is like a very light VM, and you can get a Docker container with the ELK stack in it, basically say "hey, I want to play with ELK" on the command line, and it'll spin up and you can play. So it can be really easy to get started with and prototype.
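The timestamp fix mentioned above looks something like this in a Logstash filter. This is a sketch: the field name `logdate` and the fallback text pattern are assumptions about your logs, and `UNIX` covers the plain epoch-seconds case.

```conf
filter {
  date {
    # Parse the raw field into @timestamp; try epoch seconds first, then a text format.
    match    => [ "logdate", "UNIX", "dd/MM/yyyy HH:mm:ss" ]
    timezone => "UTC"
  }
}
```

A "seconds since 1900" source would need an extra step first (for example a ruby or mutate filter subtracting the 1900-to-1970 offset) before the date filter can treat it as epoch time.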
You all know about Splunk and the like. Microsoft are actually doing some interesting things too; they're keeping it quiet, but their operational log offering is quite interesting, and they're also fishing in the same space as CensorNet, because you can upload logs of what your users have been doing and identify which cloud applications they've been using. That can be quite handy, and you can set it up to ship the logs automatically. Beyond that, there was no way I was going to cover the myriad of tools available that help you manipulate data into a form that gives you insight. There are many, and there will always be a right one for you, whether you have operational constraints about whether the data can stay on-premise or go to the cloud, technological constraints, budgets, or new and shiny things you want to play with; there will be considerations.

Okay, I called this last slide the data deluge, so it's about time for me to wrap up and get off the stage. I've generally kept you all awake, which is good (I'm impressed with myself), and nobody's died yet, so this is good. There's a lot of data, and if you're not using it then it's of no value: so if you're not using it, lose it and save yourself the storage costs, or get using it. Put something simple over the top of it to begin with; try out an open-source tool; a little prototype on a real use case will tell you whether something's of value to you. On people: in much the same way that everybody bribes the sysadmin with cakes, I highly recommend bribing your BI people to get that knowledge and that helping hand. Developers can also help in this respect, because a lot of them will have been working with non-relational search capabilities; if you're looking at Elasticsearch, which is based on Lucene, a developer might come in handy. Focus on who you're delivering the data to and in what format, and don't overwhelm people with the wrong info or the wrong level of detail. Always make sure you're right: management decisions are being made on this, and you will lose so much credibility otherwise. It's really important to be right when you're providing people with information, so if you can script it and write unit tests against it, that's a fantastic way to go, and that's a big reason why I use R rather than any GUI-based solution: because I code it, I can test it. And then tools: there are a lot of tools out there, and the ones that help exploration are very useful.
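That "script it and write unit tests against it" point can be sketched in Python (the talk's own tooling is R with its testing packages; the function and field names here are invented):

```python
from collections import Counter

def failed_logins_by_user(events):
    """Count failed login events per user: the kind of number a report might quote."""
    return Counter(e["user"] for e in events if e["action"] == "login" and not e["ok"])

# A unit test: if the summary feeding management reports is wrong, you lose credibility.
def test_failed_logins_by_user():
    events = [
        {"user": "jdoe",   "action": "login",  "ok": False},
        {"user": "jdoe",   "action": "login",  "ok": False},
        {"user": "asmith", "action": "login",  "ok": True},
        {"user": "asmith", "action": "logout", "ok": True},
    ]
    assert failed_logins_by_user(events) == {"jdoe": 2}

test_failed_logins_by_user()
print("tests passed")
```

The point is not this particular metric but that a scripted, tested pipeline gives you a repeatable answer, where a hand-built GUI report gives you a one-off.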
You need somewhere for your data to be consolidated into; and then specialised tools that help you do specifics can save you a lot of money in the long run if they help stop something big.

Okay, so this is all available online, and a lot of my other materials are as well; they've all got links, so if you want to learn a bit more about R, or Microsoft, or this and that, you can go and do it, and I link to plenty of other things. Just try it: prototype something, run Metasploit, give it a go, see what you can break and what you can solve. And do keep in touch. This is my first security conference; I came from a finance background previously, and I never wanted to attend their conferences, so I'm always looking to make new contacts in a very fun industry. Thank you all very much for listening, for staying awake, and have fun for the rest of the day.

[Applause]