Secure(r) Cloud Development

Name: Secure(r) Cloud Development
Uploaded: 2017-12-08
Duration: 32 min 20 s
Description: Two DevOps engineers discuss security best practices for cloud-native development teams. The talk covers identity and access management, secrets management, monitoring and alerting, and integrating security into CI/CD pipelines—using AWS as the primary example but applicable across cloud providers.

BSides Cape Town · 2017 · 32:20 · 69 views · Published 2017-12

Speakers

Christo Goosen, Toufeeq Ockards

Tags

Category: Technical

Topic: Cloud, IAM, Detection Engineering, DevSecOps

Style: Talk

Mentioned in this talk

Tools used

AWS CLI, AWS CloudTrail, Chaos Monkey, Elasticsearch, HashiCorp Vault, Kibana, Logstash, OpenVPN, Packer, pfSense, Security Monkey, Terraform

Platforms

Amazon EC2, AWS, AWS DynamoDB, AWS IAM, AWS KMS, GCP

Service

Amazon S3, Bitbucket, Google Cloud KMS

Transcript [en]

Welcome. This is my and his talk, so let's get started, because there's probably quite a lot to cover and I feel like we'll run out of time. Just quickly about me: I started working for that company at the top quite recently, but I used to be a DevOps engineer in an AWS environment. I'm still doing AWS and Google Cloud at the moment. And OWASP: we've got T-shirts printed for today, so come and find us later. [inaudible] I finished these slides between last night and this morning, because I was soldering at all hours the whole week, thanks to Mike. But anyway. And Toufeeq: we used to work together, and I guess that's

me. Yeah, okay, I used to work with this guy. [inaudible] Yeah, so we usually sort of talk about these kinds of things, so we thought to just throw all our ideas into one. When I started at the previous company about 10 months ago, there was some security, but I felt like there was a lot that could change. I focused on kind of the three main ones, although you'll see I reference AWS a lot because that's what I worked with. Generally the problem was that development came first: features. The CEO calls, he's grumpy, let's get things

out. And generally that's where we saw that things sort of came off the rails, but I'll talk more about that. That's the kind of outline; hopefully we get through all of it, because there's a lot to cover. So, with the theme of Back to the Future: usually you had dev, sec and ops, and that usually broke, because dev sends it to sec, sec says it's a piece of crap and sends it back, or sends it to ops, and ops breaks everything or has VMs that have been alive for the last six years and it doesn't work. What I've found is that they all seem to actually be the same thing, or rather it makes more sense to have them work

together. We were a small team; the Cape Town office was 10 people, so you had to work together. Even though I was the DevOps engineer, you kind of want to get a culture going, otherwise you're just fighting upstream. At first, when you look at the security model for these kinds of things, AWS's is this, which explains it rather well. They try and cover this end, and then it's quite top-heavy, which is the same for most things. I mean, Oracle, everyone, can only guarantee the platform as long as you don't go and open up root and press Enter twice, and things like that. So yeah, a lot of what you have to end up doing, it

makes a lot of sense. I mean, your customer data is important, the operating system needs patching, there's server-side encryption, and client-side you need HTTPS and encryption. Generally, when I started out I was a little unsure where to look, and I found a lot of great mentorship out of Coinbase and Netflix. Now, Coinbase has been hacked a couple of times, so I wouldn't say they're the perfect example, but if you go watch that YouTube video, the guys started out at NASA's JPL and they moved some of NASA's infrastructure into AWS, not into GovCloud but a more secure cloud. So they had a lot of lessons that they then applied to Coinbase,

and they have a whole DevSecOps lifecycle where devs can't actually push any code to any containers without going through a whole process. At Netflix, on the other hand, devs have a lot of control: on AWS they pretty much give them as many permissions as they can, and then they build open source tools to manage the damage that comes from that. Obviously a new dev might not know that certain permissions are just going to open up everything. And consider that Netflix is probably one of the biggest customers that AWS has, so they must have some idea of what they're doing. So, to kick it off: if your identity and access

management isn't right, with Google and Microsoft and AWS or whoever, you're generally going to get your head bashed in. So generally, with networking and IAM, everything was separated: each service had a separate IAM role and separate networking. The great thing about AWS is that it's all APIs, so lots of networking can be thrown in. Even on this level it's still a bit too simple. You can break everything up into separate subnets, but it does get kind of wild, like this image, when you start to talk about different clouds, and that's where things generally get quite sticky and weird. So the first thing we did is we got a pfSense box set up at the office, and the

idea behind it was: if AWS is a cloud, it has an API, so if we can control our network a lot better, we can also control how the API is used and how the code gets into our AWS instances. If you remove one of these you could even put in a local data center. So you set up an IPsec tunnel into AWS, and then we set up OpenVPN into the office, so that each user, even when they're at home, still has access to AWS, but we have some degree of guarantee that between AWS and the office, through the AWS VPN, our traffic would be a lot more secure,

and this got interesting, because then our local network had a route into AWS, so the devs had direct SSH access to VMs in AWS. I used to get irritated with "how do you reverse SSH into this box so I can get to that box", and this solved a lot of that problem. It's a lot easier to ping a box directly when you already know its private IP, or, if you set up the private DNS correctly, you could even hit private DNS from the office. Where things got interesting, considering that IAM governs how you use AWS: if you were using a VPN correctly, with public IPs for the office,

you could actually lock down all access to AWS's API to a single IP or two IPs. I'm making some assumptions, because of time, that people kind of have an idea, but this is an IAM rule where you're saying that any action on any resource in the API is denied for an IP address that's not in those ranges, and you could make it even more granular if you wanted. Generally that meant that outside of the office you couldn't even spin up an EC2 instance or hit certain S3 buckets. Anything that we didn't want public-facing got locked down a lot. What helped us a lot as well is using

tools like HashiCorp's Packer and Terraform. I didn't get to a full Vault integration, and we did some Ansible. The big thing that we did with Packer is that we could have automated builds: we take the latest AWS Ubuntu image, provision all the dependencies like Python 3.6 and everything on it, and then use Terraform to build the entire infrastructure. So the networking, the firewall rules, how many instances we wanted, how the networking worked together, all of that in Terraform. And then, if you use HashiCorp Vault, the great thing about it (we'll get to that a bit more) is that instead of hard-coding credentials everywhere in your code, you

can access an API and ask for credentials at runtime, so they're just stored in memory. And then Ansible, which most people know: configuration management and setting up environments. That's just a Terraform example, so it looks a lot like code: infrastructure as code. I use Google as an example because I default to AWS too much, but you can tell it which project, and, like with AWS, it generally pulls the credentials from your machine, so you don't have to put them in everywhere. Then you tell it how many of what infrastructure you want, so how many Google Compute or AWS EC2 instances. This helped us to not lose track of

infrastructure. Otherwise you have a bunch of VMs lying around, which I generally shut off until someone came to me and asked why, so that we don't have boxes that haven't been updated for two years and have security rules open to the world. We also did a lot of logging, and that was great. Especially if you have no idea where to start, the ELK stack, or Elasticsearch, Logstash and Kibana, is great. What we did is on every API we would put a Logstash client, or even put it in the code itself, which shipped to a Logstash cluster that we had. So every request and response was tracked, internal errors were tracked, so

we picked up, you know, lots of requests looking for wp-admin and all of that, so you tend to, after a while, filter out some of the garbage. Kibana is a great visualization tool to actually look at your data: where did it come from, what kind of request it was, response times and so on. The great thing about this is that it's a performance tool to some extent, looking at what's actually going on in your infrastructure load-wise, and also, with the new version of Kibana and Elasticsearch, if you use X-Pack, which is their paid-for extension, they've built in machine learning tools to look for anomalies in your data,

which is probably a great way to pick up on any security problems. And you end up building really great dashboards in Kibana. This is an example, but what we used to have is the different kinds of requests coming in, how many, where in the world they came from, anomalies and stuff like that. Splunk is another good example. I haven't used it myself (you've used it, right?), but it's great for big data analytics, especially with stuff like security. Obviously, once you've got the data you actually have to do something with it, so present it in a logical way, because the problem with a lot of logging is actually reading the logs, and that

becomes a problem. I remember for one type of service the devs' logging gave us a one-gig index in Elasticsearch per day, and unless you visualize it or get the dev to fix it, you're never going to read through all of that. Alerting was a great thing, though it used to irritate me, because at 3 a.m. I might get SMSes for a service being down or something scanning something. But instead of looking at a dashboard, you can go and work out a lot of great rules: if this kind of event happens, SMS me, tell me what it is, or send me an email. So even before I start, I open up

my phone, I check, and I already know which service is being hit and what to look at, so the discovery time comes down a lot. Sorry, I didn't mention that I put the Back to the Future car in places. [inaudible] You know, monitoring is not new, alerts aren't new. ElastAlert, if you're using the Logstash, Kibana and Elasticsearch stack, is a nice one. It's done by Yelp, and it's a YAML way of setting out rules for going through your data. The great thing is that they integrate with AWS and Google Cloud, so you can have an SMS sent out on a topic to all the admins if you pick up whatever event you're looking for. So we

were talking about anomaly detection: potentially you could find a security anomaly, it SMSes the admin team, and they can start looking at it immediately.
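The "if this kind of event happens, SMS me" rules described above can be sketched in a few lines. This is only an illustration of a frequency-style rule, not ElastAlert's own rule format or API: the event fields (`path`, `timestamp`), the function name `should_alert`, and the thresholds are all assumptions made up for the example, and the notification step is reduced to a `print`.

```python
from datetime import datetime, timedelta


def should_alert(events, now, window=timedelta(minutes=5), threshold=3,
                 path_prefix="/wp-admin"):
    """Return True when at least `threshold` matching requests fall in the window."""
    cutoff = now - window
    hits = [
        e for e in events
        if e["path"].startswith(path_prefix) and cutoff <= e["timestamp"] <= now
    ]
    return len(hits) >= threshold


# Hypothetical request log entries, as they might come out of a log index.
now = datetime(2017, 12, 2, 3, 0, 0)
events = [
    {"path": "/wp-admin/login.php", "timestamp": now - timedelta(minutes=1)},
    {"path": "/wp-admin/", "timestamp": now - timedelta(minutes=2)},
    {"path": "/wp-admin/setup.php", "timestamp": now - timedelta(minutes=4)},
    {"path": "/api/v1/users", "timestamp": now - timedelta(minutes=1)},
    {"path": "/wp-admin/", "timestamp": now - timedelta(minutes=30)},
]

if should_alert(events, now):
    # In a real setup this is where an SMS or email would be sent.
    print("ALERT: repeated wp-admin probing in the last 5 minutes")
```

Keeping the rule a pure function of the events and the current time makes it easy to test before wiring it to a pager.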

So, the mandatory Back to the Future slide. Does anyone remember the ZaCon 2012 challenge? Okay, that's just a question. So, secrets management. Firstly, how do we do secrets management? Some of us either don't do it or don't actually talk about it, so I guess that's good with managing secrets. Essentially, secrets management has to do with anything you actually want to hide, so SSH keys, your passwords, API credentials and all of those things. There are two really good talks. One of the earliest talks I could find was one from 2012 [inaudible] from someone at Square. This talk was

also interesting. I mean, when you actually dig into secrets management, a lot of the talks are given by software engineers who had these InfoSec kinds of problems and came up with some solutions. So the talk from [inaudible] McCrory and Justin Cummings is actually an interesting talk, if you can get access to it. And then also "Turtles All the Way Down" from 2015 is a really good talk, just to give you an overview of the landscape of secrets management [inaudible]. Every talk after that either references this particular talk or is selling a product alongside a talk about secrets management. Okay, so

you can have this: it's basically a really good overview of secrets management. It's not updated, but you can update it yourself [inaudible]. It indexes all of the tools, on the right. If you're talking about the cloud, [inaudible] you'd actually want to look into these four products, which are HashiCorp Vault, AWS's Key Management Service, Google Cloud Key Management and Azure Key Vault. A lot of the actual tools that are out there are done by either Square, HashiCorp, [inaudible]

Cloudflare, Lyft, Stack Exchange or Mozilla. They kind of integrate with your HSM, and some of them are compliant, with FIPS compliance and so on. So there are quite a lot of good tools out there. They're all good tools and they solve particular problems in particular domains [inaudible]. This is kind of like a good Shakespeare novel: you have the illusion and the reality. This is the illusion. It looks really easy: you get Vault, it sits on your infrastructure, you have a client, you create it with some token, you write to some kind of key-value store, and you get it

back, right? That's the illusion, and I guess this is the reality. You integrate Vault, you draw up some policies, and this is the integration. This is actually from [inaudible]; it's a really good post [inaudible] on how you actually integrate Vault. The gist of this is that you need to know your problem domain first, or understand your system and solution. You don't just throw Vault in and have it solve itself; you need to understand your context, how you want to manage secrets, and those things. Right, so

kind of back to the ZaCon 2012 thing: burn all your sticky notes. Personally, get a password manager for your own secrets; I think it's good to start at home. If you're a dev and you're working on these things, look at something called [inaudible] for your application, so you can hide your environment variables and those things. [inaudible] If someone tells you "just use Vault", I would tell them [inaudible]. And then just a final warning: focus on secrets management first, and on keeping your secrets secure and secret. You know,

[inaudible] don't roll your own; you would rather use some existing solution. So, like I said, it's kind of like cooking meth: you don't do it, and if you do, you do it with caution.
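The advice above about keeping credentials out of the code and asking for them at runtime can be sketched as follows. This is a minimal illustration using environment variables only; the names `get_secret`, `MissingSecretError` and `DB_PASSWORD` are hypothetical, and a Vault or cloud KMS client would replace the lookup in a fuller setup.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not available at runtime."""


def get_secret(name, environ=None):
    """Fetch a secret by name at runtime instead of hard-coding it.

    `environ` defaults to the process environment; passing a dict makes the
    function easy to test. The error message names the secret but never
    includes a value.
    """
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        raise MissingSecretError(f"secret {name!r} is not set")
    return value


# Usage with a stand-in environment (a real app would rely on os.environ).
fake_env = {"DB_PASSWORD": "example-only"}
password = get_secret("DB_PASSWORD", fake_env)
print("loaded a secret of length", len(password))
```

Failing fast on a missing secret keeps a misconfigured service from starting with an empty or default credential.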

So, back to the scanning thing. The problem, especially with an AWS or Google or Azure environment, is that there's so much to look at, even in the console, forget about the CLI. Just security groups, which in the AWS world are the firewall rules: I had to look through about 40 different security groups to find everyone's ADSL public IP that they had put in there, which obviously had expired already and had filled up all the security groups so we couldn't add any more. That's why the VPN made a lot of sense: people kept putting cruft into everything just so they could get in. I do like Netflix's idea of

giving access to all the devs, because otherwise they become unproductive, but you have to manage it to some extent. Putting in your home ADSL IP, which gets switched so often, is really not the best way to do it. Definitely worth mentioning: Security Monkey and Prowler are great tools to just go through your security groups and IAM. And then OWASP ZAP, to throw in an OWASP thing: it's much like Burp Suite, but there's a lot of work being done to use it in CI environments. A good example would be Salesforce, which uses it to scan all the third-party plug-ins. Security Monkey kind of looks like that. It's quite a bit of a

system, because they're storing, with Postgres, a kind of snapshot of what it used to be, and then it goes and scans AWS and gives you an idea of where to go and look: which accounts or which policies give too much access. It's rather ironic that S3 is on there, because that's usually the biggest culprit, and DJI got burned by that recently. That's also a hard problem to solve, especially with devs: S3 can give you a static website, but it can also store logs and other things, so constantly checking those things yourself is a bit of a pain, whereas Security Monkey gives you a good place to start. And, oh

goodness, that slide you can't see, but that was just OWASP ZAP and working it into CI. It's difficult to go through talks and just imagine how it would work to get in, but Keegan shared this with me, and it's a great way to actually find out what could go wrong. Someone went and set up an AWS account and did it wrong on purpose, to show you how S3 is a problem, or how your IAM roles or your users have too many permissions. Just using the AWS CLI you start getting things back with flaws, and there are these different challenges and steps that you go through

and find different things, and then get more access and more access and more access. By doing something like this you generally also start thinking about your own infrastructure: what's too open, what should I lock down? And so, a general conclusion. I think I've actually missed some slides that I must have forgotten to put back in, but anyway: soldering. So, to get back to DevSecOps: what I found from doing it for about nine months at one company is that you have to work into your sprints that you have to think about security and the ops around it. You can't just leave it for later, especially when you're doing

the DevOps with Terraform and Packer and everything. You're already doing the work, so work in the security side of things there. And CI for infrastructure and scanning: most of our infrastructure was built by Bitbucket Pipelines, just because that was what we were using at the time, but you can track the builds that have happened. They're on AWS's infrastructure already, so it's easy to isolate their public IP range with security groups, even with a VPN. And you can use CI for scanning, so for things like the flaws kind of problems you can go and look at each one, set up tools, and scan on a nightly basis. CI for code deployment, so you don't have a

dev just going in, SSHing in and changing everything, putting API keys everywhere. Work it into sprints. What I do like about Netflix's approach is that if they find a problem with giving devs too much access, they build a tool for it. So just like they have Chaos Monkey to kill things, they have Security Monkey to help lock things down, and they have a lot of other products. And Q&A? Otherwise I can talk about other things that I didn't necessarily put in.
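As an illustration of the nightly CI scanning mentioned in the talk, here is a small sketch that flags security group rules open to the whole internet. It is not how Security Monkey or Prowler work internally; it assumes the input has the same shape as the `SecurityGroups` list returned by the EC2 "describe security groups" API, and the group IDs and rules in the sample data are made up.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def find_world_open_rules(security_groups):
    """Return (group_id, from_port, to_port) for every ingress rule open to the world."""
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
            cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
            if any(c in OPEN_CIDRS for c in cidrs):
                findings.append(
                    (group["GroupId"], perm.get("FromPort"), perm.get("ToPort"))
                )
    return findings


# Hypothetical sample data in the assumed response shape.
sample_groups = [
    {
        "GroupId": "sg-web",
        "IpPermissions": [
            {"FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        ],
    },
    {
        "GroupId": "sg-db",
        "IpPermissions": [
            {"FromPort": 3306, "ToPort": 3306, "IpRanges": [{"CidrIp": "10.0.1.0/24"}]},
        ],
    },
    {
        "GroupId": "sg-legacy",
        "IpPermissions": [
            {"FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        ],
    },
]

for group_id, from_port, to_port in find_world_open_rules(sample_groups):
    print(f"{group_id}: ports {from_port}-{to_port} open to the world")
```

A CI job could fail the build when the findings include anything other than an agreed allowlist, such as port 443 on the public web tier.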

Yeah, see, that's a difficult one. Obviously AWS gives you a lot of tools to encrypt everything that you're putting in there, but you'll find that even the big guys get burned. Take an example like the CIA and Lockheed Martin, who got burned in the last two years by having things in an S3 bucket that was open to the world. They had to give students in the US access because they were working on projects for them, and then people just found droves and droves of documents. So that's where something like Security Monkey comes in, constantly scanning for that kind of thing, because AWS enables you, right, but it also enables

you to burn yourself. They can't do everything; that's where that shared security model comes in. In that case, I mean, nobody likes disclosing, but disclosing is the first thing, and if you've put in the right measures you should be all right. And remember that the Cybercrimes law might get passed in the next year, so then you might become liable for that kind of thing more than you are now. So secure it now rather than later.
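The "S3 bucket open to the world" problem in this answer can be checked for in a simple way. The sketch below assumes an ACL dictionary in the shape returned by the S3 "get bucket ACL" API, where public access shows up as a grant to the global `AllUsers` or `AuthenticatedUsers` group; the function name and the sample ACLs are invented for the example, and bucket policies (another way to make a bucket public) are not covered.

```python
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def public_permissions(acl):
    """Return the permissions that a bucket ACL grants to public groups."""
    return [
        grant["Permission"]
        for grant in acl.get("Grants", [])
        if grant.get("Grantee", {}).get("URI") in PUBLIC_GROUP_URIS
    ]


# Hypothetical ACLs: one private bucket, one readable by anyone.
private_acl = {
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"}, "Permission": "FULL_CONTROL"},
    ]
}
open_acl = {
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-id"}, "Permission": "FULL_CONTROL"},
        {
            "Grantee": {
                "Type": "Group",
                "URI": "http://acs.amazonaws.com/groups/global/AllUsers",
            },
            "Permission": "READ",
        },
    ]
}

for name, acl in {"private-bucket": private_acl, "open-bucket": open_acl}.items():
    perms = public_permissions(acl)
    if perms:
        print(f"{name} is public: {', '.join(perms)}")
```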

Actually, sorry, I need to go into detail. So we had two kind of god accounts, me and the CTO, and then all the devs got just enough access to do their jobs. So quite a lot, but not deleting things on DynamoDB, for instance. CI could kill and launch instances, but devs could only launch instances, not kill them. So with IAM roles and policies you can lock it down a lot. And then you can actually check: I think in CloudTrail you can monitor what the users are doing. [inaudible] So you can add more things if you need to. If you add Logstash and you monitor

per host, you should be able to see that someone's launched another API that's not in the auto scaling group. But it is that tussle between enabling and disabling, so it's kind of a carrot-and-stick thing. Give them the VPN: look, you can SSH directly to 10 1000 6 250, versus here's a jump box that hasn't been updated for the last six months, use that. The big thing is actually explaining it to the devs. I had to explain it a couple of times, and eventually they got used to it. The VPN helped them in the end. It was annoying getting Ubuntu's DNS to work sometimes, but in the end it made a

lot of sense. One thing that I could mention is that, using Packer, you as the ops or DevOps team can create images that the devs can use, and you know that what's on them already has monitoring and that you've added security tools. Otherwise they're just going to pull anyone's AMI off the AWS Marketplace. So it's a lot of work in the background, and then getting the dev team to play along. What I also didn't mention much is that we were generally in one region. We were thinking of going to another region, but that was later on. In that region we split things up with VPCs

and then subnets, so each service had its own subnet and security rules. The API can talk to both Redis and MySQL, but MySQL can't talk to Redis, that kind of thing. And then (I should have put that in; I did in the previous talk) we had to host a WordPress, the bane of my existence, so we put it in a different region, a different VPC and a different security group. Literally the only thing that linked it to us was our AWS account, and there you could create organizations; you could do dev, prod, QA segmentation. Segmentation was the biggest thing that helped us quite a bit. So Lambda was

completely separate, in a separate VPC, and partially that is also just because Lambda needs a lot of IPs. But then you can give it access; it's called peering between VPCs, and then you set up rules so that only certain services are peering. So the more you segment, and the more you use DevOps tools like Terraform, the more you can actually reproduce that. Anyone else?
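The IAM lock-down described in the talk and in this answer (deny API calls from outside the office IP ranges; let devs launch but not kill instances or delete DynamoDB data) could look roughly like the policy documents built below. This is a sketch, not the speakers' actual policies: the CIDR is a documentation placeholder range, the function names are hypothetical, and the exact action lists are assumptions chosen to match what was said.

```python
import json

# Placeholder range (reserved for documentation); substitute the real office IPs.
OFFICE_CIDRS = ["203.0.113.0/24"]


def office_only_policy(cidrs):
    """Deny every action on every resource unless the caller's IP is in `cidrs`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {"NotIpAddress": {"aws:SourceIp": list(cidrs)}},
            }
        ],
    }


def developer_guardrail_policy():
    """Let developers launch and inspect instances, but not destroy things."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ec2:RunInstances", "ec2:Describe*"],
                "Resource": "*",
            },
            {
                "Effect": "Deny",
                "Action": ["ec2:TerminateInstances", "dynamodb:DeleteTable"],
                "Resource": "*",
            },
        ],
    }


print(json.dumps(office_only_policy(OFFICE_CIDRS), indent=2))
print(json.dumps(developer_guardrail_policy(), indent=2))
```

An explicit `Deny` wins over any `Allow` in IAM policy evaluation, which is why both guardrails are written as deny statements. One caveat worth testing before rollout: a blanket source-IP deny can also block requests that AWS services make on the user's behalf.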

Yeah, I mean the barrier to entry is great, because you're mostly using the AWS CLI, which anyone who's using AWS is already doing. To go to Splunk: I mentioned it because a lot of people tend to mention it, but we didn't have the luxury of buying all the licenses. We were running Elasticsearch without X-Pack as well. But then you start tooling around that, so we ran a lot of Lambda functions checking infrastructure and sending out SNS alerts. It's either build or buy, and we tended to build as much as we could. Something like ElastAlert works fairly well, Kibana works really well, and Grafana can pull in from Elasticsearch. It pulls

them from... sorry, that's maybe something I could have mentioned. If you want to look at a centralized sort of dashboard, Grafana is pretty good, because it has AWS, Google Cloud and other support, and it has Elasticsearch support, so it just pulls in all the data and you visualize it as much as you can. Although I didn't spend enough time on Grafana; I stuck to Kibana most of the time.

[Partly inaudible audience exchange about Security Monkey and Splunk being expensive.]

Thank you. [Applause]
