
All right, thank you, and welcome to our talk on practical Kubernetes security at scale. This is based on joint work between two teams in Schibsted: Developer Foundations, and Product and Application Security. My name is [name unclear]; I'm a software engineer at Schibsted. I've been focusing on infrastructure and how development teams can effectively use cloud platforms. And I'm [name unclear]; I'm a security engineer. This is work I was involved with while at Schibsted, and I'm currently at Norway's wealth fund.

All right. Over the past few years, a lot of companies and organizations have embraced running containers. Containers provide a standard way to package your application code, configurations, and dependencies into a single unit that you can launch wherever
you like. Containers are lightweight and portable, as they already contain everything that the application needs, and can be run on any platform. Kubernetes is an open source platform for managing containerized workloads and services, and it is widely used in the industry, with more and more organizations adopting Kubernetes as the platform of choice.

Schibsted is a Nordic family of digital consumer brands with a mission to empower people in their everyday lives. We have leading brands in the Nordic market across news media, online marketplaces, and technology ventures. Many of these organizations are embracing and adopting Kubernetes. New technology brings new challenges, and how to secure our platforms as they are moved to Kubernetes is one of them.
In this talk we'll outline some of the steps we've taken to enable security measures and controls for Kubernetes configurations across different organizations within Schibsted. Organizations within Schibsted are autonomous when it comes to technology choices, which oftentimes leads to dissimilar setups and having to come up with unique, brand-specific solutions. Schibsted has different building blocks that are shared and can be leveraged by different brands. Some examples include identity platforms, privacy services, payment services, and data and analytics capabilities. Developer tooling and infrastructure components provided centrally are similarly shared building blocks that support the software development lifecycle and product teams across different organizations. The building blocks form a foundation that organizations can leverage and
build on top of. AWS is a cloud provider that's widely used in Schibsted. It provides a lot of services, which now include EKS, their Kubernetes offering. Despite EKS providing some things out of the box, there is still a learning curve to operating a Kubernetes cluster and running workloads efficiently. Not all teams have the same resource capacity or expertise when it comes to investing time and effort in infrastructure, and therefore having the ability to leverage a shared building block can be appealing to teams both small and large. For this reason we've introduced Skates, which is a managed Kubernetes configuration that comes with batteries included, to create a fast track to a Kubernetes configuration and a runtime
that's ready for production. Skates is built on top of EKS, comes with integrations to existing CI/CD systems, has a lot of capabilities provided out of the box, and is a setup that is similar across the different organizations. We are currently operating close to a hundred individual clusters across multiple different organizations, with a steady growth of workloads being migrated to and hosted on these clusters. So one of the challenges we are presented with is how to ensure a base level of security across the clusters that we are managing for different organizations, and how we can drive improvements over time.
So Kubernetes is a complex topic, and there's a steep learning curve to master it. The complexity makes it possible to simplify applications, as Kubernetes now takes care of many of the things traditionally handled inside of applications. Kubernetes consists of several different parts. Firstly, we have the control plane. The control plane works to maintain the desired state of the cluster. It has several components. etcd is a key-value store where the state of the cluster is persisted. There's a scheduler, which is in charge of scheduling pods, or workloads, onto worker nodes. The API server is the core of the control plane and how users, external
components, and parts of the cluster all communicate with each other. Lastly, we have a controller that watches the shared state of the cluster and makes changes attempting to move the current state of the cluster towards the desired state.

Then we have worker nodes. Worker nodes run the applications and workloads. A pod represents a group of one or more containers running together. Each worker node can run multiple workloads, and a cluster can have multiple worker nodes or groups of worker nodes as it scales over time. Inside of a worker node we have a couple of components. There's the kubelet, which is an agent that connects to the control plane and registers the worker node,
and then we have kube-proxy, which is a network proxy that runs on each node in the cluster. It maintains networking rules for nodes, and these network rules allow communication to your pods from network sessions inside and outside of the cluster. There also need to be components to support or enable incoming and outgoing traffic to the cluster, and in a cloud environment there are also cloud-specific components that become relevant. Additionally, further customizations can be done by installing external modules that can extend and enhance the behavior of the cluster and shape it into something that's usable for you or your organization. So in summary, there are a lot of components and parts to Kubernetes that
make it possible to run applications on top of it. Kubernetes does provide some basic security features, but it is in the hands of the cluster operator to implement robust security protocols when it comes to security and compliance enforcement.
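
The controller behavior described above, watching the current state and nudging it toward the desired state, can be illustrated with a toy reconciliation loop. This is only a sketch under stated assumptions: `ClusterState`, `reconcile`, and `run_until_converged` are hypothetical names invented for the example, and real Kubernetes controllers work against the API server rather than an in-memory dictionary.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Toy stand-in for cluster state (hypothetical): workload name -> replica count."""
    replicas: dict[str, int]


def reconcile(desired: ClusterState, current: ClusterState) -> list[str]:
    """Move each workload one step toward the desired state; return the actions taken."""
    actions = []
    for name, want in desired.replicas.items():
        have = current.replicas.get(name, 0)
        if have < want:
            current.replicas[name] = have + 1
            actions.append(f"start pod for {name}")
        elif have > want:
            current.replicas[name] = have - 1
            actions.append(f"stop pod for {name}")
    # Workloads that are no longer desired are scaled down, then forgotten at zero.
    for name in list(current.replicas):
        if name not in desired.replicas:
            current.replicas[name] -= 1
            actions.append(f"stop pod for {name}")
            if current.replicas[name] <= 0:
                del current.replicas[name]
    return actions


def run_until_converged(desired: ClusterState, current: ClusterState,
                        max_rounds: int = 100) -> int:
    """Call reconcile repeatedly until a round makes no changes; return rounds that changed something."""
    rounds = 0
    while rounds < max_rounds:
        if not reconcile(desired, current):
            break
        rounds += 1
    return rounds
```

The point of the sketch is that the loop never needs to know how the cluster got into its current state; it only compares and corrects, which is why the same mechanism handles scaling up, scaling down, and cleaning up.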
Ensuring we have security measures and controls in our platforms and in the building blocks I mentioned earlier is key to being able to operate them efficiently. But where do we start to implement security in Kubernetes in a way that will not introduce hurdles that would kill the momentum that development teams have gotten from adopting the platform? There are a lot of guides on best practices and security best practices out there; those are all readily available on the internet. There are a lot of commercial solutions from security vendors out there, and there are also a lot of open source tools and solutions that are readily available. So for us, we have a lot of options,
which is great, but deciding what to do and how to do it, that's the challenge. Depending on your business and compliance requirements, there may be different aspects that need to be considered, and it's not one-size-fits-all. What we will be talking about are the steps that we have taken and the journey to improve the security posture of Kubernetes configurations in Schibsted.
The Cloud Native Computing Foundation maintains an overview of cloud native projects that are applicable in the Kubernetes context, which covers a lot of different topics. There are a lot of established projects out there, as well as new and up-and-coming ones, both commercial and non-commercial. The CNCF keeps track of the majority of these projects, and the landscape can serve as a guide. Exploring open source solutions to evaluate what gives us value has been our approach so far. Going by what is out there, we have a lot of options, but finding the right fit, that is the challenge. So our approach is repeatable security according to the NIST Cybersecurity Framework tier three, and
to build as little as possible by leveraging existing tools. Since we already have clusters out there with production workloads, and teams that are developing solutions that are deployed to clusters, we need to ensure that we don't disrupt that in a way that hampers them. We need to recognize that there may be different requirements in different teams and different organizations, and align the work with existing efforts in cloud security and appsec, as they all contribute to the overall security posture. And lastly, to learn from others: there are other Kubernetes setups in Schibsted at this moment, and we try to incorporate the learnings from those into Skates.
AWS defines a shared responsibility model that states what they will be responsible for and what's left to the user to take care of. In the case of Kubernetes, a lot is left to the user, but with a shared building block like Skates we are able to encapsulate some of the complexity that users would otherwise be exposed to. Similarly, we want to clearly define responsibilities for a central team that's operating clusters and product teams that may be running workloads on them. Broadly speaking, the operator is to be concerned with the operations and security of the cluster overall, and the users are responsible for the operations and security of their applications or
workloads that are launched in the cluster. We as operators ensure guardrails are put in place that will enable the product teams and developers to do so. This continues to empower developers, who can still use the "you build it, you run it" mindset with some additional support.

Automation is key to scaling security. We use infrastructure as code to get repeatable security across all of the clusters. This enables the same hardened setup to be used while also ensuring that the state does not drift. Furthermore, it enables gradual rollout of guardrails and security measures, and the ability to customize those per team, per organization, or even down to the cluster level. We use Checkov, which is a static code
analysis tool for scanning infrastructure-as-code files for misconfigurations that may lead to security or compliance problems. It includes predefined policies for checking for common misconfiguration issues, and it supports many different flavors of infrastructure as code. It allows us to evaluate our Terraform modules as well as our Terraform plans before we apply them. Checkov can be integrated into existing CI/CD pipelines, which gives us a way to catch things before they are deployed, and this gives us more confidence when it comes to making changes and applying them across the board without compromising existing setups.

So before we get into how to secure the clusters, it is worth noting that the best
isolation, if you have different workloads that should be completely isolated, is to run them in separate clusters. Schibsted is a collection of a lot of brands, and they already run in their own AWS accounts in a single AWS organization, so it made sense for us to also have individual clusters in those accounts. The benefit is that we get the extra isolation, but we don't get some of the benefits of having Kubernetes, you know, the scaling part and all of that.

So this is our target baseline. We say target because we are still working on it, and baseline because we expect some teams to go beyond this. For instance, there are already teams
that do egress filtering as well as RASP-type tooling. We're going to focus on the Kubernetes parts here, and even though it's pretty AWS-heavy, we think that there are learnings here for any Kubernetes setup. The goal is to get as much security as possible with each measure while not getting in the way of the users, and also with the least amount of effort from our central teams.

First off, we have securing the control plane. There is complexity in operating and securing the control plane, and you can largely outsource it by using EKS, as was mentioned, which is the managed Kubernetes from AWS. Basically, in terms of the control plane, you have
everything you need out of the box. It's secured enough that you don't have to do additional hardening. You might want to do IP allow-listing if you're concerned about having the API exposed to the internet, but it should be safe by default.

So having kind of solved the control plane, we can look at the data plane, and that mostly centers around the pod: how can you break into a pod, and once you're there, what else can you get access to? The pod might be exposed as a service inside the cluster, and you might also have external traffic coming into it. Another way in for an attacker would be
through the supply chain, introducing malicious code in the pod. Once you're there, we as cluster operators want to make it as hard as possible to do lateral movement to other things, either via container escape or over the network.

Before getting more into that, let's first take a brief look at the history of isolation. You basically started out with processes, trying to run different apps on a single operating system, and there are some hardware features to back this up. The next step was to run multiple OSes on a single physical machine, and so we had VMs, and again new hardware features were added to enable this. Then more recently, some tried to get the best of both
worlds in terms of processes and VMs, and we got microVMs. If you look more broadly, new features tend to be separate hardware, and there's also research into new CPU instructions to improve isolation. If you look at AWS and also the other big cloud vendors, they are built on VMs, so a lot of effort has gone into hardening those. They used to run on less custom hardware, but the trend over the last few years for AWS is that they have more custom hardware, which is the Nitro System. The current generation of VMs you get on AWS is based on Nitro. They also have
a microVM option, which is called Firecracker. That's used for serverless functions like Lambda, and you can also use it for containers. In contrast to this, Kubernetes came out of new software features in the Linux kernel. You still have the process model, but the isolation is in software, and you're kind of losing out on all of the hardening and innovation that's been happening in the VM space. That's why we need to do some extra work when we're trying to secure Kubernetes. You can run it in microVMs as well, but you lose some flexibility, and it's unclear how much security you gain by
that. So Kubernetes is a leaky abstraction. These low-level kernel features I talked about are exposed to users trying to deploy to the cluster. When you deploy your pod, you can also specify these low-level Linux things like namespaces, SELinux, and capabilities, and basically what they do is limit the ability of your application, your pod, to escape into the system. If you do an escape, you can get to other pods running on the same machine, and you can also get to the control processes, which are also just processes, like the kubelet and kube-proxy. The incentives there are misaligned, because you are protecting the rest of the system from your application rather
than the other way around. So if you're just doing a one-off, you might not want to do a lot of hardening to protect the rest of the system from your app. The solution we're going for is a combination of two things: you have the container-optimized OS called Bottlerocket, which is a hardening of the VM, and then you can set limitations on what kind of security settings are allowed into the cluster via admission control. If you look at Bottlerocket, it was made for this purpose, to be able to run containerized workloads more safely. It's an open source, Linux-based operating system from AWS, and it is focused on avoiding
persistence rather than isolating different pods from each other, but it also helps in that regard using custom SELinux rules. A couple of the features: they have minimized the amount of binaries, because if an attacker were to get some sort of foothold, having access to binaries can be useful, so they minimize that, and they also harden the binaries that are there. They use a combination of read-only file systems and ephemeral file systems to make it hard to tamper with files, and also to make sure that after a reboot you're in a clean state. Updates are done atomically, so that rather than mutating individual files, you get a fresh set of
files that make up the whole version, which makes it really easy to roll forward and backwards. They also have a convenience function for improving the speed of security patching in the cluster. We like Bottlerocket because it's a great combination of security and flexibility: there's a lot of hardening, but you can still install custom security tools as well as all the components from the larger Kubernetes ecosystem.

The second part of the equation is the admission controller, and it's limiting what is allowed into the cluster in terms of these low-level Linux settings. You could make your own custom policies, going through each setting to see what should be
allowed, and we actually did that exercise, and we came up pretty close to the existing Kubernetes standard called Pod Security Standards. PSS is implemented with something called PSA, Pod Security Admission, which is currently in beta in Kubernetes, and you can also use it with older Kubernetes clusters. So you can get a lot out of the box using those systems. It's basically three levels: you have disabled; you have baseline, which is unlikely to break your applications while also providing good security; and you have restricted. If you need more flexibility than that, you can use something like OPA Gatekeeper. OPA Gatekeeper allows you to make some other types of policies as well,
including limiting what registries images can be pulled from and also verifying signatures. We actually implemented both PSA and OPA Gatekeeper, and OPA Gatekeeper is both harder to integrate and harder to use, so if PSS is good enough for you, that's where you should start.

To further drive home that Bottlerocket and PSS are a good match, we have this list of recommendations from the GitHub page of Bottlerocket. In the first column we have the recommendations, in the second you have the priority, and in the third you have whether PSS covers this or not. As you can see, PSS covers a lot, and it's a great fit with Bottlerocket. For instance, you can look at the second one here, about privilege escalation, and that's restricted in PSS baseline. It's a couple of things: for instance, you have a setting called privileged, true or false, that's blocked by this, and you also have various Linux capabilities that can be dangerous that are also blocked.
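
To make the kind of check described above concrete, here is a minimal sketch of an admission-style validation over a pod spec expressed as a plain dictionary. It is not the Pod Security Admission implementation and covers only a few of the settings mentioned in the talk. The function name `baseline_violations` is hypothetical, and the capability allow-list is an illustrative subset chosen for the example, not the full list from the standard. The field names (`hostNetwork`, `securityContext.privileged`, `capabilities.add`) are the ones used in Kubernetes pod specs.

```python
# Illustrative subset only (assumption): the real Pod Security Standards
# define the authoritative list of capabilities that may be added.
ALLOWED_ADDED_CAPABILITIES = {"CHOWN", "KILL", "NET_BIND_SERVICE", "SETGID", "SETUID"}


def baseline_violations(pod_spec: dict) -> list[str]:
    """Return human-readable reasons why a pod spec would be rejected."""
    violations = []

    # Sharing host namespaces weakens the isolation between pod and node.
    for flag in ("hostNetwork", "hostPID", "hostIPC"):
        if pod_spec.get(flag):
            violations.append(f"{flag} must not be enabled")

    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    for container in containers:
        name = container.get("name", "<unnamed>")
        security_context = container.get("securityContext") or {}

        # Privileged containers effectively disable most isolation mechanisms.
        if security_context.get("privileged"):
            violations.append(f"container {name}: privileged must not be true")

        # Only capabilities on the allow-list may be added.
        added = (security_context.get("capabilities") or {}).get("add") or []
        for capability in added:
            if capability not in ALLOWED_ADDED_CAPABILITIES:
                violations.append(
                    f"container {name}: capability {capability} is not allowed"
                )
    return violations
```

An empty result means the spec passes this simplified check; a non-empty result lists what a developer would need to change, which is the kind of feedback that keeps a guardrail from becoming a hurdle.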
Container images are a convenient way to package and distribute applications, but they may include many attack surfaces. Image scanning is the process of identifying known security vulnerabilities in the packages listed in a container image. Scanning images at build time enables us to fix any vulnerabilities identified before they are deployed anywhere. There are multiple tools available out there for this purpose, one of which is called Trivy. Trivy is an open source vulnerability scanner from Aqua Security. Trivy detects vulnerabilities in operating system and language-specific packages, and also has the ability to scan for hard-coded secrets like passwords, API keys, and tokens in an image. Trivy has a database of vulnerabilities
which includes remediation recommendations and is periodically updated. But not all vulnerabilities have fixes, and depending on the context of where an image is running, the vulnerabilities may not necessarily be exploitable. So we need to ensure that there is a good signal-to-noise ratio to make this reasonable for developers. Vulnerabilities may not have been identified at the time when an image was being built, so it's important to also consider scanning currently running images periodically. As we have the ability to keep track of images across our fleet of clusters, we have the ability to do this periodic scanning of the running images, which gives us the
ability to keep track and maintain an up-to-date overview of potential issues.
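
One way to think about the signal-to-noise point above is as a filter over scan findings: surface only what is both severe and fixable. The sketch below assumes a simplified, hypothetical record shape (`id`, `severity`, `fixed_version`) rather than any scanner's actual report format, and the function name `actionable_findings` is invented for illustration.

```python
# Severity labels ordered from least to most severe.
SEVERITY_ORDER = ["UNKNOWN", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def actionable_findings(findings: list[dict], min_severity: str = "HIGH") -> list[dict]:
    """Keep findings at or above min_severity that have a fix available.

    Each finding is a dict with the hypothetical keys "id", "severity",
    and "fixed_version" (an empty string means no fix has been published).
    """
    threshold = SEVERITY_ORDER.index(min_severity)
    selected = [
        finding
        for finding in findings
        if SEVERITY_ORDER.index(finding["severity"]) >= threshold
        and finding.get("fixed_version")
    ]
    # Most severe first, so developers see the most urgent items at the top.
    return sorted(
        selected,
        key=lambda finding: SEVERITY_ORDER.index(finding["severity"]),
        reverse=True,
    )
```

Filtering on fix availability is a design choice for this example: a finding without a fix is still a risk, but it is not something a developer can act on by bumping a version, so it might be reported through a different channel.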
The state of a Kubernetes cluster is a dynamic one. New workloads are being launched, the cluster keeps scaling up and scaling down, worker nodes get added and removed, and users or external systems interact with the cluster. So periodically evaluating the existing Kubernetes cluster configuration for vulnerabilities or misconfigurations can be a useful thing to do. NSA and CISA have published a comprehensive list of recommendations for strengthening the security of a Kubernetes configuration, to help companies make their environments more difficult to compromise. Kubescape is a tool that can be used to determine how well a Kubernetes cluster configuration meets these best practice recommendations. When Kubescape is run, it will capture the current state of the cluster and
then evaluate it against the policies that it has built in. The output can then be interpreted to see if there are any violations of the policies, and that can then feed into actions that can be taken to improve the configuration. Kubescape can be run periodically to allow us to keep evaluating the risk and compliance scores for different clusters or configurations, and to enable tracking that over time. It can also be run as part of CI/CD pipelines just as well, to evaluate manifests and catch issues before they are actually deployed to a cluster.

Additionally, we've enabled AWS GuardDuty for EKS protection, which was recently introduced by AWS. It can detect threats related to user
and application activity captured in Kubernetes audit logs. The audit logs give GuardDuty the visibility needed to conduct continuous monitoring of API activity to detect any suspicious activity there. When threats are identified, GuardDuty generates security findings and sends notifications that can then be acted upon.
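
Tracking scores over time, as described for the periodic evaluations above, can be as simple as comparing the two most recent results per cluster and flagging drops. The sketch below is an assumption-laden illustration: the function `score_regressions` is hypothetical, the scores are treated as percentages where higher is better, and the input is a plain dictionary rather than the output of any particular tool.

```python
def score_regressions(history: dict[str, list[float]],
                      tolerance: float = 1.0) -> dict[str, float]:
    """Return clusters whose latest score dropped by more than `tolerance`.

    `history` maps a cluster name to its scores in chronological order
    (hypothetical shape; higher scores are assumed to be better).
    """
    drops = {}
    for cluster, scores in history.items():
        if len(scores) < 2:
            continue  # Not enough data points to compare yet.
        drop = scores[-2] - scores[-1]
        if drop > tolerance:
            drops[cluster] = drop
    return drops
```

With close to a hundred clusters, a comparison like this turns a wall of per-cluster reports into a short list of clusters that need attention, which is the practical benefit of running the evaluation on a schedule.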
So identity and access management is of course important in Kubernetes as well. A lot of this is left to the users of the clusters, to have minimized privileges, but it's also important to integrate properly with your surrounding environment, which in our case is AWS. It's possible to map AWS roles to Kubernetes roles, so you have access to the control plane, and you can also go the other way around, where you give pods access to roles in AWS so that they can use the other services as normal. We don't talk about secrets management specifically in this talk, but once you have this, you can also of course use AWS Secrets
Manager in that way. The way that you actually get access to the AWS credentials from a pod is called IRSA. It's somewhat new, so you used to use a third-party component, but now AWS supports it themselves. And of course IMDSv2 is also always a good thing to enforce when you have VMs in AWS.

For cluster networking policies, since Kubernetes implements its own networking, you don't necessarily have those controls out of the box. AWS has some support for security groups with Kubernetes, but it has some limitations, so we've gone with a custom implementation called Cilium. Cilium makes it easy to have a coarse-grained
restriction on your network. Basically, there's this concept of namespaces that isolate resources in Kubernetes, both from an RBAC perspective and, here, when using Cilium, from the network perspective. So it's very easy to do this coarse-grained isolation. Another thing to note is that by default in Kubernetes you have this Linux capability called CAP_NET_RAW, which allows spoofing attacks. Depending on the network you happen to use with Kubernetes, that attack might be possible to do, but in the case of EKS that's not possible.

And for completeness, let's talk about the ingress controller as well. Kubernetes supports a range of ingress controllers. By default we happen to use ALB, which is supported; that's the
Application Load Balancer from AWS. With that we also get the benefits of a web application firewall as well as DDoS protection. We also make sure that all new applications that are being deployed are IP allow-listed, to avoid accidentally putting things on the internet. And of course you could put any kind of security control in front of the Kubernetes cluster, and then you might want to use something like the NGINX ingress controller to proxy the traffic to the cluster.

So that was a run-through of the security measures we have opted for in our target baseline. We think this is a great combination, or trade-off, between security and flexibility, and while it's very AWS-heavy, we think that
it's applicable also in other scenarios, like the approach you can use.

Yep. And for Schibsted teams that have adopted Kubernetes, the experience has been a positive one: everything from increased developer productivity, and increased scalability and fault tolerance for services, to creating a migration path to the cloud. With shared building blocks like Skates, security controls and guardrails can be introduced and simultaneously deployed across multiple organizations. The security measures added further empower developers and product teams, who can rely on guardrails to catch misconfigurations or vulnerabilities that may get introduced in the development process. Through opt-in enforcement of security controls, teams can gradually enable those as they adopt the controls.

And if you want to go beyond what we're
going through here, we can recommend the EKS Best Practices Guide from AWS as well as the Kubernetes Hardening Guide from NSA and CISA. Those go more into detail and are great resources. And of course you can also explore the wider Kubernetes ecosystem to find solutions to your problems. With that, we are happy to take questions.

[Applause]

Thank you. Thanks, guys, that was a lot. I will go with a quick question to grease the wheels before we turn it over to the audience. It's a common thing that many of us have faced, whether in our own orgs or as consultants working with clients who are doing a migration and looking to take advantage of cloud technologies,
AWS, what have you, Kubernetes in the cloud. They maybe think, well, just by switching to AWS, AWS has more security than we do, so we're automatically going to be secure by moving to the cloud. But then you look at the many additional security features that need to be turned on, configured, and paid for. Of the things we're talking about, and you mentioned some tools are open source, for an org that's maybe in the cloud using some of these technologies today, how much of the security stuff would add an additional cost versus just having to be turned on within an existing subscription?

Want to take that? Okay.
So given that we are going for a lot of open source solutions here, most of these things don't add costs. Some of them do have a commercial offering as well, and of course there's some convenience you get from the commercial offering, but you can largely trade that off by doing more of the integration part yourself and keeping the cost down. Okay. GuardDuty, that is an extra cost, even if you already have GuardDuty enabled.

Questions from the audience? Kubernetes people, AWS-curious, anybody? All right. Oh, okay, shout it out.

Yeah. Have you tried running this secured workload on any other cloud environment, to see how well the security works?
You said it's not too AWS-centric, but...

Well, so this is very AWS-centric, but we think the approach is similar. You also have a container-optimized OS in Google, for instance, so some of the same capabilities you can also get in all the public clouds. And of course the Kubernetes-specific tools that are open source, those would be the same in any kind of setting.

Follow-up question: during this project, how much time did you spend researching the CNCF landscape to find mature security projects to use? Because I guess that's a full-time job.

Yes. So we mostly focused on those that are well established, and when it looks
like there's going to be a good solution down the road, maybe the best approach is just to wait. But yeah, it's definitely a lot to try to keep track of everything. And the point here is also that it's a baseline and something that we want to continuously evolve, so essentially it's a starting point. Security is something that you need to be constantly attending to.

Any other questions from the audience? Yes, that's the spirit. All right, coming over.

Thank you, good presentation. I took in that you talked a lot about the preventative security controls within an EKS or Kubernetes environment. Can you elaborate a bit on
the more reactive parts of security when it comes to Kubernetes? Thinking about observability, monitoring, detection and response in such an environment.

Yeah, so we did kind of gloss over that. I did mention RASP; there are several commercial security tools that are focused on this scenario, to monitor and take action on what's happening in the cluster. And of course we talked about the audit logs and exporting those, and that GuardDuty is analyzing some of it, but of course you can also have all the SIEM solutions looking at the logs, trying to detect stuff. And related to that, you also get
into the CSPM space for more general cloud protection and auto-remediation. So there's related stuff that you could utilize that's not only for Kubernetes as well.

I think our approach was also to build it up; you need to walk before you run. So introducing some of these bits gradually, and then of course there are teams where runtime protection is a key component that needs to be added as well.

All right, last question here.

Yes, I was just wondering: you mentioned you have a hundred clusters, and I guess that number will probably increase when you want to isolate things more and more.
Do you have an efficient way of managing all these clusters? Do you have a platform for managing clusters?

Yeah, so we are basing that around using Terraform, infrastructure as code, and then we have built some automation around being able to apply that across different accounts and applying updates automatically. But those are things that we have been building on the side to support this type of setup.