
Securing Kubernetes in the Cloud - Amiran Alavidze

BSides Vancouver · 32:30 · Published 2021-06
About this talk
BSides Vancouver 2021. It’s clear that Kubernetes has won the container orchestration wars and is here to stay. The complexity, flexibility and rapid development cycle of Kubernetes mean that the Kubernetes security landscape varies significantly across deployments and is not generally well understood. This talk focuses on the security of managed cloud Kubernetes deployments (EKS, AKS, GKE, and others), which provide a decent secure baseline that addresses the majority of recommendations you’ll find on the Internet. You’ll come out equipped with an understanding of the Kubernetes threat model and actionable recommendations for securely running workloads in Kubernetes.
Transcript

Hello and welcome to BSides Vancouver 2021. This talk is Securing Kubernetes in the Cloud. My name is Amiran Alavidze, and I'm currently Director of Security at Tasktop. We're a Vancouver-based startup helping companies succeed in their digital transformation journeys. As security professionals we're often thrown into something we have no idea about. Often it's a new tool that your company is deploying, and you need to figure out how to do it securely. To me, dealing with these unknowns and reasoning with limited information is a feature, not a bug, of a security role. A similar thing happened to me with Kubernetes: we decided to use it for hosting one of our SaaS products, and at the time I knew pretty

much nothing about it. And here we are, a couple of years later, with me being a Certified Kubernetes Administrator and presenting on the topic. This talk is the amalgamation of many months of learning, threat modeling and incrementally improving the security of our production Kubernetes clusters, and by sharing this I hope you'll be able to benefit from our experiences. Let's start by understanding what Kubernetes is. Kubernetes is an open-source container orchestration platform that originated at Google from their internal project called Borg. It's been designed from the ground up as a loosely coupled collection of components centered around deploying, maintaining and scaling workloads. It has good support for declarative deployments and immutable components. Let's cover some core terminology. Kubernetes is a container orchestration

engine, so it all starts with containers. Containers are deployed from images, which essentially are templates for running containers. But in Kubernetes the smallest deployable unit of computing is not containers but rather pods, and pods are one or more containers that are deployed together on a single host. Those running containers and pods will often need configuration information and credentials to be able to run, and those are injected through ConfigMaps and Secrets. To access the pods from outside you would deploy a Service, and then to horizontally scale those workloads you would create a Deployment, which specifies how many instances of those pods you need. All of these resources can be logically grouped into something called a namespace, and a lot of

the Kubernetes control plane processes will also run as part of the cluster, in the kube-system namespace. Let's spend some time understanding Kubernetes architecture. You can find this diagram in the Kubernetes documentation. The cluster consists of one or more master nodes and several worker nodes. Master nodes are responsible for running the Kubernetes control plane processes. In the Kubernetes control plane we have etcd, which is the data store for the cluster. We have kube-apiserver, which is a REST API for cluster management and essentially the front end of the cluster; all the other components running as part of the cluster interact with the cluster through this API server. We have kube-controller-manager, which runs multiple control processes in a

loop, and we'll talk about those in just a bit. There's cloud-controller-manager, which is responsible for interfacing with infrastructure-as-a-service cloud providers if the cluster needs any resources provisioned in the cloud, such as a load balancer. And there's kube-scheduler, which determines which pods should run on which nodes. On the worker nodes we have the kubelet, which is responsible for provisioning and starting the pods that are scheduled on that node, so it interacts with the container runtime, such as Docker, on that node to do that. It also reports pod status and node status back to the API server. And we have kube-proxy, which is a proxy service responsible for routing network traffic

to the pods on the node. We also have kubectl, which is a command-line tool for interacting with the Kubernetes cluster through the kube-apiserver. It supports both imperative and declarative use, and YAML is mostly used for declarative definition of the resources in the cluster. To understand how all of this works together, let's hear the life story of a deployment. Let's say a cluster administrator creates a new deployment, usually using kubectl. That kubectl connects to the kube-apiserver and tells it about the new deployment. The API server saves all of that information into the etcd database, and at this stage kubectl returns back to the command line. So now in the etcd database we have

this Deployment object, which will contain all the information needed for that deployment, including the name of the deployment, the number of replicas (the number of pods that should be executed as part of the deployment) and the images that should be used in those pods. Now, the Deployment doesn't actually manage those replicas that are defined for it; it only manages the lifecycle, essentially rolling in new versions of the images and things like that. For managing the number of replicas Kubernetes uses an object called a ReplicaSet, which we haven't talked about before. Now one of the control loops in the controller manager comes along and finds this Deployment object and doesn't see a corresponding

ReplicaSet object, so it creates one. This ReplicaSet object will inherit the information that it needs from the Deployment object. Now another control process in the controller manager will come along and see this ReplicaSet object, but not the corresponding Pod objects, so it will create those, and the number of pods that will be created will be a reflection of the number of replicas that have been requested for that deployment. And now the scheduler comes along; the scheduler looks at all those new shiny pods and assigns them to specific nodes, essentially populating the nodeName property of all of those pods. It's important to understand that while this is all happening very

quickly, at this stage none of the workloads are actually running in the cluster; this is all happening in the etcd database. What happens next is that the kubelets on the worker nodes check periodically with the kube-apiserver, essentially seeing if the desired state for them has changed and if new pods have been assigned to the nodes that that kubelet is managing. When that API call is made, the API server will check with the etcd database and give the kubelet the information that it requests. Then the kubelet sees this new pod that has been scheduled on its node and that isn't running yet, and it will provision the containers that are

required for that deployment, and this is how the workloads start. So why are we talking about Kubernetes in the cloud today? As you can see, Kubernetes is a complicated ecosystem. It's almost a Lego-like architecture, with multiple somewhat independent projects closely working together. There are multiple ways to deploy and configure the cluster, which also means there are multiple ways to screw it up. There are also a number of things that need to happen outside of the cluster or through cluster plugins: things such as networking, horizontal or vertical node scaling, and also load balancing and/or public-to-private IP address mapping. If there's only one thing you take away from this talk today, it should be that if you're just starting out

you probably shouldn't run Kubernetes on your own. Now let's talk about the Kubernetes threat model. Going back to the Kubernetes architecture, let's think about the attack surface. We have, potentially, etcd; we have kube-apiserver; and we have the worker nodes, because supposedly they're running your workloads in the cluster, and some of those workloads will be available from outside the cluster. So if we think about all of that, we have attacks on the control plane. We have compromised applications running in the cluster: imagine a scenario of somebody hacking into one of the workloads, one of the applications that you run in the cluster, and now having access to the container that's running that application. We also have compromised user accounts

and credentials, and those can potentially be used to access and compromise the cluster. And we also have compromised images, essentially a supply chain risk applied to your environment. In 2018 the CNCF, which manages the Kubernetes project, ran an RFP and selected two companies, Trail of Bits and Atredis Partners, to do a security audit of Kubernetes. I highly recommend you go back and read the document; they published the full audit report for everyone to see on GitHub, using the link on the slide. Out of those threat model scenarios, the main one to me is the compromised application running in the cluster, because that is bound to happen at some

point in time. So let's explore that scenario further. We'll talk about three main things here: one, network access and lateral movement within the cluster; two, Kubernetes cluster credentials; and three, container escapes. Networking is not a core part of Kubernetes and is provided by third-party networking plugins, also called CNIs. At the core of Kubernetes is an open networking model. From the Kubernetes documentation: Kubernetes imposes the following fundamental requirement on any networking implementation: pods on a node can communicate with all pods on all nodes without NAT. Those networking plugins may support network policies, which are a way to firewall traffic within the cluster, but they're not required to do so. So if we have access to a container

running as part of the Kubernetes cluster, potentially we might have access to the instance metadata endpoint, which is a way for cloud service providers to inject temporary credentials required by the nodes running in the cloud. This, by the way, can also be exploited through SSRF-type vulnerabilities that don't require full access to the container. We also might have access to the Kubernetes control plane, as the kube-apiserver is by default exposed within the cluster. And we might be able to do lateral movement in the traditional sense and connect to databases and other services running as part of the cluster that are not exposed to the outside world. Another vector for lateral movement

that's worth exploring is Helm. Helm is the package manager for Kubernetes; it allows you to deploy Kubernetes applications called Helm charts. There are two major versions of Helm in use today, version 2 and version 3. Version 2 requires you to run a service called Tiller within your cluster that has privileged access to the cluster and does the actual deployments. Your workloads might be able to access Tiller and might be able to leverage that privileged access within the cluster; this phrase is from the Helm documentation: the default installation applies no security configurations. Let's talk about cluster credentials. Kubernetes supports multiple authentication options, including certificate-based authentication, token-based authentication, basic and OpenID Connect authentication. But Kubernetes doesn't manage users

within the cluster; they have to be defined externally. What Kubernetes does manage, though, is service accounts, and service accounts are a way to give workloads that are running as part of your cluster access to Kubernetes resources. Even if you don't create or specify service accounts, by default every pod in the cluster gets assigned the default service account from the namespace where the pod is running. Kubernetes also supports role-based access control. That is done on two levels: one is Roles and RoleBindings, which are scoped to a specific namespace, and the other is ClusterRoles and ClusterRoleBindings, which are scoped to the whole cluster. So a Role essentially defines the permissions that can be assigned

to users and service accounts, and a RoleBinding assigns the Role to a specific user, group or service account. A third topic related to a compromised workload in a Kubernetes cluster is container escapes. Container escapes can happen in multiple ways: through container runtime vulnerabilities, through outdated kernels and kernel-level vulnerabilities, and through security configurations, things like mapping the Docker socket into a container, mapping other sensitive paths from a host to a container, and assigning excessive capabilities to a container, as well as privileged containers and misconfigured UIDs of the running containers. I refer you to an excellent talk, A Compendium of Container Escapes, by Brandon Edwards and Nick Freeman. I hope I have scared you enough by now, so it

feels like a good time to talk about protecting our clusters. Firstly, don't panic. Secondly, understand that your managed Kubernetes service provider has your back. And thirdly, do your homework. We'll talk about the same three things we've covered so far: securing the networking, securing the credentials and securing the workloads. Kubernetes has the concept of network policies; it's a native way to filter traffic within the cluster, although it's not enabled by default in most cases. Many of the networking plugins support it, but even with a networking plugin that supports network policies, all traffic is allowed by default. Network policies support both ingress and egress traffic, and rules can be applied based on workload labels, namespaces and subnets. Please note

that if your networking plugin doesn't support network policies, you will not get any warning or error message when trying to apply them; they will just be silently ignored. This is what a typical network policy looks like. It has a name. This one applies to pods that have a role label with the value of db. It specifies both inbound and outbound rules: for inbound traffic it allows connections from all pods that have a role label with the value of frontend, on TCP port 6379, and it doesn't allow any outbound traffic.
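The policy described above closely matches the well-known example from the Kubernetes documentation; a sketch of the manifest (the policy name is assumed, since the talk doesn't state it):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-policy            # name assumed for illustration
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db               # applies to pods labeled role=db
  policyTypes:
    - Ingress
    - Egress                 # Egress listed with no egress rules = deny all outbound
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: frontend # allow only from pods labeled role=frontend
      ports:
        - protocol: TCP
          port: 6379
```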

Cilium has recently made available a visual network policy editor that might be helpful. Let's try to recreate our policy here. We'll give the policy a name, we'll apply it to the default namespace and to pods that have a role label with the value of db. We're blocking all of the outbound traffic, and we're only allowing traffic from certain pods, and we get a policy here. For credentials, in addition to traditional security controls like not committing your secrets to a version control system, we should also look into protecting cloud credentials by limiting access to the instance metadata service. This can be done through network policies, as one option. As another option, and specifically when your workloads might need access to

cloud resources, all major cloud service providers can now differentiate between the node and the workloads running on the node and assign roles to workloads directly. Amazon calls it IAM roles for service accounts, Google calls it GKE Workload Identity, and Microsoft calls it AKS Azure AD pod identity. For service accounts: by default Kubernetes mounts service account credentials into all pods, but the majority, if not all, of your workloads don't need access to the cluster control plane, so you should be disabling the automountServiceAccountToken setting for your workloads, and you should use the cloud provider identity management for Kubernetes user access. For the control plane: the kube-apiserver is a critical component of a

Kubernetes cluster, and limiting access to it is generally a good idea. All the major cloud service providers allow public access to the API by default but support disabling it, and we can also use network policies to limit that access within the cluster. It's also important to keep Kubernetes up to date, and here GKE even supports automatic upgrades. Securing your workloads: there are a number of settings that you can use to lock down your workloads. For pods, most of these are defined in the container's securityContext. With securityContext you can apply these settings to individual containers and pods, and with Pod Security Policy you can enforce requirements for the security context across your cluster. Another option here

is Open Policy Agent. Keeping the operating system on the nodes up to date is another step in preventing container escapes, and here GKE also supports unattended upgrades. If you need them, there are interesting, though not widely adopted, options for improving container isolation; if you're interested, look into gVisor and Kata Containers. When it comes to securing your images: manage your supply chain risk by using trusted image sources, reduce your attack surface through minimal images, identify vulnerabilities before your attackers do by doing security scans of your images, and also secure your applications. You may also consider security linters for your Kubernetes deployment YAML files that can alert you to common security misconfigurations. Miscellaneous things: if you're using Helm, upgrade to version 3.
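The securityContext lockdown discussed a moment ago might look like this minimal sketch; the pod name, image and UID are illustrative, not from the talk:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app                    # hypothetical pod name
spec:
  automountServiceAccountToken: false   # most workloads don't need the control plane
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001                    # arbitrary non-root UID
  containers:
    - name: app
      image: example.com/app:1.0        # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]                 # drop all Linux capabilities
```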

Version 3 doesn't have the Tiller component and is much more secure in general. Logging: you should consider having your Kubernetes control plane logs as well as the application logs in some sort of log management and monitoring system. And protect access to Kubernetes nodes: as a general rule, if somebody has full access to one of your Kubernetes nodes, they can escalate their privileges to control the whole cluster. I firmly believe that everyone using Kubernetes in production should consider using a cloud-native security solution, and there are at least half a dozen options now in this space, including open-source ones. So what are those cloud-native security solutions? They are security tools written with Kubernetes in mind and are

container aware. They can help you with many of the security configurations that I've covered. They can check your cluster configuration against security recommendations such as the CIS benchmarks, do image and container scanning, monitor the behavior of your containers and alert you to suspicious activity (this can be things like a shell spawned by an Apache process or an unexpected connection to the kube-apiserver), and provide secure, container-aware forensics capabilities, which is handy, as containers are often short-lived and may be killed before anyone can investigate an alert. There are a number of open-source tools that might be useful in your journey to securing your Kubernetes clusters. kube-bench is a CIS benchmark testing tool, kube-hunter is a Kubernetes security

assessment tool, kube-scan is a Kubernetes workload security assessment tool, Falco is a rule-based runtime workload monitor, and Trivy is a container CVE scanning tool. So where do you go from here? My main recommendation is to learn more about Kubernetes: do an excellent and free introduction to Kubernetes training, check out these other resources and books, and I will post all the links from the last two slides in the Discord channel. Time for a demo. Let's put together everything I've talked about so far. Before we begin: the configuration of this demo is not a good indication of what you might encounter in real life, but rather was constructed to demonstrate the concepts and threat scenarios. In this demo we start with access to a container

running in a Kubernetes cluster. The first thing we do is look around. How do we know we're in a container? Let's check what running processes we see; the small number of them is a good indication we're in a containerized environment. Let's check environment variables; having environment variables like KUBERNETES_PORT is a good indication we're in a Kubernetes cluster. Let's check if we have access to cloud metadata.

In a real-world scenario this might lead to cloud credential exposure. For the next steps we'll need kubectl, so let's install it. We're going to check the service account token.

Looks like it is mounted, which is the default configuration. We're going to use the token to interact with the cluster. Let's check what services are published in the cluster.

The -A flag here means check all namespaces, so we also see Kubernetes control plane services in the list. We see kubernetes-dashboard. Kubernetes Dashboard is a web-based UI for managing Kubernetes clusters. It is not part of the default Kubernetes installation and was deployed to this cluster as one of the workloads. Let's check if we can access it. To do this we're going to use the Kubernetes internal DNS service, which makes services available in the cluster as service-name.namespace-name.

We see that we can access the dashboard; this simulates lateral movement within the cluster. Now we would like to connect to the dashboard from outside, so let's set up port tunneling for this using our existing access to this container.

We're going to use the service account token to connect. We haven't gained any new permissions in the cluster yet, but we do have a nicer interface that makes exploring the cluster easier. We see our web shell deployment; this is how we got access to the container in the first place. Now, this service account also has permissions to update deployments, which we will use to gain more privileges. We'll start by making the container to which we have access

privileged.
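The edit itself isn't captured in the transcript; making a container privileged amounts to a change like this in the deployment's pod template (the container name is assumed):

```yaml
spec:
  template:
    spec:
      containers:
        - name: web-shell          # assumed container name
          securityContext:
            privileged: true       # gives the container nearly full access to the host
```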

Now let's wait for the pod to be

reprovisioned. Let's check that we're still in the container. This IP address here is the IP address of the pod. Now we see all the processes on the host that's running the container, and we can use our privileged access to escalate our permissions to the Kubernetes worker node that is running the pod.

Let's check what we see now.

As I mentioned, having access to a worker node can in most cases be escalated to full cluster compromise, though that's beyond the scope of this demo. Now let's take a look at how we can fix this.

We're back in the container, and we'll need kubectl again.

We'll also need a text editor, which isn't available in this container by default.

The first thing we'll do is apply a network policy to block access to the metadata service and the Kubernetes dashboard.
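The exact policy isn't shown in the transcript; an egress policy along these lines would block the metadata endpoint while keeping general internet access (the name and selector are assumptions, and blocking the dashboard as in the demo would need additional except entries, e.g. for the cluster's pod CIDR):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-egress        # hypothetical name
  namespace: default
spec:
  podSelector: {}              # applies to all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32   # cloud instance metadata endpoint
```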

Let's apply the policy, and now we're going to check if it worked.

The metadata service is no longer available to us.

And the same goes for the Kubernetes dashboard, but we still have access to the internet. The second thing we'll fix is the service account credentials.

This is where the credentials are mounted; we see the secrets directory here. Now we'll edit the deployment to disable that.
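The edit itself isn't captured in the transcript; disabling the token mount is a one-line change in the deployment's pod template:

```yaml
spec:
  template:
    spec:
      automountServiceAccountToken: false   # stop mounting the service account token into pods
```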

And check that it worked. Excellent! With two simple tweaks we completely disrupted this attack chain. This is all I had for today. Thank you so much for sticking around. Feel free to connect with me on Twitter, I am @airman604. I would like to encourage you to join local security groups; you can often see me at OWASP Vancouver, DC604 and VanCitySec. You can join the DC604 DEF CON group on Meetup, and on our Slack we have monthly meetups and are passionate about community-driven knowledge sharing. Be kind, stay safe.