
Secure distroless OCI images via YAML

BSides Sofia · 2023 · 1:02:51 · 181 views · Published 2023-03 · Watch on YouTube ↗
About this talk
Victor Bonev discusses securing OCI container images through distroless techniques and YAML-based builds. The talk covers OCI standards, container isolation mechanisms (cgroups, namespaces, chroot), recent kernel exploits (dirty pipe, CVE-2021-22555), and why distroless images reduce attack surface in Kubernetes environments. He presents VMware's internal approach to building minimal, vulnerability-free container images and compares them against standard distributions.
Speaker: Victor Bonev

Transcript [en]


Good evening everyone. I'll be doing this presentation in English, and the topic for today is, I believe, one of the most interesting among the new trends: we're going to speak about secure distroless OCI images. First of all we're going to define what an OCI image is, what distroless is, and what makes this combination secure. We're also going to explore our approach to a new way of building what we're used to calling Docker images via Dockerfiles — we're going to perform this action via YAML files. And we're going to build on what the first presenter, Bojo, talked about regarding containers and

how nothing is secure in the container world — and at the same time we're going to leverage this: we're going to look through some exploits and a lot of interesting stuff. First of all, I reminded myself to introduce myself: I'm Victor Bonev and I'm part of the VMware Carbon Black team. Carbon Black is the team that manages cyber security inside VMware, and we develop a lot of those solutions behind the XDR, MDR, EDR words you have heard in the previous presentations as well. I'm a senior developer and I'm very happy to be here, to discuss and gather opinions from each one of you, and hopefully you can learn something that's

good and that companies look for a lot. Let's go through the table of contents. First of all we're going to focus on what OCI is — we'll define it and have a little bit of a brainstorm with all of you. Then we're going to speak about OCI isolation: a 101 crash course, like the ones you used to have at university, to understand what cgroups are, what the file system and the container aspect are, and to talk about change root, or chroot, and namespaces. Then in the third chapter we're going to show some exploits and take a brief overview of the key moments

that have happened in the past year. Of course many things have happened — I'm not going to show all of the exploits; I'm going to focus on the more interesting ones. And since in the previous presentation we had a bit of a chat about Alpine Linux, we're going to talk about Alpine Linux: loved by many, hated by even more. We're going to understand why that is the case. Then we're going to talk about distroless — what makes a distroless image — which is a lot of interesting topics. And in the last chapter we're going to talk about project Micron, or Michaelian; it is a project that we

try to develop internally inside VMware Carbon Black, and it is our approach to solving some of the issues which we will outline during this presentation together. So let's talk about OCI. What is OCI? It is the Open Container Initiative. Everything you have referred to as containers or Docker images — that is basically OCI, and that's the right way to refer to the container world: it is an OCI image; it's not a Docker image, it's not a Podman image. The Open Container Initiative is the standard that all the things we're going to talk about are based upon. So we're going to have a little bit of an introduction to what this is all about.

You're most familiar — I personally believe you're familiar — with Docker; you have heard it during this presentation, you know what it is about. Docker is very popular; historically it has been around for a little less than ten years. The next platform for OCI images — and I'm not going to refer to them as containers — is Kubernetes, of course, and I want to draw your attention to the fact that those are two totally separate things: no matter that they handle containers in the same way, they're totally isolated. We're going to look at why that is the case. First of all, if you follow all

those containers — behind the word container there are a lot of services, software, and sub-components, so we have to break down what this is all about: what is behind the structure that makes isolated environments unique, and why is it so popular? Behind Docker there is containerd, which is the container daemon. You can refer to it as an engine, but it's actually a daemon that does the work for you: it spawns and handles things behind the curtains, in preparation for the next step. On the other hand, Kubernetes does not run on containerd by default. It used to have dockershim, which was removed — it's no longer the

case — and Kubernetes uses a different runtime daemon called CRI-O; you may also see it referred to via CRI, the Container Runtime Interface. CRI-O was designed to be a very lightweight, very performance-oriented container runtime. There are more than this one, but we're not going to observe the rest; we're going to focus on those two. So what comes after those daemons? That is the Open Container Initiative. The Open Container Initiative sets a standard for what a container is. You can think of it as a protocol — even though it is not — the same way that, for example, TCP is defined as a big protocol in an

RFC. It's very different, but you can think of it as an analogy. What makes a container? You have to have a checksum, you have to have a file system — a lot of stuff needs to be there in order for us to call something a container, understandable by containerd and by other container runtime interfaces. It's a set of standards that we have all agreed upon, and it's being continuously developed and updated. So keep in mind that the containers of today might be totally different from the containers of tomorrow, since all of those runtimes and container runtime interfaces keep developing and new features come in. That's why we need to re-certify pretty much every three years; this holds

for Kubernetes as well — in order to get a certificate and call myself a Certified Kubernetes Administrator, I have to re-certify every three years, due to the fact that pretty much everything changes; maybe not the core things, but a lot of the rest changes. So what comes after OCI? That is basically runc. runc, if I'm not mistaken, was first introduced in 2015, at version 0.1, and it is the executable — or maybe executable is not the right word — it's the runtime through which your containers are run in the Linux world. And runc always takes a JSON configuration. If you have worked with

containers, you have probably seen that you have to provide a Dockerfile in order to build stuff, and you have a lot of FROM and RUN commands which are integrated. Those commands basically bundle different JSON files: it could be a JSON file for the networks, it could be a JSON file for your volume mounts. All those configurations — pretty much everything in the Docker world is JSON (there's BSON as well) — feed into runc, and runc executes all of those things together as one big chain, so that you can have what you have called a container so far. That's how it works under the hood, in a

short summary, without the details, but this will do just fine for the demonstrations and showcases I have prepared for you later on. What I want us to take away together, if you have to remember one thing, is: always question whatever you come across. That doesn't apply only to containers, Kubernetes, or Docker; whatever you use, always question how it works. By no means are those magical, unique services or software — something is happening, and if you don't know the answer, there is a very high chance that you'll be exploited. Why can you be exploited? Due to the fact that you don't know the stuff behind it. As simple as that.
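
Coming back to the runc chain for a moment: the JSON that runc ultimately consumes follows the OCI runtime specification. A trimmed, illustrative sketch of a `config.json` — the field names are from the spec, the values are made up for this example:

```json
{
  "ociVersion": "1.0.2",
  "process": {
    "terminal": false,
    "user": { "uid": 0, "gid": 0 },
    "args": [ "nginx", "-g", "daemon off;" ],
    "cwd": "/"
  },
  "root": { "path": "rootfs", "readonly": true },
  "linux": {
    "namespaces": [
      { "type": "pid" },
      { "type": "network" },
      { "type": "mount" }
    ]
  }
}
```

Note how the namespaces discussed later in the talk appear here as plain configuration entries — the runtime simply asks the kernel for them.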

That was stated by some presenter — I don't know who. So, how it works: understanding those pieces of software that leverage the Linux kernel, its components, and its attributes is the key. And as the Mandalorian famously says: this is the way. This holds for everything — it does not apply only to OCI containers or to Kubernetes; you have to understand the concepts. So we're going to talk about OCI isolation. What I'm going to do with all of you is a 101 crash course. You have probably heard, when someone asks why a company uses containers, or if you ask someone in person: well, they provide an isolated

environment. And my question will be: oh, how so? In order to understand how that happens, you might have gone to the next level and heard that containers are a bunch of cgroups and namespaces. But what I have witnessed is that questions about cgroups and namespaces also go unanswered for the most part. So I wanted to show you — and I know it might not be very visible, I did my best — the kernel feature itself: cgroups, or control groups. Those are native kernel features, and they provide resource isolation. For example, imagine you want to run a process and you want to allocate only

256 megabytes of memory to that process — you can do this via a control group. All of them reside under /sys/fs/cgroup. What is unique is that you can create your own cgroups. On most Linux distributions there are already predefined cgroups: mostly memory management, CPU, something related to network, something related to volume mounts. And you can probably already see the mapping: the features that Docker provides are not so unique. What Docker, or Kubernetes, exposes through a lot of its flags actually goes down to the native kernel feature, which is cgroups. Of course containerd and CRI-O leverage

this via flags, and I'm just going to say out loud what is written below. I can do a podman run, or an equivalent — podman is a different flavor, a different piece of software similar to Docker. I can do podman run in detached mode with the -m flag, which will limit my memory usage to 256 megabytes, and I'm going to spawn an nginx:latest container. What this is going to do is create a separate cgroup under my parent process ID, which is podman — because podman, of course, like containerd, already knows about the namespace isolation which we're going to talk about in the next slide. The namespace

isolation will be your container checksum: you have spawned containers and observed that every container has a unique checksum, a unique SHA. So what you're used to as flags is controlling cgroups — basically you're controlling kernel features; there's no magic behind it. In that case, if I simply echo the value, I will see that I have allocated 268,435,456 bytes — the equivalent of 256 megabytes — and it's present in my file system. Now we're going to take this to the next level and talk about namespaces; as you recall, containers are basically namespaces plus cgroups, and namespaces are even more important.
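
The memory-limit demo above can be sketched like this — the podman command itself is shown only in comments (it assumes podman is installed; the cgroup path is an example of where podman typically places the limit on a cgroup v2 host):

```shell
# Illustrative, not run here (assumes podman and an nginx:latest image):
#   podman run -d -m 256m nginx:latest
#
# podman translates "-m 256m" into the container's cgroup; on a cgroup v2
# host the limit typically lands in a file such as:
#   /sys/fs/cgroup/machine.slice/libpod-<container-id>.scope/memory.max
#
# The value you would see when you cat that file is 256 MiB in bytes:
limit_bytes=$((256 * 1024 * 1024))
echo "$limit_bytes"   # 268435456
```

The point of the arithmetic is that there is no magic: the flag is just a number written into a kernel-managed file.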

Namespaces are kernel features for partitioning resources into sets of processes. If you compare them to cgroups: cgroups limit resource usage, while namespaces limit the resources a process can see. What do I mean by that? Imagine you get a shell onto some server. I can execute — again, I will run a container — podman run in interactive mode, again with the same nginx:latest image, and execute a shell. This will basically — I'm going to use the term, don't use it — "SSH into the container": it invokes a Bourne shell session, and I can list

namespaces. The command to do that is lsns, and I can see there are different namespaces: each comes with a unique ID, which you can see in the left column, and each has a type. So we now understand that there are namespaces and different types of namespaces. Listed here we have time, user, net, mnt, UTS (which is related to Unix time sharing), IPC (inter-process communication), PID — which you heard about in the previous presentation; you know what a PID is for sure — and cgroup. Each of those namespaces also has an associated PID and an associated user.
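
lsns is really just aggregating something the kernel already exposes per process, which you can inspect directly even in a minimal image — a quick sketch, assuming a Linux host:

```shell
# Each namespace a process belongs to is a symlink under /proc/<pid>/ns;
# lsns walks these for all processes. For the current shell:
ls /proc/self/ns
# Typical entries: cgroup ipc mnt net pid pid_for_children user uts
# (plus time / time_for_children on newer kernels) -- the same namespace
# types listed on the slide.
```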

Which command has access to which namespace — that's where we're going now, a bit more in depth, because it's very important to understand namespaces so we can provide security for our containers; if we don't know this, we can't say that our container is secure. So we have mount, process ID, network, again inter-process communication (IPC), Unix time sharing (UTS), user ID, and control group. What does all of that mean? Well, it follows that those are simply attack vectors — the primary targets through which you can exploit a container. We're going to observe some of the exploits that do

leverage some of those namespaces — not all of them, but at least some — and you can see that there are a lot of attack vectors. Again, if you refer to the first presentation of today: maybe nothing is secure. Let's see. Containers do use namespaces to partition different resources. Imagine I want to run podman with a BusyBox image — I'm just going to give you an idea of what BusyBox is. BusyBox is one of the most lightweight images; it's not a distribution by any means. It's a type of Linux image where all of the essential commands that the previous presenter showed — ls, cat, echo, netstat, ip, whatever you can think of — are bundled into one

executable, which is kind of a greater set of those commands. They are not built one by one; they're bundled into one. There is BusyBox, and there is another flavor you might have heard of, called Toybox — same principle: the Linux commands you have used during your terminal sessions are bundled into one executable, and when you invoke one, something happens that works like soft links — the soft links refer to different entry points in the binary, so your command can run safely. That's what BusyBox is.
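
The one-binary-many-names trick can be re-created in miniature — this is purely illustrative (real BusyBox is a single C binary that dispatches on argv[0]; the file names here are made up):

```shell
# Toy re-creation of the BusyBox/Toybox dispatch-on-argv[0] trick.
cat > /tmp/toybox_demo <<'EOF'
#!/bin/sh
# Decide what to do based on the name we were invoked as ($0).
case "$(basename "$0")" in
  hello) echo "hello world" ;;
  bye)   echo "goodbye" ;;
  *)     echo "unknown applet" ;;
esac
EOF
chmod +x /tmp/toybox_demo
ln -sf /tmp/toybox_demo /tmp/hello   # the "soft links" from the talk
ln -sf /tmp/toybox_demo /tmp/bye
/tmp/hello   # prints: hello world
/tmp/bye     # prints: goodbye
```

One program, two names, two behaviors — which is why a BusyBox image can offer dozens of commands at almost no size cost.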

What executing the shell creates is a separate namespace that will allocate my hostname. You know that when you spawn a container you see some checksum — or maybe you don't; it depends on how it's configured and executed through runc and containerd — but usually behind it a hostname is created, a network, different types of cgroups, and a parent process ID. Now I'm inside the container interactively, so I can interact from within: I can type ps and see what my processes are. Currently there is not much going on — I have sh and I have ps, both owned by root. But what happens if I

execute another shell? This will basically be a shell within a shell, and if I do ps I will see that I have two shells. What is strange is that I'm in my child shell process, but I'm able to see the parent one. Observe here that my process ID is number one; if I invoke a shell again I get a different process ID, and the active one is three, but I'm also able to see the parent process ID — in this case one, the parent from which I originated. And even when I run ps — which reads procfs, something like sysfs but for

processes — I'm able to see that sh is my parent with process ID 1, I'm currently running in the child process with PID 3, and the ps I executed was allocated process ID 11. So it is strange — we're going to see why. Before we go into the details, I want to do a brainstorm with all of you. There is a simple process inheritance: on the left we have a cluster — it could be a Docker machine, it could be a Kubernetes cluster, some sort of cluster where many nodes and containers run, either through containerd or through CRI-O for Kubernetes. And I have spawned a very traditional showcase: I have my

backend, I have my nginx — it doesn't matter whether they run on containerd or on CRI-O. I have four containers: my backend, my nginx, my frontend, and some metrics for logging. That's how I love to do things — a separate container is responsible for logging, so it doesn't interfere when I update. Imagine that I have exposed port 443 for my nginx; through Kubernetes this is quite easy to do — you have either ClusterIP or NodePort for your exposure (there is another type of exposure, but we're mostly focusing on these). In this case it is a NodePort: your

port is exposed and you can interact with it. So my question to all of you is: what is the worst scenario that can happen, based on that picture? You don't have to use the mic — if you have ideas, please raise your hand.

Yes — communicating over inter-process communication. That's why it is bad. It's not the worst, but it's a great observation. Can we think about something else? Any ideas?

Right — precisely, exactly. This is the root of the problem and the worst thing that can happen, and there were cases like that at Amazon, at Google, at the major cloud vendors: you can escape outside of your container towards the cluster. How can I do that? Imagine I have a vulnerability which I can exploit in nginx — it's rare, it doesn't happen often, I know — but let's say it's not nginx, it's software that I have written, so I can exploit that. It could be a buffer overflow, it could be a use-after-free — those are two of the most commonly used

vulnerabilities that are being exploited. I load my shell and then I have access — but only to that container, so I'm isolated: I can't see what the backend is doing, what the frontend is doing; I have access to that particular service. And yet, miraculously, there are issues and exploits that allow me not only to control that container but to take control over the whole cluster. This is done exactly through exploits that target cgroups and namespaces — mostly you can target containerd, you can target CRI-O — and you gain leverage over the cluster. Now you do not control only your own data: the cluster can have thousands of

nodes and I don't know how many containers — maybe about 10,000 if it's a big one owned by an enterprise corporation — and I can control pretty much everything; I have full control. So I'm going to show you two exploits which leverage Linux kernel vulnerabilities. Both of them were discovered last year: the first one is CVE-2022-0847, the second one is CVE-2022-0185. And what is a CVE, just so we're on common ground? That's a Common Vulnerabilities and Exposures entry — notice the word common. Those are vulnerabilities that we know exist; there are many that we don't know

that do exist. So what are those exploits? The first one's root cause is an uninitialized pipe buffer flag variable — it's a bug in the kernel that can be exploited. Basically, the page cache is always writable by the kernel. Imagine I have to write a specific file: I can write a C program which writes to the page cache, and by writing the cache in a specific way — of course I'll show you how this looks — I can write without the kernel checking my permissions. This actually came out of a support ticket. The exploit is called Dirty

Pipe. You have probably seen, in one of the previous presentations, the one called Dirty Cow — this is basically the successor of that exploit. What it does: I can take, for example, the /etc/passwd file, write to it through the page cache, replace a password with my own, elevate my privileges, become root, and then restore the file from a backup. You wouldn't notice that the file had been overwritten and then restored — there's no way you can tell, unless of course you have some kind of audit monitoring of the actions that were performed. The second exploit is related to a heap-based buffer overflow in the legacy

parse_param function — again a bug in the Linux kernel. There are certain namespaces, and those namespaces have different aspects they can interact with — certain kernel features or classes. An unprivileged local user — say I'm non-root, stating this clearly: I am not root — can gain CAP_SYS_ADMIN. CAP_SYS_ADMIN is a Linux capability, and there is something legacy that we trigger so that it executes that legacy code: we open a file system that does not support the file system context API, and it falls back to legacy handling, where we

hit that exact exploit. It's very similar to what Bojo showed in the first presentation with the printer — there was a printer printing something legacy. You know how someone can send a GIF to your phone, that GIF becomes a PDF, and your phone gets exploited? It's the same thing: we're triggering a legacy function that's who knows how many years old — nobody knew it existed. So, a fast recap: there is Docker and there is Kubernetes; those are two different ecosystems, and they run on different daemons — one is containerd, one is CRI-O. What I want to emphasize is that Kubernetes and Docker do behave

differently with respect to the runtime. What I'm going to show — I know it's not very visible, but I can share the slides with all of you after the presentation — is this: I execute a BusyBox image and do a ps, so I can see my processes. I have two PIDs, both owned by root, and 5.15.49 as my Linux kernel version. Then I do an unshare — that's on the Docker-provisioned container. When I do an unshare, I'm basically detaching myself from a namespace and trying to create a new one. It says: unshare, operation not permitted. That's on Docker. If I execute the same thing on

Kubernetes — with k run; k is just an alias for kubectl — interactively, with the same BusyBox image and a shell (the Bourne shell, the same shell), and I try the same unshare command: it will run. What I wanted to emphasize with this very simple example is that they behave differently. If you test your application in a local Docker environment and say, hey boss, it's done, we can deploy it to the cluster, I've checked everything, there is no way this can get exploited — Kubernetes does things differently. Of course this has improved a lot in the latest versions of Kubernetes; there are now filters you can leverage, but you have to state them explicitly.
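
The "filters" here are seccomp profiles: Docker applies its default syscall filter out of the box (which is largely why unshare was refused in the Docker demo), while Kubernetes only applies one if you ask for it in the pod spec. A minimal, hedged sketch of stating it explicitly — the field names are from the standard Pod securityContext API, the pod name and user ID are made up:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox-restricted       # hypothetical name
spec:
  containers:
    - name: shell
      image: busybox
      command: ["sleep", "3600"]
      securityContext:
        seccompProfile:
          type: RuntimeDefault   # opt in to the runtime's default syscall filter
        runAsNonRoot: true
        runAsUser: 1000
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]          # remove CAP_SYS_ADMIN and friends
```

With a spec like this, the behavior inside the pod is much closer to the locked-down Docker default the demo compared against.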

You have to know what you're doing — the lesson to take home is that Kubernetes and Docker behave differently. The next exploit that I wanted to show you is Dirty Pipe. Dirty Pipe, as you recall, is the successor exploit where I can write a cache page in the Linux kernel without the kernel checking my write permission, so I can write pretty much anything — and I can override the /etc/passwd file. What I'm going to do here: I have created a scenario where I use pwncat. pwncat is a remote shell — probably some of you have played with Metasploit and Meterpreter; it's again a sort of remote shell, and

you can spawn sessions with it. It's very interesting — you can check it out online; it's called pwncat and it's open source. What I do here is compile and trigger that specific exploit — it's written in C — and it triggers the vulnerability: it says compiling, verified exploit, attempting to run, completed successfully. Then I check my ID and I have become root. Initially I was not root, but once I run that exploit my privileges are elevated. And you don't know how widely spread that exploit is — it was fixed in later releases of the Linux kernel 5.15 series, within the last seven or eight months.
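
A quick way to sanity-check a kernel against Dirty Pipe — this is a hedged helper I wrote for illustration (CVE-2022-0847 affects kernels from 5.8, with upstream fixes in 5.16.11, 5.15.25 and 5.10.102; the function name is my own):

```shell
# Does a "major.minor.patch" kernel version predate the Dirty Pipe fix?
dirty_pipe_vulnerable() {
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  patch=${rest#*.}
  [ "$major" -eq 5 ] || { echo no; return; }        # affected range is 5.8+
  if   [ "$minor" -lt 8 ];  then echo no
  elif [ "$minor" -eq 10 ]; then if [ "$patch" -lt 102 ]; then echo yes; else echo no; fi
  elif [ "$minor" -eq 15 ]; then if [ "$patch" -lt 25 ];  then echo yes; else echo no; fi
  elif [ "$minor" -eq 16 ]; then if [ "$patch" -lt 11 ];  then echo yes; else echo no; fi
  elif [ "$minor" -gt 16 ]; then echo no
  else echo yes    # 5.8 - 5.14 stable branches never received the backport
  fi
}
dirty_pipe_vulnerable 5.15.24                      # yes
dirty_pipe_vulnerable 5.15.49                      # no
dirty_pipe_vulnerable "$(uname -r | cut -d- -f1)"  # check your own kernel
```

The broader point stands regardless of the helper: a vulnerable kernel makes every container on the node exploitable, patched userspace or not.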

Imagine how long it will take for the industry to catch up and update — not everyone updates their kernels regularly, and that makes them vulnerable. Again, simply by attacking the kernel we can elevate privileges and become root. Now imagine what happens if we chain different exploits so that we escape outside of the container and reach the cluster with root privileges — it doesn't matter that our container runs as non-root, we can elevate our privileges. The next exploit is pretty much a heap buffer overflow, and for it I have prepared an image which I have uploaded to the public Docker Hub

under my Docker Hub account; the image is called bsides-attack-one. I create a pod called bsides-attack-one, then I spawn a Bourne shell session interactively, check my processes, and list my namespaces. The idea behind this hack is that I can change my namespace. If I do a capsh --print — which prints the bounding and current capability sets and the namespaces — I can then do unshare -Urm. unshare -Urm tries to detach from the current namespace and spawn a new one, and now I have gained the CAP_SYS_ADMIN privilege, which I was not supposed to have — you know, I

was outside of that namespace and didn't have access to it, but simply by invoking unshare I'm able to change the set of namespaces I can see and reach. That is one way of container escaping. It was fixed, I believe, quite recently, but it's an attack that works, and some vendors, which I will not name, have suffered from it. As for the tools I have used — I always want to credit the software used for a presentation — the first one is asciinema, with which I recorded my terminal sessions. It's very useful, I recommend it.
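
Back to the unshare -Urm trick for a moment: the capability sets that capsh --print reports can also be read straight from procfs, with no extra tools inside a minimal container — a sketch, assuming a Linux host:

```shell
# Capability sets are exposed as hex bitmasks in /proc/<pid>/status;
# CapBnd is the bounding set, CapEff the effective set.
grep -E '^Cap(Inh|Prm|Eff|Bnd)' /proc/self/status
# An unprivileged process typically shows CapEff 0000000000000000 --
# until something like `unshare -Urm` drops it into a fresh user
# namespace, where CAP_SYS_ADMIN reappears within that namespace.
```

If capsh happens to be installed, the masks can be decoded with `capsh --decode=<hex>`.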

You'll remember the name, because it's an acronym-style name for ASCII cinema — asciinema. The second one is a bind/reverse shell called pwncat; you can check it out, and you can also watch the introduction from the guys who invented that remote shell. They're great — I mean, I learned more by watching that session alone than I have learned in five years, and the session was 50 minutes about remote shells. Another important point that I want to make during this presentation is that security moves with time: what is secure today will not be secure tomorrow. What you have observed so far are kernel exploits — those two exploits that I

have shown you — and I want to again emphasize that the kernel, kernel.org, is one of the most scrutinized pieces of software ever developed. It is watched by so many developers; so many eyes are on it, including our company's. But it's so big that a given developer takes a given subsystem, there is a subset of that subsystem for a given feature, and there is just one little fragment that they focus on — and that's reviewed by many. This is one of the most secure pieces of software ever written, and it still has exploits. And it is at the root of everything — it runs everywhere; you'll hardly find something that does not run the Linux kernel. Even now, the new

versions of Windows are shipped with a Linux kernel, and they support that. It's a very safe piece of software, and it's present everywhere. Now, for a moment, imagine how many common vulnerabilities other software could possess. Think about this: if one of the most watched pieces of software has such vulnerabilities — and those are critical, and of course the industry tries to patch regularly — other software is not under such stringent monitoring. Usually a developer pushes code, and there could be a buffer overflow, there could be a use-after-free, and that's a lot more dangerous, at least in my opinion. And not only that — this software can

be added on top of public images: Docker images, container images, OCI images, they all refer to the same thing. So the next time someone asks you to run some service very quickly, you'll probably just go and download the first public image available on Docker Hub. I'm not saying that you do it, but many, many developers do, and those images are full of vulnerabilities. Even though Snyk is now running on Docker Hub — their scanner — and you have approved, certified, verified images and repositories, there are still tons of vulnerabilities everywhere inside those public images. Don't trust them. So the next topic I want to

bring to the table is Alpine Linux, and my question to you is: who currently uses Alpine? Okay — you know, this would be bad, because this presentation is not about Alpine, right? So, Alpine is loved by many and hated by even more people. Why is that the case? Usually the road to Alpine Linux goes through a size comparison, and I have shown just an example showcase: I want to build and provide an image that can run Go — and build Go, whatever the case is; if you build Go it could be a larger image — but I wanted to just provide a comparison

of how the choice to pick Alpine is usually made in companies — and not only enterprise companies. One manager (and that brings us back to the previous talk) checks that image: a Debian-based one is about one gigabyte. That's a lot — how can we bring this down to a smaller image? And a developer says: oh, you know, I saw this Alpine Linux, they claim to be pretty small in size, let's give it a try and see what happens. Usually people don't know what the other words below that graph mean — distroless, BusyBox, and scratch — so I will explain them. Distroless we're going to talk about in a later slide. Scratch is basically — you

can refer to it as nothing: it's an empty image. BusyBox I have explained in our recap — that's an image that's pretty much scratch plus the essential Linux commands, so you can still echo, cat, ls, the commands you use day to day. And you also have an advanced scratch, which is still basically nothing. So Alpine — why do people pick Alpine? Well, you have a package manager there, APK, and APK can very easily install additional software on top of your image: I can do apk add and install, I don't know, vim, GCC, whatever compiler I want, quite easily. So, understanding —

yeah, again — understanding how those pieces of software leverage the Linux kernel, its system components, and its attributes is the key, and this is the way, as the man said. So I will speak about why I personally never use Alpine. The main reason is that Alpine uses musl, which is an implementation of the C standard library — it doesn't use the traditional glibc, the GNU C library; it uses musl. musl is more lightweight and performance-optimized, so it's faster and simpler than glibc; but glibc is what the other Linux distributions use, while musl is kind of specific

to Alpine so what is the you know the backfire of using muscle in Alpine because Alpine size reduction strategy is based around muscle so they're compiling the whole stuff into a different lightweight C library which is obfuscated it doesn't present with the same features so the first thing that happens is native call hell software compiled against muscle runs only on muscle what is meant by that if I use Alpine during my cicd pipelines I do compound native code I have to run that code again muscle Lipsy will be immediately segmentation fault uh so I'm limited if I'm stuck with alpine I'm stuck for life I I can't run that on on other C library that's

only for native code we're excluding Java we're excluding uh yeah other types of non-native interpretation or uh Tech Stacks talking about native code C go rust Etc next thing node.js and that's JavaScript node.js goes to the muscle compiled native code by node Gib so node.js can call native code by node gear basically you can access your um your C classes from JavaScript and that's done through a library which an awesome person wrote uh through an npm it's called not Gip false big time uh outside of muscle again we have the same problem and we're referring to the native called hell so if I compile node.js native code on muscle I have to run it on muscle

otherwise I'll get memory uh hell I'll get segmentation fault immediately I'm not able to run my software uh on my or my service outside of muscle the next thing is very uh I believe it's it's one of the most interesting at least for which I have come across and it's DNS hell so it's funny so uh bear with me muscle by Design so we're talking about the muscle C library does not support DNS over TCP which means that uh usually you won't give up a damn about this right uh you just would notice and the chances to read that randomly anywhere is pretty much zero uh it will just explode at some point and the way DNS are being resolved the

responses and so forth, for musl, go through UDP, not TCP. And UDP DNS is limited to 512 bytes; that's, you know, by their standards: they said, ah, that's enough, you wouldn't need more than 512 bytes, which kind of recalls that famous Bill Gates saying about who would ever need more than 640K of memory, or something like that. And of course there are cases where you need more, and I want to tell you a story about a case we had running Kubernetes. Again, a manager came and said: oh well, you know, I talked with other managers, they use Alpine and we should use Alpine as well, they're great, they

stated that everything runs lightweight, it's fast, awesome, let's do it, let's onboard Alpine. Though then the cluster starts throwing unknown host exceptions, and basically it seemed like we'd been breached, we were exploited big time: we got unknown host, and we had our network, so we verified this, but nothing worked, everything stopped working at some point. And this was due to the fact that Kubernetes and its internal DNS name resolution use tremendously long DNS names, so you get something tremendously long, and, you know, 512 bytes is not enough, and you just don't get your DNS resolved in Kubernetes if you use Alpine. And who

would have thought about this, right? What are the chances? It was just a big no-no for us once we had to deal with it, and gladly we hit this in an isolated environment, but at first it was like: oh God, we got hacked. We didn't know what was causing it; we understood it was DNS, but we thought it was DNS hijacking and someone had taken control of our DNS, that's what we thought. So we're going to talk about what could be a successor to Alpine, and that's distroless. So, what distroless is... and I want to again ask: who has heard of or used distroless? Right, okay, so two people
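Coming back for a second to that 512-byte limit: the arithmetic behind it can be sketched with some back-of-the-envelope DNS wire-format math. This is illustrative only, not a real resolver; the service name is a made-up Kubernetes-style example:

```python
# Sketch: why a classic 512-byte UDP DNS response overflows with long
# Kubernetes-style names and many answer records.

HEADER = 12  # fixed DNS header size in bytes
name = "my-service.my-namespace.svc.cluster.local"

def encoded_name_len(n):
    # DNS wire format: one length byte per label, plus a terminating zero byte
    labels = n.split(".")
    return sum(1 + len(label) for label in labels) + 1

QUESTION = encoded_name_len(name) + 4  # name + QTYPE + QCLASS
ANSWER = 2 + 10 + 4  # compression pointer + fixed fields + one IPv4 address

def response_size(num_answers):
    return HEADER + QUESTION + num_answers * ANSWER

# A handful of A records fits comfortably under 512 bytes,
# but a large record set (e.g. a big headless service) does not:
print(response_size(5))
print(response_size(40))
```

With musl unable to retry over TCP, a response past that limit simply fails to resolve.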

Excellent. So distroless is not a buzzword. I know this buzzword has, you know, come at a lot of us, but it's not a buzzword, it's the right approach to working with containers, especially in the Kubernetes ecosystem. The initial concept was developed by Google, and they started, you know, basically from the same place, where your manager says: oh, let's use something that's smaller in size. But they opposed that and said: okay, we're going to develop our own stuff. And usually that fails, you know, the

chances of succeeding with your own ideas are low, but they succeeded, and their concept was right. So what distroless is: the name suggests it's a Linux image without the distribution, but that's not quite the case, right? A distroless image means that you don't have any commands or anything else present; you only have the things that are needed so you can run your application, no matter if that's Java, if it's native code, Go, whatever. You only have the capabilities your software truly needs, and of course you can imagine that brings the size down tremendously: the base distroless image is like three megabytes or even less.
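A minimal sketch of the usual pattern with distroless is a multi-stage build: compile in a full image, then ship only the binary on the bare base. The image tags and paths here are illustrative:

```dockerfile
# Build stage: a full toolchain image, never shipped to production.
FROM golang:1.20 AS build
WORKDIR /src
COPY . .
# Static binary so it runs on the bare distroless/static base
RUN CGO_ENABLED=0 go build -o /app .

# Runtime stage: no shell, no package manager, just the binary.
FROM gcr.io/distroless/static:nonroot
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```

Everything the application doesn't strictly need, including a shell an attacker could use, simply isn't there.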

So there are no package managers, and that's why it's probably not understood by many, because you want to work with package managers. There are no shells or any programs: there's no less, there's no more, there's no Vim, there's no cat, there's no nothing. You cannot shell in; you can just run your executable, that's it, in most cases. And it's perfect for Kubernetes clusters. Why is that the case? Because we are limiting the attack surface on our clusters. When we package our services in a distroless image and bundle this, we can ensure that we are only shipping the essential stuff; we don't have overhead on our traffic cost, and we are not, you know,

hammering our ECR repository, if we're on AWS, by pulling tremendously huge images. Yeah, maybe if you're doing something very specific this might not be the right fit for you, but this approach is currently lost on many; it's perfect for Kubernetes clusters. And what Google did is provide a set of different distroless images: you have static, Debian base, base-nossl, Python, Java 11, Java 17, Node.js, etc. They come in four flavors: nonroot, debug, debug-nonroot, and latest. And of course the distroless images do differ in size: the static file system is around two megabytes, and if you add some

additional stuff, like certificates, OpenSSL handling, etc., it grows. And of course Java: if you're on Java, Java is a giant, the size will increase, but relative to the things you need it's still very small. So size is relative, and I wanted to demonstrate this with something: we basically monitor our images and how they behave internally through a plot, and I've picked some of the images which are publicly available. Reading top to bottom: Go, Node.js, nginx. And we have developed our own kind of distroless versions of those images, which I will call vmware/go, vmware/node.js, vmware/nginx. Those are not public images, so if you search for them, they

live only in our private registry. If you observe the total CVE count on the public images: for Go there were 461 present on March 6; in comparison, we only had eight. That's how we have limited and addressed the attack surface, and it's the same version of Go; same goes for Node.js and for nginx. And if you observe the end of the graph, we basically address them over time, and we got very close to zero, and those counts include high, medium and low as well, so not only critical. So, some of the VMware inside story, and I'm going to wrap it up: we wanted to create a symbiosis with package managers, because usually when you're

creating a Dockerfile, you usually write RUN commands, and those RUN commands install different stuff. So this is usually how a Dockerfile looks: there's one FROM statement, something your image is based upon, which someone else developed, then you have a big chunk of RUN statements, usually in one block so it ends up in one layer, so it can be a bit more lightweight. And there are so many limitations with Dockerfiles that we wanted to maybe build something better than Dockerfiles and again leverage the package managers. So we ditched Dockerfiles and we switched to YAML syntax, and it was our plan to save big bucks,

as we wanted to reduce our cost of goods sold, basically our cloud bill at our cloud vendor, and our overall technical debt. We wanted this to be very easily maintainable, because if you've developed container images, you'll have observed that if you want to bump the parent image to fix and address vulnerabilities, etc., you have to rebuild all the rest of the images in sync. So it could be I don't know how many build operations and constant bumps: it's one parent, another parent's child, the child of a parent, so it gets like: I have to build ten images if I want proper inheritance. We wanted far more security

and a more flexible, approachable build, and dynamic images. So we have something that we refer to as project Michael; it's an internal name, something that we plan to develop, a new way to build distroless images. If you search the internet for this, nothing will pop up; it's internal, so there's no advertising in this one. So, something that we understood is that Dockerfiles are in the past. There is one Dockerfile that we used to have and still maintain, due to several reasons, but we wanted to switch from Dockerfiles to YAML syntax. I know it's not very visible on the slide, but we wanted to build an OCI image against a YAML file. And what

this does for us is that we are no longer relying on storage for our images in a container registry; we can do this at runtime. If you've read Kubernetes in Action, you know you can write a pod specification and specify an image, and there you specify, like, a repository, which is an OCI repository path towards your image, with the appropriate tag and so on. Instead, you can just include a YAML file which is local, and you don't have to pull I don't know how many megabytes of image each time; you do this at runtime, it's very quick, it's like that. And basically it calls your package managers; it supports dnf,

yum, apt-get, even apk, though we don't use it and we don't recommend using it, because it was breached. And yeah, usually you just handle the package managers accordingly. So, you know, that was the first iteration; then in the next iteration we wanted to use a Cloud Native Computing Foundation emerging project, which is Buildpacks, buildpacks.io. What buildpacks.io does is it can look into your repositories and create a build specification. At the end your image is just a tar, an archive, but it can create the right capabilities in order to build or run that specific image. Imagine I have a C++

repository with some C code in it; I throw a bunch of stuff in it, and I can just execute a buildpack against it, and that will create the necessary recipe so I can create my image without me inspecting what's in there, without adding additional libraries and so on. It does it for JavaScript, for Rust, for Golang, for Java, C++, C, whatever you name it. So we combined this, and we don't create a tar archive but a YAML specification, which can be broken down: you can inherit it, leverage it, separate it into different pieces, and you can of course develop on top of it without bundling each time a separate image upload

and so forth. So the process is very unique in one aspect, at least I haven't seen anything like it so far, and the flow usually uses ko, which does the Go build: if you go to GitHub and search for ko, it's something that's used for Go to produce a minimalistic tar archive, basically your file system for your image, and that's consumed by the container runtime interface. There are two separate flavors you can pick: either you do a build image or you do a runtime image, because the images that require a build are totally different, they're usually bigger, they're not for production. Runtime images, on the other hand, are meant to be, you know, only the

things you need, so you can run them in production. So we have build, we have run, and we have buildpacks that feed in what needs to be added in terms of packages, and we have a process where those packages are managed by the different package managers, and you don't have to use RUN or anything like that which used to be in the Dockerfiles; the Dockerfile is pretty much outdated by now. So each package is broken down into capabilities, and that's again a feature which, it's not something we invented, but we have improved upon it. If you look at nginx: if you do yum install nginx or apt-get install nginx,

there are different capabilities inside that image, so nginx is a bundle of its sub-components, sub-components or capabilities, yeah. And we're making a compatibility matrix where we match only the capabilities that we actually need for our specific package. So what is the end result, what's all this doing? We're creating an image that's minimal in size, easily built and easily maintainable, and in comparison you don't need a container registry, be that Artifactory, Nexus, Docker Hub, or ECR on Amazon; you just do things at runtime. The CI/CD flow is usually: a repository is cross-referenced by a centralized repository, which executes a child job prior to the build; a reproducible YAML spec is created for the required image for

that specific repository, using buildpacks. It then consists of two YAML file specifications, one for build and one for runtime; pick your own thing, you know, we just do both of them at the same time. Our images are signed with cosign, with the appropriate timestamp, and built on the fly: no artifact storage, no ECR. And last but not least, just for a size comparison: image size is drastically lower. The public nginx image is 56 megabytes, and by using all this novel approach we managed to bring it down to 3.8 megabytes. That's a fully functional nginx that you can include in your Kubernetes cluster for your load

balancing. It works, and you can feed it to your Kubernetes definitions, and you can cross-reference it with a timestamp and so forth. So that's all I had, guys. Thank you very much for your attention. Do you have any questions?