Infrastructure Testing at Scale in CI/CD

Name: Infrastructure Testing at Scale in CI/CD
Uploaded: 2022-06-13
Duration: 42 min 7 s
Description: Agile enterprises are constantly facing new challenges when it comes to embedding security in the CI/CD DevOps processes. One of the main pros of adopting DevOps is gaining speed, but that does not always match with having thorough security checks, especially when those checks are performed manually

BSides Cyprus · 2021 · 42:07 · 76 views · Published 2022-06

Speakers

Spyros Manglis

Tags

Category: Technical

Topic: DevSecOps

Style: Talk

About this talk

Agile enterprises are constantly facing new challenges when it comes to embedding security in CI/CD DevOps processes. One of the main pros of adopting DevOps is gaining speed, but that does not always go together with thorough security checks, especially when those checks are performed manually for each release/build. How does a new deployment affect the exposure of the environment where the application is running?

- New binaries
- New logging mechanisms
- New log files
- New monitoring
- New permissions
- New configurations
- New secrets

How do we make sure that the environment is safe and does not introduce, for example, new privilege escalation vulnerabilities? Answering these questions for each machine in a landscape of thousands of VMs, images and IP addresses is not an easy manual task. In this talk, we want to share how we solved the issue in a large agile organisation like ING, and present our solution and approach to automating infrastructure testing in a CI/CD environment, built on top of Ansible and STRIDE.

Transcript [en]

Hello everyone, and kalispera to the Greeks. My name is Spyros Manglis. Thank you for being here and joining this presentation, and thanks also to the BSides Cyprus team for making this happen. In the next few minutes I'm going to share with you most of the challenges we face when it comes to making and keeping a big, complex environment secure, and especially how we can take all the manual approaches we have in an organization and start automating this process. Now, what is this all about, and what is the agenda? I see the presentation as split

into two parts. In the first part I'm going to share the challenges and our experience with custom infrastructure testing. In the second part I'm going to share in detail how we arrived at the solution we have right now. I'm also going to share the future work we are thinking about and the roadmap, and we are going to close with a hopefully interesting Q&A. Now, a few words about me. My name is Spyros Manglis. I have been in IT for almost 10 years, and my background is split between development and operations. Personally, I really like to write code, you know,

and read code in a security context, and I like to build and write security tools and solutions. I'm also the main person behind this whole project. At the moment I'm working as a security engineer at ING Netherlands, and I'm also a principal security engineer here at [unclear]. So let's start with the main topic of the presentation. What we see here is a simple DTAP environment: development, test, acceptance and production. When it comes to creating an application, delivering a service, or putting any new idea someone might have into production, testing is a fundamental part for an organization.

So the application, or all its components, is freshly built on a development system. This development system might have no testing capabilities, and in our presentation it is represented as a single instance. Once the developer is confident enough that the product is ready and can be used, they push the product to the test environment to verify that it works as expected. This is also the environment where most of the functional testing is going to happen. Now, assuming all the tests are successful in the test environment, the application is moved to the acceptance environment, which should be an exact copy of

our production environment. The reason behind this is that the acceptance environment is where all our manual and automated testing is going to happen. As I said, acceptance is going to be used for integration, security and compliance testing. At the end, if everything went well and all tests have succeeded on acceptance as well, we move the application to the production environment and make it available to all users and other systems. Okay, this is a really simple DTAP environment. How about something more complex? What happens if your company has millions of systems or users?

Your DTAP environment grows exponentially in complexity, and the interactions between the systems become more complex too. We have interactions with load balancers, an enterprise service bus, databases, microservices, you name it. You can imagine that, in such a scenario, checking whether all of those systems are secure enough and have no out-of-the-box or easy-to-spot vulnerabilities is going to take years to do manually, and I wish you good luck with that. And even if we are able to do it in a few days, it is possible that our result is going to be obsolete by the time we give it to

our clients, because in a big organization like ING it is possible that developers push code to production multiple times per day. So when you finish your last round of testing, it is already old; there is already another version in production. So what can we do about this? First of all, before taking any further steps, the most important part is to understand what we are targeting in the organization: which part of the whole stack are we going to target with our automation? In this slide we see the stack presented as two different blocks. The first block is the application layer, which is

made of the application itself, the available API, the libraries delivered with the application, and any system calls between the application and the underlying operating system. The second block is the infrastructure layer, which is the operating system itself: binaries, services, ports, files and more. Our focus in this presentation and for this solution is purely on the infra layer. Now that we know what our target is, the next thing we want to understand is how to assess the security of the stack in a CI/CD environment. Here we are representing a CI/CD environment.

In general, a really good practice in a DevSecOps environment is to provide feedback to the developers as soon as possible. For this reason we may have embedded such functionality in the developers' IDE, and we can also use centralized incremental SAST scanning to assess the deltas. Because you may hear a lot of abbreviations: if something is not clear, please let me know. I prefer to use the abbreviations because it's way shorter than the whole thing. So if there is something you don't understand, please feel free to ask questions later on, or even

jump into the presentation; I don't mind. At this stage those two will assess only the first layer of our stack, the application and the API. Sometimes they may also assess the libraries, but the reason I left that outside of this phase is that such tools must always be complemented by other tools. Do not rely only on your SAST tooling. Software composition analysis, or SCA, is something that should be used in parallel with our SAST tooling, and it is the tool that is going to help us find any vulnerable libraries installed and shipped to production. So we can say that in

this part we have covered the first two layers of the stack. The test phase is most of the time the last phase in a CI/CD pipeline, and it is where tools like DAST or IAST can be embedded to automate the security assessment of the third layer we defined in the stack, which is the system calls. So think about it like this: in the build process we are most probably going to test the application, the API and the libraries, and in the next step we are going to check the system calls. Now, in a CI/CD environment, after the whole SDLC lifecycle is

completely done, your application is still not directly delivered to production. Most of the time there is a manual phase in place, and this is where we were actually stuck: how can we assess the small deltas that we otherwise have to check manually? If we rely on the tools we already have in place, they don't cover the underlying OS; they don't cover anything about the binaries or the kernel, nothing about the OS. This introduces a really big overhead. Think about it: you have procedures in place and tooling in place

that can test those two thousand applications at the same time, no problem. Then you have your engineers or pen testers going through the output of those tools, manually confirming what is a true or false positive. But what happens with the infra? We haven't seen anything regarding infra yet. And the problem is: think about doing this manually for 2000 machines. I think the challenge is easy to understand; it will take way too long. Also, the process of manual infrastructure testing in a big organization is most of the time a big overhead. You need to get some NPAs, and you need to get approval first for those

NPAs. Depending on the network segmentation, you want to create a version of the system, you may have jump hosts in place, and so on. So doing it manually becomes a big overhead. Also, the machines that are delivered to the development teams, be that a VM or a Docker container, you name it, are not assessed after the developers make any changes on the machine. When a developer gets a base image, on top of that they need to install third-party packages, maybe some tools they need to do their own work. Nothing is being done

to assess those kinds of installations afterwards. The vulnerability analysis tools are not viable in this case, and we end up with the question of how we can assess the security of the deltas on the infra layer. And please, please don't tell me once again "manually". Believe me, it's a pain, I'm not going to say where, and I believe any normal person is going to react something like this. Especially in huge organizations it's really common. So what we need to do is find an idea and start automating, and this is where Ansible comes into play. I hope most of you are already aware

of Ansible. If not, I'm just going to tell you a few things about what Ansible is. I'm not going to go into deep detail; that's not the purpose of this presentation. In general, Ansible is used for provisioning, that is, spinning up and killing machines on-prem or in the cloud; for orchestration, for example configuring or connecting an application with the network layer so that it can communicate with other applications and other users; for configuration, where you can share and apply configurations across multiple instances at the same time; and of course for deployment. Now, the first question most people ask is: why Ansible, especially when we have

so many tools available? We're going to see the comparison later on, but let me first answer why Ansible. The most important thing for me when I started thinking about how to make this happen was to have as few steps in the process as possible for both the security engineers and the developer teams. Ansible is agentless: it doesn't require you to install any agent on the servers. It uses SSH and WinRM to log in to the remote systems. You can use it to keep the state of your machines; I'm going to explain later in detail what that means. It's easy to scale: it doesn't matter if you

want to run one task against a thousand machines or a thousand tasks against a hundred machines, it's easy to do. How much time that will take depends on your network segmentation and network connectivity. We can have preconditions in our playbooks as well, which is really helpful when we are dealing with different stacks. For example, we don't want to run a Java test when the stack we are looking at doesn't include any Java installation. It's also really extensible. There is an open-source community, and you can find modules and download them for free. If you visit galaxy.ansible.com, there are many Ansible modules for pretty much

everything that you can download. Those modules are created by people like you and me, and there are also modules that vendors create themselves. All right, now to answer the second question: why not other tools? I don't really have much to say on that except the three things that you can also see in the slide. Ansible is really easy to understand; it's a YAML format, and you can also write your own Ansible modules in Python. It's really easy to use; I definitely believe that you can write and run a simple playbook in a matter of minutes. And the last and most important for me: it's not even comparable with

other tools, because Ansible is not a tool, right? It is how we use it in this project to take advantage of its functionality. By default it doesn't do anything; we need to build the playbooks, we need to build the tests, we need to build all those kinds of things. And to close this question, we can actually embed all those tools, or create a wrapper around them, with an Ansible role. Also, like in the previous presentation by Nicos, we could do pretty much the same: instead of going onto the server, running [unclear] and doing this manually, doing it

this way is pretty easy to do. How is Ansible mostly used within security? Most of the use cases are about applying security standards, like the STIG from DISA (for the fans of Top Gear out there, I don't mean the British guy who appears in Top Gear), and also PCI DSS standards, network and system configurations, [unclear], and Ansible is also used in threat hunting. Now, let's talk a bit about the process we need to follow. First of all, we need to understand our infrastructure landscape and how it

changes: what services run in our infra, and when and how the systems change. We want to adapt the security standards, like PCI DSS or MITRE, and map them to our internal threats. At this stage we don't really need the whole MITRE framework, and we probably don't need the whole of PCI; we can just take what is applicable to us and adapt only that. In the third step of the process we need to define our internal threat model. So we took everything from outside, and we have our custom threats, or internally applicable threats if you like. Now we need to combine those two.
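The selection step described here (take only what applies from the public frameworks, then combine it with internal threats) can be sketched in Python. This is a minimal illustration, not the speaker's implementation: the `Threat` record, the identifiers such as `PUB-1`, and the rule that an internal entry overrides a public one with the same ID are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    """Hypothetical threat record; field names are illustrative only."""
    threat_id: str
    source: str            # e.g. "PCI DSS", "MITRE", "internal"
    description: str
    applies_to: frozenset  # technologies the threat is relevant for


def build_threat_model(public_threats, internal_threats, our_stack):
    """Keep only public threats relevant to our stack, then add internal ones."""
    applicable = [t for t in public_threats if t.applies_to & our_stack]
    merged = {t.threat_id: t for t in applicable}
    # Assumption: internal threats win on an ID clash because they are tailored.
    merged.update({t.threat_id: t for t in internal_threats})
    return sorted(merged.values(), key=lambda t: t.threat_id)


public = [
    Threat("PUB-1", "PCI DSS", "Default credentials left enabled",
           frozenset({"linux", "windows"})),
    Threat("PUB-2", "MITRE", "Abuse of a Windows service",
           frozenset({"windows"})),
]
internal = [
    Threat("INT-1", "internal", "Secrets left in application property files",
           frozenset({"linux"})),
]

model = build_threat_model(public, internal, our_stack=frozenset({"linux"}))
for threat in model:
    print(threat.threat_id, threat.source, threat.description)
```

With a Linux-only stack, the Windows-only entry is dropped and the internal threat is added to what remains.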

Later on we need to do the mapping. Stage five is when we start writing our first tests based on the threats, and the next step is the first assessment of the image. At this stage you can also have the base template, that is, what a base image includes. The next step is just starting the incremental scans. To get as much efficiency out of this as possible, we found it really crucial to have some technical and business Q&A in the process. This business and technical Q&A was

delivered to the developers, and it was mostly there to answer simple questions that would give us a better understanding of which test functionality it makes more sense to spend more effort on, if that makes sense. Simple questions like: when does the default stack change? Is there any standard process by which the default stack changes, or does it change randomly? Which are the most common packages that developers install after they get the image? Pretty much the same goes for the technical Q&A: do you have any image scanning in your pipeline? How often and when are patches

installed on the machine? Are they aware of any possible external threats? Keep in mind that the people who actually use the systems to develop applications are the people who most of the time know the systems way better than us, the security engineers, because they use them every day. So when it comes to phase two, we have already adapted the standards: we have PCI DSS, STIG, MITRE and whatnot, which we combined, and then we built our internal threat model. In this slide you see what our internal threat model looks like, and we are now in phase three of the process. We have defined our threats, and they are now

based on the STRIDE model. If we look at an example in this threat model, take the second Linux threat: an attacker hosts a malicious application on the web server to steal credentials and secret info. That is a bit more complex to test; you need to have a lot of prerequisites in place. As far as I'm aware, there isn't any tool that has such testing capabilities: logging in to the server and checking whether, say, a certificate has been forgotten on the server, whether I can read that certificate, and whether it maybe also has

a weak password. If I have that, I can easily replicate an application on the server and use it to interact maliciously with the other applications. So, as I said, all such tools are really great and it's good to have them, but we do have some blind spots. It is totally fine to have blind spots, but in the end, as engineers, we need to find a way to cover those blind spots. All right, phase four, where the interesting part begins: mapping the threats to the actual tests.
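One of the prerequisites mentioned for the example threat, a certificate or key left on a server where others can read it, can be approximated with a short check. The Python sketch below is only an illustration under stated assumptions: the suffix list is a guess at what counts as key material, and "world-readable" is used as a simple proxy for "I can read that certificate". It does not test password strength.

```python
import os
import stat
import tempfile

# Assumption: file suffixes treated as certificate or key material.
KEY_SUFFIXES = (".pem", ".key", ".p12", ".jks")


def find_readable_key_files(root):
    """Return key-like files under root that any local user can read."""
    findings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(KEY_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            if mode & stat.S_IROTH:  # the "other" read bit is set
                findings.append(path)
    return sorted(findings)


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    for name, mode in (("server.key", 0o644), ("backup.pem", 0o600)):
        path = os.path.join(root, name)
        with open(path, "w") as handle:
            handle.write("dummy")
        os.chmod(path, mode)
    for finding in find_readable_key_files(root):
        print("readable by others:", finding)
```

In an Ansible-based setup a check like this would more likely run as a task on the target host; the function only shows the logic of the test.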

This is our internal ISTG, the Infrastructure Security Testing Guide. We have already defined a threat, which in this example is information leakage in application property files, and you can see how a manual test case and manual test scenario would run to cover this specific threat. Okay, now what about the playbooks? Here we see two playbooks. These two playbooks can be combined to cover one single threat. To be specific, this single threat is covered by these two playbooks: on the left, extracting sensitive info, and on the right, log manipulation.
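The mapping of one threat to several playbooks can be kept as plain data. The following Python sketch assumes a dictionary keyed by threat ID; the IDs and playbook file names are hypothetical placeholders, since the actual ISTG identifiers are not given in the talk.

```python
# Hypothetical mapping: threat IDs and playbook file names are placeholders.
THREAT_TO_PLAYBOOKS = {
    "ISTG-EXAMPLE-01": ["extract_sensitive_info.yml", "log_manipulation.yml"],
    "ISTG-EXAMPLE-02": [],  # a threat with no automated test yet
}


def playbooks_for(threat_id, mapping=THREAT_TO_PLAYBOOKS):
    """Return the playbooks that together cover one threat."""
    return list(mapping.get(threat_id, []))


def uncovered_threats(mapping=THREAT_TO_PLAYBOOKS):
    """List threats that still have no playbook, i.e. remaining blind spots."""
    return sorted(tid for tid, books in mapping.items() if not books)


print(playbooks_for("ISTG-EXAMPLE-01"))
print(uncovered_threats())
```

Keeping the mapping as data makes it easy to report which threats are still only covered by manual test cases.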

As you can see here, the whole rule set is based on regex, and you can also create your own flags there, your own sensitive words to look for. The most important thing in this case is that we have the ability to read the files. So in this example, because we can read the files, we are just checking whether they contain specific wording. Of course, as this project grows and becomes more mature, this functionality can be replaced by something else. All right, let's recap what we have done so far.
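A regex rule set built from a list of sensitive words, as described above, might look roughly like this in Python. The word list, the `key=value` pattern and the sample property file are assumptions for illustration; the real rule set shown on the slide is not reproduced in the transcript.

```python
import re

# Assumption: example sensitive words; a team would maintain its own list.
SENSITIVE_WORDS = ["password", "secret", "api_key"]


def build_rules(words):
    """Compile one pattern per word, matching 'key = value' style lines."""
    return [
        re.compile(
            rf"^\s*[\w.\-]*{re.escape(word)}[\w.\-]*\s*[=:]\s*\S+",
            re.IGNORECASE,
        )
        for word in words
    ]


def scan_text(text, rules):
    """Return (line_number, line) pairs for lines that match any rule."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(rule.search(line) for rule in rules):
            hits.append((number, line.strip()))
    return hits


sample = """\
db.url=jdbc:postgresql://db.example.internal/app
db.password=changeme
# secret rotation is handled elsewhere
"""

for number, line in scan_text(sample, build_rules(SENSITIVE_WORDS)):
    print(f"line {number}: {line}")
```

Anchoring the pattern on an assignment keeps comments that merely mention a sensitive word from being flagged, which reduces the manual review the report would otherwise require.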

Okay, the first step was our internal discovery: we wanted to understand what's going on in our organization, and we completed that with a questionnaire to the DevOps teams. The second step was our standards mapping, mapping all the available public frameworks to our internal one, which is the third step. So we took the available threats, we combined them with the internal threats, and we have the ISTG at phase four. And at the end, in phase five, we start writing our own tests, mapping everything and putting everything into Ansible. Now, how is it being used within the whole

process? How is the project used within ING? Let's see what the definition of our default stack is, which I already mentioned. We want to define what is included in the base image; we already have that at this phase. So we want to know what kind of libraries are installed and what services are running. It's also good to create our first base template for the base image after collecting those facts, and we can use that template later on for keeping state, applying hardening and provisioning new machines. After that, as I said, we

want to know what is being installed on top of the base image. After the base image is delivered to the teams, each team still makes its own changes to it. So now we know all the components installed on the machines, and this is one more good place to create a template, this time for our custom stack. Now we have the previous template, which is our base image, and another template, which defines our custom stack. What kind of vulnerabilities can we look for with that, and what kind of vulnerabilities are we looking for within ING? First of all, we are looking for

vulnerable binaries on the machine, out-of-date software, denial-of-service vulnerabilities and some privilege escalation techniques. Don't expect to find zero days, and don't expect fuzzing of binaries or those kinds of things, but there is functionality implemented for well-known privilege escalation techniques, and if one is available it is going to take advantage of that technique. We also look at extracting sensitive information, folder and file permissions, looking into projects, and way more, and at the end it also provides a report. Now, what can we solve this way? We can catch all the low-hanging fruit. We can replicate zero days with the same concept. I guess most

of you are aware of the latest Apache vulnerability. In reality you could use Ansible to replicate that proof of concept and then use it across all your systems. You can enforce configurations and, of course, automate the obvious. All right, but so far we have only talked about tools and automation. There are people working in companies too. What about the pen testers? Until a few months ago, this project was used manually. The pen testers would run the playbooks manually themselves, and those playbooks would trigger the Ansible role.

Then the Ansible role would start running all those tasks against the targeted machine, and at the end you would get a report, which we again had to review manually. I believe I can share a video with you; just give me a second to show you how it was running previously.
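A manual run of this kind comes down to calling `ansible-playbook` with an inventory and a few extra variables. The Python sketch below only assembles such a command line and does not execute it. The `-i` and `-e` options are standard `ansible-playbook` flags, but the playbook name and the variable names (`application`, `report_folder`, `target_environment`) are assumptions standing in for whatever the real role expects.

```python
import shlex


def build_command(playbook, inventory, application, report_folder, environment):
    """Assemble an ansible-playbook command line as an argument list."""
    # Assumption: variable names are placeholders for the role's real inputs.
    extra_vars = {
        "application": application,
        "report_folder": report_folder,
        "target_environment": environment,
    }
    command = ["ansible-playbook", "-i", inventory, playbook]
    for key, value in extra_vars.items():
        command += ["-e", f"{key}={value}"]
    return command


cmd = build_command(
    playbook="infra_tests.yml",   # hypothetical playbook name
    inventory="host.ini",
    application="demo-app",
    report_folder="./reports",
    environment="acceptance",
)
print(shlex.join(cmd))
```

Returning a list rather than a single string avoids shell quoting problems if the command is later passed to `subprocess.run`.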

Please let me know if you can see my screen. All right, let me start here. On the left we see the role, and the playbook we are looking at right now checks whether SSH allows empty passwords. The way we used to run it is pretty much going into your console and providing the arguments that Ansible requires. The host.ini file is your inventory, so to speak; it is where you define all the IPs and hostnames of the servers you want to test. The application is just the application

name. Because it's really common to have multiple applications on a single machine, if we want to check only one specific application we can provide the application name argument. The folder is where the report is going to be placed, and the environment is the environment in which the role is run; in this case it runs on acceptance. The whole role takes about one and a half minutes, I think, and that was pretty much what happened until a few months ago. Every time you wanted to do an infra pen test, every pen tester would have to run the

role manually, and after the role finished, get the report, check the results and then decide whether an actual manual pen test is required or not. We see here that it finished in less than a minute, but that makes sense because this demo only has five or six tasks. The change that we see here is just the creation of the report. In the debug output, as you can see here, you see all the raw output of Ansible. It's not parsed, it's not clean; you see all the tests that failed and

succeeded. To move a little bit further, this is the final output of the project: it creates that report. Now it looks a little bit better, but I have to admit that the design can take a lot of improvement. In the report we have a way to say whether a test passed or failed, or whether a test requires a manual check. In the state that we see in this example, that implementation wasn't available yet. All right, let me continue with how it looks today. Today it has been implemented in CI/CD; it doesn't run manually.
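The three report states (passed, failed, manual check) can be illustrated with the empty-password check from the demo. This Python sketch is a simplified stand-in for the Ansible task: it parses `sshd_config` text directly, ignores `Match` blocks and quoted values, and the status names are assumptions. It relies on two documented OpenSSH behaviours: `PermitEmptyPasswords` defaults to `no`, and the first value found for a keyword is the one used.

```python
def check_empty_passwords(sshd_config_text):
    """Classify a host as PASS, FAIL or MANUAL_CHECK for empty SSH passwords."""
    if sshd_config_text is None:
        # The file could not be read, so a person has to look at the host.
        return "MANUAL_CHECK"
    for raw_line in sshd_config_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        # sshd_config allows "Keyword value" as well as "Keyword=value".
        parts = line.replace("=", " ", 1).split(None, 1)
        if parts[0].lower() == "permitemptypasswords" and len(parts) == 2:
            value = parts[1].strip().lower()
            return "FAIL" if value == "yes" else "PASS"
    # Keyword absent: the OpenSSH default is "no".
    return "PASS"


examples = {
    "hardened": "Port 22\nPermitEmptyPasswords no\n",
    "misconfigured": "PermitEmptyPasswords yes\n",
    "unreadable": None,
}
for host, config in examples.items():
    print(host, check_empty_passwords(config))
```

Having an explicit third state keeps unreadable or ambiguous hosts out of the pass column, which matches the report's manual-check category.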

The way to trigger it right now depends on the developers, but every time you commit code you can have the Ansible role executed against the node machine in CI/CD and also get the latest report, the report with the approval or disapproval of the release. Let me share one more video with you, because I couldn't do that live.

Okay, what we see here is our Azure portal; we are in our DevSecOps pipeline, and for this example we are going to trigger the pipeline manually. At this stage we need to provide some information. One is the pentest NPA which is going to be used, which is actually the username, and then the password, the application name and the host. Those can be provided dynamically from the Azure portal, and we are looking to move forward and make this available in a [unclear] way as well. So the role

has pretty much started at this stage. Indeed, we started it manually, but think about this being triggered on any code commit if you want, and it takes about one and a half minutes. Now, this role is seventy to eighty percent infra enumeration. Think of it this way: this role is pretty much what a pen tester will do when logging in on a machine. It has already been written as an Ansible test, and it only takes one and a half minutes to run. Okay, as we saw previously in the command line, here we see

the raw output of Ansible, which, as I said previously, is dirty. There is no parsing; you are going to see the failed and the succeeded tasks. So this is available in your pipeline, and when the pipeline is finished we also get the report in HTML format in the Azure portal, which is also available for download. In what you see here, the first two require a manual check. This is because, at this stage, we can really have pass or fail; we can know whether a test is successful, whether it's vulnerable or not. So

everything that is shown as manual check or fail means that something is going wrong with those tests. As I said previously, you can also get the report as HTML and open it in your browser. There is no fancy stuff here; the report can get a lot of design improvements. All right, that's pretty much it regarding the whole solution, how we ended up here, and the whole idea behind the project. Some of the future work we have on the roadmap is to assign a risk score to the tasks.
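The roadmap idea of a risk score per test, combined with automated approval or disapproval of a release, could be sketched as follows. Everything specific here is an assumption for illustration: the weights, the threshold, the result format, and the rule that an open manual check blocks automatic approval. The talk does not describe how the real scoring will work.

```python
# Assumption: illustrative weights, not values from the talk.
RISK_WEIGHTS = {"low": 1, "medium": 5, "high": 10}


def release_decision(results, threshold=10):
    """Sum the risk of failed tests and decide whether to approve a release."""
    failed = [r for r in results if r["status"] == "FAIL"]
    score = sum(RISK_WEIGHTS[r["risk"]] for r in failed)
    needs_review = [r["test"] for r in results if r["status"] == "MANUAL_CHECK"]
    # Highest-risk failures first, so fixing can be prioritised.
    fix_order = [
        r["test"]
        for r in sorted(failed, key=lambda r: RISK_WEIGHTS[r["risk"]], reverse=True)
    ]
    return {
        "score": score,
        # Assumption: anything awaiting manual review blocks auto-approval.
        "approved": score < threshold and not needs_review,
        "needs_review": needs_review,
        "fix_order": fix_order,
    }


results = [
    {"test": "ssh_empty_passwords", "status": "PASS", "risk": "high"},
    {"test": "world_readable_keys", "status": "FAIL", "risk": "medium"},
    {"test": "outdated_packages", "status": "FAIL", "risk": "low"},
]
print(release_decision(results))
```

A pipeline step could then fail the build when `approved` is false and attach `fix_order` to the report.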

So at this step we want to know, if a test failed, or to put it differently, if a test did not pass, what the risk of that test is. If it's low, we know the priority of fixing it, and so on. Automated approval and disapproval is already in place. As I said, I want to implement password vault and Azure vault functionality for secret management. There is a plan to release the whole role as an open source project. And the last and most important piece of work for me is that I want to

map the Windows threat modeling to the same role as well. So after having everything in place, I can sit back and just enjoy the fun and the output. And that's pretty much what I have to share with everybody. I hope you like the idea of a somewhat unorthodox use of Ansible. I have to admit that the whole solution actually helped us to decrease infrastructure testing by something like 60 to 70 percent within ING, which I find really important, so that we can spend that time on more interesting work.

Okay, are there any questions? Let me check, because I don't see the chat.

there is a
