
A SAST Story: Effectively Adopting Static Analysis for Profit

BSides Philly · 29:01 · Published 2020-12
About this talk
Static Application Security Testing (SAST) solutions have been used by enterprises for over two decades. While such solutions are considered necessary in any development shop, the technology has been notorious for reporting a high volume of false positives. This problem is especially significant in larger codebases and legacy applications. However, with the right skills, process, and a little bit of time, teams can extract a lot more value from SAST tools. While SAST tools might sound like plug-and-play solutions, they require constant care and maintenance to achieve optimal return on investment (ROI). This talk is a discussion of techniques that allow teams to use SAST offerings to their maximum potential. It also covers ideas for designing processes that not only help use SAST tools effectively, but also prevent various categories of vulnerabilities from being introduced into the code.
Transcript


Hi everybody, good morning, and welcome to my talk about static application security testing. My goal here is to talk about static analysis tools and processes, and how they fit into the SDLCs at our respective jobs. My hope is that we can all come away with some tidbits on what works best for us. A lot of these stories are based on my personal experiences, but I think we can all relate to much of what we'll see here.

A quick intro about myself: I'm Sassy. I started out in appsec as a researcher almost 10 years ago at this point. I spent a lot of time doing vulnerability deep dives and creating detection algorithms for dynamic and runtime analysis tools, and also spent a little bit of time doing DNS malware analysis. I finally moved into real-world appsec a little over three years ago, and I'm currently involved in providing security advisory and process improvements for a large enterprise.

Let's start with a quick introduction to static application security testing (SAST) tools. These tools use various techniques to detect vulnerabilities, and I did want to touch upon one of them, data flow analysis, because it will be relevant throughout the presentation. Data flow analysis is where a static analysis tool tries to understand how data flows through your source code. It does this by emulating function calls: it's obviously not executing the code, but it assumes that certain methods are called with certain data, and that way it can identify vulnerabilities.

One of the techniques for doing data flow analysis is taint analysis. Taint analysis involves three kinds of methods. The first is called a source: a source is where untrusted input enters your system, for example a simple method like request.getParameter. This is exactly where an attacker would inject a malicious value when attacking an application. The second kind of method is called a sink: the sink is where the vulnerability would actually manifest. For SQL injection this would be the method that calls out to the database; for cross-site scripting it would be a response.write where untrusted input comes straight back into the response. Finally, the last kind of method is called a cleansing method: if untrusted input entered the system, this method sanitizes those values. For SQL injection, a standard parameterization function would be a cleansing method; for cross-site scripting, a standard encoding method.
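A minimal sketch of what taint analysis reasons about; the names here are illustrative (a plain string stands in for request.getParameter, and the built query string stands in for the database sink), not actual servlet or JDBC code:

```java
public class TaintDemo {
    // Source: simulated untrusted input (in a real app, request.getParameter("id")).
    static String source() {
        return "1 OR 1=1";
    }

    // Tainted flow: the input reaches the query (the "sink") with no cleanser.
    static String unsafeQuery(String input) {
        return "SELECT * FROM users WHERE id = " + input;
    }

    // Cleansing method: a stand-in for validation/parameterization. If the input
    // is not purely numeric, the tainted value is rejected before the sink.
    static String cleanse(String input) {
        if (!input.matches("\\d+")) {
            throw new IllegalArgumentException("not a numeric id");
        }
        return input;
    }

    public static void main(String[] args) {
        // Taint reaches the sink without a cleanser: this flow would be flagged.
        System.out.println(unsafeQuery(source()));
        try {
            // Taint passes through a cleanser first: this flow is safe (here,
            // the malicious value is rejected outright).
            System.out.println(unsafeQuery(cleanse(source())));
        } catch (IllegalArgumentException e) {
            System.out.println("cleanser blocked tainted input");
        }
    }
}
```

Mapping this back to the algorithm: the tool marks `source()`'s return value as tainted, propagates the mark through the string concatenation, and reports the flow unless it passes through a method registered as a cleanser.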

The way taint analysis works is that you introduce taint into the system at the source and trace it through the data flow. If your taint reaches the sink before passing through a cleansing method, that data flow is vulnerable, because the data never got sanitized. If it did encounter a cleansing method, we can say it's a safe data flow.

A quick overview of the classic advantages of SAST. The biggest claim is that you're shifting left, and we do want to find vulnerabilities as early as possible in our SDLC. Static analysis has also been around for a while, which means there are plenty of tools, established processes, and various people giving talks about static application security testing, so there are well-defined methodologies for integrating it into build pipelines and identifying issues as early as possible.

However, if you've ever had the unfortunate opportunity of triaging static analysis results, you know the biggest problem is the sheer number of false positives. This isn't to blame the technology itself: it's built with the assumption of finding as much as possible; that's just baked into the SAST philosophy. But there are multiple reasons why this happens. The most common ones (definitely not an exhaustive list): unknown cleansing functions, where your engineering team has its own custom methods for encoding values or validating inputs based on your application's context, and a standard SAST tool isn't going to know about them; sources and sinks, where a tool assumes certain methods are valid sources or sinks, but your application is built in a way where some of them don't make sense, which means entire data flows can be wrongly included in, or excluded from, your taint analysis; and modern language constructs, where it gets really hard for tools to keep up. You might have languages that perform dynamic code injection, or annotations, or frameworks and middleware.
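As an illustration of that first gap, here is a hypothetical in-house encoder (the class and method names are made up for this sketch). Without a custom rule registering it as a cleansing function, a SAST tool has no way to know that flows through it are safe, and will keep flagging them:

```java
public class CustomEncoder {
    // A home-grown HTML encoder an engineering team might write. It is a
    // perfectly valid XSS cleanser, but no off-the-shelf SAST tool knows that.
    static String htmlEncode(String in) {
        StringBuilder sb = new StringBuilder();
        for (char c : in.toCharArray()) {
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The tool sees taint flowing straight to the response unless we
        // register htmlEncode as a cleansing function via a custom rule.
        System.out.println(htmlEncode("<script>alert(1)</script>"));
    }
}
```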

Constructs like those just aren't going to show up in your data flow, and they become gaps in the static analysis tool's understanding of what's going on in your code.

That brings us to my proposed methodology. My goal here is not to come up with the next new technique for static analysis, but rather to look at what's available today and how we can bring it into the existing processes in our SDLC to better leverage these tools.

The first step is the most obvious one: start small. It's really tempting to throw an entire repository at a static analysis tool, hit go, and then look at the number of results that comes back. Instead, take a more methodical approach. Start with a really small number of vulnerability categories, maybe just the high-value ones, maybe a couple of issue types that are most relevant to your application. Maybe you just care about cross-site scripting and SSRF; whatever it may be, start as small as possible.

Next, verify your data flows. Since you're starting small, it should be easier to do a few spot checks, but look out for the pitfalls we just mentioned. Is your application defining annotations, middleware classes, or external dependencies that inject code in some shape or form? Just having an idea of how the application works will help identify those gaps. Also, are there any custom validation or sanitization functions, as we discussed before? In general, as a security team we all benefit from having some sort of inventory of the security controls our applications use, so this can actually be a nice side effect of the effort: you end up collecting all these sanitizing methods, which may or may not actually be effective, and it acts as a nice logbook.

The next step is to triage with purpose. The reason I specifically call this out is that you want a proper system for prioritizing the results.

Even by starting small, you could be dealing with hundreds or even thousands of issues, depending on the size of your repository. So start by prioritizing based on whatever criteria are relevant to you: the severity your tool assigns to findings, certain vulnerability types you care about, whatever it may be. Just try to create a method to the madness before really diving in.

Then we come to one of the more interesting parts: creating custom rules. Most static analysis tools out there, whether proprietary or open source, offer a way to create your own set of rules. These don't have to be new static analysis techniques; rather, they help taint analysis understand your application's context. If you have custom cleansing functions, you can define those. If you have additional sources and sinks, or want to remove certain sources and sinks, you should be able to define that too using custom rules.

I'm going to give a couple of use cases based on how custom rules have helped me in my job. The first one is open redirect, also called unvalidated redirect: a very straightforward vulnerability, where obviously we're looking for redirects in the application that happen without any validation.

In the example I have here, we were evaluating an application where a query parameter accepted a URL, so anybody could provide any URL. However, the way the redirect was constructed, the domain part of the target was hard-coded: as you can see in the example, it's pulled out of some context that's hard-coded into the application. So even if you provide an absolute URL in the query parameter, the redirect can't take you off the domain. And we saw hundreds and hundreds of false positives, because developers used this coding pattern across the board in this application.

So how do we create a custom rule to negate all these false positives and find the actual issues? The way we dealt with it: even though the redirect call on the response is a sink from a data flow analysis perspective, we decided to treat it as a cleansing function if, and only if, the argument to the redirect method was a concatenation of multiple values and the first value of that concatenated string was a constant. I'm really oversimplifying here; there are more nuances you could add. You could look for the exact variable holding that constant, check its type to confirm it's actually a string, or even look for exact values. But the core idea, that there is a constant domain at the start of the string, gives us enough confidence that we're looking at a validated redirect. The result was that we removed hundreds of false positives from the scan output.
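A sketch of the coding pattern in question; the class, method, and domain names are illustrative, not the application's actual code:

```java
public class RedirectPattern {
    // Stand-in for the hard-coded domain the app pulled out of its context.
    static final String BASE_URL = "https://example.internal";

    // The pattern the custom rule treats as a cleanser: the first element of
    // the concatenation is a constant, so even an absolute URL supplied in
    // "path" cannot change the target domain.
    static String buildRedirect(String path) {
        return BASE_URL + "/" + path;
    }

    // The pattern the rule still flags: user input is the entire target.
    static String buildRedirectUnsafe(String url) {
        return url;
    }

    public static void main(String[] args) {
        // Attacker-supplied absolute URL ends up as a path on BASE_URL.
        System.out.println(buildRedirect("https://evil.example"));
        // Here the attacker controls the whole destination: a true positive.
        System.out.println(buildRedirectUnsafe("https://evil.example"));
    }
}
```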

More importantly, this cleared the fog: we could now clearly see more than 50 true positives that stood straight out of the scan results. Yes, there's a bit of investment to get to this point, but now that we have the rule, we don't have to go back and triage more false positives; any new issue that comes in from this project is most likely a true positive.

The next use case I'd like to talk about is SQL injection. Again, a straightforward vulnerability; hopefully we don't have it in most of our applications, but unfortunately I do still see it from time to time. This was another application I was looking at, and it was dealing with concatenated SQL queries passed to an executeSql method. Yes, that definitely looks vulnerable, and the static analysis tool did the right thing by calling it out. On closer examination, though, we found that many of these queries weren't even accepting user input; they were just queries built from a bunch of constants. So we started seeing a lot of false positives, and we wanted to understand what exactly was happening.

The data access layer in the application offered two methods for executing SQL. The first method accepts a SQL string and a list of parameters, and its implementation parameterized the query before calling the database. The second version just accepted a string and called the first method with an empty list of parameters. We saw a pattern where developers were copying the usage of the second method and sending concatenated queries through it. Some of those did include user input, but the majority did not, which meant we were again dealing with a bunch of noise and couldn't get to the holy grail: the genuinely vulnerable results.
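A sketch of that data access layer; the class name and the string "results" are stand-ins (the real methods would call the database), but the overload relationship matches what the talk describes:

```java
import java.util.Collections;
import java.util.List;

public class DataAccessLayer {
    // Overload 1: parameterizes the query before execution (sketched here by
    // just describing what would happen instead of hitting a real database).
    static String executeSql(String sql, List<String> params) {
        return "parameterized(" + sql + ", params=" + params.size() + ")";
    }

    // Overload 2: delegates with an empty parameter list. This is the one
    // developers copied while concatenating values straight into "sql".
    static String executeSql(String sql) {
        return executeSql(sql, Collections.emptyList());
    }

    public static void main(String[] args) {
        // Intended use: placeholders plus a parameter list.
        System.out.println(executeSql("SELECT * FROM t WHERE id = ?", List.of("5")));
        // The copied pattern: constants only, so flagged as a false positive.
        System.out.println(executeSql("SELECT * FROM t WHERE flag = 'Y'"));
    }
}
```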

So what did we do? There are multiple ways you can address this. The first goal is to improve accuracy: make sure you have established sources for the SQL queries (what do you define as the sources where user input comes in, such as the query parameters we discussed?), make sure second-order SQL injection is taken into consideration, and flag data flows where executeSql is called without parameters.

The way we ended up doing this was to leverage what developers call design patterns. Most reasonably mature dev teams do develop a certain set of design patterns. They might not have them officially documented (more mature teams typically do), but the idea is that development teams expect certain coding practices as part of their everyday job; it's what they use during their code reviews, and they have this shared agreement. From the security team, we decided to leverage the design patterns they already had and introduce a new pattern for the sake of security.

The idea was to rewrite the data access layer in a way that enforces the usage of parameters, regardless of whether you have parameters or not. The new API on the data access layer has just one executeSql call, which takes the SQL query and a list of parameters, and there is no way to circumvent the parameter list. So what happens when you have a real case with no parameters to send? You pass null. How does this help us? We created a custom rule that catches calls to executeSql with a null list of parameters. Does this still produce false positives? Of course, but the number is greatly reduced. Our team now gets notified every time there's a call to executeSql with a null, and it takes a security engineer a quick glance to identify whether it's truly a false positive, or a case where the developer shouldn't have passed null and should have used a parameterized list instead.
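The rewritten layer might look like this sketch (again, the class name and return strings are illustrative); the key design point is that the parameter list can no longer be omitted, so a null becomes an explicit, easily-greppable signal:

```java
import java.util.List;

public class SafeDataAccessLayer {
    // Single entry point: callers must always supply the parameter list.
    // A genuinely parameter-free query passes null, which the custom SAST
    // rule flags for a quick human glance.
    static String executeSql(String sql, List<String> params) {
        if (params == null) {
            // Legitimate only for constant queries; every such call site is
            // surfaced to the security team by the custom rule.
            return "unparameterized(" + sql + ")";
        }
        return "parameterized(" + sql + ", params=" + params.size() + ")";
    }

    public static void main(String[] args) {
        System.out.println(executeSql("SELECT 1", null));
        System.out.println(executeSql("SELECT * FROM t WHERE id = ?", List.of("5")));
    }
}
```

Because the old one-argument overload no longer exists, developers cannot fall back into the copied pattern by accident; the design pattern and the custom rule reinforce each other.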

This is just one of the ways you could solve this problem. It's slightly more complicated than the open redirect example, but that was the approach we ended up taking, and the most important takeaway is that design patterns can really help.

One thing I do want to call out is that static analysis tools are just one part of the puzzle. Any time we think about a problem from a security perspective, we want to think about people, process, and technology. The technology aspect is solved by the tools, but we also want to look at areas where we can leverage people and process, because that's how you end up with a full-fledged solution that can scale over time across all your teams.

The last step in this overall process is doing it over and over again. Remember, we started small; once we have a reasonable baseline, we go back, expand the number of vulnerabilities and categories, and repeat until we're done. The most important part of this exercise is not the iteration itself but the end result: we want to arrive at a baseline where we're comfortable that we have reviewed everything we set out to review at the start. Maybe that's just the list of criticals, just the list of highs, a list of ten categories, whatever it may be. Once you have this baseline, every new release of the application gives you a new set of findings to review, but it's a really small subset, rather than re-reviewing an entire repo with tens of thousands of issues, depending on the size of the project. So the real value in implementing static analysis as part of your SDLC is not in the scans, but in the baseline and what you do after you create it.

So this all sounds great; if it worked like that in practice, awesome. In reality, we know that's not true. Security teams are always struggling to keep up with the number of findings coming from all the tools and data sources available to us, and static analysis is just one piece of that puzzle. So does it really help us do our job better? Let's take a quick look at a typical static analysis testing cycle to understand. I'll give a couple of example models; I'm sure this isn't exhaustive.

In the first model, developers commit code to a branch and create a pull request (hopefully they're not committing directly to the main branch), a bunch of folks do peer reviews, and the request is finally merged to the main branch. As soon as the merge happens, it triggers the static analysis scan, because it's built into your build pipeline, and then the results come in and you can start reviewing them. That's one typical model. In the other, maybe you have a very large repository where it's too expensive or just takes too long to trigger scans for every single pull request, so you have a scheduled scan instead: you pull from the main branch and scan it once a day, once a week, once a month, whatever is most convenient, and eventually you review the results, file bugs, and make sure everything is valid.

In reality, though, what happens is that the product gets released before we as a security team go in and review the scan results. Sure, we shifted left in scanning our code, but we couldn't review the results in time. So did we effectively shift left? We're still finding those issues after release, which is about as good as a pentest at the end, or any other security control that runs at production time rather than build time.

So what can we do about this? Can we shift any further left than where we are with static analysis? This brings us to the concept of proactive scanning, which has been catching on for a while. The idea behind it is quick time-to-results from static analysis tools, and "quick" has two meanings here. One is quick in terms of the speed of the scan: the scan itself has to be as fast as possible, typically seconds, maybe minutes, but not more than that. The other is quick in terms of being as early as possible in the SDLC, and when we say early, even build time is not early enough; we want to be there at development time. The most important thing is that we need to be in the same context the developer is in, because typically, by the time we find vulnerabilities, the developer is done with the feature and has moved on to another project. Coming back to recover the context of what was going on, just to identify the fix for whatever bug we've filed, has a real cost. So we want to be in the context of development.

Now let's see how proactive scanning changes the pipeline we were talking about. As usual, developers commit code to a branch and create the pull request, but as soon as the pull request is created, a static scan runs on it, and you have the results while you're still in the developer's context. They can update the pull request immediately with the fix, you can do a remediation scan immediately (remember, the scan has to be really fast), and once we know the fix is in, we merge to main. Here we have truly shifted left, because we can assume any code that goes into main is safe enough.

Obviously this won't be an exhaustive list of the issues we could find, but at least the things we can identify, and that are important for us to identify this early, we should be able to catch at this point.

This brings up an interesting question: is this the most we can shift left? We're talking about the pull request here; can we shift even further? What about scanning the feature branch, even before the pull request is created? Or even further left: what about identifying bugs in the IDE, when the developer is literally creating the vulnerability for the first time? There are a couple of challenges. With the feature branch, different development teams use different branching strategies. You might have a feature branch, another branch off that feature branch, or independent developers with their own branches that merge into the feature branch. There are a ton of ways to do this, and as a security team we don't want to be dealing with all of these methodologies, keeping corner cases in mind and so on; it just doesn't help with coverage of these issues. At the IDE level there are actually tools that help: there are vendors offering IDE plugins that can be installed. But again, that means chasing developers, making sure they install the plugins, keep them updated, use them the right way, and so on.

So the pull request, I believe, is a reasonable compromise. It's early enough, but it's still a central point where, as a security team, we have a certain amount of control. Regardless of what branching methodology you use, you do eventually have to come into the main branch, so having that level of confidence there seems like a reasonable compromise at this point.

There's one caveat I want to bring up with respect to static analysis. We've been talking about data flow analysis all this time, but that's just one type of rule.

With pull requests, if you think about how data flow analysis works, we typically don't have access to the entire data flow. If the pull request is a brand-new feature that includes all the new files, with the source and sink and everything else involved, sure, you can do a reasonable amount of analysis. But what happens if it's a change to an existing feature? You could have just the source in the pull request while the sink is already in the existing code, so it's not in the PR anymore. Or you might not see the sources and sinks at all, but you might be introducing a new cleansing method, or invalidating an existing cleansing method by mistake. These cases are not going to be caught by data flow analysis on a PR, because you just don't have enough code in it.

So there's another type of rule, called a structural rule. All a structural rule does is verify the semantics of the code at hand. It could be a single line of code, an entire method, an entire class, or arbitrary pieces of code; your rules define those semantics, and you can run these semantic checks on your PR. It's not going to be exhaustive, and it won't cover a lot of the vulnerabilities that data flow analysis would, but it does help with the initial goals we talked about: fast scans and fast time-to-results. Some of the low-hanging fruit that might otherwise get missed or lost in oversight, we can catch with proactive scanning restricted to structural rules.

Let's take a quick example of what I mean: grep. I can't see you from up here, but I'm pretty sure some of you are cringing already. Grep is a tool all of us use.

Developers and security folks alike use it; in security especially, if we want a quick look at what's happening in code that's new to us, we'll just run a grep for a specific method call or pattern. If you're a team that has literally no static analysis tool at all in your build pipeline, grep is at least a start. It's not the best option, but it's super simple to get going with.

I'm going to use a single example across the next couple of slides: looking for insecure connections established from your code. We want to ensure that any TLS connection uses TLS 1.2 or 1.3, not anything below that; that's the goal of these rules. The example rule I have here looks at Java files, because this was a Java codebase, and it basically says: if you're not using TLS 1.2 or 1.3, report it. It's not a perfect rule and it will have some false positives, because it scans comments as well, but at least it's a start.

PMD is a more established open-source static analysis tool. It's been around for a while, it supports multiple languages, it has a few out-of-the-box rules, and it also allows you to create your own custom rules.
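For reference, this is the kind of Java call site those grep, PMD, and Semgrep rules are hunting for; the class and helper method here are my own illustrative wrapper around the standard SSLContext API:

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;

public class TlsVersions {
    // Returns the protocol actually bound to the context, e.g. "TLSv1.2",
    // or a marker string if the JDK doesn't support the requested name.
    static String protocolOf(String name) {
        try {
            return SSLContext.getInstance(name).getProtocol();
        } catch (NoSuchAlgorithmException e) {
            return "unsupported: " + name;
        }
    }

    public static void main(String[] args) {
        // The string literal "TLSv1" is exactly what the rules flag.
        System.out.println(protocolOf("TLSv1"));
        // "TLSv1.2" and "TLSv1.3" are the allowed values.
        System.out.println(protocolOf("TLSv1.2"));
    }
}
```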

What's even cooler is that PMD allows you to define your own custom language. So if you have a proprietary language built on XML, YAML, JSON, whatever it may be, you can define that language in PMD and then create custom rules to find vulnerabilities in your proprietary language, which is awesome. Internally, PMD builds a data structure called an abstract syntax tree (AST). The AST is just a tree structure, and you can traverse it to look for vulnerabilities. Structural analysis is straightforward this way, and there are even ways to do data flow analysis, because all you're doing is traversing the tree.

I have an example here, a structural rule for the same check: it looks for SSLContext.getInstance in the Java code and flags cases where you're not using TLS 1.2 or 1.3. The rule does look a little complicated, but the nice thing is that PMD also offers what they call a Rule Designer, which parses any sample code you provide and shows you the AST structure, so you don't actually have to remember all the constructs in that rule; it's pretty straightforward. There's one last tool I want to touch on.

Semgrep, short for semantic grep, has been around for a while and is now actively maintained by r2c. It's basically a glorified version of grep: you're still doing searches on your code, but it has some understanding of the semantics of the language you're scanning, so it will exclude comments and so on, and it tries to look at executable code as much as possible. The rule patterns are again very straightforward; you're writing grep-style patterns for things that should or shouldn't exist. As you can see in my example, it again looks for SSLContext.getInstance, but without TLS 1.2 or 1.3 in the arguments. They have a few straightforward examples on their website, and there's also a nice community where you can get more custom rules or even publish your own. And the nice thing is, it's really fast.

So what am I trying to say in conclusion? Static analysis tools have evolved over the years; they've come a long way. My goal in covering these last three open-source tools is not to compare them, but to give you a starting point for what's available out there. It is by no means an exhaustive list; there are a ton of proprietary and open-source tools you can use.

There are multiple ways to get started, so I'm hoping what I've offered here gives you a place to start if you're new to static analysis; and if you already have good tools and processes, hopefully it helps you mature into a better process over time. But everything said and done, the best process you can roll out in your organization is the one that fits your SDLC. It doesn't matter how many tools and vendors are out there; no one can claim they have the best option for your needs. That's up to you to decide: what's best for your team's culture, what's best for your engagement with the engineering teams, where you really want that feedback loop between developers and your tools, and how your build pipeline is set up. All of that matters in finally deciding the right tool and the right process, the one that fits your SDLC. Thank you for the opportunity; feel free to share your experiences on Twitter, and I'm around if you have questions. Thank you.
