
The Power of Guardrails: How to slash your risk of XSS in half

BSidesSF · 2022 · 25:14 · 265 views · Published 2022-07 · Watch on YouTube ↗
About this talk
Colleen Dai and Grayson Hardaway present research on secure guardrails—safeguards that make it easy for developers to write secure code and hard to write insecure code. Through analysis of 125 open-source repositories, they demonstrate that guardrails can prevent over 50% of cross-site scripting vulnerabilities, and discuss why shifting from bug-hunting to eliminating bug classes represents a more scalable security strategy.
Original YouTube description
Colleen Dai • Grayson Hardaway - The Power of Guardrails: How to slash your risk of XSS in half Why do the same security bugs keep popping up repeatedly, those we all know from the OWASP Top 10? We believe the future of security lies in eliminating vulnerabilities by using secure code defaults and present a study showing that secure defaults can significantly raise a company’s security bar. Sched: https://bsidessf2022.sched.com/event/rjqc/the-power-of-guardrails-how-to-slash-your-risk-of-xss-in-half
Transcript [en]

welcome to this session in which we are going to learn about the power of guardrails how to slash your risk of cross-site scripting in half from Colleen Dai who is a security software engineer at r2c and from Grayson Hardaway who is a security researcher at r2c so take it away guys [Applause] awesome sorry for the technical difficulties cool i'm Colleen and together with Grayson we'll be talking about how to use secure guardrails that encourage developers to write secure code that can actually decrease your risk of cross-site scripting significantly so a lot of companies have around 100 or more developers to every security engineer and it can feel pretty easy to be buried in vulnerabilities and feel

like you can't really keep up but some companies have actually been changing how they run their security programs and it seems to be working pretty well they're higher leverage they're more effective and they're keeping their company secure even while keeping the same ratio of developers to security engineers so today we're going to be talking about how these companies are doing that and we're going to be talking about secure guardrails in particular secure guardrails are essentially safeguards that are put into place to make it easy for developers to write secure code and hard to write insecure code so we'll talk about how guardrails can actually pretty drastically reduce the number of vulnerabilities you have to track down and fix based on some

original research we've been doing examining real cross-site scripting vulnerabilities in open source software so here's a brief summary of how we're trying to approach security so we used to really think about eliminating individual bugs through pen testing bug bounties etc but now industry is trying to shift towards killing bug classes and secure guardrails in general and this achieves pretty scalable systematic and long-term wins but how do we actually know this is effective well we performed some research to determine whether secure guardrails are effective in practice and it has shown us that secure guardrails could have caught greater than 50% of the cross-site scripting vulnerabilities in our open source repositories so who are we i'm

Colleen i'm currently a security researcher at r2c and i graduated Stanford in 2020 with my bachelor's in CS and my master's in statistics and i'm Grayson i'm also a security researcher at r2c i've been there for three years in a past life i worked for the US Department of Defense doing all manner of different things from low-level stuff to big data analysis cool thanks Grayson and what's r2c well we're a static analysis startup and our mission is to improve software security and reliability so here's a brief outline of what we're going to be really talking about today so we'll first discuss why secure guardrails are important and then we'll talk about how we tested whether secure

guardrails are effective in practice what we found and what we've learned since we've conducted this research so i'm first going to talk about the OWASP Top 10 which we really can't escape from these days so more specifically i want to really compare the OWASP Top 10 in 2017 to the OWASP Top 10 in 2021 you might see some similarities so if we take a closer look we can see that seven out of ten of these vulnerabilities have remained in the OWASP Top 10 in 2021 that were the same in the OWASP Top 10 in 2017 some of these might have been renamed some are in their original form some might have been categorized with

other things but they're all still there so given that all of these vulnerability classes have stayed pretty much the same over four plus years we really need to take a good look at what we've been doing in the past and what we should be doing in the future so what we've been doing in the past has been what we call playing bug whack-a-mole which is pen testing bug bounties and finding individual bugs and what the industry is now trying to move towards and what we're really advocating for is secure guardrails which involves continuous scanning enforcing safe patterns and making it harder for developers to write insecure code but you might ask what's actually the evidence that this is what

we should actually be moving towards so you're probably thinking we've been finding bugs for ages why should we change what we've been doing well we'll first give a few examples of what some forward-thinking mature security programs are doing and then we'll go into some research we did in preventing cross-site scripting in the wild so let's first take a look at what a few companies are actually doing so let's take a look at Microsoft and specifically let's take a look at what Microsoft did when they were transitioning from XP to Vista so they were able to get a 41 percent reduction in vulnerabilities just from banning strcpy and functions like strcpy and it's not just Microsoft that's been

doing this Netflix Google and some other big companies who are forward-thinking on security have also been doing similar things so Netflix gave a talk during AppSec Cali about how they're moving towards a paved road and killing bug classes and also Google has stated in one of its books which is Building Secure and Reliable Systems that it's pretty unreasonable to expect any developer to be an expert in security reliability and writing good code so a better approach is to actually handle security and reliability in common frameworks languages and libraries and ideally these libraries only expose an interface that makes writing code with common classes of security vulnerabilities impossible and in addition to what these companies are doing we're going to reference our

research which finds that using secure guardrails could have prevented 59% of the 140 instances of cross-site scripting in our open source dataset cool so now that we've talked a little bit about why secure guardrails are actually important we'll talk a little bit about how we tested whether secure guardrails actually work so our research question that we wanted answered was are secure guardrails actually effective in practice and we needed to translate this into something that could actually be tested so we translated this into a methodology which is how many instances of cross-site scripting in open source software could have been prevented with the use of secure guardrails so in order to make sure that our

research didn't actually last like 2-3 years we needed to determine our scope so we chose cross-site scripting in particular because it has high impact you can see it's been in the OWASP Top 10 for a large number of years and also if you look at the bottom of the slide we can see that the HackerOne top 10 most impactful and rewarded vulnerability types includes cross-site scripting in 2020 it's also relatively common and a lot of web frameworks have pretty standard cross-site scripting mitigations and this is important to us because this means that there's concrete code patterns to look for and therefore we can pretty easily write rules for this and finally and last but not least we

wanted to prevent cross-site scripting in our own code so after we determined that we wanted to look for cross-site scripting we then determined which languages and frameworks we really wanted to look at so we chose these frameworks mainly because they're popular and that includes Java and JavaScript so after we determined our scope we then went into getting a bigger idea of what architecture we needed so here's a brief overview of what we did in order to run our research so we first wrote some cross-site scripting checks with Semgrep because we were most familiar with it and then we collected some open source data with BigQuery and GitHub we later filtered on relevancy and made

sure that we only retained the languages and frameworks that we really wanted to review and then we ran our cross-site scripting checks with Semgrep on these commits and then last but not least we triaged the results cool so our first step was to write some checks to detect violations of cross-site scripting guardrails so we used some popular security guides some documentation for frameworks purposely vulnerable apps and our own security expertise to really define patterns for all the ways that cross-site scripting can occur so here's an example of a cross-site scripting policy we developed for Flask and we actually run this on our own code and the goal of a policy like this is

essentially to say if you follow these recommendations you're pretty unlikely to have cross-site scripting so you can check out our cross-site scripting rule set and all the rules that come along with it if you want to use it to analyze your own code and after we wrote these cross-site scripting checks with Semgrep we then moved on to the next step which is collecting some open source data we actually looked into two sources so we first looked into BigQuery because it's a pretty massive dataset it's easy to mine and it's also easy to obtain the parent commits and we'll talk a little bit about why this is important later however it was last updated on march

20th 2019 so we really wanted to retrieve some up-to-date cross-site scripting information we did manage to obtain around 5000 commits with BigQuery but we wanted to make sure that we obtained some up-to-date cross-site scripting data and this really led us to the GitHub Search API so some pros of the GitHub Search API are that it's up-to-date data and there's a lot of commits with a lot of information to grab but it was really difficult to actually obtain the parent commits and this was because we had to script in order to obtain these and also of course the API provides only up to a thousand results for each search so we had to get really creative in

order to bypass this awesome so after we obtained some open source data we then filtered on relevancy by eliminating the languages and frameworks that we didn't really want to look at and then we proceeded to run rules we ran our XSS checks that we wrote with Semgrep on this open source data so here's a diagram of the commits we retrieved and some other information that we decided we needed so the bottom commit is the commit with the "fix something something cross-site scripting something something" in its message the commit where the actual cross-site scripting vulnerability was fixed and the parent commit is the commit where the cross-site scripting vulnerability has not yet been fixed and

we ran our Semgrep cross-site scripting rules on the parent commit because the vulnerability still exists there and we retrieve the git diff between the parent commit and the commit with the "fix something something cross-site scripting something something" message so we can see whether we actually caught the fix and here's a brief overview of our scripts that we ran in order to obtain these results so we first spun up an EC2 instance and we wrote some scripts that first took the JSON output from BigQuery or GitHub downloaded the repositories that we needed and then retrieved the diff between the parent commit and the commit with the "fix something something XSS" message we then proceeded to run the cross-site

scripting rules on the parent commit and then saved the git diffs and Semgrep results in a lightweight database so that we could retrieve them later in our triager so finally our fourth step and final step in this architectural diagram is triaging so we actually classified the Semgrep results for each commit as a true positive or a false negative and a true positive means that Semgrep detected the cross-site scripting vulnerability or Semgrep detected a cause that led to cross-site scripting and a false negative was everything else and Grayson will actually give a few examples of these thanks Colleen can everyone hear me i see thumbs up great so what did we find let's dive into the

results so as Colleen was talking about just a second ago we had two cases that we consider to be true positives the first one is detecting the direct fix and so what you see on the screen is a Rails example where the cross-site scripting vulnerability was in the application in an ERB template that was using html_safe if you're not familiar with it html_safe does not mean that it's safe it means that you think that it's safe for rendering and it'll just spit whatever HTML is in there onto the page which makes it vulnerable not safe so you can see that the fix here was to remove html_safe from this expression and then as an additional

measure to sanitize it in order to doubly make sure that it doesn't render something unsafe and so the Semgrep rule that we had written for it the Semgrep output on the command line was able to detect the html_safe and so this is an example of a true positive where we detected the fix directly so if you're interested in the rule there's a short link on the slide so when we make the slides available you can go check that out the second case that we consider to be a true positive is detecting a cause and so this is a Django example where the

body of an HTML email was not properly escaped it was just put directly into an email the fix for this one was to add some escaping to it as you can see but the real problem was that autoescaping had actually been disabled globally for all email templates and so normally in Django templates are escaped by default the setting for this one had disabled it globally and that was the actual cause which a Semgrep rule was able to detect and so we also considered this to be a true positive we also had many false negative examples which are things that we did not detect so i wanted to give you an example of

that so this is an Express application where some HTML is being rendered using a JavaScript template string and some user data inside that JavaScript template string so we were not able to detect this fix which just encoded some of the data because at the time Semgrep was not able to introspect into template strings to understand that it was HTML and that it was being rendered as HTML so the overall results here are that we could have prevented we were able to detect 59% of the vulnerabilities the total number of repositories that we ended up with after all of the gathering that Colleen mentioned was 125 we had 140 distinct commits so several repositories with multiple cross-site

scripting fixes in them the number of detected true positives was 82 for a total of 58.57% rounded up to 59% this is a chart that sort of summarizes all of the data the magnitude of the bar is the number of commits that we had for that framework the red are the false negatives and the blue are the true positives you'll see this chart a little bit more later so i'm not going to dwell on it this is the same thing in tabular format organized by detection rate if you're interested in that as well so some takeaways from this are that we believe that secure guardrails can work from a prevention perspective so there's a 59% detection rate and anecdotally

just looking at React for instance we only had one React rule that was checking for dangerouslySetInnerHTML there was a 41% detection rate in React just looking for that this is a very similar result to what Microsoft got banning strcpy an identical percentage actually having no guardrails in your framework whatsoever leads to significantly more vulnerabilities continuously getting introduced and so the primary culprit that i want to point a finger at a little bit is Java and JSP templates where the template engine is not safe by default nor is the safe path very obvious in order to make it safe you either have to explicitly escape everything or you have

to load in a third-party plug-in basically to make it safe by default so the amount of effort to make this safe by default is very high which means it's very easy to do it incorrectly which means there's a very high detection rate in here anecdotally as Colleen mentioned earlier we run the Flask rules that we developed for this on our own code and it's prevented two known cross-site scripting vulnerabilities from going out into production which is super cool moving away from the sort of vulnerability results some language-specific insights which we found very interesting are on the slide so one of them was that XSS was really hard to find in Go

out of all of our searching we were only able to find three commits out of everything that we searched that had a cross-site scripting fix in them we have a couple of hypotheses for this one is that maybe it's really hard to do in Go Go tends to be pretty secure generally from when we've looked at it in the past or maybe just not very many web apps were written in the frameworks that we looked at we're not super sure which one it is but it was pretty interesting to find the other one is on the other side of the chart which is that cross-site scripting occurs most often in

client-side JavaScript by a huge margin as you can see by the bar 44 of the commits that we had were in client-side JavaScript that's 31 percent of our entire corpus and then of those 44 commits 17 of them were improper assignment to innerHTML so if cross-site scripting is an issue in your organization this is the place to start looking for it look in your client-side JavaScript look specifically for unsafe usage of innerHTML we did this research probably a year and a half ago during the covid times and so we have learned a lot since then that we would be remiss not to tell you all about at this time so i want to talk

through a little bit of this as well so we are strong advocates for guardrails we think that it's a really good way for AppSec engineers to scale their influence and sort of mitigate risk holistically in a lower-effort way than continuing to search and search and search for bugs however this principle is great in theory it's really hard in practice so we have tried to bring this discipline out into our own company and other companies and what we found is that even companies that are bought into this principle that are bought into secure guardrails really struggle with the implementation a couple of reasons for that are that guardrails intrinsically

produce more findings which means that there are a few more things to triage and so if you are only worried about the things that are really hair-on-fire guardrails are more like taking your vitamins and doing exercise rather than taking a painkiller and we found that many times people just want the painkiller and so guardrails tend to be lower urgency because they're not like explicitly vulnerable right away company politics is actually another big one financial and security benefits can be hard to quantify why implement guardrails when i'm already measuring my vulnerability reduction for instance and then the third one is that if we're asking devs to use something different this is especially true at

code review time it incurs a cost and so we actually have a hypothesis right now that guardrails might be better implemented in the editor sort of akin to spellcheck where you can apply it as a suggestion rather than as something that you need to address as part of your code review and so we're actually going to be experimenting with this over the next few months some of the solutions that we're trying in order to make this a little bit easier or the pill easier to swallow include adding additional context to our Semgrep rules so instead of just sort of flatly saying definitely do this definitely don't do this we're trying to add some additional

context a lot of features have been introduced in Semgrep recently such as taint analysis so we're using that to sort of weed out things that are definitely not urgent we're also experimenting with interfile taint analysis with something called Deep Semgrep as we're calling it which is currently proprietary we're also collecting feedback so we have mechanisms in Semgrep to report false positives and false negatives to us which is really neat and we have a feed of all those things we check them constantly and we address them very quickly usually when they come out if you are interested in using these in your organization all of the rules are available on our public docs

so the website is up there semgrep.dev/docs/cheat-sheets all of the frameworks there have policies if you're interested in learning more about how to roll out secure guardrails in your organization check out this talk by our colleagues Clint Gibler and Isaac Evans which was done at Global AppSec SF 2020 in conclusion we think that guardrails have a really positive detection rate and we really believe that they can help AppSec engineers scale out there are definitely some challenges with rolling them out in a real organization that we're still trying to solve that many of you may be trying to solve as well but yeah we believe in their future and we

really hope that we're able to make software more secure for everyone so that is the end i believe we have a few minutes for questions so we can take those now we've got four minutes for questions all right in the back
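The escape-by-default principle the talk keeps returning to — Rails' html_safe, Django's autoescape setting, React's dangerouslySetInnerHTML — can be sketched in a few lines of standard-library Python; the function and names below are illustrative only, not taken from the speakers' actual rule set:

```python
import html

def render_comment(comment: str, autoescape: bool = True) -> str:
    """Render a user comment into an HTML fragment.

    With autoescape on (the guardrail default), markup in user input is
    neutralized; turning it off reintroduces the whole XSS bug class.
    """
    body = html.escape(comment) if autoescape else comment
    return f"<p>{body}</p>"

payload = "<script>alert(1)</script>"
print(render_comment(payload))                    # escaped, safe
print(render_comment(payload, autoescape=False))  # raw markup, XSS
```

Flipping that single default is exactly the kind of "cause" described in the Django true positive above, and it is something a guardrail rule can flag directly rather than hunting for every unescaped sink.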

yes the question was could we give you a sense for how many Semgrep rules we had to write for each policy sort of spitballing but on average i would say about 10 there's about 10 rules per framework give or take depending on the actual language and framework mitigations yes
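To give a feel for what a roughly ten-rule policy amounts to, here is a toy stand-in in Python — real Semgrep rules match syntax trees rather than regexes, and these rule ids and patterns are invented for illustration:

```python
import re

# Toy stand-in for a per-framework policy: a small bundle of named checks.
# Real Semgrep rules are syntax-aware; these ids and regexes are invented.
RAILS_POLICY = [
    ("avoid-html-safe", re.compile(r"\.html_safe\b")),
    ("avoid-raw-helper", re.compile(r"\braw\(")),
    ("avoid-unescaped-erb", re.compile(r"<%==")),
]

def run_policy(policy, source):
    """Return (line_number, rule_id) findings for one source file."""
    findings = []
    for n, line in enumerate(source.splitlines(), start=1):
        for rule_id, pattern in policy:
            if pattern.search(line):
                findings.append((n, rule_id))
    return findings

print(run_policy(RAILS_POLICY, "<%= @comment.body.html_safe %>"))
# [(1, 'avoid-html-safe')]
```

A real policy carries several more checks than this plus messages and severities, which is roughly where the ten-per-framework figure comes from.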

the question is what about false positives basically so for this research we were specifically focusing on the detection rate so we're really only looking at the true positive and false negative rate so false positives are definitely a consideration which we alluded to in the more-findings section as we've tried to roll it out at organizations so it is an issue and it is an issue that we're actively trying to look into for the rules themselves there's sort of two ways to measure false positives one is with the intent of what the author is trying to find so if the rule finds what you're looking for then that's a true positive which is you know basically is it wrong is the

rule wrong that would be a false positive but the one that's more relevant i think is does a developer take action on the finding and if they don't take action on the finding then it's effectively a false positive because it didn't fix anything this is the more interesting one and this is a really difficult problem that we're still trying to address so

um in the middle

do i have a sense for why we have many false negatives for Express one of the reasons well for Semgrep JavaScript has traditionally been a hard problem for us because the only way to really detect whether you're in Express is to look at the typical route pattern construction and there are other ways in which that can be constructed so that's one issue is that the sort of information that we have to anchor on for Express typically is a little bit difficult to find precisely the other reason is the false negative example that you saw on the screen there were a lot of cases where HTML was

just being rendered sort of directly but Semgrep at the time wasn't able to understand that this is HTML embedded in JavaScript and so that case came up pretty frequently and ended up causing some problems the other one is that there were a handful of commits that were actually JSON APIs that were being fixed for cross-site scripting issues and so those are a little bit harder to detect because there's no HTML necessarily associated with them so there are a lot of safe ways to do JSON APIs there's a couple of unsafe ways and depending on the perspective of the developer or the security engineer they may consider it unsafe and produce a fix for it that we weren't able to detect either
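The innerHTML sink called out earlier, and the template-string blind spot described in this answer, can be illustrated with a hypothetical grep-level check in Python — far cruder than a real Semgrep rule, but enough to show what a scanner sees:

```python
import re

# Hypothetical grep-level check for the client-side sink discussed above.
# A naive textual match finds `.innerHTML =` assignments, but it cannot
# tell that the template string on line 2 is itself HTML, which is the
# kind of context a syntax-aware tool has to reason about.
UNSAFE_SINK = re.compile(r"\.innerHTML\s*=")

def flag_unsafe_sinks(js_source: str) -> list:
    """Return 1-based line numbers containing an innerHTML assignment."""
    return [n for n, line in enumerate(js_source.splitlines(), start=1)
            if UNSAFE_SINK.search(line)]

js = (
    "el.innerHTML = userInput;\n"              # unsafe sink
    "el.innerHTML = `<b>${userInput}</b>`;\n"  # unsafe, HTML in a template string
    "el.textContent = userInput;\n"            # safe alternative
)
print(flag_unsafe_sinks(js))  # [1, 2]
```

A real rule set would also cover related sinks such as outerHTML, insertAdjacentHTML, and document.write, and would prefer the safe textContent path.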

time one minute any other questions yes

the question is did we try a similar approach for other vulnerabilities we have a couple of policies that are similar to this but we didn't do the full-fledged research because it takes some time so we have it for command injection and insecure transport like using HTTP instead of HTTPS any final questions great we'll be around please come talk to us yeah thank you everybody for coming