
Well, I guess I'll get started now. Welcome to the beginning of BSides Lisbon. I'm Dan Guido, and I'm here to give the first keynote in the red track, so if that's what you're here for, you made it to the right place. I flew in all the way from New York today to talk about the smart fuzzer revolution. I want to put into context some of the research advances, as well as some of the practical advances, made in the state of the art of automated bug finding over the last two years. I work in this field, I've been incredibly excited by it, and I want to share that excitement with all of you and show you not only how you can take advantage of this kind of new technology, but also how you can contribute to it.

I work at a company called Trail of Bits — rather, I founded it a few years back. We specialize in research and development of technology like this. It really excites us to work in the most advanced parts of computer science, to merge that with computer security, and to make it practical not only for people in government and research fields, but for people in commercial industry as well. There are a few blog posts we've published where we've disclosed clients of ours, like Facebook and Amazon, where we worked on analysis tools that help them find bugs or help them secure their networks. You can read more about us on our website if you're not familiar.

So, the agenda for today. I want to talk a little bit about the history: where fuzzing came from, where automated bug finding techniques came from, what people have done in the past, and why what we're doing today is so different. I'm going to discuss the current state of the art — what has happened over the last two years to really move the ball forward and make these techniques more accessible to regular developers, as well as significantly more effective than what came before. And then finally I'm going to talk about what problems lie ahead of us, because we are not there yet: we don't have the one true bug finding system that can just replace all of us with a machine. But we can get there, and there are a few research roadblocks, funding roadblocks, and organizational and developer roadblocks that we need to overcome — and all of you can help me with that. So the summary here is: we are in an automated software testing renaissance. It has never been better to work in this field than today.
OK, so, smart fuzzing. I need to define a term before we go on, because fuzzing is filled with all kinds of really opaque terms. Fuzzing, generally, is just trying random inputs. Smart fuzzing asks: what if we analyze the program under test and, instead of trying random inputs, we look for inputs that are likely to get new code coverage or otherwise touch more internal state in the program, based on what we know about it? Smart fuzzing lets us use an entirely new domain of knowledge to hyperfocus the activities we undertake to test that program. Fuzzing deals with a huge combinatorial state explosion, and in order to deal with that we need better techniques to narrow down the number of inputs we test to only the ones that actually get us new code coverage — only the ones that let us explore new state. Smart fuzzing is how we get there.
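To make the coverage-guided idea concrete, here's a minimal sketch in C++. Everything in it is hypothetical scaffolding — run_and_measure_coverage() stands in for whatever instrumentation or tracing you actually have — but it shows the core loop: mutate known-interesting inputs, and keep a candidate only when it reaches code you haven't seen before.

```cpp
// Minimal sketch of coverage-guided ("smart") fuzzing. Hypothetical scaffolding:
// run_and_measure_coverage() stands in for real instrumentation or a tracer.
#include <cstdint>
#include <random>
#include <set>
#include <vector>

using Input = std::vector<uint8_t>;
using Coverage = std::set<uint64_t>;  // set of basic-block / edge IDs hit

// Hypothetical: run the target on `in`, return the blocks it executed.
Coverage run_and_measure_coverage(const Input& in);

Input mutate(Input in, std::mt19937& rng) {
  if (in.empty()) in.push_back(0);
  // Flip a random byte — the dumbest possible mutation.
  in[rng() % in.size()] = static_cast<uint8_t>(rng());
  return in;
}

void fuzz(std::vector<Input> corpus, size_t iterations) {
  std::mt19937 rng(0);
  Coverage global;  // every block we've ever seen
  for (size_t i = 0; i < iterations; ++i) {
    // Pick a known-interesting input and mutate it.
    const Input& seed = corpus[rng() % corpus.size()];
    Input candidate = mutate(seed, rng);
    Coverage cov = run_and_measure_coverage(candidate);
    // Keep the candidate only if it reached code we haven't seen before.
    bool new_coverage = false;
    for (uint64_t block : cov)
      if (global.insert(block).second) new_coverage = true;
    if (new_coverage) corpus.push_back(candidate);  // feed it back in
  }
}
```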
OK, so: history. In the beginning — this is 1988 — fuzzing was sort of accidentally coined in a class assignment for an advanced operating systems course at the University of Wisconsin. A professor asked his students to write a program that threw random input at other Unix command line utilities. Essentially what we're talking about here is just catting /dev/random into the program. We're not concerned with state, we're not concerned with structure, we're not concerned with whether the program even expects to get anything like what we're giving it. We're just flinging crap at it and hoping it breaks. Surprisingly, that's a really effective strategy.

That worked for about ten years, and then people decided maybe we should improve upon it a little bit. In the early 2000s we started to see generation-based fuzzers, where we generate some kind of data structure that matches what the program expects. You see this occurring in parallel among a number of different research groups: the University of Oulu in Finland created PROTOS, and Dave Aitel self-published his seminal paper on block-based fuzzing — which later made it into that great chapter in The Shellcoder's Handbook — and built SPIKE. These basically let you describe a protocol or a file format using a BNF, annotate what types of data are expected in all the different fields, and then the fuzzer makes smart choices about what type of random data to put into those fields. That was pretty good.

At the same time, we also looked at mutation-based fuzzing. This is really resource intensive — I mean, I guess generation is resource intensive as well: you have to sit there, create a spec in a BNF, and test it to make sure it matches what the spec actually says. But with mutation we're assembling a giant corpus of legitimate data: we're downloading all the PDFs on the internet, putting them into a giant directory, then mutating and flipping the bytes in them and running them through all the different PDF readers we've got. That's mutation. It's going to find some bugs, but the only bugs it will find are the ones exercised by the corpus you collected, and it's difficult to really guide a mutation-based fuzzer down certain paths in the program. You have to work with what you've got.
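Here's a rough sketch of that generation-versus-mutation difference. The length-prefixed record format is made up for illustration — real tools like SPIKE or Radamsa are far richer — but the contrast holds: one builds inputs from a description of the format and picks hostile field values, the other just perturbs samples it already has.

```cpp
// Generation vs. mutation, in miniature. The "format" here is a made-up
// length-prefixed record; real tools (SPIKE, Radamsa, ...) are far richer.
#include <cstdint>
#include <random>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Generational: build a structurally valid input, then pick hostile values
// for individual fields (huge lengths, zero lengths, etc.).
Bytes generate_record(std::mt19937& rng) {
  static const uint32_t kEvilLengths[] = {0, 1, 0x7fffffff, 0xffffffff};
  uint32_t claimed_len = kEvilLengths[rng() % 4];      // lie about the length
  Bytes out = {'R', 'E', 'C'};                         // keep the magic valid
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(claimed_len >> (8 * i)));
  for (int i = 0; i < 16; ++i) out.push_back(static_cast<uint8_t>(rng()));
  return out;
}

// Mutational: take a well-formed sample from the corpus and flip bits in it.
Bytes mutate_sample(Bytes sample, std::mt19937& rng) {
  for (int i = 0; i < 4 && !sample.empty(); ++i)
    sample[rng() % sample.size()] ^= static_cast<uint8_t>(1u << (rng() % 8));
  return sample;
}
```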
So what's common to all of these is that we're simply applying random mutations to well-formed inputs — except in the purely random case, I guess — and observing the results. We're just spending more time constructing what we're going to send to those programs, and then hopefully the program under test crashes more often.

Now, along the way there have been plenty of iterative advances; we've tried to improve upon this basic strategy in every way we can. We've measured code coverage and created minimum input sets, so instead of randomly running every PDF on the internet, we can use a tool that identifies which of them get you different kinds of code coverage and eliminates the ones that are redundant from the test set. We can re-fuzz crashing inputs — this is what a friend of mine calls a rocket-propelled chainsaw, and I like that term too. Once you find a valid crashing input — or rather, an invalid crashing input — you can explore the state around it, explore the code around it, by continuing to fuzz that crashing input; generally, bugs cluster together. We can try to categorize crashes with heuristics such as Microsoft's !exploitable, released many years ago, which some people love and some people hate, but which regardless did advance the field a bit. And then back in 2005 and 2006 the precursors to KLEE came out — EXE and DART. They were some of the first attempts at combining concrete execution with symbolic execution and managing the state explosion through path pruning, with STP. And finally we've got better SMT solvers than we had years ago, like Z3. These were all iterative advances, where we made some more progress on the general strategy of just taking well-formed inputs, mutating them, and blindly observing the results.
So in 2008 there's a clear delineation where dumb fuzzing ends and smart fuzzing begins, and it's marked by something called SAGE, which Microsoft made. (I actually didn't realize we'd be in a Microsoft building when I was putting this talk together.) SAGE stands for Scalable Automated Guided Execution, and it was published in a paper named "Automated Whitebox Fuzz Testing." It's really one of the seminal works in this field, and it's a fairly accessible paper if you want to go read it; I really appreciate how clearly they stated what they did. The key advance is that they combined fuzzing with symbolic execution to dynamically generate new tests, while managing the state explosion problem.

There are three steps. They run the program and trace it, then symbolically execute that trace — turn it into symbolic values — to figure out what all the constraints are: this integer value needs to be above this in order to pass through this gate and reach this next basic block. Then they use a constraint solver to figure out what new inputs would satisfy those constraints, so they can generate new tests. And then they rank all the new inputs they've generated with the constraint solver to figure out which of them are actually going to be most effective at gaining the most new code coverage.
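As a toy version of step two, here's roughly what "ask a constraint solver for a new input" looks like with Z3's C++ API. The constraints are invented for illustration: imagine the traced run satisfied a simple checksum but took the "small length" side of a branch, and we want an input that keeps the checksum valid while taking the other side.

```cpp
// Sketch of the SAGE-style step: keep the path constraints we want to preserve,
// negate the branch we want to flip, and ask an SMT solver for a satisfying
// input. Hypothetical constraints; build with something like: g++ step.cc -lz3
#include <iostream>
#include "z3++.h"

int main() {
  z3::context ctx;
  z3::solver solver(ctx);

  // Symbolic stand-ins for two fields the program read from our input.
  z3::expr len = ctx.bv_const("len_field", 32);
  z3::expr checksum = ctx.bv_const("checksum_field", 32);

  // Constraint recovered from the trace that we want to keep satisfying:
  // the run we observed passed the program's checksum test.
  solver.add(checksum == (len ^ ctx.bv_val(0x5a5a5a5a, 32)));

  // The branch we want to flip: our concrete run had len <= 1024, so we
  // assert the negation and ask for a model.
  solver.add(z3::ugt(len, ctx.bv_val(1024, 32)));

  if (solver.check() == z3::sat) {
    z3::model m = solver.get_model();
    std::cout << "len = " << m.eval(len)
              << ", checksum = " << m.eval(checksum) << "\n";
  }
  return 0;
}
```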
The combination of those two techniques means they can drill into a program significantly deeper, and with more precision, than a random fuzzer would be able to. SAGE was widely used at Microsoft during the development cycle, starting I think with Windows 7 — a hugely successful tool — and some of the reasons it was successful are not what you would expect as a security researcher. There are clear research advances here: it can do very advanced things compared to the fuzzers that came before it — it can overcome simple checksums, for instance, which would have stopped any earlier fuzzer cold. But really, the key benefits of SAGE were that it was easy to use and it was available. Microsoft made it work on binaries, so you didn't have to have a compilable set of source code in order to run it: you just gave it the output of whatever your dev cycle was. It worked on very large programs — you could run it on Microsoft Office and it would actually complete in a reasonable amount of time and find real bugs. It doesn't require the developer to sit there and come up with a whole bunch of sample inputs or sample document files; it can start from nothing and find something. And finally, when it did work, it found real bugs, it gave you the inputs that caused those bugs, and it provided some analysis around them, so the developer could be sure that what they were fixing wasn't just a ghost — a null pointer dereference or some kind of low-value code quality issue — it was a real security bug. That's why SAGE was really revolutionary: the combination of all of these things.
Now, at the same time, there was more development going on in the open. SAGE is not available outside Microsoft; AFL is open source. In comparison, AFL is not quite as smart — it doesn't use symbolic execution the way SAGE does — but it has the benefit of being incredibly simple to install and use for a bug finding tool. The key contribution of AFL is that it uses compiler instrumentation to make programs easier to fuzz; that is really the thing AFL got right. It's also just a well-engineered piece of software: there's a lot of common knowledge out there about how to construct a fuzzer, and Michał Zalewski packaged all of that up into a really nice, easy-to-use tool. It uses code coverage to guide fuzzing, it's somewhat fast but easily parallelizable, and — most important — developers actually use it. Unused tools don't find bugs; it's kind of obvious. The proof is the really nice little trophy case, I think they call it. This is as much as I could reasonably screenshot, but it goes on for many pages below this; I think there's even one in here that my intern found — my 16-year-old high school intern. AFL is really easy to use, I can't stress that enough, and it has attacked hard targets. It's kind of a testament to how early we are in this fuzzer evolution that even a dumb fuzzer can find bugs in things like SQLite and JavaScriptCore and OpenSSL and OpenSSH. It's really impressive.

So, to recap: SAGE is smart — it intelligently explores program paths — but you can't use it. AFL is dumb, you can use it, and it's still incredibly effective, but it gets stuck on the same kinds of things all fuzzers do, like simple checksums: it just bangs its head against the wall. So from a research perspective you've got a couple of questions. Which one is better? Can we create something that combines the best of both? Why do we have to be satisfied with simply a dumb fuzzer? And how do we extract all of this incredible knowledge out of Microsoft Research and Microsoft's development cycle into the real world — not the real world, the rest of the world?

A note on terminology: at this point in the presentation I start to flip a little bit from "smart fuzzing" to "automated bug finding systems." I just consider automated bug finding to be a superset of smart fuzzing, so if that trips you up, that's what it means.
So now we're in about 2012, 2013, and DARPA has decided this is a problem they'd like to attack — a problem they would like to fix. They're not satisfied with the status quo; they believe the publicly available tools for automatically finding bugs can be better and can integrate more of the currently available research that's out there. So they created a program called the Cyber Grand Challenge.

The Cyber Grand Challenge is modeled after the capture-the-flag contests we're all familiar with, except instead of humans playing, there are only machines. The machines have to do everything a human would do in a capture-the-flag challenge: they receive pwnables, basically, and find vulnerabilities in them. Those vulnerabilities are demonstrated as a proof of vulnerability — a crashing input of some kind, or an input that gains some register control, to demonstrate there's a path towards exploitation. The system needs to fix that vulnerability, and fix it in a performant way: it can't slow the program down a hundred times, and it needs to keep its services up so that other competitors can still interrogate that service and collect flags from it if they have a valid exploit. The competition was measured by a fairly complex scoring algorithm that I'll get into a little later, but the CRSes — the cyber reasoning systems — were given points the same way a CTF competitor would be, and the CRS with the most points wins the contest. That's the Cyber Grand Challenge in a nutshell.

The really strange thing for me — and Trail of Bits was one of the competitors — was observing the converging strategies of all the teams that played, in the public presentations every team has given since the Cyber Grand Challenge ended at DEF CON this summer. You saw them all combine the same basic elements: fuzzing and symbolic execution, hooked together in some way so they could communicate. Some of them sprinkled in other program analyses that were well known. Nothing really revolutionary. A lot of them tried to prioritize which inputs to test, and everybody was concerned with resource control and allocation, because the Cyber Grand Challenge scored you on it. But it was the same general strategy for every single one, and you could see that in the outcome — in the scores at the final event, on that live YouTube stream they had. Everybody tracked very close to each other; the difference in points between first and seventh was really not substantial.

Now, that's not to say it was a failure. What we did find is that we ran up against several really difficult research problems that we don't have good solutions to, and these are all topics you could spend ages working on — things like how you efficiently mutate inputs, not just in terms of selecting them but in terms of which mutations to apply. Every single one of these could fill a PhD thesis, and I hope some of these topics get explored now that the contest has ended. We now roughly know the limits of the applicable technology and how far it gets us, and we need to figure out solutions to these problems to advance the state of the art again.
I mentioned that we competed in this challenge, so here is one of the solutions we came up with. We had a really difficult problem: we're a very small team, and we're a commercial company — I have to pay salaries, I can't recruit a team of fifty grad students who work for $500 a month. We ended up with about three, sometimes four, people working together on our team, with no prior development we could take advantage of. So we really leaned on open source technology, and we leaned on code bases like KLEE and a Python symbolic execution framework that we found on GitHub (we later hired its author). We used tools like Radamsa for input generation and wrapped them in our own dynamic binary tracing engine that we wrote.

But these tools were not built to talk to each other. Most tools that security researchers build work in isolation: they silo their data, they produce some kind of output on the terminal, and they don't accept the same kinds of input. It's really difficult to strap all these tools together into an integrated system. We found very quickly that the universal format all of these programs accept is input itself — the inputs to the program under test. So we modified each of those tools — KLEE, the Python symbolic execution framework, the various other tools we used to test for bugs — and we collected all the inputs they generated, fed them back into a knowledge base we called minset, and in that knowledge base ranked them to find the minimum set of inputs with maximal code coverage. We used those to select what to test further. Once minset had evaluated which inputs were the best, it shared them with all the rest of the tools. That way, if a fuzzer got stuck on a checksum, it would give the best available input it had back to minset, minset might hand it off to a symbolic execution framework, the symbolic execution framework would overcome the checksum and create a new valid input that gains more code coverage, send it back to minset, and minset would send it to a fuzzer again — and now, magically, the fuzzer was able to get past the code path it had been stuck on before. This was key to our strategy, and it worked very well, but it's also very similar to the techniques the other teams came up with.
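A sketch of that input-recycling loop is below. The names (Tool, pick_minset, run_and_measure_coverage) are mine, for illustration, not our actual CRS: every tool's outputs land in one shared pool, a greedy pass keeps the smallest set of inputs that preserves the coverage we've seen, and that set becomes every tool's new seeds.

```cpp
// Sketch of the "minset" idea: pool inputs from every tool, greedily keep the
// smallest set that preserves the coverage seen so far, and reseed every tool
// with it. Names here (Tool, run_and_measure_coverage, ...) are illustrative.
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

using Input = std::vector<uint8_t>;
using Coverage = std::set<uint64_t>;

struct Tool {                               // a fuzzer, a symbolic executor, ...
  virtual std::vector<Input> step(const std::vector<Input>& seeds) = 0;
  virtual ~Tool() = default;
};

Coverage run_and_measure_coverage(const Input& in);  // hypothetical tracer

// Greedy set cover: repeatedly take the input that adds the most new coverage.
std::vector<Input> pick_minset(const std::vector<Input>& pool) {
  std::vector<std::pair<Input, Coverage>> scored;
  for (const Input& in : pool) scored.push_back({in, run_and_measure_coverage(in)});

  std::vector<Input> minset;
  Coverage covered;
  for (;;) {
    size_t best = scored.size(), best_gain = 0;
    for (size_t i = 0; i < scored.size(); ++i) {
      size_t gain = 0;
      for (uint64_t b : scored[i].second) gain += !covered.count(b);
      if (gain > best_gain) { best_gain = gain; best = i; }
    }
    if (best == scored.size()) break;                   // nothing adds coverage
    covered.insert(scored[best].second.begin(), scored[best].second.end());
    minset.push_back(scored[best].first);
    scored.erase(scored.begin() + best);
  }
  return minset;
}

void crs_loop(std::vector<Tool*> tools, std::vector<Input> pool) {
  for (;;) {
    std::vector<Input> seeds = pick_minset(pool);       // best inputs so far
    for (Tool* t : tools) {
      // A fuzzer stuck on a checksum gets seeds a symbolic executor produced,
      // and vice versa — each tool's progress unblocks the others.
      std::vector<Input> produced = t->step(seeds);
      pool.insert(pool.end(), produced.begin(), produced.end());
    }
  }
}
```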
So, the benefits. Beyond constructing several practical tools that are now open source, and identifying these research problems we now have to overcome to advance the field further, the Cyber Grand Challenge was so effective because it got the right people in the room talking to each other. It got academics thinking about fuzzing again, where they had thought it was a dead field — you haven't seen a huge amount of academic interest in fuzzing in many, many years; they had all moved on to symbolic execution and other programming-languages-related topics, whereas fuzzing kind of halted after the first initial attempts at publishing. Security researchers never thought academics could be practical; they didn't understand the value certain new papers presented until they were pressured, inside a contest, to apply them and make them work against other teams. And none of these people actually knew how to create distributed systems that would work at scale to find real bugs. That was probably the biggest challenge for us going into this competition: Trail of Bits has a mix of academics and security researchers, but we did not have distributed systems expertise, and we definitely fell on our faces pretty hard just trying to figure out EC2 and Docker and React and all the rest of the things that regular developers understand very well.

Most importantly, the Cyber Grand Challenge advanced the state of the art, and bug finding problems we thought were out of reach are now solved. I'll give you one great example: crackaddr(). crackaddr() is the name attached to a very subtle vulnerability discovered in 2003 by Mark Dowd. The total sum of vulnerable code we're talking about is about 50 lines, but there's a state machine in it with so much hidden complexity that it is essentially impossible to find with purely static analysis; the path explosion problem is particularly acute with this piece of code. There's more state in these 50 lines of code than there are atoms in the universe, which is insane. And the reason it's so important to find bugs like this is the impact: back in 2003, this flaw let you take over essentially every mail server in the world. We can't just let that be. This is a cliff we need to scale in order to find the bugs that really matter, and it's kind of been Everest for a lot of these automated tools.

Here is a shamelessly stolen and fairly simple illustration of what the flaw looks like. It's a state machine with a very simple job: it validates email addresses. It looks at brackets and parentheses and increments or decrements a bound on a copy buffer based on what it finds. The essential vulnerability is that the bound on the copy buffer is not decremented when you receive an opening parenthesis.
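For reference, here's the widely circulated simplified model of the bug — usually attributed to Halvar Flake as a teaching version, not Sendmail's actual crackaddr() code. In this model the quantity that should shrink when a '(' arrives is the copy limit, upperlimit; because that decrement is missing while ')' still increments it, enough "()" pairs inflate the limit past the real size of the buffer and the copy runs off the end.

```cpp
// Widely circulated simplified model of the Sendmail crackaddr() bug
// (a teaching version, not Sendmail's actual code).
#define BUFFERSIZE 200

int copy_it(const char *input, unsigned int length) {
  char c, localbuf[BUFFERSIZE];
  unsigned int upperlimit = BUFFERSIZE - 10;
  bool quotation = false, roundquote = false;
  unsigned int inputIndex = 0, outputIndex = 0;

  while (inputIndex < length) {
    c = input[inputIndex];
    if (c == '<' && !quotation) { quotation = true;  upperlimit--; }
    if (c == '>' &&  quotation) { quotation = false; upperlimit++; }
    if (c == '(' && !quotation && !roundquote) {
      roundquote = true;
      // upperlimit--;   <-- the decrement that should be here is missing
    }
    if (c == ')' && !quotation && roundquote) { roundquote = false; upperlimit++; }

    // "If there is room left, copy the character" — but upperlimit can have
    // been inflated past BUFFERSIZE by repeated "()" pairs, so this write
    // can run off the end of localbuf.
    if (outputIndex < upperlimit) {
      localbuf[outputIndex] = c;
      outputIndex++;
    }
    inputIndex++;
  }
  return 0;
}
```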
You can go back and read papers upon papers upon papers about crackaddr(); it's not really important to discuss everything about it right now. The point is that, out of the massive amount of state encoded in this program, there's essentially one input that triggers the vulnerability — one input that can exploit this program — and it is incredibly difficult to find. However, we were able to find this vulnerability, and there are several systems that can now find it post-Cyber Grand Challenge, because it was put up as a test during the final event. Our CRS finds it through a variant of the analysis boosting I talked about on the last slide. One thing I neglected to mention is that, when we get all those inputs, instead of simply minimizing them we also mix them: if we have one valid input and another valid input, we might cut them both in half and merge them back together again, and that mixing can eventually produce a crashing input for crackaddr(). We can reliably reproduce this behavior — not just in a lab; we actually brought a live demo of it to the last conference I was at.
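That mixing step is about as simple as it sounds — a sketch (illustrative only; the real pipeline also minimized, ranked, and re-tested the results):

```cpp
// Sketch of the input "mixing" step: take two inputs that are interesting for
// different reasons and splice their halves together.
#include <cstdint>
#include <vector>

using Input = std::vector<uint8_t>;

Input splice(const Input& a, const Input& b) {
  Input out(a.begin(), a.begin() + a.size() / 2);           // first half of a
  out.insert(out.end(), b.begin() + b.size() / 2, b.end()); // second half of b
  return out;
}
```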
So this is something nobody really thought would be solved by this challenge, but it was. The upsides: all the automated systems found real bugs — total success — and some of the bugs were thought to be impossible, like crackaddr(). The Cyber Grand Challenge also released the challenges and the scoring system used for all the different automated systems, and put them on the internet, so other people constructing analysis tools can measure their own tools against them. And some of the bugs found in those challenge sets were actually unknown to the authors themselves. Even when you're constructing a program that's maybe only 200 lines long, or a thousand lines long, or 20,000 lines long, for the purpose of inserting vulnerabilities — I'm sure many CTF developers understand that sometimes you unintentionally produce alternate ways to solve the challenge. That definitely occurred, and the automated systems were smart enough to find even those bugs.

A brief aside: these challenge sets are one of the most important outcomes of the Cyber Grand Challenge. They are 247 C and C++ network services that implement all variety of software. They are not simple tests of vulnerabilities — not ten lines of code that implement a buffer overflow — they are complex programs that implement custom network protocols and image formats and all kinds of other systems. They don't implement RFCs, so there's no way to prepare ahead of time for them. They all have one or more exploitable crashing vulnerabilities, all documented: they come with CWEs, so you know exactly what vulnerability you found. They come with tests — not just valid inputs but also crashing inputs. They come with patches, so if you're testing patch strategies you can measure against a reference set. And they come with performance measurements, so if you patched a program you know how much of an impact you had on its performance.
And they're all available on the internet, and we took them and ported them to Linux and Windows and Mac.

[In response to an audience question:] Yes — CWE, the Common Weakness Enumeration. The CWEs are kind of a taxonomy of potential flaws, so you can distinguish a heap buffer overflow versus an integer overflow versus a stack buffer overflow versus type confusion, and all these different kinds of things. Every single program in this data set is annotated with the CWEs embedded inside it. That way, if you have a tool that's incredibly good at finding type confusion bugs, you can test it against the entire data set and measure whether you found all the type confusion bugs. And as an analysis tool user, I can determine: oh, this tool only finds type confusion, so I need to augment my testing with this other tool that also finds buffer overflows, and with the two of them combined I get higher coverage. That way I know what I'm finding and what I'm not. This is really the first data set of its kind, and it's incredibly useful. We have a blog post about it, aptly named "Your tool works better than mine? Prove it." — that's what the challenge sets are really useful for. I highly encourage people to take a look at these and try them out on their own. They're actively developed by us at this point; we're still making them work on Windows, not quite there yet.

Now the downsides — the CGC wasn't all roses. Most teams built systems that still require expert operation. My CRS is in a GitHub repo, and my team can run it, but if I open sourced it, not many other people could. It still requires a lot of expert knowledge to hook these things up to real software.
The usability problem was not scored in the Cyber Grand Challenge, and therefore there was no real advancement made on that front. That's fine — it's not really DARPA's concern; they care about advancing the research — it's on us, and on everyone in this room, to figure it out post-challenge. The Cyber Grand Challenge was also conducted inside a custom operating system called DECREE, which was made to be easier to analyze and easier to reason about, so that the competitor systems could actually find bugs. They were worried it would be too difficult to simply throw something like Firefox at these systems, so they lowered the bar a bit to ensure the teams could score points. They did that with DECREE, and as you can see, it's only got seven system calls — no files, threads, or signals. So now it's difficult to take tools that were developed for DECREE and port them over to Linux, where you have hundreds of system calls and you do have files, threads, and signals. That's an ongoing effort for me as well: I have tools that are still custom-built for DECREE that I'm trying to rip out. And then there were a lot of distractions. If we'd had two straight years to focus only on bug hunting, I might have gotten farther on that one specific problem, but DARPA cared about more than that: they cared about automated patching, they cared about exploit generation, they cared about how these things fit into network appliances, they cared about deployment strategies, they cared about IDS. And that's okay — it's okay for them, and for me — but it was somewhat frustrating as a competitor in this challenge.

So, to recap: the CGC gave a much-needed boost to the field at exactly the right time, just as this was really heating up, as people were starting to think about SAGE and what kind of impact technologies like that could have outside of Microsoft. It is now obvious that automated bug finding is possible and very effective. We have prototypes available — you can go on the internet right now and download entire code bases that were funded by this challenge. And the challenge sets created for the Cyber Grand Challenge enable me to measure my tools against yours, so we can have an honest, objective conversation about the benefits of each.

OK, now we get into the predictive part of the keynote: where do we go from here? What's the next step?
These kinds of tools are going mainstream. I think about automated bug finding and smart fuzzing as being where machine learning was ten years ago, and I hope that ten years from now we're in the same exciting place they are today. Machine learning ten years ago didn't have a solid metric for comparing techniques against each other; then systems like SV-COMP came out and enabled ongoing competitions between different strategies, to help researchers make progress. Once people had settled on what worked and what didn't, it became possible to start engineering solutions around a static target instead of a constantly moving one.

You've seen now that Microsoft finally took SAGE and allocated a team around it in order to bring it to the public — they call it Project Springfield, and you can sign up for their waiting list to gain access. It only works on Windows software at the moment, and only on binaries, but it's still fantastic. Google has another program called OSS-Fuzz, where they'll run libFuzzer in the cloud against your open source software. So these really are going mainstream: it's possible to take advantage of highly automated, hugely scalable systems that find bugs.

My prediction is that Apple may start soon. In the CGC competition, we used LLVM IR as our internal state. We did that because the intermediate representation of the LLVM compiler is made to be easy to reason about — it's meant to be used by a compiler to implement various analyses for optimizing code — so it's also a really great target for analyzing that code for bugs. Apple has most of the LLVM IR — the bitcode — for the iOS apps in its App Store, and it would be trivially possible for them to analyze that code to find really basic security flaws like API misuse and privacy issues. There's no reason they couldn't start doing this tomorrow: they don't have to roll out software, they keep it all in the cloud, because you've given that bitcode to them. So they have a really good business case for this, and they have all the LLVM developers on staff — they employ Chris Lattner, the original author of LLVM. So I really think we'll finally see most of the major technology companies using this in practice internally, and then potentially exporting some of that technology to the public.

Like I said, they could automatically analyze apps for bugs, backdoors, and API misuse. There's no reason you need humans, or some kind of janky Perl script, to assess all the applications coming in that need to get approved. You could reduce the time for apps to get approved and go into the App Store from about a week to a couple of hours. And once Apple does it, everyone else probably has to do it too: Google won't be able to rely on just Bouncer, where they dynamically execute the program and observe the environment — they'll likely integrate some kind of static analysis as well, plus bug hunting technologies that were funded by the CGC and created in this giant explosion of interest in the field.
There are some roadblocks here. I mentioned that these tools are really difficult to use, and unused tools don't find bugs. Analysis tool developers must show software developers that these tools can make their lives easier. We really need to give more thought to the user experience behind these tools — they are not easier to use than git, and git is hard enough. We at Trail of Bits have actually tried to offer services around automated bug finding to our clients, and there's a lack of trust in this technology, even among the most savvy clients out there. We know that it works, we know the benefits it offers, but the field that came before us is so rife with examples of tools that just produced millions of false positives and wasted so much developer time that many companies — even security-savvy ones — are unwilling to think about the current generation of tools. AFL shows us the path forward here: starting with a simple, dumb tool and slowly improving it over time is the strategy that works, and anything that requires expert operation is going to die in its tracks.

There are funding roadblocks as well. If you're working in this field, there are really only two places you can get money from. You can get money from DARPA, and potentially other military agencies that have an interest in this — most people consider cybersecurity and vulnerable software to be an issue of national security. The US military purchases a vast amount of code that's embedded into all kinds of hardware they use every day; there have been programs at DARPA that studied ECUs — the computers inside cars and helicopters — and the potential for those computers to be disrupted by somebody they're at war with. That's a national security concern, and they care immensely about finding and rooting out all of those bugs with automated means, so they'll give you money for it. Large R&D shops might give you money for it — Google might, Microsoft might — but they'll probably keep it all internal. So what are you to do if you're a company like mine, or a company with similar ambitions to advance the state of the art in smart fuzzing and automated bug finding? How do you get paid for that, if customers don't trust the technology and the only two places you can get money from are those two? We need to shift this. There needs to be more investment from private firms, and from nonprofits especially, with the understanding that security affects everyone. Somebody has to pay.
Somebody needs to help us move the ball forward, so making that business case, and finding a way to communicate it, is really key to advancing the field.

And then there are research roadblocks, and these are probably the simplest of the three I've mentioned. We need a realistic benchmark to show that the tools work and how they compare; the DARPA challenge sets are already almost there, and my company has taken that on as kind of a flag we'd like to plant, something we'd like to assist with. Bugs are getting harder, so we need to continue to improve our analysis techniques. None of the tools we've described today can really find bugs in JITing JavaScript engines, which we'd consider one of the riskiest pieces of software for consumers today, so we need to continue advancing those techniques. But we can't lose sight of what these techniques are capable of right now: even if we can't find bugs in a JITing JavaScript engine, we can probably find all of the bugs in an image parser. And is it worth it to root out every single bug in libpng? Probably yes. I think that's a great result.

And then a call to arms: when we build tools, it is no longer acceptable to build tools that simply silo themselves and work in isolation. It is hugely important that when we construct new analysis tools, they're capable of communicating with each other. It was a huge problem for us to strap all these things together into an integrated system, and that was the only way we could make progress on any of these solutions. So if you're writing tools today, ensure they have some kind of documented output format and some kind of documented input format that let other people use that tool inside of a larger system.
So, some predictions. In two years, I think we're likely to have what DARPA calls the centaur — you know, the combined human and horse — that is, computer-assisted program analysis. Bug hunting is going to start to change over the next two years. Right now you might open up your copy of Understand, or your Eclipse IDE, or Visual Studio, whatever it might be — or vim and ctags, or, for the insane among you, Emacs — and you might be finding bugs that way: reading code from top to bottom, tracing input, using your IDE to pivot through the source code. It's still really a manual process based on your own intuition. Over the next two years I think we're going to start to see tools that augment that manual process with automated techniques, and a lot of those automated techniques will be adapted from the ones I discussed in the rest of this presentation. I definitely think we're going to see app stores take advantage of automated bug finding techniques to help developers submit secure applications — it's just too easy not to do, the business case is there, and the talent is already in the right place for it; it has to happen. And tools like Project Springfield and OSS-Fuzz will stick around, and people will use them in larger and larger numbers. It will be slow, but over the next two years you'll start to see the vast majority of really popular open source programs take advantage of continuous, ongoing automated bug finding in the cloud.

Five years down the road, I think we'll start to see cloud IDEs where, as people write code, the IDE automatically helps them create tests and automatically inserts their program into a giant automated bug finding system that uses smart fuzzing on a continuous basis. It won't even be something you think about.
And hopefully, developers at that point will have some understanding that fuzzing is as useful to them as unit testing. And finally, research tools need to standardize on conventions and benchmarks, so that we start to see winning strategies emerge and consistent, focused competition around the development of new techniques.

So, concluding: you can take advantage of all of this today. If you're someone in the audience who's writing code, or someone who's trying to find bugs — which I hope is everybody — you can take advantage of this. libFuzzer, I think, is really one of the unsung heroes here. AFL works great, it's very easy to use, and it's got that really nice ASCII art status window, but libFuzzer is a workhorse that doesn't get enough credit. For most developers, libFuzzer is actually the right decision; there are a couple of guides for how to implement it in your own code base, and I'd strongly recommend checking it out.
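If you want to try it, the whole harness is a single entry point. The ParseMessage function below is a stand-in for whatever code you actually want to test, and the build line is the usual clang recipe:

```cpp
// fuzz_me.cc — a minimal libFuzzer harness. ParseMessage() is a placeholder
// for the code under test. Build (with clang) roughly like:
//   clang++ -g -O1 -fsanitize=fuzzer,address fuzz_me.cc parser.cc -o fuzz_me
// then run ./fuzz_me corpus_dir/ and libFuzzer does the coverage-guided work.
#include <cstddef>
#include <cstdint>

// Your code under test (placeholder declaration).
bool ParseMessage(const uint8_t* data, size_t size);

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  ParseMessage(data, size);  // feed libFuzzer-generated bytes to the parser
  return 0;                  // non-crashing inputs just return 0
}
```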
Trail of Bits, my company, has released many of the tools we used during the Cyber Grand Challenge, including cb-multios — that's "challenge binaries, multiple operating systems." GRR is our fuzzer, which wraps Radamsa in a dynamic binary tracing engine. McSema is what allowed us to operate on binaries: we could take binaries, lift them back to LLVM IR, and then use all of the LLVM-based tools on that code. And DARPA themselves have released a vast amount of the code they used to run the challenge, including the DECREE operating system, which might be fun to use.

So, really, to sum up: smart fuzzing is one of the most exciting and rapidly advancing fields in security, and I hope some of the things I talked about convince all of you to take a second look at it. I'd just like to thank a couple of people from my team before I stop and take questions: Artem Dinaburg, for really helping me put together all these slides; Peter Goodman, for writing GRR and for working on our Cyber Grand Challenge team; and Ryan Stortz, for really leading the way on continuing to develop and maintain the challenge binaries. So that's it — thank you all for having me.
I see I have a few minutes for questions, if anybody would like to ask any.

[In response to audience questions:] There are 25 engineers on my team. We're globally distributed — we're based in New York, 12 of us are in New York, and the rest are all over the world. Most are in the US, but I do have people in Canada, France, and Argentina.

Do I do internships? I absolutely do, and I do them year round. The way we recruit for internships is that we actually come up with the problems first, and then we recruit people who can help us solve them. We have several things we'd like to make advances on. For instance, some of those subtle problems I said I wanted to attack are problems we're considering. We're looking at advancements to AFL — proof-of-concepting various analysis advancements inside AFL. We want to continue developing the challenge binaries and make them more useful for measuring more tools against each other, and we actually want to run tools on them. For instance — not at this very moment, but about three hours from now — a colleague of mine is in New York presenting on using the challenge binaries to measure various configurations of AFL against each other. AFL has some different configuration switches, and there are different forks of it with different versions — AFLFast, an AFL for Windows, mainstream AFL — and then there's configuration among them. Well, if you're trying to find bugs, what's the right option?
So we took the challenge binaries and ran AFL across them all with various configurations, to see when they found bugs, how much processing power we needed to find them, and whether there were differences in the bugs that were found — like whether one configuration found all the type confusion bugs and another found CWEs of a different type. I'd like interns to work on that as well. It's a pretty wide set, but we do internships in the winter, the spring, the fall, and the summer — year round — and people work remote. We have a Slack, and everybody just mobs on problems. It's fun. Sorry for the pitch — I didn't mean to do that, but you put me on the spot. There's a blog post about that, actually, if you scroll back about three posts on our blog.

Anybody else? OK — well, I'll be around all day, and you can follow me on Twitter and harass me there if you feel like it. If you want to keep up to date on this stuff, my company cares about it deeply and we try to spread the knowledge we have through blog posts and tweets, so if you really want to keep up on it, just keep up with what we're doing and I'm sure you'll follow along. Thanks, everyone.

[Applause]