
Dynamic Web Scanning at Massive Scale

BSidesSF · 2012 · 50:50 · 24 views · Published 2017-11
About this talk
Erik Peterson details the enterprise challenges of assessing thousands of web applications rapidly, drawing on a Fortune 100 case study where his team automated dynamic web scanning at massive scale using cloud infrastructure. The talk covers architectural decisions, cost management, security considerations when scanning at scale, and results from a project that assessed nearly 3,000 sites in under two weeks, starting from 30,000 initial hosts.
Show original YouTube description
Building your own Zombie Horde - Dynamic Web Scanning at Massive Scale (Erik Peterson). In the 12 years since automated dynamic application scanning tools have been available, DAST has gone from something a few in the know were doing to something everyone is doing, but are we really all scanning our web applications? The number of hacks would suggest either the tools are broken or we really are not scanning enough. To understand what was really going on, I met with dozens of Fortune 100 security teams and learned that on average only the top 1% of web applications at a Fortune 100 company are being aggressively tested, both manually and using automated tools, while the rest often go without any security testing at all. Reasons given were that it was just too cumbersome a task, that scanning that number of sites would be impossible, and that at the current pace it would take years to assess everything. Clearly a better solution is needed. In my talk I'll discuss the modern enterprise challenges that stand in the way of assessing thousands of web applications rapidly in parallel, the trade-offs that have to be made as well as those that don't, and why you have no excuse not to be scanning everything. I'll detail the cloud computing platforms I researched and chose, and the key things to consider when attempting to do anything at scale. Finally, I will review the results of a project that started with over 30,000 hosts and ultimately ended with a fully automated assessment of almost 3,000 sites in less than two weeks' time.
Show transcript [en]

So the time it would take to scan everything with traditional methods, right, they'd probably measure that in years or decades. Imagine if you were the guy who is responsible for scanning everything at a Fortune 100 company. Anybody here responsible for that? Well, you're probably lucky, because you'd probably hate your job: you'd be doing nothing but scanning applications all day long, all the time, 24/7, and you'd be running some tool that you're just mindlessly banging on. It would be a never-ending task, a complicated task, a painful task, right? So remember I said I was gonna take you on a little journey. Let's think about how we got here.

Again, what exactly is a web application in itself? Well, most of the time we think about it as just a web server, something that's got some pages on it, right? Well, those days are pretty much gone. We know we've got ASP or .NET or Java or something running behind the scenes, we've got a database in there, we've got some other stuff. We also have a whole bunch of stuff going on in the client as well, right? We've got Flash, we've got Ajax, we've got JavaScript, it's all mixed together. Maybe it's HTML5, maybe it's the next thing. We've got this cool thing called the cloud, all right, tying into every

little piece here. Maybe I've got some web services running out there that my website goes out and talks to, maybe it talks to my back end, maybe it's glued all these things together. So what part of that is your web application? It's actually the whole thing. So if I think about testing my web application, it's not just testing that one little server sitting on the left-hand side there, it's about testing all those pieces. So think about the complexity of doing just that process, right? Even if I'm running a tool day in and day out and I'm running that scan, how am I going to go and test this entire system? The other scary

thing is that entire picture right there: that's your new perimeter. If you think your network perimeter is sitting behind your firewall and everything is locked down tight, ports 80 and 443 are wide open, and that is actually your application perimeter. The applications are your new network perimeter, and remember, you don't even really know how many of these things you have, right? It's way more scary than when everybody talked about the perimeter going away because of VPN clients and VPN endpoints. This is way more dangerous than that, because applications are everywhere. Marketing cannot be stopped, right? They will be creating websites today; they've launched 500 of them while I was sitting there talking, right? And

that's great, that's their job, we want to enable them to do that. But how are we as security professionals going to keep up with that? All right, so I look at this and I really think, if I look at just the Fortune 100, this is really a dynamic analysis failure, right? Automated dynamic analysis tools have been out there for 10 years, yet if we think about every Fortune 100 company to suffer a website breach, they all have been using dynamic scanning tools, right? At least from the ones that I know for certain, about 82 percent of them use a dynamic scanning tool. Probably 100 percent of them do, but

they're all getting hit. Over the last five years, every single one that suffered a breach has been running a tool, but the problem is most of them are not scanning all the web applications they have. Most of them aren't even scanning the ones that are getting hit, because they can't keep up with this problem, right? They've got hundreds and hundreds of apps, the apps are exceptionally complicated, they can't scan them properly, they just can't keep up. So I see this as kind of a failure in general of dynamic analysis technology over the past decade, in terms of the technology we developed. And I stand up here fully responsible, having worked with

some of my colleagues to design these things. You know, we were well intentioned, but unfortunately we might have had the wrong approach. So why are the Fortune 100 really not scanning a hundred percent? Well, I sat down with about 20 of these folks that I could pull into a conversation and asked them kind of a simple question, and surprisingly I got one of four very simple answers, and sometimes it was multiple answers. In a nutshell, number one was they just can't keep up. They can't keep up with all those applications, they can't keep up with marketing, they can't keep up with the

business. The web is what's driving their revenue stream and nobody's gonna stand in front of that. So: they can't keep up with the applications; it's too expensive to go and scan all these applications; there's no permission in the organization, they don't have the authority to go and scan these applications; and overall they see a lot of really bad results when they do it, because scanning applications is tough. It's a complicated process, it's way too complicated, there's a lot of noise in the reports. Every time they run a report they kill a tree, right? Has anybody killed a tree with an audit report? Have you guys killed a tree with an audit

report recently? Don't do that, please. Think about the children, okay? Think about the future. Come on, work with me here, you guys. We're in a tent, okay? All right, thank you. Come on, it's freezing back here, right? Warm me up. All right, so bottom line: today's security professionals have to pick their battles, right? I would like to say that we need to get out of that mode, actually, because every security professional I talk to says the same thing: I'm triaging, I'm picking my battles, I'm picking the things that I have to focus on. I think that's a little unfortunate. So we're triaging, right, we're picking our battles, but are we really ultimately

losing the war? I think we forget that triage means I'm selecting the most important things from this large number of things that actually all require my attention. But ultimately I can't keep up with that, right? Because I can't keep up, because I'm unhappy with the results, because I lack the authority, because I don't have any money, I end up triaging what I'm gonna scan. So I only scan 1% of my applications, and at the end of the day, even though I have an application security program like the rest of the Fortune 100, I'm still getting hacked. Well, triage doesn't equal an application security program, all right? If your application security program is built on

triage, that's probably a problem. So what if we simplified this whole process instead and focused more on understanding our assets, right? Understanding the risk level, understanding everything that I've got. Because as a security professional, it's probably best if I focus on: what is my perimeter? What is my application perimeter? What is this wall, or lack of wall, this world that I have between me and the folks that are coming to my business and spending money and coming to me for whatever services my company sells? So if I could do that instead of trying to just focus on that 1%, would that be worth it? So with that in mind, I

want to introduce kind of a thought, a theory. And believe me, I'm gonna talk about building the zombie horde at some point, okay? This is all the lead-up, it's good stuff. So how many people have heard of broken window theory? It's kind of a social thing. I actually had lunch with somebody and they said, is that related to the broken window fallacy? So apparently there's a whole other theory that has nothing to do with inner-city crime or anything like that. This one is all about inner-city crime. Broken window theory is from 1982, right, this whole notion of social decay related to

very minor things causing larger crimes, right? So I have graffiti, I've got trash on the street, I've got some broken windows, and that tells me as a member of society that nobody cares. So it'd probably be okay if I went out and stole something, probably okay if I held somebody up for their money, probably okay if I killed somebody. And we saw this happening in the 80s, particularly in New York and other cities where crime was kind of spiraling out of control. So we started to address the little things, and it had a profound impact on the big things. So what I'm talking about here is, again, trying to point out, you know, we've been

focusing on the big things, the 1%, the triage, and we're not focusing on everything. We're not focusing on the trash on the street, the broken windows, and I think that has a lot to do with why our security programs are failing. So I propose that, and honestly, I don't have enough data to support this. I'm still spending a lot of time trying to figure out if this is true, and if you can help me, over maybe the next six to twelve months, maybe my next talk will be on this exclusively. But what I think is, even finding the low-hanging fruit, looking for those broken windows, right, looking for

everything, scanning everything, you're gonna have a much better outcome, right? For example, how many people here are pen testers? How many people have tested applications for a living, for the hell of it, or because it's cool? Okay, three people. I'm totally shocked, I expected everybody to raise their hand. So, okay, well, I'm a wannabe too, so that's okay. Again, the right side, you guys suck, okay? All right, keep it up. So as a pen tester, how many times have you gone to a website and you go, oh man, this is gonna be juicy? You haven't even actually done anything, you just loaded it up in a browser, and you go, oh yeah, totally owned, right? You just

see the site and you're like, oh yeah, right? So it's the trash on the street, it's the broken windows, it's the graffiti, right? These little things are the indicators of a bigger problem. So if I could focus on those, because honestly those things are actually pretty easy to find, pretty easy to trip over, if I could just focus on those and really send a message that I'm out there to find and fix those things, that might send a message to the whole ecosystem of software development and application development that I'm serious about application security. So that could be an application broken window theory in action, right? And I think honestly this is why a lot of our

programs have failed: because we focused on just the important things, and the message doesn't get out that we're serious about application security for all of our applications, for our entire application perimeter. Question?

Well, they don't, because I don't think a lot of people hold them accountable or even let them know that they care, right? It's true. And I'll tell you, how many people have studied computer science in school? Does anybody remember taking a computer science security course? Really? I'm impressed. I think you guys are lying, but that is awesome. Okay, undergrad? Wow, I want to send my kids there. Okay, where did you go? The 70s? They cared about it? Wow, that is impressive. See, the truth of the matter is, in the good old days everything was better, right? Okay, all right, so perfect example there: I used to have to hike to school ten, you know, mile

ten fetus, anyways. So guys, this side is the first to ask a question, so you guys are a little on the edge here. That's a great question, right? But I think people don't really let people know that they care, because they're only looking at that one percent. They're like, oh, it's only this one app that anybody cares about, we'll just crank out the other one. All right, so what if we could scan everything? Let's get to building the zombie horde. So what if we could scan everything, right? That's really what I was getting at. If we could build something that could help create an ecosystem where we create this notion that the trash on the street,

the graffiti, all of that is not acceptable, well, what would I need in order to create that reality? I would need something that would enable me to scan everything and not go insane doing it, right? And that's one of the reasons why we came to this kind of conclusion: how am I gonna build a system that could theoretically scan the entire Internet if I wanted to? And we decided to call it the zombie horde, because we were watching a lot of zombie movies at the time. Don't press that button. Okay, press that one. Okay, so what are some of the technology challenges, right? So now we have kind of the theory

and the thought behind that, but what are the technology challenges if I actually decide this is how I'm gonna solve this problem? What should I do about it, right? So besides the obvious big technology challenge, which is how do I go and run hundreds of scanners and create the zombie horde (I'll get to that in a second), there's actually some really big problems lurking behind it that you should think about first. The first is that a lot of scanners have a false-positive problem, right? False positives at a massive scale are ridiculously annoying. Think about what you're dealing with on just a scan-by-scan basis: if I have a thousand scans running and they're generating a thousand false positives

each, I might as well just shoot myself in the head, right? So if any manual work is required, you have to measure that manual work in seconds, maybe even in microseconds, because if you're not measuring it to that level it will not scale. Automation is a requirement if I'm gonna do this. So think about that: if you go, well, it just takes me 15 minutes to configure this thing, and I have to do that three thousand times, that's a half a year's worth of work before I've actually started doing anything, right? So I have to think about everything in terms of seconds or microseconds. And it's not enough to just go and build this and

then try to solve those technology challenges. I also have to figure out how I'm gonna keep this engine filled with targets to scan, because as it turns out, nobody knows how many web applications they have. Now, companies tend to say, oh, I've got a couple things over there and a couple things here, or maybe I've got a hundred, or here's a list. But when you really dig deeper, you end up finding there's generally two to three times as much stuff as they thought was out there. So it's not enough just to be able to scan the sites, you actually have to be able to go and find them in the first place and make them valid targets to scan.
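
As a rough check on the arithmetic in this part of the talk, the sketch below works out the two figures mentioned: 15 minutes of configuration repeated across 3,000 sites, and 1,000 scans each producing 1,000 false positives. The 40-hour work week and the function names are assumptions made for illustration; the speaker does not state them.

```python
# Assumption: a full-time person works 40 hours per week.
HOURS_PER_WORK_WEEK = 40


def manual_effort_weeks(minutes_per_site: float, sites: int,
                        hours_per_week: float = HOURS_PER_WORK_WEEK) -> float:
    """Return the full-time work weeks needed to hand-configure every site."""
    total_hours = minutes_per_site * sites / 60
    return total_hours / hours_per_week


def total_false_positives(scans: int, false_positives_per_scan: int) -> int:
    """Return how many false positives someone would have to triage by hand."""
    return scans * false_positives_per_scan


if __name__ == "__main__":
    # Figures quoted in the talk: 15 minutes per site, 3,000 sites.
    print(manual_effort_weeks(15, 3000))
    # Figures quoted in the talk: 1,000 scans, 1,000 false positives each.
    print(total_false_positives(1000, 1000))
```

Under the 40-hour assumption, 15 minutes times 3,000 sites is 750 hours, or a bit under 19 work weeks, which is the same order of magnitude as the "half a year" quoted in the talk. The false-positive case comes to a million findings to review.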

Of course, there's also people and culture challenges, right? Technology is easy, people are difficult. So the business has a few demands, kind of illogical demands, but well intentioned, right? They want production-safe testing. Have you ever heard that: is this gonna screw up production while you're doing that? By the way, plugging something into the internet could really screw up production, right? So think about that. In addition to trying to at least be kind in terms of how much traffic you generate at the site, which is something you can control, there's really no such thing as truly production safe. But at the same time, if you plugged it into the

internet, you're probably getting pen tested every 30 seconds anyways, for free. You just aren't getting the report, right? Somebody else has got the report. The other piece is we have these ideas of scan windows and maintenance windows, and, you know, can you do that in off hours, because I'm worried about my production site? Right, you should put that notice on your site, so that when the hacker comes to your application they go, oh, it says the maintenance window is this weekend, what was I thinking? So they're not worrying about that either, right? Again, you're getting that free pen test every 30 seconds. And then we've got this expectation, this unnatural expectation,

that we're gonna find everything when we do this scanning. This is why we got stuck in this 1% mindset in the first place. You're not gonna find everything, and just because you're not gonna find everything, does that mean it's not worthwhile to go and scan everything? Absolutely not, right? But you have to set this expectation right from the start. When you start a program like this, you're looking for the things that are most important, you're looking for the broken windows, you're looking for the trash on the street. You're trying to make it clear that application security is important and that no application is going to be left behind. Oh god, I can't

believe I said that. All right, so forgive me. So what are our rules of engagement if we're going to do this? We know we have to absolutely maximize our automation: no special configurations, no training, no people. Look only for things that we can automate, right? We don't want to be buried under thousands of false positives. We want to be internet-facing only, because I'm not going to deploy a massively scalable cloud-like environment inside your IT shop, set that up, and then break that down, in seconds. I don't want any complications in these scans. Remember, I don't want to be spending 15 minutes per site configuring it, so it's

gonna be no authentication, no complications. I'm gonna look in the places that everybody can reach, so that's not a problem. And then, hackers don't know scan windows, so I don't want scan windows either, right? We have to be thinking about scanning as a continuous process. Strangely enough, the denial-of-service thing is the one thing that is actually the easiest of all to control and make sure you don't do, because you can throttle yourself. There's some other areas where it's actually a little bit more complicated, which comes down to injecting attacks into the site and having an impact there, but there's some smart things you can do

there. But at the end of the day, there's no such thing as production safe, and there's also no such thing as running an application on the internet that's safe, right?
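
The talk says a scanner can avoid causing a denial of service by throttling itself, but it does not describe how that was done. A minimal sketch of one common approach, enforcing a minimum interval between requests to the same host, might look like the following. The class name, its parameters, and the injectable clock and sleep functions are hypothetical and are not taken from the speaker's system.

```python
import time
from typing import Callable, Dict


class PerHostThrottle:
    """Enforce a minimum delay between requests sent to the same host."""

    def __init__(self, max_requests_per_second: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if max_requests_per_second <= 0:
            raise ValueError("max_requests_per_second must be positive")
        self._min_interval = 1.0 / max_requests_per_second
        self._clock = clock          # injectable so the logic can be tested
        self._sleep = sleep
        self._next_allowed: Dict[str, float] = {}

    def wait(self, host: str) -> float:
        """Block until a request to `host` is allowed; return seconds slept."""
        now = self._clock()
        ready_at = self._next_allowed.get(host, now)
        delay = max(0.0, ready_at - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the next slot relative to when this request goes out.
        self._next_allowed[host] = max(now, ready_at) + self._min_interval
        return delay


if __name__ == "__main__":
    throttle = PerHostThrottle(max_requests_per_second=2.0)
    for path in ("/", "/login", "/search"):
        slept = throttle.wait("www.example.com")
        print(f"would request {path} after sleeping {slept:.2f}s")
```

Keeping the limit per host rather than global means a large fleet of scanners can still run many sites in parallel while any single site only sees a bounded request rate. In a distributed setup the per-host state would have to be shared or the targets partitioned so that each host is owned by one scanner; that is outside this sketch.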

Right, so maybe, but you're right, you're right. We're not necessarily trying to be secretive, but there are things you can do to try to run under the radar, and I have a statistic from my case study which I'll share with you, a number as to how many people actually noticed us. So, question? I want to scan it whenever I feel like it. It'd probably be better to do it at a time with people around anyways, right? But those are the rules of engagement. So this isn't for everyone, right? Somebody might go, I can't do this. So if you really want to do

this, this is where you've got to start: you've got to get your business leader, you see, whoever is responsible for all of this, to sign off on it. I've seen it happen, for Fortune five companies, where they sign off on it and say, just scan it, do it. So I know it can be done, I know it because we did it, but maybe it's not for everyone. The thing is, I think the education we really have to share with people is: you plugged it into the internet, it's getting that free pen test every 30 seconds. So is this still worthwhile? Hell yeah, because it's fun, but also because we can deliver two critically missing

things, right? One: what is my application perimeter? You can start there, that's an amazing thing in itself. Two: now I have a baseline of all my applications. I never had that before, and it's with that baseline that I actually have a better chance of properly prioritizing the actions that I take. Instead of just blindly picking the 1%, I can actually pick maybe the 1% that makes the most sense to go after, or the ones that look the worst off, or maybe the ones that I didn't even know about, right? I've run discoveries where the company has found web servers sitting under people's desks. They didn't even know they were there, and they were

driving revenue-generating business. Nobody ever knew that they were created. Great. Someone walked over that moment and unplugged it, because it's a completely unknown entity, right? Now, I'd like to see that happen every single time, get it unplugged, but sometimes, you know, you have to ask somebody for permission. But being able to scan everything and knowing what you've got, that's really the beginning. So if I'm gonna do this, this is really something that could not even be considered before the advent of an elastic, on-demand computing solution. Power, electricity, in the form of computing power that's on demand: this is what cloud computing is to me, right? And really, ultimately, when you think about cloud computing, I know it's a buzzword,

everyone goes, oh, cloud computing, to the cloud. It's actually pretty amazing when you think about it from a programmer's perspective: if you can allocate computing power the same way you might allocate memory, that is a game-changing event. That is really what cloud computing is about, and it was not possible to even think about building a solution like this, or tackling big problems like this, without something like that. It's also awesome that you only need to pay for what you use, and that means it's not gonna cost me five billion dollars to do this, or you. So here's some of the solutions that I looked at: I looked at Amazon's AWS, I looked at Rackspace's cloud, I looked at

GoGrid. And I really had five criteria that I was comparing everything by: cost, APIs, flexibility, global reach, and did they care about security, which is important to me. It probably comes as no surprise. How many people have ever used AWS, Amazon Web Services? Awesome, isn't it? It's pretty good. How many people use Rackspace? It's also pretty cool, right? I like both of the solutions, I think they're great. We chose Amazon; later we went and deployed on Rackspace as well so we could have options. I think they're both great, but ultimately, at the time, what cost me a thousand bucks on Amazon was not so much more, maybe a little bit

more on Rackspace, but ridiculously more on GoGrid. I don't know why, but that's what we came out with, and probably a lot of it had to do with how we were using Amazon spot instances as well, which I'll talk about in a second. But really it was just this massive community support as we were building this thing. Yeah, right, so that's on the next slide, actually, that was when we started that nuclear war. So when you're building something, it's really good to know that there's a community out there trying to build stuff, so that when you have a question you can ask people, hey, does this API work, and they go, oh, there's a bug, I'll

fix that for you. It's super important. So tons of APIs, tons of SDKs for AWS, and that was one of the tipping points. And then they've got an enormous amount of infrastructure that you can build on top of. It's not just EC2 but a whole bunch of other things. We didn't necessarily use all those things, but they're pretty useful. So we went with Amazon. These are the services I ultimately ended up using: obviously EC2; EBS, which is for storage; S3, also storage, slightly different model, I won't go into the details there; and some other little bits and pieces. Actually, when I started, IAM didn't even exist. That was really annoying, having to give everybody the

same username and password. They fixed that, thank God. But they have all kinds of stuff that you could play with. Maybe I'll create a talk just about AWS, but those are the services we used. So what were the other considerations, right? Actually, we didn't start a war. We actually called up Amazon before we started scanning anything and said, hey, this is what we're gonna do, is that a really bad idea? Will you even let us do that? And then we told them how much computing time we were probably going to use, and they said, that sounds like an awesome idea. All right, they're in this business just like everybody else, and they

really like the amount of computing time we use. Other things to think about, though: it is difficult to manage your costs when you think about how you're gonna bill that back to your customer, or how much you're spending per system. There's no really good mechanism in AWS to track this right now, so we had to deal with that. They say they're working on some improvements. I haven't seen them yet, but I'm convinced they're gonna do something better about that. And then you really have to think about building a survivable system. You can't just assume, oh, Amazon is gonna take care of my uptime for me. Has anybody ever heard

of Amazon going down? It's happened a few times, right? That's okay, as long as you keep that in your head, and they give you all the tools to design around it. You just have to realize that just because you went to the cloud doesn't mean all your problems went away. They're still there, but now you have a bigger toolbox with which to address them, right? So we got approval, and they were very kind to let us do what we were gonna do, but it's totally a violation of the Terms of Service if you just go and do it. And by the way, to begin with you're only given twenty instances, so if

you wanted to go fire up a thousand instances, you're gonna have to talk to somebody, which was a surprise to me. I thought I could just put my credit card number in and purchase all the computing power in the world without talking to a human, but that was okay. So this is a slide where I chose this color to wake everybody up at this point. There were some security considerations. We thought, you know, maybe our security problems will go away because Amazon will take care of it for us. Wrong, right? We have to take care of that as well. So here's some of the things that we did. We made sure that our data is

encrypted. It was really important to us, so we used encrypting file systems, we used SSL and SSH everywhere, we just never sent anything unencrypted, we never stored anything on it. And oh, by the way, don't store your keys in the cloud; that means it's not encrypted. So you have to think about how you do that, but when you provision your boxes you can bring them up, you can push the stuff in, there's ways to get around this. I actually wrote a blog post on my blog, if you can find it, talking about how to go about encrypting file systems on EC2 instances. And you need to make sure you've got the right access

controls in place. I hate passwords, so we use keys for everything. We made sure that everybody's got individual accounts, all the APIs are controlled. And then you really want to think about where your physical systems are, because you might want to distribute the load, you might want to design for failure. We primarily use us-east-1, but we could deploy this really anywhere, the way we've constructed it, and spread it out. And the great thing about that is, if you decide that you have a client that's in São Paulo or Tokyo or Singapore or someplace, you could easily be popping up and scanning from that location without much work on your part. In terms

of time and availability, it's been pretty good, but again, you need to design for that, so we considered that in how we deployed. And then the Cloud Security Alliance: if anybody went to that earlier today, the CSA had a thing at RSA. I didn't go, I was finishing this presentation. That was a joke. So anyways, they've got a great place to start, but ultimately security is your responsibility. So if you're gonna go and use something like Amazon or Rackspace, you can't just say, I've outsourced it to those guys, they're gonna figure it out, I read somewhere that it's PCI compliant. Awesome, it doesn't do anything for you. It's your responsibility. You have to

make sure you get all these things right. You're managing a bunch of systems; they happen to be running in the cloud, but they're still your boxes, so make sure that you've locked them down, right? So this is where we really spent a lot of time, because the first time we did this we had a bunch of epic failures, and then realized that we were not designing for failure and it was really our own fault. It doesn't necessarily mean that AWS is constantly going down; it's pretty stable, actually, but occasionally things do happen. So you want to expect systems to fail. As you're building a system out, you want to make sure that if things are terminated suddenly, particularly if you use

spot instances. How many know what spot instances are? I'll talk about that in a second. So real quick, spot instances are like you buy computing power at the market rate, and when the price of that computing power goes above what you're willing to pay, your instance just goes poof, your machine goes away. Which is fine, because you're paying a really cheap rate, but if you haven't designed to handle that kind of poof, then everything you've got goes poof, right? Yeah, like that. So what we did ultimately was anticipate that there's gonna be random latency at every point

in the system. So as part of testing, we would randomly kill stuff, we'd randomly slow things down, we'd randomly just make things awful, and see how our system handled that. Netflix has got a really interesting blog post about this; I think they created a process called Chaos Monkey where they go out and randomly kill stuff. It's pretty amazing, and they run this in production, which is pretty cool. So we thought pretty hard about designing for failure when we built this thing, and we also thought about something that was also really important, which is making money, so we thought about designing for cost. That means you have to really get religious about thinking about the cost

of everything, down to the penny, right? Because when you're only running one machine and it costs you five cents extra, you might not be thinking about it; when you're running a thousand machines, it can add up really, really fast. So we thought about everything in terms of computing hours, bytes transferred, storage, keeping IP addresses around longer than we needed to, everything that you possibly do. Amazon puts a cost on everything, and you really need to be religious about that and make sure that you're only using what you need. But there are some things you can do to optimize, and these are just some of the tricks that we thought about, right? For

example, an instance is charged by the hour, so if you've got a machine and it only ran for five minutes, you might as well leave it around up until the 59th minute before you kill it, in case some other work shows up. You can save some money that way; when you're using a lot of machines, that actually happens more often than you would expect. And sometimes it's just cheaper to lose work and start over than to build all this complicated infrastructure to handle that. Those are just some things that are kind of counterintuitive when you start thinking at scale. So here's what we came up with ultimately. This is kind of, dare I use the term, a hybrid cloud, in which

we've got a private cloud that Amazon's constructed, which is really our solution for dynamic scanning, and we interfaced parts of that with the systems that were running in Amazon, particularly for starting things up or kick-starting the whole system, and also for pulling the data back, because we didn't want to leave the data out there any longer than we really had to. Each scanning system uses a dynamic scanning engine that we created as well. That's the part that I don't go into in too much detail here, but it's really important, if you're going to build a massive zombie scanning solution, that you have a scanning engine that you can

deploy in the cloud as well. Just a small little detail. So we'd been working on that for a couple of years until we unveiled it last year, and then what we created, around July last year, is the central controller that monitors everything. That's really the brains behind this whole zombie horde; there's got to be at least one brain that all the scanners are chasing after. We built the whole thing in Python and Java. We use the boto API, which is really friendly; Python makes it so convenient, and our scanner is written in Java. So the dynamic scanner, and this is all I'll say about it here, was designed really

for this kind of deployment. We were expecting at some point to deploy this in our own kind of cloud environment, and that was under development for a long time before this project started, but when this opportunity came around, thankfully it really lent itself to it. This is kind of the linchpin piece that you've got to have, and it's not easy building a dynamic scanner; that could be an entirely different talk. So, just some key design considerations that lent themselves to automation and to this kind of deployment: we didn't build it for pen testers, like every other scanner ever built; we built this for automation. We wanted to make sure that it was

absolutely as simple as possible. It made all the unimportant decisions for you, and it only looked for the critical flaws, because we didn't want to kill forests and we didn't want to deal with the false positives of doing that. Ultimately this thing just produces these little atomic facts about the application as it's running through it, and we do a lot of post-processing to figure out what we're going to do with those facts, right? One of those steps includes determining if they're truly facts. And then we made it as safe as we could, so that means trying not to denial-of-service the system, trying to be as low-impact as possible. For example, there are just a couple of ways;

I'll just mention one. For example, how many people know about SQL injection? Everybody? Awesome. Okay, so there are a lot of ways to test for SQL injection. Everybody's gone to a website and said, ah, I've got a single tick in my name, oops, SQL injection, right? So if you do a single tick, you know, an "or one equals one" kind of thing, you know what you've actually done? You've modified the logic of that query, and if the SQL you just modified happens to drop tables or delete records or anything like that, that could be a non-production-safe activity, right? So one of the things that

we did when we thought about all the attacks that we inject is to think about how that impacts the production system that we might be running against, and we actually created injection strings that do not change the logic of the query. That took about a year's worth of research to cover all the database platforms and all the permutations you could possibly imagine, but those are the kinds of things you have to think about. So it's not just denial of service, but also thinking about how you're going to inject things into the system. So, in terms of the AWS instances: for the scanners we used specifically m1.large instances. These are about 7.5 gig RAM

boxes. We didn't really need that much memory, but that was just how the system broke out. They're about nine gigs of hard drive by default; we bumped that up to 30. But you also have something called ephemeral storage, which you get with each instance, and if you have an m1.large you get two volumes of about 400 gigs (I can't remember exactly, but it was about 400 gigs) of ephemeral storage. This is just storage that's there; if your instance disappears or turns off, it's gone, so don't put anything important in there. But if you need a lot of temp storage or anything like that while you're doing your work, it's a great place to put it. We actually

ended up merging those together as a RAID 0 array and encrypted the whole thing, and we saw a definite improvement in our performance in doing just that. Within Amazon there's a whole bunch of science behind how Amazon manages EBS volumes and ephemeral storage. EBS volumes are off on some SAN somewhere and have to go over a gigabit network, whereas ephemeral storage is a little bit more local, I think, and it runs just so much faster. So if you're going to have any kind of working space, that's the place to put it. And we used Amazon spot instances wherever possible to save cost. The controller was another large image; we had a big spot to put all

our processing of our data, and that was on-demand; obviously we didn't want to have that shut down on us. But this controller could be split out between multiple regions, with multiple systems kind of talking together. Ultimately, though, our network and storage costs are negligible compared to the compute time that we needed. So this is where I talk a little bit about spot instances, because we used a lot of spot instances, and for a while it was just the coolest thing in the world. They never went down, and when we first started playing with them we weren't even thinking about what would happen if they were to go down. Just dumb;

don't do that. But they're really cheap, so why wouldn't you use them? Well, occasionally you get these weird market fluctuations. So even though you're paying maybe 11 cents and the going rate for that image is 34 cents, suddenly somebody decides they need a lot of computing power from Amazon, and so you get these spikes. It can go from 11 cents an hour to $20 an hour; the other day I saw it go up to 99 dollars an hour, right? And if you're using spot instances and you're doing a lot of processing, and you have a hundred instances or a thousand instances, they all go away if you haven't agreed that you're going to pay

$99 for your instance. It's probably a bad idea to do that, because that would be very expensive. So you need to think about how you're going to store data, backing up, being able to pause and resume and start over, and things like that, if you're running your scan. And don't ever set that price higher than you're willing to pay, because sooner or later you will end up paying it.
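
The pause-and-resume advice above can be sketched roughly as follows. This is a minimal illustration, not the speaker's system: the function names, the JSON checkpoint file, and the `die_after` test hook are all hypothetical, and in a real deployment the checkpoint would have to live somewhere that survives the instance (the talk notes that ephemeral storage disappears with the machine).

```python
import json
import os
import tempfile


class InstanceTerminated(Exception):
    """Raised by the simulation to mimic a spot instance going 'poof'."""


def load_checkpoint(path):
    """Return the set of URLs already scanned, or an empty set."""
    if not os.path.exists(path):
        return set()
    with open(path) as fh:
        return set(json.load(fh))


def save_checkpoint(path, done):
    """Write progress atomically so a sudden kill cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(sorted(done), fh)
    os.replace(tmp, path)  # atomic rename on the same filesystem


def run_scan(urls, path, scan_one, die_after=None):
    """Scan every URL not yet recorded in the checkpoint file.

    die_after is a hypothetical test hook: after that many URLs the
    worker raises InstanceTerminated, simulating a lost spot instance.
    """
    done = load_checkpoint(path)
    scanned_now = 0
    for url in urls:
        if url in done:
            continue  # finished before the last termination
        if die_after is not None and scanned_now >= die_after:
            raise InstanceTerminated(url)
        scan_one(url)
        done.add(url)
        save_checkpoint(path, done)
        scanned_now += 1
    return done
```

A replacement worker that calls `run_scan` with the same checkpoint path picks up at the first unscanned URL, so losing an instance costs at most the one scan that was in flight.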

Right, so the question is: if one person asks for one instance and they're willing to pay 99 bucks, does it move the whole market? And the answer is no, it doesn't. What moves the market is a lack of capacity, right? So when we started ramping up and getting to the point where we had a thousand instances to play with, we actually started to see that at certain times of the day we could move the market. And that created some really interesting effects when you try to launch a thousand at the same time, and half of them come up and then the other half crash because you just

moved the market, and then the other half come back up, and then it crashes again. Yeah, it's bad. So you have to think about how you go about consuming all that computing power. It's kind of interesting; I'm really curious to see how it progresses, but if you look at the spot instance price over time, it's kind of crazy. Something to think about as you're building this. So these are all our top-secret details that I wasn't going to tell you; they've all been redacted, right? I'm not allowed to tell you about the fifth developer on the project. He's not here anymore. I'm

kidding, I think. But in this day and age we have to be careful; we did build this solution to totally dominate the market, of course, right? So here are some other pitfalls to think about when you're designing a system like that. Bugs can have really frightening consequences, particularly if you're not really paying attention. Remember, when you have that programmatic power and you have a bug in your code, you can accidentally launch hundreds, or maybe 500, instances with a little bug. And if you're not really paying attention, or maybe you go for a cup of coffee, you can come back and go, that bug just cost me six hundred and forty-five

dollars and forty-eight cents. Not that that ever happened to me, by the way; the expense report is in the mail. But it's totally true, and it's really tempting to just test in production; there's no test like production. As you're playing with this stuff, don't do that, because it could be really expensive, literally really expensive. We were lucky, but I've heard other horror stories of people starting up machines and then going away and forgetting about them, letting them run for months; they thought they had ten, and they were running five hundred, and they're in trouble, right? So you can spend money really, really fast, so you need to think about

that. So let's talk a little about a case study. That's a big truck. So when we first started working on this project, the idea of scanning this many applications had never been attempted before; nobody had really thought about scanning this... let me see how much time I've got. Right at the 15-minute mark, that's perfect. Yeah, so people have been scanning applications by throwing lots of bodies at the problem, and we had a customer come to us and they said, look, we have a real problem: we don't know how many applications we have, we don't know what's wrong with them, we know we're under attack. Please find

out before the hackers find out, because we know they're pentesting us 24/7. So we said, sure, let's see what we can do for you. Thankfully we had the right parts kind of come together; we had been working on this, and so we moved quickly to design the system a little bit earlier than we thought, and we found out that this was something that was going to be really doable. The interesting thing is that they started by giving us just an enormous list of 30,000 domains dumped out of their DNS. They said, well, here's everything we think we have; we're not really quite sure. And so we

had 30,000 domains that came out of that DNS and the results of a bunch of random nmap scans, and they said, we need you to scan all this. That came out to about 30,000 sites, 30,000 systems. We thought, wow, that's going to be fun, scanning 30,000 systems; how fast can we do that? So as part of that process we went through a discovery, and we found that actually only about 5,000 of those were unique sites. They had registered a lot of typo domains, they had registered a lot of dead domains, they had a lot of stuff that somebody had registered years ago, and they were still paying the bill and had long forgotten about it. That turns out to

be pretty common at Fortune 100 companies, where they've got a bazillion domains and they don't even know what they've got. So we went through the discovery, we came out to about 5,000 sites, and then we had a couple of filters that we put in place, and that came down to about 3,000; well, specifically 2,994 sites. And so we set out to scan those sites. Zombies don't sleep at night, and neither does the zombie horde. When we started the scanning we ramped up slowly, because we had learned our lesson about launching thousands of instances and moving the spot market, and then we started to ramp that up. We didn't have any scanning window; we had full, full-bore

permission to go. That rule of engagement is really important; we would not have been able to accomplish this without it. We started slow and then really rapidly grew, and by day five we had scanned pretty much everything and things were starting to settle down. But the interesting thing is that zero notification was sent out to any of the site owners. It was so important to test that we just needed to do it, but it also gave us the opportunity to really test how we're going about this kind of production-safe mode. Only two of those people actually even noticed we were there, and that's probably more a comment on the sorry state of

application IDS than anything. The only reason those two people actually noticed is because we hit a form, and some poor guy had like 10,000 emails in his inbox in the morning. And by the way, if you're ever testing a site and somebody goes, stop that, you're sending me emails every time you test that site, you should say, stop that, you have a really bad form that needs a CAPTCHA or something. Oh my god, what are you thinking, right? Anybody can go to that web form; that's a vulnerability, that's a problem. So we worked with him on that. At the end of this whole process we had consumed about 18,000 hours of computing time; that's about two years of computing

time in about a week. It took us about four and a half hours for each site to be scanned, and for this one we only used 500 scanners; now we use about a thousand. Out of that we found over 1,500 validated flaws, and I say validated because we wanted to actually know how well this thing was working, so we sat down and went through everything that came out of the system, and we built a whole process to manage that. As it turns out, thankfully, we had a pretty low false-positive rate, because we're only focused on the things that are most important. Yeah, so each one of those sites is unique, and each one of

those vulnerabilities is unique, that's right. Yeah, those are unique flaws. So think about that: they had 3,000 sites that had never been scanned. They never sat down to do it; they'd only been focusing on the top 1%. They were getting hacked like everybody else, and we came along and scanned everything using a very broad, very fast, go-for-broke, scan-everything kind of approach and found 1,500 things they didn't know about. That's pretty amazing. So the process after that was actually not to spend all of our time scanning, which is where most application security programs get stuck: they spend all their time scanning. I talked to one client; he said, by the end of 2012

we'll have scanned all our applications. That's a year from now, right? I said, wow, that is awesome, you've got a ton of work going on, but isn't that kind of unfortunate? He said, what do you mean? We're going to have everything scanned; how awesome is that? I'm like, your entire application security program is about scanning: not about remediation, not about building better policies, not about helping people understand, not about educating your developers to write better code. It's all about scanning. So all that means is that by the end of 2012 you will know that you're screwed, and you'll have a lot of work to do. All right, so the amazing thing is, if you can actually stop thinking of

scanning that way, and you start thinking about scanning very broadly and very quickly like this, your application security program moves away from just thinking about scanning and on to fixing things. The whole project itself was completed in under three months. That included going back and rescanning applications after they fixed vulnerabilities, after they dealt with things, and turning off a lot of those applications, because they didn't even know what the heck they were doing out there in the first place and they should have been taken offline a long time ago. And in a way I think this kind of fed into the broken windows theory a little bit, in the sense that

all these sites where this low-hanging fruit was found, right, they targeted those with deeper, more aggressive scanning later, and that's where they found the majority of their really scary problems. So over time I'm really tracking this to see if overall security, independent of what's actually found, is actually going to increase, because I think that'll be one of the key indicators that this kind of theory is in effect. So in closing: there is hope. There's really no reason not to scan everything. The costs aren't too scary. The fact that you can build your own dynamic zombie horde is certainly interesting, or you could just, you know,

shameless self-promotion, give me a call; I'll be happy to do it for you. But ultimately this makes delivering the results the focus of your program, not scanning. Because even though I've just been talking for the past hour about scanning, really what I want to do is change the conversation for most companies, so that they don't think about scanning, because it just happens that fast, and they focus on all the things they need to do afterwards. But if you are going to do something like this, make sure that scalability is your religion. Automation is key: if it takes you one minute, one minute is too long if you have to

do that 10,000 times. Make sure you optimize for performance, optimize for failures (in other words, expect the failures), and optimize for cost in terms of both time and money. And expect the unknown: just because it's in the cloud doesn't mean that the problems you have in existing IT are going to go away. So that's it. Thank you for your time. I love both the right and the left side here; you guys are awesome. I appreciate the questions. Any more questions? Yeah.

We break it up because it's more scalable that way, and there are actually some bandwidth costs within Amazon's cloud, so if you put it locally there, your costs are a little bit less and there's less latency between the controller and your scanners. So we break it up that way, but you could do it either way. Yeah, sure, it's very heavy bandwidth. Yeah, compute time accounts for 95% of our cost. Every day I'm convinced that something must be wrong, or Amazon forgot to bill me or something, because the bandwidth is huge, but I think the price is just that good. But we do try to optimize a lot

of stuff, and surprisingly, the great thing is a lot of people host their sites on Amazon; when you're scanning a site hosted in Amazon, your bandwidth costs are almost free, right? But yeah, most of it is computing time; after that it's storage, and then it's bandwidth. Any other questions? Awesome. Thanks, guys, really appreciate it. [Applause]
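
Since compute time dominates the bill, the hourly-billing trick from earlier in the talk (keep an idle machine around until about the 59th minute of an hour that is already paid for) is worth a small sketch. This assumes billing per started hour, as the talk describes; the function name, the one-minute reap window, and the `has_work` flag are hypothetical choices for illustration, not details from the speaker's controller.

```python
from datetime import datetime, timedelta

BILLING_PERIOD = timedelta(hours=1)  # assumption: billed per started hour
REAP_WINDOW = timedelta(minutes=1)   # assumption: reap in the final minute


def should_terminate(launched_at, now, has_work):
    """Decide whether an instance should be shut down right now.

    A busy instance is never terminated. An idle one is kept until the
    last minute of the hour already paid for, in case work shows up.
    """
    if has_work:
        return False
    elapsed = now - launched_at
    into_period = elapsed % BILLING_PERIOD  # time used in the current hour
    return BILLING_PERIOD - into_period <= REAP_WINDOW
```

A controller loop could call this once a minute for every idle scanner; the check costs nothing, and under hourly billing an instance killed at minute 5 costs the same as one killed at minute 59.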