
BSides 2019 Day 1 Track 2

BSides Vancouver · 3:33:32 · Published 2019-03


Yep, hello, hello. Cool, yeah, thank you. Okay, if anyone has any questions, can you just repeat them into the microphone so that everyone on the recording can hear as well? Okay, perfect. I think we're at time, so I'll get started. Okay, cool, thank you.

Okay, hello everyone. I'm Jeff McDonald, and along with Mustafa, Bavna, Jugal, and Andreas, today I'm going to be talking to you about a Windows 10 feature which has AMSI integrations for script behaviors, and about how we're using it to create machine learning protection: how you can use these behavior logs with machine learning to protect your customers, whether you're an AV vendor or an enterprise. I'm from the Windows Defender AV team, where I work as a machine learning researcher building models and pipelines that protect customers using machine learning, although I come from a background more in reverse engineering: reverse engineering malware and Android, doing deeper technical analysis.

So first, a little bit about the script threat landscape. Probably a lot of you are very familiar with what these malicious script attacks look like. They're used at a variety of stages of attacks. One of the most common is social engineering: usually these attacks are archives attached to emails as phishing lures, and within that archive there's a JavaScript file, a Visual Basic script, or an HTA file which they try to trick the user into opening. But these malicious scripts are used as more than just the initial infection vector.

They're used a lot as stages of attacks after getting a foothold on a machine, especially when attackers don't want to write a file to disk. They're commonly used in fileless attacks, both as stages during the attack, to avoid writing files to disk, and as a persistence method on infected machines, where the attackers set up malicious script content within a registry key and that becomes the malicious execution point. They're used a lot in advanced attacks as well, specifically on the PowerShell front: there are entire red team and penetration testing frameworks built around PowerShell, such as PowerSploit, and it's one of the tools of choice for lateral movement after getting into a network.

When scripts get executed on Windows, they get hosted in script execution engines, depending on what type of script you're dealing with. If you're opening a Visual Basic script, for example, it would be hosted by wscript.exe or cscript.exe on a Windows machine, and it's fairly similar for JavaScript files; they're also hosted by wscript.exe or cscript.exe, which are the Windows script execution engines. An HTA file is an HTML application, also one of the most commonly targeted script types in social engineering lures, and that gets hosted by mshta.exe. Social engineering attacks often rely on these scripts, as I said.

So this is a quick example of what one of these attacks looks like. In this case they're saying: hey, please refer to the attached invoice. If the user opens this archive file, inside there's a Visual Basic script file, which is obviously obfuscated and packed. The attackers usually change their packing methodology per campaign, per attack wave: they'll create a new packer for their Visual Basic script or JavaScript, test it against all the AV vendors, make sure it's not detected, and then carry out a really large attack wave against customers over a period of about two hours, sending out tens or hundreds of thousands of emails.

Usually these scripts just act as loaders for additional malware. In this case (it's not clicking, okay, cool), this one downloads a payload from the web and then executes it. It's an executable PE file payload, and it loads ransomware onto the machine that would have encrypted all the user's files, had it been successful. Okay, so this is a quick look at how scripts are used for persistence. Kovter is one of the more famous malware families which use malicious scripts for persistence upon infecting a machine. So it actually

doesn't use scripts to... apologies.

Yeah, it's coming back here. Okay, sorry about that. So, Kovter does not use scripts as its initial infection vector, so it's not truly fileless malware, I wouldn't say. However, once it infects the machine, it sets up its malicious JavaScript content to run from a registry key. Here you'll see it sets up an autorun registry key which invokes mshta.exe and passes the malicious script content to mshta on the command line, so in this attack pattern there's actually no malicious file on disk. Scripts are really easy for the attackers to obfuscate; they've got many different ways to pack their scripts. There's pretty much an infinite number of ways to pack them.

Here are a few examples of what some of the packing methods look like. It's really challenging, as a security researcher and as an AV product, to protect all of our customers from these attacks, because the attackers go out of their way to test their new packer algorithm against the AV products before carrying out their attack campaign. For us to respond as a traditional AV product, normally what a researcher would do is collect a few of these attacks, a few emails from the same attack campaign, and write a regex-like expression able to detect all variants of the same attack wave.

Then we'd have to take that signature through false positive testing, make sure we're not going to have a huge false positive problem, and deploy it as a signature to our customers. Meanwhile, the attack waves work on a much faster cadence: an attack campaign might only run for a one-hour period in which they send out a hundred thousand emails attacking a hundred thousand different customers, and there really isn't time for a human to be involved, or for a signature release process, to actually protect all these customers from these attacks.

So really, machine learning is needed for proactive protection against these types of attacks, and in Windows 10 we introduced a new feature called AMSI, which is available to all the AV security products out there and is used, in part, to combat the malicious script problem. AMSI is a standardized anti-malware scan interface. It allows any app developer (you don't have to be a Microsoft developer) to create calls into the default installed AV product on the computer. This doesn't have to be Windows Defender AV; it could be Kaspersky or Symantec that's installed on Windows, and then they'll be the handler of the AMSI calls.

Now any app developer can use this API to call for scanning of content on your machine, but in Windows 10 we've also introduced integrations of AMSI calls into the Windows 10 scripting engines to help combat these malicious scripts. This is all publicly documented, so you can go have a look at how to create calls into AMSI, or how to handle them. At the Windows 10 launch we announced PowerShell, Visual Basic script, JScript, and UAC integrations.

PowerShell, first of all, is open source; you can actually go check out all the source code on GitHub. It creates an AMSI call every time PowerShell code is compiled: at compile time of PowerShell code, it creates a call into the AMSI product with the whole PowerShell content. This allows any dynamically loaded PowerShell content to be passed to the AV product to scan, any new PowerShell script content, every time you load PowerShell on your computer. So if an attacker uses the command line to pass in a huge PowerShell script encoded in Base64, that will be unpacked out of Base64 and then handed to the AV product to scan before the PowerShell executes.

There are similar integrations for VBScript and JavaScript: any dynamically loaded new JavaScript code, and any JavaScript or Visual Basic script code prior to execution, will create calls into the AV product to scan. These AMSI integrations are in cscript.exe, wscript.exe, mshta.exe, and anything that uses jscript.dll or vbscript.dll, and the same for PowerShell, any way that PowerShell is used. Now, what we're going to talk more about today is that in the Windows 10 Fall Creators Update, we introduced behavior instrumentation of these scripting engines, so not only do we get the dynamic script contents during execution, we actually get the actual behavior of the scripts during execution passed into the installed AV product.

This looks more like the behavior logs you can see here: the actual behavior of the script, not the script content. And we have similar behavior instrumentation now with Office 365. If a customer falls victim to a macro-enabled attack (they agree to enable macros, and the macros are running), it's actually logging the behavior of that macro content during execution, and that gets passed to the AV product installed on the PC as well, to protect the customer.

So, lots of security products do support scanning of AMSI content, at least in some limited way. Here's a list of some of the AV products that support handling of AMSI calls, at least the ones we could find using public sources. Okay, so now we're going to talk a little bit more about these AMSI behavior logs, because they're key to the ML topic. Here's an example of malicious Visual Basic, sorry, malicious JavaScript code in this case, and the corresponding malicious behavior log. Here you'll see it downloads a PE file payload from the web and writes it to disk.

And if you look at about the fourth behavior log entry from the bottom, it's actually changing the file format of that downloaded file to have a PE file header, and then they're hosting it with rundll32.exe at the bottom there. Whenever the behavior instrumentation hits an execute- or run-type command, it pauses the execution of the script content and hands the whole behavior log to the AMSI interface to scan the contents. If the antivirus product decides to block it, it will actually abort execution of the script, and the execute command will not be allowed to complete.
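That pause-and-scan flow can be sketched in a few lines. This is purely illustrative: the names `ScriptHost` and `scan_behavior_log`, the action strings, and the toy blocking heuristic are all mine, not the real AMSI API. The point it shows is the one just described: behavior records accumulate as the script runs, and each run/execute-type action hands the entire accumulated log to the scanner before the action is allowed to complete.

```python
# Illustrative sketch of the pause-and-scan behavior flow (hypothetical
# names; not the real AMSI API). Behavior records accumulate while the
# script runs; every run/execute-type action submits the whole log for
# scanning before the action is allowed to complete.

EXECUTE_ACTIONS = {"WScript.Shell.Run", "ShellExecute"}

def scan_behavior_log(log):
    """Stand-in for the AV product's AMSI handler (toy heuristic:
    block if the log shows an .exe being written and then run)."""
    text = "\n".join(log)
    wrote_exe = ".exe" in text and "SaveToFile" in text
    ran_something = any(a in text for a in EXECUTE_ACTIONS)
    return "BLOCK" if wrote_exe and ran_something else "ALLOW"

class ScriptHost:
    """Toy script engine with behavior instrumentation."""
    def __init__(self):
        self.behavior_log = []

    def record(self, action, *args):
        self.behavior_log.append(f"{action}({', '.join(args)})")
        if action in EXECUTE_ACTIONS:
            # Pause here: the execute call only completes if allowed.
            if scan_behavior_log(self.behavior_log) == "BLOCK":
                raise PermissionError("script execution aborted by AV")

host = ScriptHost()
host.record("MSXML2.XMLHTTP.Open", "GET", "http://example.test/payload")
host.record("ADODB.Stream.SaveToFile", "C:\\Temp\\a.exe")
try:
    host.record("WScript.Shell.Run", "C:\\Temp\\a.exe")
except PermissionError as err:
    print(err)  # the Run call never completes
```

The key property mirrored here is that the log only grows, so each successive scan sees strictly more context than the last one.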

Here's another quick example: this is a Visual Basic script encoded command, and when it's executed, you can see it didn't produce much meaningful behavior. There was no run or execute command, so no AMSI behavior call would be created to the AMSI engine; however, the dynamic content calls would still have been created, just not the behavior log. If the run command is reached multiple times in a script, it's going to create multiple calls to the AMSI engine, with more and more behavior available for use in the classification. Now, to get into the machine learning side of things, I first need to describe a little bit about how our main machine learning application works in Windows Defender.

The bulk of our machine learning protection exists in the cloud. When a client, here on the left, initiates a scan against a file or a behavior, we actually ask the cloud for a machine learning decision. We don't want to send the whole AMSI behavior log up to the cloud; instead, we send up a featurization of the content. This includes the emulated behavior, fuzzy hashes of the content, and specialized machine learning vectors that we've built for each one of the scenarios. These features get sent up to the cloud, where we have about 40 real-time classifiers running on clusters of servers located around the world.

All these classification scores get combined with ensembles, and we run decider rules and ranking logic in the cloud in real time, combining them with offline deep learning models and many other data sources around Microsoft to come up with a decision in real time. The problem is that it's too costly to ask our cloud about every encounter our customers have. So instead, we also deploy lightweight machine learning models on the client, which we pair with the models we deploy in the cloud: we train a lighter-weight version of the model which is optimized for performance and low overhead on the customer's machine, to do classification client-side.

Only if the client model evaluates positive will the full feature set describing the content be sent up to the cloud for the full classification. So first, we're going to talk about how we built the purpose-built machine learning feature vector to describe the content; this is the part of the features sent up to the cloud. We detonated a large collection of known clean and malicious scripts in sandbox environments while recording the AMSI behavior logs, and we used that as the basis for our feature selection of what we're going to deploy to the client.
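A rough sketch of that pairing, under my own naming (none of this is Defender's actual code): a cheap linear scorer gates, on the client, which encounters get their full features sent to a heavier cloud model. The toy weights, bias, threshold, and feature names are all invented for illustration.

```python
import math

def client_score(features, weights, bias=-1.0):
    # Lightweight logistic-regression-style scorer run on the client.
    # (The bias of -1.0 is an arbitrary toy intercept.)
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def classify(features, weights, cloud_model, threshold=0.5):
    # Only encounters the client model flags get escalated to the cloud.
    if client_score(features, weights) < threshold:
        return "clean (client-side, no cloud query)"
    return cloud_model(features)  # full, heavier classification

# Toy indicator features: argument n-grams and referenced COM objects.
weights = {"ngram:.exe": 2.0, "com:ADODB.Stream": 1.5, "com:MSXML2.XMLHTTP": 1.0}
cloud_model = lambda feats: "malicious (cloud verdict)"

print(classify({"ngram:.exe": 1, "com:ADODB.Stream": 1}, weights, cloud_model))
print(classify({"com:Scripting.Dictionary": 1}, weights, cloud_model))
```

The first encounter scores high enough to be escalated and gets the cloud verdict; the second is dismissed client-side without ever generating a cloud query, which is the cost-saving the talk describes.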

First off, of course, we include as features all the COM object and COM function names which are referenced; in this case there were six COM objects referenced, and similarly for all the COM function names. We had 193 COM objects referenced in total. Then, in order to describe the function arguments in the behavior logs, we do a character n-gram description of the content: we take character 4-grams, 5-grams, 6-grams, 8-grams, and 12-grams, and we consider all of those as potential feature candidates. I'll show you what these features look like in a minute.
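A minimal version of that featurization (my own sketch, not the production code): slide windows of those sizes over the function-argument text and collect every distinct substring as a candidate feature. The example argument string is made up.

```python
def char_ngrams(text, sizes=(4, 5, 6, 8, 12)):
    """Collect all distinct character n-grams of the given sizes."""
    grams = set()
    for n in sizes:
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams

# A hypothetical behavior-log function argument:
args = 'Run("powershell -enc SQBFAFgA")'
candidates = char_ngrams(args)
print("-enc" in candidates)      # True: a candidate 4-gram
print("shell -e" in candidates)  # True: a candidate 8-gram
```

Even this short string yields dozens of candidates, which is why the feature space explodes and a down-selection step is needed before anything ships to the client.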

The character n-grams of the function arguments explode into a very large number of potential features, too many for us to deploy to the client. So instead, we run a down-selection process to choose which of those character n-grams are best to deploy to the client to separate clean from malicious content. In our case we use learner-based down-selection, which uses a linear SDCA classifier, and we do the down-selection through L1 feature regularization and feature trimming. Basically, we train a machine learning model and then let it sort out which are the most important character n-grams to deploy to the client; you can think of it as selecting the most important features.

In this case, we actually only deployed 300 final n-gram features, small character tokens (you can imagine sequences of characters), to the customer, to describe the function arguments. As a quick side note, for those of you more familiar with these processes: we did compare learner-based down-selection against mutual information down-selection, and we found that the learner-based approach produced features that resulted in higher-quality models. So now for the more fun part: I'm going to show you how the features actually describe an example of malicious script content. First off, we describe all the COM function arguments referenced by the script; in this case we've got about five of them here.

Then we describe all the COM functions referenced within the script, highlighted, I think, in blue there, and all these red parts are what the final 300 character n-grams that we deployed to the client describe. So it's a pretty reasonable description of a lot of the features within this behavior log. This is just the machine learning feature vector that we deployed, specialized to the scenario; it's added to other features such as the fuzzy hashes and similar. Another problem is that the sandbox-based logs don't really represent the behavior of these scripts in the wild on customers' machines when they're infected.

So what we do is use the sandbox logs only to choose the specialized features to deploy to the machine, and then we actually train the model on in-the-wild customer encounters with known good scripts and known malicious scripts. There are a lot of enterprises out there that use all kinds of customized enterprise scripts, and of course we don't have those files to make sure we avoid them as false positives. So instead, we train on all of our customers' in-the-wild encounters with all of these AMSI behaviors.

The client-side model is focused on being highly performant: we've got a highly performant engine for running logistic regression models, optimized toward low memory usage and low CPU cost when scanning content, and that's what's paired with our heavier model in the cloud. The real-time models we deploy in the cloud can be a lot heavier; they can use a lot more memory, many more features, and a lot more computation to make the classification, but they still need to be scalable, because we handle about 4 billion queries a day in the cloud. We did some sweeping of the different learners in application to this problem.

We found that the linear models performed the worst in this case. We did try a fully connected neural network; it's not exactly a deep learning approach, but it did outperform the linear models by a little bit. However, we found that the tree ensemble models performed the best overall. The reason we think the tree ensembles perform best is that they're able to model feature interactions: for example, they can model a different malicious or clean weight depending on whether a specific function-argument n-gram match occurs alongside one specific COM object match within the log versus a different COM object match, because those can mean very different things, and a tree model is better able to capture those feature interactions.

In our case we chose LightGBM, a tree ensemble, just 100 trees to come up with a classification, nothing too fancy about it. So here's a quick look at the different feature contributions, to get an idea of which features were most important to the model's decision. Of course, including all the features performed best. After that, we found that the specific targeted machine learning feature vector we had deployed contained the most important features for making the decision.
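The interaction argument can be seen on a tiny synthetic example. LightGBM itself may not be installed everywhere, so scikit-learn's gradient-boosted trees stand in here (same family: an additive ensemble of about 100 small trees). The features and labels are synthetic: a "malicious" label only when an n-gram match co-occurs with one particular COM object, exactly the kind of conjunction a depth-2 tree captures directly.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
# Columns: [ngram_match, com_object_A, com_object_B]  (synthetic data)
X = rng.integers(0, 2, size=(400, 3))
# Malicious only when the n-gram co-occurs with COM object A;
# the same n-gram next to COM object B stays clean.
y = (X[:, 0] & X[:, 1]).astype(int)

model = GradientBoostingClassifier(n_estimators=100, max_depth=2,
                                   random_state=0).fit(X, y)

print(model.predict([[1, 1, 0]])[0])  # n-gram + COM object A
print(model.predict([[1, 0, 1]])[0])  # same n-gram + COM object B
```

The ensemble learns that the identical n-gram feature flips meaning depending on which COM object it appears with, which is the behavior attributed to the tree models in the talk.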

After that, the fuzzy hashes had a very significant role in the classification, and finally the partial hashes (partial hashes of the behavior content) did not perform well in the classification. Okay, so a quick summary. Windows 10 has AMSI script integrations for a lot of the different script types. It has dynamic script content scans, before execution but also during execution of scripts when they try to load further script content, and it includes the dynamic behavior call scans. PowerShell has all the dynamic script content scans, and for VBScript and JavaScript we have both the dynamic script content scans, which cover the script content itself, and the behavior scans.

When it comes to Office VBA macros, the feature we announced very recently, I think about two months ago, we just have the behavior-based AMSI calls; however, AV engines are already scanning the macro content prior to execution, so there isn't really a need for the dynamic script content scans in the case of Office macros. Security vendors can effectively use machine learning against this AMSI content to create classifications. On the Windows Defender AV side, the way we do it is with paired client machine learning models, which are performance-optimized, and heavier cloud machine learning models.

The purpose-built machine learning vectors greatly improved the quality of the results, so if you were to approach a similar problem, I'd recommend using purpose-built machine learning vectors. We also built machine learning models specific to each one of those AMSI scenarios: we covered the VBA behavior, PowerShell, and the JavaScript and VBScript dynamic contents as well, where we did a very similar feature deployment process to decide which character n-grams are most important to deploy for each one of those content-type scans. And Windows Defender AV, at least the script protection scenario, is just part of our overall protection story; there are a lot of other features within Windows Defender AV which can entirely stop a lot of these script-based attacks.

There are ASR rules (attack surface reduction) where you can, for example, disallow any macro content from doing an execution at all, so you can entirely stop entire attack patterns. And I know our Office 365 team has an entire machine learning team dedicated to scanning and detonating scripts before they arrive in your email inbox, so we're just part of the picture on the AV side. We're also hiring a lot at Windows Defender AV; shameless plug, if anyone's interested, we've got lots of reverse engineering, PM, and software development positions. They're mostly based in Redmond; although I'm from Vancouver myself, most of our team is based in Redmond.

Okay, cool, thank you. Any questions? [Applause] Cool, yeah, thank you. We have a second microphone, so if you've got a question, I'll just pass it over to you.

So it depends on what your operating system is; there are different minimum requirements, like minimum Windows 10 builds required for it to be enabled. Office 365 also has, for the feature we announced about two months ago, that AMSI integration requires a minimum Office 365 version, which I think most people don't have yet. Is that what you're asking? Cool.

Yes, yes it does. Yes, Windows Defender AV, yep. So I had a question: you were saying that the on-client models have a curated set of rules they're looking at, and that's basically curating what data goes up. I'm wondering, how do you make sure that your rules aren't self-reinforcing and your data set isn't getting too narrow? Yep, that's definitely a good question. So, as I understand what you're saying: how do we know that the client machine learning model isn't just bringing up exactly what our cloud model already detects, and how do we deal with all the unknown files that are out there?

How do we know if they're being caught by the client classifier or not, and how do we prevent that from happening? So, although I cut it out of the one slide, there's actually more to it than just the client machine learning model bringing up queries. I actually included it in this picture. Oh, come on.

Okay, so we have more than just the client machine learning models bringing up queries to our cloud. Whenever there's a high-risk, low-frequency event, we ask the cloud for a decision, and we also have random sampling that occurs on the client, where we get random samples of customer encounters with files. So we're actually able to train on, and discover, stuff that we're not detecting when it's missed by the client classifier, if that makes sense. Why do I think AMSI is not adopted more widely by the third-party AV vendors? That's a really good question. I think a lot of the AV vendors are comparing the value against the engineering needed to support the feature.

It's more than just adding AMSI integration to your AV product. You don't have to use machine learning protection like we do; it also allows the antivirus researchers at those companies to actually write signatures against behaviors, for example, "it's creating a suspicious exe file in Temp and then running it", and that would outright block it. So the traditional AV way of writing antivirus signatures against a behavior can still apply to this. But if they add support for AMSI, they also need to invest researcher time to write signatures that act on it, and building a machine learning scenario like this against behavior, as we found out going through this project, is a very challenging experience.

It's challenging in terms of controlling the amount of false positives, and deploying it as a feature end to end takes a significant amount of work, especially on the research side. So there's a lot of support cost to it, and comparing that to the value, I think some of the AV products may have decided that maybe they don't want to invest in it. I don't know; maybe your products are investing in it, but I really doubt whether they're actually investing the researcher time to back the feature properly, though.

Right, sorry. I have a question: the data that's generated from this seems really valuable, especially from an incident response perspective. Maybe there's something that's run through the AMSI engine and doesn't get flagged as malicious; my question is, are there logs for this that are accessible, or is it down to the AV provider to create those logs? That's a really good question. I have heard reference to the PowerShell script execution logs, I think, being recorded with Windows event telemetry, so I think you can set up Windows Event Forwarding for some of these events.

Separate from the AV product, I think they're also logged as Windows ETW events, so you can add hooks to look at those. I'd intended to do a little more research for this presentation on how you, as an enterprise, can get those logs forwarded, but I wasn't able to get to it, unfortunately. I know there are enterprises, including I think Microsoft as an enterprise, where we are specifically forwarding the PowerShell execution events, and that's a really good use case for AMSI, but I'm sorry, I don't have a lot of insight on that side. A question there, and there. Oh yeah, perfect, thank you. So if the ask-the-cloud piece is triggered, what is the average response time from the Windows Defender cloud?

And what are the common use cases you see for the implementation of these AMSI applications? Yeah, so our round-trip response time, I would estimate, is between 100 and 200 milliseconds; it depends on which country you reside in. The second question was, what are some of the example use cases of these script types in enterprises? You'd be surprised: even outside of PowerShell, which has a ridiculous amount of enterprise use, when you look at Visual Basic script and JavaScript being executed on these machines, they are used really widely in enterprises, believe it or not.

It really surprises me; there are a lot of custom enterprise scripts. Enterprises use those HTML applications, believe it or not; I've seen them used for things like automated server deployments. There are lots of really interesting legitimate use cases of those script types, I've found. Next question: you mentioned random sampling of binary assets, as well as an upload of events for assets that represent a low frequency but a high probability of actually being malicious; is there any way to opt out of that uploading? Sorry, could you repeat that, the binary part you mentioned? You mentioned random uploading of binary assets. Oh no, that's not random binary uploading.

Okay, so that's just the random feature description being sent up. Oh, I see; so not random file submission, understood. Okay, good stuff. Is there any file submission at all? Not in the AMSI path; these buffers, you don't have a way to request them. It's complicated. Yeah, I've got two questions. Is Windows Defender still using the Intel threat detection technology? Sorry, the Intel TDT, the Intel Threat Detection Technology? I think I've heard reference to it, but I don't know anything about it; certainly it's not using it. It used to? Okay. And then the next question: is AMSI

going to be coming to the server operating systems? Yes, I think Windows Defender AV does support the Windows Server SKUs; I think that started within the last year or two, so there is official support for them, yep.

Any further questions? Oh, one more question, okay, yeah, all right. So you said 200 milliseconds return time between the lookup for features and the decision of whether it's bad or not; during that time, are you actually halting execution of that script and waiting for that before executing it, or are you just letting it run and then, after the fact, going, oh, that was bad? We're able to control that depending on how confident we are in the ask to the cloud. So we are trying to hold it as long as we can to wait for that decision when we decide to ask the cloud, but it depends on what the trigger was, the reason we asked the cloud.

Right.

Okay, yeah, thank you. I think that's everything. Cool, yeah, thank you. [Applause] Thanks, Jeff. Lunch is coming up; I guess we're a little early for that break. Thank you.


Okay, there's one. Testing, testing. So that would be the one that you can run. Do you want to hold on to it, or we can set it up on your turn?

no problem

testing testing


you are prepared also awesome they seem to go walk yeah yeah it's looking a bit light yes

so that sounds like we need two of them just remember where I normally is that they have one full I think I actually left mine up very cool so you don't need one anymore

Okay, these are VGA, so it's probably one here, like that, except for HDMI. Okay, hold on, I will run back. So this one almost worked. Okay, so you've got two there; if I could bridge them, that would be perfect.

I'm a big one for talking too long, so I'm always like... it's okay. Yeah, you'll be like, five minutes left. Good.

As long as it's not an Apple one. I don't know, it looks generic, but I have other ones if that one causes these problems. Oh, anything Apple... even if it fits, it doesn't work. So do we turn this on, or do I just go into the projector?

yes we're just trying to turn on the projector

yeah it should be one of these but I'm not as familiar with the projector so okay all right thank you

oh you think so okay I

I would guess

it's okay

it's old school

I will take all these back; we should be good. Let's grab it, yeah, and it should be on. Okay, sorry. Hi everyone, welcome to BSides Vancouver 2019. Today's topic is how to secure serverless, by Matt Carolyn. Thanks.

Thank you. Okay, hey there everyone. Just while people are making their way in and out, just for a show of hands, because I like to gauge and talk to what you guys want to hear: who's already using serverless today? Okay. And who's just heard the buzzword and is interested in it? And who's not doing any development on serverless but is worried about securing it? It's sort of one of those things where, hey, okay, we've heard the buzz, it doesn't need any security, doesn't need any patching, that sort of thing, but you don't believe it. Okay, cool. So yeah, my name is Matt Carolyn. I've been doing IT for, I don't know, about 15 to 20 years, and the last 10 years I've been in the cloud space, working on AWS and Azure. And yeah, serverless is something... being in the cloud space for the last 10 years, I've heard a lot of buzzwords where I've been like, I've been doing that for a long time, when did this term suddenly present itself? So serverless is one of those things where, in my mind, a lot of the things that were great about cloud when it first came out were sort of on the serverless side, and then we all started saying, hey, but I'm not used to this, give me VMs and things that I can almost touch,

and so they started bringing out block storage and VMs and things like that that we're more used to. And that gives us a bit of job security, because we can now patch them, and we might not be doing firmware updates anymore, but we like keeping those boring tasks on hand. But I do feel like that evolution is happening. In my mind it's sort of: we've got the old-school servers, it went to VMs, containers have come along and taken a lot more off our plate, but ultimately where we really want to be, well, in my mind anyway, is serverless and SaaS and those sort of

pieces, where we can actually concentrate on the end users, the experience, and things like that. But in saying that, there's still a lot to look after, and in breaking all these things up into a lot of different buckets there are a lot of security pieces where we can open up gaps. We're not just looking after one rack anymore, or one rack on each side of the continent; we've now got lots of little pieces all over the place in this software-defined world. So I wanted to begin with just talking about what serverless is, because really, if we don't understand what we're actually trying to keep secure, it's hard to really effectively secure that

environment. If we take what we've been doing previously and apply that to the new world, you'll have a lot of apps and tools and things that will just give you false-positive readings. They might say, hey, yeah, you're all patched, because you don't actually need to worry about patching; or there might be other pieces that you didn't worry about previously because they weren't your responsibility, but now they've come into your world. I see that a lot with government especially, where it's a lot harder to shift the mindset of, hey, I'm just doing databases, or I'm just doing the storage bit and this is my world. Particularly with serverless, it tends to bring in everybody from all the other

facets. So really, serverless is about not having to... everybody here probably knows the traditional stack, where we've got the hardware at the bottom, then the operating system, the middleware, and then your data. Most of that stack, if we're running it in the cloud, is looked after for you by the cloud provider. Now, you can run serverless on premise, in which case, yeah, you still do have to look after the hardware it's running on, but the main way most people are running it out there is on the cloud. The reason you might want to run serverless on premise, most of the time, is for testing: to have it close to you, test things out, not have to worry about redeployment and bug fixing that way. Test it on premise and then push it up to the cloud. So in your testing world you might be a bit more relaxed about looking after pieces there, although it's still very important to have some pieces in place, which we'll go through in a bit. So if we keep with serverless running in the cloud, really all you're having to look after is your code: making sure there are no vulnerabilities there. And we're trying to really keep things as

simple as possible for yourselves, while giving the cloud provider all that responsibility we used to look after, such as the operating system, the middleware, patches, security, those sorts of things. So the benefits: why would we do that? As I touched on, there's that focus, being able to not have to worry about all those tasks that have kind of hindered things. Also, the other negative to always having to worry about patches and security pieces with VMs and operating systems and firmware updates is that your code can sometimes get impacted by that, whereas when we've got a cloud provider looking after all of that for us, they're keeping everything very consistent, so

that we shouldn't have to worry about the impact on our code of what they're doing in the back end. So we can focus on the code. And then also scale: traditionally if we're running VMs we might be running clusters, and if we're doing containers we've got swarms of containers to look after load coming in. That sort of scale is all looked after by the cloud provider for you, so you don't have to worry about that scale, you don't have to worry about redesigning your code for the traffic coming through, those sorts of pieces. And again, I'm assuming here, when I'm talking, I'm talking about if we

were to do an application end to end serverless. I must say also, a lot of the time when I'm using serverless it's for one piece of a job. I come from a background of 20 years ago in the developer space, got into the sysadmin space, more helping developers get their code working, the orchestration, those sorts of pieces, and these days I'm finding myself back in that developer space because everything's code now again. But yeah, you get that scale there. And then flowing on from that: how do they bill you in the cloud for pieces where you've just got code, where you don't have to worry about compute,

storage, those sorts of pieces, where it's just your code that's going up there? What they do is bill you, a lot of the time, for while that code is being executed. So I'll talk about a few best practices around that. And when I talk, we're talking generally about best practices; there are exceptions to the rule, where we might say, hey, because of my use case I need it a certain way. Definitely when we're talking about serverless, the security is more around keeping things in order, rather than, hey, this is the only way to do it and you're

silly if you don't do it this way. So if we talk about AWS specifically for serverless, I just threw a bunch of icons up here; there's a lot more that Amazon classifies in that serverless realm. I'll run through a heap of them. It's kind of debatable, sort of like what the word cloud has become now: everybody uses the word cloud, I hate the word cloud now. And there were words like converged that seemed to get overdone, and I'm starting to see serverless go the same way, where everybody's like, well, isn't that serverless, what's the definition of serverless, and we

start getting pretty long lists where we basically end up saying, hey, if it's not a VM and it's not attached storage, then is it serverless? But really, the real crux of it is Lambda on AWS. When Lambda was released a few years ago, my mind just went, great: I've got all these little Linux boxes doing little cron jobs, and I hate the fact that I'm paying like 14 dollars a month or something just for a server to sit there idly by, just to collect some logs, or to reset a service that I know fails every now and again because I'm a lazy coder and I don't want to actually find the actual problem. So Lambda I was using, but

they didn't bring out scheduling at first, and a lot of my stuff used schedules. Now they've got a scheduler, so we can do little bits of code that are scheduled, and we also got other pieces that flow on from this that it's all connected to. A lot of these things are serverless; some things are also not serverless. A lot of the time I end up starting off with a monolithic legacy sort of architecture, because a lot of my clients are moving from that enterprise space over to the cloud, and piece by piece we'll look at it and say, hey, you know what, before, we were putting everything on one server; now let's break

them all apart into all these little services. So something you might use Lambda for today, if you're not using serverless, might be something that is just a routine job: maybe it's a command line or a cron job or something like that that you have running in the background. Maybe it's on your domain server or something else where you've got compute free, and you've said, okay, you know what, I'm not going to use my rack space or another VM just for this one simple task, I'm just going to have it run on this existing server. Those sorts of things we can break out on

their own, and have running as just some code that's maybe scheduled or triggered by other events. The other advantage to doing that is you've now got fewer dependencies: if previously you were running these little tasks sort of piggybacking off a greater, more important server, say it was your domain server or your email server or something, and that broke, then that could also stop that small function of yours from working in the first place. So that's Lambda. Then we've got a whole heap of other services. We've got Lambda@Edge, which is really looking at your CDN, your CloudFront on AWS, and getting your serverless code

executing closer to the end user. Fargate. S3 storage is another one; now, why I'm putting these in bold is to connect a simple story together. S3 storage, like I said, goes back to long before I even really heard the term serverless thrown around: there's this S3 storage from Amazon, it's object storage, you don't have to worry about any servers, it scales for you automatically, just create a bucket and start throwing your files in there. So in effect that's serverless there. You've got their Elastic File System to look after your file storage as well, DynamoDB as a serverless-like database there, and Amazon's own Aurora has now gone

serverless, about six months ago, so you've got an option there for them to give you the same benefits of Aurora on RDS but with the advantages of serverless. API Gateway is something I'll touch on a little later, around authentication. Amazon SNS, one of the early services on AWS, gives you notifications about pieces that are happening, so you could use that as a trigger for serverless. SQS with that queue service; Step Functions, which relate back to Lambda from earlier, to have things triggering; and Kinesis and Athena around your big data, moving that data, changing it into the right format that you want.
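Several of those services are typically chained together by events. A minimal sketch of what one link looks like, a Lambda handler fired by an S3 upload that builds the SNS notification it would publish. The event shape is the standard S3 event-notification format; the topic ARN is made up, and the actual `sns.publish` call is left out as a comment so the logic stays plain:

```python
import json

def handle_s3_upload(event):
    """Sketch of a Lambda handler reacting to an S3 upload.

    Parses the standard S3 event-notification shape and builds the SNS
    message we would publish about the processed object.
    """
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    # ... here you would fetch the object, resize the image, write it back ...
    # In a real function you would then call boto3's sns.publish(**result).
    return {
        "TopicArn": "arn:aws:sns:us-west-2:123456789012:uploads",  # hypothetical ARN
        "Message": json.dumps({"bucket": bucket, "key": key, "status": "processed"}),
    }
```

Three services (Lambda, S3, SNS) for one resize job, which is exactly the long-list problem discussed next.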

So there are probably a lot of other services on Amazon you could count; I mean, there are over 100 services running on Amazon, but with a lot of these I find really good stories of interconnecting the pieces. So, the ones in bold, Lambda, S3 and SNS: you could have your Lambda taking some input, storing it in S3, and triggering a notification for you once it's arrived in there, maybe resizing an image or something like that. But as you can see it's quite a long list, and I just talked briefly about a simple resizing of an image and it took three different services. So right there is a security vulnerability: having to keep an eye on that amount of pieces, whereas before you probably had one VM, it was a minor job within the VM, probably part of an application, and it sent you an alert that the developers of that application built for you. If we look at Azure, to illustrate it with a different story: really we've got a trigger and we've got the serverless piece, Azure Functions, similar to AWS's Lambda. We'd have a trigger here, something comes in to Event Hub, your code gets triggered to run, and then we can be messaged about it, stored in a database

or in object storage there. Now, all these things lead us to best practices around security. I painted the picture of all the options around serverless, and obviously there are a lot more, but it really leads on to several problems that many people overlook out there. So authentication is one of them. If we look at the actual code side: you might have multiple devices, mobile devices, desktops like Windows and Linux, websites accessing. You're probably going to have a separate function or code set for each of those, to display things in a different way, because the way people

use or want functions out of an application is different on each of those devices. Because of that, you're going to want to make sure you're keeping things in check, that things are going to the right areas, and that we don't have any holes in there. Now, you could have everything going directly to your code, you could use different authentication engines, you could use Azure AD or IAM accounts, but really, the way I steer people if they're doing coding, and this is, like I said, just an option, is the API gateway. If you're using an API gateway within Azure, AWS or any other cloud provider out there, this will really help you manage the direction and path of all these commands coming through to your code, while also making sure of the authentication at that central point. Another piece is making sure you're actually looking after that authentication for the end users as well, and your admins and other access areas. So if we're dividing up groups: who's got access to what, which databases have access to which other steps in your functions. This is a practice that's not just for serverless, we talk about it all the time in other areas, but particularly here we want to have least-privilege access.
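One concrete way an API gateway enforces authentication at that central point, on AWS at least, is a Lambda authorizer. A minimal sketch: the token check here is just a placeholder (in practice you would validate a JWT or look the token up), but the response shape, a principal plus an IAM policy allowing or denying `execute-api:Invoke`, is what API Gateway expects back:

```python
def authorizer(event, context=None):
    """Sketch of a Lambda authorizer for API Gateway.

    Returns the principalId-plus-policy-document structure API Gateway
    expects; the token comparison is a stand-in for real validation.
    """
    token = event.get("authorizationToken", "")
    effect = "Allow" if token == "valid-token" else "Deny"  # placeholder check
    return {
        "principalId": "user",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": event.get("methodArn", "*"),
            }],
        },
    }
```

The gateway caches and applies that returned policy, so every function behind it gets the same auth decision at one place instead of each one re-checking.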

make sure that nothing slips through the cracks, because we've got so many different pieces running around. Cloud controls: as we talked about earlier, we've got a lot of the stack being looked after by the cloud provider, and what I direct a lot of people to do is start off with the native cloud tools within Azure and AWS to manage their serverless environments. Visibility, management and controls for serverless environments are still relatively new, and there are a lot of third-party applications that do a really good job, much better than the cloud providers themselves, but you may not need that level. So I always direct people to start off

with the cloud providers themselves, because they're going to have better integration a lot of the time, and a lot of the time it's free integration that you're getting anyway. You're not going to have other pieces tied in that could fail, that could have issues, so there's a lot less that can go wrong. Serverless is great, and I use it, like I said, for little pieces, but when you're trying to do a whole solution with serverless in mind it can become a bit of a mess. So the fewer pieces you have in play the better, and if you're using a cloud provider's own native pieces, then a lot of the time you can use those same tools

to monitor other pieces that might not be part of that serverless application, so there are fewer tools to learn as well. But obviously there are times where you need more: maybe you've got PCI compliance, or you've got health information, PII data, where you need stronger controls than what the native cloud providers provide, in which case, yeah, look at third parties then. But I wouldn't jump straight into the third-party bit, I'd start off with the cloud-native pieces. So, visibility, which I touched on before: that's probably the biggest threat I see to serverless out there. Documentation is one piece, but

what if you've got several people doing different pieces, you've got your development lifecycle happening, and people just aren't documenting? In reality, the number of companies I go to and say, hey, can you just give me a Visio diagram or something, give me a Lucidchart of just your VMs, or just give me a number, how many VMs have you got, and I'll get, oh yeah, we've got 300, and then we find out later it was actually 900. And you're going, how could you make that sort of big mistake? They're VMs: you could just go into VMware and it'll pump out a report and tell you. But it's because people

just lose track. It was probably 300 two years ago when they last looked at it, and they've never really looked at it since, because they don't have that sort of automation and management in mind. With serverless it's going to get even worse than that, because you're not going to have one VM that looks after multiple functions within it; you're going to have multiple pieces all running around. And that visibility and documentation and planning can lead not just to better efficiencies in your code, but to security, making sure we're not opening anything up there. I see it all the time, like the S3 buckets in AWS. I think a lot of people are kind of

dismayed that health organizations and the like are just opening up their buckets publicly, but it's probably due to, well, laziness, and laziness due to not documenting things. I would hope they at least attempted to do it securely and open up a path, but they just couldn't work out who needed access to what, or what subnets were created for where, or what IAM roles, so they had no organization or visibility on what they have. Therefore they just opened it up public, went, hey, that works, and never expected someone to scan everything public on S3. So visibility, I think, is more critical

than ever with serverless, because you've got a lot of pieces in play. And then the dev lifecycle: throughout the lifecycle of your development on serverless, keep security in mind. I was talking earlier about how a lot of coders out there, when they're doing serverless, are probably going to bring the serverless code onto their own personal laptop, and we can't stop this as admins. We see it all the time: people have got file storage and they go off and use Dropbox and share files that way. I mean, you can control it, but the more we put up

barriers around things, the more people find ways around them. I used to say, years ago when Facebook and the like were becoming a thing, business owners were like, everybody's just doing Facebook, nobody's doing any work, they're just on eBay all day, let's block it all. What we found when we blocked it all: they just tethered off their phones, they still wasted their time, everybody started using their phones. What was more effective was documenting what's allowed and what's not. And I do a report for my clients where we do a shame sort of thing, where we say, hey, look, this is how many people were on Facebook this week, how much percentage of the day

they're on each, and we know who you are, and we're just going to put this on the kitchen fridge and kind of shame people. Because there are legitimate reasons people use social media for personal things within work; you don't want to go into a bunker for eight hours and come out the other end without any communication with the world, but there's a balance. And that's kind of it here: coders are going to be lazy, they just want results, and security is boring, it holds things up and gets in the way. So it's about having those policies in place, having them part of the dev lifecycle. Sure, let them have their test environment on their own laptop, do the code, be lazy, but before it can even leave there, we've now got security steps that we've got to make sure are ingrained. You've tested it, you've shortcutted things, you've got it working, but we've got policies and security practices within this whole lifecycle. Because otherwise what's going to happen is you'll go through testing, QA and all that, get to production, and then the security team is going to be like, whoa, hang on, no, this isn't going to work, and everybody's going to get upset at each other, because all of a sudden the code won't work

the way the security people are forcing it to go. So have security in mind throughout that lifecycle. And that flows on to, I think, my last point around general best practices, which is dependencies and third parties. If you've got that dev lifecycle happening with security in mind at each step, you're going to pick up on pieces where you have dependencies on bits that aren't serverless or are outside of your environment. It might be a third-party SaaS application that you're pulling in, or maybe you're passing emails through a third-party service, something like that. So make sure that that is in that dev lifecycle.
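One way to ingrain that dependency check in the lifecycle is a simple pre-deploy gate that compares what a function declares against the list security has actually reviewed. A sketch under stated assumptions: the allowlist idea, the rough requirements-style format, and all names here are illustrative, not any particular tool:

```python
def parse_requirements(text):
    """Very rough requirements.txt-style parser: package names only,
    '#' comments and blank lines skipped, '==' pins dropped."""
    names = []
    for line in text.splitlines():
        line = line.split("#")[0].strip()
        if line:
            names.append(line.split("==")[0].lower())
    return names

def check_dependencies(declared, reviewed):
    """Pre-deploy gate: pass only if every declared third-party
    dependency has been security-reviewed. Returns (ok, unreviewed)."""
    unreviewed = sorted(set(declared) - set(reviewed))
    return (len(unreviewed) == 0, unreviewed)
```

Wired into CI before the deploy step, a failing gate stops the package leaving the laptop stage rather than surprising everyone at production.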

Make sure security is in mind for them too; you don't just pass it off to them and say it's their responsibility. Is the tunnel between you secure? Are we passing things off in the most secure method? The third party might not even be some sort of SaaS conduit; it could be a third-party company that's got access into your environment, access to your data. I got told the other day it was five years ago, I find it hard to believe it was that long ago, but when all the credit card information got taken from Target years ago, it was a third party with access into their environment that let that slip, not actually Target itself, and the due diligence on how that access was handled just wasn't done. So keep that security in mind all the way through. And I kind of see the third parties and other dependencies as another piece of that serverless bit: like I was talking about earlier, we've got all these little pieces of the puzzle making up this serverless solution, and even though the cloud providers are looking after the whole stack, and even though third parties could be looking after a lot for you, you've still got to make sure that the

traffic between you, and access for them, is least privileged, all those sorts of pieces tying together.
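Least privilege for a serverless pipeline usually comes down to the function's execution role. A sketch of what that looks like, built as a plain dict so the shape is easy to see; the bucket and topic names are hypothetical, and the point is simply that the role lists only the two actions the function needs, scoped to specific resources, with no wildcards:

```python
def least_privilege_policy(bucket, topic_arn):
    """Builds an IAM policy (as a dict, in the standard JSON policy
    shape) granting a function only the two actions it needs:
    writing one bucket and publishing one topic."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Effect": "Allow",
                "Action": ["sns:Publish"],
                "Resource": topic_arn,
            },
        ],
    }
```

If the function later needs a third action, that's a deliberate policy change someone reviews, which is the visibility point again.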

Okay, so when we're talking about AWS, it really is a very similar topic to the other cloud providers, and we've gone through a lot of the pieces there. If we went back to the different services we have there, there are a lot of things to be mindful of, a lot of different options. Actually, I'm just realizing API Gateway, which I talked about earlier, was one of the points I was going to make at this point. But yeah, you've got a lot of serverless pieces in play. And then for Azure, I think actually I'll just skip over

these two bits, because I'm finding I've actually talked about a lot of the points that were for them, kind of blended them in previously. So on that, yeah, in wrap-up: I think the biggest important thing around serverless, like I hinted at earlier, is not your traditional security methods. I mean, you're still going to have your firewalls and your network structure properly in place, but it's more about having that visibility; I think that kind of ties everything together. If you've got that visibility and control, then all those other best-practice pieces I was talking about, the authentication, are going to make sense. You've documented everything, you've got that visibility, you've got that control happening, and all the other pieces of the pie sort of blend together. So it doesn't matter if you've got many different services blended into one solution that aren't in one VM or clustered together as one service: you can have databases as a service, you can have your code as a service, you can have your front-end CDN as a service, but as long as that visibility and control is in place, whether using the cloud provider or third parties or your own documentation and your own lifecycle management, that's how you're going to be effective

in seeing those vulnerabilities coming in in the first place. So, on that, does anybody have any questions? Yes. Thank you, thank you, Matt. Can you talk about the API Gateway deployment models a little bit, whether or not you would recommend on-site versus on-cloud deployments for the API gateways? Okay, so if you've got your production running on premise versus in the cloud, and you're using your API with... a hybrid model? Yeah, a hybrid model, where you've got a mix, but you've still got your API gateway running through Amazon's or Azure's managed API gateway for you.

I mean, personally, it depends, but I guess I'm cloud-biased, so I'm going to say yeah, I would run your API gateway within your cloud provider, because they're going to look after a lot of the management for you. And then really the biggest thing on hybrid that I consider is latency between the two. It depends what you're running: maybe your front-end code is up in the cloud, maybe your databases are on premise, your PII data, that sort of thing you're trying to keep in-house, maybe that's the reason for hybrid, and what's the latency between those? So the path I would tend to go with,

and like I said, I'm just talking generally, would be your API gateway for traffic flowing through there into your cloud environment, and then on premise you'd have an ExpressRoute, a VPN, a Direct Connect, whatever it is depending on your latency, back to your on-premise environment. Doing controls of access from on-premise up to the cloud doesn't come up for me that much, so I wouldn't be able to go into that in much depth, but I'm definitely happy to go through more detail around that sort of architecture if you want. And if anybody's interested in API Gateway: it is hard, there are so

many different services to drill down on. I was thinking, do we pick one good method and drill down on that? But there are so many different services out there, I thought I'd give everybody just a taste, and assume that everybody has an inkling of serverless and the cloud providers out there. Does anybody else have any other questions? All right. I hope this sort of overview introduction to things to be aware of has been useful. If you do like API Gateway and that, and want to get into deep diving with me, I'm happy to; I'm actually going to be at the Scalar booth around the corner for the next half an hour or so,

happy to get into discussions there too. Also, if you want to find me: I've run a meetup here in Vancouver for the last few years, been running one at CodeCore, where we talk about all things cloud. I also run an Azure user group in Seattle, and we're doing a lot of podcasts on cloud topics every month as well. So yeah, happy to go over any topics you like outside afterwards in more depth, and thanks for having me.
