Advanced Bot Landscape

BSides Munich · 2022 · 28:03 · 119 views · Published 2022-05
About this talk
Bad bots account for roughly a quarter of internet traffic and employ sophisticated evasion techniques including headless browsers, proxy rotation, and anti-detection mechanisms. This talk dissects the architecture of advanced bots used for credential stuffing, carding, and scalping, details their key components and obfuscation strategies, and explores detection approaches including behavioral analysis and device fingerprinting.
Original YouTube description
Bad bot traffic represents around a quarter of Internet traffic today and is predicted to increase. This traffic includes website content scanning, stolen credit card checking, denial of service, inventory… In this talk, we describe how, as a security company, we tackle this variety of threats, how we started our research, the challenges we faced and the solutions we provided. This talk includes an overview of the general trend in terms of popular bot programming languages and software development frameworks. Then, practical examples will be taken from the most prevalent bots from the OWASP top 10 automated threats. The general architecture of those bots will be displayed, and the main components explained, before drilling down to the key features they include to remain undetected. How do they evade captcha systems? How do they avoid fingerprinting? From the naïve approaches, we will introduce you to the stealthiest features. Speaker: Yohann Sillam. Yohann Sillam is a researcher from Imperva. He continuously monitors cyber security attacks detected in the wild, publishes blog articles about hidden ones and finds innovative ways to tackle them. He has more than 3 years of experience in cyber security, especially in malware analysis.
Transcript [en]

so hello everybody nice to meet you my name is yohann sillam as jane said i'm working for imperva and this talk is going to be about the advanced bot landscape so before we start i'll give you a short introduction about myself so security researcher with four years of experience in cyber security i spent the major part of my time analyzing malware but more recently i spent time on web application security i try to understand the advanced bot ecosystem try to see their behavior if there are ways to detect them or even block them so this is the agenda of the session today first we will see together the advanced

bot ecosystem then i will show you the internal structure of an advanced bot and lastly we'll see together a few different techniques and the detection that we can provide so let's go but before we start sorry a short definition of bots bots are software that automates actions on the internet there could be good bots for example googlebot that's crawling the web in order to improve its search engine results and you have bad bots for example the ones that are scanning the whole internet in search of vulnerable websites the difference between the two is the consent of the server being reached by those bots for example a bot that's not respecting the robots.txt

file at the root of the website will definitely not be a good bot okay so the market of bad bots is huge just talking about tickets the market was estimated to be roughly eight billion dollars in 2017 and we can see that the market of bots is growing the general web scraper software market was valued at half a billion dollars and this was like almost this year you have communities gathering hundreds of thousands of people exchanging advice about bots selling bots buying bots and there are many that i'm going to show you in this presentation so this is such a huge market that very efficient bots can be very expensive here is the example of a bot

called wrath aio out of stock on its own website and on the resale market you can find it for no less than one thousand seven hundred dollars here aio means all in one that is this bot is actually an advanced scalping bot able to target all sites developed with the shopify framework okay so now let's talk about bot traffic in terms of numbers so it's been spoiled a bit but it's completely okay so bad bot traffic was measured to be roughly a quarter of all internet traffic in 2020 whereas good bots represent only 15 percent of it and when we drill down into this

we can see that more than half of it comes from advanced bots so what do we mean exactly by advanced bots so we have a few things in mind the first one is the usage of headless browser technology this is a technology that enables the rendering of web content almost the same way as a real browser will do an example of such technology is selenium then we have the rotation between anonymous proxies most of the time they will be residential ones because they are harder to detect and also advanced anti-detection mechanisms for example the ability to mimic human behavior during the solving of a captcha we'll see an example of that later

okay so what's the purpose of advanced bots so they can be used for many things including carding the fact of checking stolen credit card numbers found for example on the dark web by testing them against e-commerce websites credential stuffing the fact of taking combo lists of usernames and passwords from the dark web and checking them against a large range of websites in order to find at least one match scalping the fact of automatically purchasing a large amount of premium items from websites in order to resell them later at a higher price on resale markets so this damages the reputation of the e-commerce website and also generates a loss in the long term for the company

because it destroys the strategy of premium items and generates frustration for the customer denial of inventory the fact of for example making a website unusable via the massive usage of dummy carts and so on so among all of those types of bot activities scalping has become so popular that it has become a kind of sport so this is a video taken from a youtube channel called botting with burger showing one guy using two different bots advanced bots actually in order to scalp the premium release of one specific company so he's using on the left side a bot called cyber aio and on the right one red set uh robot so each of the lines that you can see

it's pretty difficult to see i'm not sure if you see it well is a scalping task and their status is evolving initially it's connecting to the website then filling the form and eventually solving the captcha and doing the purchase operation so what you can see on the left at some point the lines turn green which means that the full flow from the connection to the page the monitoring of the price and the solving of the captcha was entirely successful and on the right side you can see that it stays black at the last stage it's not possible to read because it's too small but it says px banner this tells that the security vendor of

the website was able to detect this traffic as non-legitimate and was able to block it so this video is roughly a year old and this shows in real time the bypass of the security vendor by an advanced scalper called cyber aio so i showed you this gif in order to show you that it is an actual concern to be able to block this kind of traffic and this kind of advanced software okay so one of the challenges of this research was to get access to the files of the bots because very often bots were out of stock on their own websites and even sometimes on the resale market they were not available

so i listed for you a set of sources that were very useful to me in this investigation so first you have general hacking forums for example hack forums cracked.io nulled.to fraudstercrew install.live all of those are hacking forums in general so you can search for a specific tool or you can see the new tools that are coming out you'll have one post describing the tool and most of the time eventually you'll have a link to an archive that's stored either on anonfiles or on a mega file upload you'll have of course to be careful because those may give you a small gift they may also be infected by malware so you should analyze them in a specific

environment okay so here are other kinds of marketplaces that were very useful so as i said scalping has become extremely popular now you even have marketplaces that are dedicated to one type of scalping for example ticket bots so ticket scalping or sneaker scalping here okay sometimes we were able to get access to the source of a bot on github it was the case of cyber aio for example or behind a misconfigured server we were able to get access to the files of one advanced bot called the kegel okay so once using all of those techniques we were able to gather something like 40 advanced bots so only advanced ones we tried to answer a few questions

what are the statistics in terms of programming language and in terms of headless browser technology what's the internal structure of these bots and what are the evasion techniques that they are using so i'm going to show you this now so the great majority of them were developed either with the electron framework or sometimes with the .net framework and when we drill down and see what's the headless browser technology behind we can see that puppeteer is the most common headless browser technology used here but we can see also other ones selenium is very popular dotnetbrowser but we have also other ones like essential objects jxbrowser which is for java

playwright etc now let's look at the internal structure of an advanced bot so of course there are many different bots so it's not possible to do it for all of them but the idea is to take the ones that are the most successful so we go on discord channels we try to see which bots are the most expensive the most recommended because often attackers brag about their outcome on those channels and say that they were able to scalp this and this premium release so i decided to pick a bot called project destroyer so a bit more than a year ago it was worth one thousand dollars it has the ability to scalp all of those

kinds of websites that you can see here and the reason i picked it is because it has many functionalities so most of the evasion tricks that i will show in the last part of the presentation are part of this bot but in general the idea of bots is the same be able to connect to a website without being fingerprinted be able to perform automated actions simultaneously at a very fast pace so this is a very simplified version of this bot so we could split it on the left side what's the input what is provided to this bot and on the right side what is inside of

the bot so to be provided to the bot a list of proxies so usually ip port username and password a valid key for a captcha solving service for example capmonster a list of profiles including billing information names addresses etc the attacker may have several ones and then inside of the bot you have the headless browser so in this case it was puppeteer usually you'll have also additional modules that will improve its ability not to be detected for example in this case puppeteer-extra-plugin-stealth then you have an evasion module that would be responsible for all of the tricks that make the bot hard to detect you have a profile management module

that's responsible for storing the profiles provided by the attacker a proxy management module that will be responsible for storing the proxies checking if they are valid measuring their speed and sorting them accordingly you have a cookie jar so it's an interface provided to the attacker to generate valid cookies for the target website so most of the time 24 hours up to 48 hours before the release of premium items the attacker will have to generate a batch of cookies and then they will be consumed during the scalping operations you have a captcha management module that will be responsible for extracting the captcha from the page sending it to the third-party service

and then retrieving the answer and solving it on the client side and you have the operation module so in the operation module most of the time you have one folder for each target for example in the case of project destroyer you have 37 folders so each will contain the script in order to click on a specific button to fill a form etc and then to call the other components okay so let's zoom in and see what happens in one specific task so the headless browser here puppeteer is started with as i said a feature in order to make it harder to detect for example setting navigator.webdriver to false
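As a rough illustration of the proxy management module described above (store the proxies, check that they are valid, measure their speed, sort them), here is a minimal Python sketch. The `ip:port:username:password` line format, the `Proxy` and `rank_proxies` names, and the injected `probe` callable are all assumptions made for illustration; none of them are taken from the analysed bot.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Proxy:
    host: str
    port: int
    username: str
    password: str


def parse_proxy(line: str) -> Proxy:
    """Parse one 'ip:port:username:password' entry (format is an assumption)."""
    # maxsplit=3 keeps any ':' characters that appear inside the password.
    host, port, username, password = line.strip().split(":", 3)
    return Proxy(host, int(port), username, password)


def rank_proxies(
    proxies: Iterable[Proxy],
    probe: Callable[[Proxy], Optional[float]],
) -> List[Proxy]:
    """Drop proxies whose probe fails (returns None), sort the rest fastest first.

    `probe` is a hypothetical callable returning a latency in seconds; a real
    module would time a request through the proxy instead.
    """
    timed = [(probe(p), p) for p in proxies]
    alive = [(latency, p) for latency, p in timed if latency is not None]
    alive.sort(key=lambda pair: pair[0])
    return [p for _, p in alive]
```

Injecting `probe` keeps the sketch free of network calls, so the sorting and filtering logic can be exercised with fake latencies.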

then the bot will craft plausible device attributes for the bot to look like a real device i will show some examples just later then the bot will pick the generated cookies and a proxy for the specific target then the bot will monitor the price of the premium item that it wants to purchase in a loop if at some point the price is below the threshold that is set by the attacker in the task then the purchase operation will happen when the captcha is provided to the bot it will be sent to the third party and the captcha will be solved okay so now let's look at a few evasion techniques and some detections that we

can provide first of all it's not actually an evasion it's a code protection so the code of this bot was quite strongly obfuscated when you try to expand it this way and execute it it will crash with this specific error so what the code actually does is check the difference in the string representation of functions between when the code is untouched and when the code is being debugged so i had to understand which part of the code was responsible for this check disable it and then i was able to detect the other tricks used by the bot that i'm going to show you now so one of the tricks that they used was to add non-mandatory http headers to the

fields of the requests so here for example you have a list of seven different non-mandatory http header fields and randomly a certain number of those fields are added to the request that will be sent for one specific scalping task so this is done in order to make the bot look like a slightly different device each time it's performing a scalping operation so this is not the actual javascript code i deobfuscated it and show it as pseudocode okay another small trick that's worth mentioning is called address jigging so the bot is extracting the addresses provided by the attacker and then it's parsing them and finding the keywords associated to roads avenues

lanes streets etc and then slightly modifying them in order for all of the scalping tasks even though they are eventually directed to the same address to look different to any security device that would try to find scalping operations for example there is another function that will add a leading zero before the numbers of the addresses okay another important topic that can be discussed related to advanced bots is the ability to stack captcha solving services for example there is a new tool called aycd autosolve that can be integrated with some advanced bots including cyber aio for example that enables the attacker to prioritize different captcha solving services for example

first the captcha will be sent to capmonster and if the captcha cannot be solved by this specific service it will jump to the next one and so on this maximizes the probability that eventually the scalping operation will be successful okay so now let's look at a few detections that we can provide for this bot so inside of the code i was able to spot this specific section that's responsible for crafting plausible device attributes for the bot so here you see availHeight so this is supposed to represent the size of the screen of the device not the size of the window and it's set to be a random number between 1000 and 1600

pixels but it's uniformly randomized so there are values here that most likely don't correspond to any device on the planet because it's really precise so if someone is really paying attention to what happens on the specific device he will be able to detect that this is probably not a real device but a bot so this is a flaw in the code of the bot as it says here in the documentation this is the size of the screen okay another thing that we can do is extract the snippet of code responsible for the solving of the captcha on the client side so eventually the bot is using a function

that will generate a bezier curve in order to solve the captcha so this is more or less how it will look so you can see the curvature is pretty smartly done you can see the acceleration the deceleration so a simple device will not be able to detect this as illegitimate but if you pay more attention you can see that a real human will not draw curves like this it will be more irregular so we tried to check if it was actually true if we were able to see a difference between real traffic and fake traffic so to do so we decided to measure the acceleration of the curve
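The measurement described above can be approximated with a few lines of Python. This is a sketch under stated assumptions, not the code used in the research: the cubic form of the curve, the uniform sampling in the curve parameter, the fixed sampling interval `dt_ms`, and all function names are hypothetical. It samples a Bezier path, estimates acceleration with a second finite difference, and reports the mean and maximum magnitude in pixels per millisecond squared.

```python
import math
from statistics import mean
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, samples: int) -> List[Point]:
    """Sample a cubic Bezier curve at `samples` (>= 2) evenly spaced parameter values."""
    points = []
    for i in range(samples):
        t = i / (samples - 1)
        u = 1.0 - t
        x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
        y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
        points.append((x, y))
    return points


def acceleration_magnitudes(points: List[Point], dt_ms: float) -> List[float]:
    """Second finite difference of positions recorded every `dt_ms` ms (px/ms^2)."""
    result = []
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        ax = (nxt[0] - 2 * cur[0] + prev[0]) / dt_ms**2
        ay = (nxt[1] - 2 * cur[1] + prev[1]) / dt_ms**2
        result.append(math.hypot(ax, ay))
    return result


def summarize(points: List[Point], dt_ms: float) -> Tuple[float, float]:
    """Return (mean, max) acceleration magnitude; needs at least three points."""
    acc = acceleration_magnitudes(points, dt_ms)
    return mean(acc), max(acc)
```

Running `summarize` on recorded human pointer traces and on generated curves gives two pairs of numbers that can be compared; how many samples are needed before the difference becomes reliable is exactly the open question raised in the talk.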

so this is the graph that we obtained we can see that the average is around 0.1 pixel per millisecond squared with a maximum value of roughly 0.15 so we tried to do the same with human curves and this is more or less the graph that we obtained with the average higher up to 0.15 and a maximum of 0.5 so the question here is not if we can detect bots using this kind of technique this bot actually is using this kind of technique it is how much time do we need in order to be accurate enough to be sure that this is indeed a bot okay so as a conclusion the market of bad bots

is large and expanding today then there is a predominance of the electron framework for bot development and of puppeteer as a headless browser technology and lastly it's a cat and mouse game which means that knowing the code of the opponent being able to understand the tricks that he's using is a great advantage for the detection of those tools that's it are there any questions [Applause] so do we have any questions for yohann and the bad bots silence um i actually have a question about one of your slides could you go back to the slide where you had the captcha services the captcha beating services or

whatever they were there was a list yeah this okay yeah this one so i'm looking at them and there are three there and on the right hand side there is balance 0 49 984 what is the balance for so the attacker has to register for each of the services that he will integrate with aycd autosolve so he'll have to pay for each of those services and then each time they are requested the balance will decrease so when the balance is at zero you cannot use this service so probably it will skip automatically to the next one uh-huh okay so then do these services cost different like per time of use is

there a different price between the different services or are they roughly the same um so the prices are roughly the same they can vary a bit but more or less it's something like 0.1 dollar for each request something like that okay interesting and then you talked a lot about scalping in your presentation and to me i'm very old-fashioned scalping to me is where i go to a concert and there's somebody standing outside holding tickets up and saying please buy my tickets and the concert is sold out and i pay a bunch of money for that what does scalping mean in sort of this

realm in the bot world so i said that scalping is okay so first it's something that's almost everywhere in the market of bots and actually this is where we can find the most advanced bots because they need to have a very strong interaction with the website it's not only about sending one request it needs to keep a communication be able to move from page to page without being spotted as a malicious bot so as i said scalping damages the reputation of companies when it's successful and that's why companies really try to invest in order to be able to protect against

those threats so this is why this presentation is slightly oriented towards scalping let's say okay and then what is the target then of a bot that is engaging in scalping what's the target yeah what is that um so there are many targets so i showed a few of them so for example websites developed with the shopify framework but there are amazon of course ebay walmart adidas also i mean if you go to any of the markets that were listed above for each of the bots that you want to purchase you will see the list of targets that it has the ability to

attack okay i think we have another question up here yeah thanks for your talk and i got a question about bot protection i work on a software as a service product which is very very attractive for people who are trying logins from database leaks and for us the invisible recaptcha from google was the only solution to protect our login against bots and automated attacks do you know how safe they are like recaptcha i think it's top notch or one of the top notch solutions but do you have any experience um so recaptcha is relatively strong but of course i mean the

machine learning tools that are currently used by third-party captcha solving services are stronger and stronger even at some point what they do is they just automatically send it to a specific person that will solve it manually so this is i think the case you have a tool called death by captcha that's doing both so if the captcha is simple it solves it with machine learning if not it sends it to someone usually in india that will solve it so it's an ongoing challenge to be able to improve captcha strength but there are many different ways to try

to detect bots not only captcha this is like the end solution that actually companies now are trying to avoid because it's time consuming and also for the consumer it's a bad user experience thanks
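One of the non-captcha detections mentioned in the talk, the uniformly randomized availHeight, can be turned into a simple plausibility check. The sketch below is only an illustration: the list of screen heights, the 120 pixel allowance for taskbars and docks, and the function names are assumptions, not a vetted device database, and a production check would also compare the value against the other screen attributes reported by the same client.

```python
from typing import Iterable, Sequence

# Illustrative screen heights in pixels (assumed, far from exhaustive).
KNOWN_SCREEN_HEIGHTS = (720, 768, 800, 900, 1024, 1050, 1080, 1200, 1440, 1600, 2160)

# Assumed upper bound on the space an OS reserves for taskbars, docks, menus.
MAX_RESERVED_PX = 120


def is_plausible_avail_height(
    avail_height: int,
    screen_heights: Sequence[int] = KNOWN_SCREEN_HEIGHTS,
    max_reserved: int = MAX_RESERVED_PX,
) -> bool:
    """True if avail_height equals some known screen height minus a small reserved area."""
    return any(0 <= height - avail_height <= max_reserved for height in screen_heights)


def implausible_share(avail_heights: Iterable[int]) -> float:
    """Fraction of reported values that match no known screen height."""
    values = list(avail_heights)
    if not values:
        return 0.0
    flagged = sum(1 for value in values if not is_plausible_avail_height(value))
    return flagged / len(values)
```

With such a wide tolerance a single value proves little, so the share of implausible values across many sessions from the same source is the more useful signal: values drawn uniformly between 1000 and 1600 will regularly fall in the gaps between real screen sizes.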

hi you talked about randomizing or pseudo-randomizing the shipping address for some scalper that is trying to buy all the sneakers on the market or whatever has anyone tried talking to the shipping guys at ups and so on because it sounds like an integration with those guys could provide a feedback loop to actually detect them um so this is an interesting topic i don't think that's okay so it's probably below i don't think that this is something that we did indeed this is something that can be done but i mean i assume that people are sometimes also writing shortcuts for addresses so it may lead to false positives um

one option probably would be to propose a service for the website to aggregate the logs and check if there are any aggregations that can lead to detection of scalping operations this is something that can be done eventually more at the level of the log analysis okay thanks
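The log aggregation idea from this answer can be sketched as follows: undo the two jigging tricks described in the talk (rewritten street keywords and leading zeros before numbers), then group orders by the normalized address. The abbreviation table, the `(order_id, address)` input shape, and the function names are assumptions for illustration, and as the speaker notes, legitimate customers also abbreviate addresses, so a match is a hint rather than proof.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed abbreviation table; a real one would be larger and locale-aware.
STREET_WORDS = {
    "st": "street",
    "str": "street",
    "rd": "road",
    "ave": "avenue",
    "av": "avenue",
    "ln": "lane",
}


def normalize_address(raw: str) -> str:
    """Lowercase, drop punctuation, expand street keywords, strip leading zeros."""
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    normalized = []
    for token in tokens:
        if token.isdigit():
            token = token.lstrip("0") or "0"
        normalized.append(STREET_WORDS.get(token, token))
    return " ".join(normalized)


def group_jigged_orders(orders: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each normalized address to its order ids, keeping only repeated addresses."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for order_id, address in orders:
        groups[normalize_address(address)].append(order_id)
    return {address: ids for address, ids in groups.items() if len(ids) > 1}
```

For example, `"012 Main St."` and `"12 MAIN STREET"` both normalize to `"12 main street"`, so two orders using those variants end up in the same group.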

do we have any other questions for yohann all right we're okay go for it hi so with this technique let's say professional scalpers so to say what quantity of product might this person be looking at and then how does this person actually get the things um what's the quantity sometimes it can go up to 50 different boxes of let's say sneakers that cost each of them 100 or even i don't know a bit more so they can be sold at a much higher price so you can see in resale markets for specific brands the prices of items can go up to a thousand dollars so when you have 50 boxes you multiply the two numbers

you can make a very large amount of money so this also explains the price of the bot that i showed you wrath aio two thousand dollars this is an investment but eventually sometimes it pays off

all right we are getting to the end of our time so please give yohann a round of applause for bad bots