Smokeloader: The Pandora's box of tricks, payloads and anti-analysis

Name: Smokeloader: The Pandora's box of tricks, payloads and anti-analysis
Uploaded: 2022-10-19
Duration: 47 min 26 s

BSides PDX · 2022 · 47:26 · 4.5K views · Published 2022-10

Speakers

Pim Trouerbach

Tags

Category: Technical

Difficulty: Advanced

Team: Red

Research: Case Studies and Incidents Analysis, Technical Deep-dives

Style: Talk

Mentioned in this talk

Malware

Smokeloader

About this talk

Over the last 2 years I’ve looked into the malware-as-a-service (MaaS) known as Smokeloader. Smokeloader is notorious for being heavily obfuscated and is currently leveraged for delivering additional malware. I took a deep dive into the malware and reversed every stage of the bot. With this knowledge I ended up recreating a client of the bot, which allowed me to view the various commands sent to the bot and to explore the protocol and backend of the C2 panel. This exploration of the protocol led to vulnerabilities in the panel as well as over 10,000 issued payloads for my bots over a 6-month period. In this talk I will document each step of this journey as well as the results of passive intel gathering.

Pim Trouerbach (@Myrtus0x0) is a Senior Reverse Engineer at Proofpoint as well as a researcher on the Cryptolaemus team. His passion is analyzing all aspects of botnets and reimplementing network protocols to infiltrate them. For his day job he mainly focuses on e-crime research and improving detections and extraction capabilities for Proofpoint products. Outside of his day job he writes code for the Cryptolaemus team to process and reverse engineer Emotet & IcedID. Pim is also an alum of Lewis & Clark College, where he graduated in 2018.

BSides Portland is a tax-exempt charitable 501(c)(3) organization founded with the mission to cultivate the Pacific Northwest information security and hacking community by creating local inclusive opportunities for learning, networking, collaboration, and teaching. Twitter - @BSidesPDX

Transcript [en]

next up uh starting well right about now um we've got one of our longer 50 minute talks and I'm very happy to introduce you all to Pim Trouerbach who is going to be uh giving a talk called Smokeloader the Pandora's box of tricks payloads and anti-analysis he's a reverse engineer so everyone give a big welcome to Pim

can you all hear me okay cool all right all right so quick introduction to myself my name is Pim I am a reverse engineer at Proofpoint my main focus is on e-crime botnets so things like Emotet Qbot IcedID and so on but every once in a while I'll get the random APT request where I'll be asked to reverse engineer some nation-state trojans and whatnot I'm a member of the Cryptolaemus team so for those of you that don't know what that is it's a team of roughly 25 researchers all around the world where we for the last three or four years have been fighting with the botnet Emotet and I've done a mix of reverse

engineering for them and software development to do automated malware processing uh my background is in computer science I got a degree from Lewis and Clark uh in computer science um and my first job out of college was a software development role and I had a deep interest in malware at the time but I didn't have any formal training on reverse engineering so I kind of wanted to combine those two things and I started to get pretty good at reverse engineering and I also really enjoyed writing code so for me the kind of nice blend of these two was network protocols and specifically malware network protocols that's kind of what I specialize in for malware analysis

and there's my Twitter and GitHub okay so for the agenda today we'll be going into the history of smoke loader and we'll be getting into the first stage then we'll analyze the final stage and then I'll talk about how I actually achieved a fully static config extraction for this malware family we'll be going into the communication protocol that it uses to communicate with the command and control and then we will talk about the payloads that I actually received from this botnet so what exactly is smoke loader smoke loader is a piece of malware that is classified as a loader and this basically means that its entire job is

to deliver additional malware so you can kind of think of it like the UPS system where people can just send malware through it um it first appeared in 2014 it targets solely windows it's around a 30 kilobyte payload which is pretty small generally you see them around like 100 to 150 to 200 and this malware is actually written in C and assembly now pure assembly is not something you really see in malware too much but in this case which I'll get into later they actually have to do a good chunk of this development in assembly so while smoke loader is a loader it has additional plug-ins that kind of extend its capabilities that'll be for data

exfiltration and just additional actions on objective and whatnot and from a reverse engineering perspective people really like to reverse engineer smoke loader because it's highly obfuscated there's things that smoke loader does that people don't see in other malware families um and since it first came out in 2014 it actually has had continued development throughout its life cycle so they generally have around one update every year or two where they add additional features the entire package if you wanted to purchase the panel a bot and all the plugins and everything it would run you about sixteen hundred dollars and what's interesting here is there is a check where it makes sure that the machine

that it's infected is not a Russian machine and this is something that you cannot remove this is like hard-coded within the bot everything else you can modify but this is a check that is not allowed to be bypassed and finally this is a multi-stage malware so it has that first stage and then if everything goes well with the first stage then it will get onto the final stage of the malware wow that does not look good um all right this talk might be tough with some of these diagrams um but this is basically the listing that they have on the Forum where smoke loader is being sold you can go and see what they're actually what they're

advertising and what modules they have and it's really nice from a reverse engineering perspective when you can just go and see what features they have makes my job a little bit easier okay so the current set of plugins that smoke loader actually has is a form grabber and this is advertised it's really just stealing credentials that are sent in HTTP POST requests and whatnot I'm not sure the efficacy of how any of these work I haven't reverse engineered these plugins then it has a password sniffer which is just going to sniff the network traffic for like FTP credentials and various other credentials the low hanging fruit and then it has a remote PC which basically

acts as TeamViewer so it's not something where they have like their own session they actually join the session of the user and then they have a fake DNS plugin so if you wanted to redirect all traffic from google.com to your own IP you can do that this doesn't work with SSL so it's just purely HTTP traffic and then they have a file search module so you can basically give it a regex and it will find all the files that match on the host and then send them back and then it has a procmon module which was kind of interesting procmon is generally a tool that people will use for incident response and

whatnot but in this case they can basically define events and actions to happen when specific processes are created and then they have a DDoS module a standard keylogger and then the email grabber which is just going to steal the Outlook address book and whatnot so how is smoke loader actually used let's say I was in the market and I want to buy a botnet I want to start my own but I don't have the dev skills to write my own so I'm going to see this smoke loader and I decide that it's the one that I want to purchase so basically I'm going to get the C2 panel a bot and

all the modules and I set up my panel and I start infecting machines let's say somehow I was able to infect 300 400 500 machines then I can go to all my friends and be like hey do you guys have malware that you want to deliver to all these machines that I have infected and they'll be like heck yeah why not and I can basically say well I'll install your malware to 100 machines for a hundred dollars so as I mentioned before it's really this delivery network and you can have a single bot that is tasked to deliver 50 plus malware samples with just its initial check-in to the command and

control so this process is generally referred to as selling loads so the other big malware in this family of loaders is going to be PrivateLoader it basically does the same thing where they sell a service for installing your malware x amount of times but this model has flaws it's something that's kind of used by lower or mid-tier criminals because a lot of these hosts that are infected they're infected with like 30 40 different remote access trojans and infostealers so you really have cases where they're all exfiltrating the same set of credentials over and over that people have seen for the last 10 years so the data you're actually getting is not going to be very valuable

but if you just need like raw compute power for DDoS or stuff then I guess this could be a viable botnet to use okay so now we'll get into the operational model so let's say I'm the admin and I have a couple partners from the command and control server I'm able to send the following to the smoke loader bot I can send the plugins if I purchased them I can send actual raw executables I can send executables that are encrypted with RC4 and I can actually send URLs that point to cleartext executables so it's a way of instructing the bot to download payloads from remote servers basically so in the actual listing of the smoke

loader forum post they actually say you have to crypt the panel they specifically say like this is not FUD like this is a sample that you need to like uh pack basically so for those that don't know uh packing is basically if you were to think of your malware sample as an onion you just add another layer to it of encryption or compression or something and this is basically to defeat basic antivirus and things like VirusTotal checks and so on so let's say we have a packed smoke loader sample and that's the sample that actually lives on disk here so then that is going to get unpacked to our initial stage of smoke loader and then

if all the checks pass in that stage then we are finally going to get within memory the unpacked final smoke loader stage okay so with this understanding of this botnet and having talked with a bunch of other friends and them saying like smoke loader you know it like delivers tons of payloads it would be really cool to see what they're actually delivering I was like well like I have access to a bunch of malware feeds but what if I want to basically turn this delivery network of infections into a delivery network of intelligence data basically like I wanted to turn smoke loader on its head and really turn it into a passive intelligence gathering

tool so this is the process that I came up with so there's a stage one for smoke loader the initial stage that is going to decompress the final stage of smoke loader and then from that we need to extract all the configuration details so the command and control servers the encryption keys the versions and whatnot and then we need to understand the network protocol because we need to be able to write a client for this botnet that can communicate with the command and control without actually causing infections we basically just want to save off all the payloads that were sent so now we'll get into the analysis of smoke loader stage one so stage one is really where all the

interesting obfuscation of the malware lives the functionality of this is really just to check if the host is a viable victim for the botnet so it checks the locale of the machine just to make sure it doesn't have like a Cyrillic keyboard or something like that it checks for sandbox artifacts virtualization processes and then if all these checks pass it's going to decrypt and decompress the actual bot now some of the main obfuscation techniques that smoke loader uses are going to be opaque predicates and then it actually has this technique called runtime function decryption and a slew of other anti-disassembly tricks that's not great either okay um so getting into opaque predicates I

just want to give a quick introduction to what those are so in the top we have our incorrect disassembly so for those that can't see there's basically an instruction that is a jump not zero and it points to location X in memory and then there is a jump zero instruction which points to the same location X now as humans we can see that a jump not zero and a jump zero is going to cover all of your cases it's basically like if you were to write code if you had a conditional that said if true else if false like it's a condition that is always going to happen so we can easily see that this jump is always going to be

taken but disassemblers can't realize this they don't have the ability to process this boolean logic so the disassembler takes the byte immediately after that jump zero and starts disassembling from there but that's actually the incorrect implementation so in the bottom we have the correct disassembly where I told the disassembler don't disassemble from this location disassemble from this one so that's where we can actually see in the bottom here it's doing a pop ECX and then doing a jump that is actually the correct flow and this is not something that is going to have any effect on the malware it's not going to slow it down or cause any issues with its execution this is purely just to make the

lives of reverse Engineers more difficult yikes okay so now we have the runtime function decryption so basically all the functions that are of interest to smoke loader it encrypts their function bodies so normally when you have your source code let's say you're writing a python application you have your function call and then you can just read the source code and see what's happening smoke loader doesn't allow you to do that so basically what it does is when it's calling a function that it has deemed important or basically eighty percent of its functions are encrypted it gets a reference to its current instruction pointer and it gets the size that it needs to decrypt or encrypt and then

when the function is called it will decrypt the rest of the body and then at the end we actually have another call that will encrypt the rest of the body back up so from a static analysis perspective this makes the malware really tough to look at because you're not going to be looking at valid assembly instructions you're going to be looking at encrypted code so the only way to really statically reverse engineer smoke loader is to add additional scripting on top of it so how does smoke loader actually implement that so in the bottom here on the decryption results we actually have a function that I threw in IDA Pro's

decompiler after I did a bunch of work to decrypt the function body and you can see at the top there they have the decrypt code body call and then the rest of that code is basically what I was able to decrypt and then at the end they have another call to decrypt code body which in this case actually encrypts the rest of the body back up so even if you were to take a memory dump of smoke loader you would never have a snapshot in time where all the functions were decrypted so you really have to go the python approach and manually parse these function bodies so in the case of smoke loader it does this by xoring the body

with a single byte xor key so in this case they use 0xEF um so the next thing they do that I found interesting was they have a way to get obscure string references so they have a call instruction here that does a call to location 0x402246 and that basically does a jump to uh the address after these strings now for those that don't know a call instruction in assembly what it really does is it pushes the following address onto the stack and then it does a jump that's all the call instruction does but immediately after they do this call instruction they have a pop into the ESI register so it's basically

giving ESI the address of that SbieDll string which is the Sandboxie dll it's a common tool that people can use for analyzing Windows processes and malware so this is just one way that they reference strings indirectly and this also actually breaks disassembly and the IDA Pro decompiler so just another thing to make static reverse engineering more difficult so now getting into the actual execution flow of what stage one does so the first check that it does is it checks if there's a debugger attached and it does this by reading the process environment block which I'll talk about later and then it loads two dlls kernel32 and user32 and if it detects that it

has a Cyrillic keyboard then the malware will exit but if that check passes it's going to load advapi32 and shell32 and then it actually does something interesting it takes ntdll which is kind of your main Windows dll your lowest level dll and it copies it to the temp directory and loads it from there and this is a technique that malware uses to evade EDR systems because EDR systems commonly look to see if ntdll is being loaded directly so that they can place their hooks into the functions so in this case it's an attempt to bypass that um and then they have some basic checks where it checks its own file name

so like if it's sample.bin or virus.bin or something the malware is not going to execute and it checks if there are dlls loaded within its own process that relate to sandboxes and then it checks if there are VM processes running so it checks for like VirtualBox Parallels VMware and so on and then finally if all those checks pass it has an if statement basically where if the host is a 64-bit system versus a 32-bit system then it will decrypt and decompress the 64-bit payload otherwise it will do the 32-bit payload
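Two of the stage one tricks described above, the jump-not-zero / jump-zero opaque predicate and the single-byte XOR over function bodies, can be sketched in a few lines of Python. This is a minimal illustration and not the speaker's tooling: the function names are hypothetical, the scan is a naive byte search that only handles the short (rel8) jump encodings and can produce false positives, and the start and size of each encrypted body are assumed to be recovered elsewhere.

```python
import struct

XOR_KEY = 0xEF  # single-byte key mentioned in the talk

# x86 short-form opcodes: JZ rel8, JNZ rel8, JMP rel8, NOP
JZ_SHORT, JNZ_SHORT, JMP_SHORT, NOP = 0x74, 0x75, 0xEB, 0x90


def xor_region(image: bytes, start: int, size: int, key: int = XOR_KEY) -> bytes:
    """Return a copy of image with bytes [start, start + size) XORed with key.

    XOR is its own inverse, so the same call both decrypts and re-encrypts.
    """
    buf = bytearray(image)
    for i in range(start, start + size):
        buf[i] ^= key
    return bytes(buf)


def find_opaque_predicates(code: bytes) -> list:
    """Return offsets of back-to-back short JZ/JNZ pairs that share one target."""
    hits = []
    for off in range(len(code) - 3):
        op1, rel1, op2, rel2 = code[off:off + 4]
        if {op1, op2} != {JZ_SHORT, JNZ_SHORT}:
            continue
        # rel8 is signed and relative to the end of its own 2-byte instruction
        target1 = off + 2 + struct.unpack("b", bytes([rel1]))[0]
        target2 = off + 4 + struct.unpack("b", bytes([rel2]))[0]
        if target1 == target2:
            hits.append(off)
    return hits


def patch_opaque_predicate(code: bytes, off: int) -> bytes:
    """Rewrite the pair at off as one unconditional JMP followed by two NOPs."""
    buf = bytearray(code)
    buf[off] = JMP_SHORT  # the first jump's rel8 is kept, so the target is unchanged
    buf[off + 2:off + 4] = bytes([NOP, NOP])
    return bytes(buf)
```

Replacing the pair with an unconditional jump keeps the real control flow intact while giving a linear-sweep disassembler no reason to decode the junk byte that follows the second jump.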

so now if we were to think about this smoke loader initial stage in memory it's basically broken up into three chunks so we have the top chunk being our smoke loader stage one then our second chunk is going to be our 32-bit final stage and then the final bit is going to be our 64-bit so how does smoke loader actually extract this final stage within itself so it decrypts it with a 4 byte xor key which I have in the python implementation there and then it actually uses an algorithm called LZSA to decompress now for those that know things about compression algorithms they're incredibly difficult to implement um and I was actually able to find an

implementation of this decompression algorithm but it's in raw assembly which makes it kind of difficult because you can't really call that from any python bindings and there's no C implementation no nothing so it makes analysis even more difficult there but let's say we were able to decrypt and decompress our payload and now we have our final stage of smoke loader so this is basically what the final stage looks like in a hex editor now if you notice there is no PE header so this is not a valid Windows executable so how does this load well basically it uh it functions as shellcode so it's position independent code where it needs

to be able to resolve its own addresses because normally when you have a Windows executable you can rely on the Windows loader that is going to properly load your executable into memory and make sure that you can make all the function calls that you need to make with shellcode you don't have that feature basically so this final stage needs to be able to resolve those things all by itself so the main features that this final stage really has is just C2 communications and to receive and inject these payloads that's all it really does at the end of the day so since it's shellcode or

effectively shellcode it needs to initialize its main client so there's a couple things that it has to do that normally you would rely on the Windows loader for so it needs to find the correct dll handles so this basically gives it access to the libraries that it needs and then from those libraries it needs to figure out the functions that it wants to call so it needs to be able to find all those addresses for functions and then finally it needs IPs and domains to communicate with because it needs to check into the C2 to get its list of tasks um and then it needs the ability to

gather host information and in this case that information is sent to the command and control server where then the admin of the panel is able to filter their bots by the various information that has been sent okay so how is the smoke loader sample actually able to do this well it reads something called the PEB or the process environment block and it's a structure that is present within all Windows processes and normally it's really for like additional metadata and debugging purposes but in this specific case malware authors love it for this one particular field this Ldr field here and that's basically a pointer to another struct and that struct is this smaller

one on your right and that basically has a list of all the dlls that are loaded within the Windows process so as you're unpacking your malware samples this basically gives you access to a bunch of dlls where normally you would have to do a bunch of math to try and figure out how to call these functions and stuff but in this case they can just rely on the InMemoryOrderModuleList to get handles to dlls so on your right here we can see that they're accessing the PEB and then they're accessing the Ldr field or member and then the

InLoadOrderModuleList so this loop that they're doing here this do while loop is basically their way of hashing the dll's name so this is a common technique that malware uses where it basically can store references to strings without having to place the string within the sample itself because let's say it had the string x64dbg or something in there you wouldn't want to put that within your sample because then just from looking at the strings of the sample you can see well okay well maybe it's looking for a debugger so maybe I don't use that specific debugger but in this case they generate a hash and then they

xor with a four byte value and if that value equals this C3 fd16d then they know that they found the handle for ntdll they know that they can preserve that and use that for later purposes and that is the actual hashing algorithm that smoke loader uses I think this is pretty consistent across all the versions but at this point it has found the dll handle for ntdll and kernel32 and then from there it has these functions which I've named resolve imports into struct where it takes the handles to those dlls along with an array of API hashes basically just hashes of function names and it will iterate over all the functions in the dll hash them and if it

matches the given hash then it saves that address to its own structure so it does this for user32 advapi32 ole32 winhttp and dnsapi so the bot is basically initialized at this point there are two things that it does before it actually returns to making communications and processing payloads it creates two threads where they basically check for malware analysis processes and they use the same hashing algorithm that I listed earlier to basically iterate over the list of running processes and all the window names of all the processes and then they hash those and if they match any of the given hashes then they know that a malware analyst is trying to look at

their malware sample so they quit execution there so if you're ever debugging smoke loader and it just randomly quits on you this might be why so you might want to patch out these create thread calls okay so now that the client is initialized we need to talk about the network communications how the bot actually sends the data that it gathers to the command and control server and what data is actually sent so the bot will send the version of smoke loader so I mentioned earlier they basically have one update every year and the version is just going to be the given year and this version is actually the first thing that is checked within the

um response on the C2 panel as well as the bot side and if that value doesn't match then it discards the rest of the packet and then it's going to send a 41 byte bot ID which is basically a concatenation of host information and effectively for my purposes I just set this to a random string just so that they couldn't block me by bot ID and then you have a hostname which in this case they have a max value of a 16 byte string again I just set this to a random value and then we have the affiliate ID so this is the field that is extremely important so let's say I purchased smoke

loader and I set up my botnet I would name my botnet Pim or something then basically every bot that checks in needs to have the affiliate ID of Pim and if it's not that value then I know that it's a bad bot or that it shouldn't be connecting to my system or it's just not going to get any payloads so you really have to make sure that if you are extracting the configuration of smoke loader that you're able to extract this affiliate ID because it greatly influences how many payloads you get and then it sends the user's privilege just to see if the user is admin and then it sends the Windows version
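The check-in fields listed above can be sketched as a packed structure. This is only an illustration of the idea under loud assumptions: the talk gives the field order, the 41-byte bot ID, and the 16-byte hostname limit, but the integer widths, the 6-byte affiliate field, the byte order, and every name below (`CheckIn`, `CHECKIN_FORMAT`, `version_matches`) are hypothetical, and the real packet is additionally encrypted before it is sent.

```python
import secrets
import string
import struct
from dataclasses import dataclass, field

# Assumed wire layout (little-endian, no padding). Only the 41-byte bot ID and
# the 16-byte hostname limit come from the talk; every other width is a guess.
CHECKIN_FORMAT = "<H41s16s6sBB"


def random_text(length: int) -> str:
    """Random identifier, mirroring the speaker's use of random bot IDs and hostnames."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


@dataclass
class CheckIn:
    version: int       # the release year, e.g. 2022
    affiliate_id: str  # must match the panel's ID to be tasked with payloads
    bot_id: str = field(default_factory=lambda: random_text(40))
    hostname: str = field(default_factory=lambda: random_text(15))
    is_admin: bool = False
    windows_version: int = 10

    def pack(self) -> bytes:
        """Serialize the fields; struct null-pads the fixed-width strings."""
        return struct.pack(
            CHECKIN_FORMAT,
            self.version,
            self.bot_id.encode("ascii"),
            self.hostname.encode("ascii"),
            self.affiliate_id.encode("ascii"),
            int(self.is_admin),
            self.windows_version,
        )


def version_matches(packet: bytes, expected: int) -> bool:
    """Mimic the first check on either side: a wrong version means discard."""
    if len(packet) < 2:
        return False
    return struct.unpack_from("<H", packet)[0] == expected
```

For example, `CheckIn(version=2022, affiliate_id="pim").pack()` yields a 67-byte buffer under this assumed layout, with a fresh random bot ID and hostname on every call.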

so smoke loader actually has support for three commands now botnets they all have support for commands and they generally will have uh five to ten commands or something but smoke loader only has three and they are really just for the purposes of modifying the final stage so they have i which stands for install persistence they have r which stands for remove or uninstall so if you send them like a bad packet or something you might get an r response back that was something that I actually dealt with when developing my smoke loader client and then u is actually an update so if the admin of the botnet buys the new update or new year's version you'll see this command to tell

your bot to update to 2022 or something whoopsies um okay so now that we've analyzed stage two we need to figure out how to extract the configuration from smoke loader um actually wait so first we have to be able to extract the final stage from the initial smoke loader stage and I didn't want to rely on any sandboxes because I just wanted to be able to do this all statically and throw a thousand plus samples at my code and just be given the final stage so I mentioned earlier that for LZSA there is no source code implementation but in this case I actually used the assembly instructions within smoke

loader itself to decompress the final stage so I used something called a CPU emulator in this case I used unicorn and it basically allows you to define a start address and an end address and as long as you set your arguments correctly it will basically decompress the payload without me having to write source code for it so I know that this function in assembly is going to decompress I just have to set up the correct arguments and then I can basically wrap it in Python bindings and then I have this nice little python function that can decompress the final stage so at this point we're able to take 1000 plus samples and we can easily figure out the

xor key and we can decompress the final stage so what actually constitutes a smoke loader configuration so this JSON blob here is basically how I organize all of my configurations I extract configurations from multiple malware families so I will generally have a family field in there where I can easily figure out what the malware family actually is and then we have our list of C2s and then smoke loader actually has two encryption keys for network communication so it has one key that is used to encrypt data going to the command and control server and it has another key that's used for decrypting data from the command and control server and then we have that affiliate ID and

the version and with just this information you have everything you need to create your own smoke loader bot and to start a new bot so how do we actually extract these encryption keys so for some of these like mid-tier or kind of lower tier malware families the way that they actually work is when they have a bot builder it's not like they're putting in the new command and control servers into the source code and then compiling everything and doing it that way basically they take the compiled code and they strip out where the command and control servers are where the encryption keys are and they save those as variables and then when they go

to build a new bot when you pass those fields in it will basically just copy paste them into the raw binary itself so this means for our analysis it makes extraction very easy because offsets for things aren't going to change they're not going to change bytes for assembly instructions so in that case I actually was able to use regex which I'll get into here so this is the python code that I actually use to extract the encryption key so we Define our regex now people generally use regex for Strings and whatnot but there's nothing stopping you from using it for assembly instructions and then we iterate over all the matches for that regex and then we unpack them

with the correct endianness and we append it to our list and return the list now I never had a case here where I got more than one result so I guess that's just showing that this technique can be really valuable when you have these like template builder malware families and then we can do the same thing for version extraction we identify where the version is stored or referenced within the assembly instructions and then we create a regex based off that iterate over all the matches then we can do some light checking in this case I just make sure it's a value that's not over 0xffff I should really just do the year um but this approach works extremely

well for the final stage I think I processed 200 300 samples and I didn't have a single one where I wasn't able to extract this information so now for the command and control servers for this malware the way it works is they have a global uh array basically of string pointers or byte pointers where each one is going to be an offset to an encrypted command and control server so it iterates over that global list of command and control pointers and it reads the first value and that's going to be the length of the command and control server and then it reads the next four bytes and that's going to be

your RC4 key to decrypt the command and control server. Finally, with that four-byte key we can RC4-decrypt the number of bytes given by the length we extracted earlier, and from there we have our command and control. In Smokeloader they use anywhere from two to ten C2s, from what I've seen. They either follow a pattern, in this case "host file host" and then some random number (they follow that pattern a lot, where it'll be a concatenation of three words and then a random number at the end), or they'll just go full random-string mode and have like six different domains in there that are all random

strings. Putting this all together, I wrote a bunch of code that can find all these obfuscation techniques, strip them out of the binary, decrypt all the function bodies, and decompress the final stage, and it does this all statically. So hopefully... aha. It found a bunch of opaque predicates here, and I actually patched those bytes out, so it's really nice and easy to look at Smokeloader in a disassembler. Then we identify all of the function calls where it goes to decrypt a function body and decrypt all of those, and then we identify the start and end addresses of the 64-bit and 32-bit payloads, and then we're able to decrypt

and decompress the final stage. From that we can run our config extraction, which is the output here in JSON. This is something that I'll open source, so if people want ideas for how to do a more complex config extraction, they can use this as a reference. Okay, so now we have this pipeline where we're able to take stage-one Smokeloader samples and extract configurations from them fully statically, so we don't have to rely on any sandboxes or anything. Now we need to actually implement the Smokeloader bot. For the data that gets sent to the C2, this is the cleartext data that's actually sent.
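As an aside, the static config extraction described above (a regex over instruction bytes to pull the 4-byte key, then RC4 for each C2 entry) can be sketched roughly as follows. This is a minimal illustration and not the speaker's tool: the byte pattern in `KEY_PATTERN` is a made-up placeholder rather than the real Smokeloader signature, and the entry layout (one length byte, four key bytes, then ciphertext) is an assumption based only on the order in which the fields are described in the talk.

```python
import re
import struct

# Hypothetical pattern: "push imm32" (0x68) followed by a "call" opcode (0xE8).
# The real signature is not given in the talk; this is only a placeholder.
KEY_PATTERN = re.compile(rb"\x68(.{4})\xe8", re.DOTALL)


def extract_keys(binary: bytes) -> list:
    """Return every 4-byte immediate matched by KEY_PATTERN as a little-endian int."""
    keys = []
    for match in KEY_PATTERN.finditer(binary):
        (value,) = struct.unpack("<I", match.group(1))
        keys.append(value)
    return keys


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4: key scheduling, then XOR the data with the keystream."""
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)


def decrypt_c2_entry(blob: bytes, offset: int) -> str:
    """Decrypt one C2 entry. Assumed layout: [1-byte length][4-byte key][ciphertext]."""
    length = blob[offset]
    key = blob[offset + 1 : offset + 5]
    ciphertext = blob[offset + 5 : offset + 5 + length]
    return rc4(key, ciphertext).decode("ascii", errors="replace")
```

In a real extractor, `decrypt_c2_entry` would be called once per pointer in the global C2 table, with `offset` being the file offset that the pointer resolves to.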

The packet gets encrypted with that 4-byte RC4 key that I mentioned in the JSON configuration. The first value it sends is going to be the version, and this is the first thing that the malware checks. If this value is off at all, it discards the rest of the packet. So if you want to do any fuzzing or any sort of analysis of the network protocol, you have to make sure you understand the protocol, so that it's not just going to discard all of the data you're sending to it. Then we have our bot ID; in this case I again just used a random string. We have our

computer name, which I just set to a random string. Then you have your affiliate ID, which is the only field that really has to be set correctly. Then we have our Windows version; in this case I was pretending to be a Windows 10 machine. Then we have our win bit, and I don't actually know what that one is, and then the bot's privilege level, which is basically the privilege level of the user that executed the process. Then we have our command ID, our command option, and our command result. Those are the three fields that are going to change across the packets that we send to the command and control

server. Everything else is fully static; you can just hard-code those for the entirety of the bot's lifetime. Finally, it appends random data to the packet, or data that's being exfiltrated. For the response packet it's the same kind of thing, where the first two bytes are going to be the version. The bot makes the same check, and if it's not that value, it discards the rest of the packet. Then the server sends a two-byte value that corresponds to a number. In this case it corresponds to 48, and this basically tells the bot: there are 48 tasks for you

to download, so make 48 requests to download them, and you'll get a payload back. Then we have this hard-coded separator here, and finally we have this plugin string, which is basically plugin_size= followed by a value. If that value is not zero, then the rest of the data in this packet is going to be all the plugins, encrypted, and it basically tells the bot: hey, there are plugins that you need to process. So how does this actually look over the wire? I set up my client, let it run, and captured some traffic in Wireshark. In this case everything is made

through POST requests to /, carrying the RC4-encrypted packet. But if we look at the response, it's actually returning a 404. This is something you have to keep in mind when dealing with malware systems: they aren't going to follow standard practices like returning a 200 for a valid response. I think the bot actually checks that the server returned a 404 before it starts processing the rest of the data. So just keep in mind that malware systems don't have to abide by the same rules that standard developers do. Now we need to get into the order of

communications that the malware needs to send. We start off with that 10001 command, which basically puts us in the botnet panel. At that point we are a live bot, we've incremented the number of bots that this botmaster has, and it's going to tell us we need to pull 48 tasks. So we make a bunch of requests: we send the command 10002, which returns the specific tasks that we need to inject, write to disk, or in some form or another execute. Then we need to confirm back to the command and control server that yes, this actually worked, we properly

installed this malware. Like I mentioned earlier, this process is basically called selling loads, so the panel will then increment the loads counter and say yes, that is another load for this actor. So, proof of concept: I wrote a bot for Smokeloader in Go, and I set up my own command and control server here. Basically what it's doing is... oopsies. Here we initialize our bot, so we're registering the bot with the command and control. In my panel I had it say there is a payload at google.com or twitter.com, and then I put in an actual payload that is being sent to Smokeloader bots.

So at this point I have a fully working bot where we're able to pull tasks and dump them to disk with various metadata, and we can confirm back that yes, this was installed, when in fact it wasn't. But I have yet to address the plugins, so I'll go over now what the structure of these plugins is and how to parse them. It's not something that I really saw people talk about anywhere, so if anyone's looking at Smokeloader and this helps you, then great. There are two structures for the plugins. The main one is going to be this smoke plugin container, which is that

bottom one there. When the bot registers with the command and control, if there are plugins, the server returns a blob of RC4-encrypted data. That data is basically this smoke plugin container. It contains information about how many plugins there are, and then the bot goes into a loop to process all the plugins. For the actual plugin itself, which is this top struct here, the first value gives the size of the plugin, followed by a 15-byte RC4 key, which I thought was kind of interesting. You don't generally see 15-byte RC4 keys; they'll generally keep it at 16 or 32 or something. They actually use that

RC4 key to decrypt the plugin and inject it into memory. Now, I didn't get a chance to reverse engineer these plugins, because they're in kind of the same format as the final stage of Smokeloader, where there is no valid header, so they're kind of a pain to analyze and I just didn't have enough time. But now we've implemented our bot and we're able to pull payloads, so we're at the stage where we need to set up an environment where we can passively pull payloads and gain that intelligence from this botnet. For the setup, I set up a Raspberry Pi under my desk about 10 months ago. The

bot gathered configurations from various sources, registered with the command and control server, and made the appropriate number of download requests, where we would get a bunch of payloads. Those would be written to disk, and then we could post-process them later. Some caveats here: I did not run the bots for very long. I basically let the bot register with the command and control and went through that first loop of tasks to pull, so I never ran a bot for, I think, more than five minutes. I never switched geolocations or used proxies of any kind. Everything was just done from my apartment here in Portland, so I'm sure

my IP appears in a bunch of Smokeloader panels right now. I randomized the data, but I didn't make it look believable; I didn't have actual names in the username or actual host information in the hostname, so that was something I probably could have done better. But I wrote all the information from all these payloads out into a CSV, which is basically what we're seeing there in the screenshot. So, some results. I get that that's really small, so I'll just read them out. Over an eight-month period I captured ten thousand samples, and I was feeling really good about myself, thinking, hey, I got this really cool malware feed

where I have definitive proof that Smokeloader is delivering X malware at this time with this affiliate ID. But really only 2,500 of those were valid PE files, and I just did not parse the results properly. A good chunk of them were HTML files and 403 responses, so that was my bad. But all the PE files that I actually got, I submitted to the Hatching Triage sandbox, and if it was able to classify one as X, Y, or Z malware family, then I kept a record of it in that CSV. So, some of the most significant results I got from this work: I apparently got 614 Djvu samples, which,

for those that don't know, is a ransomware written in Go. But I think the signatures for this aren't very good. Go binaries are quite large in nature, and I've seen quite a few false positives with Djvu in the past, so that might need future exploration. Then we have Arkei and RedLine, which are kind of your standard, run-of-the-mill info stealers. But what's interesting here is we have Smokeloader delivering Smokeloader, which is that first bar there. So Smokeloader admins are paying other Smokeloader admins to deliver their own Smokeloader, and you have this weird recursive system of

Smokeloader delivering Smokeloader. Some other interesting ones here are IcedID and Gozi, or ISFB, however you feel like calling it. Those are two more mature malware operations that don't really have to rely on this selling-loads model; they have their own ways of sending malspam and they don't have to put all their effort into this. So I thought it was kind of interesting that they were using this selling-loads model to load their malware, but who am I? And then there was also Tofsee, which, for those that don't know, is a spam botnet. So it was kind of this other case where you

have Smokeloader distributing malware, which then delivers Tofsee, which then sends more malware. These hosts get so mangled and have so much malware running on them that I'm sure they aren't running very well. Some additional observations: when I started this and set up the bot, I was not able to properly extract the affiliate ID. I knew that there were some main ones, in this case pub1, pub2, and pub5. I think there's a pub4 and maybe even a pub3 in there, but I had it hard-coded to these. So we can kind of see

that pub1 is the most active one, so in the future I would like to figure out why I got the most payloads there. Some reflections from this work: I was pretty bummed that a good chunk of the results I was saving to disk weren't valid PE files. I think if I had actually looked at my data throughout this process and seen that a good chunk of it wasn't valid, I would have done better work to make my bots more believable, and I would have added IP rotation to the setup, because in the Smokeloader panel you can actually say, I want this payload to go to bots in this

geolocation. For the longest time, since 2020 actually, version 2020 was the latest version that was out, but then three months ago they came out with version 2022, which I was not aware of. So most of this work was done with 2020, but I tried my bot against the 2022 version of the panel and it works fine, so they don't have a network protocol update; it seems to be just plugin-related. I would also like to reverse engineer the plugins, just to see that they're actually doing what they're advertised as doing. I need to look into the affiliate IDs and how they're extracted. The way that I

know to extract them is to read the last four bytes of the initial stage, but that does not work for all the samples, so I need to figure out how it's actually implemented for the samples where it doesn't work. And right now the Unicorn component basically reads in an entire legitimate Smokeloader sample. It doesn't execute any of the other functions, but it just feels weird having my software rely on a legitimate Smokeloader sample. So, some additional resources. I did most of my analysis with just my own reverse engineering experience, but here are some cases where other people reverse engineered Smokeloader; they might have different results. Open Analysis recently

did a fantastic, I think, three- or four-part series on analyzing Smokeloader, so I definitely recommend checking that out if you're interested. And n1ght-w0lf and CERT.PL put out good resources on Smokeloader that go over the obfuscation techniques and some of the other things that I didn't cover. So what will I be releasing from all this? I'm going to be putting up somewhere all the malware samples that I actually got, so if people are interested in investigating this botnet, they can go and take that data set and maybe find inferences that I couldn't come up with. I'm happy to share my IDA analysis files if people are interested, the slides for the talk, and then that config extraction tool

that I wrote. I'll be open sourcing that; I just have to remove that valid Smokeloader sample and strip it down to just the decompression code. And then finally, a CSV containing all of the results that I got throughout this eight-month period. So yeah, thank you. If people have questions, I'm happy to take those.
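On the reflection that only about a quarter of the saved downloads were valid PE files: a cheap header check at download time would have flagged the HTML pages and 403 bodies immediately. The following is a minimal sketch of such a check, not something shown in the talk; the function name `looks_like_pe` is made up for illustration, and it only verifies the DOS and PE signatures rather than fully parsing the file.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Return True if data starts with 'MZ' and e_lfanew points at the PE signature."""
    # The DOS header is 0x40 bytes long; e_lfanew is the 4-byte field at offset 0x3C.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # An out-of-range offset yields an empty slice, which simply fails the comparison.
    return data[pe_offset : pe_offset + 4] == b"PE\x00\x00"
```

A collection script could call this on each response body before writing it to disk, and log the rejected responses separately instead of counting them as samples.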

I have a question about the payloads. If you delivered OllyDbg as a payload and ran it, would it disable the botnet? Yeah, I mean, you would have to tell the debugger how to initialize it to debug itself, but yeah, you definitely could. You could just start up a process; you wouldn't even actually have to deliver OllyDbg, because it just checks by the executable name. So if you gain access to a Smokeloader panel, you can go and push tasks and cause all the bots to shut down. Totally viable. Wouldn't recommend it, but, you know, up to you.

So during your career of, you know, reverse engineering, were you at any point able to trace back to the command center and where it was hosted? Is it maybe in a major cloud provider like Azure, AWS, GCP, stuff like that? And if you did, what were those cloud providers' responses? So I looked into some of these command and control servers a couple of months ago. They're at kind of shadier hosting providers, so if you send them a request saying, hey, you have a valid C2 panel, they'll ignore you. That's generally how it works for some of these shadier places. But I forgot to mention: for the actual 2018 version of Smokeloader,

the panel was leaked, so if anyone wants to go look at what a C2 panel looks like for an enterprise-grade malware operation, you can go do that. That's actually how I set up my own command and control server for it. Any final questions? All righty, well, thank you all. I definitely appreciate you all letting me present to you. [Applause]
