
Hi. I'm going to talk to you about some of the work that we do at AnubisNetworks, and about the work that we did on a particular botnet. It is interesting because it has some characteristics that make it good for a presentation: it's not too sophisticated, but it still has some interesting features. Also, when it was first detected we didn't know what it was, so it's good because we can show the whole process, from the detection of the DGA to the understanding of the malware and the reversing of the DGA algorithm.

So, a short bit about me: I'm a technical security enthusiast. I started working in security some time ago, with penetration testing and technical security
consulting. Later I started doing some forensics, which eventually led me to malware reversing, and I just started to really enjoy it. Now I'm at AnubisNetworks, where I get to do reversing, malware research and botnet tracking, so it's pretty cool.

So, what we're going to see: the idea is to give you a short overview of what we do at AnubisNetworks and of the whole process, from the detection of a new DGA to the understanding of what malware it is. In the first part of the presentation I'll show you a bit of how we detect DGAs, and in the second part
we'll go a little bit deeper into reverse engineering and show you how the reverse engineering process usually goes and how we use it to understand the inner workings of malware.

So, detecting DGA malware. First of all, just a short slide to give a short introduction to DGAs. DGAs are domain generation algorithms. They are simply algorithms that are hard-coded into the malware and that generate the domains used for command and control of that malware. The main objective of using DGAs, from a malware point of view, is to increase resiliency, so the botnet is harder to take down. If you take down one command and control server, the malware will keep
on generating new domains, and all the attacker has to do is register one of those domains and the infected bots will connect to the new domain. So it's harder to take down these kinds of botnets.

There are different kinds of DGAs. There are fixed DGAs, which simply generate the same set of domains over and over again. One example is a botnet that just generates a list of, I think, 300 domains and iterates over those 300 domains. There are also DGAs that have a hard-coded seed. The seed is a value or a
number that is used to produce a different set of domains. What the botmaster can do is, after a while, simply change the seed and send out a new campaign, and the new bots will connect to a whole different set of domains. So even if there is some sinkholing and some people are detecting one group of hosts, the new seed will cause the DGA to generate a completely different set of domains.

Another common type is the DGA with a time-based seed. Here the seed is based on time, so the malware will not keep generating the same set of domains; it will generate different domains every day or every hour, because it is based
on the time.

Finally, there are some tougher ones, like the DGAs whose seed is based on an external value. A good example of these is the Bedep malware; its DGA has been published recently. What it does is use the currency exchange rates for a given day as a seed to generate the domains. We cannot predict the currency exchange rates for tomorrow. If we could, we would be rich, and the guys behind the botnets would not need to create botnets at all. So it generates new domains every day that are unpredictable: even if you have the algorithm, you cannot generate the domains for the future.

So now, on the
left you have an example of a DGA that we saw recently. It looks like just a list of random domains. Actually it isn't random, it is generated by an algorithm, but it looks like random domains. That's an example of a DGA.

So how do we detect it? This slide shows a bit of how we detect DGAs, how we get to the domains, and how we get to sinkhole the domains that the malware connects to, in order to show them in the Cyberfeed product, which is the flagship product of AnubisNetworks. We have two main entry points, which are the top boxes: there's a network-based DGA detection and
there is also a file-based malware analysis system. On the network-based DGA detection, what we do is process network traffic from internal and external sources, and we have algorithms to detect when something that looks like a DGA appears in that traffic, in the DNS traffic. When it does, we detect a new DGA, and then we use that to sinkhole it and start receiving connections from those bots.

Another thing that we do is file-based malware analysis in an automated fashion. We run thousands of samples every day and we see if those samples try to connect to non-existing domains. For instance, a sample that is executed and tries to connect to a large
number of non-existing domains is probably running a DGA. So that's another way to detect DGAs. Finally, we use the information from both these sources and try to connect them together to understand what malware is connecting to each DGA, because lots of times we identify the DGA and we know that it's probably malicious, but we don't know what malware it is, what family it is, or why it is connecting to those domains.

Just to show a short example: on the right is an image from an internal system of ours that shows a DGA being detected. The little dots that you
see are hosts. You see that there are lots of IPs that connect to lots of different hosts, and the same IPs connect to these hosts. This is a typical pattern of a DGA, and this is what lets us know: hey, this is probably a DGA, let's look into this, let's sinkhole it and see what it is.

What we do is largely automated, but it still has some manual processing at the end. The detection is largely automated: we detect the DGAs as they appear, and we use our sinkhole infrastructure to sinkhole those domains and process the thousands of hits per second that we
receive and the millions of infections they pertain to. After that, the DGA processing is mostly manual. Some parts are automated; what can be automated, we try to automate, but there's still some manual investigation going on, and this is what we'll show you next.

What are the advantages of this network-based DGA detection? We can detect new DGAs as they appear. As soon as a new malware is infecting machines and all those machines are trying to connect to these domains, we detect it. But most times we detect it and we don't know what it is. We know the domains and we can see parts of the
protocol that connects to our sinkholes, but sometimes we don't know what it is. That brings us to a question that comes up lots of times; there are lots of emails with this kind of question: what botnet is this? Because sometimes you just don't know what it is.

This brings us to the next part, the file-based automatic malware analysis. It's a sandbox that processes tens of thousands of samples; it is constantly processing updated samples. It has the following features, which are pretty common features for sandboxes these days. Basic static analysis: it finds the PE headers, the strings, the imports and the exports. That's the basic static analysis that it does.
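The strings step of this kind of static analysis is simple enough to sketch. The following is a minimal Python illustration, not the sandbox's actual code; the function name `extract_strings`, the four-character minimum and the sample bytes are assumptions made up for the example.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII of at least min_len bytes, in file order."""
    # \x20-\x7e is the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Hypothetical bytes standing in for part of an executable file.
sample = b"MZ\x90\x00\x03\x00kernel32.dll\x00\x01\x02VirtualAlloc\x00ab\x00"
print(extract_strings(sample))
```

A listing like this that comes back nearly empty, or full of meaningless fragments, is itself a hint worth noting, since packed samples hide their real strings.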
Dynamic analysis: it runs the malware and tries to find what the malware does, which is what files it creates, what registry keys it creates or changes, mutexes, and what network traffic it generates. It also has a nice feature: it is able to use physical machines. There is some malware that won't run because it has measures implemented to detect virtual machines, so we have some instances that run on physical machines and are able to run this malware automatically, which is pretty cool stuff. There is also a public instance if you want to try it. There's a URL out there, it's free, and you can use it any time you want:
just upload the sample, we run it and give you a report. So this is what it does: given a sample, it gives us a report.

We use it for two main things. First, to detect malware that is using DGAs. We do this by detecting malware that tries to connect to domains that don't exist, because when you use a DGA, most of the domains that the malware connects to will come back as non-existent, and that's a way to detect it. The other, and the main reason why we use this sandbox, is to identify samples of botnets that are already sinkholed because we detected them with our network-based analysis
system, but we don't know what they are; we don't know what family or what malware is behind them. So we can implement YARA and Snort rules in the sandbox to detect the patterns that we are seeing on our sinkholes, and we get alerted as soon as a sample that matches those patterns is found.

This is an example of the interface. You can see here that this tab shows the NX domains: the fourth column shows the domains that the malware tried to connect to that don't exist. This is just a static image, so we can't scroll, but you will see that malware that connects to that many
domains is probably using a DGA. So this is an example of a detection.

So finally, we've done all this and we've identified the DGA, either through the network analysis or through the file analysis. What we do next is try to aggregate this information so that we can add it to Cyberfeed and give it some context. To add it to Cyberfeed we need to at least tell what family it is, and ideally we need to give some more information than that: what behavior the malware has, what it does, what its intentions are. So what we do is, first of all, we
try to query all the sources of information that we have and try to identify the malware as soon as possible. If we can't identify it, we put signatures in place to detect it, and finally, when we detect a sample, we can reverse engineer it and understand what it does and, if it has a DGA, understand the DGA. Finally, we classify the malware and add the context to Cyberfeed.

So this is the first part, which shows you how we detect this malware. The next part of the presentation will be focused on the reversing part of it.

So the whole story is: we detected a
botnet that appeared on our network-based detection systems. Those are some of the domains which we saw showing up, and we didn't know what it was. We sinkholed it and we looked at all the OSINT sources there were at the time, and there was simply no information about it. We saw that it was targeting almost exclusively the US and the UK, but that was all we knew. We saw the traffic patterns arriving at the sinkholes, we saw the domains, but nothing else: there was no analysis on VirusTotal or elsewhere that showed anything like it, nothing that we could use to identify this botnet. So we had to create signatures and wait until we found something, and
this again brings us to that question that shows up sometimes: what botnet is this?

So what did we have to look at to try to understand what botnet this is? We had the geolocation, the US and the UK, which is not much help. We also had some curious HTTP headers. You can see here the data that we can extract on Cyberfeed: it has an HTTP header named Check, which is pretty strange and not commonly seen. So we knew we had this Check header, and we tried to find some information about malware that has a Check header in its network communication, and found nothing. And you can see two
other interesting features. It uses a User-Agent saying Microsoft-CryptoAPI/6.3, which seems pretty legit. It also uses a URL path that is the real path Microsoft Windows uses to download the certificate revocation list. So it was probably trying to blend into common network traffic, by using a URL that is used by Microsoft Windows and that shows up all the time on web proxies and in proxy logs. So it was trying to blend in.

This was the information that we had, and we used it to create our signatures and put them in place. Eventually we found a sample. Once we have a sample, the immediate questions are: is this
categorized and detected by antivirus software? Is there somebody who has written a paper on it, who has already reversed this malware and knows what it does? So, what does it do? Another important question is: do we have visibility over the whole botnet on our sinkholes, and if not, can we increase it? Because, if you remember the types of DGAs, if it is for instance a seed-based DGA, there can be several seeds, and if we are only sinkholing one of the seeds we are not seeing the whole botnet, just a small subset of it. So we need to understand what kind of DGA it is, to understand whether we have it all.
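To make the coverage concern concrete, here is a minimal sketch of a seed-based DGA in Python. It is not the algorithm of this botnet: the hashing scheme, the function name `generate_domains`, the seed values and the TLD list are all assumptions made up for illustration. It only shows that two seeds yield unrelated domain lists, so sinkholing the domains of one seed says nothing about bots built with another.

```python
import hashlib

# Hypothetical TLD list, chosen only for this illustration.
TLDS = ["com", "net", "org"]


def generate_domains(seed: int, count: int = 5, length: int = 12) -> list[str]:
    """Toy seeded DGA: derive each domain from a hash of (seed, index)."""
    domains = []
    for index in range(count):
        digest = hashlib.sha256(f"{seed}:{index}".encode()).digest()
        # Map the first `length` digest bytes onto lowercase letters.
        label = "".join(chr(ord("a") + byte % 26) for byte in digest[:length])
        # Pick the TLD from the last digest byte.
        tld = TLDS[digest[-1] % len(TLDS)]
        domains.append(f"{label}.{tld}")
    return domains


# Two made-up campaign seeds: same algorithm, different domain lists.
campaign_a = generate_domains(seed=0x1234)
campaign_b = generate_domains(seed=0x5678)
print(campaign_a)
print(campaign_b)
```

Because the output depends only on the seed, anyone who recovers the algorithm and the seed from a sample can pre-compute the same list, which is what makes sinkholing possible in the first place.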
If not, how can we get the other seeds and increase our visibility? Also, we received that Check header and some POST data on the sinkholes, and we wanted to understand what they were. The data seemed to be encrypted; it was pretty much garbage, and we didn't know what it was. We wanted to understand it to see whether there was information in there that we could use, for instance, to identify unique infections instead of IPs.

So we started doing some reversing of the malware, following pretty much the standard reverse engineering process for malware. We started with basic static analysis: we looked at the strings inside the malware, the
headers of the PE file and the imports. What we found was that there were no useful strings in the malware; it was just garbage, nothing readable. The imports were also few; it had a pretty small import table. This told us that the malware was most likely packed. Most malware uses some form of packing in order to hide its contents, make the reversing process a little more difficult, and avoid antivirus detection. How it works is this: there is a section, a part of the file, that contains the real code in an encrypted or encoded format, and then
it will have a small piece of code, a stub, that runs first to decode the real part of the malware and then jumps to that part and runs the real code. That's a simple overview of what packing is. It can be encrypted or it can just use XOR; there are several ways of doing it.

So it is probably packed, and there's not much we can see from basic static analysis. So we started by running it, with some monitoring tools. We saw that once it ran, it created a few files. It creates a file called Software Protection Platform, which sounds pretty legit, and adds it to the System32
directory. It creates some temporary files with random names. It also uses a common registry key that's used for persistence: it adds itself to CurrentVersion\Run, so every time the user logs in, it runs. So it uses this feature to add persistence on the system. And, as if that wasn't enough, it has a second mechanism for persistence: it creates a Windows service that also points to that first file. Finally, it starts trying to resolve all the domains of the DGA and to connect to them.

So this is what we got from the dynamic analysis, but still we
knew pretty much the same. We knew a few files it created, and we knew it connected to the domains that we were sinkholing. So, OK, this is the sample, this is the one, but still: what does it do? This doesn't tell us much, so we had to do some reversing on it.

The first thing we did was try to identify the packer: how is this malware packed? We used a common program for this, PEiD. We ran it, and it identified the file as FASM 1.5, which is flat assembler. So it was telling us: hey, this is written in assembly, there is no known packer. So what we were looking at was
probably just a packer written in assembly; that was the conclusion we reached. There are a few other packer identifiers out there, and they all returned the same result, saying there is no known packer. So we could not use an automated unpacker. For instance, UPX is a common packer, and things packed with UPX can be unpacked using UPX itself: just run a tool and you have the original malware in your hands. Here we had to do some manual unpacking. Manual unpacking is not hard most of the time; it's just a question of getting used to it, and it gets easier with time. So what we did was attach the OllyDbg
debugger to the process. We started the malware and set a breakpoint on the VirtualAlloc function. This function is commonly used to allocate a region of memory that the malware then uses to write the unpacked malware code into, so that it can jump to it afterwards. So we knew that once we got to the execution of the VirtualAlloc function, the real code would not be far. That's what we did, and after that we just took the return value from VirtualAlloc, which gave us the memory section that was created to store the real code. Then we just created breakpoints so that when code
started running in that section, we would know: OK, this is now the original code that was packed. At that point we would dump the process.

We have some examples here that show what we did. In the first image you see the VirtualAlloc function just after it has finished being called. You can look at the EAX register, in the red circle, which shows you the return value of the function. So we knew that the unpacked code would probably be written into that section. Then, looking at that section, we saw that it already had read, write and execute privileges. So this is
probably it, and the malware will jump to it and execute from there. So we just set a few breakpoints, and finally the malware landed in this section. Once it landed here (it's not very easy to see, but in this lower section you can see the MZ header that is typical of the beginning of a PE file), we knew this was the beginning of the section. So we just used a plug-in for OllyDbg called OllyDump: we dumped the memory section, set the appropriate entry point, and we got the original code of the malware, which was now unpacked.

After unpacking, things were a lot different. There were more imports, and there were
several functions. The binary had been stripped, so there were no names on the functions, which makes the reversing a little bit more difficult. And there were lots of strings now that had some meaning. You can see, for instance, that it had the URL path, with pki and mscorp. So all that was now accessible, and we could start reversing it.

After unpacking, we wanted to know what it actually does, after all. Just by running strings we saw a lot of information that gave us clues about what it was. It had strings like screenshot, so it probably takes screenshots, and a lot of functions that are used to take screenshots were imported. It also had this
info string of some kind, so it probably gathers system information. It had .pdf.exe, so there's something going on with PDFs. It had some strange names, but they gave us an idea of what it probably did. It also had a group of commands that it runs, like tasklist and queries on registry keys, so it pretty much gathers some information. That's what we saw.

Following this, we took the .pdf.exe string in memory, saw using IDA Pro where it was referenced, and started reading the assembly to see what it did. We got to the function and understood it just by skimming; we didn't have to go through it
line by line. Sometimes, if you just skim through the code and see what calls are being made, you get the general idea of what it is doing. What we saw was that it was actively searching for .pdf and .txt files on the system; it then did something to them and replaced them. For instance, if it found a .pdf file, it would replace it with a file with the same name plus .exe. So it was probably infecting the files with itself; that looks like what it was doing.

Looking further into it, we started to
see that it also had some man-in-the-browser capabilities. It hooked the libraries of Internet Explorer, Google Chrome and Firefox. In the Internet Explorer hooking function, it had this memory structure, which is pretty useful because it shows the name of each function that it hooks and, underneath it, the function that it replaces it with. So it was pretty neat: it showed exactly what was being replaced by what, and it was pretty easy to see what it did. I didn't put the code here because it would be too much, but what it was doing was capturing every POST that the
browser made, every HTTP POST, and sending it to the command and control. It didn't have what is typical of banking malware, which is configuration files that list some targeted banks so that it steals information for a specific bank or something. This was just capturing everything: if it's a POST request, capture it and send it all the way to the command and control.

After this we just walked through the code and found the main loop. Every time it checks in with the command and control and receives the answer, it does one of three things: it will take a
screenshot, or it will execute a file, or it will run the system information commands. Of course, we gave these names ourselves. The original function names were just sub_ followed by a number, the names that IDA usually assigns, so they didn't have any meaning. As we reversed the functions, we manually gave them meaningful names, so the code was much easier to read.

So at this point we knew pretty much what the malware did. There were a few other answers we wanted: we wanted to know what the DGA was and how it worked, and we wanted to know what the network protocol was and if we could extract
information from it. So we started work on discovering the DGA.

To discover the DGA, the same kind of method was used. We found a string that we knew; actually, not a string: we knew that it connected using HTTP, so we looked for HTTP-related library functions that make a request. We found InternetOpen and a few other related functions, and thought: OK, this is the function that makes the request to the C2. So we just backtraced the assembly code until we found the DGA. We backtraced it using IDA Pro, which has this feature that creates the call chart that you can see here. So we started from the function which,
as we saw, had that InternetOpen call, and we just asked IDA to create a chart of all the functions that feed into it, and it did. We saw that it was called by the POST capture, the screenshot sender and so on, and they all went through this function that we later named prepare connection. So most likely prepare connection is getting the domain that will be used for the connection, and that's where we started looking.

Eventually, in the assembly, we saw that there was a function being called just before prepare connection, and you can see that prepare connection
receives the URL path, which is the certificate revocation list path; it receives the HTTP method, POST, and a few other things; and it also needed to receive the domain. And it did, because the domain was built in that previous call, to the function that, remember, we renamed DGA. So we started looking into that function. It was not too complex, but it had several sub-functions. Usually here we have to read the assembly and just reverse it back into code. What I'm showing here is that this last part shows you where it decides whether the top-level domain will be biz, info, org, net or com, and this
section shows you the initialization string that it uses; it then goes through some mathematical operations in order to create the string of the domain. There was also another part, which I didn't put here, that connects these two parts. So we just reversed it, and here is the equivalent Python code for it. This is this part of the same algorithm, written in Python, and this is this part. So it has this initialization string, it goes through some operations, and it generates the domain.

At this point we had discovered the DGA, and we could now add it to our DGA database. We also knew that it was statically seeded. Here is the whole
DGA code, and if you look here, this is where the seed is included. It was set in another section of the code, which was actually pretty hard to find, but it had a value and was stored in a global variable. The malware author could use this and at any time change the seed, and the malware would then generate a different set of domains and try to connect to those. So we knew that we might not have coverage of the whole botnet; we might just be seeing one seed of the botnet and not the whole of it. However, we started searching VirusTotal and getting our hands on every
single sample that we could get that was related to this malware, and we didn't see any sample that had a different seed. So that was it.

Unfortunately, some time after we started reversing it (actually, we started reversing it some time after the botnet had started, so it was at the same time that we were reversing it), the author was making a change to the DGA. Basically, the seed-based DGA started to be used only as a fallback mechanism. The malware started using hard-coded IP addresses for the C2, and only when those could not be reached, because they had been
taken down, would the malware start to use the DGA to try to regain connection to its command and control. This was not good for us, because when the DGA is a fallback mechanism we only see the bots for a while: the malware author regains control of the botnet and we stop seeing them. Still, we had an idea that we had coverage of the whole botnet at that time, with its few thousand infections. Those infections stayed there; the malware writers just abandoned them, and they slowly decreased over time.

So we had the DGA, which is nice. We knew what the malware did, we knew
how it was connecting to us, and we knew that we had coverage of the whole botnet, of that version at least. Now we needed to understand the network protocol.

This is an image from Wireshark that shows the HTTP request that was arriving at our sinkholes. It is a common POST request to that Microsoft certificate revocation list path. Using a POST to request a certificate revocation list was strange, the Check header was strange, and then the POST payload seemed to be encrypted. So how would we go about trying to understand this? We just followed the same process, because all the strings were now readable. So we started
looking for the Check header, the Check string, remember, and we found it being used here. We traced the code and saw what it did. If you look here, there is a function that is basically a string copy, and it uses the Check header name and something that was encrypted before it, and joins them to create that part of the request. So we knew what it was; we needed to know how it was built. To do that, we traced the code a little bit further back, and we found that this string was also used. It was most likely the thing
that was being passed to the encrypt function before it was sent. So the Check header had a version, a client, the computer name, the OS, the group, and a couple of other fields. This was what was inside the Check header. So, OK, it's useful: it would be nice to get the computer name, the client and the group. This group would most likely be the botnet group: usually botnets, or some of them, are split into logical groups so that the botmasters can manage them or keep track of which are new infections. So it would be nice to see if the group changes, or if we have several groups. So we
started to see if we could understand how the network protocol was being encrypted. The encrypt function is not shown here. We opened it and it was scary. It was enormous, it had lots of sub-functions, and none of them had names, there were no symbols, so it was just terrible, and it would take forever to try to reverse it line by line. So what we did was use a combination of static analysis, dynamic analysis, guesswork and trial and error. First we tried to identify each of the sub-functions without actually reversing it, because who knows how long that would take. Next we would
skim the disassembly to understand how they connect to each other, and then finally we would dynamically analyze the malware with a debugger to try to understand the missing bits and to verify the assumptions that we had made during the process.

To understand what the functions were without reversing them, we used some tools that help identify common cryptographic routines; we knew this had to have some crypto in it. So what we did was run a very helpful tool. What it does is search through the binary code and look for magic numbers, cryptographic magic numbers, and lots of cryptographic
functions use them. For instance, it found this constant that is related to RC6, this one that is related to CRC32, also MD5 magic numbers, and a random number generator. So we knew what the functions that contained each of these magic numbers were: the function that contained this one was MD5, this one was CRC32, this was RC6. So we already knew some things about it; next we just needed to connect the dots.

After that there were still a few functions remaining that we didn't know what they were. But looking at the strings, we saw that it had embedded a version of aPLib. This is a library that is
commonly used to pack malware; it has some compression functions. So the malware was using this to compress part of the payload, so that it wouldn't be so large.

So, connecting the dots, we saw that what it did was: it ran the pack function to compress the payload, it then created the checksum, and it encrypted it. In the encryption it used the random number generator, MD5 and RC6. So I drew it up, and what it does is this. This is the structure that we got. It uses the same procedure for both the Check header and the POST data: it calls the same function to encrypt both of them. The process
is this: it takes the data to send and its size, and adds the size to the structure. It packs the data and adds that to the structure. It calculates the checksum and adds it to the structure. Then it encrypts this using RC6 with a random key that was generated by the random number generator. It computes an MD5, concatenates the key and the MD5, and encrypts that using asymmetric crypto. The asymmetric part was the tough part, because most asymmetric algorithms, at least RSA and the most common ones, don't use magic numbers, so we couldn't find any markers here. Finally we got to the final blob, which was the encrypted blob, and again it had
the key and the MD5. Unfortunately, because it was asymmetric crypto, we could get the public key but not the private key, so there was no way we could decrypt this. It was a nice exercise, but we would still just see random data arriving at our sinkholes.

So, just to finish up, what we did: a new, unknown DGA appeared on the internet. Our mechanisms started detecting what seemed to be a pattern, what seemed to be a DGA. So we sinkholed it and started analyzing the traffic that came into it. We tried to understand what it was; we couldn't find out, so we created signatures and waited for a sample. Eventually we found a
sample and reversed it to understand what it does and what the DGA was. The DGA part is important for us because we need to add it to our internal database of DGAs and because we need to understand whether we have full coverage of the botnet. Finally, we didn't need to, but we tried to understand the network protocol to see if there was some useful information that we could extract.

So this is it. I hope you found it entertaining, at least. Do you have any questions?