Pack Your Android: Defeating Obfuscation and Packers in Malware

Name: Pack Your Android: Defeating Obfuscation and Packers in Malware
Uploaded: 2018-11-08
Duration: 33 min

BSides Philly · 33:00 · 13 views · Published 2018-11

Speakers

Swapnil Deshmukh

Tags

Category: Technical

Topic: Malware Analysis, Mobile Security, Reverse Engineering

Style: Talk

About this talk

Swapnil Deshmukh walks through techniques for analyzing hardened Android applications without rooting devices, focusing on defeating obfuscators, packers, and protectors used by malware authors. He covers static analysis of AndroidManifest.xml and classes.dex, demonstrates real-world malware samples from Google Play, and introduces tools for extracting and analyzing encrypted payloads.

Transcript

Now we've got to make sure it's on there. Just a quick audio check: are you guys able to hear me? Awesome, thanks. Great, so our next speaker is Swapnil, and he's going to be talking about how to defend our Android devices from malware.

Awesome. Good afternoon, everyone. My name is Swapnil Deshmukh, and I'm here to talk about packing your Android. What we'll be discussing is how to unharden the hardened applications that are already out there, from a malware standpoint. Applications are normally hardened using techniques called obfuscators, packers, or protectors, and we'll see how you can extract information from them without even rooting your devices.

Before we start with the presentation, there is some quick housekeeping I want to run by you. First, everything we'll see in this presentation, the research that led to it, the samples, and the demo are already on GitHub, so feel free to check them out. If you have any questions, you can certainly reach out to me and I'll help however I can. Second, a disclaimer that my company has asked me to state: the views and opinions expressed in this presentation are purely those of an individual researcher, and I would really appreciate it if we don't target the company I'm currently working for or have worked for in the past.

Now that we have that out of the door, a quick who am I. I'm a co-author of the Hacking Exposed series; the latest one is Hacking Exposed Mobile, which got published in 2014. I'm currently leading a security team for emerging technologies at a fintech company, a company that you regularly use for payments. I'm also a malware researcher: we look at the different malware that is already out there, either in the Google Play Store or other trusted stores, identify it, and report it back to Google so that they can take the application down. My Twitter handle is up there if you want to reach out.

Another project that I have started is a malware framework. It's an abstraction of all the different malware we can see from an endpoint standpoint. Right now the focus is very much on the Android and IoT side of things, but gradually we are going to extend this framework to all the different endpoints we can think of. The code is currently in its first phase, where it just discovers the different packers, protectors, or obfuscators that are out there. In phase two, it will remove all the redundancies and all the hardened code to give you human-readable code, so that we can understand what a malicious author is trying to do. There is a live instance of this application running as well, and that is the malware repository.

So now that we have level-set on what I have done so far (I'm also a security evangelist), why is it important for us to understand what Android malware is all about and how it is impacting us? These are a few news clippings that I gathered in the past month or so that talk about the magnitude of mobile malware and how it's impacting us. This one doesn't have audio, but it's basically talking about millions and millions of devices that have been impacted. This one is about a malware released recently, in 2017, called HummingBad; it was collecting a lot of information from users and sending it back to a command-and-control server. The third one states that it's impacting 21 million users, and there was a handful of applications they found in the Google Play Store that were impacting users.

The takeaway from these news clippings is that lots of malware authors have already infiltrated our trusted app stores, whether that's Google Play or any third-party store you trust today. On top of that, it's impacting around 21 million users; considering the US population, that's one in every 11 people who would be impacted by malware, which is a very big surface area. Third, there was a Sophos report that came out regarding the growth of malware samples.

They are predicting 31 percent year-on-year growth in malware genomes, which is a very, very big percentage. We currently have a sample of close to 230-odd genomes, so a 30 percent increase on that is close to 300 malware genomes in 2018. That's a huge amount of malware being released. They even state that one in three applications released in 2017, this year, carried some malware genome or other, so one in three of the applications we have installed in 2017 may be affected. So it becomes very important for us as security personnel to understand what the malware is trying to do and how we can identify it.

The main question we all have at this point is: there are open-source tools out there that will analyze a lot of applications for us, so why exactly is it so difficult to analyze hardened applications? The main reason it's difficult for Google Play, for antivirus solutions, or even for us to evaluate the code is that it's heavily obfuscated. The code is completely mangled, so we can't get human-readable code after decompiling it, which makes it very difficult to read.

Along with that, they also have root detection checks. If you're running the app in a compromised environment, or in a VM to take a memory dump, they will detect that and hold back the malicious content, which makes it very difficult for anyone to identify it.

In 2016 we also saw malware authors adopt anti-tampering. Anti-tampering basically means creating checksums of different code blocks within the bytecode; those checksums have their own checksums in turn, forming a pyramid of checksums. That makes it very, very difficult for anyone to modify the code, or even to create a hook so that we can put logs around what a packer or a protector is doing.

In 2017 we saw a few malware samples that were encrypting the SD card. Your SD card contents were completely encrypted, and the key was hard-coded within the application. However, they were using something called white-box cryptography, which obfuscates the keys used to encrypt your SD card. So even if you decompile the entire code to find where the passcodes are, it's very difficult, because they create a secure world with an abstraction of the crypto library altogether. It was very, very difficult for us to identify the keys needed to decrypt the data, and that was a new trend we saw in 2017.

On top of that, a lot of malware authors are already using anti-emulation, which detects that you are running, for example, GDB or JDWP, or that the app is running under Google Bouncer, which validates an application before it's published in the Google Play Store. In the case of Google Bouncer they create a time-bomb scenario: Bouncer only validates an application for five minutes, so the malware holds back its malicious content for those five minutes, and once it's out in the Play Store it starts serving it, which makes it very difficult for Bouncer to catch. Others detect that you're hooking them through GDB or JDWP, which would let you set breakpoints, and hold back the malicious content, which makes it difficult for us to identify.

So now we understand why it's hard to identify this malware. There are a few research papers already out there, a few listed over here and a few on GitHub as well, that describe how you can get this data into a human-readable format. However, the tools they released are obsolete in many ways, because the packers have grown over time while the tools stayed back in 2015 or 2016. Still, they are very good stepping stones for understanding how to read through a packer or a protector.

All said and done, there are a few hardening techniques I've been talking about for a while. The obfuscator is one of them, and it's the easiest one to bypass today. Obfuscators scramble your data: your class names, method names, or variable names are mangled in some form. A very good example in Java: if we have a package called com.bsidesphilly2017, it would be renamed to com.a, which makes it difficult to understand what the code is doing. Along with that, there is control-flow obfuscation: your if statements or try-catch statements are completely changed into a switch statement with a lot of dead code put in. So as an ethical hacker setting breakpoints around what these applications are doing, it becomes very, very difficult to track, because you have to go through a call to a different function that calls a different return function, or a different stub that returns a null value, because it's dead code that isn't doing anything. That's why it becomes more challenging to read through the code when you have control-flow obfuscation.

Java reflection and string encryption together make the code even more difficult to read. What they typically do is encrypt the strings and put them in the Java bytecode, and on top of that use Java reflection: they call Class.forName, or look up a method by name, with an encrypted string, and you first have to decrypt that string to understand what the next action will be.

So let's look at a few samples. This is a malware released in May 2017 that was using a tool called ProGuard, which comes out of the box with Android. The part highlighted in orange is how it obfuscates class names. This was a ransomware: it would create a device lock, and you had to pay 0.6 bitcoins to get the lock code back. When we decompiled the application, the lock code was hard-coded; if you look at the part in red, the lock code was 1008. If you traverse through the method, what it does is set a passcode and then call the device management services to put a lock on the device. The lock can only be removed if you have the right password to reset it, which makes it very difficult for a normal user to unlock their device, so they either have to pay the ransomware authors or wipe the entire device before they can use it again.

The second obfuscator is actually a protector that obfuscates the code. This sample is called TZ malware; it was released in November 2017, so a month ago, and it was impacting a lot of the African market.

They were connecting to a command-and-control server, from which they pushed a lot of code back down as an update through Google Play, which would be a malicious update. What they were doing was obfuscating with Java reflection: if you look there, they were using Class.forName with an encoded string, and they had a function called onCreate with seven different instances, each with different arguments passed in. So in order to decrypt the string, we had to traverse back through all seven function calls, and after that we were able to understand what library they were calling. After a day and a half of reading through the code, we understood that they were calling a method named run, and that run was loading the file.

The third obfuscated sample we looked at was an SMS service that sent SMS messages to a premium number along with your geolocation, so they were basically tracking you. They were using very strong string encryption that produced non-human-readable code; as you can see, this is what we got from the Java bytecode, and we were hardly able to understand what it was trying to do. There were two things they were doing. They were calling an import function; if you look at the HHH thing, that was the first one that got called every time they were trying to decrypt the data, so that was our first entry point. Then there was a function called with three arguments: the first was the string, and the other two were characters. This function took the first argument and the third argument and either ANDed or ORed them, and the number of times it went through the AND or the OR was based on the second argument, with the character converted into an int first. So we looked at those and created a Python script that reads through the entire obfuscated code, works out what they're trying to do, and gives us the human-readable malicious payload they're trying to inject. That makes it very easy for us to understand what they're doing.

After obfuscation, the second hardening technique we looked at is the packer. A packer does dynamic code modification. There are two execution environments supported by Android: one is Dalvik, and the other is the Android Runtime (ART). Dalvik supports dynamic code loading, so what most packers do is ship native code, and that native code starts injecting new code back into the stub they have created; that stub then decrypts the real classes.dex. We'll look at a few examples as we go along. The second approach we looked at was in-flight loading.

They modified the header bytes of the classes file in the buffer, then pulled the payload out of the buffer and decrypted it, so they were hiding it in plain sight.

Under the hood, this is how a packer actually works. An application file has two parts: a compiled manifest and the compiled source. The compiled manifest, AndroidManifest.xml, is what gets invoked first; it has a complete map of which Java classes to call and which native files will be called as well, and the source has the actions and the views behind them that need to be called. So the main things we evaluate when we're trying to unharden an application from a static perspective are AndroidManifest.xml and classes.dex. What a packer does is modify your Android manifest so that it first calls the stub, which is the loader classes.dex. Once that stub has been called, it decrypts the classes.dex that actually has to be executed, and that buffer is read either in memory or from a file.

The first one we looked at was released in October 2017. It used a very new packer that we hadn't looked at before, called jung-jae two. It was loading a native file, and that native file read certain header buffers, mangled them a bit, and then created a classes.dex from them. By looking at the assembly code of the native library that loads it, we created a Python script that unscrambles the scrambled code and pulls that information out of the header bytes. The part highlighted in red is the increase in the header size. A classes.dex normally has a header size of about 112 bytes; in this case the header size ran to six figures, which was a very good indicator that they were hiding something inside the header. So we pulled that information out into a read buffer. After pulling it out, the first 1024 were the real classes.dex and after that came the header, so we took the entire header, put it right in front, and the later 1024 were our classes.dex. We stored that in a file and used dex2jar to convert it into human-readable code, and that's how we were able to pull out the classes.dex that actually contains the malicious content.

The second one was released on 27 November 2017. This one was a fake WhatsApp application: it would first install a gaming application, and after the game had been installed, it would call a command-and-control server, pull a fake WhatsApp application from there, replace the WhatsApp application you already had installed, and push it as an update for you.

We talked about stubs. If you look at the application tag in the Android manifest, they set android:name to android.support.multidex, and that's the first one that gets called to execute the loader. It's security through obscurity; they just renamed it to something else. After that call, if you look at the screenshot below, the part highlighted in red is where they call the stub along with a hash code. This hash code identifies which library has to be called, and the arguments are sent to a native library called secondhand. This native code first does a device integrity check to make sure it's running on a real device and not an emulator, and then an application integrity check to make sure the app hasn't been tampered with and you are not hooking into it through emulators. Once that's done, they take the hash code and a file stored on the device, read it into a buffer, decrypt that classes.dex, and store it in memory. Next, after making sure the integrity check was successful, it calls the command-and-control server, pulls an APK file from there, and installs it on your device. If the check fails, it drops this classes.dex from memory and loads a gaming application instead.

To analyze this further, we created a Python script, which will be released in the second phase of our project, that uses this hash code to talk to the secondhand native library. Most of the APKs we looked at are universal binaries, so we talk to the x86 build, pull out the classes.dex using the hash code provided in the code, and once we have the classes.dex, we use dex2jar to convert it from there.

Third in line is Vicki. Vicki was used for adware: the application would click ads on your behalf. It was loading two native files and then loading the encrypted classes.dex files. When we inspected the Java files further, we saw that they decrypt this data and store it in the files folder within your application directory. So while the application was running, we just did an ADB backup to pull that information out, and then further analyzed it using the tools we already know.
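The ADB-backup step can be sketched in Python, the language the speaker used for his own helper scripts. This is a minimal sketch, not the speaker's tooling, and it assumes an unencrypted backup produced with something like `adb backup -f backup.ab <package>` for an app that permits backups. The archive is a small text header (magic, version, compression flag, encryption) followed by an optionally zlib-compressed tar; the app's files folder appears under `apps/<package>/f/`. The helper names `ab_to_tar_bytes` and `find_dex_entries` are hypothetical.

```python
import io
import sys
import tarfile
import zlib

# Header lines of an Android backup (.ab) file, in order:
# magic, format version, compression flag ("1" = zlib), encryption ("none" or "AES-256").
AB_MAGIC = b"ANDROID BACKUP\n"
DEX_MAGIC = b"dex\n"  # first four bytes of any DEX file


def ab_to_tar_bytes(ab_bytes: bytes) -> bytes:
    """Strip the .ab header and return the raw tar archive it wraps."""
    stream = io.BytesIO(ab_bytes)
    if stream.readline() != AB_MAGIC:
        raise ValueError("not an Android backup file")
    stream.readline()  # format version, not needed here
    compressed = stream.readline().strip() == b"1"
    encryption = stream.readline().strip()
    if encryption != b"none":
        raise ValueError("encrypted backups are out of scope for this sketch")
    payload = stream.read()
    # When compressed, the payload is a single zlib stream.
    return zlib.decompress(payload) if compressed else payload


def find_dex_entries(tar_bytes: bytes) -> list[str]:
    """List archive members whose content starts with the DEX magic."""
    hits = []
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            handle = tar.extractfile(member)
            if handle is not None and handle.read(4) == DEX_MAGIC:
                hits.append(member.name)
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        for name in find_dex_entries(ab_to_tar_bytes(fh.read())):
            print(name)
```

Run as `python ab_dex.py backup.ab`. Matching on the DEX magic rather than on a `.dex` extension is a deliberate choice here, since a packer can drop its decrypted payload under any file name.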

Third in the list is the protector. A protector makes our life a lot harder, to be honest, because it's a combination of obfuscation and packing. A protector uses a loader stub, but the information about how to load that stub is obfuscated using the string or control-flow obfuscation we have already seen, and on top of that there is dead-code injection, making the code difficult to read. One example we looked at was DexProtector. DexProtector was loading a stub, as you can see through the application tag, but along with loading the stub, the classes.dex was heavily obfuscated, so it was very difficult for us to track what exactly they were trying to do. So we went through all seven classes of obfuscation and how they were doing it, removed the dead code, and after that the first thing we learned was that it calls another Java class that holds the stub. A DexProtector class name was one of the ones that gets invoked, so we went in there, got the stub information and the hash code they were using, and that gave us an avenue to decompile the classes.dex. Now that we have looked into all the different obfuscation techniques, I have a quick demo.
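The header-size red flag described for the jung-jae two sample (a `header_size` running to six figures instead of the usual 112 bytes) can be turned into a quick triage check before reaching for dex2jar. This is a minimal sketch assuming the standard DEX layout: little-endian fields, an Adler-32 `checksum` at offset 8, a SHA-1 `signature` at offset 12, and `file_size`, `header_size`, and `endian_tag` at offsets 32, 36, and 40. `inspect_dex_header` is a hypothetical name, and the check only flags anomalies; it does not reproduce the speaker's header-swapping recovery script.

```python
import hashlib
import struct
import sys
import zlib

STANDARD_HEADER_SIZE = 0x70  # 112 bytes in standard DEX files
ENDIAN_CONSTANT = 0x12345678


def inspect_dex_header(data: bytes) -> list[str]:
    """Return human-readable anomalies found in a DEX header (empty list = none)."""
    if len(data) < STANDARD_HEADER_SIZE:
        return ["file is shorter than a standard DEX header"]
    findings = []
    if data[:4] != b"dex\n":
        findings.append("missing DEX magic")
    # checksum: Adler-32 over everything after the magic and checksum fields (offset 12 on)
    (checksum,) = struct.unpack_from("<I", data, 8)
    # file_size, header_size and endian_tag are consecutive little-endian uint32s at offset 32
    file_size, header_size, endian_tag = struct.unpack_from("<III", data, 32)
    if header_size != STANDARD_HEADER_SIZE:
        findings.append(f"header_size is {header_size}, expected {STANDARD_HEADER_SIZE}")
    if file_size != len(data):
        findings.append(f"file_size field says {file_size}, actual length is {len(data)}")
    if endian_tag != ENDIAN_CONSTANT:
        findings.append(f"unusual endian_tag {endian_tag:#x}")
    if zlib.adler32(data[12:]) != checksum:
        findings.append("Adler-32 checksum mismatch")
    # signature: SHA-1 over everything from offset 32 onward
    if hashlib.sha1(data[32:]).digest() != data[12:32]:
        findings.append("SHA-1 signature mismatch")
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        for finding in inspect_dex_header(fh.read()) or ["no header anomalies found"]:
            print(finding)
```

A payload appended after a valid DEX shows up as a `file_size` mismatch, while a rewritten header trips the `header_size` and checksum checks, so either result suggests the file deserves a closer look.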


This demo is the repository that we have. At this point we have 427 different genomes that we have identified on Android, and they have all been published to the repository. This information gives us the different types of payloads and payload injections they're trying to do, and provides more details on where you can find more information about each. Along with that, we also provide an avenue for you to upload an APK or a DEX file and run a static analysis on it, to further inspect what kind of protector or packer it is using. Here's an example: this is one of the packers I uploaded recently. As you can see, it gives me the packer information, and it also shows me where the application is getting injected within the manifest file. These are the pointers for how I can analyze this hardened application further.

So now that we have seen a quick demo of how the application works, how we can look at a hardened application, and how we can retrieve the classes.dex, the Java bytecode that holds the malicious payload, where do we go next? We are already creating malware genomes, as you can see. The next step in our process, which will launch in January 2018, is static analysis of the application: now that we have DexProtector and its map, and we have the classes.dex and the manifest file, how can we marry the two together to understand the entire call graph, work out what kind of protector or packer is being used, and break it to get human-readable code, so that we can analyze it further and identify the malware already out there in our trusted app stores? The second thing we want to do is dynamic analysis of the application, to further analyze which premium numbers are used and what information is being sent to them. That would be done by creating a lot of hooks within the application as we identify statically what the application does, so this dynamic analysis would be done by repackaging the application.

That said, we are also looking at IoT, as I stated before. This is one of the malware samples I'm researching at the moment. It connects to your light, and information is sent back to a command-and-control server in two forms. The first is MQTT, carrying information such as whether the light has been turned on or off, or whether a home alarm has been compromised. Along with that, they also use XMPP to send out information about your location or your SMS services. We are looking into that in detail, and this is a very good example, because what they're trying to do is create an MQTT host and protocol; string 3 and string 4 that you're looking at are the username and password they're sending on our behalf, to attack us further with whatever information they can get.

That's all I have for the presentation. Let me know if you have any questions.

Hey, thanks for the presentation. I had a question about the libraries that could be used for creating something like this. Is it the libraries themselves that are the problem, like the frameworks people are using, or are people knowingly creating these just to make exploits? Meaning, is it just a regular app using a framework that has all these exploits already built in, or are they creating it themselves? That's the first question.

It's a mix of both, from what I've seen so far. There are standard third-party ones out there that do packing or protection in a certain format. However, recently, ones like jung-jae two that we looked at in this presentation are homegrown; they have been made by malware authors to further obfuscate the code in such a way that we cannot read it. So it's a mix of both at this point.

And the next question: I saw a brief bit of the PoC. What exactly does it do? Is it just looking at the manifest file and then following the class from there, or how can we look at all the details?

True. In order to get the loader stub installed, it has to be the first thing that gets called, and the only way you can do that is through the application tag. So when we inspect an app further, the first injection point is the application tag, and that's the reason it's the first thing we look at.

Thanks. Any other questions? Great, another round of applause, then, for Swapnil.
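The answer about the application tag can be illustrated with a small check over the manifest. This is a minimal sketch assuming the binary AndroidManifest.xml has already been decoded to plain XML (for example with `apktool d app.apk -o out`). `application_entry_point` and its `outside_package` heuristic are hypothetical illustrations, not the logic of the speaker's framework. A custom `Application` class is the first app code Android runs, which is why a loader stub is registered there.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def application_entry_point(manifest_xml: str) -> dict:
    """Report the custom Application class (if any) declared in a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    package = root.get("package", "")
    app = root.find("application")
    name = app.get(ANDROID_NS + "name") if app is not None else None
    # A leading "." means the class name is relative to the manifest's package.
    if name and name.startswith("."):
        name = package + name
    return {
        "package": package,
        "application_class": name,
        # Hypothetical heuristic: a custom Application class outside the app's own
        # package is worth a closer look as a possible loader stub. Not a verdict.
        "outside_package": bool(name) and not name.startswith(package + "."),
    }


if __name__ == "__main__":
    sample = ('<manifest xmlns:android="http://schemas.android.com/apk/res/android" '
              'package="com.example.game">'
              '<application android:name="android.support.multidex.MultiDexApplication"/>'
              '</manifest>')
    print(application_entry_point(sample))
```

The sample mirrors the fake-WhatsApp case, where the stub hid behind an `android.support.multidex` name. Many legitimate apps also use that class, so a hit only tells you which class to decompile first.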
