
Discovering C&C in Malicious PDFs

BSides NoVa · 2021 · 25:28 · 38 views · Published 2021-06
About this talk
This talk explores the structure and analysis of malicious PDF files, covering binary file formats, obfuscation techniques, anti-disassembly methods, and command-and-control indicators. Through static and dynamic analysis demonstrations, the speaker explains how to identify malicious code within PDFs and conduct effective malware investigations.
Original YouTube description
Presented at BSidesNoVA 2021 on June 5th, 2021. The talk demonstrates the different structures in binaries such as a PDF (header, body, cross-reference table, trailer), explaining how each section works within a binary and which techniques are used, such as packers and obfuscation with JavaScript (PDF), along with some anti-disassembly techniques, demonstrating how this malware acts and where it would be possible to "include" malicious code. By the end of this talk, it will be clear to everyone what the differences in binary structures are and how the researcher should conduct each of these kinds of analysis, besides, of course, seeking more basic knowledge of file structures, software architecture, and programming languages.
Transcript [en]

Hi everyone, welcome, good morning. I'm here to announce our speaker, Filipi Pires, whose talk is titled Discovering C&C in Malicious PDF with encoding and other techniques. Real quick, some housekeeping: the platform automatically ends the session at the scheduled time. That is built in and there's nothing we can do about it. We'll have a Q&A during the last three to five minutes to ask any questions. If we don't have time during the presentation, please join Slack and ask your questions there. We do encourage you to stop by the sponsors area in the expo channel, which is on the left-hand menu, for more information, with resources and

career opportunities. There's also an open invite, if you're in the area, for those with a BSides NoVA ticket, at Punch Bowl Social in Arlington tonight, starting at 6 p.m. And with that, I'll hand it over to our speaker. Thank you, guys, thank you for inviting me here. Today we're going to talk about discovering C&C in malicious PDFs. About me: you know, it doesn't matter who I am, but of course I need to present something about myself. Here are my contacts on social media, if you'd like. This is my web page with some information about me, with talks from some conferences where I have spoken, and

I have some articles there that I have published. Here are my Twitter and my GitHub, if you'd like to send me questions, talk with me, and share knowledge; I really appreciate it. So let me introduce myself. I'm a security developer advocate and security researcher at Zup Innovation. It's a Brazilian company that provides different consulting services, not general consulting, but developer consulting services. And I'm an advocate for this awesome project, Hacking Is Not a Crime, which brings this kind of thinking to the community, because hacking is really not a crime. It's a mindset, it's a creative

mind, and I invite you to get to know this awesome project. I'm part of the staff team of the DEF CON Group here in São Paulo. I'm talking from Brazil now, from my office, in this case my balcony. It's my small office here, because it's important to be creative, right? I'm a security instructor at Hacker Security, a Brazilian company; I have some courses in Portuguese there. And I'm a writer and reviewer for these three magazines. So let me talk about our agenda, our summary for this conversation. First of all, I'd like to bring you some simple concepts about threats, because I would like to explain more

about malware and so on, so it's very important to understand these simple concepts. After that I will explain more about malware analysis and the structure of the PDF, and I will do some demos during this conversation. So, first of all, what is a threat? Of course it's not my definition; this definition is based on the ISO. A threat is defined as a potential cause of an incident that may cause harm to a system or organization. That means software attacks, theft of intellectual property, identity theft, sabotage, and information extortion are examples of information security threats. That means all those things are related to software,

right? Because the software can be just code, or maybe this code is compiled, for example, or turned into an app or something like that. So at the end of the day all those things are related to software. That means when you think about threats, you can think about software as well. So when you need to perform an analysis related to malware, for example, or other research, the first step is an identification step. You know, it's a simple life cycle,

right? Because when you have a sample, you don't know if it's malicious or not. Maybe you have malware, malicious software, or maybe you have a maldoc, a malicious document. You need to investigate it to understand whether it's malicious or not. After that you can decide on the best method to apply, static analysis or maybe dynamic analysis, and after that you can create a report. This is very interesting because you can present it to the manager, tech lead, coordinator, or whoever. And at the end of the day, when you generate something like a report, you can improve

your defense mechanisms, because you can understand the real path that this kind of malware or threat takes through your network, for example. If you generate a report, you can improve your defense mechanisms: you can see, for example, whether a security sensor has good best practices applied in its settings, or something like that. After that, whether you have a small company or work in a big company, you can build cyber threat intelligence. Of course you have many tools for that, and you can use many different automated processes to generate this intelligence. This is

very important, because you need to understand known threats and unknown threats. That is the key here, because you can prepare yourself for these new attacks. And of course the threats are changing all the time, so you need to build cyber resilience in your environment. It's a simple life cycle. Perfect. So let's talk about static analysis. Usually this is the first step in our studies. This kind of analysis describes the process of analyzing the program code or its structure to determine its function, for example, if you have some DLL or library

inside your operating system that is called through some function. This is the point: when you look at static analysis, the program itself doesn't run at this time. Pay attention here, because this is a good thing: there is no runtime behavior. It doesn't run at this time. Of course, it depends, let me show, oops, let me see here, it depends on the program that you are using. Usually it's safer, because the sample doesn't execute inside your environment, so because of this it's usually the first step. So, the second kind of analysis, dynamic

analysis, is usually different because it's based on behavior. That means it looks at the interaction that the malware has when executed or used in your environment. It's runtime. So the difference is runtime versus no runtime. It's very simple. Of course you have many other methods or strategies to explore dynamic analysis or static analysis further, but we don't have time to explain them in one conversation. You could probably produce a whole course on dynamic analysis and another course on static analysis. Reverse

engineering is part of the strategy inside static analysis, or maybe dynamic analysis, because you run the sample and after that you can open the code, or you can see the behavior of the binary, for example, using reverse engineering. It can easily be automated. There are many sites today where you can perform this kind of analysis. The name is something like antivirus scanning, like VirusTotal, for example: you put a sample in there, or maybe a URL, and you can analyze the behavior based on engines provided by some security

vendors, right? So that's the point here. Or you can build the lab yourself inside your environment, creating what is called a sandbox: you have a virtual machine, you put the sample inside this virtual machine, and you can see the behavior inside it. This is the difference, basically. "Okay, Filipi, I know, it's clear, so let's talk about the physical and logical structure of PDF files." Yes, but before that I would like to show you something about the basics, which is very important. Let's see, for example: if you remember, during the beginning of our conversation the

first step was very important: we need to understand the identification itself, do you remember? So here I have many different samples, and I don't know if they are malicious or not. If you would like to talk in the chat now, or if I were there in person, I would ask you: what is the first command I could use here to identify the type of the sample? Maybe some people would suggest the file command. So I have the file command to identify what kind of extension or format this file has. So here we have

a Microsoft Word file. Let me try another one. I have here my friend Bill, not my friend, but you know, it's a PDF file, version 1.5. Let me try another one. Here we have a Linux one, let me check the Linux 32 one: in this case it's ELF. It's a binary, an executable binary used on the Linux platform. Let me look at another file, linux.txt. Let me check here, wow, it's not text: in this case it's ELF, a binary, not text. Here is the key point that I would like to explain about the basics, because I used this awesome

command, or tool maybe: file. It's a command, actually, compiled inside the Unix platform. But how does this command work? Some people would like to understand more about reverse engineering or research, but just like to use the tools. Of course it's very important to use tools, guys, but my idea during this conversation is to talk about the basics, because you really need to understand how this file command works in the background. This is the key here. I will explain more details about that. First of all we need to

understand how this file command works, how it identifies something inside the binary, how it works when you execute it. That is the key here. So when you look at the man page, maybe you are thinking, "Filipi, I don't like to read the manual of anything, in technology or in construction or wherever. When I need to set up something in my house, I don't like to read the manual." I know, it's very common for everyone. But when you talk about security, research, offensive security, or defensive security, it's very, very important. So take a look

at this. Here we have many important things. Take a look at the magic tests. Of course there are other explanations here, but I would just like to show something: the magic tests are used to check for files with data in particular fixed formats. Okay, some data in particular fixed formats. The canonical example of this is a binary executable, in this case a compiled program, because I have code that is compiled inside the Unix platform, whose format is defined in elf.h. So if you have time, it's important to read this kind of information here.
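As a minimal sketch of what a magic test looks like in practice, the Python below checks the first bytes of a buffer against the ELF identification bytes described in elf.h. The function name describe_elf_ident and the sample bytes are made up for illustration; this is not how the file command is implemented, only the same idea in a few lines.

```python
ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file (e_ident[0..3])


def describe_elf_ident(header: bytes) -> str:
    """Describe the ELF identification bytes at the start of a file (illustrative helper)."""
    if len(header) < 6 or header[:4] != ELF_MAGIC:
        return "not ELF"
    # e_ident[4] is the class: 1 = 32-bit, 2 = 64-bit.
    bits = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown class")
    # e_ident[5] is the data encoding: 1 = little-endian, 2 = big-endian.
    order = {1: "LSB", 2: "MSB"}.get(header[5], "unknown byte order")
    return f"ELF {bits} {order}"


# Hypothetical content of a file named linux.txt: the name says text,
# but the leading bytes say ELF.
fake_linux_txt = b"\x7fELF\x01\x01\x01" + bytes(9)
print(describe_elf_ident(fake_linux_txt))
print(describe_elf_ident(b"just some text\n"))
```

The check looks only at the content, never at the file name, which is why a binary renamed to .txt is still reported as a binary.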

Maybe you can find some of the structure of ELF here. So let's continue to read. Okay, let me, oops, okay, no, better here. These files have what? A magic number. That means all those binaries have a magic number, stored where? In a particular place near the beginning of the file, which tells the Unix operating system that the file is a binary executable. Take a look at that. It means all those binaries have a magic number, so when you execute the file command, it searches for this magic number inside the binary. Usually, well, not usually, this

information is stored where? In a particular place near the beginning of the file. That means when you execute the file command, it searches for this information inside the binary. Okay, I understand that, so how does this work? On your Unix platform, all this information is in this directory here. Take a look at that: the concept of "magic" has been applied by extension to data files. Any file with some invariant identifier at a small fixed offset into the file can usually be described in this way. So the information identifying these files is read from where?

/etc/magic and the compiled magic file, take a look, it's here: /usr/share/misc/magic.mgc. And by the way, in addition, if $HOME/.magic.mgc exists, it will be used in preference to the system magic files. That means you can create your own magic file, for example for malware, you know, and of course you can compile it inside your environment. But I would like to help you understand this more deeply, so I downloaded the file source code to understand how this works. So I have it here, let me check here:

the magic directory, yes. So I downloaded all this from Debian to understand how this works. Of course I am using Kali Linux here, which is based on Debian. If you look here, we have many different databases of what? Of magic numbers, or rather, in this case, of rules. Let me read, for example, javascript. Take a look at that: here are some rules applied to the magic number information. If you have this information at the beginning, do you remember, it is stored in a particular place near the beginning of the

file. So if you have this information inside the file and you execute the file tool, the tool will run in your environment and read this magic number. Let me show you an example. A file named bsides, okay, very creative. Let me put "malware" here and save. So if I use file bsides, take a look at what I found: it's ASCII, it's text. If I read it here, it's the same, it's only text. Perfect. But what did we learn? Let's see, about the JavaScript: if you have this

information here at the beginning, let me copy this. Okay, let me change the bsides file. I will paste it here, and take a look at what happens. Let me save, and I will run file again. Take a look at what happened: it's no longer text, you know, it's a Node.js script, because now I am manipulating the magic number of the file. Let me change another thing here. Do you remember what this means? Let me check here, enter. Do you remember what this means? If you

program in Python, maybe you know what this means, but if you don't know, you can search; this is a mystery. Okay, let me run file bsides. Take a look at that, guys: it's a Python script now. Can I execute it? python3 bsides, and what happened? Nothing happened. Why? Because it's not Python. Okay, let me move bsides to bsides.py, and python3 bsides.py, and pow, we have a problem here, because of course it's not Python, but the magic number was manipulated. So let me change it another time, because I like to manipulate many things like that.
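The demo above can be approximated in a few lines of Python. This is a rough sketch, assuming a tiny hand-written signature table in place of the real magic database; the table entries, the identify function, and the sample contents are illustrative assumptions and do not reproduce the actual rules shipped with file.

```python
# Hypothetical, heavily simplified stand-in for the magic rules:
# each entry is (bytes expected at offset 0, label to report).
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x7fELF", "ELF binary"),
    (b"#!/usr/bin/env node", "Node.js script"),
    (b"#!/usr/bin/env python", "Python script"),
]


def identify(data: bytes) -> str:
    """Guess a type from the leading bytes only, ignoring any file name."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    try:
        data.decode("ascii")
        return "ASCII text"
    except UnicodeDecodeError:
        return "data"


body = b"malware\n"
for prefix in (b"", b"#!/usr/bin/env node\n", b"#!/usr/bin/env python3\n", b"%PDF-1.9\n"):
    # The body never changes; only the first line does.
    print(identify(prefix + body))
```

The same body is reported as four different types depending only on its first line, which mirrors what happened to the bsides file: an identification based on a signature says what the file claims to be, not what it actually does.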

And let me, okay, one last change: %PDF-1.9. Let me save, and file bsides.pdf. Take a look at that, guys, awesome: it is a PDF document. In this case it's not text, it's not Python, it's not JavaScript, it is a PDF. If you look here at the rules for PDF, this string refers to what? To the magic number of the PDF, in this case the header of the PDF. You see how important it is to understand how this works. That's my point here, because you need to understand those basics. Okay, so maybe we don't have time

to talk more. I completely forgot whether we have 30 minutes, 45 minutes, or one hour. If we have time, I will explain more after this conversation, but you have my social media to talk with me. So let me explain more about the physical and logical structure of the PDF. In general, a PDF has a header, a body, a cross-reference table, and a trailer: four parts. Take a look at that: at the beginning, in the header, we have the version, as you saw when I wrote it in the bsides file, do you remember? This

is the version: 1.4, 1.9, you know. Inside the body we have information about the images, fonts, bookmarks, and so on. And here is a very important key: you have a cross-reference table, and in this cross-reference table you have many objects referencing one another. Of course this access is random; I will explain more details in another demo. And here we have the trailer. It's almost the same thing, because you have more random access and references. Okay, I will explain it in this demo. I recorded it, of course, because we don't have time to explain all those details.
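To make the four parts concrete, here is a small sketch that locates the header, the objects in the body, the cross-reference table, and the trailer in a hand-built PDF. The function locate_sections and the sample document are assumptions made for illustration; real files, especially malicious ones, can have several cross-reference sections or malformed offsets that this sketch does not handle.

```python
import re


def locate_sections(data: bytes) -> dict:
    """Find the four structural parts of a simple, single-revision PDF (illustrative)."""
    header = re.match(rb"%PDF-(\d\.\d)", data)
    startxref = re.search(rb"startxref\s+(\d+)\s+%%EOF", data)
    xref_pos = data.find(b"\nxref")
    trailer_pos = data.rfind(b"trailer")
    return {
        # Header: the version string at the very beginning of the file.
        "version": header.group(1).decode() if header else None,
        # Body: indirect objects written as "<number> <generation> obj".
        "object_count": len(re.findall(rb"\b\d+\s+\d+\s+obj\b", data)),
        # Cross-reference table: byte offset of the "xref" keyword.
        "xref_offset": xref_pos + 1 if xref_pos != -1 else None,
        # Trailer: byte offset of the "trailer" keyword.
        "trailer_offset": trailer_pos if trailer_pos != -1 else None,
        # startxref records where a reader should find the xref table.
        "startxref": int(startxref.group(1)) if startxref else None,
    }


body = b"%PDF-1.5\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_at = len(body)  # the xref table starts right after the body
sample = (
    body
    + b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
    + b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    + b"startxref\n" + str(xref_at).encode() + b"\n%%EOF\n"
)
info = locate_sections(sample)
print(info)
```

In a well-formed file the offset stored after startxref matches the real position of the xref keyword, which is how a reader jumps straight to the table instead of scanning the whole file.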

But I recorded it and I explain it here for you. Okay, I have the PDF sample here. I received the sample, and I will start with this analysis. First of all I executed the pdfid command. This command was created by Didier Stevens, by the way. He's an awesome guy; I like many of the tools that he has created. This tool scans for certain strings inside the PDF. I'm supposing, of course, about why Didier Stevens created this tool. By the way, I need to talk with him about that, but

I am supposing that when he created this tool, he understood how the structure of the PDF works, and he probably saw in many analyses that you can find this encrypt information, JavaScript, open actions, and other things like that. So when we execute this tool, we can find the objects. In this case we have 15 objects. And we have stream here, in this case two streams. Usually this is the part of the PDF that an attacker can use to put in some malicious code or whatever, because they can obfuscate something inside, or they can

encrypt something inside, you know. And here are the cross-reference table and the trailer, which, together with the rest, means the four parts of the PDF. Then you have the slashes here: /Page, in this case it's just one page in the file. We have /Encrypt, which means you have some encryption inside this PDF. And here we have JavaScript, five JavaScript, by the way. So if you see that you have a PDF with JavaScript inside, man, maybe it's really malicious, because usually a PDF doesn't have JavaScript inside, right?
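The kind of keyword count described above can be sketched as follows. This is a simplified illustration and not pdfid's actual implementation: the keyword list is a subset chosen to match the ones mentioned in the talk, and count_keywords and the sample document are hypothetical.

```python
import re

# Assumed subset of keywords, chosen to match those mentioned in the talk.
KEYWORDS = [b"obj", b"endobj", b"stream", b"endstream", b"xref", b"trailer",
            b"startxref", b"/Page", b"/Encrypt", b"/JS", b"/JavaScript",
            b"/AA", b"/OpenAction"]


def count_keywords(data: bytes) -> dict:
    """Count whole-token occurrences of each keyword in raw PDF bytes."""
    counts = {}
    for keyword in KEYWORDS:
        if keyword.startswith(b"/"):
            # A name must not continue with a letter or digit,
            # so /Page does not match inside /Pages.
            pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        else:
            # \b keeps "obj" from matching inside "endobj", and so on.
            pattern = rb"\b" + re.escape(keyword) + rb"\b"
        counts[keyword.decode()] = len(re.findall(pattern, data))
    return counts


sample = (
    b"%PDF-1.5\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R /OpenAction 3 0 R >>\nendobj\n"
    b"3 0 obj\n<< /S /JavaScript /JS (app.alert\\(1\\);) >>\nendobj\n"
    b"4 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\nendobj\n"
    b"xref\n0 5\ntrailer\n<< /Root 1 0 R >>\nstartxref\n0\n%%EOF\n"
)
counts = count_keywords(sample)
# Keywords worth a closer look when their count is not zero.
suspicious = [name for name in ("/JS", "/JavaScript", "/AA", "/OpenAction") if counts[name]]
print(counts)
print(suspicious)
```

A non-zero count for JavaScript together with an automatic action is a triage signal rather than proof: it tells the analyst which objects to open next. Real samples may also obfuscate these names, which this sketch ignores.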

Maybe you have a URL, you know, calling another website, but JavaScript? Okay, here we have another interesting point: /AA or /OpenAction. This is a function used in this PDF. Basically, the victim receives the PDF and downloads the file to their machine, and after that the user doesn't need to click on anything inside the PDF, because this PDF has an open action. I'm sorry, it means the action will execute automatically when the PDF is opened; the user doesn't need to click to execute it. That's the point of open actions. So here we have something interesting: we have JavaScript inside the PDF

and you have a stream, which means something happens inside this stream, and you have an open action. That means the victim machine downloads the file into the environment, and after that something happens, but I don't know what yet. Okay, let's continue with this analysis, guys. I am explaining this for you. Let me check here, I don't know if you have any questions. Okay, if you have a question, I am open to it, but let's continue with the demo. Okay, so after that I used another tool, pdf-parser. It's another tool provided by Didier Stevens. I executed it with raw,

right, because I would like to see all that information raw inside the PDF. So take a look at what I found here. The header, take a look at this: %PDF-1. We have object 1 here, but we have 15 objects, right? Okay, inside these