
BSidesSLC 2020 - David French - A Chain Is No Stronger Than Its Weakest LNK

BSides SLC | 30:47 | 762 views | Published 2020-03
About this talk
Title: A Chain Is No Stronger Than Its Weakest LNK
Presenter: David French
Transcript (English)

All right, good afternoon, everyone. My name is David French. Thanks for attending my talk today; this is "A Chain Is No Stronger Than Its Weakest LNK." Today I'm going to talk about the ways in which adversaries abuse Windows shortcut (LNK) files and how defenders can hunt for and detect this behavior, both statically and dynamically. I'll also talk about a model that Bobby Filar and I worked on to classify shortcut files as malicious or benign.

Just briefly, a bit about me before we get started. I'm a security research engineer on Elastic Security's protections team. I work on analyzing adversary tradecraft and developing detections and hunts. I enjoy increasing the cost of an attack for adversaries and finding ways to help defenders get the upper hand. I'm a contributor to ProblemChild, which is a graph-based framework used to discover anomalous patterns based on process relationships. I used to lead hunt strategy at a large financial institution, and I'm a co-author of the Elastic Guide to Threat Hunting, which is a free book to help practitioners get started with threat hunting.

Let's take a minute to go over the agenda for this talk. I'll talk about some of the reasons why I think attackers are abusing LNK files and have done so for several years now. For those who are not familiar with analyzing LNK files, I'll go over the file structure and the properties that practitioners need to know about when they're either analyzing or detecting malicious LNKs. I'll walk through some examples of how attackers are abusing LNK files in the wild to achieve their objectives. Along the way, I'll call out the features of those files that make them stand out as suspicious. Then I'll talk about how we built a model to classify LNK files using machine learning. I'll walk through this process to show how security practitioners can apply their domain knowledge to extract features from samples and then apply data science techniques to try to solve a security problem. We'll wrap up by talking about some possible next steps for the research, and I'll share some useful resources for people who want to learn more.

Before we cover LNK file anatomy and how attackers are abusing these files, let's talk about why I think attackers have been abusing this file type against their targets for several years now. First, crafting malicious LNKs, or modifying existing ones to include a backdoor, is super easy. The barrier to entry is really low due to the availability of off-the-shelf, open-source offensive security tools; if you're interested in those, I've included a few examples on this slide. Some people get frustrated over free and open-source offensive security tools. Still, I think they give blue teams a real opportunity to simulate adversary activity easily and test their defenses. They also help teams understand their organization's ability to detect or prevent that activity.

Second, traditional AV software has typically had poor detection rates for malicious LNKs. From what I've observed, many of the scanners on VirusTotal don't flag a malicious LNK on day one. Detection rates only seem to move toward 20 percent or greater within a few days of the file being submitted. I think a couple of reasons for these low detection rates might be the following. First, there are just so many different combinations of values that can exist in LNK files. Second, AV companies might be concerned about making mistakes, quarantining or deleting the wrong files, and disrupting the user's workflow.

Third, delivery is easy. It's really easy to get weaponized LNK files into a victim's environment: most email gateways, proxies, and firewalls aren't configured to inspect or block this file type because of its legitimate use cases. Finally, I think lack of user and practitioner awareness might be a contributing factor to why attackers often evade detection when using this technique. Users are probably not aware of the dangers of shortcut files. Security analysts might not be familiar with the ways these files can be abused or how to analyze, detect, or hunt for them.

To sum up this slide, it's only once we can reliably detect and prevent a technique that attackers will be forced to change their behavior, which is an expensive, time-consuming process. That will increase the cost of an attack for them and tip the scales to give defenders the advantage.

As you can see from some of the examples I've included on this slide, attackers have been abusing LNK files for over ten years. The technique is still prevalent and successful for them. If you look in the MITRE ATT&CK knowledge base of adversary behavior, there are about thirty references for this technique. Each one links to a report with details of a successful intrusion against an organization using it. I won't read off all of these examples for you. But you can see that attackers have used LNK files to maintain persistence in the victim's environment, steal credentials, obtain initial access, and execute ransomware.

For those of you who are not familiar, let's move on to the structure of an LNK file. I'm going to show you the minimum amount of information that you as practitioners need to know in order to be successful at either detecting or abusing LNK files, depending on your goal and your day job. When we think about that minimum, I think blue teams must know the basic anatomy of LNK files and how to analyze them in order to identify whether one is malicious or benign. Attackers, in turn, must know how LNK files can be abused and what defenders are looking for in order to evade detection.

In a nutshell, an LNK file is just a convenient pointer to another file. The target of the LNK file, which you can see at the bottom right of this slide, isn't the only interesting information. There's a bit more to LNK files than what you see when you right-click one and select Properties. In these next few slides, I want to go through the structure and properties of an LNK file that you need to know about. Microsoft's specification for this file type ([MS-SHLLINK]) is about 50 pages, so I'll save you some time and just call out the highlights.

Here are the values of the file signature, or magic number, and the class identifier. These enable us and the Windows OS to identify LNK files even if the file extension isn't .lnk. There are several open-source LNK file parsers available. I like to use Eric Zimmerman's LECmd: it's fast, it's reliable, and it has an option to parse LNK files in bulk. So I'll be using LECmd in the examples I walk through in this presentation.

Here's an example of LECmd output after parsing a benign Internet Explorer shortcut. Using this example, I'll call out the properties you need to be aware of.

First and foremost, the target of the LNK file is stored in an ID list format in the file structure; LECmd parses that out for us and conveniently shows the full path. Then you've got the file size property. This is the file size of the LNK file's target, not of the LNK file itself, so that's something to be aware of if you do forensics. Just above the file size, you can see the modified, accessed, and created timestamps. These are super useful during digital forensics investigations, for example when you're producing a timeline of an intrusion or an insider threat's activity. They're out of scope for this talk, though, since we're talking about how LNK files are abused and how to tell when that's happening.

Then you've got the 32 link flags. These specify which structures are present in the rest of the LNK file, and some of them are reserved or unused. Here are a couple of examples. The HasArguments flag means that the link is saved with command-line arguments. HasIconLocation means that a path is specified to display an icon for the LNK file. Then you've got the drive type property. It specifies the type of drive that the LNK file's target was stored on, for example a fixed drive, removable media, or a network drive. This value is another one that's useful in forensic investigations to verify what files were accessed by a user or an attacker.

Next is some information that's useful if you're interested in tracking adversaries and relationships between LNK files and intrusion campaigns. Suppose an attacker creates a malicious LNK file in their own environment in preparation for delivering it to their victim. The LNK file then includes the volume serial number, NetBIOS name, and MAC address of their computer. Some attackers are either unaware that this happens or forget to wipe one or more of these values. So this data can be used for tracking campaigns or adversaries on services like VirusTotal. Another one to watch out for is the user SID. It's not shown in this example, but it can give you information about the network, computer, and user account that was used to create the LNK file in the attacker's environment.

Then we've got the show command. This specifies the expected state of the target application's window when the LNK file is executed. Keep an eye out for the SW_SHOWMINNOACTIVE value. It means the application window will be minimized and won't take focus, so it's effectively hidden from the user or victim when they click the LNK file. This could be an indication that the attacker is trying to hide their code execution from the victim.

Finally, here are a couple of additional properties to be aware of that weren't shown in the example we just walked through. The icon location value specifies the path where the LNK file's icon is stored. The command-line arguments are executed with the LNK file's target when the link is clicked. In this example, we can see that the LNK file's target is PowerShell, and there's a script in the command-line arguments. The script imports the BitsTransfer module and then reaches out to a URI to download a file called 7zp. So this one looks really suspicious at first glance.

In this next section, we'll review some of the ways that attackers use malicious LNK files to achieve their objectives. I'll walk through an analysis of some malicious examples and point out the features that make them stand out as suspicious along the way. You can use this information in your detection or threat hunting efforts.

Weaponized LNK files are commonly used to obtain initial access or to maintain persistence in a victim's environment. To gain initial access to a target environment, attackers will often craft an LNK file to execute a malicious one-liner or script. These usually leverage a living-off-the-land binary like PowerShell or the command prompt. A common example would be a PowerShell one-liner that downloads malware. LNK files in this phase are usually delivered to the victim by email or in a compressed archive file. They could also be embedded in an Office document, or the email will include a URL for the victim to download and execute the file.

For persistence, attackers will often place an LNK file in a location where their one-liner or malware will execute every time the user logs on or the computer starts up. Alternatively, they'll modify an existing shortcut file to include a backdoor. Each time the shortcut is executed, the original application loads and the malicious code executes in the background as well. Another persistence technique is to craft an LNK file that forces user authentication. This can allow the attacker to harvest the user's password hashes. They can then try to crack those hashes to obtain the cleartext password, or relay them to authenticate to other systems.

Let's analyze some malicious LNKs and identify what features can help us flag them as suspicious. We can then use those features for detection, for hunting, or to build our own classifier, which we'll talk about in a bit.

Here's an example of a malicious LNK that was used in an intrusion campaign to gain access to several organizations. FireEye attributed this particular example to APT29. The attackers sent a phishing email to their targets that included a URL to download a ZIP file from a OneDrive account, and that ZIP archive contained a malicious LNK. When that malicious LNK was executed, it ran a PowerShell script. The script extracted a decoy document for the user to view, to distract them. In the background, it extracted and executed a Cobalt Strike Beacon DLL, which provided a connection back to the attacker.

When an LNK file is executed, the new process is spawned as a child process of explorer.exe, and that can make dynamic detection a bit of a challenge. But there are other ways to identify malicious LNK files. So let's examine the LNK file that was used in this campaign and understand what makes it look suspicious.

When we parse this malicious LNK file and start to look at its properties, some things immediately stand out as suspicious. This LNK file's target is powershell.exe, which is a commonly abused binary used to execute malicious code or scripts. It's also got long command-line arguments, which usually indicate the presence of an encoded command or a script. In this example, we can see what looks to be a base64-encoded blob of data. Then we've got two parameters. -NonInteractive prevents an interactive prompt from being displayed to the victim. -ExecutionPolicy Bypass bypasses whatever default PowerShell execution policy is configured.

Something that's important to note about the command-line arguments concerns the Windows UI. When you look at the properties of an LNK file there, the file's target and command-line arguments are truncated after 260 characters. In an attempt to evade detection, attackers have been known to craft malicious LNKs with a benign target and then pad the command line with something like whitespace. This hides the full value from the Windows UI and can sometimes evade a human analyzing the LNK file.

Here are some more features that help us identify this LNK file as suspicious. I mentioned earlier that the show command might be set to SW_SHOWMINNOACTIVE. In that case, the application window of the new process will be minimized and not immediately visible to the victim when they click. Then this is a good one: LNK files are usually between 4 KB and 20 KB, and this one is 400 KB. Whenever I see a large LNK file like this one, it leads me to believe that the file contains other embedded content like files or scripts. If you recall from earlier, this one contains a malicious DLL and a decoy PDF document, which accounts for its larger file size.

The zone identifier is a good one to look out for as well. Depending on the browser or application that was used to download the file, a Zone.Identifier alternate data stream will be added to it. The stream indicates that the file was downloaded from outside the host's network. A ZoneId greater than 1 typically means the file came from outside the network.

Another interesting feature is the entropy, or randomness, of LNK files. The top screenshot shows the entropy of the malicious LNK we've been talking about. The bottom one shows the entropy of a benign Google Chrome shortcut file. An LNK with high entropy can be an indicator that the file contains compressed or encrypted content. For this example, the number of suspicious features has really added up, so let's move on to look at another couple of examples.

Another technique that attackers can use is to modify an existing LNK file to include a backdoor. Take the Google Chrome shortcut in the example on this slide. Each time the user clicks it, Chrome launches as expected, so the original binary is executed. But the backdoor is also executed in the background, away from the user. PowerShell Empire has a pretty good module that makes this technique easy to carry out: Invoke-BackdoorLNK lets you specify an LNK file to backdoor with a PowerShell stager.

I remember a company where a red team did this, and it was a bit tricky to figure out exactly how the PowerShell stager was being executed. PowerShell was shown as a child of explorer.exe. That isn't something blue teams typically monitor for, because that behavior looks pretty normal and happens all the time. The red team had backdoored the Windows Server Manager shortcut on several servers. So whenever a system administrator logged on, the stager executed and a new C2 channel was established.

Another technique available to attackers is to include an IP address or URL in the icon path of an LNK file. When Windows renders the LNK file in Explorer, it forces SMB authentication from the victim host to the attacker's IP address. One way to reduce the effectiveness of this one is to block egress SMB traffic from your network. That stops attackers from capturing the hashes and trying to crack or relay them. Still, the attacker may already be inside the network. If they place an LNK file on a heavily used network share, this technique will be quite effective at capturing hashes from thousands of users inside your network. Here's an example of that technique being used, for your reference. Offensive tools like LNKUp make it easy to craft one of these LNK files. You can then use Metasploit's SMB authentication capture module to collect the password hashes.

This presentation is only thirty minutes, so I don't have enough time to go into detail about behavior-based detections for malicious LNKs. Elastic has open-sourced Event Query Language (EQL), which was originally created for security detection and threat hunting use cases. It's currently being integrated with the Elastic Stack. Queries are easy to learn, read, and write, and you can query on sequences of events across different event types. In the appendix of these slides, I've included some behavior-based detections for malicious LNKs for your reference. If you're interested in free detections, you can check out the EQL Analytics Library. It has about 130 free analytics, and they're all mapped to the MITRE ATT&CK matrix.

On hunting for malicious LNK files in your environment, here's a crude but effective method that can produce a big win for your team. It's amazing how many threat groups try to evade our defenses but then risk giving themselves away. They do this by creating an LNK file in the Startup folder to maintain persistence, so that the LNK file executes every time the victim logs on. It's one of the oldest tricks in the book, but it still goes undetected in a lot of environments because defenders aren't looking. A quick hunt would start with using LECmd to parse the LNK files in commonly abused locations on your endpoints. Then index that data in a central repository or SIEM, and query and visualize it to surface anomalies. A simple but effective method is to sort the results in ascending order by the LNK file's target or command-line arguments. Once you've learned what's normal in your environment, this should be a low-effort hunt to either automate or complete periodically.

One way to approach the problem of identifying LNK files as malicious or benign is to treat it as a classification problem. We had examined many LNK files and identified the features that make them stand out as suspicious. From there, we tried to build our own classifier to classify them using data science techniques. The next few slides show a practitioner's attempt to use machine learning to classify LNK files, and they explain the process from start to finish. My goal is to show practitioners how accessible data science techniques are. Machine learning can be effective at solving the problem most of the time, but it's not a silver bullet.

Earlier in the talk, we were parsing LNK files and identifying the important features that help us understand whether they're malicious or benign. That was feature extraction: we were essentially transforming our domain knowledge into features. We can then apply machine learning or other data science techniques to try to solve the problem. I analyzed lots of malicious and benign LNK files and started building out a dataset. When we decided to build a model, I had to normalize that data before I could run it through any algorithms to try to predict whether a file is malicious or benign.

So how do we go from thousands of parsed LNK file reports like the one shown on this slide to something like this? This is an array that represents the features of the parsed LNK file shown on the left-hand side. The model we want to run our data through needs to see the data in this kind of numeric format. Some features of an LNK file, like file size or entropy, are already numeric, so those are easy to handle. But how do you represent features like command-line arguments as a number?

This is called feature engineering: we were asking questions of the LNK file data. We separated file sizes into bins, so larger LNK files would be in a bin with a higher number. For values like the show command, we just used the pandas library's factorize function, which gives a numerical representation of these values. For the remaining examples on this slide, we created binary features (true or false, 1 or 0) by checking each LNK file for certain values. Does the LNK file have long command-line arguments, true or false? Does it have high entropy? That kind of thing. The end result was a normalized dataset of labeled malicious and benign LNK files.

After preparing the dataset, I started looking at possible methods to classify LNK files. One option was a decision tree; there's a very simple example on this slide. Decision trees answer sequential questions, operating in an "if this, then that" manner, and they lead us to the answer: is the file malicious or benign? The advantages of decision trees are that they're fast to run, they don't need a lot of data, and they're easy to interpret. The disadvantages are that they can be slow to train and difficult to tune.
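The feature engineering described above (binned file sizes, a factorized show command, and true/false flags) can be sketched in a few lines of Python. This is a minimal, standard-library-only sketch, not the code used in the research. The `LnkRecord` fields, the size-bin edges, the long-arguments and high-entropy thresholds, and the list of abused binaries are all hypothetical values chosen for illustration. `factorize` here is a small stand-in that mimics what `pandas.factorize` does for a plain list.

```python
import bisect
import math
from collections import Counter
from dataclasses import dataclass


# Hypothetical record of values already parsed from one LNK file (for example
# with LECmd); these field names are illustrative, not LECmd's output columns.
@dataclass
class LnkRecord:
    target: str        # full path of the LNK file's target
    arguments: str     # command-line arguments passed to the target
    show_command: str  # e.g. "SW_SHOWNORMAL", "SW_SHOWMINNOACTIVE"
    lnk_size: int      # size of the LNK file itself, in bytes
    raw_bytes: bytes   # raw LNK file content, used for entropy


# Assumed thresholds for illustration only; the talk does not give the values used.
SIZE_BIN_EDGES = [4 * 1024, 20 * 1024, 100 * 1024]  # bytes
LONG_ARGS_THRESHOLD = 200                            # characters
HIGH_ENTROPY_THRESHOLD = 6.0                         # bits per byte
ABUSED_BINARIES = {"powershell.exe", "cmd.exe", "mshta.exe",
                   "rundll32.exe", "wscript.exe", "cscript.exe"}


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def size_bin(size: int) -> int:
    """Bin index for a file size: larger files land in higher-numbered bins."""
    return bisect.bisect_right(SIZE_BIN_EDGES, size)


def factorize(values):
    """Assign integer codes in order of first appearance, like pandas.factorize."""
    uniques = {}
    codes = [uniques.setdefault(v, len(uniques)) for v in values]
    return codes, list(uniques)


def extract_features(records):
    """Turn parsed LNK records into numeric feature vectors."""
    show_codes, _ = factorize([r.show_command for r in records])
    rows = []
    for rec, show_code in zip(records, show_codes):
        # File name of the target, regardless of path separator style.
        exe = rec.target.replace("/", "\\").rsplit("\\", 1)[-1].lower()
        rows.append([
            size_bin(rec.lnk_size),                         # binned LNK size
            show_code,                                      # factorized value
            int(exe in ABUSED_BINARIES),                    # abused binary?
            int(len(rec.arguments) > LONG_ARGS_THRESHOLD),  # long arguments?
            int(shannon_entropy(rec.raw_bytes) > HIGH_ENTROPY_THRESHOLD),
        ])
    return rows


# Usage with two made-up records: a plain Chrome shortcut and a large,
# high-entropy PowerShell LNK loosely resembling the APT29 example.
benign = LnkRecord(
    target=r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    arguments="", show_command="SW_SHOWNORMAL",
    lnk_size=2 * 1024, raw_bytes=b"L\x00\x00\x00" * 500)
malicious = LnkRecord(
    target=r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
    arguments="-NonInteractive -ExecutionPolicy Bypass -EncodedCommand " + "A" * 300,
    show_command="SW_SHOWMINNOACTIVE",
    lnk_size=400 * 1024, raw_bytes=bytes(range(256)) * 100)
features = extract_features([benign, malicious])
```

Each row could then be paired with a malicious or benign label and passed to a classifier such as a decision tree or random forest. Keeping every feature numeric is what lets those models consume the data directly.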

So we decided to try a random forest classifier. This classifier essentially builds a set of decision trees, each from a randomly selected subset of the data, so you end up with multiple trees trained on different portions of the data. Each tree gets a vote on what the answer should be. The votes from the individual trees are then aggregated to decide whether the final class of the LNK file is malicious or benign. The good thing about this type of classifier is its robustness: even if a few individual decision trees are prone to noise, the overall result should be correct once all the votes are aggregated.

Here's some information about the experiment we set up, using the dataset to train a random forest classifier to try to identify LNKs as malicious or benign. The dataset consisted of around 2,500 benign and 30,000 malicious LNK files, all parsed and labeled. That's quite an imbalance, but this is a common challenge when attempting to solve security problems using data science. The extracted features should be descriptive enough to separate malicious from benign samples.

Let's move on to how we trained the classifier and what the results look like. The next few slides show what we did with the dataset of LNK files and the random forest classifier. At this point, every LNK file in the dataset had been parsed and normalized into an array like the one you see on this slide. The data was split 80/20 into two datasets. The training dataset was used to train the classifier. The remaining 20 percent went into the test dataset, for the classifier to label as malicious or benign so we could analyze the results. We used the training dataset to teach the classifier what a malicious versus a benign LNK file looks like, and we held the test dataset back. Once the classifier was trained, we had it classify the LNK files in the test dataset. The output was an array of labels telling us whether the classifier thought each file was malicious or benign.

For the results, we decided to use a confusion matrix to analyze the model's accuracy. This matrix shows the count of true negatives, false positives, false negatives, and true positives. In general, we want to keep the false positives and false negatives quite low. To put these results in simple terms, two files were classified incorrectly out of almost 7,000, so the vast majority of files were classified correctly. Before this model is production-ready, though, I'd like to increase the number of benign samples in the dataset and observe how the accuracy changes. I'd also want to ensure that there isn't a huge increase in false positives.

Let's spend a minute talking about what the classifier didn't do well on. Like I said, machine learning can work a lot of the time, but it's not 100% accurate. Some false positives were LNK files that use commonly abused binaries like cmd.exe to execute one-liners. Others came from software like the PeaZip archiver, PDFCreator, and some PC optimizer software. The LNK files that slipped past the classifier completely were pretty interesting. They included a couple of backdoored LNK files and Internet browser shortcuts that would communicate with the attacker when executed. There were also a couple of LOLBins used to execute things like malicious DLLs.

As for what we could do better, we could look at using something like TF-IDF to determine the importance of each word in the command-line arguments. As I mentioned earlier, we can also continue building the dataset and balance out the number of malicious and benign files.

To wrap up really quickly: as I said earlier, I think attackers will continue abusing LNK files while detection rates are still quite low. Once defenders can reliably detect this attack and disrupt its effectiveness, attackers will be forced to abandon the technique in favor of something else. I've shown you that there are several opportunities to detect or hunt for malicious LNKs. When you consider applying machine learning to classification problems, the domain knowledge of practitioners is really valuable for things like feature extraction. Data science techniques are also accessible to practitioners. If you're able to work with an experienced data scientist, though, they can help you avoid common pitfalls, such as interpreting results incorrectly or choosing the wrong algorithm or classifier. It's also important to note that this research doesn't solve the problem entirely. I'd like to continue building the dataset of LNK files and extracting additional features for the classifier. I'm also looking to see if we can build a machine learning job in Elastic Security to detect malicious LNK files.

I think I'm at the 30-minute mark. You can reach out to me on Twitter, or I'm on the BSides SLC Slack workspace, so you can reach out to me there. Thanks for attending.
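TF-IDF, mentioned in the talk as a possible improvement, gives a word in a command line a high weight when it is frequent in that command line but rare across the whole dataset. The sketch below is a minimal, standard-library-only illustration under assumed inputs, not part of the research described in the talk. The tokenizer, the sample command lines, and the plain ln(N/df) idf are all choices made for this example. scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed idf by default.

```python
import math
import re
from collections import Counter


def tokenize(cmdline: str) -> list[str]:
    """Hypothetical tokenizer: lowercase, then split on whitespace, quotes and
    a few shell punctuation characters."""
    return [t for t in re.split(r"[\s\"',;()|]+", cmdline.lower()) if t]


def tf_idf(command_lines: list[str]) -> list[dict[str, float]]:
    """One {token: weight} dict per command line, using the textbook
    tf = count / tokens_in_doc and idf = ln(N / docs_containing_token)."""
    docs = [tokenize(c) for c in command_lines]
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each token once per document
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        weights.append({
            tok: (count / len(tokens)) * math.log(n_docs / doc_freq[tok])
            for tok, count in counts.items()
        })
    return weights


def top_terms(weights: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Highest-weighted tokens first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Made-up command-line arguments from three PowerShell LNK targets.
command_lines = [
    r"-NoProfile -File C:\scripts\backup.ps1",
    "-NonInteractive -ExecutionPolicy Bypass -EncodedCommand SQBFAFgA",
    r"-NoProfile -ExecutionPolicy Bypass -File C:\scripts\report.ps1",
]
weights = tf_idf(command_lines)
print(top_terms(weights[1]))
```

In this toy dataset, a switch that appears in only one command line, such as `-EncodedCommand`, outweighs one shared by several, such as `-ExecutionPolicy`. A token present in every command line gets a weight of zero. Weights like these could be appended to the binary features as extra columns, though how well that works on real data would need testing.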