
BSides 2018 Track 2

BSides Vancouver · Published 2018-03

You sort of have to understand that each of them has different use cases, different purposes, feature sets, and of course unique issues as well. I'm not here to tell you which ones you should choose. What I'll say is that you should not rush into a solution just because it looks really fancy on the surface. Some things are really nice and basic and offer functionality that will meet the purposes of most people, but some are really a McLaren F1 when really you just need a Toyota Camry. And some are even just a trailer and still require the car. For example, syslog-ng and rsyslog: very useful tools. I highly recommend using either one of them. They're

not for doing any sort of analysis. They're sort of just there to funnel everything in and then send it off to wherever you want it to go. Sumo Logic, LogRhythm, Splunk, ArcSight, whatever: they all have their features, and they all actually allow you to search for this data and even perform some analysis. Elastic is popular these days because it's free and, for the most part, open source. You need to research each of these tools and you have to find which of these feature sets are going to work well for you. They also have different price points; some of them will cost way more than the others. You just have to take cost and

benefit into account. Vendors are going to show you everything and anything your candidate software is going to have. It's going to be bound up in buzzwords like threat intelligence, machine learning, and whatever terms are being thrown around by vendors these days. You just need to be prepared to listen to these terms and pay attention to the amount of work that goes into configuring them. Don't cast the features aside, but just be prepared: there's a lot of work. Once you've learned these buzzwords and what benefit they have to your organization, you're going to find that you're not going to be able to implement a tenth of them in the first few months, in some cases not even in the first year of implementing any sort of security logging. You

may even just skip it altogether, because the amount of effort required to configure them may not necessarily be worth it, and you may be able to find something that does the job for you better. There may be some pressure from your vendor and your organization to implement these things. Some will even sing about how much they're going to change your lives. Again, these features do work, but you need to bear in mind just how much work they require. My recommendation, going forward, is to always set expectations early. Otherwise, you're just going to get burnt, and you're going to find that you spent a lot of money and didn't actually get your money's

worth. Before going in, you really need to know what's on your network. At one company, there were search tools to perform the same tasks, and one was significantly faster; like 10 times faster, in fact. I really like self-driving vehicles. I hate them and love them. I read an analogy years ago that at any given moment, the human eye is processing about 30 gigabytes of information per second. This is sort of dubious because the brain doesn't really measure anything in bits, but I'm going to run with it because it seems kind of interesting to think about it this way. It took me about 40 to 45 minutes

to come into the conference today, meaning that I processed about 80 terabytes of data through my eyes in that time frame. There's a suggestion that your brain has a capacity of anywhere between 10 terabytes and 2.5 petabytes, if you were actually able to properly compare it. That's a huge spread, mind you, because, you know, a petabyte has what, like a thousand terabytes in it? So there's quite a bit of spread here. However, assuming the high end, a week's worth of commuting would fill your brain by the end of the week, going back and forth. And if it was the low end, by the time I got to Gaglardi Way on Highway 1,

I'd probably be dead. I don't know. But of course your brain does not keep that 30 gigabytes per second. In fact, you discard so much of that information. You're not going to remember what you ate for lunch on February 2nd. If you do, it must have been a significant event because that's the only way your brain's going to remember that. You're not going to remember the Toyota Corolla you passed on the freeway, that sort of thing. And the reason why I bring this up is you kind of need to know what is important and what is not. You need to know what to prioritize. Not everything that you think is going to be useful is

going to be useful. So you're now getting to the point where you're starting to work with your infrastructure, and you've decided that you're going to go for your own approach. You're going to host your own; you're going to host everything. It's a huge undertaking, because you have to have the infrastructure to do this, and that means you have to have the storage, you have to have the bandwidth, and you have to have the backups. This is actually a really scary thing. In our case, we collect 200 gigs per day, right? That means about 72 terabytes a year, since we have a year's retention. Can you afford to back up 72 terabytes? How are you

going to successfully back that up? It's going to be ridiculous. That's something to bear in mind if you are going to go down that road. Maybe you want to keep it for three months, maybe you're only going to collect 30 gigs. You never know, right? The nice thing is that you can ask others to host it for you. There are a lot of services. Some of them are managed services that do it on a contract basis, and some of them are actually the vendors themselves: they'll provide you with the software and the environment to host it. Usually they'll put it up on AWS or Azure. Make sure if you go down this

road that the SLA includes backups and storage redundancy. You also want to make sure that you have the ability to retrieve that data should you ever decide that, no, maybe the walled garden is a better approach. The advantage to this is that you are not managing these boxes. You do not have to do the maintenance, you don't have to do the patches, you don't have to do any of that. All you have to do is manage your side of the business, which is finding all the logs, sending them through your collectors, and sending it off to their service. However, the one challenge with this, and the reason why I use the word begging

in this is that you're going to lose a lot of flexibility in terms of configurations. And it can get really frustrating, such as this particular email here. So the honeymoon is going to be short. I figure like you might last six to 12 months if you have any esoteric configurations. In our case, I wanted to split the data off to our SOC. And we had to use a service that they wrote for us that would be compatible, as in it would actually not break their compliance or whatever for their customers. However, it broke. And what that means is that because it's such an esoteric configuration, support knows nothing about this. There's no documentation. There's no

nothing. And you have to yell at them to get them to fix it. If you go down this road, you will probably find yourself pulling your hair out and yelling over the phone. Unfortunately, it's something you're going to have to put up with if you want to get really deep into how you configure this sort of service. One big thing that I think is really important is that you should include security log collection in any change requests. And I hope everyone here has some level of change control in their company. It's quite possible to have a new device, or a change to an existing device, that could lead to a

problem with your existing log collection. It could introduce new issues for you. Will it change the log output? Will it stop altogether? And are you able to support that log collection? People will sometimes come up to you and say, oh hey, you're collecting logs in the environment, we've got this new appliance, can we dump it on you? And we're getting it in tomorrow, and we can't turn on syslog without taking the device down. I don't know what devices out there have that feature, but it's been a thing that's come up before. Be prepared for that, and make sure that in any sort of change control review, your security logs are actually taken into

account. OK. So you've gone through the drama and you've dealt with the salespeople. They've probably taken you out for lunch a couple of times, and you're now through the storm of selecting, and now installing, your new software solution. Let's talk about the perils and headaches you're going to have with your data, because this is where it starts to get really fun and you start to realize: yes, it's useful, but oh god, do I have to configure everything properly? So one of the things you've got to keep in mind when you start using the software is that there are a couple of ways to approach searching. The way I look at it is basically collect,

monitor, hunt, and predict. You should never really say you're doing more than collecting if you're in an incident response role because that allows you to set the expectations when something occurs and you need to deal with the situation. You may be creating alerts, which is where the monitoring aspect comes in, based on existing indicators. But you're using those indicators and those are things that you've already seen. And you're going to end up doing the same thing for hunting. Predicting is something I highly doubt you're going to achieve. You can use this sort of software to forecast and apply some sort of heuristics to these sort of things, but it probably isn't going to tell you that you're about to get breached. If any product could actually do that for

you out of the box, I would say that security would be solved and I wouldn't have to be giving this presentation today. This next thing can be applied to anything that you do, but I really like using it within the company I work at. It's called RACI. Basically, it stands for Responsible, Accountable, Consulted, and Informed. It's really important to set boundaries on who's doing what, who's involved, and how they're involved. You need to know who's assigned to the task. You know, like: who's responsible for that network appliance? Who's responsible for the log collection? And who's responsible for actually searching within all that data? You also need to have someone who's accountable, who takes ownership of the whole project. You need someone

who's consulted, someone who needs to be spoken to. Anyone that has a vested interest in any of this needs to be spoken to. And also informed. The idea here is not everyone's going to end up using it, but maybe your director should be involved just knowing what we have as a capability within the environment. Those are just some ideas you can come up with when it comes to working with the RACI model. And it definitely helps when it comes to working with multiple teams. If you're one person, you probably don't need to work too much with RACI. But if you're in a company like mine where there's 300 people within your department, it's very important.
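To make the model concrete, here's a minimal sketch of a RACI matrix for log collection kept as plain data; the teams and tasks are hypothetical examples, not from the talk:

```python
# A tiny RACI matrix for a log-collection project, kept as plain data.
# R = Responsible, A = Accountable, C = Consulted, I = Informed.
# Teams and tasks below are hypothetical examples.
raci = {
    "maintain network appliance": {"R": "network team", "A": "network manager",
                                   "C": "security team", "I": "director"},
    "run log collectors":         {"R": "security team", "A": "security manager",
                                   "C": "network team", "I": "director"},
    "search and triage log data": {"R": "SOC analysts", "A": "security manager",
                                   "C": "incident response", "I": "director"},
}

def who_is(role: str, task: str) -> str:
    """Look up which team holds a given RACI role for a task."""
    return raci[task][role]
```

For example, who_is("R", "run log collectors") answers the question of who is responsible for the log collection, and a quick scan down the "A" column tells you who owns the project end to end.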

You kind of need to know how important the data is to you, and what you are going to do with it. Like, is it for digital forensics? Is it for incident response? Is it for employee behavior? Can you measure the amount of data and events per second you're about to generate? I won't get into how you do this, but there are many guides out there for how to do it effectively. This is something you need to consider before you sign your license agreement. At 200 gigs per day, we're eating almost a quarter of our daily internet traffic just sending our data over to an AWS cluster. You also need to keep in mind that you can end

up with duplicate data when it's coming from multiple appliances. For example, if you have a router sitting in front of your firewall, and that router's job is literally just to send traffic right to your firewall, why are you collecting traffic data from that router? Just collect from the firewall; it's probably fine. You should still collect the authentication data from the router, but maybe not necessarily all the traffic going through it. And again, going back to RACI, you just have to know who's responsible for the data and who is supposed to be involved in any changes. I'm just going to grab some water here.
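Those earlier numbers (200 gigs a day, a year of retention) are easy to sanity-check; this back-of-the-envelope sketch just restates the arithmetic:

```python
# Back-of-the-envelope sizing for daily log volume, retention, and bandwidth.
daily_volume_gb = 200      # from the talk: roughly 200 GB of logs per day
retention_days = 365       # one year of retention

retained_tb = daily_volume_gb * retention_days / 1000
# 200 GB/day kept for a year is ~73 TB, matching the speaker's "72 terabytes".

# Average sustained bandwidth needed just to ship that volume off-site:
avg_mbps = daily_volume_gb * 10**9 * 8 / (24 * 3600) / 10**6
# Just under 19 Mbit/s around the clock, before bursts and retransmits.
```

Running the same two lines against your own daily volume and retention window is a cheap reality check before you sign a license agreement or promise a backup window.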

Sorry about that. So, things that I collect. I collect a lot of stuff, as I mentioned. The biggest set of logs you'll ever collect will be your Windows event logs. You have your typical Application, System, and Security logs. But the thing to keep in mind about the event log system is that it's a lot more extensible than just that. Some Microsoft products, some third-party products, and a lot of security products create their own event logs within that system. And this includes endpoint solutions. Having that data collected can make a whole lot of difference in terms of incident response. On the endpoint security side, there are a lot of products out in the market that will

keep a ledger to help you engage in any sort of DFIR, digital forensics and incident response. However, your organization may not be able to afford such software; it's very expensive. A nice alternative is, of course, to use Sysmon logging. It's not 100% foolproof, but it's really effective, and it's a free tool from Microsoft's Sysinternals suite. Actions such as specific user behavior, and various other events such as knowing where an application was launched from, like finding out it came from Explorer or whatever: those are the sorts of things Sysmon will actually log for you. And it's been very useful in determining the source of malware outbreaks within the organization I work

at. One thing to keep in mind about event logs, and I'm really going to harp on Microsoft here, and I don't mean it, it's just that we're a Microsoft shop so I'm going to end up making lots of quips about them, is that they are going to be the bulk of your logs. You should determine how much traffic you're going to generate from just those alone. That's the first thing I would look at, before anything like firewall logs. One thing you can take advantage of with event logs that's really, really useful is that you don't have to install agents everywhere to collect them. You don't have them

on each workstation, you don't have them on each server. What you can do instead is set up a GPO and then just forward events to one central server, or a couple of servers if you wish to distribute the load. The one thing you have to do when you go about this method is make sure that you are collecting sources like your DHCP logs; they help you keep track of things a lot better.
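The agentless setup being described here is Windows Event Forwarding. A rough outline of the moving parts, from memory, so the exact GPO path and the placeholder collector name should be checked against Microsoft's documentation:

```
# On the collector server: enable the Windows Event Collector service
wecutil qc

# On the source machines (usually pushed via GPO): enable WinRM
winrm quickconfig

# GPO setting that points sources at the collector:
#   Computer Configuration > Policies > Administrative Templates >
#   Windows Components > Event Forwarding >
#   Configure target Subscription Manager
#   Value: Server=http://collector.example.com:5985/wsman/SubscriptionManager/WEC
```

From there, subscriptions on the collector decide which channels and event IDs actually get pulled from the sources.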

And if you have a hosted solution with your vendor or with some sort of provider, you can actually use the DHCP data itself to go back in time and determine: oh, you know, this IP address belonged to this host during this particular incident. It can definitely help you rewind a lot of things quickly. I don't have much to say about firewall data. It's useful. Take it. If you're lucky and you actually have a very nice firewall that does packet inspection and tags your application data and your users, it's actually really cool. I've made use of it, and it's nice to know who's doing what. NetFlow, for those who aren't familiar, is just traffic

logging. Be careful with it. It can go sideways real quickly if you're collecting all of NetFlow. In most cases, I find that NetFlow will actually outpace your event logs. The thing that's useful about it is that you can use it to detect lateral movement. You've got your ingress and egress points really well monitored, but you want to be able to track what's going on between machines, and you can do this with NetFlow. The problem with NetFlow, though, is that it is very noisy. We only kept a week's worth of it, just because it's a lot of data. It's very onerous. And when you do start going through it, it is sort of hard to keep track of everything. And it sort of gives

you a few red herrings as you go along, too. But one case where we find it really effective: we have a lot of split-tunnel configurations where internal traffic gets routed into our main corporate network, but the internet traffic just leaves directly, and we don't have a way to monitor that traffic. So what we do instead is use the router that's sitting in front of it; we're actually able to monitor the traffic going in and out of that instead. And you know what, there are lots of other logs I could go on about collecting. We collect stuff from mainframes, database software, and internet-facing appliances. It doesn't hurt to just go and look around to see what you're collecting. One surprising thing that came

up when I started working with this stuff was that a lot of SaaS solutions have syslog, or they have an API that you can work with. If they do it over syslog, usually it's TLS syslog, so you can use something like rsyslog or syslog-ng for this. The problem with these SaaS providers is that they don't really document this very well. And as a result, you either have to make a support request or you sort of have to go begging about it. I've had a situation where the vendor provided the log data, but they only sent what they deemed important. In fact, they tagged it as APT, which is really useful, because we already get that email if something comes up. We

don't get the general activity out of it, which is something that we would like. Be prepared for this to happen, and do not hesitate to push a feature request to have this changed. Again, you own that data, and if you can't get that data out of there, then maybe it's worth considering looking elsewhere. This next part is the bane of my existence when it comes to this software.
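For the TLS syslog case mentioned above, a minimal rsyslog listener might look something like this; the certificate paths and the anonymous auth mode are placeholder assumptions, and the gtls driver needs rsyslog's GnuTLS module installed:

```
# /etc/rsyslog.d/10-saas-tls.conf -- accept TLS syslog from a SaaS provider
global(
    DefaultNetstreamDriver="gtls"
    DefaultNetstreamDriverCAFile="/etc/rsyslog.d/tls/ca.pem"
    DefaultNetstreamDriverCertFile="/etc/rsyslog.d/tls/server-cert.pem"
    DefaultNetstreamDriverKeyFile="/etc/rsyslog.d/tls/server-key.pem"
)
module(load="imtcp" StreamDriver.Name="gtls" StreamDriver.Mode="1"
       StreamDriver.AuthMode="anon")
# 6514 is the IANA-assigned port for syslog over TLS
input(type="imtcp" port="6514" ruleset="saas")
ruleset(name="saas") {
    action(type="omfile" file="/var/log/saas-syslog.log")
}
```

In a real deployment you'd likely use certificate-based peer authentication instead of anon, and forward into your collector rather than a flat file.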

The problem with a lot of log formats is that everyone has their own log format. Everyone wants to reinvent the wheel in their own image. To be fair, I can understand where they're coming from on this. I don't collect much in the way of stuff from web servers, but it's a perfect example. We have a handful of popular HTTP daemons out there, like nginx and Apache, but we end up with multiple formats of logs. The most popular ones out there are, of course, the Apache format; the W3C format, which was actually done by the standards committee that is behind HTML; and Microsoft IIS, which is sort of ridiculous because they modeled their format off of W3C, but like all

things Microsoft, they wanted to make changes to it, sort of like how we had JScript and JavaScript at one point. This is sort of the consequence: everyone comes up with their own format, and eventually we end up with this mess of different formats. You're going to face this a lot, and you have a few options for dealing with it. In a lot of cases, the software out of the box is going to support the field extractions for you automatically, especially if the product is mainstream. You may also luck out and the vendor will have been really nice and provided you with the configuration to do all the work. They provided

you with a tool that you can plug into your existing software and it's just going to work. However, even if a solution is provided, you may end up with weird edge cases that come up. Sometimes log data gets a little too helpful. It's very tempting to log all traffic on your file server. If you're small enough, like 20 people like I said, then you could totally just log every file access on that one file server if you only have one file server for your whole organization. But it doesn't really scale because once you end up with 400 file servers out there and they're all sending all the requests off there and they're doing all sorts

of things like DFS and so forth, you're going to end up with all sorts of stuff inside of your security logs, and it's just going to become a huge mess. It's the same problem as with NetFlow: you're not going to know the difference between a real threat and normal activity just based on that information. It doesn't hurt to evaluate which event IDs you want to keep and which you want to filter out. This can be useful in reducing the amount of traffic you consume, and even the storage required. I know that, again, I'm still harping on event logs here, but one thing I can point out is that Microsoft likes to include a little helpful information

at the bottom. Like, "this event is generated when a logon session is destroyed," blah, blah, blah. By default, the log tools that I've used do not filter that bottom part out. And just by removing that bottom text, which varies in length, we saved about a third of our bandwidth consumption for that particular source. So sometimes, again, logs are just too helpful. If you have millions of these events per day, it really does add up. And again, you're really going to love dealing with log formats. You're going to start to notice that they're not consistent, even when they're from the same vendor. As you can see here, this is output from an event log from Sysmon. If you

are doing anything other than the System, Application, and Security logs, you're going to run into this. You'll notice at the bottom here, we've got this message field, and then a bunch of contents below it. And this changes depending on what is actually sending it. It's really, really annoying, and it makes you realize that you've actually got to change your extractions for several different types of Windows event log sources. It gets even worse, though, because you switch to another Microsoft product again. You'll notice that at the bottom there, we have newlines in the paths, and newlines at the ends as well. It's really annoying. In fact, this was already

an annoying issue to begin with, because to get the Windows Defender details, we had to tell Microsoft SQL Server that whenever SCCM makes a write to this particular table, it's supposed to dump the contents of that write into the event log. It's a really hokey solution, but it does work; the catch is that every time the database is updated, it breaks the trigger. So it's a little bit hokey, to say the least. Does anyone here think they can write a regex for this? No. When I came across this, it was just frustrating; the most frustrating thing I had ever dealt with in terms of logs. And this is just the surface of what's wrong with this particular appliance. I'm going to name them by name: it's F5.

I don't understand what they were thinking when they created this syslog output. All it's doing is it's spitting out stuff that's human readable, which is fantastic. If you're a support rep and you work at F5, you're going to be reading through this real quickly because you know what keywords you're looking for. If you're trying to put this into your security log collector and you're trying to extract fields from this, it is not going to be a happy few weeks for you. And you're going to keep finding new and new things to pull out of there. Fortunately, I do have a solution for this, but I will bring it up in a few slides. You're going to want to learn regex, or at least get familiar with regular expressions.
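As a small taste of what field extraction looks like in practice, here's a Python sketch that pulls fields out of a made-up, firewall-ish syslog line; the line format is invented for illustration and is not F5's actual output:

```python
import re

# A made-up, vaguely firewall-ish syslog line for demonstration purposes.
line = "Mar 14 11:37:02 fw01 traffic: src=10.1.2.3 dst=192.0.2.8 dport=443 action=allow"

# Named groups keep the extraction readable and map cleanly onto field names.
pattern = re.compile(
    r"src=(?P<src>\S+)\s+"
    r"dst=(?P<dst>\S+)\s+"
    r"dport=(?P<dport>\d+)\s+"
    r"action=(?P<action>\w+)"
)

match = pattern.search(line)
fields = match.groupdict() if match else {}
# fields -> {'src': '10.1.2.3', 'dst': '192.0.2.8', 'dport': '443', 'action': 'allow'}
```

Expressions like this are what your collector's extraction config ends up being full of, and the less structured the source format, the longer and more fragile they get.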

To fix a lot of these weird issues that you come across in your logs, it doesn't hurt to know how to actually work with the extractions to begin with. Regex is really the most common way to deal with a lot of these non-standard log formats, so it's useful to learn. Even though regexes require a lot of time to put into them, they're very effective. I'm a little bit rusty these days with my regex. I don't do as much as I used to, but just knowing the basics will take you a long way in terms of writing your extractions. However, don't get too creative. There's a very horrifying regex that I always like to bring

up whenever I talk about regular expressions. And this is how you verify an email address using regular expressions. And I kid you not, this is actually the effective way to do it. If you go on Wikipedia, you can pull up the article for email addresses, and it'll tell you exactly the rules for what should be inside of an email address. You just have to know how to work with the FQDN, and then you just have to know how to work with the local part. This does all the verification for you. Now, fortunately, the one thing I can say, this was not written by hand. If you are capable of writing this by hand, I will

have to say that you are a very special person. I don't know how you could keep track of all those braces, admittedly. But if your regular expressions are getting to this point, I'm pretty certain you hate life. Regex is not a solution to all your woes, and you cannot get 100% perfection. You will not achieve perfection in your extractions, and you will come across edge cases, but getting something that is 98 or 99% functional will make your life a lot better. Again, I doubt you could write such an expression by hand, and I'm glad a human being didn't. If you want to start working with regex and you want something visual, I highly recommend regex101. It's just regex101.com. You can take any sort

of text, pop it on the bottom, start writing your regular expression on the top, and you can pick your flavor. A lot of the tools that I end up running across tend to prefer the Python regex engine. But realistically, once you learn one of the regex engines, they're pretty much all the same, just with some differences between them. Fortunately, there is a way out of this mess, and there is a standard, so to speak, for these formats.
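One widely adopted attempt at a common format in the SIEM world is ArcSight's CEF (Common Event Format), where a pipe-delimited header precedes a key=value extension. A minimal parsing sketch, with an invented sample line:

```python
def parse_cef(line: str) -> dict:
    """Split a CEF line into its seven header fields plus the extension blob."""
    _, _, rest = line.partition("CEF:")
    # maxsplit=7 yields 8 pieces and leaves any later pipes inside the extension.
    parts = rest.split("|", 7)
    keys = ["cef_version", "vendor", "product", "device_version",
            "signature_id", "name", "severity", "extension"]
    return dict(zip(keys, parts))

# Invented sample event, not from any real product.
sample = "CEF:0|Acme|FirewallOS|1.0|100|connection allowed|3|src=10.1.2.3 dst=192.0.2.8"
event = parse_cef(sample)
# event["vendor"] -> "Acme", event["extension"] -> "src=10.1.2.3 dst=192.0.2.8"
```

Real CEF also defines escaping rules for pipes and backslashes inside header fields, which this sketch deliberately ignores; the point is just that a shared format makes extraction a one-time job instead of a per-vendor one.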

Everyone knows where this is going, I think: right, daylight saving time. Once daylight savings kicks in, of course, everything just shifted a whole hour. By keeping everything in UTC, you don't have to worry about daylight savings, because you're just going to end up correcting it inside of your log collection software. When you're doing your searches, you should have the environment for your own account set to whatever time zone you're in. Or just work off UTC, and then if anyone asks you what time it is, just do the math. The other thing you want to keep in mind is to make sure your time is accurate. And I mean not just on all of your workstations; make sure your appliances actually sync to the same NTP source. You're

going to end up with inconsistencies, and it's okay to have a couple of seconds of variation between these appliances. You can work with that. Once you start getting into minutes, it starts getting really annoying. However, even the security log collection software itself can actually break time, as evident here. The time on the left is wrong; the time on the right is correct. The time that data was coming in was 11:37, but for whatever reason, the software decided, no, it's actually 10:37. I am still working to figure this out. I do have a workaround at the moment to make it work. It is going to be a nightmare for you if you

don't have your time properly synced up. You're going to be like, oh, I don't have that information, but you do. So just remember: time is your enemy, and you only have so much of it. Even though you're collecting stuff passively, you can still break stuff. As I mentioned earlier, we deal with a lot of industrial control within our organization. You may have heard it referred to as real-time systems, process control, SCADA, whatever. They're all the same in my head. They're a huge concern for safety, and as a result, we do monitor a large aspect of our industrial control environments. And again, you want to be very careful when it comes to monitoring these spaces, as even though you're listening, you're maybe not

necessarily as passive as you think you are. I guess I can kind of bring up how sensitive these devices are, and then I'll bring up this story. If you are running Nmap on an industrial control environment, and you hit certain really garbage PLCs out there, PLCs being the interfaces between your workstation and the machine, you can find yourself in a situation where all of a sudden, two days later, the PLC just stops. It's because they barely implement TCP/IP, to the point where they don't take anything into account other than "we're just going to do it this way because it works." They're that sensitive. So the moment that you do a

scan and Nmap goes and slams Modbus traffic at it, it runs into a memory leak or a race condition, and 48 hours later, it seizes. That's something that hasn't happened where I work, but it's happened to a friend of mine, and it's not something that you want to be in trouble for. In my case, what happened is I had a site that had a Cisco router sitting between the corporate environment and the industrial control environment. Its job was to route traffic out of this network to be recorded into some software that they use for money tracking purposes; I don't know the specifics. And when

I went to go make a configuration change, there was a snafu in the config. I guess there was a typo somewhere, and it stopped the service. It stopped the service for three minutes. When the service was spun back up after I made the change, nothing seemed out of the ordinary. Everything was fine, I finished up for the day, and I went home. Twenty minutes into my SkyTrain trip, I'm taking a look at my phone, and the plant that I was working on had an issue where one of the machines stopped working for three minutes. And what happened was that I was collecting logs via TCP; I was collecting syslog via TCP. And this

particular networking device was configured so that the moment it could not actually send syslog, it would stop routing traffic, because its security settings were set to stop all traffic if it's not being monitored. So as a result, for three minutes, there was no data coming from the PLC to the database. Use UDP. With UDP traffic, we don't know if it's gotten there or not, but it has a lot less overhead. I used to configure both UDP and TCP and tell the owner of the device: configure it to your liking, it's listening on both ports. I now just say strictly UDP. If they have a use case for TCP, then I can work with that, but I don't advise using TCP for

syslog. The poor soul behind the keyboard and monitor: if you're a team of one, you are responsible for everything, and I am so sorry for you. If not, you need to identify any problems that come up, and you need to know where the fault lies. Have your partners within your department involved in these situations. Make sure they're aware of what their involvement needs to be in the case where something is broken. You may not have access to that firewall that's no longer sending syslog, but they do. These teams may want to identify your log collection software as being at fault, and their firewalls are completely impenetrable and not going to break. But they just issued a patch last night, and there was

a bug in it that broke syslog. You just have to be able to diagnose these things. You need to know how to actually look for these problems and how to work with them. Like I mentioned earlier, managed services have a limited-time honeymoon. When they drag their heels, make noise and ensure that they understand the severity of it. It's common for them not to understand your problem scope, because they're dealing with multiple customers and multiple different environments. They're not going to care about it, because they want to get to the point where they resolve the ticket while managing to resolve 72 other tickets. And the same kind of thing goes on your side as well. Make sure

you're compliant with their requirements as well, because they'll use that as an excuse to deny you any sort of support if you're just one gig short of the amount of RAM you need on that particular collector. And again, I'm just going to reiterate: make sure you understand RACI. It's going to save you so many troubles if you're working in a large enough organization. So I'm going to finish this off with one more derby reference. In derby, there is a requirement, when you want to get to a certain level of play, to complete 27 laps around the track in five minutes. There are a lot of

techniques to do it successfully, but at a minimum, you're going to be covering 1,560 meters, or just shy of one mile, in that time frame. It's really intense, and at the end, regardless of your success, your lungs, knees, legs, and back just hate you. There's a reason why I like popping Motrin on Fridays. I'm still struggling to get beyond 20 laps. Right now I'm sitting at that, and I peaked at 22. I know a lot of people who started off at 12 laps who are now able to do 24, and I've witnessed people do 30 laps, which is just incredible. It's really awesome.

It's possible to get to this level, and I'm lucky to have had a head start. I started at 18 laps because I was already familiar with skating beforehand. How you achieve it requires not only endurance, but form as well, and it also requires a lot of patience. The same thing applies to security log collection. You're going to start somewhere, but only hard work will get you to the level you want to achieve. You should look at what others are doing technique-wise to achieve their collection goals. If you feel like you're not getting anywhere fast enough, maybe just take a step back and think about what all the problems are. Just breathe, for crying out

loud. That's how you're supposed to do 27 in 5: you're supposed to get the oxygen in properly. Once you understand your tools and have your systems in place, you are going to rock it. You're going to totally get that 27 in 5, or in this case, get all the logs. I'm going to give my final remarks. Be prepared for things to break. I know. Be prepared for things to break, and have a plan to deal with it. This does include identifying any of the risks involved. This is a big one: don't hesitate to hire a consultant, and make use of them. They're assets that make your life easier. If there's anything that'll get a fire lit under the seats of anyone in your company who's dragging their heels, it's having a consultant

that's just doing nothing and getting paid, I don't know, like $200 an hour, to actually do the work for you. The company's burning money while things are going really slow, so this is an opportunity to push things along and make your priorities their priority. One thing that I always say is that I don't like promoting security products. I really hesitate to do that; I just never want to be a spokesperson for any particular security product. However, I do feel like this is the most effective security software solution that you'll ever pick up. It's not a holy grail, so don't ever treat it as such. But

it will help you in all of your situations. As long as your security logs aren't compromised, you should be able to find something in there that's useful for you in the event of an incident. And with that, I was really happy to have you all here. Before anyone asks, this is my contact information. I do go by the name "CateLibc," and I always like to bring this joke up. "CateLibc" is a reference to the movie "Hackers": Kate Libby, who's one of the protagonists in the movie, I basically renamed as "CateLibc." And if you know your programming history, then you'll understand my joke. So yeah, thank you very much, everyone, and I'll happily take any questions. Yep, okay,

so I have nine minutes for questions, if anyone has anything. Yeah, so we do have operations in the European Union and the United States. They have different rules for that. I can't get too much into the specifics of that because I don't work in legal, but it has come up. Overall, what I'm doing so far hasn't been an issue. We do keep the data within our own team. It's kind of complicated, to say the least. There's the European Union... GDPR? Yeah, that applies to us. I'm an EU citizen, so it definitely applies to my company in that case, because it applies to any EU citizen worldwide. That could come up, but we haven't crossed that bridge yet. No, not really. And you know what? I never

want to deal with compliance, admittedly. I got out of consulting for that reason. That's the case then. That's fantastic. One question is great. All right. Oh, sorry. Yeah, you mentioned about... Yes. So they are first tier in that particular case, and our role is to sort of deal with the impact of it. It's a bit of a complicated relationship, but realistically, just like with UDP, it's much simpler that way. Yeah. Are you not concerned about, like, packet loss? If we have packet loss internal to our network, then I probably have much bigger problems to deal with at that point. That's kind of how to look at it. Because it's staying internal at

this point. It's encrypted once it leaves the network. So if we have any sort of packet loss, like, say, with traffic going from South America to North America, that's an issue for our telecoms provider, or at least our networking gear, to look into. Yes? I really do not know; I was hoping to find that out as well. Do you know what my solution has been so far? It's like, "Hey, boss guy, can you give me an extra 50 gigs?" So that's been my solution so far. Yep. Yeah, absolutely. So we do make use of a CMDB and we do make use of Active Directory. Right now we're going through a huge process where we're actually migrating the data into our own OZ cluster. And for

me to talk about how we're doing it now — it's kind of broken, because the way we're doing it is we have to mirror our entire AD data to this particular software vendor and then reprocess it so it's actually compatible with all of their apps. It's really ridiculous. When we migrate over to our own hosted solution, it's going to be much simpler for us, and I'll actually be able to comment on that much better than the way we have it right now. I'm so embarrassed about it that I don't even want to comment on it. Awesome. I guess this is it. Well, thank you very much, everyone, and have a good rest of the conference.


Okay, let's get this started. Thanks all of you for being here. I'm really excited to be talking. This is my first conference presentation. And yeah, I'm really excited to share with you some of my lessons learned from an investigation that I performed, which I'll walk you through over the next 40 or so minutes. So, presentation only goes for about 40 minutes, so feel free to jump in, ask any questions, let's make this a conversation if we can, and if not, feel free to hold any at the end and I'll be happy to talk about it. So who am I? So my name is Barnaby Skeggs, I'm a forensics and incident response IT guy from Australia. So

before moving to Canada, I worked in forensics and incident response in a consulting capacity at a Big Four firm. I worked on these types of cases all around the nation, in Sydney, Canberra, and Perth, where I was based. So what am I going to talk to you about today? I'm going to tell you about three things. I'm going to tell you about an investigation that I was hired to do. And this investigation led me to identify a file which I had to reverse engineer, because in a forensics capacity no one had attempted to reverse engineer this data structure and parse both the contents and the metadata out of the file for forensic relevance. And

finally I'm going to give you some takeaways that you can use if you either encounter this same activity yourself, or even if you're a blue team or a pen tester and the relevance of this file to your job. So let's start with the fun bit, the investigation. So this investigation was a data leak investigation. One of our clients had a customer who had obtained a sensitive email that they weren't supposed to have obtained. So our client had shared this email internally and every recipient of this email knew it was confidential and was not to be shared. However, what we did find, or what our client found, is that their customer had knowledge of this email and was able to quote it in a meeting word for word.

So we looked at this email, and when we're dealing with email data, an easy method of exfiltration is simply to forward that email. So that was the first thing the client looked at. However, they had insufficient data retention: they only had three months' worth of email data, and this email was over a year old. So they weren't able to identify if anyone had forwarded it on from their work email. Of course, there are other ways people could exfiltrate an email, and we'll get to that shortly. Now, the suspects: there was a well-understood group of people who had access to this email, but our client also had some lax shared folder access in Exchange, so there was a wider audience of people

who potentially could have accessed the email. So we were hired, we came in, and we did covert imaging of a handful of computers, which we would use to forensically analyze and see if there was any indication that any of the staff members in question had sent the email out. The reason this had to be covert was because a lot of the people involved had fantastic employment histories with the company, and they didn't want to tarnish anyone's employment relationship under false assumptions. Now, has anyone... Yeah, yeah. Mm-hmm. Yep, so in this scenario we were able to obtain physical access to the devices after hours, when they weren't at work. So we came in after hours and made forensic backups

of those hard drives using a particular Linux boot disk with write blockers. And write blockers restrict you from being able to write back onto the hard drive, so it's a forensically sound acquisition. So, has anyone in the room done email forensics before? Or even been an Exchange admin? A couple of people. So you might recognize some of these data sources, which are the standard places you would look for email-based evidence on disk in a forensic investigation. You've got your PST and OST data stores. These are the primary, local data stores of emails in a Microsoft Outlook installation. You've got individual email files, so .eml and .msg files. These are separate from your Exchange store and can be stored individually and copied around on your

hard drive. Carved email is looking for those same types of email files, but in the unallocated or deleted space on the hard drive. So if you imagine you had your current PST Outlook file, and then you had a recoverable PST Outlook file from six months prior, if you can recover those, put them together, and deduplicate that list using file hashes, then you've got a more complete data set. Shadow copies, a lot of you will be familiar with the Windows shadow copy feature, it's the same deal. If you can get a historical version of an email database, then you can extract more information, have a larger data set. Alternate mail apps and websites can get a bit tricky and a little bit interesting. So if the user

has local admin privileges on their account, then they'll be able to install programs, and they can install, let's say, Thunderbird or another third-party client, which they may be able to configure, depending on your setup, against a personal email account and send it out that way. So when we're doing email forensics, we'll also look for any other installations or databases of third-party email software. However, when we're looking into websites — so if someone logs in via the browser and uses their Gmail or their Hotmail to send it out — it's far trickier. There are some scenarios where you can recover or reanimate a cached webpage; however, that's generally only possible pretty soon after it was visited. So you might just have the URL record that said that they visited Gmail.
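The carve-and-deduplicate idea from a moment ago — merging the live mail store with recovered copies and collapsing duplicates by file hash — can be sketched roughly like this. The function and file layout are my own illustration, not tooling from the talk:

```python
import hashlib
from pathlib import Path

def dedupe_by_hash(paths):
    """Collapse current and carved/recovered email files into one set,
    keeping the first file seen for each unique content hash."""
    seen = {}
    for p in paths:
        digest = hashlib.sha256(Path(p).read_bytes()).hexdigest()
        seen.setdefault(digest, p)  # first occurrence wins
    return list(seen.values())
```

Hashing contents rather than comparing names means a recovered message carved from unallocated space still collapses onto its live copy, even under a different path.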

If you're lucky you may be able to get some subject-based data about the topic of the email, but it's unlikely. And finally — and this is relevant to the type of file that I had to reverse engineer for this investigation — you do have the search index, which is the thing that allows you to find things really quickly in Outlook when you type in a word and it pulls up an email from two years back. So this is a populated text-based index store which enables you to almost instantly find old files. The issue with this from a forensic standpoint is that, in my experience, this search index stays current with the files in that data store.

So if you delete an email file, you're very soon going to lose that index; otherwise the index would just build and build and build over years and take up all this space. I parsed all these files, I looked at them on all the forensic images, and because we knew this email — we knew the subject line, we knew the date range — and there was nothing. I didn't find any evidence that this email had even existed outside of the people who actually received it themselves. So no evidence of it going out. But working in the consulting space, you always — like my boss asked me — want to know: is there anything else that we can do

to just prove that we've covered all bases? So I said, you know what, yeah, there is. So I took the subject line string of text from the email that was exfiltrated and just ran it over the whole forensic image in a search, seeing if anything would flag. And one thing flagged, on one PC, which got me really excited. But it flagged in a file that I knew nothing about: it flagged in waitlist.dat. And I'd never heard of this in a forensic sense before. I'd never heard of this file or worked with it at all. So I asked my manager, I said, "Hey, do you know what this is?" And he's senior to me by 10

years in the field, very smart guy. And he said, "No, no, I've never seen that before." So I asked a friend of mine who works in the same forensic space for the Australian government, and he's five years ahead of my manager, and he said, "No, never heard of it. Look into it. See what you find. Tell me what you find. Let's figure this out." So I began doing some research. The file existed within the AppData directory of the user profile, under AppData\Local\Microsoft\InputPersonalization\TextHarvester\WaitList.dat. Now, when I saw that, I thought: hey, this has got some text in it from an email — I knew that — and it's in a folder called TextHarvester. That's interesting, right? And it's within InputPersonalization. Now,

if you look up what input personalization is on your Windows system, you'll learn it's to do with the tablet input service — the touchscreen input of your device and the settings around how Windows manages that. I would later find out that waitlist.dat is locked on a live system by the search indexer process. At the time I was working with an offline forensic image, so I didn't know this; however, very soon I recreated the file on my PC and saw that it was locked. So we know something to do with the search indexer is extracting some text and storing it in this file within the TextHarvester directory. I began researching, looking at programming forums, and I actually found a few obscure references to this file in some programming

forums about improving the recognition of user input — touchscreen handwriting-to-text conversion — by accessing what they call lexicon blobs. And I'd never heard of lexicon blobs. For those unfamiliar with the word lexicon, it simply means something like a vocabulary. So I started looking at what these lexicon blobs were. As of Windows Vista, the operating system has the capability to store user and application lexicon blobs: it collects data from any source and uses a text trainer to train the text content of this data into user or application lexicon blobs, which can then be accessed by a system call for whatever purpose is needed. So on the Microsoft MSDN website, you can see a — well, that's a bit rough, but you can see a

diagram that they've drawn. And I'll explain: on the right-hand side, we've got some user input in red, and it hits the input personalization system, which is the bottom-right box. Now this goes across to the recognizer or the text trainer, and it's stored in — on the bottom left — the app lexicon blob and the user lexicon blob, and there are a few other lexicon blobs as well. Now on the left-hand side you can see you've got an application, in this example the Ink application, which is able to make a call through the platform API and access this blob data. So, this is pretty cool. My brain started ticking and thinking that this is essentially storing my email and lots of other data in these

user blobs to be able to improve the recognition of touchscreen handwriting to text conversion. But I didn't know this for sure. And when you're working in forensics, not knowing something for sure just isn't good enough. Because if I made a call and said, "Hey, this text index was in waitlist.dat, this person sent it out." Then that person could very well have lost their job. Now, I didn't know the way that this file functioned. If they simply... If that email had been indexed because they opened a shared folder that had it inside and they didn't interact with it at all, that's very possible. I don't understand how this file works yet. So that is what I needed to gain clarification on. So my first step in order to do

this was to try and recreate the file myself. I had a touchscreen PC, yet I did not have the file. And I also knew that I'd never used the handwriting-to-text conversion feature — because who uses the handwriting-to-text conversion feature? So I opened up OneNote and I started doing some squiggles and doing some math. It's pretty cool when you get started with it. I still wouldn't use it, but it was a nifty feature. And when comparing the registry before and after these actions, we can see that two registry keys were added: an app lexicon timestamp and a lexicon generation key. And this just reaffirmed that I was on the

right track with my investigation. Interestingly, the app lexicon timestamp predates my purchase of this new PC, so I assume it correlates to something to do with the Windows installation date, but it didn't end up being relevant for my investigation. So now I've got a good understanding; I know why I have to reverse this thing to gain that context for the investigation. So I start opening up the file and seeing if I can understand what's in there apart from the text. Now — who loves hex? Who loves opening up a file and just reading something in hex? You guys are on your own. No, it's tricky, but if you work in forensics, you need to learn to be comfortable with hex, because there's a lot of awesome value that

you can get out of being able to interpret hex and the slack space of files. So when you open up this waitlist.dat file, you can see there's some human-readable text there — like this one says "Regshot is a small free open source registry compare utility" on the bottom right. That's fine, that's easy to understand; anybody could read that. However, you see all these dots and dashes between everything, and at the top you see a D followed by some non-English characters, and these things aren't so easy to interpret, especially when you have no idea what the data is that you're looking at. And when you view it in hex, these values, they're not just zeros — they contain

actual fields which will be interpreted by the software that's reading these files. So my first impression when opening up this file, which was 140 megabytes of pure text, was that there's a lot of text, right? It appeared that there were multiple full-text indexes of emails and documents, but I wasn't familiar with the dataset. The dataset I was looking at was somebody else's emails. I didn't know where one email stopped and the next one began. There appeared to be some metadata between each of these files — those non-readable characters — and some dates and times within those text records. So for the emails, if you reply to an email, the header of the from and sent and received, that was stored in readable text. And some of

those values were years old. So I knew that there was a lot of data in here to be interpreted. However, as mentioned, it got very confusing trying to figure out where one file started and one ended. Because if you forget about files and you just have a mass of text that you don't understand — and you've got an email chain, followed by a document which you don't know is related or not, followed by a 20-page report that could be an attachment or not, followed by a second email which was a fork of the first email and contained the full text of the first email — it gets very confusing very quickly. And when you're dealing with

large documents and emails and you're scrolling through a hex editor, you're not able to easily find the start and end of a file. So I had to build an approach to analyzing this that would allow me to sequentially interpret each text index within the file. So I started with data reduction. I had two files now. I had a 140-megabyte file with unknown emails and unknown metadata in it. And I had a 10-megabyte file that was created on my system when I did my test, where I knew all the documents, all the emails, and all the metadata values of those indexed documents. So it made a lot more sense to use my file for the initial analysis. There were far fewer unknown unknowns.
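Once you have a small file whose contents you fully control, one way to automate the hunt for recurring byte patterns — which I ended up doing by eye, as described next — is simply to count every n-byte sequence and list the most frequent. A rough sketch, not the actual tooling used in the talk:

```python
from collections import Counter

def top_ngrams(data: bytes, n: int = 4, k: int = 5):
    """Count every overlapping n-byte sequence in data and return
    the k most common as (hex string, count) pairs."""
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    return [(seq.hex(), c) for seq, c in counts.most_common(k)]
```

Run over a known-content file, repeated structural markers such as a record prefix float to the top of the list — the same signal the style-token highlighting makes visible.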

I started bouncing this idea off one of my close colleagues, a forensics analyst who had previously been a programmer, and I started talking to him about data structures. He said, "Look, if you're going to analyze this, you have to think like a programmer and think about how a programmer would write data to this file." Now, I'm not a programmer, and I probably never will be a programmer. I can script pretty well, and I've got a handle on a few languages, but I've never been a proper application developer. So when I started thinking about this — and I suppose I'll take a step back — when you're looking at a lot of files, most of them will either be static in structure or dynamic in structure. So if you've got a file

that's static in structure — let's say a web server log — each line of that web server log is going to have the same number of fields, and each line is probably going to be interpreted individually. So when a program goes to read that, it knows how to just read one line and delimit on those values, and boom, you've got your data. But when you've got a dynamic data file, the program reading it has to understand how to walk through the file to read the data. So when it starts, if there's a string coming, it needs to know how much text is in that string. If you've got a particular portion of a file that

has a set of data, then there needs to be a flag to tell the program this is how you're going to interpret the following bytes. And so he began to give me tips on things like do the indexes have an index record length? Can you find some timestamps? Do the text fields contain metadata, so the string lengths and things like that before them? And is there encrypted data? So look for chunks of seemingly random data because if you're trying to interpret that without decrypting it and it's encrypted or compressed information, you're going to have a bad time.
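Those tips map directly onto code. Here's a minimal sketch of what "walking" one field of such a file looks like, assuming — purely for illustration — a little-endian uint32 character count followed by UTF-16LE text (the dots between letters in the hex view suggested two-byte characters):

```python
import struct

def read_prefixed_string(buf: bytes, offset: int):
    """Read a <uint32 char count><UTF-16LE text> field starting at
    offset; return the decoded text and the offset just past it."""
    (nchars,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + nchars * 2  # UTF-16LE stores two bytes per character
    return buf[start:end].decode("utf-16-le"), end
```

The returned offset is the crucial part: a dynamic structure is read by carrying a cursor forward from field to field, never by splitting on delimiters.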

So I started to think about how I could match patterns within these files. Looking at the hex bytes, there were a lot of things that weren't the same, and I had to try to find those pieces that I could match between each index, to help bring some context to that metadata. So, I mean, I wouldn't have thought that I would be using Notepad++ for hex analysis, but it's what I ended up using, for the reason that it supports style tokens. Now, there are plenty of hex comparison programs out there that, if you give them two hex files, will tell you the difference between them. However, I couldn't find any — and if you know

of any, please let me know — that looked for patterns within the same file. However, with Notepad++, when you pull out all of the hexadecimal data, you can apply style tokens to those strings. So if we look at the top, in the orange: I saw the D2 2D D2 01 sequence appear in a few places, so I highlighted that string, applied a style token, and every repeat of that hexadecimal value throughout the whole file was then highlighted in orange. And I began to build up this structure of recurring hex values to try and find a pattern in the metadata. I've trimmed these so it's easier for the presentation view, but these

are two separate index records, and you can see that there's a similarity in their structure and hex values. Now, if we look at the interpreted reading of these hex bytes, we can see we've got a string that says waitlisttester@outlook.com, and then it appears again partway through the waitlist. And at the top it's the same string. So this is likely an email index from the same test account that I built, so it makes sense that a lot of these metadata values would be similar. So we're getting to a point where we can see that we've got some similarities and some differences; however, we still don't know what these values

mean, and that was very important to figure out. So when we're interpreting text — sorry, when we're interpreting hex — there are plenty of ways in which you can read a hex value. It can be big-endian, little-endian, different types of encoding. And the best tool that I found to help work through this problem was a hex conversion tool on scadacore.com, which you feed a series of hex bytes and it interprets them in 16 different ways, which enables you to work through those different interpretations and see which is most likely to be contextual to your file. And I'm happy to share these slides at the end. So an example of this in the real waitlist.dat file that I used was looking at the 19 00 00

00 hex sequence. I put that into SCADACore, and when it was converted into little-endian uint32, it came out with the decimal value of 25. I then looked at where it's positioned: it's positioned right before a string of text, "Get started with OneNote." Now this string — when you jump 25 bytes forward, it brings you to the end of the string. So this, I was able to quite easily understand, is simply a string length in a data structure. That was quite easy, and it ended up being quite consistent throughout the whole file, except for the body, which I'll talk about in a minute. I began looking for date and timestamps. So if you're using

a hex editor that is paid and feature-rich, like X-Ways Forensics, which I was at the time, then some of these can do automatic interpretation of a large number of timestamps. However, we don't always have that luxury. So there's a great tool called DCode — a free forensics tool you can download, source at the bottom there — and you can give it a series of hex bytes and it will interpret those timestamps for you in, I think, roughly 15 different formats, and give you the human-readable time and date. So here we've got one example of 8 bytes — this is the photo from their website — and the interpretation to the 2015 date stamp. So

I'm starting to build a better picture here. I've got the date stamp of each record, and I've got the string length, so I can see how it's reading through the content of these text indexes. But I'm still missing the capability to walk through these text indexes and separate them from one another. And I'm also missing some of the core metadata values that would indicate whether this person had sent the email out or whether they had received it. I didn't know if that information was even stored in the file, or whether it was simply a text index. But I knew that there were hex values which I didn't yet understand. Now, the date and timestamp — before I jump into the index — you can see here a good example. Just

after the blue selection, you've got a series of random-looking non-English characters. Now, that basically stood out like a sore thumb when I was doing my analysis, amidst the dots, as a value. And when I began looking at the hex interpretations and found them either too large or negative to build into my analysis, I thought about timestamps, and that's how I identified the 8-byte timestamp. Now, before every timestamp, I also noticed that there was a repeated 4-byte hex sequence. These weren't the same bytes between all my indexes, but before the timestamp of every index there was a repeated 4-byte sequence. So I threw that into SCADACore and converted

it to the same format that I'd used before for the string length, and it gave me a value of 1715. And similar to the text length, when I jumped from my current position 1715 positions forward, it brought me to the next repeated 4-byte sequence, which was the start of the next index record. And this is where my analysis started to really pick up, because I was able to strip the records from one another programmatically and dig into those last few unknown values to complete my analysis of the data structure. I used the Python binascii module, because it enables you to easily work with binary data in Python. It's great. And then I began to take those last few unknown hex bytes and programmatically output them

into a big list using my known data source, and then looking at my real email files and my real documents that were indexed and saying, "Okay, this sequence of hex bytes is 01 for this file and 00 for this file. Look at the source data. Is this one read? Is this one sent? Is this one received?" Looking for those matches and building a hypothesis about what I thought they meant. And once I thought I had a good idea, I began to extrapolate that over the bigger dataset. So this is a pretty representation of the data file, and I showed this to a friend the other day who's not a forensics guy — he knew that my content was on email forensics — and

he just said, "Well, that looks like an email. It looks like an email structure." You've got some things at the top, and then the fully colored-in sections — they're the real text values that I've redacted from this example. So, to give you a quick walkthrough and a more visual understanding of how this structure worked: at the top you've got the 64 followed by the four sets of zeros. Now, this is like the magic number, or the file header; every waitlist.dat started with this hex sequence. Following that, you've got the repeated four-byte sequence, which is the length of the first record. Then you've got the timestamp, and

then you've got some of those other unknown metadata values, which were single flags — they were like your sents, your receives, and some of your text types. Now, the following eight bytes of zeros were probably one of the most interesting parts of this reversal, because whilst it's zeros for this email, if it was a document that was indexed off disk, this 8-byte field held a GUID or a document identifier. This was cool, because if someone was writing a report, I was able to use this GUID to match text indexes over a span of time. So if they started writing the report on a Monday and the search indexer had indexed their report, they would have their introduction sitting there in that index.
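Putting the pieces described so far together — a 4-byte record length, an 8-byte timestamp, then flags — a record walker might look like the sketch below. The layout is my reconstruction from the description above, and decoding the 8 bytes as a Windows FILETIME (100-nanosecond ticks since 1601) is an assumption:

```python
import struct
from datetime import datetime, timedelta

FILETIME_EPOCH = datetime(1601, 1, 1)  # FILETIME counts 100 ns ticks from 1601

def walk_records(buf: bytes, start: int = 0):
    """Yield (offset, timestamp) per record, hopping record to record
    via the 4-byte length field that precedes each 8-byte timestamp."""
    pos = start
    while pos + 12 <= len(buf):
        (length,) = struct.unpack_from("<I", buf, pos)
        if length == 0 or pos + length > len(buf):
            break  # hit padding or ran off the end; stop walking
        (ticks,) = struct.unpack_from("<Q", buf, pos + 4)
        yield pos, FILETIME_EPOCH + timedelta(microseconds=ticks // 10)
        pos += length  # the 1715-style length jumps to the next record
```

The flag and GUID bytes after the timestamp would be parsed per record from each yielded offset; they're left out here to keep the walking logic clear.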

If I search for that same GUID throughout waitlist.dat, I would get a hit from Friday, where they'd finished the body of the report, and you'd be able to see the three-quarters-completed report text index. And then the following Tuesday, when they'd finished their report and their bosses had reviewed it, you'd be able to see all the markings and the comments and the changes. So you've got this historical snapshot of the document over time, which, whilst it wasn't necessarily relevant to my investigation, could be very relevant to an investigation into some copyright fraud or research fraud, because you'd be able to prove that someone had been working on that file over a period of time. We then move

into the blue section. So I figured out that at the start of each section of this file there was a flag to say what we're now entering: for the blue section, the 07 says we're now about to list off email addresses that had received this email. Then you've got the 0c, which is the string length, followed by the email. And then you've got a double zero. Now, for this email there's only one recipient, so this double zero simply says all the recipient emails have been listed and we're about to move to the next format of data. 04 says we're about to tell you email names; this is the name that you'll see in front of the email address when you get an email. Exactly the same structure goes through,

and then, interestingly, a 01 at the end there says these header fields of the email are now done and we're going to move into the body. So with the body text there are a few more values, which I'll detail in the next slide, but the interesting thing is it's a similar format: it uses the 01 and the 00 feature to say you've got an extra length of body to read, so it loops through the body in sequences. We had a question? It's written by the Windows Search Indexer process, which runs on top of Outlook data and data within your Explorer directories on your file system. So anything that's indexed by Windows

will be stored in this file with a text extract, provided that you have those registry keys set and those input personalization settings set on your touchscreen device. So, after I had reversed all these values, I was able to see that for Outlook emails, you had the date and time that the email was sent or received. You had the email subject line. There was a binary flag for whether that email was sent or received. You had a type field, so you could see: is this an email, a document? I also found contact information stored. It listed all the recipients of the email. It didn't distinguish between your To's, your CC's and your BCC's. I actually found a few places where it wasn't even sequentially aligned; it just

stored them all in the same block, which I found interesting. If it was a calendar invite, it included the meeting location, and of course the full text extract of the body of the file. For Microsoft Outlook content, yeah, question? - Did it include the folder that the email was found in? - It didn't, interesting question though. However, if someone had drafted an email, it wouldn't have the sent flag, because it hadn't yet been sent. So for the purpose of this investigation, the sent flag would be zero, which would show that this email hadn't actually been sent out by the user. For the Outlook contacts, you've got basically all of the main primary contact information, even down to a URL or a website associated with that contact, if

it's stored. For documents, meaning any file format which is supported by the Windows indexer, which means most common documents (it excludes compressed data unless you've configured that in advanced settings, which I haven't run into at all): it includes the last modification date and timestamp of the file at the time that the index was taken. It includes the document ID, or the GUID, which, as I've mentioned before, is useful for tracking between indexes over time. It includes the full body of that document and the company metadata field. If it was a PDF document that contained image-based PDF content, it would of course contain no text, because the indexer doesn't do character recognition, but it contains all the other metadata fields and the body section is simply blank.

So this took me some time. In the consulting space, you know, they need answers yesterday. So, unfortunately, I wasn't able to complete all of this analysis in time to apply it to the timeframe that the client needed for their investigation. However, we had completed enough of the analysis to say that at some point in time, this email was seen by this system and an index was taken, which gave them the confidence not to accuse the employee based on this information, but just to have the conversation with them, because they knew that the system had interacted with this email at some point. And so for this investigation, it was merely used as contextual information.

So the takeaways: things that you can take back to either improve your process if you have to do this again, or use this data for your blue team email investigations or even a pen test. So first of all, my lessons learned. If you're looking at a data structure where you have no idea how the data is stored within it, do your research. The last thing you want to do is spend a week, a month, two months trying to reverse engineer this data structure, only to find that it's documented in some obscure support section of that company's site, or that someone else has already done this job. Because it's not fast, unless you're really, really good

at it, I guess. It wasn't quick for me the first time that I did it. Identify which company created the data structure. In my case it was Windows, so a lot of my interpretations of hex weren't using, for example, the Unix timestamp format; knowing the vendor can give you indications to help and speed up your interpretation. And see if they have documented any other data structures, because there's a good chance that the same programmer or the same group of programmers has been working with this data structure as well, so there might be a lot of similarities. Definitely, if you can, recreate your evidence within a controlled environment. Try to limit the amount of unknowns that you're dealing with in your analysis. It will save you days

or weeks of just head-banging when your hypothesis turns out not to be true when interpreting some of the hex values. And also, when working with the same data set, try to significantly reduce the size of the data you're looking at at first, and then extrapolate out when you have a better understanding. I found Notepad++-style tokens to be highly beneficial to my analysis. Being able to match those patterns really enabled me to separate my indexes in this case, and in a lot of different data-structure analysis you'll be able to improve your matching of hex values throughout the file. As for the tools you use for your hex interpretation: whilst they will all give you the hex, some of them are better at interpreting data

than others. X-Ways Forensics was the tool I used at the time, and it contains a lot of advanced features for decompressing data and automatically recognizing certain hex fields, but it does cost money. They also build WinHex, which is quite good, and they've got an evaluation for 15 or 30 days, so you can give that a go. Or HxD is a competent free hex editor which allows you to view both the hex and the interpreted values side by side. SkateCore was easily my favorite tool for converting those hex values into a range of possible interpreted decimal values. And when I was writing this and thinking about my lessons learned, I thought about whether I could have automated more of this process. And

I thought that with the pattern matching, I could have stripped this file out into blocks of four hex bytes and automated pattern matching between them. And I soon checked myself when I realized that because my file included variable-length strings, it would break my four-byte sequencing, and so the matching might not be there. But if you're looking at a data file that maybe doesn't have those dynamic strings and is more consistent, then you could definitely give that a go. I'd be really interested in hearing about it if you do. And when you're looking at hex, you need to have some ideas about what it could possibly mean. So I compiled this list when I was doing my analysis. I brainstormed with some of my friends and my programmer

buddies and came up with this list of things. So whenever I was looking at an unknown hex value, I would work through this list and ask myself: does it make sense for this value to mean this interpretation? I won't read them all out; however, you can grab them from my slides afterwards, and if you're going through some hex analysis of your own, it's handy to keep side by side. So this brings me to probably my last and biggest lesson learnt from this investigation, and it's something that, if anyone's worked in forensics before, you know is one of the most important parts of your job, which is validation of your findings. If you claim to have a complete and full understanding

of an unknown binary file after analysing one or two or five of them, then you need to rethink the types of decisions that you're making about your evidence. You need to be able to extrapolate your findings over a large enough data set that you would feel confident going to court and saying: I know this file, I know how it works, and I'm willing to back this analysis when someone's job is on the line. Because if you don't validate that correctly, you can ruin people's lives. For my part, I don't consider my analysis there yet. So I published all my findings on the internet, free for anyone to read on my blog. I'll give you the link to that in a minute. And essentially said: this

is what I've done, I'm throwing it out there. Let me know if this file's relevant for your investigation. Try out my tools, try out my interpretation, and see if it works for you. Let me know how you go. And so far I've had two investigators from the US reach out to me over about a year and ask me for my interpretation of the context of this file for their investigation, which was quite cool, but more validation is always better when it comes to this sort of stuff. So if you do some research like this and you find yourself interpreting a file, put it out there; publish your research.
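The automation idea floated a moment ago, splitting a file into fixed four-byte blocks and matching patterns between them, takes only a few lines of Python. As noted, variable-length strings will misalign the blocks, so this sketch only suits fixed-layout data:

```python
from collections import Counter

def common_blocks(data: bytes, block_size: int = 4, top: int = 5):
    """Count repeated fixed-size blocks in a binary blob; frequent
    blocks are candidate record markers, flags, or padding worth
    investigating during structure reversal."""
    blocks = [data[i:i + block_size]
              for i in range(0, len(data) - block_size + 1, block_size)]
    return Counter(blocks).most_common(top)
```

Running this over a few samples and diffing the top blocks between "known sent" and "known received" files is one way to surface the single-byte flags the talk describes.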

If you're more patient than me and willing to put more time into it, maybe write a white paper, get people to try it out and give feedback, and hopefully we can build some confidence in these additional data structures, which can be highly valuable in an investigation. And finally, my full details can be found on my blog at b2dfir.blogspot.ca. I've got a GitHub under a similar name where you can download either the Python parser for waitlist.dat or a compiled executable parser for the same file. What this parser does is extract all of the individual indexes out into separate text files and produce either a CSV or an Excel spreadsheet containing all of the fields

except for the body, 'cause it got a little messy having the body of the email in the spreadsheet. So this is like an index for you to scroll over. If you choose the Excel option, then it hyperlinks the file index to the text file extract, so you can easily click through and view the full text extract.
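Since the file stores plain text extracts, even without this parser you can do a crude keyword sweep over the raw bytes. A sketch, assuming (and it is an assumption here) that the strings may be UTF-16-LE as is common for Windows artifacts:

```python
def keyword_hits(data: bytes, keywords=(b"password", b"login")):
    """Crude triage: scan a binary blob for keywords in both raw
    ASCII and UTF-16-LE encodings, returning (keyword, offset)
    pairs for every hit."""
    hits = []
    for word in keywords:
        for needle in (word, word.decode().encode("utf-16-le")):
            start = 0
            while (pos := data.find(needle, start)) != -1:
                hits.append((word.decode(), pos))
                start = pos + 1
    return hits
```

You would feed this `open(path, "rb").read()` after stopping the Search Indexer process that locks the file, as described later in the talk.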

So I mentioned earlier that this could be useful regardless of your role in information security, and here's why. If you are a defender, a blue team member... actually, I'm gonna start with the pen tester, 'cause then it'll make more sense. If you're a pen tester, you've got this file here that is locked by the Search Indexer process, but you can just kill that; it's not a privileged process. And then you've got this file that contains a text extract of potentially every email and document that that user has received over the last, in my experience, at least three years. So you can do a string search over that for login, password, and see what comes back. If

that user's ever received an email that contained a password, then even if they've deleted it, if it was indexed, it should still exist within waitlist.dat. Now, as a blue teamer, you may want to get rid of that, and rightfully so. You're basically trading off deleting the file and changing the input personalization settings on your user devices at the sacrifice of handwriting recognition accuracy for your users. Maybe it's something you can just turn off and they're never gonna know. And finally, for the forensic analyst, the obvious use is what we've been talking about this whole time. So, I mean, thanks everybody for sitting through and listening to me talk.

- [Partially inaudible question about whether Microsoft documents this data structure.] - Yeah, I had done research at the beginning and didn't get very far. Because it's built by Microsoft, and the people building Microsoft applications are going to be different to the people working on the operating system design, and Microsoft is such a big machine, I wasn't able to find any useful correlated data. I've actually reached out to some Microsoft contacts that I've been put in touch with, in hope that I'd be able to get some validation of the data structure research. However, I've found that even the people I've contacted within Microsoft don't necessarily know

the right people to talk to to identify this, and whether or not Microsoft would be happy to give out what's seemingly a proprietary data structure, I don't know. If you work for Microsoft, talk to me. Any other questions? Yes, so the search indexer, oh, good question. I don't know which Windows version the search indexer started in. I know it's been there since at least Vista. Does anyone remember it being there for XP? Yeah, maybe. I'm not sure exactly when it started, but yes, it's native to Windows. So if you go into your Windows Control Panel, you'll be able to see the search index options and the input personalization options. You can go through and configure your initial

recognition settings, where you practice writing some characters with your touchscreen. Mm-hmm. So, was it specific to the touchscreen side? The Windows Search Indexer is not specific to the touchscreen side; it will index your files into the normal Windows Search Indexer database without touchscreen input. But the generation of the waitlist.dat file within the TextHarvester directory in AppData was specific to users who had, and did use, the touchscreen for handwriting-to-text interpretation. One at the back. Yeah, so I can't go into too much detail. I mean, I've been given the go-ahead to publish my research, so that's why I'm here. Essentially the information was used

for contextual discussion and investigation with the employee. I probably won't go into what happened after that. Did you have a question there? Nope, so it was a different device, and it worked with my touchscreen. So it seems to be inherent to Windows rather than a specific vendor device. Working in consulting in the forensic space, we also had the benefit of having a large data set of forensic images to check against. So I did validate against some of our other forensic images that we had, of Surface tablets and things like that, and it was quite consistent across all touchscreen devices that were running Windows. I'd only tested it against Windows 8+ machines, but it could exist on Vista as

well. I wouldn't be surprised, given the information that I found about the input personalization and the lexicon blobs from Vista onwards. We have a question here? - [Partially inaudible question.] - So the file is only generated if you have that handwriting input personalization. But you don't have to go through the config of the input personalization in the Control Panel; if you begin using handwriting-to-text, those registry keys will automatically flip and start generating the file. I don't see any other hands up, so yeah, awesome. Appreciate you all listening. I'll be hanging around if you've got any other questions or want to chat generally. Yeah, thanks.

- Give it a go. Nice. - Is yours a desktop with a touchscreen? - It's a laptop, but it doesn't have a touchscreen. - Right, I would be really interested, because I've only used it on touchscreen tablets and laptops that are running Windows. I haven't tried an external touchscreen. That'd be cool. - I bought it because I wanted to get a monitor. - Yeah, may as well get a touchscreen one. Nice. I'll be sending these around via BSides, so email me, let me know how it goes. I've got a question: those registry flags, you said they flick on when you start using the input. Have you tried flicking them on manually to

see if it starts generating that file? - No. - Okay. Can we try that? - Yeah, yeah, yeah. - Okay. - Thanks. - See you, Mark. - Yeah. Can we try that out? Yeah, man, for sure. Okay, dope. I'll chat with you later and find out which flags they are. What I'd be interested in seeing is what happens if you flip those registry keys on on a computer that doesn't have a touchscreen. That's what I want to know. Can we enable a keylogger and start logging documents and other shit without anybody ever knowing? Let that build up and then steal it. How long would it take before it rolls that file over? I saw three years, and I turned this on on

my computer for my research, and at no point did it ever roll over. - Oh wow. - The three years of data was 140 megabytes, so maybe there's a cap, but I haven't seen it. I wonder if we could seed one and see... - The interesting part that I'll mention about the registry key flip was that it didn't happen automatically. It wasn't like: do my first conversion and the key has changed. I did my first conversion and I was like, oh man, this doesn't seem to be working. The next morning I turned my computer on and the file was there and it had populated data. And the key had changed. I can't remember specifically whether I rebooted or not, but it was definitely triggered by that. It just wasn't an instantaneous change,

which is an unknown right now. Cool, I'll give it a shot, we'll play with it. - Can you elaborate a little bit more on how you copied the disk? - Yeah, yeah, sure. So essentially these laptops were BitLockered, and at this organization they left their laptops at work. - At the office, yeah. - So we just came in after hours, like eight o'clock, with one of the security staff, and went over, and they unscrewed the laptops. Normally we take a photo of the desk, so if we move stuff, it goes back in the same spot. We unscrewed the hard drive and fully took it out. They were BitLocker encrypted, but because we were working with the organization, they just gave us the keys. So, EnCase Forensics is

a forensic software suite which can decrypt BitLocker drives if you've got the key. So we took the encrypted images, which would have been garbage to us otherwise, put them into EnCase and decrypted them, and then re-imaged the now unencrypted disk into a raw disk image file, which we then used for our analysis. - You didn't have to decrypt it, but you just did it for safety, right? - No, no, sorry, it was already encrypted on disk. - You decrypted it and then you encrypted it again? - No, no, we decrypted it. And once we decrypted it, we were in our forensics lab, which has all the physical access controls, all of the... like, it's air-gapped

from the rest of the IT environment. So then we try to work with unencrypted data when we're doing our data processing, because it's faster. - Awesome, very cool, man. Cheers. - Thanks, I appreciate it.
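The registry flags discussed above can be inspected or flipped with reg.exe on a Windows test machine. This sketch only builds the command strings; the key path and value names match publicly documented input personalization settings, but verify them against the speaker's blog before relying on them:

```python
KEY = r"HKCU\Software\Microsoft\InputPersonalization"
# 0 = implicit collection allowed (indexer may populate waitlist.dat),
# 1 = restricted. Value names are an assumption to verify.
VALUES = ("RestrictImplicitTextCollection", "RestrictImplicitInkCollection")

def query_cmds():
    """Commands to check the current collection settings."""
    return [f'reg query "{KEY}" /v {v}' for v in VALUES]

def restrict_cmds():
    """Commands a blue team could push to disable the collection."""
    return [f'reg add "{KEY}" /v {v} /t REG_DWORD /d 1 /f' for v in VALUES]
```

Pushing the restrict commands (and deleting the existing file) is the blue-team trade-off mentioned in the talk: less artifact exposure at the cost of handwriting recognition accuracy.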


Good afternoon. I hope you had a good coffee. You will need a sufficient amount of coffee to stay awake for the whole thing. So we're going to look into how we do malware research the other way, because usually we're using some other tools, often proprietary ones. After this presentation, you should be able to do your own malware research using open source tools and some OSINT tools that you can find on the internet. But never try to use your personal computers to do the malware research; try to go to your neighbors and use theirs, not your own. So, before everything else: my very first conference was actually BSides Vancouver, that was 2015, and after that I kind of

kept coming to conferences. I work at Fortinet. I've been writing in our blogs and also in Virus Bulletin; you can search for the articles, it's mostly about malware. That is usually my profile pic, but every time I do some malware research analysis, that's me, not the other one. It's kind of hard, because you can just run tools and everything to learn about malware, or you can just run it on your machine, especially ransomware; it will just pop up and everything, and there you go, you have your malware. Anyway, just for caution: don't try this at your home. Go to your neighbors, to your friends that don't know that you're trying to do it.

OSINT and malware. Normally, when you hear of some malware in the wild, the first thing you do is search online. You go to some security blogs, especially blog.fortinet.com; that's the only vendor pitch. They usually discuss new discoveries and in-depth analysis of the malware, so you don't need to do that yourself, and also new threat vectors and attacks related to the malware itself. So that is the very first point of contact that you want to go through to check what the malware is doing, right? And of course, if you have the actual samples, the actual malware, you can throw that sample at VirusTotal. There is a list of AV vendors there, and

then it will tell you if that particular file or suspicious file is actually malware. It will give you the AV detections if it is really malware. It has free sample submission and lots of information for the submitted sample.
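Computing the hash locally is trivial, so you can search VirusTotal for an existing report before (or instead of) uploading anything. A minimal sketch:

```python
import hashlib

def file_hashes(data: bytes) -> dict:
    """MD5 and SHA-256 of a file's contents -- the IDs you paste
    into VirusTotal's search box to find existing reports."""
    return {"md5": hashlib.md5(data).hexdigest(),
            "sha256": hashlib.sha256(data).hexdigest()}
```

Searching by hash also avoids sharing a potentially sensitive sample with a third party, which matters in client engagements.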

For example, we have this. Normally, if you have a file, there's what we call md5sum or sha256. Those hashes are the ID for a particular file; if the file changes, the hash changes. So normally, if you want to search for something on VirusTotal or on the internet, all you have to do is figure out the hash and then check it with VirusTotal, and it will give you something like this. There's the list of AVs; it's actually more than shown. It will tell you 54 out of 61, there is a detection for that particular file. That means your file... for example, some

friend, or again your neighbor, gave you a file and said this is a good game. You can throw that file at VirusTotal, and then you will know whether it is malware or not, and then give it back to your friend. Let's go now to some tools that we sometimes use. For manual analysis, as I said earlier, you don't run your malware or your sample on your personal machines. You can download VirtualBox; it is free. It is something like VMware. You can create a virtual environment and install any operating system inside VirtualBox; it is a very powerful one. If you have a

powerful computer, you can actually run different operating systems inside one machine using VirtualBox. But if you're rich, you can buy VMware; it does almost the same thing, you just have to pay. You can safely execute and monitor malware, but be sure not to have a connection from your VirtualBox guest to the internet, so that the malware will not get out of your machine. But be careful: there is actually malware that can escape VirtualBox, and there is malware that can escape VMware. So be careful about that too. To be extra sure, if you have a really nasty piece of malware, you can start with VirtualBox, install an operating system, install VirtualBox

inside, and then run the malware on the second layer, right? At least you can be sure that you'll be protected, or at least that you have enough time to press the reset button on your laptop if it is infected. You can easily create and restore snapshots; you can snapshot your machine. So what you do is: you have VirtualBox, you have the operating system installed, you create a snapshot, run your malware, then restore the snapshot, and you have a clean environment again. That is how VirtualBox works. So it will look something like this. You can have different virtual images here if you have a lot; I only have one here, it is Windows XP. You can have

different snapshots, and you can go back to those snapshots anytime you want. For example, you're analyzing one malware and then you find another malware that looks more interesting than the first one. You can snapshot at that particular point, restore the snapshot somewhere here, run the second malware, and then go back anytime to the first malware's snapshot. Those are the snapshots: a snapshot captures a moment in time where, for example, that particular malware is trying to send something or trying to infect something. When you go back to that particular snapshot, it will continue doing the same thing from where it stopped. That is

when you try to execute the VirtualBox guest. Okay, so now I am inside this one. The host, we call the host, is the original system installed on the machine. So I actually have a Linux machine, and then I installed Windows XP in VirtualBox. So you will see that operating system is now like another computer of your own; it is a computer inside the computer. The purpose of VirtualBox, again, is to be able to safely execute the sample that you have, the malware. And then, again, what are the tools that we're going to use? There is a Microsoft website where you can look for Sysinternals. Sysinternals is a suite of lots of different tools that

you can use to analyze samples; some of them we actually use in our job to do some analysis. You can run Process Explorer and Process Monitor; these two are usually the ones that I use. When you run the malware, you will see what's going on in the system in Process Explorer and Process Monitor. They're also used for advanced data collection, and to monitor things like disk activities and so on and so forth. This is Process Explorer. When you run Process Explorer, you will see some of the processes here. If you're just starting, you have to try to familiarize yourself with the clean processes in your system, so that when you try

to run the malware, you will know: oh, there's a new process running, it can be malicious. And then these are some of the pieces of information that you can gather: sessions, events, mutexes and everything. A process, and then you can see some properties here. And this is Process Monitor. Process Monitor gives you more of the detailed events from the execution of the malware: registry activity, what files are being touched, and so on and so forth. That's Process Monitor. Later on I will give you a video demo of how some of these things work. These are some of the tools

that you can find in Sysinternals. There are literally hundreds of tools that you can use, but normally I just use those two, Process Explorer and Process Monitor. Those are for when you run the malware and just want to know what's going on. But if you want to go deeper, if you want to look at the hex bytes, look at the binary itself, you can use a debugger. There are three debuggers: we have OllyDbg, x64dbg, and Immunity Debugger. Just Google them; you will be able to find the free downloads. If you're going to use OllyDbg and Immunity

Debugger, their GUIs are actually almost similar. The difference between OllyDbg and Immunity is that if you know Python scripting, you can actually automate the debugging in Immunity Debugger. But if you want to debug 64-bit malware, you cannot use those tools; you have to use x64dbg. x64dbg is still in development; they're still adding lots of features and everything. So normally what I use now is OllyDbg and x64dbg, but if I want to automate, or just want to see what's really going on, I use Immunity Debugger. It depends on the malware that I'm working on. A debugger is used for code-by-code analysis, giving you more

control over the malware; you can actually pause on every instruction if you're using a debugger. You get a deeper understanding of the malware's behavior. This is what it looks like. Oh, actually I have the links. This is the code; these are some of the data in the memory of the machine. You can see the stack here and some of the registers. This is where you need the coffee; you'll fall asleep when you're just looking at the hex bytes. One time I brought my daughter, the younger one (I have my oldest one here), so that I could have an audience, but she decided not to come up here.

So I brought my daughter to work one time, the younger one, and I wanted to impress her with what I do at work. So I'm looking at hex on two monitors; I'm working on hex numbers and everything. And then when we went home, she said my job is too boring; she doesn't like it. I made a mistake on that part. Okay, so this is x64dbg. They will all look the same, but they have different features that you can use. This is Immunity Debugger; you can change the color settings and everything. You can download those things too if you want to. Okay, so those are what we call static analysis and dynamic analysis.
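Stepping back to the VirtualBox workflow: the snapshot-run-restore loop described earlier can be scripted around the VBoxManage CLI. This sketch only builds the commands ("analysis-vm" and the snapshot name are placeholders); run them only on a dedicated analysis host with VirtualBox installed:

```python
import subprocess

def snapshot_cmds(vm: str, name: str):
    """Build the VBoxManage commands to take a clean snapshot and
    to restore it after detonating a sample."""
    take = ["VBoxManage", "snapshot", vm, "take", name]
    restore = ["VBoxManage", "snapshot", vm, "restore", name]
    return take, restore

def run(cmd):
    # Only execute on the analysis host; the VM must be powered off
    # (or saved) before a restore.
    subprocess.run(cmd, check=True)
```

Taking the snapshot once and restoring it between samples is what gives you a clean environment for every detonation, as the talk describes.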

Now I will teach you the easier way to analyze the malware itself, using automated analysis. Automated analysis is actually already out there on the internet; you just have to know where to look. Have you heard of Cuckoo Sandbox? Some of you have heard of Cuckoo Sandbox. Cuckoo Sandbox is in a new version, I think 2-point-something. You can download this one for free too, but you should run it using VirtualBox or VMware. There is an online tutorial on how to set up the actual machine, but basically, if you're using Linux, you can just type, like the

regular downloading of applications in Linux, you can just have the Cuckoo sandbox. And then when it's already installed, you can just type Cuckoo and then it will just execute. It is used to automate malware analysis To show traces of API calls, it's like you're running your malware in a debugger, but the only difference is your Kuku sandbox will give you a list of APIs that is already working. That's already what the malware is doing. Sometimes it also gives you screenshots of what happened to the system. I will show you two examples later on how it actually executes, how it actually runs. - Does it capture the arguments on the API? - It actually does, yeah, some of the API.
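Those per-process API traces end up in Cuckoo's report.json, and once they are there you can pull them out with a few lines of Python. A minimal sketch — the nested layout below mirrors the behavior section of a Cuckoo 2.x report, but treat the exact key names as an assumption and check them against a report from your own install:

```python
def list_api_calls(report: dict):
    """Flatten per-process API call traces from a Cuckoo-style
    report.json into (process, api, arguments) tuples."""
    calls = []
    for proc in report.get("behavior", {}).get("processes", []):
        name = proc.get("process_name", "?")
        for call in proc.get("calls", []):
            calls.append((name, call.get("api"), call.get("arguments", {})))
    return calls

# Tiny hand-made report fragment, for illustration only:
report = {
    "behavior": {
        "processes": [
            {"process_name": "sample.exe",
             "calls": [
                 {"api": "CreateFileW",
                  "arguments": {"filepath": "C:\\Users\\me\\img.jpg"}},
                 {"api": "RegSetValueExA",
                  "arguments": {"regkey": "...\\CurrentVersion\\Run"}},
             ]},
        ]
    }
}

for proc, api, args in list_api_calls(report):
    print(proc, api, args)
```

The same loop is the starting point for the automated report script mentioned later in the talk.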

And also, especially in the new versions, they capture network pcaps and everything. So for the installation requirements: for the host, you should have Python, VirtualBox, and tcpdump. If you want to do memory forensics, digging deeper, you use Volatility — this is also free; you can use it to check what is really going on in your computer's memory. And MongoDB is what they use for the database itself. But some of those are installed automatically when you do the Cuckoo Sandbox installation. For the guest — when we say guest, it is the operating system in your VirtualBox. So the host, for example, you have the Linux host.

So these are the requirements. For the guest, you need Python and agent.py. agent.py actually came with the Cuckoo installation; you should grab that and put it in the guest operating system in VirtualBox. To run Cuckoo, all you have to do is run python cuckoo.py. It will tell you the information here: it checks for updates, and so on and so forth. It checks for VirtualBox and everything else it needs. If everything is okay, you can now proceed to give it the sample. At the command prompt, type python submit.py — it is also from Cuckoo — and then your malware, or your suspicious file.

And you can actually see it with your browser. All you have to do is run python manage.py and run the server. Then you can go to your favorite browser and see the behaviors of the malware you're trying to analyze. Overview of the web interface: this is the very first page. If you're looking at the GUI, it has some of the information about the actual file. It has a combination of different hashes — MD5, SHA-1, SHA-256, and so on — because sometimes, when you're trying to search for information on the internet, some analyses only have the SHA-256 or the MD5 sum. So Cuckoo Sandbox gives you all that information. Sometimes you will get a screenshot of what's going on:

if the malware shows something — especially ransomware — you will see those screenshots. Okay, static analysis: it will give you the different sections. If you don't really look at the hex bytes of files, this one is kind of hard to follow, but you should be able to get a good idea of what is going on. So this one shows some of the resources that it uses, and some of the APIs that the malware calls — you see, Cuckoo Sandbox is giving you everything. And behavior analysis: it will give you a list of information about the registry and the processes, and it is color coded so you will know what you are looking for. For

the network analysis, it will show you the information from the pcap, from which you will know where it is trying to connect and whether it is using TCP or UDP. And if you have lots of analyses — for example, hundreds of malware analyses already — you can do a search. It has a search function that can give you the analysis of a particular sample. And then, of course, there is a way to submit via the GUI. You can analyze a file by submitting it, and you can also analyze URLs. For example, you're trying to buy something on a website and you don't know if it is malicious or not. You can cut and paste that link; you can either throw it to VirusTotal, or you can use Cuckoo Sandbox to analyze that particular URL — not only files. Okay, I'm going to show you some of the video demos that I have, and again, do not try this at home. For the first case study, we're going to create a simple malware report, something like you'd do at your job. We are going to check: if you run the malware, what are the visible symptoms, whether it drops

some files, and whether there are registry modifications. Oh, it's not working. So for this one — my host is Linux — I'm going to execute Cuckoo Sandbox. Okay, so that's what happened: it tells you if there are updates, checks for updates, and checks for the machine. But be sure, when you run this, that you already have a guest operating system waiting in your VirtualBox. I have that on this side — I have an available Windows XP. And then I'm going to run the sample, submitting it via the command-line interface. You can just drag the files and it should execute. It will execute, it will open the VirtualBox and

run it. This one is actually agent.py — it is the one communicating with the Cuckoo host. Then you just have to wait until it executes; you'll be able to tell when the execution is done. You can also set a time limit — for example, if you just want it to execute for five minutes or so, you can do that. Okay, and execute that one. See, there's actually a dropped page here. You can see now that the malware is actually a banking malware, where when you execute it, you will see a picture, an image. And then,

of course, it is a trojan: when you execute the malware, you might think that it is just an image. While that particular image is shown, behind the scenes it is actually a banking trojan that tries to grab your password, your banking information, and so on. And if you run it yourself, you will think that it's not actually doing anything, because that image is the only thing you see. But by running it in Cuckoo Sandbox, we will see later that it is actually doing more. So when the analysis is done, you will just see the guest operating system close down, and then you can look for the report.html

and now you will be able to see those things that I showed you earlier. So this one is the information about the file, the hashes and everything.
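Those hashes on the report's first page are easy to reproduce yourself, for instance to pivot to VirusTotal or another analyst's write-up. A small sketch using only the standard library:

```python
import hashlib

def sample_hashes(path: str) -> dict:
    """Compute the same hash combination Cuckoo shows on its first
    page (MD5, SHA-1, SHA-256), reading the file in chunks so large
    samples don't have to fit in memory."""
    digests = {"md5": hashlib.md5(),
               "sha1": hashlib.sha1(),
               "sha256": hashlib.sha256()}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            for d in digests.values():
                d.update(chunk)
    return {name: d.hexdigest() for name, d in digests.items()}
```

Searching on all three forms covers write-ups that only list one of them.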

There are actually community signatures that you can download, so if somebody has already analyzed the sample, you'll know whether it is malware or not. And even if you automated the execution and didn't watch it live, you'll still see the actual screenshots. You'll see the different sections, and you should also see some network analysis here. You will know the dropped files and some of the libraries that it uses; you can browse those things. See, this is the actual image that was dropped earlier. And then, after dropping that image,

the process — the actual banking-information gathering — happens. It actually drops different files which we didn't see: this one is actually dropped, and this one. And it uses some of the other files inside your Windows system. You can click those and see what happens to those dropped files. So when the malware drops another malware, another file, Cuckoo Sandbox will also see what's going on with that particular one. So this is the behavior that's going on. You'll see that it is doing some decompression, because the malware is actually compressed. It then looks for some registry information and some attributes. It is checking your files, checking

for the attributes. So you can just browse now — you don't need to go to your debugger and everything; you'll know what's going on. You will see when it dropped a copy of itself, a dropped copy of the other malware. It also created an autorun registry entry; you'll be able to see that. And you can automate this: you can make a script to search the report. You can build an automated malware report from the result given to you by Cuckoo Sandbox itself — you can create your own malware report just based on this output. Those are the registry entries

again. And again, it is color coded, so you'll know what's really going on. So after that analysis — what happened is we just executed the malware inside our Cuckoo Sandbox — the question is: can we really create the malware report out of this? Let's see. We're going back to the part where we create the report. First, what are the visible symptoms? It displays an image, of course; we saw that. If you automated the run, you would not see it live, but you will see it in the result. It drops two files: one is actually a copy of itself, and the other is the image. It creates the auto-start registry entry. It

actually checks whether anti-virus and security applications exist. And it actually does something more. So, to recap: it displays an image, it drops the files — from the result that we have — and it checks for the security apps that we have. Oh, what happened? Oh, I skipped that part: you should also be able to see some of the strings as part of that. Okay, and so on and so forth. I now have another example here, another case study. When we ran the first sample, we saw a lot of things, right? But what happens if you run a sample and you don't see anything? Again, we're going to

check for visible symptoms, dropped files, and registry modifications, and we run the malware — same setup, different malware — but now we're going to use the GUI. So we are going to get a sample. Anybody know Petya? Then you will know what's going to happen here. Okay, I submitted a sample — it is actually Petya. Real time, I activated the malware; it executes the malware. Earlier, with the first sample, we had the image, so at least we had a hint that something was going on. We executed the second sample, we were waiting for the result, and we don't even have the... what's going on? There's nothing going on here — but it shut down the machine. So you run the sample, you're waiting for

some symptoms, and it didn't give you anything; it shut down the machine. Does that mean it is a clean sample? Let's check again. We now check that sample in the behavioral analysis. There's an error in the analysis — it has a timeout and everything. It gives you normal API calls; it doesn't give you visible symptoms. Nothing's going on, just regular reading. But here you will see that there's some reading from PhysicalDrive0. What is PhysicalDrive0? Anybody? PhysicalDrive0 is actually your very first hard drive. That means there's something going on. You didn't see it when you ran the malware, but it is trying to access PhysicalDrive0, even though you won't see

anything. The thing is what happens during that time, because Cuckoo Sandbox gives you all the information it can get, even what is happening on the physical drive. But look at this: this particular API is actually the one that restarted your machine. The malware itself restarted your machine because it is going to execute something. Normally, when you're analyzing malware, you get the sample, you execute it, nothing happens, you move on to the next sample — that's usually what happens. But with this one in particular, the malware author knows what is going on; that's why it tries to trick you into thinking nothing is happening. It restarted itself. So: no initial visible symptoms, no dropped

files, no registry modification — but you've seen this. That's the only thing we've seen in the result, and there's a weird API. At first, I didn't know that this was actually the restart; you have to use Google too — it's also free. Suspicious? Now we try it again. We didn't restore the snapshot for the Cuckoo sandbox — do not restore if you want to do it again. If you're going to execute the same malware again, just continue the execution from where we left off earlier. So we did it again: we execute the malware, nothing happens, it's just doing the same thing over again. It just happens that it looks suspicious to us because

of the NtRaiseHardError call. And then we're waiting again; nothing happens, the same thing. Usually, if you think a sample is suspicious, try to execute it again, because it may be trying to run or set something up. So it turns out it shuts itself down, and you can actually restart the machine without restoring it. Click restart.
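Spotting those two tells — raw access to the first physical drive, and the hard-error call used to force a reboot — can be scripted over the behavior trace. A sketch only: the indicator names come from this Petya walkthrough, and the (api, argument-string) pair format is a simplified stand-in for a real Cuckoo call list:

```python
# Indicators picked out in the talk: raw reads of the first physical
# drive, and the hard-error call abused to force a restart.
SUSPICIOUS_SUBSTRINGS = ["PhysicalDrive0"]
SUSPICIOUS_APIS = {"NtRaiseHardError"}

def flag_suspicious(calls):
    """Given (api, argument-string) pairs from a behavior trace,
    return the ones worth a second look."""
    hits = []
    for api, arg in calls:
        if api in SUSPICIOUS_APIS or any(s in arg for s in SUSPICIOUS_SUBSTRINGS):
            hits.append((api, arg))
    return hits

trace = [
    ("NtReadFile", r"\\.\PhysicalDrive0"),
    ("ReadFile", r"C:\Windows\System32\kernel32.dll"),
    ("NtRaiseHardError", "OptionShutdownSystem"),
]
```

On this toy trace the first and last calls are flagged; the benign DLL read is not. A real triage list would carry many more indicators, but the shape is the same.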

It looks like it is now checking something. It looks like CHKDSK — when you run CHKDSK, it checks whether there's something wrong with your hard drive — but this is actually a fake CHKDSK. That counter is actually counting the sectors it is trying to encrypt. Oh no — this one is the MFT, the master file table, which it is actually trying to encrypt. I know this because I've already analyzed the sample, right? So it's doing that, and it jumps to 100%. And when it restarts, you will think that there's actually nothing

going on — that it was just checking your hard drive, right? And then it restarted again, and there's totally nothing going on. At the second restart, you might just move on to the next sample. But after the second restart, you now have your visible symptoms.

It has actually been in the wild for a very long time. — When you restart the machine, is it still running inside the sandbox? — So, the very first time we ran the malware, it shut down, right? But as long as you don't restore any snapshots, that particular system is still there; it's like you just turned off the machine. — So Cuckoo Sandbox is still monitoring? — It's not monitoring anymore, because the machine is already turned off. That's the thing: once it turns off, the connection between the agent in the guest and the original host is gone. It is already shut down; there's no more monitoring. So that is one thing — the malware author knows that you're using those tools, so it restarted, and you might

just move on to the next one. But because it's suspicious, you executed it again, and then it restarted again, right? And now it is asking for the Bitcoin payment and everything. Petya was in the wild for a very long time. It has different versions — you'll hear of Satana, you'll hear of NotPetya. There are also different kinds of infection. Actually, the malware author himself gave away the key to unlock the system. The problem with ransomware — when you're infected with ransomware, for example on your machine at home — is that you will

have something where you put the payment. You put the payment — if you are rich — but we are not suggesting you pay, because if the malware authors know that you're paying, they will just keep on infecting you. But if you don't pay, the risk is that if you don't have any backups, you're so dead. So with ransomware, normally you have a screen where you can put the Bitcoin payment or do something about it. A Petya infection infects your master boot record. When it infects your master boot record, there's no other way to use the same machine, because it is now waiting for the code to

decrypt. So you have to use your neighbor's computer again. There's a link where you can try to pay the ransom, and if the malware author is nice enough, they will give you back the code; you can then put the code into your infected machine. You see the process? If you have two machines, it's fine, because from the one that is not infected you can pay — if you want to pay — and unlock the system. The only problem is — have you seen the fake disk check earlier with the percentage? If you stop it in the middle, it has encrypted some part of your master file table but not the whole thing, right? For example, if you stop at

50%, you still have 50% not encrypted. Now you pay, and it is going to decrypt the whole thing. What happens to the part that was not encrypted? It gets corrupted — it is now encrypted with what was supposed to be the decryption key, right? Sometimes it's really hard to know what to do with the malware. Sometimes people are so scared — especially regular home users — that they turn off the machine, then decide to pay, and then they will not be able to get their computer back. Okay, so now we have the second result. We know that it is suspicious; it is actually a master-boot-record ransomware. We have these modifications, and we know

what made us suspicious — these things, the NtRaiseHardError function. Then, after restarting the machine, we have the fake CHKDSK, and then the visible symptoms are, of course, this one: the ransom note. Normally there's always a note with ransomware. But you cannot have everything: sometimes they give you a code, but it has already lapsed, and you won't be able to get your computer back. So those are the first two samples that we have. Cuckoo Sandbox is now at 2.0.5 — that's the latest version, since December 2017. If you go online and want to get Cuckoo Sandbox, you'll be able to see this.

And there's a difference now: you will see the version, and it now uses a current working directory — you can use different directories for your Cuckoo Sandbox. This is how you run the web server now; I think it's similar to the first few editions. And this is some of the GUI that you have — it now has the circular graphs and everything. And this is where you get your information: static analysis, all the analysis that Cuckoo Sandbox is doing. They're still developing Cuckoo Sandbox. Okay, if you really want to know more about malware analysis and malware research, there are the security blogs; you can always go to different websites.

Do not forget blog.fortinet.com — another vendor pitch. You can go to VirusTotal if you have a suspicious file — even a PDF or a Word document, you can just throw it to VirusTotal. And if you really want to do it on your own, get VirtualBox, get the Sysinternals tools and debuggers and everything, and install Cuckoo Sandbox. And again, of course, do not ever, ever forget this. That's why it is nice to have lots of neighbors: if the first neighbor gets mad at you, at least you still have some left. And that's it. Thank you. Any questions? — Is there any malware that detects the sandboxes? — Yes, there is always some malware that will detect if you're using tools, if you're using Cuckoo Sandbox. I

haven't personally seen one, but there should be. They can actually detect if you're running VirtualBox, because VirtualBox will have a virtual hard drive, a virtual processor, and everything — those things are a flag for the malware to check. If they know that they are inside, being analyzed, usually what they do is stop working, or they terminate the tools that you have. Sysinternals, usually — if you're running Sysinternals to monitor the system, they can check for that and then terminate the tools that you are using. There's always something like that. — What are we going to do with those? — What we're going to do

with those is: you have to submit it to the experts. You have to submit it to us. If you're really suspicious about something — normally, if it is a really malicious, really in-the-wild malware, when you throw it to VirusTotal you will see lots of red detections. But if it is a simple, minor malware, you will only have like three out of 60 detections. So that one will not gain any popularity, and it's probably not very strong. Question?

So you're asking where to get samples? Because I'm working at Fortinet, it is against our policy to tell you where to get samples — sorry for that; I cannot give you the answer. At your work, if they find out that you're actually the one who executed the malware, there can be consequences. If you intentionally execute the malware, it's like you're destroying company property and so on: it's like you know that a fire will burn something, and you use that fire to burn your office computers — then you'll be liable for that, right? But if you happened to have a file that you thought

was legitimate — for example, you got an email that seemed to come from your supervisor and you had to click it, but it's actually spam — you might not be liable for that, because you thought it was legit, right?

That I actually cannot tell, because for our work we have separate machines — we have virtual machines that we can run, on a network that is isolated so it cannot really go out, with firewalls surrounding it. That's the nature of our work, so we are very careful about malware getting out. We have the sample, and we run it in a secured environment. But if you're doing it in your office — which I don't suggest — or on your

neighbor's machine, make sure that you're not really... So if you execute that, you can be liable. If you really want to do it on your own, you have to really set things up. There are actually lots of articles and tutorials on how to have at least a secure environment, or you can do a layered VirtualBox setup. And make sure that you're not connected to the internet, at least; you won't see the network part of the analysis, but you will be able to see what's going on in the malware. Or the safe bet is to just throw it to VirusTotal, or throw it to your

security company's support and everything. — Are these slides supposed to be available? — Yeah, I will be posting them, and it is on the live stream too; you can just go back to that. But I will be sharing this — or you can just send me an email. I'm not sharing my email; do not send me something. Any more questions? — Excuse me, please click this link. — Oh yeah, it's always like that: please click this link, and you will get some money out of this and everything. Yeah, thank you very much for staying. Thank you. — Another question: I'm working in games. Virtual machines usually have crappy video drivers, so when we sample something that looks like a

game, it won't run on a virtual machine. Are there any virtual machines, commercial or whatever, that have better video driver support? — In VirtualBox, the guest machine should be able to use the driver the host is using, so if you have a good video driver, you should be able to use that. — Okay, I'll try that. Last time I tried, I couldn't run anything. — I have one question. You said some malware escapes the VM. If there's no shared folder, no network connection or whatever to your host, do they still escape by doing some sort of exploit against the virtual drivers or something like that? — Yeah, they can. Usually what happens is that if they find

out that they're running inside VMware, some malware can actually escape even though you don't have a shared folder; they can access your... — Alright, I think we're gonna start early. Basically, I did this presentation internally at the company I work at, and it took an hour. This is a 30-minute time slot and we're trying to keep it to that: 35 slides and 30 minutes, so it's gonna be fun. So a little bit of intro. My name is Alex Parsons. I work at Stroz Friedberg, which is a digital forensics and incident response company. I live in Seattle; I'm from Pennsylvania. I used to own a Windows Phone — that was a mistake — but I did for about four

years. Internally within the company, I'm basically the guy people go to to talk about Office 365 or Microsoft. When I was in college at Champlain, I wrote a paper on Windows 10 forensics, which I think was the first paper ever published on that — I cheated because Windows 10 wasn't out yet. Yeah, so this is me before I had friends with DSLR cameras, and this is me after DSLR cameras. And this is my peak, playing putt-putt, you know? And obviously, opinions here are my own; they are not representing Stroz Friedberg at all, although I am working for them. My Twitter is @parsonsproject. So we're gonna go over some basics: the basics of what Office 365 is — but we're gonna try and breeze through that — and the basics of a

compromise — we're gonna go over a basic phishing compromise and the logs around that — as well as the incident process and the post-incident process. In general, it's just about learning from my pain and suffering, because as consultants we go through Office 365 cases pretty frequently, and we have a lot of pain points that we have to get around. And the assumption here is that you do not have a SIEM in place, because a lot of clients that we deal with do not have SIEMs in place yet, and sometimes you have to acquire the logs manually. So that's what we're going over. This slide is mostly for afterwards, when I release it: if you just want to go through the slides, this slide is

great. But we're going to go through these items throughout the presentation. Oh, and also, the biggest TL;DR is: just enable multi-factor authentication.

All right, so what is Office 365? A lot of people think Office 365 is just mail — it's not. Back in 2010, we had Exchange servers, we had SharePoint servers, and we had Lync back in the day, or Microsoft Communicator. And Microsoft was like, "Well, why don't we make this a service, make people pay every year, and we'll handle the cloud?" — because that was the big deal in 2010. A lot of people also don't fully understand what SharePoint is, and it's important, because when you collect data in an Office 365 environment, you have the option to choose things from Exchange, SharePoint, and other sources. And a lot of people don't realize that OneDrive for Business is SharePoint — that's a common thing. It's pretty terribly designed.

I've yet to see a well-designed SharePoint site. SharePoint can do anything — can it do anything well? Maybe OneDrive for Business, and that's a stretch, and that's Microsoft's own product, not even some other company's. As for what Office 365 actually does and how it has progressed over the years: they do keep enabling more services, and surprisingly, when they roll out services, they are also starting to log them. So it's important to know that they're increasing the applications in use. One of the more interesting ones from an attacker's perspective is Microsoft Flow, which not many people know about. Microsoft Flow is basically an if-this-then-that solution for Office 365. The idea is that you can do things

like: every time my boss emails me this, put in a calendar notification to do this. Or, the scarier one: every time I put something in this OneDrive folder, put it in my personal Google Drive folder — which, as security admins, we do not like. But attackers might be using that in the future; I haven't seen it in the wild yet, but it's something interesting. Microsoft Teams is a newer application — it's a Slack copy. Those are just some of the things Office 365 has. So, I talked about SharePoint, so you're all asleep. Here's a fun fact about Seattle: Seattle has more cranes than any other city in the United States. So if you want to see a lot of really good cranes, come

over south of the border to Seattle. This is actually a picture I took outside of the Amazon biodome — it's ridiculous how many cranes there are out there. So, okay, good, you can read this. I had to adjust this slide because it was way too small on this projector. We're going to go over the basic phishing life cycle. In general, I think we all know how a phishing life cycle works; let's use the example of wire fraud. It starts with the attacker sending a link to a bunch of users. They click on the link, and it looks exactly like this screenshot here, where they'll be logging into their

Microsoft account — but it won't actually be their Microsoft account; it's a phishing site. You enter credentials, they get your users' credentials, and then they log into the account. Once the attacker gets into your account, they look at some of the emails and try to figure out where to go next. Let's say their goal is wire fraud: they want to get to the people who can actually perform the wire fraud, or who can ask for money from clients or customers. So they move around, and the interesting thing is that they then use the trust of the user they just compromised to send additional phishing messages. And they

also create mailbox rules. Basically, you will have a mailbox rule hiding your own emails, because they have control of the account now. They'll say, "Delete all emails that say 'you've been hacked' or 'wire fraud,' or anything that matches the fraudulent email itself." So once they get to the user they want to compromise and start asking their customers for money, they will also add mailbox rules, ask for the money, and get the money. The first one up here is a repeat, but the one on the right is a continuation: after they get the money — or maybe they fail to get it — they'll pivot and just send the phishing emails out to

everyone and to customers, because they've basically spent all that they could. They know they're discovered, so they might as well move on to other victims. And this is an example — God, you can't even read that — but that is a mailbox rule named with random letters. That's something you'll see with phishing attacks quite often. The text on the right is taken directly from what you'll see when you output the inbox rules, and we'll get to how you acquire that later. In most instances it starts with an email that says, "Please wire this money to this location," sent to your client. And then the money gets sent, and then you might call the FBI, or you might call Stroz Friedberg and

be like, "Help." And then we have to go from there. The scenario we're going to walk through is that you've been phished; the client, or your company, wants to know: what do you do? Does anybody want to guess what's the first thing you might want to acquire or make sure that you have going forward — the first thing you should collect when you are breached? The first thing is you want to make sure the mailbox is preserved, because of the time to live: when the attacker gets into the environment, they often delete the message immediately and try to purge it. And by default, the Office 365 environment has

only 14 days to live. That can change if your environment does increase that level, but you might want to take a picture of this. So this is the time to live for default logs in the environment. It totally differs based on the plan that you're on, but in general these are the default options. You'll notice that Exchange Audit logs by default are not even logged, which you would not expect.
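As a back-of-the-envelope sketch of that deadline math (the 14-day figure is the default discussed above; the function names are mine, and your tenant's window may be configured higher):

```python
from datetime import date, timedelta

# Default recovery window for purged items, per the talk; tenants can
# raise it, so treat 14 days as the floor, not a guarantee.
DEFAULT_RECOVERY_DAYS = 14

def recovery_deadline(deleted_on: date,
                      window_days: int = DEFAULT_RECOVERY_DAYS) -> date:
    """Last day a purged message is still recoverable, absent a hold."""
    return deleted_on + timedelta(days=window_days)

def still_recoverable(deleted_on: date, today: date,
                      window_days: int = DEFAULT_RECOVERY_DAYS) -> bool:
    """True while the message can still be pulled back; if this is about
    to flip to False, place the hold first and investigate second."""
    return today <= recovery_deadline(deleted_on, window_days)
```

The practical point stands either way: place the hold before the window closes, then worry about analysis.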

Yeah, I'll be sharing it. Yeah, if you follow me on Twitter, I'll have a link to my Twitter afterwards. I'm going to try and find a place to share it, and I'll post that. So I don't want to go into the details of how to acquire it; if you have to do this, you can Google it and follow Microsoft's instructions. What I am going to talk about is where you might have to read Microsoft's instructions, try to follow them through, and realize that their documentation is terrible, because that happens more than you think. And the other interesting thing

that they kind of document, but not really: mostly everything in Office 365 in the admin portal works in Chrome, but the exception is if you are downloading mailboxes. And like I said in the last slide, the first thing you want to do is make sure that you have a hold on your mailbox, and the reason is the 14 days to deletion. If you place a hold on that mailbox, Office 365, even if they delete the messages, will make sure it holds those deleted mails. So if your phishing attack is within 14 days, you might be able to get the original phishing email as well as

the fraudulent email. Kind of. So there's a difference between litigation holds and in-place holds, and I think at the end of the day they both have similar functionality. So if you have a litigation hold instead, your data will be preserved. It gets really complicated, and I don't want to get too much into that, but yes, you're correct. So that's placing the mailbox hold: important if you want to make sure you're keeping your email. I don't know if any of you attended the Pi talk, but that mostly revolved around the mailboxes themselves, so reading the mail. We're going to focus on the sign-ins and the logging around Office 365 so you

can actually see indicators of compromise: not just how they were compromised, but who is compromised, because oftentimes it's not just one user. So, the first thing to look at is actually the Azure Active Directory sign-ins, which every Office 365 environment supports. Now, your mileage will vary drastically; if you're on a default plan without any Azure AD support, I think you have 40 hours of sign-in logs. But what's nice about these logs: when you're starting an investigation, you want to take all the IP addresses, find the locations, and look for that one Nigerian IP or something that looks weird or out of place, right? Azure AD will actually allow you to see the locations and filter through that. So

if you can acquire all of those logs, that's incredibly helpful and it'll get you some quick wins. And you'll see on the top right here, well, you can't even read that, but it says conditional access, MFA server, users flagged for risk, and risky sign-ins. Clicking on those links can help you find compromised accounts, because it will show you logins like, say, you log in in California and then another login happens in New York two hours later. You can't realistically travel between those two points, so it'll flag those things for you, which is pretty nice. All right. So the other thing you want to do, because you

want to acquire the logs and make sure that you can analyze them, but when you're frantically panicking, you also want to make sure the attacker's actually out of the environment, like this raccoon; I don't want a raccoon in my house. So you want to check to see if any mailboxes are forwarded, because as we discussed, a common thing attackers will do is modify the inbox rules. And if they're kicked out of the environment, those inbox rules often persist. I mean, if they're forwarding mail, that's a problem. But if it's just a rule that sends anything with a fraudulent

email, or anything that says "I've been hacked," to deleted items, it's not going to ruin your environment; it will just be inconvenient. You also want to look at your last password change info, because immediately you're going to be looking for the users that are compromised and when they last changed their password; that way you know whether they're still vulnerable or not. So, audit logs. First we'll start with what audit logs look like, and based on this screenshot, they look like unreadable text. But basically there are three logs right here. Can you guys actually read that? Because from here I can't. OK. So you see there's mailbox login, file previewed, and user logged in.

So two of those three logs are logged by default. Does anybody want to guess which two? If you were going to implement something... the file previewed event is someone clicking on a OneDrive link. Anyone want to guess? What would make sense? Do you want to log your mailbox logins, your user logins, or every time someone clicks on a link and looks at a file? It is a default, yes. But does that make sense? No. So the weird thing is that Microsoft will give you every single file you look at in OneDrive or SharePoint, but they won't, by default, tell you every time you log in to your mailbox, which is insane. Like, no,

that doesn't make sense to me. Yes, yes, you hit the nail on the--it doesn't make sense from our viewpoint, but yes, Microsoft is a huge company. Yes. Which that--that is a good question. I'm actually going to get to that later. So this is basically--this slide is just about explaining exactly what the audit logs have. So it's everything from, like I said, accessing files to, you know, logging in with your mailbox if you have that enabled. And it's all about user activity. It's about things that the user does. It's not the mail messages themselves or anything like that. But it's incredibly useful because you can't really see it right here, but you get IP addresses in that. So right here, speaking of the audit logs and what's

logged by default: that's not terribly readable, and I think this will be more useful when I distribute it online, but these are the events that are not enabled by default but that you actually can enable. And you'll see within here there's something called message bind, and the description is, "An item is accessed in the reading pane or opened." Now, that would be great if we could see that, but unfortunately you have to, A, manually enable it, and B, it only works with admin accounts; you cannot turn it on for non-admin accounts. You just can't do it. Now, there are ways to gain access to every single item that a user reads or views, but that is part of Microsoft's telemetry department.

And Microsoft support staff will tell you it doesn't exist and there's no way you can get it. Internally, we have a template that we give our clients to try to push through support to get those logs. But it will take you four to six weeks to get maybe one or two users' worth of them, so only do that if you're working a PII case or there's an impacted business; that's what Microsoft will try and tell you. So, the column on the far right is really what you probably want to enable, because the majority of phishing attacks do not deal with admins. Some definitely do. But if today you want a command to actually enable mailbox audit

logging, this is the command you need in PowerShell. You just log in to the Exchange PowerShell environment and run it; this is specifically for your main users, not your admins. So, while we're talking about audit logs, this is an example. You see how there's the audit data on the right here? This is one line of that. The audit logs are messy. If you are not importing them into Splunk or something like that, you're gonna have a bad time. You can view it in Excel, since it's a CSV when you export it manually, but it's frustrating. It's like double-nested JSON, and it's incredibly difficult to parse. It's not

like you can just decode it with PowerShell's ConvertFrom-Json; that will only get you part of the way there. All in all, it's frustrating. Try to put it into a SIEM if you can, or Splunk, or something that allows it to be parsed better. But here you'll see that this is just an example of a mailbox login: you get the IP address, the user agent, and a lot of other information. And then here's an example of the file previewed action we were looking at. What's interesting about this is it tells you exactly the document they viewed, as well as the user agent and the IP address. So when we talk about pivoting, what you want to

look at is this: you first want to try to find countries that seem suspicious to you. In this example, I think I actually just typed a random IP address in here, because I'm not using real client data, and it turned out to be Brazil, which is really convenient for me. You run into Brazil, Nigeria, and other places; it's really often just a proxy. So you want to look for those IP addresses, maybe get a list of IPs. The IPs change all the time, so you can't use them as a total indicator, but if you can run a tool that looks at the IP addresses and maps them to countries, that will get you part of the way. And you're

also identifying common user agents. Obviously it's not 100% effective, but if you know they're running Linux and using some weird version of Chrome or Chromium, that might help you. And this is just an example, which again is not very readable, but this is an IP address of a proxy site based in the US. This 100% happens: just because you see Nigerian IPs or IPs from Brazil does not mean they're only using those IP addresses. So just be on the lookout. You actually want to look at those IPs, do a Whois lookup, and see if it's a cloud-based company. And then, again, you can't really read it, but that's a Google result for the exact place it resolved to. And Vultr

Holdings is, it says here in Google, a fast-growing cloud service provider. So that's a big tip-off that it's probably an attacker. So here's another fun fact to keep you all awake: this is the precipitation of major U.S. cities, with Seattle in red there. Most people think Seattle is the rainiest city in the U.S. It is not. For comparison, here are Vancouver, Montreal, and Toronto. So Vancouver is rainier, fun fact, and Portland too. Is that the right Vancouver, though? Ooh, that's a good point. No, it's not Vancouver, Washington. At least I don't think so. You're making me doubt myself, but I'm pretty sure it's BC. Okay, so acquiring the audit logs. The first two things to know is never trust the audit log GUI.

The second thing to learn is never trust the audit log GUI. And does anyone want to guess what the third one is? Close; it's never, ever trust the audit log GUI. Always acquire audit logs via PowerShell if you can, because when you acquire logs via PowerShell, there's an index number and you can actually compare that number with the amount of logs that are available. That doesn't mean it's not frustrating, but the frustration with the GUI is that it will only export 50,000 logs per request. It won't even tell you; it won't say, "Oh, we didn't get everything, but here you go." No, it just gets you 50,000 logs, and you're probably missing a quarter million, depending

on the size of the company. And sometimes if you... Sure. Yeah. Well, yes, I've seen that as well. But in other environments I've worked in, like with one client, they didn't have admin audit logs enabled, and the Office 365 audit logs are not enabled by default either. So make sure you enable them: you just log into protection.office.com, and the first thing that pops up says "Enable audit logs," and then you're good to go. I also worked with a client who didn't have it enabled when we were on site, and when I went into the GUI just to peruse around, it wouldn't let me search from before they turned it on. But then when I went to the

PowerShell, I just said, come on, let's try doing the past 90 days, and I got back two weeks of logs from before the audit logs were enabled. Because of that, we were able to get a lot of good data to do the IR; without these logs, it's really difficult. Let's see. Yeah, and sometimes it just won't get you all the logs, even if it's fewer than 50,000. This even happens in PowerShell: sometimes you'll run the exact same command and get different numbers. It's terrible, a forensic nightmare, but just try and double-check your results. So this is what the audit log GUI looks like. It's a little pretty. It has IP addresses and user logon activity

and all that. But this is what it looks like to me: I've painfully learned that it will lie to you. You will think you have everything, and you don't. Use the PowerShell module. It's not perfect, but it's better. But the PowerShell module is another pain point, because you can only export 5,000 to 10,000 logs at a time if you want them indexed correctly. And if you try and script something that says, "Okay, I want the first week, and then the next week, and then the week after that," you could be throttled by Office 365. If you try to get too many logs at the same time, it will throttle you and you won't get everything. So you kind of have to throttle

yourself, and there's actually a script, not an official Microsoft script, but developed by Microsoft people, called Robust Cloud Command. And I have a link that you can click on if you download the presentation. It kind of throttles the requests for you so you don't have to deal with that pain. Let's see... yeah, I think I went over everything else in this slide. So, some useful audit log searches. You can actually search by IP, and you can even search by IP subnets, which can be really useful if you know certain subnets are malicious and you want to find them. All you gotta do is add a star and it'll

search for it. And this is all within PowerShell; I'll probably post a link when I actually publish this on how to connect to that shell. Basically you're going into an Exchange shell environment, and all these commands work in it. So the other thing to look for, if you have Exchange audit logging enabled, is the mailbox rules. Remember, the mailbox rules are how the attacker hides their activity, and they're a really easy way to find additional users that were attacked. So, just going over what we've covered so far: assuming we have this data, we could figure out what users

were compromised based on IP addresses and pivoting on that data; whether there are mailbox rules in the environment; and, if you review those, what mailbox rules they have created. The only thing we have left, and we aren't even on the email analysis yet, is that we don't have the phishing email. We might have the fraudulent email, because that's how the incident starts. In my opinion, finding the users that are compromised is the number one priority. And you can search all day sometimes, and maybe it might not even be a phishing attack that got them in. Lately there are, I don't want to say brute

force, but kind-of-brute-force attacks, where they will try a new password every 30 minutes and just keep going with default admin passwords, like Helpdesk2015 or 2016 or something like that, which a lot of organizations use. And sometimes they do get in that way. But I'd say phishing is definitely more common. So we could automate this, right? That's what we really wanna do; we don't wanna run these commands by hand. Luckily there actually is a tool out there. It doesn't automate the collection of all the logs, but it does create some really nice reports, and it's called Hawk. It was made in December 2017 by Microsoft support engineers. And remember when I was talking about how the audit logs will export

the IP addresses, but the actual Office 365 audit logs do not give you the location, whereas the Azure sign-in logs do? What Hawk tries to do is resolve those IP addresses to locations for you from the audit logs. Depending on the mode you run it in, it will even find the mailbox rules that were created in your environment, and it'll even try to find the more suspicious mailbox rules that are sending things to folders. It's less common now for attackers to just blatantly forward things; sometimes we do see that, but it's more common that they're in the environment creating rules that just send

it to another folder, the junk folder or the deleted folder, something like that. So it's a good tool to use. But if you want to make sure that, down the line, when someone says, "Okay, this person was compromised, what did they do?", you can answer at a more granular level, you want to collect those audit logs, because you have to look back on them. I think ideally, if your environment's small enough, collect all the audit logs in your entire environment. But if it's huge, the best thing to do is probably collect all of the audit logs of the known compromised users. So, these five steps are literally all you need to start a Hawk investigation. It's incredibly simple.

If you're running, I think, PowerShell 5 or later, so Windows 10, you just need to run Install-Module -Name Hawk. It'll download it for you; then you import that module. You connect to Exchange via PowerShell; that's probably the largest number of command lines, I think it's like four. And then you can run Start-HawkTenantInvestigation, and the tenant investigation will do a good job of giving you an overview of what you might have in your environment. Then, when you find compromised users, Start-HawkUserInvestigation will give you more granular details: it'll actually give you the Exchange audit logs and some of the logins as well. So Hawk is a pretty great thing.
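Hawk aside, if you collect the raw unified audit logs yourself for that granular look, the double-nested JSON complained about earlier can be flattened with a short script. A sketch in Python, assuming a CSV export already parsed into dicts with an AuditData column; the field names in the example are illustrative of what exports typically contain, not a guaranteed schema:

```python
import json

def parse_audit_row(row: dict) -> dict:
    """Expand the AuditData column of a unified-audit-log CSV export.

    The column is a JSON string, and some values inside it are JSON
    strings again (the 'double-nested' part), so attempt a second
    decode on any value that looks like a JSON object or array."""
    record = json.loads(row["AuditData"])
    for key, value in list(record.items()):
        if isinstance(value, str) and value[:1] in ("{", "["):
            try:
                record[key] = json.loads(value)
            except json.JSONDecodeError:
                pass  # a plain string that merely starts with a bracket
    return record
```

From there you can pull the IP and user-agent fields straight into whatever pivoting you're doing, instead of fighting the CSV in Excel.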

So just to recap some quick wins that aren't immediately obvious. If you have Azure audit logs or Azure AD sign-in logs, go to portal.azure.com, go to the Azure AD section, and try to look at the logins. And just to be clear, the Azure AD logins are about the same thing as the Office 365 logins: if you log in via Office 365, those logins also get sent to the Azure AD logs. I don't want to explain all of Azure AD, but Azure AD has capabilities to be the sign-in provider for Workday and plenty of other providers, or even Google. So it's a single sign-on

provider, and you can see other products and services that your users have logged into, if you're using single sign-on, that you wouldn't see in the Office 365 logs. So, for finding the phishing email: normally it's within five days of when the fraudulent email occurred. To be clear, the fraudulent email is the one requesting the wire transfer, and the phishing email is the one that says "click on this" or "John has shared files with you." Oftentimes that will be deleted, and if you still want to find evidence of that phish, you can try going into the trace logs. And the trace logs are a little annoying to get, because you will run

a search, and no matter how simple or how large it is, it will take hours to respond to you, even if it's just a one-hour search. The trace logs are basically the metadata around every single email that you get, so you can see the exact IP address it came from and the metadata around when the email was sent. Also, you might want to check out the Pi talk, because I was just at that presentation and they seem to have a pretty solid way of finding and remediating the phishing emails, which isn't something we're really going into; we're talking about how to find badness. And for the fraudulent email, the good thing is that in some Office 365 environments, every email you send

internally will have the X-Originating-IP address in the header, so you actually get the IP address that sent that email. Now, my process is a little specific to me, because I have a bit of a forensic background. What I do is download the PST, the mail file, process it in X-Ways, and then export the messages as email files, because as email files they're searchable in clear text. Then I'll search for IPs that I know are malicious, and we also have a program we use internally that scans all the IPs and gets their locations. And I'm

sure plenty of you can script something together that does that as well. That process has worked pretty well for me, because in fraudulent emails you often know the IP address. Not all the time; you might have that header disabled in your environment, and then you kind of just have to look around the time frame for the user that you know is compromised. So, preventative techniques. Obviously, enable two-factor or multi-factor authentication; we wouldn't be having this talk if more people did that. Understandably, for larger businesses it's not as easy to implement as for others. But if you wanna look at solutions that could help, specifically within Office 365 and Azure AD, Azure AD conditional access is

kinda cool. Remember that little panel at the top that says risky sign-ins and users flagged for risk? You can actually put rules in place that say, if a sign-in is at a high, medium, or low risk level, depending on what you want to do, you can make the user change their password or force them to sign up for two-factor authentication. Now, you could always say, "Oh, well, the attacker's just gonna change the password," or, "They'll get in, they have two-factor." Well, the difference is that the user will know pretty immediately if they're locked out of their account. So it will

help: by the user saying, "Hey, someone changed my password," you're going to be notified pretty quickly that they've been compromised, simply because the user has to do their job. And with Azure AD conditional access, you can also blacklist and whitelist certain IP subnets that you find, which is pretty nice.
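You can also prototype those subnet rules offline against the sign-in IPs you collected, before committing them to a conditional access policy. A minimal sketch; the blocklist ranges here are documentation examples, not real attacker infrastructure:

```python
import ipaddress

# Example blocklist; in practice, feed this from the proxy and
# cloud-provider ranges you identified during the investigation.
BLOCKED = [ipaddress.ip_network(c)
           for c in ("203.0.113.0/24", "198.51.100.0/24")]

def is_blocked(ip: str, blocked=BLOCKED) -> bool:
    """Would a sign-in from this address be denied by the subnet rules?"""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in blocked)

def triage(signin_ips, blocked=BLOCKED):
    """Split observed sign-in IPs into would-be-blocked vs allowed, to
    preview the blast radius of a rule before you enable it."""
    hit = [ip for ip in signin_ips if is_blocked(ip, blocked)]
    ok = [ip for ip in signin_ips if not is_blocked(ip, blocked)]
    return hit, ok
```

This is just triage on collected data; the actual enforcement still happens in the conditional access policy.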

Yes, yes, that's the other thing with conditional access: you can say, I want to block everything from these countries. And impossible sign-ins are another aspect of the risk level as well, and they also take into account IPs that are from known proxies. You remember how I was saying it's hard to find proxies? I guess Microsoft does a lot of research to find those proxy websites and assign risk levels. And this is just an example; we tested it in our environment because a client wanted to know a solution they could use. It does require Azure Active Directory Premium P2. That's a lot of words; Microsoft's not very good at names. But it worked when we tried it against

a Brazilian IP that you saw earlier. Wow, I did that in 30 minutes; when I did that internally, it was an hour presentation, so I'm sorry I rushed through it. If you have any questions, that's your chance. Yes, exactly, yes. Mm-hmm. - Yes. - So, Hawk is analyzing the account access itself, and the mailbox aspect is only a portion of it. You'll notice it also has, I don't know if you can read it, but it says things like investigate impersonation rights or role assignments. So it actually does look at permissions that have been changed as well; it is kind of a holistic view. The difference is that when we talk about OneDrive

or SharePoint, I guess the answer is no, it's not collecting every log related to SharePoint or OneDrive. If you want to see, you know, did somebody access something they shouldn't have, Hawk is not gonna get that answer for you. The answer is to look at the audit logs themselves, acquire them via PowerShell, and then parse them however you want. And speaking of OneDrive: lately I have seen attackers, as a phishing tactic, creating documents in OneDrive, because people will trust a link from, I mean, it's your corporate OneDrive, you're not gonna click on that? Like, come on. So people will click on it, and it will then have a link that takes them elsewhere. But it's interesting that they're now starting to use

things other than mail, because historically most phishing cases have been about mail, but they are starting to use additional services. - Do you know if changing users' passwords immediately disconnects all the other sessions? - Just changing it? No. I think in the Pi talk, in their modules, they had a command that will not only force the password reset but also lock them out of existing connections. - Can a user do that? - I think a user can, but don't quote me on that; I don't know for sure. Anyone else? - Do you see any changes? - You mean, is

the format different? I'll put it this way: Microsoft would love for you to use their tools to do all your investigations, but they simply don't have the same capabilities that a forensic tool does. So, to this day, what I do in a lot of cases is process the PSTs in X-Ways and go from there. I haven't had any issues with that; it's a PST format at the end of the day. The one thing I will say is that Microsoft does make it very easy to place holds and to start acquiring large numbers of mailboxes very quickly. And although they don't have the most robust regex functionality in

their searches when you're searching across the entire environment, it searches very quickly. Because, you know, processing PSTs with forensic tools takes forever. So sometimes you might want to use the content search or the eDiscovery area to just run searches to try and find the same email. - Will content search find the deleted email? - Well, if you have a litigation hold and you collect that mailbox, you'll see purged email, I'm pretty certain. I mean, I've seen people use tools and methods that are not supported in eDiscovery, like a third-party tool called Exchange Migrator, I think, and that might not have it if you don't choose the right settings. Yes? - Last question. - Yeah, okay. - Last tangent.

Yes, so there actually is a feature that Microsoft is rolling out. I don't know why I know these things, and I don't know more than what the press releases said, but their entire thing is about ransomware remediation: basically, you can say, I wanna go back to this point in time and just bring back all this data. So if you did suffer a ransomware attack, a great thing that Office 365 does, at least in theory, is create versions of everything: whenever you modify something, there's a version. Yes, there are mitigations; there's like a rollback. Whether it's actually doing the versioning? I think that's automatic.

I don't know 100%, but I'm pretty sure it is. Limited to what? Oh yeah, for versioning. But do you know if the new feature they're announcing will apply to it? I know the versioning is only for Microsoft documents, but... - So it's all files? - Okay, cool, so that's the answer: it's all files. I guess we're out of time, but I'll put my Twitter up.