
So I'll kick it off anyway. Hi folks, welcome to BSides. My name is Niall McGrath, and I'd like to talk to you today about how digital forensics fits into the world of forensic science. I'll also talk a bit about cloud forensics, in the context of AWS. This is the agenda we'll talk to today: I'll introduce myself, I'll give a background on forensic science, give an overview of digital forensics in the context of two research papers I wrote on investigating file-encrypted contents on NTFS and OS X, we'll talk a bit about DFIR in AWS, and we'll wrap it up then for questions. OK. So, currently,
I'm the SOC manager, the security operations centre manager, in Workday; I'm there just a bit over three years. Previous to that I was in Amazon, working in incident response with a focus on digital forensics, and previous to that I worked in Bank of Ireland for a few years. I had a number of roles there: developer, programmer, technical designer, architect, and I finished up in IT security, again with a focus on digital forensics. While I was there I completed my PhD in University College Dublin, and that was in digital forensics as well, so you can see that theme of forensics developing. Previous to that I was in ICL as a software engineer, and then I worked in Japan as a mechanical engineer; that was my primary degree, mechanical engineering from the University of Limerick. I worked as an architect there as well, and I met my wife there. And this is the gang: that's Marcio, Jana, Chuda and Leon. So let's talk about forensic science. Forensics, as forensic science is abbreviated to, is a domain of sciences: it's composed of the chemistries, the physics, the geologies, the biologies. And incidentally, geology is the study of the Earth's structures and how they interact with each other under various conditions. So we have this domain of sciences. So why are we using this domain of
sciences? I'll tell you: it's to answer a question, or a series of questions, in a court of law, criminal or civil, and the mechanism through which we do this is through evidence, admissible evidence. Depending on where you are and what jurisdiction you're in, admissibility can involve many things, many criteria, but the things it includes are relevance, completeness of your evidence, reliability of your evidence, authenticity of your evidence, and indeed repeatability: if you've got a methodology that generates evidence, it has to be repeatable no matter what location, Belfast, Dublin, London, Hong Kong, Tokyo; it has to be repeatable to yield the same results, and equally so the same error rates, which is just as critical. So, to get admissible evidence we have forensic services, and here's a list of forensic services; I'll scroll through them really quickly, but these are the services that help us get admissible evidence into court. We've got pathology: this underlines medicine and science, and it completely underpins the patient's care from start to finish, and there's a mechanism there called the autopsy, the post-mortem, which is really important for determining the cause or reason of death, organ failure, etc. We've also got anthropology, and this is where it gets a bit
macabre, so bear with me. Anthropology is the study and detection of skeletal remains, and from there, from the detection, being able to distinguish whether they're animal or human, the ethnicity, gender, etc. Then entomology: this is the study of insects, and of the growth phases they go through, because that brings a lot of really rich forensic information to a crime scene in terms of chronology and timelines. Psychiatry is all about human behaviour, in particular abnormality in human behaviour, and profiling those abnormalities. Engineering would be used predominantly by the insurance companies to determine if a claim was falsified or, equally so, to prove it's legitimate. Odontology is the study of dentition, the uniqueness of dentition, our teeth. Ballistic science is the study of rectilinear motion in conjunction with firearms: determining the trajectory of a bullet, you can determine distance, etc., and ballistics can also be used to determine if an object or somebody was pushed from a ledge or indeed jumped from a ledge. Toxicology is the study of poisons and how they interact with the human body, and a toxicology report would be a very important evidence artifact in a court of law. So this is the executive summary that
I'm going to describe in relation to the two papers. Imagine you're a digital investigator and you come across file-encrypted material. The problem you face is obviously trying to extract the evidential value from that file-encrypted content. Just assume that you don't have the keys to decrypt, you're not going to brute-force it, maybe the tools aren't reliable and the tools themselves aren't admissible in court, and you're not going to use cryptanalysis techniques to defeat the encryption. So how are you going to do this? How are you going to extract your evidential value? The typical scenario where you'd come across encryption, and there are talks going on right now about this as well, is the widespread malicious and nefarious use of encryption: the widespread distribution of malware, for example, indeed in conjunction with steganography, and also the widespread dissemination of illegal material, terrorist-related information as well. These typically get identified in the various situations and cases being investigated. What the research proposed was to locate the plaintext file name, because it can be transformed in the encryption process, and then from that to identify the plaintext contents of that file. The way we went about it was to characterise the file encryption process as a series of input/output operations, so we could then track those system events in the journal files of the systems we were looking at, which happened to be NTFS and OS X. OK, so let's look at NTFS and the $LogFile. The $LogFile is the journal file in NTFS. So what is the journal file, why do we use the journal file? Essentially a journal file is there to restore the system in the event of a sudden outage or a disk failure, for example; it's there to restore the system to its former integrity. And how does it go about that, what does it store? Well, it stores metadata, metadata on the file system, recording the changes in the file system: the new files that were added, the files that were deleted, modified, truncated, renamed, etc. So it contains metadata, information on metadata, and there are two types of transaction in the NTFS $LogFile: the redo and the undo, and these are specifically for that recovery scenario. With the redo, the transaction gets cached in the journal file before it gets written to the actual disk; imagine you had that system failure we spoke about, but the transaction is committed: those sub-operations will have to be redone. Equally so the undo, but not quite: you're building up your transactions, getting ready to write to disk, it's not committed, and then that failure happens; well, then you undo those sub-operations from your journal file. So then we have a composite of these in the journal file, the redos and the undos.
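That redo/undo recovery idea can be sketched in miniature. This is a toy model, not the real $LogFile record format; the transaction shape and field names here are invented purely for illustration:

```python
# Toy model of journal recovery: committed transactions are redone,
# uncommitted ones are undone, restoring the file system's integrity.

def recover(disk_state, journal):
    """disk_state: dict of file -> contents; journal: list of transactions.
    Each transaction: {"committed": bool, "redo": (file, new), "undo": (file, old)}."""
    state = dict(disk_state)
    for txn in journal:
        if txn["committed"]:
            f, new = txn["redo"]          # replay the cached write
            state[f] = new
        else:
            f, old = txn["undo"]          # roll back the partial write
            if old is None:
                state.pop(f, None)
            else:
                state[f] = old
    return state

journal = [
    {"committed": True,  "redo": ("a.txt", "v2"), "undo": ("a.txt", "v1")},
    {"committed": False, "redo": ("b.txt", "x"),  "undo": ("b.txt", None)},
]
print(recover({"a.txt": "v1", "b.txt": "x"}, journal))  # → {'a.txt': 'v2'}
```

The point is simply that the journal holds enough before/after information to move the file system forward over a committed write or backward over an uncommitted one.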
Here's an example of one of these operation codes: InitializeFileRecordSegment, hexadecimal 0x02. When you're creating a file, this is what gets written into the journal file; alternatively, if you're deleting a file, deallocating a file, there's a corresponding deallocate code. Microsoft didn't publish these codes; we had to go about reverse-engineering them, so it took a bit of time to build up a list of these codes. Next, attributes, standard file attributes: there are many of them in NTFS. The primary data structure in NTFS is the MFT, and every file, every folder, every subfolder on NTFS has an MFT entry. The MFT entries for those files have attributes, and the one we're really interested in is the $FILE_NAME attribute. This can hold timestamps and the name of the file, and indeed, if the file is short enough, it can contain the contents of that file, resident in the entry. If it's large, if it's an MPEG file, an mp3, an mp4, whatever, it will store the location, the run of clusters, for how to access the file where it's actually located on disk.
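A sketch of the kind of record scan this enables, assuming the journal records have already been decoded into dictionaries. The 0x02 labelling follows the reverse-engineered opcode just described, but the record layout here is a simplified stand-in, not the real $LogFile binary format:

```python
# Walk simplified journal records and pull out the file names recorded
# in the $FILE_NAME attribute for file-creation operations.
INIT_FILE_RECORD = 0x02      # opcode seen when a file record is initialized

records = [
    {"op": INIT_FILE_RECORD, "file_name": "report.docx", "lsn": 101},
    {"op": 0x07,             "file_name": None,          "lsn": 102},
    {"op": INIT_FILE_RECORD, "file_name": "notes.txt",   "lsn": 103},
]

def created_files(journal_records):
    """Return (lsn, name) pairs for records that initialize a file record."""
    return [(r["lsn"], r["file_name"])
            for r in journal_records
            if r["op"] == INIT_FILE_RECORD and r["file_name"]]

print(created_files(records))  # → [(101, 'report.docx'), (103, 'notes.txt')]
```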
So let's talk about the input/output processing in the context of application execution. On the left-hand side you see the three main areas. If you write your code, if you write an application, whatever it is, Python, Scala, Java, you write your code to open or create a file; that will call the Windows API, which will in turn call the Native API, and we're still in user mode, still at user level. Then that gets transitioned into kernel level, the kernel area, and there are a lot of things in the kernel area, system services in particular, and there are a lot of system services down there. In this particular scenario we're interested in two of them: the input/output manager and the object manager. If you look on the right-hand side it's the same thing but a bit more detailed: you're creating a file, you're opening a file; this gets called in the Windows API, down to the Native API, transitions across into kernel level; the I/O manager sends out a request to the object manager; a file handle is returned from the object manager; and then it goes back up the stack. That singular piece of communication is what's written in blue at the very end, in
the right-hand diagram: the IRP, the input/output request packet. So we're going to look at the flow of IRPs in this encryption process, what it means and what we can get out of it. The diagram on the left-hand side, it's a bit grainy, apologies, is essentially the flow of IRPs during this file encryption process. We're talking file encryption here, not disk encryption; that's a different animal, don't go near me on that. So just think of the main artifacts we're dealing with. There's the plaintext file that we want to encrypt, so we get a file handle to that plaintext file, we read the contents of the plaintext file, and we write it into a buffer file; yes, we have a file handle for the buffer file too, and the contents get written in there. The encryption process kicks in, transforms the plaintext into ciphertext, renames the buffer file to the ciphertext file name that you put in, closes the plaintext file and then closes the ciphertext file. So this is all characterised by a series of IRPs. On the right-hand side is a sequence diagram; we did this hundreds and hundreds of times, we automated it, and the right-hand side is the sequence and indeed, in itself, a signature of the encryption process: plaintext file, file handle, ciphertext file, buffer file, closing the ciphertext file. So this is the kind of signature that we've come to understand, and more or less the last entry in our journal file is from the ciphertext file, the created encrypted file actually being closed. We're going to use that as a pivot point. We did this a number of times and we built up a framework, systematically got a framework together, and we automated it: we wrote Java tooling and we also wrote a Python parser.
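The signature matching itself can be sketched very simply. The event labels below are illustrative names for the IRP-level operations in the sequence diagram, not real IRP function codes:

```python
# Check whether the tail of an event stream matches the characteristic
# file-encryption sequence: open/read plaintext -> write buffer ->
# rename buffer to ciphertext -> close plaintext -> close ciphertext.
ENCRYPTION_SIGNATURE = [
    "open_plaintext", "read_plaintext", "write_buffer",
    "rename_buffer_to_ciphertext", "close_plaintext", "close_ciphertext",
]

def matches_signature(events, signature=ENCRYPTION_SIGNATURE):
    """True if the last len(signature) events equal the signature,
    i.e. the ciphertext close is the final (pivot) entry."""
    return events[-len(signature):] == signature

stream = ["open_plaintext", "read_plaintext", "write_buffer",
          "rename_buffer_to_ciphertext", "close_plaintext", "close_ciphertext"]
print(matches_signature(stream))        # → True
print(matches_signature(stream[:-1]))   # → False
```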
So, going straight into it: from that we were able to establish a methodology, to formulate some steps, and then we executed our case study. We got a hard disk drive with that encrypted file on it, and the first thing we do is verify that the encrypted file was indeed created on that hard disk drive, not dropped in or copied in. You can run fsutil objectid query, send in the query and the name of the file, and you get your birth volume ID, and you correspond that with what's in the journal file; the two of them match, so we have a match and we can definitively say this file was created on this hard disk drive. Now, bearing in mind we've extracted the journal file and read it into memory, we go to the very end, to the last occurrence of that ciphertext file name that we know; we can get it quite easily because it's available in the $FILE_NAME attribute. So we go to the very end of the journal file and we work backwards, we backtrack; it's a constraint satisfaction problem and we use backtracking. And now we can see the $FILE_NAME attribute; this is the name of the file, the ciphertext file.
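A minimal sketch of that backtracking step, over a toy journal of (sequence number, file name) entries. The real parser works on $LogFile records, but the control flow is the same idea:

```python
# Backtrack from the last journal entry naming the ciphertext file,
# collecting every file touched before it (buffer/temporary files,
# and eventually the plaintext). Entry layout is a simplification.
journal = [
    (1, "secret.txt"),      # plaintext
    (2, "tmp0001.buf"),     # buffer/temporary file
    (3, "secret.txt.enc"),  # ciphertext created
    (4, "secret.txt.enc"),  # ciphertext closed (last occurrence = pivot)
]

def backtrack(entries, ciphertext_name):
    """Start at the last occurrence of the ciphertext name and walk
    backwards, returning the chain of file names leading up to it."""
    pivot = max(i for i, (_, name) in enumerate(entries)
                if name == ciphertext_name)
    chain = []
    for _, name in reversed(entries[:pivot]):
        if name != ciphertext_name:
            chain.append(name)
    return chain

print(backtrack(journal, "secret.txt.enc"))  # → ['tmp0001.buf', 'secret.txt']
```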
So we're going to chain our way backwards in the journal file to see what files have been touched, and we can see other files: the buffer files, a buffer file, a temporary file, or a series of buffer files, depending on the size of the file you want to encrypt. Either way we can see this chain; for example there's a hexadecimal pattern, BB80 BB77 1V1, that signifies a file has been created, and indeed a temporary file. So we can chain our way backwards to see what buffer files were created, what temporary files. We can also use the FILE record signature, which gets written in for every file in the MFT and indeed the journal file, so we can use that as a kind of tokeniser and chain our way backwards for more parsing. From this we can determine the plaintext file that was actually used and processed to create the ciphertext file, and again we extract the name of the file from the very same attribute, the $FILE_NAME attribute, and we can get our timestamps from that. So again we can build up a chronological picture, build up timelines, for evidence and admissibility. And indeed, chaining our way backwards, we get at the plaintext contents as well. In this case, I know, the string is pretty small, pretty short, and it happens to be resident in the journal file, but that's fine, it's for demonstration purposes. If the file, as we said earlier, was large, an mp4 or something, this would record the locations and the run of clusters, so we could chain our way through, collecting logical sequence numbers that tell us the next address to go to.
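That chaining step can be sketched like this; the record shape and the tiny "runs" are stand-ins, but the walk over next-address pointers is the idea:

```python
# Follow the chain of logical sequence numbers to gather the cluster
# runs of a large (non-resident) file and reassemble its contents.
# The record layout and "disk" contents here are illustrative only.
records = {
    10: {"run": b"attack at ", "next_lsn": 17},
    17: {"run": b"dawn",       "next_lsn": None},
}

def reconstruct(first_lsn, journal_records):
    """Walk next-LSN pointers, concatenating each referenced run."""
    data, lsn = b"", first_lsn
    while lsn is not None:
        rec = journal_records[lsn]
        data += rec["run"]
        lsn = rec["next_lsn"]
    return data

print(reconstruct(10, records))  # → b'attack at dawn'
```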
As we do that, we can reconstruct the plaintext file, even if it's a pretty large file. So that's essentially, from a Windows perspective, how digital forensics fits into the world of forensics, and indeed forensic services; you might hear a lot about digital forensics as a service these days. Equally so, let's talk about the same kind of process in OS X. It's very similar, and yet it's different. You see the main core components of HFS+ here, the six main core components; they exist contiguously under the volume, and the two we're after are the catalog file and the journal file. Essentially, since, I think it was, 10.2 or 10.3, journaling was turned on by default in OS X, and we're at what now, 10.13, High Sierra, so it's been on quite a while; it was available beforehand, but it's been on by default since 10.2 or 10.3. And again, the same story: the catalog file is a B-tree data structure; this is how OS X stores its metadata. And again, the same drill: what is the journal file, what is the catalog file?
It stores metadata, information on metadata changes on the file system, exactly the same as in NTFS. But what happens here is that copies of the catalog file get written into the journal file, so now we're running with two copies, albeit the journal file version of the catalog file can get out of sync, and we can use that to our benefit. For example, if a file is deleted off a system, it gets removed from the catalog file immediately; however, it might be recorded and buffered for some time in the journal file, and if we correlate the two we can conclude that the file was deleted off the system. And sorry, I should have said this earlier: we're looking at this from a digital forensics point of view, we're investigating, and all we have is an encrypted file, so that's the context, and we're going to use a lot of inference as well. You might say, well, look, it's a mismatch in synchronisation, but we can use that to our benefit too. OK, so we look at the catalog file and the journal file, and it's the same modus operandi here as we started with.
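That correlation can be sketched as a simple set comparison, treating the live catalog and the journal's buffered copies as sets of file names (a heavy simplification of the real B-tree records):

```python
# Infer deletions by correlating the live catalog file with the
# (out-of-sync) copies buffered in the journal: names present in a
# journal copy but missing from the current catalog were deleted.
def deleted_files(catalog_now, journal_copies):
    """catalog_now: set of live file names; journal_copies: list of
    earlier catalog snapshots recovered from the journal."""
    seen = set().union(*journal_copies) if journal_copies else set()
    return sorted(seen - catalog_now)

catalog = {"notes.txt", "photo.jpg"}
copies = [{"notes.txt", "photo.jpg", "secret.dmg"},
          {"notes.txt", "secret.dmg"}]
print(deleted_files(catalog, copies))  # → ['secret.dmg']
```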
In NTFS we were looking at the flow of IRPs up and down between user mode and kernel mode; in this case we're looking at the VNOPs, the vnode operations, in OS X. Same drill: we're talking about the plaintext file, we're talking about buffer files, temporary files, and we're looking at the created ciphertext; these are the main artifacts generating this traffic. So this is the flow of the VNOPs on the left-hand side, and equally so the sequence on the right-hand side, and it's the same thing: when the ciphertext gets created and gets closed, it's the last thing to get updated.
So we automated this as well: we wrote a Python parser, and we simply read the journal file into memory, read the contents in, and we were able to enumerate the catalog records, which are B-tree nodes, in the journal file. We can always get the size of a node from one of those core components, the journal file header; having got the size of the node, we enumerate our way through the journal file according to the B-tree catalog file nodes. From those nodes we're able to extract the records; from those records we're able to extract the file records; and from the file records, using the fork information, we're able to get at the location on disk where our plaintext file is. What also gets written into the journal file on OS X is temporary items, and if you're creating an archive, an encrypted archive, a .dmg file or whatever, the name of that gets written in along with the constituent files of that archive. In this case it's the plaintext file, and the plaintext file is a constituent of the encrypted file, the archive that we created.
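The node enumeration can be sketched like this; the 16-byte node size and the blob below are toy values standing in for the node size read from the journal header and the real journal buffer:

```python
# Enumerate fixed-size B-tree nodes in a journal blob: the node size
# comes from a header, then we step through the buffer node by node.
def enumerate_nodes(blob, node_size):
    """Yield each complete node-sized slice of the journal buffer."""
    for offset in range(0, len(blob) - len(blob) % node_size, node_size):
        yield blob[offset:offset + node_size]

node_size = 16                               # pretend this came from the header
blob = b"NODE0:catalog...NODE1:fsevents.XX"  # 33 bytes -> two full nodes + tail
nodes = list(enumerate_nodes(blob, node_size))
print(len(nodes))       # → 2
print(nodes[0][:5])     # → b'NODE0'
```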
We can use that information to go searching elsewhere in the journal file, and it so happens that the fsevents UUID also gets written into the journal file. Essentially, fseventsd is the daemon that does the live updating of Spotlight in OS X, and Spotlight is essentially one vast index of your files, your photographs, your apps, whatever is on your system; fseventsd is the daemon behind that live updating. So, having got the plaintext file name from the temporary items in the journal file, we also look for the fsevents entries for that plaintext file, and we find them, and we find a lot of forensic information: the file ID, the folder name, the file name, timestamps, and importantly the start block of the plaintext file. And then we can get our plaintext, and it's the same thing: the string is small because it's stored resident, but if it's a large file we can get the sectors and we can systematically harvest and reconstruct the plaintext file by chaining our way through the logical sequence numbers, same concept as in NTFS, no mystery.
So now we've seen it from an OS X perspective as well: how this fits into digital forensics, and digital forensics into forensic science as a whole, as a service. Sticking with this theme of forensics, let's talk about forensics in the cloud. The landscape is different, we know that; the threatscape is different; the evidence artifacts are different; and of course the origins and the sources of this evidence are different too. And if you look at it at cloud scale, you see the emergence of this so-called DFIR, digital forensics and incident response, and this is the paradigm that comes to our aid, really, because we're not talking about one or two servers in the cloud; we're talking hundreds, potentially thousands, of servers that have been provisioned and orchestrated. So you want to do your incident response, you want to do your digital forensics, in that kind of automated fashion; you just can't do it manually, you're not at the races basically. So this is what we're looking at, at that automated level, and in order to do that you're relying on some sort of an
event-driven architecture. And yes, you can collect your events from collected sources of information like log files, and we'll go through that later on as well; this kind of thing helps us with our quest on DFIR. I see some folks from Amazon here in the audience, and this is a fantastic model that Amazon have put out, one they stand and live by: the AWS shared responsibility model. In other words, AWS will give you the cloud: they'll give you the compute power, the networking resources, the databases, and they'll secure all that, so they're giving you security of the cloud. And then when you and me come along and use AWS as customers, it's incumbent on us to put the security in the cloud on top of that. So we're talking about doing our hardening, our OS hardening, doing our patching, doing our updating, putting in our encryption, data at rest and in transit, etc. And I think a lot of AWS incident response scenarios have evolved, have come out of, this kind of misunderstanding on the user's behalf. Here are some
scenarios that I would have seen for DFIR. An access key compromise: this is prevalent. You create an access key from IAM and it returns your secret access key and your access key ID, and it could well be, as was mentioned in a previous slide, that you're a developer using GitHub, and you push out your AWS S3 access key, your API key, into one of your configuration files, and it gets trawled by one of these botnets for bitcoin mining. This kind of stuff happens, and that's one scenario; it can happen by being careless, I've seen it as well, where best practices aren't followed. An EC2 instance compromise would typically be where the user isn't adhering to standards or best practices, again with hardening for example; suddenly you realise, if you're not touching your system, that it's out of date and it's now vulnerable, and it's probably exploitable by one of these Dirty COWs or whatever; there's a privilege escalation going on, there's remote code execution, and you have an EC2 instance compromise. Of course, the one that scares me is the account compromise, where the root credentials are compromised, and that can happen as well: one or two key individuals in the organisation are entrusted with the root credentials, one of them walks out the door, you've no rotation or key management system in place, and there you are, you're wide open. I would look at these as fires that you can control; things are getting hot and heavy in the kitchen and you want to get moving. Once your AWS account is compromised through that kind of poor handling, that lack of best practices, it's very hard to recover from, in particular if
you're hosting somebody else's data on there, if you're earning your licence fees by curating somebody else's data; this is a pain point. So there are some things you can code and script to help you along the way, to do the security in the cloud on top of that platform AWS has given us with security of the cloud. You can check for unexpected resources that have been spun up: you can describe your instances, and you can have a list, an inventory, of your known resources and compare it with the output of your describe-instances call; if there's a deviation in that, ring the buzzer, bring in DFIR. From that you can also detect what access keys were used to spin up these unexpected resources: you can list them, render them inactive, and in parallel spin up new active keys if you want to maintain continuity of service, and then disable or delete the key that was used, that was compromised. In terms of the instance itself, you can create a security group to isolate your instance, and from there you can start adding in rules.
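That unexpected-resource check is essentially a set difference; a sketch, with made-up instance IDs, where `described` stands for the IDs pulled out of a describe-instances response:

```python
# Compare a known inventory of instance IDs against what
# describe-instances actually reports; any deviation rings the buzzer
# and kicks off the DFIR process. All IDs here are placeholders.
def unexpected_instances(known, described):
    """Return instance IDs that are running but not in the inventory."""
    return sorted(set(described) - set(known))

known_inventory = {"i-0aaa111", "i-0bbb222"}
described = {"i-0aaa111", "i-0bbb222", "i-0ccc333"}  # from describe-instances
print(unexpected_instances(known_inventory, described))  # → ['i-0ccc333']
```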
You can add rules into your security group to control ingress traffic into your EC2, for example; you can just add a rule for SSH on port 22, and you can do the same in another form of the same command, putting in the group name rather than the group ID. Equally so, on an instance compromise, you can revoke all access, to all protocols, by providing the parameter -1, and this covers your TCP, your UDP, ICMP, all these protocols. From there you can layer on the access that you want on your segregated instance; you can allow, say, port 80 for your egress traffic, and you can modify the instance as you go and drop in new attributes. In the old architecture, the EC2-Classic architecture, you'd have to stop the instance and then restart, reboot, the instance; however, now with the VPC architecture it's OK, you can do it on the fly. And in the AWS account scenario, where the account is compromised, rather than going in and modifying all the users that you think you have, and you don't know precisely, you can go in and drop in a complete deny IAM policy on everything.
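Those containment steps can be sketched as a command builder. The subcommands (create-security-group, revoke-security-group-egress, authorize-security-group-ingress, modify-instance-attribute) are real AWS CLI commands, but all the IDs, names and exact flag spellings below are placeholders that may need adjusting for your CLI version:

```python
# Assemble the AWS CLI quarantine steps for a compromised instance:
# create an isolation security group, strip its default egress, allow
# only SSH in from the analyst network, then swap the instance onto it.
# sg-QUARANTINE stands for the group ID returned by the first command.
def quarantine_commands(instance_id, vpc_id, analyst_cidr):
    return [
        f"aws ec2 create-security-group --group-name quarantine "
        f"--description forensic-isolation --vpc-id {vpc_id}",
        # protocol -1 covers TCP, UDP and ICMP in one revoke
        "aws ec2 revoke-security-group-egress --group-id sg-QUARANTINE "
        "--protocol -1 --cidr 0.0.0.0/0",
        f"aws ec2 authorize-security-group-ingress --group-id sg-QUARANTINE "
        f"--protocol tcp --port 22 --cidr {analyst_cidr}",
        f"aws ec2 modify-instance-attribute --instance-id {instance_id} "
        "--groups sg-QUARANTINE",
    ]

for cmd in quarantine_commands("i-0abc123", "vpc-0def456", "203.0.113.0/24"):
    print(cmd)
```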
So you're quarantining it, you're containing it to a certain degree. And you can tag your evidence; you want good old-fashioned, first-principles forensics, so you can bag and tag your evidence, basically, and this is how you go about it. You can save your instance information as well; you can output it to various logs with this command here, describe-instances, where you pass in the instance IDs. It can be highly verbose, or verbose is the wrong word, it's highly paginated, but it's good quality information, because it gives you information on all of your stuff: your VPC, the type of EC2 instance, your OS, the security groups you're using, your routing table information, your subnet information, your network access and all this. So it's very good forensic information. And again, getting ready for your DFIR process, you can create your snapshot; you have to be cognisant of the fact that your EC2 here has to be backed by EBS, the block storage, to create your snapshot. From that you can load up your favourite forensic distributions and go to work doing your standard forensics. You can create a volume from your snapshot, and you can also create an encrypted volume, assuming your EC2 is backed by EBS that supports encryption; there are many reasons you could use this, but DFIR is a very good use case. You can describe your volumes in the default region you're running in. There's a sample of some of the tools: in the top left corner are your endpoint tools, your Mandiant tooling, your EnCase; there's GRR as well, Google Rapid Response.
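Going back to that describe-instances output for a moment, parsing out the forensically interesting fields might look like this; the JSON is a trimmed, hand-made stand-in for a real (paginated) response:

```python
import json

# Pull the forensically useful fields out of a describe-instances
# response: instance type, VPC, subnet, security groups.
response = json.loads("""
{"Reservations": [{"Instances": [{
    "InstanceId": "i-0abc123", "InstanceType": "t3.micro",
    "VpcId": "vpc-0def456", "SubnetId": "subnet-0123abc",
    "SecurityGroups": [{"GroupId": "sg-0aaa111", "GroupName": "web"}]
}]}]}
""")

def forensic_summary(resp):
    out = []
    for reservation in resp["Reservations"]:
        for inst in reservation["Instances"]:
            out.append({
                "id": inst["InstanceId"],
                "type": inst["InstanceType"],
                "vpc": inst["VpcId"],
                "subnet": inst["SubnetId"],
                "security_groups": [g["GroupId"] for g in inst["SecurityGroups"]],
            })
    return out

summary = forensic_summary(response)
print(summary[0]["vpc"])              # → vpc-0def456
print(summary[0]["security_groups"])  # → ['sg-0aaa111']
```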
GRR has an open-source version, and it's a very good enterprise-wide collection tool. And your network tools, like the previous speaker was talking about: Wireshark, Moloch; your software write blockers; you can do your memory captures with LiME and FTK Imager, etc.; and there's a whole list of other open-source projects ongoing there that are pretty good. I think it's prudent as well, if you're in the DFIR game, to construct and prepare a response workstation: base it off, whatever, an Amazon Machine Image, an AMI, and configure your OS, your flavours, the various tools, your favourite distributions that you want to use. You could liken it to preparing and initialising your robot for war. And you can have a dashboard as well that talks to your AWS regions and lists the various recently launched resources we spoke about earlier, lists off your various types of AMIs, etc. Then you can do your analysis: your memory forensics or your disk forensics. With disk forensics you have those artifacts that get written to disk, the remnants, the leftovers, of malware, and this helps us a lot; this helps us build up evidence, a picture of the modus operandi and the objective of the attacker. Equally so the in-memory forensics, albeit the artifacts aren't written to disk; they're more volatile, etc. So there you are: you have a little robot ready to go to do your DFIR battles for you. And it would be prudent as well to be forensically ready: assume that the environment you're operating in is going to be, at some stage, a crime scene. It's just preparation, so you're
forensically ready. You can build up your evidence environment, your own evidence ecosystem, with your little robot army at the centre of it, and you can go and collect evidence from VPC flow logs. Your VPC, over on the left-hand side here, is your virtual private cloud, and it contains all your resources; the VPC flow log is essentially NetFlow-style information that gives you information on the traffic flowing into your VPC, out of your VPC, and indeed within the VPC itself. So that's really good forensic information there.
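A flow log record in the default format can be split into named fields like so; the field order follows AWS's documented default format, while the sample record values are made up:

```python
# Parse a VPC flow log record (default format) into named fields --
# this is the NetFlow-style traffic information mentioned above.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]

def parse_flow_record(line):
    """Zip the whitespace-separated record against the field names."""
    return dict(zip(FIELDS, line.split()))

record = parse_flow_record(
    "2 123456789012 eni-0abc123 203.0.113.12 10.0.0.5 "
    "44332 22 6 10 840 1620000000 1620000060 ACCEPT OK")
print(record["dstport"], record["action"])  # → 22 ACCEPT
```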
You could also use CloudTrail, AWS CloudTrail, and this gives you the big picture on the account: what APIs have been called, what's calling what, and you can see the flows there; from an operational point of view, and from a compliance and indeed a governance perspective, it's invaluable. CloudWatch, sorry, that was CloudTrail; CloudWatch is very much in relation to usage, what you're using, and if you need to put a cap on your resources you can set alarms. And indeed it's useful as well for incident
response, because if you're one of those people that doesn't use EC2, you just use AWS S3 for backup and the AWS CDN for pushing content to your website, you can put an alarm in for one EC2, and you know in your heart and soul, if you get an alarm for one EC2 and you're not using it, something's wrong, something's adrift. And that can happen; we just spoke earlier about our access key IDs, our secret access keys, being compromised, and these are alarms that you can put in. AWS Config as well helps with the kind of baseline: if you're doing hardening on your OS, your security groups, the configuration that you have, you can standardise your CloudFormation templates, and on any deviation from those prescribed standard settings you can generate your alarm. And folks, that's about it. Thanks very much for listening; I think that's the end of it. Thank you, thanks very much. [Applause]