
So yeah, I'm going to talk to you today about my dissertation project. I did an investigation into GPGPU-enabled file carving, and if you don't know what that is, I will explain it, hopefully. So yep, I'm Corey Forbes. I've recently left Abertay, the university that doesn't exist, the Northwest, save the world. I've also just finished work at the Scottish Business Resilience Centre, and I'm moving down here to work at NWR, so that's going to be fun.

Yes, so what is file carving? I hope everyone here knows. Does anyone want to put their hand up if they do actually know what file carving is? Okay, maybe not then. So file carving is the collection of files
on the raw hard disk, because when files are deleted they're not really removed, they're just dereferenced. So what we do is drop down to the physical level and search the contents of a hard disk or backing storage. The quote here excellently summarizes that: the metadata that holds all of the file locations is fragile and gets completely wiped when a file is deleted, but when it's deleted the file is still actually on the physical disk. You can see here the tracks and sectors; where the disk sectors and the tracks meet is what typically holds your individual chunks of a file, but the operating system usually deals with this for you.
The files are broken down into these chunks and appropriately placed on the disk, and this can be in the two ways you can see here: fragmented files, where the file is too big to be stored all together and is thrown throughout the disk, which is a pain in the arse; and contiguous file allocation, where the file is all kept together and it's just linear, like on the left. This is quite good when you're working with physical hard disks that have the actuator arm [inaudible], because you just read it all in one go. And when it comes to disks with these actuator arms, that helps me segue into the fact that backing storage has only recently become fast
enough that it's now the software's fault that everything's starting to get slow, from a file carving perspective. So SSDs and RAID technology have helped shift these programs back into the limelight in terms of needing improvement.

So ultimately my research question was: how fast can the header/footer search become, or how fast can I get the header/footer search to find these files? To clarify, the headers and footers are unique identifiers at the start and the end that say this is a PDF or this is a JPEG. Alethea helped me get a solid answer to the question, but to better explain it, and to
figure out where the problem was, I had to break down what file carving programs do. It just comes down to two halves: searching for these files by their header/footer pairs, and the actual carving. But the carving can only get so quick before you start to lose accuracy, so searching for these header/footer pairs is entirely where my efforts were focused.

So I'll quickly talk about searching algorithms; I don't want to ruin everyone's day here. Boyer-Moore is what the current tools, Foremost and Scalpel, make use of. These are two open source CPU file carving tools that I was comparing against. Boyer-Moore uses two rule
sets to jump through the string it's searching and help save time, but it requires the CPU to be fast enough. I'm not going to go through everything on the slide; it's more for the video, for anyone who wants to look back on it. Boyer-Moore also has to precompute these jumps, so it's not very fit for purpose with the new hardware I was trying to bring into the process. As such I moved to a new algorithm, which is Parallel Failureless Aho-Corasick. It's a dumbed-down version of the Aho-Corasick algorithm, which basically acts as a brute-force search, but for multiple patterns simultaneously. I forgot to mention,
actually, Boyer-Moore can only search for one string at a time, so having the ability to search for multiple was a big step up, which we'll see in the results later. Having this more brute-force search also helps with the fact that each compute core on a GPU is stupid in comparison to the CPU, where it's all fast and smart.

So now the background's out of the way, we can discuss the development. I made use of C++ as well as the CUDA library to develop the Alethea toolset, but CUDA then limited me to only being utilised on NVIDIA hardware. So as such I wanted to bring back the
usability. At the time, AMD had actually announced they were looking into adopting CUDA, so that would have worked well too. Okay, so yeah, cross-platform development was then a big objective; I wanted to get it working on both Windows and Linux. The other goal, of course, concerned the open source file carvers: I wanted to be leaps and bounds ahead of those, Foremost and Scalpel. With them being CPU-based, I was hoping that I could prove the concept that GPU is the way forward.

So for my results I used both my personal desktop, which was running Windows, and an AWS instance on Linux, and I've just got this for clarity
purposes; again, it can be looked at later. AWS was really useful for being able to recreate my results for anyone who wanted to look at it themselves. You can see here that Scalpel has both the fastest and the slowest results between the two tools. Specifically, it's worth knowing that the Boyer-Moore algorithm I discussed searches for only one string at a time, and that was made perfectly clear by the fact that it had to do multiple passes; thus when I was searching for all the patterns instead of just PDF patterns, the data throughput dropped drastically. You then compare this to Alethea, which was a vast improvement, and when you compare PDF-only to all patterns you
can see there's a lot less of an extreme drop-off in the data throughput. So I was hoping that's the takeaway so far.

For future work, I was really hoping that I could even apply this to real-time memory forensics. As a use case, a hypervisor, for example, could be running a tool very similar to, or a repurposed version of, the final project, in the event I do implement the full file carving features, and it could search for malicious files that exist in live memory. But yeah, that's a lot of it in terms of my slides, and I'm open to any questions or heckles.

Heckles! Yeah, if there's any questions... I'll ask one then.
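The Parallel Failureless Aho-Corasick idea described in the talk can be sketched on the CPU as follows. This is a minimal illustration and not the speaker's code: `Node`, `build_trie`, and `pfac_search` are hypothetical names, and the outer loop stands in for what would be one GPU thread per starting byte. The trie has no failure links, so each "thread" simply stops at the first byte with no valid transition.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One trie node: 256 possible byte transitions, -1 meaning "no transition".
struct Node {
    std::array<int, 256> next;
    int pattern_id = -1;  // index of the pattern ending at this node, or -1
    Node() { next.fill(-1); }
};

// Build a plain trie over all patterns. Unlike classic Aho-Corasick,
// no failure links are computed.
std::vector<Node> build_trie(const std::vector<std::string>& patterns) {
    std::vector<Node> trie(1);  // node 0 is the root
    for (std::size_t p = 0; p < patterns.size(); ++p) {
        int state = 0;
        for (unsigned char c : patterns[p]) {
            if (trie[state].next[c] == -1) {
                trie[state].next[c] = static_cast<int>(trie.size());
                trie.emplace_back();
            }
            state = trie[state].next[c];
        }
        trie[state].pattern_id = static_cast<int>(p);
    }
    return trie;
}

// Returns (offset, pattern_id) for every match. Each iteration of the outer
// loop is independent of the others; on a GPU it would be one thread's work.
std::vector<std::pair<std::size_t, int>> pfac_search(
        const std::vector<Node>& trie, const std::string& data) {
    std::vector<std::pair<std::size_t, int>> hits;
    for (std::size_t start = 0; start < data.size(); ++start) {
        int state = 0;
        for (std::size_t i = start; i < data.size(); ++i) {
            state = trie[state].next[static_cast<unsigned char>(data[i])];
            if (state == -1) break;  // no transition: this "thread" stops
            if (trie[state].pattern_id != -1)
                hits.emplace_back(start, trie[state].pattern_id);
        }
    }
    return hits;
}
```

Because all patterns share one trie, adding more signatures does not add more passes over the data, which is consistent with the smaller throughput drop-off the talk reports when moving from PDF-only to all patterns.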
So while you were doing this, did you look at the device free list, not the dev free list but the inode free list in the UNIX systems, to see if that might be a good starting place, or did you just look at the file system raw?

So I was just looking at how fast I could get a specific process to go, just looking at file carving itself. I haven't looked.

So you just scrape the entire [inaudible] off the drive?

Yeah. Specifically with my AWS instance as well, I wanted to test my algorithm more, so I even made use of RAM disks to load the contents that I wanted searched, and it was just all about the
speed of the algorithm.

And did you look through the bad block table as well, or did you...

No, it was the stupid approach [inaudible].

Yeah, okay. Other questions, now that the tough ones are out of the way? That was actually how I got started in security, doing that [inaudible], forty-one years ago.

Yeah, hello. Hi, Sam. Hi, Corey.

Yeah, I was just wondering what you think about the feasibility of tools like Foremost implementing a GPU-based approach. Do you think that might happen, or do you think...

[inaudible] but the code isn't good. I hailed them on GitHub and they responded with their attempt. And I mean, yeah, they could absolutely adopt it; it's just a matter
of proof of concept, and hopefully being able to circulate this knowledge throughout the sphere, and it would be applied, hopefully, in the future.

So when you say it's not good, did you mean it wasn't parallelizable, or was it not vectorizable?

So they did manage to parallelize their results, but it was as if they're really good computer scientists but they just obviously aren't good at parallel performance. They just didn't understand how to do it correctly, and ultimately it just didn't work out well for them.

Other questions, comments, ideas? Okay, thank you.

[Music]
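For reference, the header/footer pairing that the talk describes as the step before the actual carving might look roughly like this. It is a sketch under stated assumptions, not the method used by Alethea, Foremost, or Scalpel: `Extent` and `pair_headers_footers` are hypothetical names, both offset lists are assumed to be sorted ascending and to belong to a single file type, and files are assumed to be contiguous on disk.

```cpp
#include <cstddef>
#include <vector>

// A candidate file region as a half-open byte range [begin, end).
struct Extent {
    std::size_t begin;
    std::size_t end;
};

// Pair each header offset with the first footer that starts after it.
// Candidates larger than max_size are discarded as unlikely to be one file.
std::vector<Extent> pair_headers_footers(
        const std::vector<std::size_t>& headers,
        const std::vector<std::size_t>& footers,
        std::size_t footer_len, std::size_t max_size) {
    std::vector<Extent> out;
    std::size_t f = 0;
    for (std::size_t h : headers) {
        // Skip footers located at or before this header.
        while (f < footers.size() && footers[f] <= h) ++f;
        if (f == footers.size()) break;  // no footer left to pair with
        std::size_t end = footers[f] + footer_len;
        if (end - h <= max_size) out.push_back({h, end});
    }
    return out;
}
```

With a PDF-style footer such as `%%EOF` (5 bytes), a header at offset 10 and a footer at offset 50 would give the candidate range 10 to 55. Fragmented files break the contiguity assumption, which is one reason carving accuracy is treated as a separate problem from search speed.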