
Regipy: Automating registry forensics with python

BSides TLV · 2020 · 16:39 · 276 views · Published 2020-07
Category: Technical
Style: Talk
About this talk
Martin G. Korman - Regipy: Automating registry forensics with python BsidesTLV - Tel Aviv - July 2nd, 2020

And now I'd like to introduce our next speaker. This is a talk a lot of our team is actually super excited about. It's about using a very cool new tool that our speaker has actually created to make forensics work, specifically for registries, better and easier. So let me tell you a little bit about our next speaker. His name is Martin Korman. He is a research and development director at Sygnia, where he builds new tools for incident response analysts. He's got years of experience with incident response and malware reverse engineering. Martin also served in the software division of the Israeli Air Force. For those of you who don't know, the Israeli Air Force is the most friendly military force foreign to the IDF operating in Israel.

That's an internal joke, but if you're from Israel, you got it. Martin also has this really cool pretzel necklace, which specifically I might have to ask him where he got, because I love pretzels. But he's also a really nice guy and an industrial design student. That's probably where he came up with that pretzel idea. Martin, are you ready? The floor is yours. Give it up for Martin. Thank you.

Hi, good afternoon. I'm Martin, and today I'm going to talk about the Windows registry and how we can use Regipy to analyze it. So Regipy is an OS-independent Python library for parsing offline registry hives. I use it to parse registry hives at scale, collected from thousands of machines or from a single machine. We get output in JSON format that is ready to be ingested into a document database for analysis, or for manual review. I've been asked many times whether there aren't libraries like this already. The short answer is yes, but most of them don't run natively on any operating system, don't have good JSON output, are not written in a modern programming language,

or they don't have a good plugin framework. And a lot of them (I didn't want to show the Git commit history) are not actively maintained by their developers. So what is the Windows registry? The registry is a system-defined database in which applications and system components store and retrieve configuration data. Before I start talking about Regipy, let's do a short overview of the registry internals. So from the forensic point of view, the registry is a treasure trove of forensic artifacts. For example, the AMCache artifact on my machine contains more than 1,000 hashes. These hashes can be searched for in various OSINT search engines and sandboxes like VirusTotal. Malware authors generally use persistence mechanisms that rely on

the registry, and we can use Regipy to find those persistence mechanisms. In the registry you can find a list of connected hardware, for example, mobile devices and external storage. In some cases, it's even possible to retrieve the device serial number, which is really useful in forensic cases. You can also extract the local user accounts and, in some cases, LM hashes. I could go on; the list is really long. There is a lot of research around registry forensic artifacts. Some general registry terminology before we dive in. The registry consists of keys, subkeys, and values. Keys and subkeys have a last-modified timestamp. It is not possible to determine when a specific value was modified, but you have the timestamp of the subkey itself.
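The terminology above can be sketched as a small data model. This is an illustration only, not Regipy's actual classes: the names `RegistryKey`, `RegistryValue` and `walk`, as well as the sample data, are hypothetical. The point it tries to show is that timestamps live on keys, so the best available time for a value is its parent key's last-modified timestamp.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Iterator, List, Tuple


@dataclass
class RegistryValue:
    # Values carry a name, a type and data, but no timestamp of their own.
    name: str
    value_type: str
    data: Any


@dataclass
class RegistryKey:
    # Only keys and subkeys record when they were last modified.
    name: str
    last_modified: datetime
    values: List[RegistryValue] = field(default_factory=list)
    subkeys: List["RegistryKey"] = field(default_factory=list)


def walk(key: RegistryKey, parent_path: str = "") -> Iterator[Tuple[str, RegistryKey]]:
    """Yield (full_path, key) for a key and, recursively, all of its subkeys."""
    path = f"{parent_path}\\{key.name}"
    yield path, key
    for subkey in key.subkeys:
        yield from walk(subkey, path)


# Illustrative data only: a made-up autorun entry under a Run key.
run_key = RegistryKey(
    name="Run",
    last_modified=datetime(2020, 7, 2, 12, 0, tzinfo=timezone.utc),
    values=[RegistryValue("Updater", "REG_SZ", r"C:\Users\Public\updater.exe")],
)
root = RegistryKey("ROOT", datetime(2020, 7, 1, tzinfo=timezone.utc), subkeys=[run_key])

for path, key in walk(root):
    for value in key.values:
        # The closest thing to a "value timestamp" is the parent key's.
        print(path, value.name, key.last_modified.isoformat())
```

If several values sit under the same key, they all share that one timestamp, so it only bounds when the most recent change to the key happened.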

Registry values are mostly stored either in a binary format or as strings. Most of the time strings are encoded as UTF-16, sometimes as UTF-8 as well. I'm going to do a short overview of the most valuable registry hives from a forensic point of view. So let's start with HKEY_CLASSES_ROOT, which contains the file name extension associations, meaning which application opens a file when you double-click it, and COM class registration information such as class IDs. Then HKEY_CURRENT_USER, which is a pointer to the NTUSER hive of the currently logged-on user. That is an important thing to know, because when you are analyzing the registry of a machine, you have one NTUSER hive per user that logged into that machine,

even the system accounts. Also you have the SAM (Security Accounts Manager) hive, which, like I said before, contains the local users and groups configuration and passwords: the last logon attempts, the last password changes, the password hint, and sometimes the LM hashes. There is also the SOFTWARE hive, which contains configuration for installed software and Windows components, and the SYSTEM hive, which contains the list of configured devices and connected hardware, including external storage. There is a really useful subkey named hivelist in the SYSTEM hive, which contains a list of the currently mounted registry hives and their respective paths. One important thing about this key is that there is such a thing as a volatile subkey in the registry, which means this key

is not present on disk. You can only see it in memory, although it appears to be in the SYSTEM hive. I want to go over two forensic artifacts from the registry that tend to be the first ones I look at when investigating a forensic case. Let's start with the AMCache. The AMCache was introduced in Windows 7 and is part of the Windows 7 application compatibility infrastructure, along with the Shimcache. Because of that, it is involved in almost every execution of binaries on the machine and keeps a lot of metadata that we can make use of as forensic investigators. Besides the file path, you have the modification timestamp and the hash of the binary, which is really useful as forensic evidence, and we'll

get to that later during the use cases. Another artifact that is really close to my heart, and that I use a lot, is UserAssist. This is a mechanism that Windows uses to display recently opened applications to the user. It contains the execution count and the last execution timestamp of each application.
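One practical detail when looking at UserAssist data by hand, which the talk does not go into: the value names under the UserAssist key are stored ROT13-encoded, so paths look scrambled until they are decoded. A minimal sketch using only the Python standard library follows; the function name and the sample value name are made up for illustration, and a parser such as Regipy's plugin would normally take care of this step.

```python
import codecs


def decode_userassist_name(value_name: str) -> str:
    """Decode a ROT13-encoded UserAssist value name.

    ROT13 only rotates the letters A-Z and a-z; digits, braces,
    backslashes and dots are left untouched.
    """
    return codecs.decode(value_name, "rot_13")


# Made-up sample: what a raw value name could look like in the hive.
raw_name = r"P:\Jvaqbjf\abgrcnq.rkr"
print(decode_userassist_name(raw_name))
```

Because ROT13 is its own inverse, the same function can be used to encode a path when searching raw hive data for a specific program.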

Of course, this is an artifact that appears in the NTUSER hive, which means you have one per user. Let's talk a bit about the binary structure of registry hives, which I've learned a lot about when writing Regipy. I got to know that the best way to learn a binary format is to write a parser for it. And what a complex structure it is. So the registry files are binary. The header is always 4,096 bytes and starts with the regf magic. The header has a lot of valuable information, such as the last write timestamp, the registry hive name, which you use to determine the type, the version, the checksum, the size of the registry hive itself, and, really important, the primary

and secondary sequence numbers, which are used to determine if the hive is dirty. If the hive is dirty, it means that transaction logs were not applied. We'll get to transaction logs in a minute. Also, the header contains a pointer to the root key, from which we can start parsing the registry data. So the header points to the root named key (NK) record. Each NK record describes one key or subkey and holds some metadata, such as the last write timestamp, a pointer to the child subkeys, and a pointer to a list of VK entries, which contain the values. It also points to a security key (SK) record that contains the permissions for that specific subkey. Value key (VK) entries contain the value name, the value type, and of

course the value itself, and some flags. This is really high level, and there are a lot of edge cases and additional data types. For example, to store large data you have indirect blocks. But at a high level, this is what the registry looks like, and because of that architecture, we use recursion to iterate over the entire registry. So, transaction logs. Transaction logs are used when writing to the registry. To avoid registry corruption, Windows doesn't write directly to the registry hives. It uses the transaction logs as a kind of journal file. The changes from the transaction logs are flushed to the registry itself at regular intervals or when the RegFlushKey API is called. Modern versions of Windows use a slightly

different transaction log, but both types are supported by Regipy. So why are transaction logs important? Regipy can apply transaction logs that were collected with their respective hives. I took the NTUSER hive and its transaction logs from my machine. I applied the transaction logs using Regipy. I can see that 141 pages were modified, which is a lot of data. Now we can use the diff feature in Regipy to compare the hive before and after applying the transaction logs. In some cases, there would be a lot of differences. In our example, you can see almost 2,000 new subkeys that were modified. The second example belongs to a small test I did. I wrote some keys directly using regedit, and I immediately copied

the transaction logs and the NTUSER hive itself. And after applying the transaction logs, you can see a new subkey and a new value, which I would have missed if I had just collected the hive. Regipy has a lot of command-line utilities. You have utilities to recursively parse a registry hive and write the JSON output to a file, which you can later ingest into some kind of document database like Elasticsearch. You can run plugins on a hive and extract specific forensic artifacts like the AMCache and UserAssist I showed before. Regipy can also be used to generate a timeline from a registry hive. And also, like I showed before, you can use it to compare two registry hives. Most of

the time, it would be the same hive before and after some execution. It's mostly used for research. This is a simple plugin that goes over specific paths that malware authors tend to write persistence to. Regipy has a lot of utilities that plugin writers can use that make it really simple to write a plugin like this. As you can see, this plugin is something like five lines of code, maybe. It's nothing, and it's so easy to write a new plugin. Regipy can also be used as a library, and that is where most of its power is. I want to go over two use cases. I really love using Jupyter notebooks for everything possible, so let's open a notebook and

parse the amcache hives we collected. I'm not doing a live demo because parsing hundreds of hives could take some time, and our time is short. I'm iterating over a directory of collected amcache registry hives and parsing them. This could also be done with multiprocessing, but I'm showing a simple example here for brevity.
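The notebook cell itself is not reproduced in the transcript, so the following is only a sketch of the loop described above, under stated assumptions. The parsing step is passed in as a callable (`parse_hive`, a hypothetical name) that, with Regipy, would wrap the library's hive loading and AMCache plugin (not shown here). The file naming pattern `*.hve` and the `source_hive` field are also assumptions.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Hypothetical signature: takes a hive path, returns one dict per AMCache entry.
HiveParser = Callable[[Path], Iterable[Dict]]


def collect_amcache_entries(hive_dir: Path, parse_hive: HiveParser) -> List[Dict]:
    """Parse every collected hive in a directory into one flat list of entries."""
    all_entries: List[Dict] = []
    # Assumption: collected hives were saved as e.g. "<machine>_Amcache.hve".
    for hive_path in sorted(hive_dir.glob("*.hve")):
        try:
            entries = list(parse_hive(hive_path))
        except Exception as exc:
            # One corrupt or truncated hive should not abort the whole run.
            print(f"skipping {hive_path.name}: {exc}")
            continue
        for entry in entries:
            # Remember which collected file each entry came from.
            all_entries.append({"source_hive": hive_path.name, **entry})
    return all_entries
```

Keeping the parser as a parameter makes the loop easy to try out with a stub, and the same function body could later be handed to a process pool if parsing hundreds of hives sequentially turns out to be too slow.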

After parsing all the registry hives, we have a list of AMCache entries. We can load them into a pandas data frame; pandas is a Python data analysis library that is very commonly used. This is an example output of 10 entries out of the thousands of entries parsed by Regipy.
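With the entries in a flat list like this, a simple way to look for rare binaries is to count on how many distinct machines each SHA-1 appears. The sketch below does that in plain Python as a stand-in for the pandas workflow used in the talk. The field names `sha1` and `machine`, the function name, and all sample values are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def rare_hashes(entries: Iterable[Dict], max_machines: int = 1) -> Dict[str, List[str]]:
    """Return hashes seen on at most `max_machines` distinct machines."""
    machines_by_sha1 = defaultdict(set)
    for entry in entries:
        machines_by_sha1[entry["sha1"]].add(entry["machine"])
    return {
        sha1: sorted(machines)
        for sha1, machines in machines_by_sha1.items()
        if len(machines) <= max_machines
    }


# Made-up sample data: one binary is common, one appears on a single host.
sample_entries = [
    {"machine": "host-a", "sha1": "aaaa", "path": r"C:\Windows\System32\cmd.exe"},
    {"machine": "host-b", "sha1": "aaaa", "path": r"C:\Windows\System32\cmd.exe"},
    {"machine": "host-c", "sha1": "aaaa", "path": r"C:\Windows\System32\cmd.exe"},
    {"machine": "host-c", "sha1": "bbbb", "path": r"C:\Users\Public\tool.exe"},
]

print(rare_hashes(sample_entries))
```

With a pandas data frame `df` that has the same two columns, roughly the same count could be obtained with `df.groupby("sha1")["machine"].nunique()`. A rare hash is only a lead; legitimate but uncommon software will show up in the same list.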

Now I can perform various analyses on this data. A classic method would be to group all the occurrences of each SHA-1 and find the outliers. After I find a suspicious hash, I can check which machine it is on. For example, in this case, we can see one hash, the one that starts with 5FF, that is present on one machine, which is named win14-something, and was executed in 2015. I can now take this hash, search for it in VirusTotal, for example, and see that it is some kind of malware. Another use case, which I find myself using a lot, is using Regipy plugins to answer questions in forensics-focused CTFs. For example, the Defcon DFIR CTF from two years ago.

Let's answer some example questions, and I hope I am not spoiling this CTF for anyone. It was published two years ago. Let's start with the first question: what is the time zone of the system? Regipy has a plugin exactly for that. I execute the time zone data plugin on the SYSTEM hive and get the answer. The answer is eight hours, because the bias is stored in minutes in the registry: 480, which is eight hours. This data is stored in the SYSTEM hive. The next question is, what is the SID of the administrator account? We can use the profile list plugin to see that. That information is present in the SOFTWARE hive. Another question there was when the

computer was last shut down. So I don't have a plugin for that yet, but I know that the shutdown time, if the machine was shut down nonviolently, is stored under the control set's Control\Windows key, in a value named ShutdownTime. So because...

So what I have to do is just load the hive, get to the key and to the value, and then use another utility that is present in Regipy, named convert_wintime, which converts the value from the Windows timestamp to a regular timestamp we can read.
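The two conversions behind these answers are small enough to sketch with the standard library. This is not Regipy's implementation; the function names are hypothetical stand-ins. The sketch assumes the usual Windows conventions: a timestamp is a FILETIME (a count of 100-nanosecond intervals since 1601-01-01 UTC), ShutdownTime holds it as 8 little-endian bytes, and the time zone bias is the number of minutes to add to local time to get UTC.

```python
import struct
from datetime import datetime, timedelta, timezone

# FILETIME values count 100-nanosecond intervals from this instant.
WINDOWS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a Windows FILETIME integer to a timezone-aware UTC datetime."""
    # 10 intervals of 100 ns make one microsecond.
    return WINDOWS_EPOCH + timedelta(microseconds=filetime // 10)


def shutdown_time_from_bytes(raw: bytes) -> datetime:
    """Decode an 8-byte little-endian FILETIME, as assumed for ShutdownTime."""
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    (filetime,) = struct.unpack("<Q", raw)
    return filetime_to_datetime(filetime)


def bias_to_utc_offset(bias_minutes: int) -> timedelta:
    """Convert a registry time zone bias (minutes) to an offset from UTC.

    Windows defines UTC = local time + bias, so a bias of 480 minutes
    corresponds to a local time eight hours behind UTC.
    """
    return timedelta(minutes=-bias_minutes)


print(bias_to_utc_offset(480))
print(filetime_to_datetime(116444736000000000).isoformat())
```

The sign flip in `bias_to_utc_offset` is the part that is easy to get wrong when answering a time zone question from the raw registry value.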

So what is next for Regipy? I'm planning to add a lot more plugins. The backlog is really long, and I have a lot of ideas. I plan to add support for CSV output, which is easier for manual review than JSON. And I plan a lot of performance enhancements, because parsing an entire hive like SYSTEM could take tens of minutes in Python, and that's really a big issue when you're parsing at scale. So I'm currently experimenting with moving the parsing part to Rust and calling it from Python. It's not yet committed to Git, but it's looking really good and it would make a great difference in Regipy. So to summarize, hopefully the security community will find this library and tool useful, and I hope to

get a lot of contributions from the community. I've learned a lot about the registry structure from working on this project. The registry is a really complex binary structure with a lot of edge cases, and I'm pretty sure I haven't covered all of them. The transaction logs are really important and can make a great difference between discovering findings in a case or not. Not just in the samples I showed before; I actually had some cases that I cannot show here because it's work stuff, where the transaction logs made a really great difference. And one small fun fact: you can store emojis in the registry. Actually, Windows does that for some reason.

I'm really hoping to get a lot of contributions. This is the link to the Regipy repository on GitHub. Feel free to open issues and pull requests. I promise to be responsive, and I'm always happy to help. You can also reach me via Twitter with any questions. Thank you.