
This talk is Automatic Security Analysis of IoT Firmware, and the speaker is Matt Brown. A few announcements before we begin. We want to thank our sponsors, especially our Inner Circle sponsors, Critical Stack and Valimail, and our Stellar sponsors, Amazon, BlackBerry, and Microsoft. It's their support, along with our other sponsors, donors, and volunteers, that makes this event possible. For cell phones: these talks are being streamed live, except in Underground, which this is not, and as a courtesy to our speakers and audience, we ask that you check to make sure your cell phones are set to silent. And here's Matt with his presentation.

All right. Can everybody hear me? All right. So, yeah, my name is Matt
Brown. I'd just like to thank you all for coming out to my talk. Today I'm going to be talking about ByteSweep, which is a free software IoT security analysis platform. So, a little bit about myself: I'm a Christian, I'm a husband, I'm soon to be a father, and then I'm a hacker. When I'm not busy doing those first three things, I love to tear apart IoT devices and see how they work, and just tinker with stuff. So today, I'm going to lay out what I think are some of the big problems that exist
in the IoT security space today. Then I'm going to discuss some of the automated analysis techniques that I've built into my platform, ByteSweep. Then we're going to do a quick overview of ByteSweep's system architecture and some of the deployment methods. Then we're going to give a demo, and then I'm going to talk about some of the things I want to do with this tool moving into the future. So, about a month ago, this news story landed in my newsfeed, and I thought it was pretty interesting and pretty relevant to what we're going to be talking about today. Way back in 2017, the FTC sued
D-Link over a number of vulnerabilities in their routers, and there were a couple of quotes from this article that stood out to me. The first one is that one of the things the FTC cited, among others, is that D-Link repeatedly failed to take reasonable testing and remediation measures to protect hardware from well-known and easily preventable software security flaws. So, these weren't crazy advanced, nation-state level attacks. These are really simple, easy to find, easy to remediate things. And then, as a part of the settlement that D-Link made, they agreed, among other things, to perform testing for vulnerabilities before releasing products. That's a novel concept, I know.
So, we're going to jump into some of the reasons why I think we struggle with security in the IoT market. The first one is that security costs money, right? In much of the IoT space, these device manufacturers are operating on razor-thin margins, to the point where they don't have the budget to pay a full-time security engineer to be on the product team, or they don't have the funds to pay for that big external pen test from some big company. The other thing is that there are a lot of complex supply chain dynamics that exist in the IoT
space. Oftentimes, when you take a look at one of these devices, you'll see that the actual hardware and a lot of the software was built by a hardware manufacturer, then passed downstream to some system integrator that wrote some extra APIs and slapped them on top of that device, and then, oftentimes, they'll pass it on to another company that just white-labels it. They just slap their logo on that device, and then it finally ends up in a consumer's hands. And so, the thing that happens is that at each stage of that supply chain, source code rarely, if ever, gets passed down, because that
is the proprietary technology of the company that's upstream from them. If you haven't watched Dan Kaminsky's keynote for the first DEF CON China, not the one this recent year, but 2 years ago, he told this really cool story about when he wanted to go do an audit of a solid-state drive manufacturing plant. He went in and met with the people there, and he asked them, "Hey, can you just give us all the source code for everything that runs on this solid-state drive, and we'll do a security audit." And the guys kind of laughed at him and said, "Hey, you think
we have all that source code, right?" And so, one of the quotes from the talk is that each one of those companies passed a binary blob down that supply chain onto the next, so that they wouldn't be factored out of the manufacturing equation, right? These people hold on really tightly to their IP, and so for any kind of security analysis that we perform, we can't rely on source code tools in this space. And then, the last problem, which is certainly not a problem that just exists in the IoT space, but is very pronounced there, is that security is an afterthought. So, if
you do perform a pen test at these companies, it's conducted right before going to market, and that means that if they find some really simple vulnerabilities, let's say they find a piece of software and they say, "Hey, it has version 1.0, and you need to upgrade to version 1.1 in order to be secure," okay, maybe they'll do that right before they go to market. But if the pen testers come back and say there are some big design flaws that we found, oftentimes those risks are going to be accepted, and it's just going to be sent to market as is. So, what does a solution to this problem look like? And
one of the things I've really enjoyed about this whole track today, and being a part of it, is that I think there's a sense of camaraderie where we all want to come together and solve these problems together. We understand that no one person is going to solve it by themselves, and so this is just my shot at solving this problem. But I really want to hear from you guys what you think your solutions to these types of problems are. So, for me, what a solution would look like is free and open source software, right? Because we established that a lot of these companies are operating on these
razor-thin margins. The solution is going to provide automatic analysis, because we can't pay a security engineer full-time to run a bunch of manual testing. And then it's going to utilize static methods that don't require source code, because we don't have the source code when we're performing these audits. So, that's where my tool, ByteSweep, comes in. ByteSweep is a web application with a front end written in Flask, and then there are a number of Python back-end workers that do a lot of the heavy lifting, the analysis. These are licensed under the AGPL and the GPL, respectively, and if you're like me, when you have somebody giving
one of these talks on their software project, I'm just like, "Hey, dude, show me the code." So, feel free to browse the repository. This is public as of this morning, and you can check out the whole project. So, now we're going to jump into some of the automatic analysis techniques that I've implemented in the ByteSweep platform. The first thing that a user of this tool would do is come to the upload-an-artifact page and upload whatever they want to be analyzed. This is most likely going to be an entire IoT firmware image, but it doesn't have to be, right? If you're
just interested in a subset of files, you can wrap those up in a zip file and upload it, or if there's one binary on this device that you're really interested in, you can upload that one binary, and it'll analyze that alone. So, after we upload an artifact to the platform, we then need to extract all the files and file systems that we can out of that image. To do this, we use the tool binwalk. If you've worked in IoT firmware analysis, binwalk is kind of the go-to tool for unpacking firmware. And it has a Python library. Binwalk's written in
Python, and it has a really good set of Python APIs, and so that works really great for me. This tool is so awesome, but that means it's even more unfortunate that the developer of it has decided to stop development on the open source tool, and instead is kind of pushing this proprietary cloud-based service. This is kind of too bad, but what I want to do to deal with this is just fork this project and continue fixing bugs and developing new features. And so, underneath the ByteSweep project up on GitLab, I've got Bytewalk, so that's going to be the name of the
binwalk fork that I'm going to be supporting going forward. But even though it's sad to see what's going on with binwalk, I still want to thank devttys0 for all of the contributions that he's made to the open source community. This whole project wouldn't be possible if it weren't for all the work that he put in. So, just got to say thanks to him. So, after you've uploaded a firmware image to the platform and it has tried to extract all the files and file systems that it can, we then need to enrich that data a little bit. There are two big data enrichment things that the platform
will do. First, we want to determine what type of file a given file is, and for that we use libmagic. If you're not familiar with libmagic, anytime you run the file command on any Linux or Unix based system, this library is the brains behind that. It's basically looking at the first few bytes of that file, doing a lookup in some big database, and then telling you what type of file that is. And then the second piece of data enrichment we do is a byte-level Shannon entropy and an index of coincidence calculation.
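A rough sketch of the two calculations just mentioned, for readers who want to see the math. This is not ByteSweep's code: the function names, the division by 8 (so the entropy lands on a 0 to 1 scale), and the 0.95 warning threshold are all assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy, scaled to 0..1 (bits per byte divided by 8)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    bits = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return bits / 8.0  # assumption: normalize so 1.0 means "every byte value equally likely"


def index_of_coincidence(data: bytes) -> float:
    """Probability that two bytes picked at random (without replacement) are equal."""
    n = len(data)
    if n < 2:
        return 0.0
    counts = Counter(data)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def looks_encrypted_or_compressed(data: bytes, threshold: float = 0.95) -> bool:
    # The threshold is an illustrative guess, not a value taken from ByteSweep.
    return shannon_entropy(data) >= threshold
```

Plain text and machine code tend to score well below 1.0 because some byte values dominate, while encrypted or compressed data spreads out almost evenly across all 256 values.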
So, these are more of the heavy math-based things that are going on. The cool thing that you can do with byte-level entropy is that you can actually determine if a file is encrypted or compressed. So, this is an example from the graphical interface. We can see that a Shannon entropy that's really close to one is going to warn the user that this file is likely either encrypted or compressed. It can't necessarily distinguish between the two, but it's one of those. Then, after we've enriched that data, there are a number of data analysis techniques that are implemented in the ByteSweep platform.
So, the first one is that we're going to take any files from that data enrichment step that were determined to be either a binary executable or a shared object library file, and we're going to search for unsafe C functions. As an example, you've got your strcpys and your strcats that are going to be potential risks for buffer overflows, and then you have your system() calls and your popen() calls that are going to be potential risks for command execution, right? The existence of these functions doesn't necessarily mean that a vulnerability exists, but it's telling you that there's a risk. And then if you're a pen tester using
this tool, you're going to use that as a jumping-off point: this is where I want to look inside of this binary to see if there is actually an exploitable vuln. To do all this, the platform uses radare2. If you haven't used radare2, it's a really awesome open source reverse engineering tool, and it also has an awesome Python library called r2pipe that lets me programmatically search through a binary for function references like this. So, for the next few analysis steps that the platform will do on all the files, I've got to talk first about this regex search system that's implemented in ByteSweep. Basically, the whole point of this
search system is to look for any kind of static string content that we deem interesting. These regex rules, and you can see an example of one at the bottom here, are all built into this YAML config file, and it lets users write new definitions for things that they want to look for inside the binary file or inside those strings, and then the platform will find those things and tag each one as a certain type of finding. So, in this case, we have a rule that is a regular expression for a standard Unix shadow password hash. And so, it's marking it as a type of
password. That's the finding type that it's going to mark it as if it does find anything that matches this expression. And if you're interested in contributing to the platform, there's documentation up on GitLab that tells you how to create some of these more advanced regex rules. So, with that regex search system, right now the platform is looking for three big things. The first one is password hashes, so that was an example of that. Then there are things like hard-coded keys: your RSA public and private keys, OpenSSH keys, things like that. And then the other thing that it looks for is third-party
component strings. A third-party component string is a string inside of an open source library that is going to indicate to us what its version is. And so, with these regex rules, we can actually determine whether a given file that's been extracted from the original artifact is a certain program and version combination. So, if you were here for the previous talk where Allan Friedman talked about software bill of materials, this is what I'm hoping is a start to moving in that direction, to be able to automatically identify and build out this software bill of materials. And he
actually talked about a couple of standard data types that I hadn't even heard of before the talk. So, I'm definitely planning on implementing those as an export feature inside my tool. And then, to understand the last analysis step that's taken by the platform, I've got to explain two different services that run inside the platform. The first one is the CVE fetch service. This is a service that syncs the National Vulnerability Database's JSON feed of every CVE that's ever existed down to a local database on the platform. And then there's the watchdog service; this
service's job is to then cross-reference those third-party component program names and version strings with all the data in the CVE database. So this is an example of a part of a CVE out of that JSON feed. We can see here, under the product name, it's saying OpenSSL, so that's the name of the program or library. And then it's going to have a really well-structured way of saying which versions are affected by the CVE. So, with this version data here, we can then cross-reference everything that we found inside of the firmware image. And the cool thing about this service is that it doesn't just do it the first time you
upload something. So, if I upload a piece of firmware today that isn't vulnerable to anything, but then a new CVE comes out, it's going to automatically update the list of CVEs that affect it. So, now I'm going to give a quick high-level view of the architecture and some of the deployment methods, and then we're going to jump into a demo. So, the back end is all Postgres. This is for some of its useful JSONB data structures that are really nice for dealing with JSON. The web interface, like I said, is a Flask front end. And then you have those three
back-end daemons that are also written in Python. The worker's job, again, is to do most of the data analysis, and then the last couple bits of it are done by those other services. For deployment options, right now the build system for ByteSweep builds deb files that target Ubuntu Server 18.04, and it builds RPM installer files for CentOS 7, and I've also got Docker images for everything, so it can also be deployed fully via Docker. So, now I'm going to jump into a demo of the tool. This is the web interface that a user of the system would
interact with. They can either list previous artifacts that they've submitted, or they can add a new one. This is what the upload-an-artifact interface looks like. It's pretty simple. We're just going to look at one that has already been analyzed, for an unnamed camera's root file system. Here we can see that it extracted a number of files out of that file system. There are no encryption keys in this one. And then we've got password hashes, and the high-risk binaries, which are any binaries that we found one of those unsafe function references in. And then we have the third-party components and CVEs. So,
we're just going to show off the file explorer by clicking into this real quick. This is the original uploaded image, and it was called rootfs.bin. Here it's telling me that it's likely encrypted or compressed, right? Now, in this case, it's compressed, because we were able to extract stuff out of it. But we can actually browse through the entire file system here and see what was extracted. So, here it was able to extract the entire Linux file system for this device so that we can navigate through it. And then here we can see that it
did find a password hash in here. Turns out you can just Google this password hash, and I was able to crack it that way. Didn't even have to run John or anything like that. So, that was nice. But let's say I did want to run this through John. All I'd have to do is hit download password on this string zero, so this is this whole string here. That would download it into a text file for me that I could just run straight against my password cracking software. So then, to move on to high-risk binaries. Here we'll just take a look at one of
these binaries here. Here we can see that we found a reference to a sprintf call, and it's located at this offset inside of the binary, and it's inside of the main function. As a pen tester, again, this doesn't mean that there's a vulnerability, but maybe I would want to go and look closer at this function: make sure things are being sanitized correctly, that there's no user input being sent into that sprintf call, things like that. And then, if I wanted to get that binary directly from this interface, basically everything links back to the file explorer. So I can just click here and then I can download that binary
straight from here. So then we'll go ahead and look at the third-party components. We identified three of these third-party components. There are definitely more third-party components on this device, but they need to have regex rules built out for finding them. But here we'll show off that it found an instance of an mbed TLS library. It found this string here inside of this shared object file, and then the regex rule was able to identify that this part was the version right there. So it was able to parse out the version from the rest of that string. So from there
that's where the watchdog service comes in and is able to automatically identify CVEs that affect this given firmware image. So we'll just go ahead and look at one of these. Okay, so this CVE, again, it found because of mbed TLS 2.6.0, and this is some remote arbitrary code execution or denial of service buffer overflow thing. It was able to find that all based on that regex search system that is looking for those third-party components. And then, just to give you an example of what it would do in the case that a firmware image was truly encrypted. So, a lot of times nowadays
you'll find that manufacturers are encrypting their firmware, and then in the update process, when you upload the firmware to the device, it will use some key that's hardcoded into the device to decrypt it and then install it on the device. And it's kind of a sad day for us as pen testers, because then we have to get on the actual device. We can't just download the file from the internet. So, just to show you what it would look like: here I uploaded this GPG encrypted file. Obviously it's not going to extract anything from here, and you can just ignore the two files extracted, because really it's
counting the parent directory and then the file itself. But if I go ahead and look at that, again it's going to give me that warning, based on the Shannon entropy that it calculated, that this is probably encrypted or compressed. And in this case, if I uploaded a file and it wasn't able to extract anything, then I can probably go ahead and assume that it's more likely encrypted. Right? Oh yeah, and then I also wanted to show an example of a key, since that didn't show up in that exact root file system. I've just got a number of tests here that I use to test the suite with. So
here is what it would look like if you found a key file. Okay, so I found this RSA public key that was hardcoded into the image, and then I can either go to the file or I can just directly download this key and then store it somewhere. So that's my tool, and where I want to take this tool in the future: one of the next things I want to do is build a way that I can automatically, based on all this data that I have, calculate some sort of relative risk score. It'll all be relative, but
if you upload two pieces of firmware and it is able to extract them, but on one of them it finds 20 CVEs and on another it finds only two or something like that, well, then we could calculate some sort of relative risk based on that. And then I want to build in exportable reporting. Like I said, this is an idea that I came up with as a result of Allan's talk, the last talk given in here. I definitely want to be able to export those software bills of materials. I think that'd be a useful feature. And then, yeah, just to keep on developing
Bytewalk, that fork of binwalk, and fix bugs, because there definitely are some false positive bugs that the tool has sometimes, and those will cause the platform to crash if you upload one of those. And then just develop new extraction techniques and be able to support different types of compressed data. So, with that said, you can deploy this tool today at that link. If you have any questions about that, I'd love to answer all your questions and just get any ideas that you guys have about how you could use this or how you would solve some of those problems that we talked about.

Yeah. So have you considered making like a John plugin for simpler
Oh, we got a mic. Sorry.

Hi. So have you considered making a John plugin for the simpler encryption schemes, like, I don't know, DES crypt or anything like that? So it would be able to run quickly and you'd get a quick result.

Yeah, yeah, I think you could definitely do that, like with a limited word list. Because in this example, I think that password would have been in a pretty easy word list. No, that's a great idea. Your pick. We'll get to both of you.
Oops. There you go.

Have you thought about integrating with any other debugging systems like maybe IDA or Ghidra?

I've definitely thought about Ghidra, or Ghidra. I still don't know how to pronounce that. Yeah, exactly. I've thought about that. My hesitation at first was that I was scanning all of the issues that people were filing against the tool, and one of the things that I thought I saw is that they only supported Python 2 and not 3.

Oh. Yeah, because it's using Jython, but there's another project that's on pip called ghidra_bridge that does some horrible shim script that
deserializes Python objects over RPC. It's some horrendous project, but if I was going to go down that path, I'm not saying it's necessarily the best idea, but that, and definitely not Jython.

Yeah, no, I've definitely thought about that. It was just that radare2 is the tool I'm familiar with, and I kind of started some of this development work before it came out. But I would like to integrate it, because there are some cases: radare2 does pretty well on ARM, but on MIPS it has a hard time with a lot of function references. So it'll definitely miss things on MIPS, and there's
a lot of embedded MIPS devices out there.

So, about your regular expressions, I'm just curious about your philosophy there: whether you're looking for every string that might possibly hit, or whether it might be useful to use something like a YARA approach, where you can place the strings and look for combinations, or what?

Yeah, so I'm stuck with regex at this point. I would totally be open to exploring something that works better. What I didn't get into here, and what's in that regex documentation markdown file, is that what we saw up there was a standard, or a simple,
regex rule. I also built out the concept of searching for a nearby string. So how most of my regex searches work is: first it looks for something that identifies the program. If you're looking for an OpenSSL library, right, first you want to identify that this thing is OpenSSL, and then you want to find the version that's in that binary somewhere. So I'd love to talk after if you think YARA would be able to do that more efficiently. Yeah.
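To make the "identify first, then look nearby for the version" idea concrete, here is a minimal sketch. It is not ByteSweep's implementation: the rule dictionary, its field names, the 64-byte window, and the version pattern are all hypothetical and chosen only for illustration.

```python
import re
from typing import Optional

# Hypothetical rule layout; the real rules live in ByteSweep's YAML config
# and may be structured quite differently.
OPENSSL_RULE = {
    "name": "openssl",
    "identify": re.compile(rb"OpenSSL"),                # step 1: is this the program?
    "version": re.compile(rb"(\d+\.\d+\.\d+[a-z]?)"),   # step 2: version-shaped string
    "window": 64,                                       # bytes after the identifier to search
}


def find_component(data: bytes, rule: dict) -> Optional[tuple]:
    """Return (name, version) if the identifier has a version string close behind it."""
    for hit in rule["identify"].finditer(data):
        nearby = data[hit.end(): hit.end() + rule["window"]]
        version = rule["version"].search(nearby)
        if version:
            return rule["name"], version.group(1).decode("ascii")
    return None


blob = b"\x00junk\x00OpenSSL 1.0.2k  26 Jan 2017\x00more"
print(find_component(blob, OPENSSL_RULE))
```

Limiting the version search to a window near the identifier is what keeps an unrelated version-like string elsewhere in the binary from being reported as the component's version.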
So, kind of following up on that question, you mentioned focusing on regex rules to detect version numbers and things like that. Are there any plans to implement something like file hashes to detect dependencies and versions?

Yeah, so that was my first thought, and this is something that came up in the talk before this about the software bill of materials: how do you identify a software component? The problem you get into there is that a lot of these embedded devices might have the same version of OpenSSL, but they've compiled it themselves. And so any little difference in the compiler
environment could cause that to yield a different hash. And so at that point you'd be in this rat race of trying to compile a database of every hash of every library that's ever been produced. But there could be a hybrid approach, right? Where you could use some well-known hashes that might be used across a lot of devices and have that complement the regex system. That'd be something to think about. Yeah. Any other questions? All right. Thanks. Think I got done super early.

Thanks. Thanks, Matt.
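As a footnote to the last answer, the hybrid idea (try a table of well-known hashes first, then fall back to the regex search) might look roughly like the sketch below. Everything here is an assumption for illustration: the function name, the shape of the hash table, and the mbed TLS pattern are not taken from ByteSweep.

```python
import hashlib
import re
from typing import Optional

# Illustrative fallback pattern for one component; a real system would carry many rules.
MBEDTLS_RE = re.compile(rb"mbed TLS (\d+\.\d+\.\d+)")


def identify_component(data: bytes, known_hashes: dict) -> Optional[tuple]:
    """Return (name, version, method), trying an exact hash match before regex."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in known_hashes:
        name, version = known_hashes[digest]
        return name, version, "hash"      # exact match against a well-known build
    match = MBEDTLS_RE.search(data)
    if match:
        return "mbedtls", match.group(1).decode("ascii"), "regex"
    return None


# Hypothetical table: sha256 of a widely shipped build -> (name, version).
known = {hashlib.sha256(b"stock build").hexdigest(): ("openssl", "1.0.2k")}
print(identify_component(b"stock build", known))
print(identify_component(b"\x7fELF...mbed TLS 2.6.0\x00", known))
```

The hash lookup only helps for binaries shipped unchanged across many devices; anything a vendor compiled themselves would still fall through to the regex path, which is the limitation described in the answer.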