
Manohar Hussain Shehlia. Manohar, how are you?

I'm fine. How are you?

I'm very well, thank you. So, Manohar, you're going to be talking about reverse engineering firmware, a pragmatic approach. I like the sound of that. If I can understand some of it, you'll know that you've definitely taught somebody. But Manohar, in the interest of time, I'm going to ask you to bring your presentation up and crack on with your work.

MANAVAR KHAN: Sure. I have shared my screen. Let me know if you are able to see it.

There you go. Over to you.

MANAVAR KHAN: OK. Thank you. Hi everyone. The topic of
today's presentation is reverse engineering firmware. I'm sure you have heard about this topic many times, but today we will discuss some of the challenges that I face regularly in my work, and I have put together some things that can help you. Let's get started.

I'm an IoT security researcher. My full-time responsibility is security testing of devices that are out there, especially our clients' devices, helping them fix the issues and find appropriate solutions. I have spoken and delivered trainings at various conferences. My primary interests are reverse engineering, binary analysis and so on. If you want to contact me, you can reach me on Twitter.

To give you a brief overview of today's discussion: first
we will go into what firmware is. The primary questions about firmware are which operating system it uses and what components that operating system involves. Then we will look at the first category of firmware, the full-fledged operating system, and we will see what that means as we move forward. The next category is bare-metal firmware. We will look at some of the tools that you can use to reverse engineer this kind of firmware and the challenges you will face. Last but not least is how you fuzz this firmware and how you emulate it without having the actual device, because buying a device for each firmware you encounter would be difficult. We'll look into how
to do that: how to do emulation and how to find security bugs.

If we talk about embedded systems in general, there are different CPU architectures, such as ARM and MIPS. Some CPUs have sophisticated memory management, like x86 with virtual memory and paging, and some do not. There are different RAM sizes and storage capabilities, depending on the complexity of the device and the features it offers. A simple device like a weather monitoring system just reads some data and forwards it, but a more complicated system like a webcam has many features: a wireless network, an attached camera and facilities to store video. All of this makes the device very complicated. The next piece of
any device is the operating system, on top of which everything is programmed and which handles all of the device's data. The other thing is the toolchain, which compiles the whole firmware and the device drivers, and the SDKs that vendors provide so that developers can add more features to the device.

If you look broadly at embedded operating systems, the first category is the full-fledged OS. This is not an official term; I invented the name, and it is a very rough category. In this category, a whole operating system is available to the developer of the device. You have complex system
calls available, and whatever applications you write run on top of the kernel, which in turn interfaces with the hardware. The application does not interact with the hardware directly. The kernel sits between the application and the hardware to provide the necessary coordination and all the operating system functionality. These systems usually have a microprocessor, and they are somewhat complicated and feature rich.

On the other end of the spectrum is the bare-metal system. Here the application code that you write talks directly to the hardware. This type of system is usually not very complicated; it typically involves reading some sensor data and sending that
data over some communication interface. These systems are usually microcontrollers, which do not have much processing power.

Let's look a little deeper into what a full-fledged operating system is. It has a bootloader, which is the first thing that starts when you turn on the device. Its responsibility is to bring more and more components of the operating system into memory and start the operating system. One example is U-Boot, a popular bootloader in the embedded space. Another is RedBoot, and there are many other bootloaders out there. Their first responsibility is to load the kernel from a storage device, which can be flash memory or USB
storage. The bootloader brings the kernel into RAM and hands over execution to the kernel.

In this space there are many operating systems on which you can base a device. A very popular choice is Linux. Other options are Windows, and VxWorks is also very popular for real-time scenarios. There's another operating system, Cisco. Since these devices store data in a very constrained environment, they usually keep the operating system compressed in storage. At execution time they decompress it, put it in memory and start running it.

Another important component of any operating system is the file system. Some of the popular choices in this space are SquashFS, YAFFS2, which is
in Android, and there are many other options. A CPIO archive is another option.

Let's look at some of the analysis tools. Binwalk is a very popular tool for extracting the file system from firmware. When you analyze a device, the first thing you try to do is extract the firmware. But firmware has no standard format; usually it is many binary blobs stuck together. What binwalk is really good at is finding those small pieces of binary files inside the big blob that makes up the firmware. Binwalk will tear them apart and give them to you in a very usable form. Binwalk has a huge signature library which it uses to do this, and it will identify the
pieces of binary data, chop them up and give them to you.

On top of that, Firmware Mod Kit gives you the capability to modify the firmware and repack it, so you can patch it or make any changes and reflash the firmware on the device to change the device's behavior.

QEMU is a very popular tool among embedded developers, especially kernel developers, for emulating a whole system. We will look at the difference between emulation and virtualization when we discuss QEMU in more detail.

Qiling is another very popular tool which has recently come up in this space. There is a sub-project that came out of QEMU called Unicorn, and Qiling is built on Unicorn. QEMU is basically
a tool to run any firmware, but Qiling gives you a programmable way to drive the emulation, and its strength lies in that.

Another kind of tool is for reverse engineering. Of course you want to load the code of the binary and see what it is doing, and Ghidra is a very good tool for that.

Coming back to QEMU: it is going to be very useful because you will be able to run code for different architectures on your machine without actually having the hardware, so it is very important to know this technology. Another thing that comes up is emulation versus virtualization. In virtualization, you are running a guest of the same architecture as the operating system which is
on your machine. For example, the guest operating system is based on x86 and your machine is also based on x86. When you do emulation, you are actually running a foreign architecture, one that is different from your machine's. For example, running an ARM binary on your x86 system.

QEMU is a very fast emulator. It has many features that let it give reasonable performance, and it allows you to add different devices and hardware. The way QEMU runs code is that it translates the guest instructions, for example ARM, into an intermediate representation. Then the intermediate representation is translated into host instructions. So ARM code is translated into the intermediate representation, which is then translated for your system, which will
be based on x86, and the code is executed. It supports many different CPU architectures: ARM, MIPS, RISC. It's a great project to explore if you are in the embedded space.

Another very good tool, which I use very frequently to reverse engineer binaries, is Ghidra. It is free and open source, which is a very good thing. You get a very nice decompiler free of cost, whereas the offerings from other vendors are very expensive. It's written in Java, so it works on many platforms: Mac, Windows or Linux, whatever you are working on. It supports many different architectures: ARM, MIPS, PowerPC. There are other reverse engineering tools in the
market which won't give you these architectures in their free version, but Ghidra gives you many of the widely found foreign architectures for free. Ghidra is mainly for binary analysis and it has a lot of plugins. The architecture is also very good: most of the functionality that you use in Ghidra is itself a sort of plugin, so you can extend it very easily, and the code is well separated in the form of APIs which you can use. You can write plugins in Java and Python, which is a very good option.

So let's look at the overall picture of how you use all these tools together. With
binwalk, you start with the firmware binary and take apart the different binaries inside the main blob. Then you try to run something in QEMU and see if you can understand the binary's behavior: what it is doing and which hardware devices it is communicating with. Qiling and QEMU are in the same league. They help you achieve the same objective, which is dynamic analysis. Ghidra helps you do static analysis, basically understanding the behavior. Then you can change your input and rerun the binary in QEMU.

Enough of the theory. Let me give you an example so that you can get an idea of what I
am talking about. Let's take the example of a D-Link firmware, the DIR-825 AC1200. This is firmware which you can download from the D-Link website. If you go to the product page, it will direct you to a download link, and if you go one level up in that link, you can find all the different versions of this firmware. So I downloaded the firmware versions. If you download all of them, you'll find different versions; there were at least nine when I looked. And when I looked at the most recent version of the firmware,
it was encrypted. So let's try binwalk. These are the different versions, separated out. If you look at the latest version, it is encrypted. I guess this is the decrypted one. If you try to extract this one, you can see that the firmware is encrypted and binwalk is not able to identify anything in it. I did a bit of analysis, and what I found is that this version is different: the firmware versions after it were all encrypted, and an encryption routine was introduced in this version. So I
did a little bit of research and came across some code inside this version of the firmware which used the encryption routine. Let me share that code. It was PHP code with different pieces of functionality being triggered. You can see the firmware download happening here. Basically, it downloads the firmware and executes a command which decrypts the firmware: this is the download, and this is where the firmware is run. So the decryption binary had to be inside the file system. If we go inside that file system, we come across this
binary called encimg. It is a MIPS binary and it encrypts and decrypts the image. So let's see if we are able to decrypt the firmware with it. Let's try to run it using qemu-mips. QEMU has different options, as you can see in the help output. To summarize, with this option you can set the root path of the operating system, from which QEMU will load all the dependencies that the binary needs. So this is the root of the OS, and we run the binary. You can see that we are able to run this MIPS binary on our operating system, and it shows that its options are input image, output
image, decode and file. So we can be fairly sure that we can decrypt our image with this. Let's take a look at the image.
This is the image file. Binwalk is not able to extract it. So let's see if we can do it with this binary. The command line takes the input and the key (this key was hardcoded inside the image) and D for decoding. If we run binwalk again, we can see that the firmware was decoded.
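The demo step above, running a foreign-architecture binary against the extracted root file system, can be sketched as a small helper. This is a minimal sketch and not the exact command from the demo: it assumes QEMU's user-mode emulator `qemu-mips` is installed and uses its `-L` option to point at the extracted root file system. The rootfs path, the binary path and the helper names are hypothetical placeholders, and the binary's own flags are passed through untouched because the talk does not spell them out.

```python
import shlex
import subprocess
from typing import List, Sequence, Tuple


def build_qemu_user_command(rootfs: str, binary: str,
                            binary_args: Sequence[str] = (),
                            emulator: str = "qemu-mips") -> List[str]:
    """Build a QEMU user-mode command line for a foreign-architecture binary.

    -L tells QEMU where to find the target's dynamic loader and shared
    libraries, i.e. the root of the extracted firmware file system.
    """
    return [emulator, "-L", rootfs, binary, *binary_args]


def run_in_qemu(rootfs: str, binary: str,
                binary_args: Sequence[str] = ()) -> Tuple[int, str]:
    """Run the binary under QEMU and return (exit code, combined output)."""
    cmd = build_qemu_user_command(rootfs, binary, binary_args)
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever binwalk extracted the file system
    # and to wherever the decryption binary actually lives inside it.
    cmd = build_qemu_user_command("./squashfs-root",
                                  "./squashfs-root/usr/sbin/encimg")
    print(shlex.join(cmd))
```

For a little-endian MIPS target, passing `emulator="qemu-mipsel"` would be the counterpart. Keeping command construction separate from execution makes it easy to review the exact command before running an untrusted vendor binary.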
Let's get back to the slides. Basically, you can use a combination of these two tools, binwalk and Ghidra. Using Ghidra, I was able to reverse engineer the algorithm and all the options it uses. It was very painstaking reverse engineering, so I've summarized it for you. Binwalk was later able to extract the same firmware.

There are many challenges in this. When you analyze firmware, you find a lot of proprietary, vendor-specific file formats about which nothing has been published. Sometimes vendors modify the compression algorithm they use, which makes it a challenge to open or decompress the files. There is also hardware-specific initialization. For example, if
you run a particular web server from the firmware, that server may first look for some hard-coded value in a storage device and only then start, which becomes a problem when you are emulating the hardware with QEMU. Firmware encryption is another challenge, as we just saw: encryption keeps attackers from analyzing the binary, and it differs from vendor to vendor. Another basic but important challenge is the instruction set: you need to know which architecture the firmware is built for, otherwise it is difficult to analyze.

Now let's look at bare-metal firmware. In bare-metal firmware, there is no kernel sitting between the application
and the hardware. The application code directly manipulates the hardware. There are toolchains, APIs and SDKs released by the vendors, and there are a lot of vendors in this space. Each vendor provides SDKs for its own development tools in order to attract developers. Hardware is launched and changed very frequently, so the security review process for these things is very questionable. In this space, people have come up with a solution: a hardware abstraction layer. Your application interacts with the hardware abstraction APIs, which in turn interact with your hardware. One very popular example of this is
FreeRTOS, and another is Mbed OS. These are real-time operating systems for microcontroller environments. They have many APIs for interacting with the hardware, and these are abstracted so that a developer can program against the platform and the same APIs will work across all the supported hardware. If you have tried your hand at something like Arduino, you write application code which directly controls your timers and processor interrupts; you can program any of these parts directly from your code. The way these devices usually communicate with other hardware is memory-mapped I/O. You write specific values to specific parts of memory, and that changes the behavior of the other hardware that
is connected to your main microcontroller. For example, for a GPIO pin, if you want to pull the pin up or down, you write particular values to a specific address and you can see the change in behavior. You can find all this information in the data sheet, and here is one example. If you want to communicate with the Ethernet MAC, this is the address range to which you have to write the data. There are many such devices: for example, the GPIO ports (I, H, G), and there's a device called RCC. If you want to change the behavior of RCC further, then this is
the expanded view of that particular register. If you write different values to different bits, you will see different behavior.

Let's take the example of a weather monitoring system in which you read temperature data from a sensor and send it over Wi-Fi. Your code sits in a continuous loop, reading the temperature and sending the data over Wi-Fi. From a hardware abstraction API perspective, it would look something like this: one hardware abstraction API reads the data, and another hardware abstraction layer API sends the data over Wi-Fi. This is important because we will see how
the hardware abstraction layer APIs can help us hook these functions, so that we can emulate this sort of system without actually needing the whole hardware. One example of such abstractions is the UART manipulation API from the libopencm3 framework, which you can find at this URL. You can change the baud rate of the UART device using this API.

Now let's look at what you can do for static analysis of the firmware. Static analysis helps you gain an understanding of the firmware just by reading the code and finding certain patterns in it. The popular tools in this space are Ghidra, IDA and radare2. Ghidra, as we have
discussed previously, is a reverse engineering tool. One reason for using Ghidra here is that it lets you define the memory map of the device, including the memory-mapped I/O we saw previously. We can define those regions in Ghidra's memory map and see how the firmware interacts with the device. Here is one example. I compiled code for one of the microcontrollers, took the binary and loaded it in Ghidra. Initially I mapped just the flash memory and the RAM of the microcontroller, loaded the code and was able to decompile some of it, but I was getting illegal accesses. I
was pretty sure that this code was trying to access some hardware peripherals using memory-mapped I/O. So I defined the different regions of the memory map, specifying which region belongs to which device, and this way I know what the code is interacting with. Now the same code resolves to show that it was interacting with the UART. So now I know that my firmware is talking to the UART and that some manipulation is going on there, so I can write some data to the device's UART and see if it changes the behavior. There's a very nice Ghidra plugin for doing this, SVD-Loader. And if you want the challenge binary which I just showed in Ghidra, you can download it from this URL.
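The memory-map idea above can be sketched outside Ghidra as a simple address lookup. This is an illustrative sketch only: the region names and address ranges are assumptions modeled loosely on an STM32-style layout and are not taken from the data sheet shown in the talk, and `classify_access` and `describe` are hypothetical helpers, not Ghidra APIs. In practice the ranges would come from the device's data sheet or SVD file.

```python
from typing import NamedTuple, Optional


class Region(NamedTuple):
    name: str
    start: int  # first address of the region (inclusive)
    size: int   # length of the region in bytes


# Assumed, illustrative memory map; real values come from the data sheet.
MEMORY_MAP = [
    Region("FLASH", 0x0800_0000, 0x0010_0000),
    Region("SRAM", 0x2000_0000, 0x0002_0000),
    Region("USART2", 0x4000_4400, 0x400),
    Region("GPIOA", 0x4002_0000, 0x400),
    Region("RCC", 0x4002_3800, 0x400),
]


def classify_access(address: int) -> Optional[Region]:
    """Return the region containing the address, or None if it is unmapped."""
    for region in MEMORY_MAP:
        if region.start <= address < region.start + region.size:
            return region
    return None


def describe(address: int) -> str:
    """Render an address as 'REGION+offset' the way a labeled listing would."""
    region = classify_access(address)
    if region is None:
        return f"0x{address:08X}: unmapped (would show up as an illegal access)"
    return f"0x{address:08X}: {region.name}+0x{address - region.start:X}"


if __name__ == "__main__":
    # Addresses a decompiled function might read or write.
    for addr in (0x0800_0100, 0x4000_4404, 0x5000_0000):
        print(describe(addr))
```

With only flash and RAM in the table, the peripheral address would come back unmapped, which mirrors the illegal accesses described above; adding the peripheral regions is what lets the access be labeled as a UART register.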
For dynamic analysis, as we just saw, reading the code alone is not enough. We also need some facility to run the code and see if we can debug it, or change certain inputs and see whether there is any change in the execution behavior of the firmware. This can be done with QEMU, but for that you have to modify QEMU and write some code. Another option is the Unicorn engine. Unicorn basically gives you APIs to load a memory section with code and execute whatever you want. You can even set the registers and read the
output of each individual instruction and monitor the state of the execution. It is very good for our purpose. For example, Unicorn also has APIs to hook certain sections of the code. If execution reaches a certain memory address, control is passed to code that you write, and you can change the behavior of the whole emulation and see what changes have been made. These API hooks can also be used to collect feedback: you can record which execution paths were taken and get coverage from that data. Using this API you can also fuzz firmware and keep collecting
the execution coverage to see whether it is improving. You can integrate this with AFL and find bugs if you want. Emulation is also the way to go because you can parallelize: run as many test cases in parallel as you want, on whatever powerful machine you have. Buying 100 or 200 devices to run all those test cases in parallel would be difficult, so the emulation approach is a good way to go. You can also test only a certain section of the firmware without running the whole firmware, which gives you partial execution capability. So you are only
targeting the part of the firmware which deals with data parsing.

Some of the challenges: there is no uniform binary format, so you have to do a lot of reverse engineering to understand what the firmware is, which data sections you need to load and where all the data is kept. This is usually found in the data sheet; you need to get the data sheet and find the initial execution point. Identifying the instruction set also becomes difficult. If you just have the binary blob without knowing the device, it is very difficult to guess the instruction set, both the CPU instruction set it is compiled for and the specific SoC. Different firmware will have different memory mappings,
so you need the data sheet for that as well. Another important point is identifying the entry point. This helps you in disassembling the code; otherwise your disassembler will not be able to figure out how to disassemble it. Then there are interrupt service routines. Whenever your device gets an interrupt, it switches between privilege states, and the ISR is the gateway from one privilege state to another. The interrupt table defines the entry points for those privilege states, so identifying this table is also important. Sometimes it is easy, and other times you won't be able to find it.

So thank you for listening to me. If you have any questions, I'm happy to take them.

Thank you, Manohar.
Thank you very much indeed. Fascinating. I'm afraid we don't have any questions yet. Come on, folks, get asking those questions. I think they're on the right-hand side. Is that the correct side? I'm not sure. But on the right-hand side, in the comments field, ask some questions there. In the meantime, what made you even want to get into firmware analysis of this kind?

So the interesting thing about firmware is that by modifying the software you can see changes in what the hardware of the device does. You can make your fridge do some weird stuff, changing the temperature
of your fridge.

Sorry, turn your fridge into an oven?

Yes. That's a very interesting use case. Kind of. If you want to freeze your food, or if you want a report on your food on your phone, you can do that. You can change the behavior of your device. When you go to the market and buy wireless devices, wireless cameras, sometimes they have a very good processor that they are not fully utilizing, so you can install additional software on them and change the security of those devices.

Get your light bulbs to mine bitcoin for you.

Exactly, exactly. Some malicious purpose. If you're rolling then...

Yeah, exactly. I
know there are quite a few companies whose routers, for instance, are very good for breaking open the firmware and installing additional functionality on the hardware. Is it D-Link or TP-Link? I can't remember.

Recently I came across a blog where the author had modified the firmware to use the router's radio as a frequency analyzer, reading the radio signals around it other than the Wi-Fi it was built to support. So he repurposed the hardware in another way.

Yeah, or other functionality, like allowing a VPN server on there so you can come in from outside, all that sort of stuff. I'm not sure what it does to the warranty.

Vendors don't like that.

No, no, that's right.
People poking into your device.

Especially Apple, the whole jailbreaking thing.

That is what Richard Stallman is fighting for.

Yeah, absolutely. Well, the right to do it, but not necessarily to maintain your warranty, I guess, is the key difference. Manohar, thank you so much for your time and for your contribution and knowledge. You get another virtual round of applause. Thank you very much.

Thank you.