Large-scale Security Analysis of IoT Firmware

Name: Large-scale Security Analysis of IoT Firmware
Uploaded: 2022-03-08
Duration: 37 min 22 s
Description: This presentation was held at the #BSidesBUD2021 virtual IT security conference on 27th May 2021. Large-scale Security Analysis of IoT Firmware, a presentation by Daniel Nussko. Today, the number of IoT devices in both the private and corporate sectors is steadily increasing. IoT devices like IP came

BSides Budapest · 2021 · 37:22 · 134 views · Published 2022-03

Speakers

Daniel Nussko

Tags

Category: Research

Research: Empirical Research, Technical Deep-dives

Style: Talk

Mentioned in this talk

Tools used

Binwalk

Platforms

MongoDB

Frameworks

Scrapy

About this talk

This presentation was held at the #BSidesBUD2021 virtual IT security conference on 27th May 2021. Large-scale Security Analysis of IoT Firmware, a presentation by Daniel Nussko.

Today, the number of IoT devices in both the private and corporate sectors is steadily increasing. IoT devices like IP cameras, routers, printers, and IP phones have become ubiquitous in our modern homes and enterprises. To evaluate the security of these devices, a security analysis has to be performed for every single device. Since manual analysis of a device and reverse engineering of a firmware image is very time-consuming, this is not practicable for large-scale analysis. To be able to conduct a large-scale study on the security of embedded network devices, an approach was applied that allows a high number of firmware images to be statically analyzed. For data acquisition, a crawler was used to identify and retrieve publicly available firmware images from the Internet. In this way, more than 10,000 individual firmware images have been collected. The firmware was then automatically unpacked and analyzed regarding security-relevant aspects.

For the first time, this research provides insights into the distribution of outdated and vulnerable software components used in IoT firmware. Furthermore, a comprehensive picture of the use of compiler-based exploit mitigation mechanisms in applications and libraries is given. Factory default accounts were identified, and their passwords recovered as far as possible. Also, a large amount of cryptographic material was extracted and analyzed. In addition, a backdoor has been discovered in the firmware of several products that allows remote access to the devices via SSH after triggering the functionality. The backdoor has been verified and confirmed by the vendor, and two official CVE numbers have been assigned. The results of this large-scale analysis provide an interesting overview of the security of IoT devices from 20 different manufacturers. IoT firmware was analyzed regardless of device type or architecture, and a broad picture of its security level was obtained.

https://bsidesbud.com All rights reserved.

Transcript [en]

Hello and welcome to my talk. Today I want to talk about the security of IoT firmware. I conducted a large-scale security analysis of firmware, and in this talk I want to give you a high-level overview of what I've done and my results. A few words about me: my name is Daniel Nussko, I'm from Germany, and I work as a penetration tester, so my job is to discover vulnerabilities in corporate networks and applications. I have a special interest in the field of IoT security; that's why I conducted a large-scale static analysis of firmware, and that's my topic for today. Before we start, a short introduction: why are IoT devices interesting, and especially why should we care about the

security of them? Well, today we have more and more connected devices in our homes as well as in corporate networks. For example, we have home automation systems which can be managed by a smartphone, we have smart speakers that are connected to our Wi-Fi, and of course we also have a lot of regular internet devices like a router, a surveillance camera, a VoIP phone, and so on. But we also have more and more industrial devices, like industrial control systems to control machines, robotics, or solar power plants, and all of these devices are connected to the network to monitor and manage them. So you see, IoT devices are more and more

common and more and more widespread. Today we have about 36 billion devices worldwide, and the number of devices rises from year to year, so in 2025 we expect to have more than 75 billion IoT devices, and a lot of them are connected to the internet. So what does this mean? Let's take a look at the past. In 2016 we had about 18 billion devices, and in the same year the Mirai botnet compromised about 600,000 IoT devices. These devices were responsible for large distributed denial-of-service attacks. For example, one of these attacks hit the internet service provider of Liberia, and this attack was so heavy that the internet connectivity of the whole country was temporarily interrupted.

So according to the statistics we will have four times as many devices in 2025 as in 2016. Additionally, you have to know that the original version of the Mirai botnet just used a simple list of default credentials to get access to these devices, and I assume that many more devices can be compromised by using publicly available exploits. So we have more and more devices, all of these devices are connected to our network, and all of them have interfaces to manage them, for example a web interface. Some of them also provide other network services, which all pose an attack surface.

The security of IoT devices is known to be poor, but what are the reasons for such bad product security? These devices are often produced as cheaply as possible. There's tough competition between the vendors, product security costs money, and it's not the vendor's first priority. In addition, these devices are very diverse and very individual, not only the hardware but also the software. A lot of software components are developed by the vendor itself, which increases the risk of design flaws compared to well-known and standardized components. And IoT devices have a short life cycle; I will come to that point later, at the end of my talk. Okay, before we start diving into a large

scale analysis, I want to share my experience from a penetration test. I conducted a penetration test of a security camera, and now I want to mention three of the identified vulnerabilities. First, a buffer overflow which was located in the web interface of the device, and a denial-of-service vulnerability also affecting the web interface. In this case the web server binary was self-developed by the vendor, and I've seen many times before that vulnerabilities are often located in self-developed components. I also identified a debug interface that can be accessed over the network. This debug interface is enabled by default and allows memory dumps to be created, and you can

reach this debug interface over the network, so everybody is able to access it. The interesting thing about that IP camera was that this device was white-labeled: on the camera there was a company logo of a German vendor, but in fact this device was produced by a Chinese manufacturer. A little research showed that this manufacturer is the second-largest manufacturer of IP cameras in the world, and a little more research showed that IoT devices from this manufacturer were significantly involved in the Mirai botnet. So it seems that the circumstance that this device was white-labeled and in the end sold by another vendor led to several problems. First, when I informed the vendor about the

debugging interface, it turned out that the vendor didn't even know about that feature. Second, it turned out that some of the identified vulnerabilities were already known to the Chinese manufacturer and were also fixed in the newer firmware version. So someone took a closer look at this device, discovered vulnerabilities, reported them to the manufacturer, and the manufacturer released a new firmware version where these vulnerabilities have been fixed. This is the best case you can get, and even here the device stays vulnerable because of communication problems between the original manufacturer of the device and the vendor who sells this product in Germany. So the vendor didn't even know about the firmware and shipped their products with

the old and vulnerable firmware versions. So this example shows that there can be many different reasons that lead to bad product security, not only programming mistakes.

Okay, since we cannot rely on the product security, we need to evaluate the security of these devices, and for that several approaches exist. We can perform a penetration test: here we look at the device and its interfaces while it's in operation. For that, of course, we need a physical device, but we can analyze the device in depth and get very accurate results, because we avoid false positives by verifying vulnerabilities directly on the device. But this approach simply does not scale, so it's not practical for a large-scale analysis. We also have the possibility to emulate a firmware. We can use QEMU for that purpose, but this is hard to apply because the

hardware of IoT devices is very diverse. Usually, if you try to dynamically execute a firmware, it checks whether the peripherals are connected and working. The device may have a sensor to collect data, or a camera, or a physical button, which we don't have when we emulate it. There have also been attempts to emulate only single parts of the firmware, for example only the web interface. This approach allows common vulnerability scanners to be run against the web interface although the device is not physically present, but the setup is really complex and it's very difficult to automate this process for a large-scale analysis. This is the reason why I chose a static

analysis. In this approach we unpack the file system and the binaries and then analyze them. Of course this approach is much more limited, because we don't have the device up and running and cannot communicate with its network services, but it's scalable, and the whole process can be automated to analyze a large number of firmware images. The aim of this research was to obtain a high-level overview of the security level of firmware, and therefore a large number of firmware files from different vendors and different device types was collected. Due to the large number of firmware files, it was necessary to fully automate the process of unpacking and analysis.

So in the end every firmware file was analyzed with regard to several topics. First of all, the name and version information of common binaries and common system libraries were identified. In the first step the file type was identified, and in the case of an executable file like an ELF file, printable strings were extracted. After that, these strings were analyzed using regular expressions for typical version information. So the name of typical software components like OpenSSL, BusyBox, or OpenSSH, along with the numerical software version, was extracted from the binary, and in this way it was identified which software has been used and in which version. For that I used regular expressions in combination

with YARA files. YARA is a signature format which was originally developed for malware detection and is used in antivirus software. Then I checked every binary file for the use of compiler-based exploit mitigations. To give an example, when compiling a program with GCC you can set different flags to enable exploit mitigations like stack smashing protection or NX protection, and this makes exploitation of buffer overflow vulnerabilities much more difficult. Another aspect was the analysis of default user accounts. In this case the passwd file was analyzed as well as the shadow file; password hashes from shadow files were extracted, and after that a dictionary attack was conducted to recover as many plaintext

passwords as possible. In the last step, cryptographic material was identified, such as certificates and private and public keys, and this was also implemented with signatures in YARA format. So how do we perform a large-scale analysis? There are several tasks to do. First of all, we need to collect firmware images, and after that we need to unpack the firmware files in order to access their file and directory structure. When this is done, we are ready to start our analysis process, and a set of checks is performed for every file. So we check the file type, and in the case of an executable file we analyze whether exploit mitigations are implemented. We scan the file for version strings and

also for cryptographic keys and certificates. All results of this analysis are stored in a database. In addition to the results, I also collected some metadata of all files, which I also store in the database. For example, I created a hash value of each extracted file, and this allows us to analyze whether a file was also present in another firmware image. So in case we have a finding, let's say a hard-coded certificate, we can use the hash value of this file and search in our database whether this specific certificate is also used in another firmware. When our automated analysis has finished, we can use our database to create statistics about our results or to search for files inside a specific

image and to manually analyze them. All right, let's take a look at the architecture of my analysis environment. During my first tests it became clear that the analyzer has high system requirements. A large number of files are examined in parallel, and therefore the analyzer scales with the number of available CPU cores and system memory. This was the reason why I decided to run this analysis in a cloud environment. The analyzer itself was implemented in Python using the FACT framework. The FACT framework is built modularly, so it's easy to create custom modules in Python. So as you see, we have different components. We have the firmware analysis server; here we unpack

the firmware images and analyze the files. All unpacked files are stored on a virtual storage. We keep all unpacked files since we may also want to analyze them manually afterwards. We also have a database server; this database server is based on MongoDB, and here we store all our results. All these components are located in the cloud. I've also developed a job scheduler that uploads a firmware file to the analysis server in the cloud environment and also monitors all tasks on the server. This is done via a REST interface.
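The hash-based lookup mentioned above (finding out whether a file, for example a hard-coded certificate, also appears in other firmware images) can be sketched roughly as follows. This is only an illustration: the talk does not name the hash algorithm or the database schema, so SHA-256, the `FileIndex` class, and the firmware identifiers are assumptions, and a plain dictionary stands in for the MongoDB collection.

```python
import hashlib
from collections import defaultdict


def file_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's content (algorithm assumed)."""
    return hashlib.sha256(data).hexdigest()


class FileIndex:
    """Hypothetical in-memory stand-in for the results database."""

    def __init__(self) -> None:
        # digest -> set of (firmware_id, path inside the unpacked image)
        self._by_digest = defaultdict(set)

    def add(self, firmware_id: str, path: str, data: bytes) -> str:
        """Record one extracted file and return its digest."""
        digest = file_digest(data)
        self._by_digest[digest].add((firmware_id, path))
        return digest

    def images_containing(self, digest: str) -> set:
        """Return all firmware images in which a file with this digest was seen."""
        return {fw for fw, _ in self._by_digest.get(digest, set())}


if __name__ == "__main__":
    index = FileIndex()
    key = b"example hard-coded key material"  # placeholder content
    digest = index.add("router-a-1.0", "/etc/ssl/server.pem", key)
    index.add("camera-b-2.3", "/etc/cert.pem", key)
    print(sorted(index.images_containing(digest)))
```

Indexing by content hash rather than by file name means the same key is recognised even when vendors store it under different paths.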

Where did I get the firmware files from? I downloaded the firmware files from manufacturers' websites or FTP servers. I only collected firmware images that are directly provided by the manufacturer; doing so, I can be sure that I analyze the original firmware from the vendor and not any other image which might have been customized by someone else. To automate the process of downloading firmware files, I developed a crawler. I developed this crawler in Python based on the Scrapy framework, and in this way I crawled the download pages of 20 vendors and downloaded about 10,000 firmware files. So I manually searched for the download portal of a vendor, and then I crawled the portal for firmware

files with my Python crawler. Together with the firmware files I also stored some metadata about the images. For example, I generated a hash value of each firmware file to avoid duplicates and stored the URL where I downloaded the file from. As you can see on this pie chart, most of the firmware files are for routers, followed by security cameras, printers, switches, and VoIP phones, and a smaller number of images are for NAS systems, smart speakers, Wi-Fi repeaters, VoIP gateways, photovoltaic systems, smart locks, smart plugs, and powerline adapters. So as you can see, a wide range of device types has been analyzed. So let's talk about the unpacking of the firmware. Most of the devices I analyzed have a

full operating system, so you have a kernel, a user space, and the file system. The firmware files are often packed in multiple layers and different file formats, so the challenge is to handle a wide range of file and archive formats and to unpack them in an automated way, layer by layer. Manufacturers often provide firmware images in an archive file, and after this archive file is unpacked you often get a binary blob. To extract files from this binary blob we use so-called file carving. This means we search for common file signatures and magic bytes within this binary blob and then try to carve out and extract the files inside.
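The signature search behind file carving can be illustrated with a very small sketch. It only locates a handful of well-known magic byte sequences in a blob; real carving tools use far larger signature sets and validate each hit before extracting anything. The `SIGNATURES` table and the function name are chosen for this example only.

```python
# A few well-known magic byte sequences; real tools know many more.
SIGNATURES = {
    "gzip": b"\x1f\x8b\x08",
    "squashfs (little endian)": b"hsqs",
    "xz": b"\xfd7zXZ\x00",
    "u-boot uImage": b"\x27\x05\x19\x56",
}


def find_signatures(blob: bytes) -> list:
    """Return sorted (offset, name) pairs for every signature hit in the blob."""
    hits = []
    for name, magic in SIGNATURES.items():
        start = 0
        # bytes.find returns -1 once no further occurrence exists
        while (pos := blob.find(magic, start)) != -1:
            hits.append((pos, name))
            start = pos + 1
    return sorted(hits)


if __name__ == "__main__":
    # Synthetic blob: a uImage header magic followed later by a SquashFS magic.
    blob = b"\x00" * 16 + b"\x27\x05\x19\x56" + b"\xff" * 60 + b"hsqs" + b"\x00" * 8
    for offset, name in find_signatures(blob):
        print(f"0x{offset:08x}  {name}")
```

Short signatures produce false positives on real firmware, which is one reason each candidate has to be checked (for example by parsing the header that follows) before it is carved out.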

One very good tool for this is Binwalk, which I also used in my analysis. Binwalk is a well-known tool which is very popular for firmware reverse engineering, so this was the tool I chose for this. Okay, let's take a look at the results and start with some general observations. The results show that the majority of the analyzed firmware images are based on a Linux kernel: in 88 percent of all firmware files a Linux kernel was identified. In about six percent ThreadX is used, which is a proprietary operating system often used for consumer electronics, and in two percent of the firmware files OpenWrt is used. This is a Linux-based operating system

used for routers. Then we have a small number of devices which use VxWorks, Wind River Linux, and LynxOS. On the right side you can see statistics about the identified CPU architecture. Most of the devices are based on a MIPS or an ARM CPU; this is very common for IoT devices. But we also have some architectures that are not that common, like m68k from Motorola or SuperH, which is a microcontroller from Hitachi.

This chart illustrates the number of times a specific software component was identified among all firmware images. The most commonly identified software component is OpenSSL, followed by BusyBox, which is very common in embedded Linux; this is a single executable file which combines a set of command-line tools. We also see a lot of binaries that provide common network services, like udhcp, OpenSSH, dnsmasq, and Dropbear SSH. For all these programs their version has also been identified, and in this way we got an overview of the use of outdated software components.
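The version identification described earlier (extract printable strings from a binary, then match them against regular expressions) might look roughly like this. The patterns below are illustrative guesses at common version banners, not the signatures used in the research, and the YARA part of the approach is left out.

```python
import re

# Runs of at least four printable ASCII characters, similar to `strings`.
PRINTABLE = re.compile(rb"[\x20-\x7e]{4,}")

# Illustrative banner patterns (assumptions, not the original rule set).
VERSION_PATTERNS = {
    "OpenSSL": re.compile(rb"OpenSSL (\d+\.\d+\.\d+[a-z]*)"),
    "BusyBox": re.compile(rb"BusyBox v(\d+\.\d+(?:\.\d+)?)"),
    "OpenSSH": re.compile(rb"OpenSSH_(\d+\.\d+(?:p\d+)?)"),
}


def extract_strings(data: bytes) -> list:
    """Return all printable ASCII runs found in the binary data."""
    return PRINTABLE.findall(data)


def identify_components(data: bytes) -> set:
    """Return a set of (component, version) pairs found in the data."""
    found = set()
    for text in extract_strings(data):
        for name, pattern in VERSION_PATTERNS.items():
            match = pattern.search(text)
            if match:
                found.add((name, match.group(1).decode("ascii")))
    return found


if __name__ == "__main__":
    sample = (b"\x7fELF\x02\x01\x00" + b"\x00" * 8
              + b"OpenSSL 0.9.8zh 3 Dec 2015\x00\x01\x02"
              + b"BusyBox v1.22.1 (2015-01-01)\x00")
    print(sorted(identify_components(sample)))
```

Because this relies on banners embedded in the binary, a vendor that strips or patches them would go undetected, which fits the limits of static analysis the talk points out.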

Okay, here you can see the version landscape of OpenSSL as an example, so we see how often a specific version has been identified. Actually this is not a complete list, as the complete list would be much too long; these are only the most frequently used versions. The extracted version information was then compared with the data of the National Vulnerability Database from NIST, and in this way already known vulnerabilities and their CVE numbers were identified for each version. As an example, the bar graph on the left shows the number of vulnerabilities according to the CVSS score, in this example for OpenSSL 0.9.8 [unclear], and in this way we can find out vulnerabilities which

affect this version and also their severity. Just some interesting statistics regarding three famous vulnerabilities in OpenSSL: according to their version information, about 75 percent are affected by the FREAK vulnerability, about 50 percent are affected by POODLE, and six percent by the Heartbleed vulnerability. OpenSSL is just an example here; I created this kind of statistics for all software components, also for the identified Linux kernel versions. The results show that two out of three kernel versions are older than 10 years and more than 90 percent of all Linux kernels are already end of life, and the oldest kernel version I found was from 1997. When I crawled the download portals, of course firmware images of old devices

have also been downloaded, so it was not possible to distinguish between old and new devices. This is something you have to do manually: you have to look for the release date of the firmware on the website of the vendor or, for example, take a look at the changelog of the firmware. You cannot do that in an automated way. That's why I did some manual research and compared the release dates of some individual devices with their kernel version. The results differed very strongly from manufacturer to manufacturer, and it turned out that, for example, at the product launch of some enterprise VoIP telephones a kernel version was in use that was already 10 years old. So even current high-end phones which

are dedicated to the enterprise market are sold with a severely outdated Linux kernel.
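As a small illustration of the version-to-vulnerability comparison described above: the research matched versions against NVD data, whereas this sketch hard-codes a single well-known range, Heartbleed (CVE-2014-0160), which affects OpenSSL 1.0.1 through 1.0.1f and was fixed in 1.0.1g. The function names are hypothetical, and only the classic letter-suffixed OpenSSL version scheme is handled.

```python
import re

# Classic OpenSSL versions look like "1.0.1e" or "0.9.8zh".
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)([a-z]*)$")


def parse_openssl_version(version: str) -> tuple:
    """Split a classic OpenSSL version into (major, minor, patch, letter)."""
    match = _VERSION.match(version)
    if not match:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch, letter = match.groups()
    return int(major), int(minor), int(patch), letter


def affected_by_heartbleed(version: str) -> bool:
    """True for the 1.0.1 releases before the fix in 1.0.1g."""
    major, minor, patch, letter = parse_openssl_version(version)
    return (major, minor, patch) == (1, 0, 1) and letter in ("", "a", "b", "c", "d", "e", "f")


if __name__ == "__main__":
    for candidate in ("0.9.8zh", "1.0.1", "1.0.1e", "1.0.1g", "1.0.2k"):
        print(candidate, affected_by_heartbleed(candidate))
```

A version match like this only says the shipped code base is in an affected range; as with the other static findings, it cannot show whether a vendor backported a fix.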

Let's take a look at the binary hardening statistics. All ELF executables and libraries were checked for the use of common exploit mitigations. When you compile a binary you can set compiler flags to make use of them and to make exploitation much more difficult, and here you can see the results. When compiling a program with GCC, NX, the no-execute bit, is set by default; I think this explains why 90 percent of all analyzed binaries make use of NX. As far as I know, all other protection mechanisms like stack canaries, RELRO, and FORTIFY_SOURCE must be explicitly enabled during compilation, and I think this is the reason why they are not that present in most of the

executable files. Okay, as already mentioned, I also analyzed the passwd and shadow files. This chart shows the number of users which are allowed to log into the operating system and for which a hard-coded password is set in the shadow file, so all of these users can be used to log into the device. Several of them show typical usernames for factory default credentials, such as root and admin or user and guest, but you also see some users with cryptic names. In the past we've seen several cases where such cryptic usernames have been identified as undocumented static user accounts. So in case an SSH or a Telnet service is

running, these accounts can be used for remote access to the device. Some of the cryptic usernames in this list are already known and you can find CVE numbers for them, but others are still unknown. So static user accounts, especially when they are undocumented, are a real problem. From all shadow files I extracted the password hashes and performed a dictionary attack on them to recover as many passwords as possible. The results showed that "pass" was the most common hard-coded password, followed by "1234" and an empty password. All in all, 68 percent of all hard-coded passwords have been recovered. The Mirai botnet used a total of 62 predefined combinations of factory

default usernames and passwords, and 12 of them were also identified in this research. Please keep in mind these are the results of a static analysis. We don't even know if there is a service running that could allow a user login, and in addition we cannot know what happens at runtime. So this is like a snapshot of the device software which we took before the first boot of the operating system, and maybe during or after the first boot process the user is forced to change the default credentials. So this is one of the disadvantages of a static analysis. As part of the analysis, cryptographic material has also been

extracted. This table lists the number of cryptographic keys by their type. Most of them are TLS certificates; among them are a lot of root certificates, but also hard-coded and self-signed certificates. There are also a lot of RSA keys: some of them belong to their corresponding TLS certificate, others are used for SSH. Regarding SSH, most of the RSA keys were identified as SSH host keys, and a few public keys have been identified which are used for key-based authentication, but I will come to that point later. So what's the problem with hard-coded keys? Let's have a look at the principle of asymmetric encryption. On our device we generate an individual pair of keys, let's say RSA keys. So now

we have two different keys: a private key, which we keep secret on the device, and the public key, which we provide to our clients. Our client is now able to encrypt a message using this public key, but the only one who is able to decrypt this message is the owner of the private key. So the security is based on keeping the private key secret, and in the case of a hard-coded certificate all devices with that firmware use the same public and the same private key. For example, the device may use these keys for HTTPS communication with the web interface, and in this case an attacker could easily decrypt the TLS traffic by just

extracting the private key from the firmware. And the firmware is publicly available on the internet, so extracting the private key shouldn't be a problem. In many firmware files I've seen hard-coded certificates and hard-coded SSH host keys, but here too we cannot absolutely rely on the results, because the device may regenerate a key pair during the first boot process and thus overwrite the hard-coded key pair, and we simply cannot verify that with a static analysis. In addition to that, hard-coded TLS keys can be used to identify the IP addresses of publicly reachable devices. Here you can see an example where I use Shodan to search for IoT devices by a certificate

fingerprint. As this certificate is hard-coded, all devices use the same certificate with the same fingerprint. That means that I can use this approach to identify the IP addresses of all internet-reachable devices, and this can be used to conduct very targeted attacks on specific device models, for example to connect to these devices with a hard-coded password that I've extracted before from the firmware, or any other attack.
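To make the fingerprint idea concrete, here is a minimal sketch that pulls PEM-encoded certificates out of a file's bytes and computes a SHA-256 fingerprint over the DER encoding. The regular expression and function name are assumptions made for this example, and the certificate content is not parsed or validated.

```python
import base64
import hashlib
import re

# Matches the base64 body between standard PEM certificate markers.
PEM_CERT = re.compile(
    rb"-----BEGIN CERTIFICATE-----(.+?)-----END CERTIFICATE-----", re.DOTALL
)


def certificate_fingerprints(data: bytes) -> list:
    """Return SHA-256 fingerprints (hex) of all PEM certificates in the data."""
    fingerprints = []
    for body in PEM_CERT.findall(data):
        # Remove line breaks and other whitespace before decoding to DER.
        der = base64.b64decode(b"".join(body.split()))
        fingerprints.append(hashlib.sha256(der).hexdigest())
    return fingerprints


if __name__ == "__main__":
    fake_der = b"\x30\x82\x01\x0a" + bytes(range(32))  # placeholder, not a real certificate
    pem = (b"-----BEGIN CERTIFICATE-----\n"
           + base64.encodebytes(fake_der)
           + b"-----END CERTIFICATE-----\n")
    print(certificate_fingerprints(b"config data\n" + pem))
```

Since the fingerprint depends only on the certificate bytes, every device shipping the same hard-coded certificate presents the same value, which is what makes such devices easy to find in an internet-wide scan index.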

After the automated analysis had finished, I had a large database of extracted files and a lot of metadata about them. I used this database for further manual analysis. For example, I used the metadata of all identified RSA keys to search for the filename authorized_keys. This file allows key-based authentication, so the owner of the private key is able to log into the device in case an SSH service is running. In firmware images of four different routers I identified an authorized_keys file; three of them offer an SSH service according to the user manual. So I contacted the vendor and described the problem, but the vendor rejected that and claimed that key-based authentication would be

disabled. A little research showed that for Dropbear, which was used in this case for SSH, you cannot do that by configuration; you have to compile the executable explicitly without key-based authentication. So I did that, compared the binary I had extracted with the compiled one, and looked for the function call used for key-based authentication, and I found it in the extracted Dropbear executable. So I'm pretty sure that key-based authentication is possible and that the owner of the private key is able to log into the device, but I cannot prove that because I don't have access to one of these devices. So this was really frustrating for me.

Okay, I also analyzed all extracted files with regard to, let's say, suspicious strings, and thereby I found a CGI script which is part of the web interface. I identified this file in firmware files of nine different devices. When this CGI script is called, a Telnet service is started and a new user is created. This script was identified in different versions; in a newer version an SSH service is started instead of Telnet. The name of the created user account is NsaRescueAngel, and all of the affected products are network-attached storage systems, most of them from the vendor's product line called NSA, so I

think that led to the username NsaRescueAngel. I reported that to the vendor, and the vendor confirmed the backdoor and stated that this function was used for troubleshooting during development. For four of the affected devices a new firmware version was released; the other five products will not receive a security patch, as the support period has already ended for them. This is due to the short product life cycle of IoT devices, so in this case they will stay vulnerable forever. Okay, I also want to mention that I've identified some, let's say, unusual software components in device firmware, for example tcpdump. tcpdump is a tool which allows you to capture

network traffic. I found tcpdump in 500 firmware images of eight different vendors, mainly in firmware files of routers and VoIP phones. In the case of a router I could imagine that capturing network traffic might be a helpful feature, for troubleshooting for example, but I don't know why you would need that for a VoIP phone. In addition, I've identified GDB. GDB is a debugger used for runtime analysis of executables. I was surprised how often I found GDB in firmware files: all in all 861 times, from nine different manufacturers. GDB was mainly found in firmware images of security cameras, routers, and switches, and it seems that developers forgot to remove these binaries,

but it's definitely not good practice to leave those components in the image and to ship the products with them to the customer. Now we come to the end of this presentation. To sum up, we have seen that it's possible to automate the process of firmware analysis. In this research a total of 10,000 firmware files were automatically unpacked and analyzed, and the results show that there are a lot of best-practice violations. Vendors use hard-coded credentials and hard-coded keys, they forget to remove SSH keys, and they even ship their products with a backdoor. A lot of software components are outdated: we've seen that more than 90 percent of the kernel versions in use are out of

support and are therefore affected by publicly known vulnerabilities. And we've seen that vendors do not do a good job at binary hardening: a lot of compiler-based exploit mitigations are simply not used. Thank you for listening. I think we now have a few minutes left to answer your questions. Thank you very much.
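The binary hardening check summarised above can be approximated with nothing more than the ELF headers. The sketch below is a simplified heuristic, not the analysis module used in the research: it reads the program headers to detect NX (a non-executable `PT_GNU_STACK`) and RELRO (presence of `PT_GNU_RELRO`, which indicates at least partial RELRO), treats `ET_DYN` as "PIE or shared library", and guesses at stack canaries and FORTIFY_SOURCE from symbol names found anywhere in the file.

```python
import re
import struct

PT_GNU_STACK = 0x6474E551
PT_GNU_RELRO = 0x6474E552
PF_X = 0x1   # segment is executable
ET_DYN = 3   # shared object or position-independent executable


def check_mitigations(data: bytes) -> dict:
    """Heuristically report exploit mitigations of an ELF file given as bytes."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if data[5] == 1 else ">"       # EI_DATA: 1 = little endian
    e_type = struct.unpack_from(end + "H", data, 16)[0]
    if is64:
        e_phoff = struct.unpack_from(end + "Q", data, 32)[0]
        e_phentsize, e_phnum = struct.unpack_from(end + "HH", data, 54)
    else:
        e_phoff = struct.unpack_from(end + "I", data, 28)[0]
        e_phentsize, e_phnum = struct.unpack_from(end + "HH", data, 42)

    nx = False      # a missing PT_GNU_STACK is treated as "no NX" here
    relro = False
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        p_type = struct.unpack_from(end + "I", data, off)[0]
        # p_flags sits at offset 4 in 64-bit and 24 in 32-bit program headers
        p_flags = struct.unpack_from(end + "I", data, off + (4 if is64 else 24))[0]
        if p_type == PT_GNU_STACK:
            nx = not (p_flags & PF_X)
        elif p_type == PT_GNU_RELRO:
            relro = True

    return {
        "nx": nx,
        "relro": relro,
        "pie_or_shared": e_type == ET_DYN,
        "stack_canary": b"__stack_chk_fail" in data,
        # fortified libc wrappers are named like __memcpy_chk, __sprintf_chk
        "fortify": re.search(rb"__[a-z]+_chk\b", data) is not None,
    }
```

The symbol-name checks are the weakest part: a statically linked or fully stripped binary can use canaries without the string being present, so a production checker would parse the dynamic symbol table instead.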
