← All talks

Minerva_lib

BSides Warsaw · 201835:03401 viewsPublished 2018-10Watch on YouTube ↗
Speakers
Tags
Mentioned in this talk
About this talk
Mateusz Kocielski presents Minerva_lib, a universal fuzzer written in C for finding errors in C libraries and programs through symbolic execution and SMT solving. The tool uses a simple yet effective algorithm that generates test cases by chaining API function calls, automatically discovering crashes in real-world libraries like OpenSSL and PHP. The talk demonstrates practical fuzzing techniques with live examples ranging from toy programs to complex cryptographic libraries, emphasizing the importance of sanitizers and minimization in vulnerability discovery.
Show transcript [en]

Thank you. The topic is complicated, it was a bit dark in the agenda, because I will be talking about SMT, solvers, and other things like symbolic execution, second-row logic, node theory, etc. to look for errors in programming. But first I would like to I would like to thank the organizers. Although the organizers have changed in this edition, but as you can see, they have stood on the same level as the tasks, so a big round of applause for them. And I have a question for the audience right away. There are three people who have performed in all editions. One person for some reason is me. However, does anyone know these two others? Other people. Not Boris. Who? Mac,

yes. Mac was present. Big applause for Mac. And there is someone else. Adam. Big applause for Adam. You know what Mateusz, I'll interrupt you a bit. Yesterday evening I found out about this institution and I'm helping you organize SBS for the next 10 years. Okay. It will probably end up with us breaking someone's legs and he won't come. But maybe legs... OK, maybe that's a bad idea. But we'll come up with something. I will briefly introduce myself. I work at Logical Trust as a pentester. Apart from that, I also deal with open source, especially locating errors in all kinds of open projects. And that's what I'm going to talk about today. And the short story behind this presentation is that In

2015, me and my colleagues did a general presentation about fuzzing. We mentioned that we have our own methods and programs for torturing programming. We promised to take something out of this bag and show it to the world. We did it, but we did it at a closed conference, so nobody knew about it. This time we decided to do it at an open conference. The topic is not new. If someone had a chance to be at my presentations, we came up with a way to test programming. We applied this method in many places. We started with PHP and we ended with what I will tell you about in a moment. In the meantime, I am going through JavaScript,

various servers and we managed to find quite a few errors. For example, two years ago, when we launched our We found over 100 unique crushes in PHP and HipHopVM. And what's interesting, some of these crushes, I don't know if you realize, but Facebook The program also includes hip-hop, so we even paid for some of it. So all we had to do was download the program from the internet and run it. This time I want to present a different program, a relatively universal fuzzer for testing various types of libraries and more. It is written in C and is more suitable for torturing other programs written in C, The code is available for two years now, but as I mentioned before, not many people

knew about it. I don't know if there are hardcore fans of Polish YouTubers here. The question is: almost every video on YouTube starts with someone saying "Hi, I'm Adam" and Sub and like this video. So, on GitHub, you can manage and fork this project. As long as it's possible. This project also works on all Unix mappings. This list is not necessarily up to date. How to make it disappear? I'll click here. That was a mistake. Oh, these Macs. This list doesn't work right now, so please try it for yourself. The idea is to make something that is easy to apply to all programs. But first I will tell you about how Here is an algorithm, but I don't want to explain it, because no one will

understand it quickly. I will describe it on an example. The idea is that we have a collection of functions in API. There are functions a, b, c and d. For example, function b collects two ints and returns int. and returns the string, etc. At the beginning, the x on the slide is a collection of the values. The idea is that we take any function that we are able to run at the moment. So at this moment we run function a because we don't need any arguments and we get a variable v1 which is an integer so we write it down. When we have an integer, we are able to run function b. and from the function b we get another integers. For

example, we will run the function c and we will get a variable v3 string, and so on, and so on, and so on. And we repeat it until something falls out. The advantages of such an algorithm are that it is simple and easy to implement. It turns out that it is also effective. All the operations we do are done without thinking, we just draw, iterate and see what happens. A question for hardcore fans of IT: All of this can be generated from grammar without context. Does anyone know what is grammar without context? One person? Two? Three? Okay, then the question for the hands. After the presentation, if you think why grammar is not enough, it will be a bonus to clench

your hands. The marketing of this solution is that using this algorithm we found for example, in the SSL library, OpenSSL, or OpenSSH. We also tried to make this tool as versatile as possible so that it could be easily configured and adapted to other targets. I will show you how it works in a moment. The idea is that you only write a small config file and a magic makefile and it magically turns into a fuzzer. I will show you this with an example. Let's take OpenSSL and a library of large numbers. Now a question to the audience: what is the use of the library of large numbers? Can you imagine it? For example, it is a library of

open SSL and open SSL for operating In cryptographic algorithms, you need to have any number of any kind. So you can't use a simple integer, but you can make a library for so-called bignum. The library has around 3-4 thousand lines of code, so it is not very extensive. What you see on the slide is a copy and paste of So if you type in man3 bn you get this. To configure the user, which I will show you in a moment, you just need to do this. to check if the function has been called or not. Here are the functions that check if the function has been called. And possibly add annotations to variables, for example, if the free function slows down the memory, I

add an annotation that the variable is destroyed. We also add included files to have original prototypes of these functions. This is magically turned into something that I will show you in a moment. It is for the purpose of putting slides on the Internet. So, in general, this configuration can be done very quickly and get a fuzzer. Compiling this fuzzer is necessary, and a simple framework in GNU Makefile is used for this. I hate GNU Makefile because I don't like GNU. I will show you that in a moment, it is for the sake of slides, so let's see how it works in practice. And before that, for the people who... Do we have programmers in C? Does anyone program in C? At least one

or two people. So what do you program in, if not in C? In what? In CSS. Don't you write pages as CGI? I don't know, something has changed. Anyway... You probably remember the error "_heartbleed" - I was more interested in OpenSSL back then. It turned out that there are three or even four representations of zero inside OpenSSL. There are +0, -0, 0 and 00000, etc. It turns out that these zeros are sometimes equal and sometimes not, depending on the humor or knowledge of the programmer. The first simple error we found thanks to I will show you the device that I will show you in a moment. I will ask if you can see the error, but

I expect that it is difficult to analyze it so quickly. Is there anyone who knows it already? No. If we add -0 here, it will turn out that for different representations of 0, This value is also zero and here two bytes are locked. But since it is minus zero, we will write minus and zero and finish the string. So we will write three bytes and we locked two. And every programmer knows that this cannot be done. Bravo! Another mistake is also a mistake in the long numbers sub-bibliotheca. It turns out that when you have calculated, for example, 3 bits of randomness, or actually less than 8, so less than byte, It allocated itself to the buffer 1 byte.

And here it turns out that we are writing to two cells in the table. And every programmer knows that this is not allowed to do. And this is another question outside the competition. When do we use the get_s function in C? Never. Very good. You are still following up. OK, so here are slides for the need to put it on the Internet. So I will show you the demo. Maybe I'll show you first that this project is really on GitHub. So if you want to help us, click here or here. Here not necessarily. There are no pull requests, but I don't have the internet here at the moment. It's interesting, because I have no idea where this gentleman came from. But some gentleman

from China has reported that something is not working. I don't know where he learned about it. In any case, we will be grateful for this help. Let's see how this project looks in practice. I didn't want to show it. I didn't want to show it. Let's close it. These are the remains of Quake. Good. The project is organized in such a way that as users we are interested in the target catalog. There we write down the most diverse We have some goals that we want to achieve. Right now, on GitHub, because it is cloned directly from GitHub, and as you can see, even today. So there are some of these goals. Not all of them work on all platforms. So let's see the toy ones

first. So, what you write here is a file like this. And this is your API that you want to test. So, we have three functions: zero, add one and throw me out. And I assumed that each call of this function is How to say it in Polish? Success? In the sense that everyone succeeds. Is that OK? It ends with success, let it be so. And API itself has such an implementation inside. So there is a function zero, which, if not difficult to guess, returns zero. There is a function addOne, which, if not difficult to guess, adds one to the integer. There is a function crashMe, which does what? So the function crashMe, if we give

something bigger than four as a parameter, it will simply and if we give less, it will add two to the integer and return. And to compile such a fuzard, I mentioned that you also need makefile and to avoid being verbose, The makefile is very simple and looks like this. This is the template, and this is the template, and this is the template. It was enough to add the information that the implementation of my API is here. When we issue the gmake command with the prefix g, it will be compiled. It will not be compiled, because I have already compiled it. I will do this and compile it again. We will wait a moment. Ok, it has been compiled. So in the bin catalog we get

a fuzer. It works like this: we just run it and it will run in this way and find the crash. It is not very useful because the fact that it found We created a shell function. It works like this: A simple mechanism of REPL, that is, we write that for example we want to write a test case to x that will throw this program at us. Let's do it. And the FAS function simply starts tortures. In this case, it will torture the program to the end. If we add the number of iterations, Because readline on Mac has some problems, nothing really worked here, because I deleted it. But now, when we have found this crash, we can show it to ourselves.

So we have found some test case that is ruining our program. We can run it again and see if this program is really that the program is actually breaking down. But the analysis of such a test case is rather complicated, so we have also added a primitive option to minimize test cases. So, if we minimize the test case and review it, I mean, it is still not the simplest to analyze, but it is certainly shorter. We can, for example, put it into the file, but I do not remember the next parameters at the moment. Save X Test for the future and run it in this way. It didn't fall out here, but when we run LLDB, Maybe we

will recompile it with a debug. gmake clean, not here, where am I? Once again, gmake clean, gmake debug = 1. Thank you. So let's recompile it to have a debug symbol. When we run it in LLDB,

I will ask if anyone has ever used LLDB? There are some people, very good. I will start it. And we can see that the test case we have written is throwing the program in this place. If we are stubborn, we will write the value of x. In this case it is 5, so accurate. But these are some examples Let's see how it works in more complicated examples. For example, let's say OpenSSL. Here I have a real configuration for a large number sub-bibliotheca. It turns out that it is practically copied from the manual. Let's try to compile this configuration and torture OpenSSL for a moment. I've already compiled it before the presentation to see if it works and I've ruined

the effect. So it turns out that It is not difficult to click a fuzzer for OpenSSL. I even catch some crashes. Let's see where. In fact, I will not analyze the entire test case, but I have specifically linked the OpenSSL version here, which is defective and this error is not visible in it. So it can find errors very quickly and efficiently with the help of this software. It also has support for mutating variables. Perhaps, not to prolong the video, because I know you want to listen to the panel, I will show you the last example, a playful one, but showing the game with variables. The idea is that you can add to the fuser the functions that

change the variable a and b in some way. Let's compile it and see. Here is a more complicated example. Here this x should be larger than 33. It turns out that if we try to compile it on such an example without playing with variables, I have already compiled it before, when we start it straight, it will turn out that it is not necessarily possible to trigger this error immediately. Before the presentation I tried and the test case had more than 100 MB. It did fail, but the test case will be really huge. But if we use our magic shell, we will start fuzing for example at 100 MB We will see the order of the calls. For example,

100 in X. We will run fuzing on X again. It turns out that such a crash is easy to locate. If we want to see this test case, it has only 238 calls. But if we minimize it a bit, it will be much smaller. Some more notes. If you want to find errors, sanitizers are the absolute basis. You can also check out our 2015 Alligator presentation, where we show you the tricks, what to do and what not to do. And of course, I showed you here absolute marketing, because it was a perfect case. However, in many, many If you implement your targets, it will turn out that when a structure is allocated and initially initiated separately, these functions

must be glued together. If you create a variable but don't initialize it, it will be surprising if the program will fail. As for what we would like to do in the future, this slide has not changed for two years. because we are somehow short of time and maybe motivation. However, if you would like to help us, here is a whole list of ideas. I encourage you to two ideas. I would be very interested in solving such a problem that we have two libraries that do exactly the same thing and we check if the results are the same. For example, if we have OpenSSL and some TLS GNU or whatever, we check if, for example, in the library of

large numbers, these accounts are identical. There was a man who did it, but not in a universal way for SSL and Gcrypto. It turned out that these differences can be huge and even more dangerous. We are also dealing with the security of the kernel of NetBSD. I will also show you how easy it was to create a fuzzer for syscall in NetBSD, which generated about 100 or 200 unique reports. It looked like we had just made a file with definitions for sys.coli and it was even taken from the header file. Only, if necessary, we added this information somewhere when the function was successful and when not. It took only 5 minutes to compile the work and you can become a hero.

I would like to invite you to correspond with me. Of course, version 1.0 will not be released. Never. Here are the people who helped me during this journey and play with all this. Thank you very much for that. Do you have any questions? I understand that this was information that you don't have. Very good, because the panel is coming close in big steps. Which one will take place in five? I think we should give a moment to switch so that we can start at 4 p.m. Okay, 4 p.m. The experts themselves will ask hard-core questions. So, we invite you. Okay, I would like to thank you on my part and the organizers. For you, a certificate for hanging in the office with my signature. Thank you very much.

And a hand of applause from the President. Thank you very much.