Fuzzing Ruby and C Extensions

BSidesSF · 2018 · 33:10 · 388 views · Published 2018-04
Category: Technical
Style: Talk
About this talk
Claudio Contin - Fuzzing Ruby and C Extensions. Intro to fuzzing and specifics for the Ruby language, and the security implications of vulnerabilities that might be found (e.g. https://hackerone.com/reports/499). Intro to the AFL fuzzer, basic concepts of how it works, how to run it against the Ruby language, and how to potentially target gems with C extensions.
Transcript [en]

[Music]

So, a quick introduction about me. I think it's not changing slides. All right. So my name is Claudio Contin. I'm a security consultant and penetration tester based in Wellington, New Zealand. I come from a web development background, mainly backend development and quite a lot of JavaScript, pretty much everything but CSS. In my previous role I used to be a web developers' trainer as well, in terms of security, teaching the OWASP Top 10 and secure coding guidelines, and I have also made small contributions to the BeEF project and Gophish. Some fun facts to start with. This is my personal laptop at home. As you can see, my C key is a bit broken; that's why I make a lot of mistakes during my work, and I do Ctrl-C a lot.

As you can probably guess from my accent, I'm not originally from New Zealand. I'm actually Italian. I was born and raised in Italy and I moved to New Zealand around the end of 2007, and I have always been surprised by the similarity between the two countries, especially looking at the shape. This is actually my hometown back in Italy; it's north of Milan and it's kind of famous for ice cream. Whereas this is where I moved to: this is Wellington in New Zealand. It is well known for its bad weather and it's one of the windiest cities in the world as well. And talking about wind,

this actually was the window in my bedroom, which exploded from the inside out after a big wind gust early one morning. As you can see, the debris is down on the red car there; that's all part of the whole window, which kind of died that day. During these few years I lived in Australia as well and, yeah, I survived it; I made it out alive. So this talk is about Ruby. Ruby is a 25-year-old language. It is really well known for the framework Ruby on Rails. If you are in penetration testing you will know Ruby for Metasploit. Everyone loves Ruby, all this metaprogramming magic that makes

all those frameworks possible. So, the agenda for today: I will start by talking about Ruby gems and gems with C extensions, I will give a high-level overview of what fuzzing is, I will talk a bit about AFL, I'll talk about my setup to target Ruby gems with C extensions and potentially Ruby itself, I'll talk about some of the findings, and I will give a brief introduction to grammar-based fuzzing targeting languages, and Ruby specifically. So, what are Ruby gems? They are a way to create Ruby libraries, to put it simply. Rails itself is a Ruby gem. What happens when you want to install a gem? You type gem install and the name of the gem, and what happens on the

network level is that your machine will contact the RubyGems API, check what dependencies the gem you want to install has, download everything it needs and install it locally. The other way to install gems is to use a Gemfile and Bundler rather than typing gem install for all gems. And the way to use gems is just requiring them in your code base, simple as that. When you install a gem with a C extension you often see this message. What it means is that your machine is missing some libraries that the C extension needs. For example, if you are installing the mysql2 gem you will need the libmysql development libraries installed on your box. So why would a developer want to write C

extensions? Mostly for speed. Ruby, the language itself, is not the fastest language out there, so if you have some specific area where you are concerned about performance you might consider writing it in C. The other reason is to reuse existing C libraries and import them into Ruby; we'll see that one of the findings is about this specific case. And of course, whenever you write C you need to be careful with memory corruption, memory leaks and all this kind of stuff. So this is a really simple C extension. Of course you need to include the ruby.h header file, and for example this would be my

hello C extension class. By default you need to declare an Init function, Init underscore followed by the name of your extension, and as you can see rb_define_class will define a Ruby class Hello and rb_define_method will define the method world on the class Hello. Simple as that. So when you create a Ruby gem with a C extension you need to create an extconf.rb file where you specify compilation options for your C extension. The two main lines, the one requiring mkmf and create_makefile, are actually the only two required lines; everything else is compilation options you can specify on top of the defaults.

After you create your C extension and your extconf, you invoke your extconf.rb file using Ruby itself, which generates the Makefile, and by having the Makefile you just compile your extension with make, which generates a shared object at the end. To use it you can do require or you can do require_relative. The difference between the two is that require_relative will look for the Ruby file or the shared object relative to the current file's path, whereas require will look in the load path of the Ruby setup on your machine. And to use it, you just use a normal Ruby class. So this is the structure of a typical gem: the lib directory contains the Ruby code.
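As a rough sketch of the layout and the two required extconf.rb lines described above, the following Ruby script writes a throwaway gem skeleton into a temporary directory. The gem name hello, the file names and the write_skeleton helper are assumptions made for illustration (the talk only names the Hello class); the extconf.rb content is written out as text and is not executed here.

```ruby
require "tmpdir"
require "fileutils"

# The two lines every extconf.rb needs; anything else is optional
# compilation tuning on top of the mkmf defaults.
EXTCONF = <<~RUBY
  require "mkmf"
  create_makefile("hello/hello")
RUBY

# Hypothetical layout for a gem called "hello" (name assumed for illustration).
def write_skeleton(root)
  files = {
    "lib/hello.rb"         => %(require "hello/hello"\n), # Ruby side loads the shared object
    "ext/hello/extconf.rb" => EXTCONF,                     # generates the Makefile
    "ext/hello/hello.c"    => "/* C source with Init_hello goes here */\n",
    "Rakefile"             => "# build and test tasks\n"
  }
  files.each do |relative, content|
    path = File.join(root, relative)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, content)
  end
  files.keys
end

Dir.mktmpdir do |root|
  write_skeleton(root).each { |f| puts f }
end
```

Building would then be a matter of running ruby extconf.rb followed by make inside ext/hello, as described in the talk.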

The ext directory contains the native C code, and everything else is tests and the Rakefile. So, API fundamentals to know about writing C extensions: the API discourages the use of malloc and free. Just keep in mind the API only discourages you; it does not prevent you from using malloc and free. Leave freeing to the Ruby garbage collector. And one of the main key things within C extensions is this VALUE type. Everything in Ruby is an object, and in C extensions as well, so this VALUE type is actually a pointer to a Ruby object within your extension. The API never allows you to access Ruby objects directly; instead it gives you a pointer to these objects, and to

understand what type of object this VALUE is you can use rb_type and see whether it is a string and so on. So why did I look into this? When I started doing pen testing as a full-time job, I always felt that binary was one of the main areas to concentrate on, so I wanted to learn more about exploit development itself, and active learning is one of the main reasons I wanted to do this. And just looking at bounties out there, I realized that you can still find bugs; you don't have to write exploits necessarily, you can still contribute by finding bugs, to report

the bugs and contribute to the community, so exploit development is not necessary if you want to do this kind of stuff. I chose Ruby because I have worked with Ruby for more than seven years, so it was an easy choice. And again, Ruby itself, and mruby within Shopify Scripts, is already part of bounties, but I couldn't find anyone doing much research on C extensions; that's why I dug into this area a bit more. So let's talk a bit about fuzzing now. The main reason people do fuzzing is because in code reviews it can be quite hard to detect bugs, often because of the complexity of the code. The main purpose of

fuzzing is to fire a lot of inputs at the binary and make the binary crash; that's really what fuzzing is at a high level. The core analysis is the post-crash analysis of why the crash happened with this kind of input in the first place. And fuzzing itself typically requires little effort in most cases. Fuzzing is mainly divided into two main categories: dumb and smart fuzzing. Dumb fuzzing is where the input is randomly generated by the fuzzer; the fuzzer has no knowledge at all about the binary you're targeting. Whereas, in contrast, smart fuzzers really target a specific binary, and they typically take more effort to set up. So in terms

of types of fuzzers, as I said, they're mainly divided by how the input is generated. A mutation-based fuzzer is where you provide sample inputs to the fuzzer and the fuzzer will mutate those and target the binary, whereas generation-based fuzzing is where the fuzzer knows more about the binary you're targeting and generates inputs itself. Evolutionary, feedback-driven fuzzers are more similar to mutational fuzzers, and they use instrumentation to understand whether a new input reached some new code path. There are some new experimental types of fuzzer called transformation-based fuzzers. The idea is: when you have some input checks in your binary, let's say you have a CRC check on the

input you provide to the binary, a mutational or dumb fuzzer will never be able to pass this input check. So the theory of transformation-based fuzzing is that the fuzzer will mutate the binary itself, temporarily disable these particular checks and carry on fuzzing with random inputs. Of course this can generate more false positives: as you can imagine, if you remove the input check and you get a crash, you still need to validate that the crash can be replicated by putting the input check back. For this I used AFL, which is American Fuzzy Lop. It was developed by Michal Zalewski, an ex-Google employee, and AFL works with compile-time instrumentation.
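To make the mutation-based idea concrete, here is a toy sketch in Ruby: it takes a sample input and applies one random byte-level mutation. This is not how AFL is implemented; the mutate helper and its three operations are assumptions chosen purely for illustration.

```ruby
# Apply one random mutation to a copy of the seed (the seed itself is untouched).
def mutate(seed, rng)
  bytes = seed.bytes
  return [rng.rand(256)].pack("C*") if bytes.empty?

  pos = rng.rand(bytes.size)
  case rng.rand(3)
  when 0 then bytes[pos] ^= (1 << rng.rand(8))   # flip a single bit
  when 1 then bytes.insert(pos, rng.rand(256))   # insert a random byte
  when 2 then bytes.delete_at(pos)               # drop a byte
  end
  bytes.pack("C*")
end

rng = Random.new(1234)            # fixed seed so runs are repeatable
seed = "a,b,c\n1,2,3\n"
5.times { p mutate(seed, rng) }
```

A dumb fuzzer would stop here and simply throw each mutated input at the target; a feedback-driven fuzzer additionally keeps the mutated inputs that reach new code paths and mutates those again.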

The main feature is that it can detect when a new code path has been reached by the input you provide to the binary, and it really works best with reduced test cases: as you can imagine, if the binary has to parse the input you provide, the smaller the input is, the faster the fuzzing process will be. One of the features of AFL is dictionaries. Let's say you're fuzzing SQLite or any kind of specific language: you can specify a set of keywords that the fuzzer will use to generate inputs, rather than generating completely random inputs. Another feature of AFL is that it can distinguish between unique crashes, which

is really helpful when you get thousands of crashes and at the end of the day it is really one unique crash; it really makes it easy to remove all the noise. When you use AFL you can use the deferred fork server or persistent mode, which are ways to speed up the fuzzing itself. How deferred mode works: let's say you have a piece of code that does a lot of initialization. You can actually patch the binary and compile it, saying: hey AFL, when you start fuzzing a new process, please do not reinitialize the whole process flow from this point until

this point, and fork the process from this state, and you will have faster fuzzing. Persistent mode is a similar idea, where you use a single process to iterate through multiple inputs instead of one process per input. You can use parallelization: by default one AFL process uses one CPU, so when you run AFL on multiple cores you can spread across multiple CPUs, and across multiple hosts as well over the network. If you don't have the source code you can use QEMU and some other projects that allow you to instrument binaries without source code. And AFL provides some exploitability checks as well. So how does AFL do instrumentation? This is a

really high-level view of the instrumentation. It inserts some code within your binary to notify the fuzzer if a new input has reached a new code path within the binary. How does it do it? It works on the intermediate assembly generated when you compile using GCC or Clang, and if you're curious to know how it does it you can set AFL_KEEP_ASSEMBLY to 1 at compilation time and it will keep the intermediate assembly. So now, I wanted to target the extensions. I could compile the C extension itself with AFL, but that wasn't really ideal, which meant I would have to write each fuzzing test case manually to target each particular piece of logic

in the C extension, and one thing to note is that often gems with C extensions only have a small component written in C whereas the rest is written in Ruby. So why couldn't I just use Ruby? Because AFL wants the binary you are calling, which in this case would be Ruby itself, I would just compile Ruby itself with AFL to start with. And this would be my fuzzing input: I would require my extension, open the input file provided by AFL and fuzz the extension, as simple as that. In order to understand if my setup was working, I created a really simple vulnerable C extension where I wanted to prove that the whole setup was working.
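A minimal sketch of such an entry point, assuming AFL passes the path of the generated input file as the first command-line argument (the @@ placeholder on the afl-fuzz command line). The standard library JSON parser stands in for the gem under test purely so the sketch runs anywhere, and the fuzz_one helper name is invented here; in a real setup the require and the parse call would be swapped for the C extension being fuzzed.

```ruby
require "json" # stand-in target; replace with the gem that ships the C extension

# Feed one input to the target. Ruby-level exceptions are expected for
# malformed data and are swallowed: only a crash of the interpreter itself
# (for example a segfault inside native code) is interesting to the fuzzer.
def fuzz_one(data)
  JSON.parse(data)
  :parsed
rescue StandardError
  :rejected
end

# Run only when an input file path was actually supplied.
if ARGV[0] && File.file?(ARGV[0])
  fuzz_one(File.binread(ARGV[0]))
end
```

Keeping the harness this small matters because the whole script is re-run for every generated input, so anything beyond loading the extension and calling it slows the fuzzing down.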

Within a minute I got the crash, so I was pretty confident everything was good to go. So, learn from mistakes: one thing to keep in mind is that AFL does not give you the output of the binary. When you fuzz some binary with an input with AFL you don't know what the result is; you only know whether the binary crashes or not. When I was running this against Nokogiri, which is the famous XML parser in Ruby, and as I said it's also a big code base, I wasn't getting anything out of it. AFL kept telling me that it couldn't find any new code paths, and, stupidly enough, I

realized I wasn't giving enough memory to AFL. So yeah, learn from mistakes here. Final tweaks: I could run all of this on my base OS, which is probably a bad idea. I also wanted to maybe target different Ruby versions and use different compilation options for AFL, for example use Address Sanitizer. I could use VMs, but VMs are a bit of a pain to copy and clone and stuff like that, so I ended up creating a bunch of Docker containers, which are all available as well if you're interested, and that was really my setup in terms of machines. So this is what I used for the

deferred fork server: obviously the main.c of the Ruby code. This is the source code of Ruby, and what I did is skip all the initialization code of Ruby and say: when AFL wants to start fuzzing a new process, just start from this point; it will make fuzzing faster. My Docker setup is really simple: I just install all the dependencies I need, download and compile AFL, set environment variables to tell it to compile everything with AFL, download Ruby, patch the Ruby main.c as I just showed before and compile it. So this is what it looks like: I'm inside my Docker container, I install a gem with a C extension, I create my fuzzing entry point and I create a

simple test case.
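The steps above can be sketched in Ruby as follows: write one small seed test case into an input directory and assemble the afl-fuzz command line. The directory names, the harness file name, the seed content and both helper functions are assumptions for illustration; -i, -o, -m and @@ are standard afl-fuzz options, and -m none lifts the memory limit mentioned earlier (a specific value in megabytes is the more cautious choice). The command is only printed, not executed.

```ruby
require "fileutils"
require "tmpdir"
require "shellwords"

# Create the input directory with a single, deliberately small seed.
def prepare_corpus(dir, seed)
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "seed.txt")
  File.binwrite(path, seed)
  path
end

# Build (but do not run) the afl-fuzz invocation. "@@" is replaced by AFL
# with the path of each generated input file.
def afl_command(in_dir:, out_dir:, ruby:, harness:, memory: "none")
  ["afl-fuzz", "-i", in_dir, "-o", out_dir, "-m", memory.to_s,
   "--", ruby, harness, "@@"]
end

Dir.mktmpdir do |work|
  in_dir = File.join(work, "in")
  prepare_corpus(in_dir, "a,b\n1,2\n")
  cmd = afl_command(in_dir: in_dir, out_dir: File.join(work, "out"),
                    ruby: "./ruby", harness: "fuzz_entry.rb")
  puts Shellwords.join(cmd)
end
```

Here ./ruby stands for the Ruby binary compiled with AFL instrumentation, which is what AFL actually executes for every input.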

And all right, the fuzzing is running, as simple as that I guess. So all the setup was done; now, to start actually looking into something, I didn't really know what to fuzz to start with. I wasn't aware of too many C extensions, so I found out about this Ruby Toolbox website that collects a lot of nice gems, so I picked a few C extensions from there, and I started targeting mainly parsers. This is the hardware I used, which is my own personal laptop, which is not the best idea, using a laptop, but, you know, nothing else. My first target was BC CSV, which is a pure C CSV parser. I was still trying various AFL options while running

this, and within 20 seconds of running AFL on it I got a crash. I thought that something was wrong with my setup, because having a crash in 20 seconds is probably uncommon, but instead it was actually a real case. After doing some root cause analysis in the code base I realized that there was a double free condition: under the condition where a line was a backslash, it was freeing the line and then freeing the same line again afterwards. In this case I submitted a fix to the maintainer and it was merged without any changes, which is surprising considering I'm not a C programmer. And so I want to give a really quick

overview of a double free on the heap. When you free a chunk, in glibc it will end up in a bin, usually a doubly linked list, and when you free the same chunk again, the same chunk will end up in the doubly linked list again. What that means is that the first time the chunk is reallocated, the user gets the pointer to the memory address while the same chunk is still in the bin. So you are actually able to control the pointers of the chunk within the bin. Traditionally you could exploit this by changing the

forward pointer and backward pointer of the list, writing arbitrary memory addresses, and achieve code execution. On modern systems this is not really easy anymore because there is memory corruption detection built in. Of course this might not be the case if you use embedded systems or if you disable those checks. You wonder why you would disable those security checks; doing some research online I found some interesting things, with companies suggesting that you should remove those checks, which is a bit shocking. So instead of fixing the code base they were just suggesting: hey, just remove the check and carry on. Please don't do this; don't follow suggestions like that. So I moved on.
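The double free scenario described above can be illustrated with a toy model in Ruby. It is only a simulation of the bookkeeping, with a plain array standing in for a free list; real glibc bins, chunk headers and integrity checks are considerably more involved, and the ToyHeap class is invented for this sketch.

```ruby
# Toy allocator: chunks are just integer "addresses", and the free list
# hands back the most recently freed chunk first.
class ToyHeap
  def initialize
    @next_address = 0x1000
    @free_list = []
  end

  def malloc
    return @free_list.pop unless @free_list.empty?

    address = @next_address
    @next_address += 0x20
    address
  end

  # No check that the chunk is already free, which is the bug being modelled.
  def free(address)
    @free_list.push(address)
  end
end

heap = ToyHeap.new
line = heap.malloc
heap.free(line)
heap.free(line)            # double free: the same chunk is now listed twice

first  = heap.malloc       # handed to one user...
second = heap.malloc       # ...and handed out again to another
puts format("first=0x%x second=0x%x aliased=%s", first, second, first == second)
```

Both allocations return the same address, so two owners end up writing to the same memory, which is the starting point for the pointer corruption described in the talk.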

My next target was Ox, a fast XML parser. It has a lot of features; I won't describe them all. And this, on the last line, is one of the inputs that AFL generated after five minutes. In this case it was just a null pointer: the function was returning null, which was not expected by the code using it, and, nothing too interesting, the process just crashed. The fix by the maintainer was just to check whether this variable is null and if so exit gracefully rather than crashing the process. Then I targeted another functionality of Ox, which is a SAX parser, and within a few hours I got

another crash, and this one looked a bit more interesting. I got a stack-based buffer overflow; as you can see, the memory address has As up there, and running AFL with Address Sanitizer I got the confirmation that it was definitely a buffer overflow. A quick note: Address Sanitizer is a fast memory error detector, and it can detect this kind of memory corruption. The input that generated this was a bunch of carriage return characters followed by a bunch of As, and doing some quick debugging with gdb I noticed that the strncpy was using an n which was set to this huge number, and it was overflowing the destination

buffer in this case. I will probably skip this because I don't have enough time. All right, so after the first buffer overflow was fixed by the maintainer, another off-by-one error was still present in the code base in a different place, and the same input triggered the condition. What I did in this case was work with the maintainer: I provided the Docker container I was using to test this, and he promptly fixed the issue after a couple of days. After this I decided to move on to JSON: after having done CSV and XML for a while, why not try

some JSON? So I targeted this; it's called Yet Another JSON Library, which is actually an existing pure C JSON parser that has been ported to Ruby. This gem has a bunch of features; one of those is that it's supposed to be as close to crash-proof as possible, so I never thought I could find anything in this one, and I got a couple of crashes within a few hours as well. This was my typical fuzzing entry point, this is the input that AFL generated to crash the binary, and the output I got is this one: I got "bad, should never happen". So yeah, I'm not sure

why that happened. And a second assertion failed as well from a similar input file. I'll skip some of this, but just a quick note on what assert is: it's not a mechanism for handling errors at runtime, and it is often disabled when the code is actually shipped for production use. What I did in this case: I reported the issue on GitHub as usual and I provided the reproduction steps, and the fix was released. The maintainer actually contacted me directly and said: can you please make sure that next time you disclose this privately? I was a bit surprised, considering that it was a public project; I had just opened an issue on

GitHub. I realized that this guy was actually a development manager at GitHub, and this gem had been downloaded 21 million times from RubyGems. It's also used in some Debian and Ubuntu packages, so you can actually apt-get install it, which I never knew about. And a few weeks later GitHub released this, which is good timing. How it works: if you enable the dependency graph on your Ruby or JavaScript project, GitHub will tell you if some of your dependencies have known vulnerabilities, so it's pretty cool. You can also set up various options, like getting notified by email every time a new vulnerability comes up, something like that. And this is what it

looks like in the interface. This blog post came out from GitHub a few days ago, actually, and they reported some statistics about this feature: they say that across 500,000 repositories they found four million vulnerabilities in dependencies, and 450,000 of them were already resolved by project maintainers. To put all of this in context, I set up a simple Rails application that includes all these vulnerable gems, and the thing I want to emphasize is that even though some of those might not be exploitable to achieve remote code execution, they all cause process crashes, and the process in this case will be your

application server for your app, right? And it's often really hard to track down those issues because there is no information in the nginx logs or whatever web server logs you use, and nothing may be logged in your application logs. In my setup I used Puma and Passenger, to try out a few different options for crashing application servers, and Puma does not log by default. Also, by checking some open source Rails apps, I noticed that the Puma configuration just doesn't include logging by default. So what I did: I ran the app and replicated the crashes within the app. In the case of the CSV one, nothing was present

at all in the Puma logs: the application server was crashing and there was no trace whatsoever anywhere in the logs. In the case of Ox there was some logging in there, which is good. In the case of the buffer overflow there was some logging, but there was nothing really useful to understand why and what was going on, and in the case of the JSON one it was clear enough as well. The good thing about Passenger is that it logs by default, and all the backtraces of these crashes were present, so again that would probably help you in case you have to dig into this kind of issue that you might

experience yourself. A quick note about memory leaks: a memory leak is when memory gets allocated and is never freed. I will skip some of this stuff, but I want to mention this because I experienced it myself. When I was working as a dev we used to get, about once or twice a day, some really weird exceptions in our code base that didn't make any sense. Every time the exception was showing something totally different, and it didn't point to any part of our code base that actually gave an idea of what was going on, and that turned out to be an actual memory leak in Nokogiri, really libxml specifically. This is actually the same

case that Envato experienced. Envato is an Australian company; they have a very big Rails code base, and they wrote a blog post about this and made really good points. They spent months and hours looking into this, and you can imagine it would be a huge cost for a company to track down all these exceptions. One last good point to make is that the application server should never segfault; if it segfaults, that should be the first priority for everyone to look at. So what's next? I realized there were not so many C extensions out there, or anyway nothing new that was being pushed out or maintained

anymore, so I started looking into fuzzing Ruby itself. Of course, when you use AFL against Ruby it's good, but if you want to fuzz the actual language itself rather than the parser part of the language, the AFL input will be something that is syntactically incorrect, so Ruby will never even run the test input generated, right? That's why I started looking into this concept of grammar-based fuzzing, and I looked specifically at this ANTLR tool, which has been developed by a professor at the University of San Francisco. The main idea of ANTLR is creating grammars: you can create parser and lexer rules. I'll skip this a bit. And this is what

a Ruby lexer rule will look like at a high level, and this is a Ruby parser rule; let's say a function definition will look something like this. I'll skip this. And there is a really nice tool called Grammarinator, which was developed by Renata Hodovan from a university in Hungary, which uses ANTLR to generate syntactically correct random tests to actually fuzz languages. It is based on ANTLR and it creates a test generator in the Python 3 language. To give a quick overview, it is really simple to use: once you set up your grammar you can just invoke this tool, which generates a Python parser and lexer, which will then be used to

generate your random inputs. This is a really quick output of this tool used against Ruby: a bunch of definitions, and one interesting thing to note is that this code is actually syntactically correct. So, some final thoughts. You don't need to be an expert; I wasn't an expert when I was looking into this, which was actually my first time doing fuzzing. You don't have to have the best hardware; as you can see I used my own personal laptop. Once the initial setup is done it can be semi-automated. I believe that if you have some kind of C code base within your organization you should look into

fuzzing and doing continuous fuzzing against it, and I believe that fuzzing cannot replace code reviews, of course. And one thing I would like to note is that maybe for public projects hosted on GitHub, GitLab, whatever, maybe those providers could allow opening issues privately rather than reporting vulnerabilities in public. Yeah, that's probably about it; I'm out of time. So thanks. [Applause]
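As a closing illustration of the grammar-based idea from the last part of the talk, here is a toy generator in Ruby. It is unrelated to how ANTLR or Grammarinator are implemented: the GRAMMAR table is a tiny hand-written subset invented for this sketch, expanded by picking random productions. Ripper from the standard library is used only to confirm that the output parses.

```ruby
require "ripper"

# Symbols are non-terminals, strings are emitted literally.
GRAMMAR = {
  program: [[:definition], [:definition, "\n", :program]],
  definition: [["def ", :name, "(", :name, ")\n  ", :expr, "\nend"]],
  expr: [[:number], [:name], [:expr, " + ", :expr], ["(", :expr, ")"]],
  name: [["foo"], ["bar"], ["baz"]],
  number: [["0"], ["1"], ["42"]]
}.freeze

# Expand a symbol; past max_depth always take the first (shortest) production
# so that the recursion terminates.
def generate(symbol, rng, depth = 0, max_depth = 6)
  return symbol if symbol.is_a?(String)

  options = GRAMMAR.fetch(symbol)
  production = depth >= max_depth ? options.first : options[rng.rand(options.size)]
  production.map { |part| generate(part, rng, depth + 1, max_depth) }.join
end

source = generate(:program, Random.new(7))
puts source
puts "parses: #{!Ripper.sexp(source).nil?}"
```

Unlike byte-level mutation, every output here is syntactically valid, so the interpreter gets past the parser and actually executes the generated code, which is the property the talk highlights.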