
LightBulb Framework: Shedding Light on the Dark Side of WAFs and Filters

BSides Athens · 2017 · 29:27 · 70 views · Published 2017-10
Category: Technical
Style: Talk
About this talk
Security BSides Athens 2017 (24/Jun/2017)

You can imagine them as, for example, single signatures, which are strings or text patterns describing a specific attack. This is a rule from PHPIDS, which is a very popular open-source web application firewall, for detecting SQL injection. You can see the expression format here. There can also be rules that are logical expressions of operators and rule variables. These are more flexible than the plain signatures because, apart from the content of the untrusted input, they also examine other attributes, such as the character distribution or the length of the untrusted input, and they can combine many of the previous signatures. This is a rule from ModSecurity, another popular web application firewall. And there can also be virtual patches, or what we call

just-in-time patches, which are specific patches for known vulnerabilities that we haven't yet been able to fix in the code. So we just have a rule to protect our application until the development team is ready to fix it and ship a new release. Now, to get a feeling for how difficult it is to find an attack by examining the rules, or generally by attacking a WAF: if we take all the signatures that PHPIDS has, which is an open-source WAF, and create a state machine from these rules, it will have more than 420,000 states. So you can imagine it is a huge state machine. Another attribute of the WAF rule sets is that sometimes they are shared between different software. For example,

PHPIDS, ModSecurity, and the Expose framework all use the same rule sets. And they are also being used by a lot of other software, such as the Atlas software.
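To make the single-signature idea concrete, here is a minimal sketch in Python of how such rules work. The regexes are simplified stand-ins written for this example, not the actual PHPIDS rules (the real rule file contains hundreds of much more elaborate expressions):

```python
import re

# Hypothetical, simplified signatures in the style of PHPIDS single rules.
SIGNATURES = [
    r"(?i)union\s+select",          # classic SQL injection
    r"(?i)<script[^>]*>",           # reflected XSS
    r"(?i)\bor\b\s+\d+\s*=\s*\d+",  # tautology such as "OR 1=1"
]

COMPILED = [re.compile(s) for s in SIGNATURES]

def is_blocked(payload: str) -> bool:
    """Return True if any signature matches the untrusted input."""
    return any(r.search(payload) for r in COMPILED)
```

A rule set like this is what gets compiled into the huge state machine mentioned above: conceptually, all the signatures are unioned into one automaton that scans each request.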

So the question is, since these are so complex and such huge state machines, why is it possible to find a bypass? Why do bypasses exist, and what are we looking for? It can be just simple bugs. For example, sometimes these WAFs have limitations on the size of the input, on the length of the input that the user sends, or on the protocols or content types that they support. As a result, an attacker can, for example, just add padding to a payload and create a bypass. This is the simplest approach.
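The padding trick can be sketched in a few lines. The inspection limit and the signature here are hypothetical, chosen only to illustrate the class of bug:

```python
import re

SIGNATURE = re.compile(r"(?i)union\s+select")
INSPECT_LIMIT = 64  # hypothetical: this toy WAF only scans the first 64 bytes

def waf_blocks(payload: str) -> bool:
    # Only the truncated prefix is matched against the signature.
    return bool(SIGNATURE.search(payload[:INSPECT_LIMIT]))

attack = "1 UNION SELECT password FROM users--"
padded = "A" * INSPECT_LIMIT + attack  # padding pushes the attack past the limit
```

The raw attack is caught, but the padded one sails through, because the malicious part never reaches the matcher.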

Another really fun issue that we found out is that, because of this sharing, sometimes things go wrong. Very wrong. For example, the rule sets are shared: PHPIDS and Expose share the same rule set, as I told you before. However, PHPIDS has a normalization step where single quotes, double quotes, or any kind of quotes are transformed into double quotes. Then, in the rule-set matching step, there is only one rule, one signature, and it contains only double quotes. Now imagine what happened with Expose: they took this rule set, but they forgot to also include the normalization step. We found out that we could bypass this WAF by simply using single quotes instead

of double quotes in the possible attack payloads.

Another reason that bypasses exist is that some critical components of the WAFs are not updated to the latest version, or are not synchronized with the latest release. For example, we found a bypass for ModSecurity with the CRS. So we reported it to the guys who maintain the project, and they started looking and trying to find out why this bypass was possible. They figured out that they were using a forked version of the libinjection library instead of linking against the latest version, in which the issue was already fixed. So in other products that were using the same component, the bypass was not possible. But in this product, because they were using an old, forked version, the bypass was still possible.

Now, the real fundamental reason is that the signatures will always be insufficient and the rules will always be weak. And this is because it is very, very difficult to identify an attack without knowing the proper context of the web application and of the input that is sent by the user. It is very difficult to create such rules and signatures. So the scenario is this: you are a pen tester, you have a target, a web application, for example, that is protected behind one of these WAFs, and your task is to detect a vulnerability. What are we going to do in this case? OK, I think you can all imagine that the first thing you will do

is try to identify the web application firewall and search for known attack vectors. There are tools that can identify the WAF. Most of them use some artifacts, such as cookies and headers, that the WAFs leave behind. However, even if you find known attack vectors, if the WAF is already at the latest version, these attack vectors will be patched. Also, can you identify which version of which WAF it is? Most of the time, you can't, because the artifacts are common between different versions. For example, all the deployed versions of a WAF will use the same cookie value and the same header.

So okay, the next thing that you can do: you just gather all the possible attack vectors, create a collection, try to send all of these payloads and see what will happen. But again, you can't do this; you can't enumerate all the possible payloads. Just take one payload, like <script>alert()</script>. How many whitespace characters are you going to use between the script tag and the alert word? There is no way to set a limit here, so this is not possible; it would be like brute forcing all the possible character combinations. So the third approach, which is what we actually do when we have a similar case in anything, not only WAFs, is to use a fuzzer. Let's use AFL,

which is a well-known fuzzer. Let's try to create smarter attack payloads. But the thing is that this also won't work. And the reason is that fuzzers need feedback, an input that will tell them whether they are doing well or badly with the inputs they are generating. And since this is a black-box test, we don't know anything about the target, so it's very difficult to provide that feedback. Eventually, for large inputs, it would be like generating completely random strings based on the seed input that you give to the fuzzer. Three days ago, SpiderLabs, who maintain ModSecurity, announced that they are going to use AFL for quality assurance. So they

wanted to use AFL to test ModSecurity. And they said that they haven't yet managed to use AFL for the parsing feature of ModSecurity. And it's very important to notice: they cannot do this even though they are doing a white-box test. They have ModSecurity, they have compiled the code with debugging symbols, with everything, and still they cannot do this for the parser of ModSecurity. How many pen testers are going to manage this in a black-box test? So what we can do, what I propose you do, is to use LightBulb, which is a framework that offers you a novel and efficient way to bypass a web application firewall using automata learning algorithms.
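To put a rough number on the earlier whitespace argument: a back-of-the-envelope count, assuming six interchangeable whitespace characters and at most eight of them in the single gap between the tag and the keyword (both bounds are assumptions made for this sketch):

```python
# Count the variants of ONE whitespace gap in "<script>...alert":
# assume 6 interchangeable whitespace characters (space, tab, LF, CR, FF, VT)
# and anywhere from 1 to 8 of them in that single position.
whitespace_chars = 6
max_gap = 8
variants = sum(whitespace_chars ** n for n in range(1, max_gap + 1))
print(variants)  # 2015538 - about two million variants for one gap alone
```

And that is one position in one payload template; multiply across every gap and every encoding trick and blind enumeration is clearly hopeless.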

LightBulb offers three main things. First of all, it offers you a way to formalize existing knowledge on code injection attack variations using automata and context-free grammars. Secondly, it allows you to expand this knowledge using learning algorithms that will infer the specification of parsers and of the targeted WAFs. More precisely, we use the L* learning algorithm, a variation of it for symbolic automata actually, which is an active learning algorithm. An active learning algorithm means that it does not learn from a corpus of data, but it queries the targeted WAF at runtime, and the answers guide the learning algorithm. And finally, it offers language analysis operations that will cross-check the inferred models, for finding,

for example, bypasses, or for testing for inconsistencies in a model that was inferred. So, about the formalization of the code injection attack variations using regular expressions and automata: we decided to use automata because this was the most efficient way to create models from regular expressions, which right now are the industry standard for text patterns. All the WAFs that use text patterns usually have regular expressions like this one. So in order to create models, we transform them into automata, which are small state machines like the one presented here. We also use grammars, because they are more efficient to work with. So for example, if you have this SQL injection, a SELECT query,

we can create a grammar that will contain all the possible valid suffixes for this SQL injection. And we didn't create this grammar from scratch. We just searched the internet, found an SQL grammar specification, and ported it to work with LightBulb.
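A context-free grammar of valid suffixes can be sketched as a tiny random generator. This grammar is a toy invented for the example (LightBulb ships a much larger ported SQL grammar):

```python
import random

# A toy context-free grammar for valid suffixes of "SELECT a FROM t".
GRAMMAR = {
    "<suffix>": [["<where>"], ["<where>", "<order>"], ["<order>"]],
    "<where>":  [["WHERE", "<cond>"]],
    "<cond>":   [["a", "=", "1"], ["<cond>", "OR", "<cond>"]],
    "<order>":  [["ORDER", "BY", "a"]],
}

def expand(symbol, depth=0):
    """Randomly expand one non-terminal into a list of tokens."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal token
    # Past a depth limit, force the first (shortest) production so
    # the recursive <cond> rule always terminates.
    options = GRAMMAR[symbol] if depth < 3 else GRAMMAR[symbol][:1]
    out = []
    for sym in random.choice(options):
        out.extend(expand(sym, depth + 1))
    return out

print(" ".join(expand("<suffix>")))  # e.g. "WHERE a = 1 ORDER BY a"
```

Every string this generator emits is a syntactically valid continuation of the query, which is exactly what you want when probing a WAF with realistic injections rather than random noise.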

Now, LightBulb comes as a Burp extension. You can load it directly into the Burp Suite framework, the most popular tool for pen testers. It has a list of attack vectors already included that you can select and work with. Let's now examine the following scenario. We have a target WAF, and we want to bypass the WAF, or generally the filter, that protects our target. We have a very large number of potential XSS or SQL injection attack vectors, and we can define them as grammars or regular expressions. Actually, we may do something more clever: we can take the rule sets and the signatures of open-source WAFs, which already contain models of attack vectors like this, and use them against other targets, since we

have access to them; they are open source. So the first algorithm that we will see that LightBulb supports is GOFA. The main idea of this algorithm is to use the grammar to drive the learning procedure. As I told you before, LightBulb uses an active learning algorithm which performs queries on the targeted WAF. So it performs the queries and infers a WAF model. Let's assume that we gave an SQL grammar as input.
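Stripped of the automata machinery, the query-and-check idea behind GOFA can be sketched like this. Everything here is a stand-in: the regex plays the black-box WAF, and the hand-written list plays the candidates that the real algorithm would derive from the attack grammar and the inferred WAF model:

```python
import re

def waf_blocks(payload: str) -> bool:
    """Black-box 'WAF': we may only ask whether a payload is blocked
    (a membership query), exactly as in active learning."""
    return bool(re.search(r"(?i)union\s+select", payload))

# Hypothetical candidates standing in for payloads that belong to the
# attack grammar but not to the inferred WAF model.
CANDIDATES = [
    "1 UNION SELECT password FROM users",
    "1 UNION/**/SELECT password FROM users",  # comment instead of whitespace
]

def gofa_like_search(candidates):
    """Grossly simplified GOFA loop: propose a candidate bypass, then
    confirm it against the real WAF. In the full algorithm, a blocked
    candidate becomes a counterexample that refines the learned model."""
    for payload in candidates:
        if not waf_blocks(payload):
            return payload  # candidate bypass confirmed
    return None
```

The comment-for-whitespace candidate survives because the toy signature insists on literal whitespace between the keywords.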

After the grammar, it also inferred a model, and now it is ready to do the cross-checking. What it does is try to find a payload, a candidate bypass, for example, that exists in the attack-vectors model but does not exist in the WAF model that was inferred, because, as you can imagine, this may be a bypass for the WAF. We know what payloads the WAF tries to block, and we also have a model of the possible bypasses, so we compare these two and we try to find the bypass. It's very simple. After this, we send the candidate bypass to the firewall and check whether it actually works or not. It may not work, because the inferred model may be very small. So in this case,

if it is a bypass, we keep it; we succeeded, we profit. Otherwise, we just send it back to the algorithm as a counterexample in order to refine the model and repeat the same operation. So now, on the LightBulb extension: how easy is this to do? First of all, you intercept a request with Burp Suite. You just right-click on the request and forward it to the LightBulb framework. Then you have to select your attack model. As I told you before, LightBulb has a list of possible attack vectors that are ready to use. Here, I go to the learning tab, at the GOFA sub-tab, and I select the grammar, which is for

SQL injections. We also have a seed tab, which is just for making the algorithm faster. You can imagine that instead of inferring the targeted WAF model from the beginning, if we assume that, okay, the WAF should be roughly like this, we can give a seed that makes the procedure faster. Then we right-click on the forwarded request and select start filter learning. At the end it will return a payload that belongs to the attack-vectors model that we provided and is not blocked by the targeted WAF. This is a payload for PHPIDS; however, it is a quite useless payload right now. You cannot do a lot of things to an SQL

database with it. So the problem that arises after this is: OK, MySQL is probably parsing a different SQL dialect than MS SQL, so the grammar that I created may not be perfect. For example, now I got a payload, but it wasn't perfect; I couldn't use it for a direct attack. The same holds for HTML injections: browsers definitely do not parse HTML according to the standards. So even if I create, for example, a grammar for HTML attacks, for HTML injections, the browser may not work with it. Or maybe the browser also parses more things than this, so I will lose the extra possible attack vectors. Also, the WAFs are doing much more than a single regular-expression match. As I told you before, they have these extra

steps, tokenization and normalization. These would change the inferred model: a payload may seem to be a bypass in the model, but actually it is being blocked. So let's examine the initial scenario again. We have, once more, our target behind a WAF. But right now, we have decided that the available grammars and regular expressions that we have for creating these attack models are not enough, and not always good for finding vulnerabilities. What we expect as an attack vector, as a successful exploit, may be something that makes no actual sense: it deviates a lot from the HTML standard, but it is OK for the browser; Internet Explorer will parse such a payload just fine.

So what we can do is use the second algorithm that is offered by LightBulb, which is SFADiff. The main idea in SFADiff is: okay, if I can use a learning algorithm to infer the model of the web application firewall, why not use the same learning algorithm to infer the model of the browser, or of whatever the target is where we want to execute the payload? So we do again the same thing. We create two instances of the learning algorithm and we try to infer two models. And then we do the cross-checking between these two models. Instead of encoding our attack vectors in a grammar, this time we learn the attack-vectors model directly from the browser.
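The differential cross-check at the heart of this can be sketched with two toy black boxes. Both regexes are invented for the example; in SFADiff proper, both sides are automata inferred by active learning rather than known functions:

```python
import re

def waf_blocks(payload: str) -> bool:
    """Black box 1: the toy WAF only knows about script tags."""
    return bool(re.search(r"(?i)<script", payload))

def browser_executes(payload: str) -> bool:
    """Black box 2: the toy 'browser' executes script tags AND
    event handlers such as onerror."""
    return bool(re.search(r"(?i)<script|onerror\s*=", payload))

def differential(candidates):
    """SFADiff-style cross-check: look for inputs in the difference of
    the two inferred models - executed by the browser but passed by the
    WAF. Any such input is a candidate bypass."""
    return [p for p in candidates if browser_executes(p) and not waf_blocks(p)]

payloads = ["<script>alert(1)</script>", "<img src=x onerror=alert(1)>"]
```

The event-handler payload lands in the difference: the browser model accepts it, the WAF model does not cover it.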

Again, we generate candidate bypasses. If it is a bypass, we keep it. If it is not, we just refine our models. You can imagine the bypass that is generated here as a difference between the two models: a payload that is accepted by the browser's HTML parser but is not blocked by the WAF, for example. How can you use this in LightBulb? The approach, again, is simple. You just use the differential learning tab. You select here, for example, a seed. We propose that you use a seed, because if you use a grammar in this case and try to learn the attack vectors from the browser, you may end up with a very narrow scope on what you

have learned. So a seed is better here. And again, you start the differential learning, with a browser selected. If you wonder how this looks, how the LightBulb framework connects to your browser and tries to infer the model, it is something like this. It will pop up a message saying navigate to localhost at this port, for example. And when you go there, it will start driving the browser, using WebSockets, and trying to find, for example, a valid XSS payload. What it actually does here is infer the browser's HTML parser model.

Finally, in the campaigns tab, you can see the results, whether it found a bypass or not. In this case, for example, we found this bypass for PHPIDS. And also some statistics: the number of requests that it made to the browser and to the WAF, and the number of times it did the cross-check. You can see here that it did it 69 times.

The thing is, since we can use SFADiff to infer the HTML parser, what else can we do? Why not use it to do this with the browser filters, for example the XSS Auditor of Chrome? Or why not use it with an SQL parser, like MySQL's? LightBulb supports both of them. There are selections that will allow you to do the same procedure, for example with MySQL, and try to find a payload that passes the WAF and is accepted by the MySQL parser. Also, since we can find a difference between two models, why not use this to generate a distinguishing payload between two WAFs, and so fingerprint the WAFs? So we do the same procedure, but we change the second model to a second WAF,

and the bypass that we get is now just a difference, I mean a payload that one WAF will accept and the other WAF will block. Let's take the following scenario. We have, in a black box, a WAF that we cannot identify. We don't know which WAF this is. And we have a number of known WAFs.

The question is which of these is in the black box. We use the algorithm offered by the LightBulb framework to create differences between the first two WAFs. We test the difference, and depending on the answer, whether it is blocked or accepted, we exclude one of these two. And then we continue with the same binary-search approach until we end up with only one WAF. And this WAF is the one that is probably in the black box. By doing this you can also create a binary tree like this, with payloads that one WAF accepts and another WAF rejects. So this can also be

created offline. You create this tree once, and when you have to identify a WAF you just use these payloads. The LightBulb framework already has a list of generated trees that are ready for you to use. For example, you select the tree from the panel and you select Start Filter WAF Distinguish. But okay, since we can do this, why not generate your own distinguishing trees? The approach is simple again. You just forward more than one HTTP request to the LightBulb framework: for example, one request for PHPIDS, one for Expose, one for ModSecurity. You select all of them and you select Start Filters Distinguishing Tree Generation, and it will create such a tree for you. And, okay, since we can

create fingerprints for WAFs, what else can we do? We can also do this for browser fingerprints. The framework supports the ability to do the exact same approach, but now with the browser models. So it can generate, for example, a payload, a JavaScript or HTML payload, that one browser will understand and parse as valid HTML and another won't. You can use this payload, for example, to distinguish visiting users on your site, based on the parsing behavior of the browser that the user is actually using.
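The distinguishing-tree lookup described above can be sketched as a small decision-tree walk. The payloads, labels, and probe here are all hypothetical placeholders for a real precomputed tree:

```python
# Hypothetical distinguishing tree: each inner node holds a payload;
# walk left if the unknown WAF blocks it, right otherwise, until a
# single WAF label remains at a leaf.
TREE = {
    "payload": "1 UNION/**/SELECT 1",
    "blocked": {"payload": "<a href=x>", "blocked": "WAF-A", "passed": "WAF-B"},
    "passed": "WAF-C",
}

def fingerprint(node, probe):
    """probe(payload) asks the black box whether the payload is blocked."""
    while isinstance(node, dict):
        node = node["blocked"] if probe(node["payload"]) else node["passed"]
    return node
```

Because the tree is built offline, identifying the WAF at test time costs only a handful of requests, one per level of the tree.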

Now, we have found some other vulnerabilities using the tool. For example, using a grammar which extends the search conditions in SQL, we found payloads of the form or exists (select 1) that can be used for bypassing authentication queries. This is a very common query in web applications: they take two inputs, a username and a password, from the user, and they query the database to see whether this pair exists or not. Using the payload, we can create a query that is always true. And this affected very popular web application firewalls such as ModSecurity, PHPIDS, and Expose.
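The always-true authentication query can be demonstrated end to end with SQLite. The table, the vulnerable string concatenation, and the exact payload text are illustrative assumptions in the spirit of the talk's "or exists (select 1)" family:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES ('alice', 's3cret')")

def login(username: str, password: str) -> bool:
    # Vulnerable on purpose: user input is concatenated straight into SQL.
    q = ("SELECT count(*) FROM users WHERE username = '%s' "
         "AND password = '%s'" % (username, password))
    return db.execute(q).fetchone()[0] > 0

# Tautology payload: AND binds tighter than OR, so the WHERE clause
# becomes (... AND ...) OR EXISTS (SELECT 1), which is true for every row.
payload = "x' OR EXISTS (SELECT 1) -- "
print(login("admin", payload))  # prints True
```

The trailing comment swallows the closing quote, and the EXISTS clause makes the condition true regardless of the credentials supplied.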

Again, another authentication query; it's almost the same. We can use the same kind of condition, one or a equals 1, where instead of a you just have to use a valid variable or column name from the database. For example, here I guessed that it is probably admin, or something like it. This was a bypass for PHPIDS. Also, we found other payloads that can be used, for example, to fingerprint existing columns of the database. We found that and exists (select a) payloads can be used for this purpose. Say we want to fingerprint whether the email column exists in the database schema; we extend the

query like this, and the WAF won't discover the attack.
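The column-fingerprinting trick works because a reference to a missing column makes the whole query fail, which is observable. A minimal SQLite sketch, with an assumed schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, email TEXT)")

def query_succeeds(sql: str) -> bool:
    """Does the (injected) query run at all? That is the oracle."""
    try:
        db.execute(sql)
        return True
    except sqlite3.OperationalError:  # e.g. "no such column"
        return False

# If 'email' exists the extended query runs; if the guessed column does
# not exist, it errors out - an observable difference in the response.
probe = "SELECT username FROM users WHERE 1=1 AND EXISTS (SELECT email FROM users)"
bad   = "SELECT username FROM users WHERE 1=1 AND EXISTS (SELECT phone FROM users)"
```

Repeating this with different guessed names lets an attacker map out the schema one column at a time, without any signature-triggering keywords in the payload.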

Let's imagine a web application for authors and articles, which creates a query that retrieves data from both tables. We can extend this query with a join and also retrieve data from the users table, like the users' passwords. This was a bypass for ModSecurity and WebCastellum. Again, another bypass which can be used for fingerprinting variables and column names. And there is PROCEDURE ANALYSE, which for example can be used to get the inner workings of the database schema. This affected libinjection. Using SFADiff, we found vulnerabilities like the one I showed you before about the event handlers, in PHPIDS and the Expose framework. If you

check the rules, the signatures, that PHPIDS has, you will notice that rules 71, 27, 2, and 65 are trying to mitigate the risk

for these attacks, unsuccessfully. They have some slight variations in what they accept and what they don't. So if you use other payloads, they will probably be blocked, but these specific payloads are OK for them. So, future work: currently we are building many optimizations into the LightBulb framework. We initially made LightBulb for research purposes, so some of the code is not the most efficient, and we are now improving it. We also have a similar line of work for sanitizers. A sanitizer is a totally different thing; for example, it can be a simple encoding, like HTML encoding or URL encoding. And yes, you can use LightBulb to model the same attacks, but it is not the best tool you can use. So we are working on

learning algorithms that will also be able to handle such cases and have a better learning approach for this setting. Another thing that we are working on is trying to incorporate fuzzers to improve the attack models. Instead of letting the user decide which models, which attack vectors, he wants to use, we want to let the fuzzer take this decision. Generally, our vision is to try to establish a standard for auditing such products: a method that pen testers can follow each time they want to do an audit of a protected target, a target that is behind one of these WAFs. Also, we are open to contributions. If anyone wants to contribute, has an idea, and

wants to develop the idea, you are very welcome. So thank you. You can find LightBulb at this URL.