
Greetings, everyone, and welcome to my talk about hunting bugs in JavaScriptCore with CodeQL. Our journey begins when we wake up in the middle of nowhere, and this guy approaches us and starts talking: "Hey, you. You're finally awake. You were trying to find bugs in the realm of JavaScriptCore, huh? Walked right into that ambush, same as us, and that hacker over there. Who are you?" Before I introduce myself: if you don't recognize this reference, congratulations, you're not a complete nerd like I am. This is from Skyrim. So who am I? My name is Asafseon, and I work as a security researcher at Cyberlock. I enjoy CTFing and playing video games in my
free time. Here are my character stats: I'm level 24, of course. I'm somewhat skilled in CodeQL, but not so skilled in JavaScriptCore exploitation. My character's goals are to use CodeQL to find bugs in JavaScriptCore and, of course, to have fun. Finally, my character's state of mind: 100% positive and 100% excited. So without further ado, let's begin. During this talk we'll have three major checkpoints. The first checkpoint covers the fundamentals of CodeQL: we'll get a very good CodeQL introduction, we'll see some cool and basic examples with CodeQL, and there's even going to be a boss fight, so stay tuned.
In the second part of our talk we'll review JavaScriptCore as a JavaScript engine: we'll talk about how JavaScriptCore processes JavaScript instructions, what types of JIT engines there are in JavaScriptCore, and what side effects are in JavaScript. Finally, we'll talk about bad side-effect modeling bugs, a bug class that exists in JavaScript engines, and after that we'll find out how we can use CodeQL to find these types of bugs. Let's begin.

This is going to be a very short introduction to CodeQL, and I mean very short. This is me on the left side. What is CodeQL? For those of you who are not
familiar with CodeQL yet: CodeQL is a framework created by a company called Semmle, which is now owned and developed by GitHub. At its core, CodeQL is a query language, very similar to SQL, that lets a researcher reason about code using tools from logic. It has some cool syntactic sugar from object-oriented programming, like inheritance and class methods. You should know that the CodeQL CLI is the CodeQL engine, which executes CodeQL queries, and we can use the CodeQL CLI to turn a codebase into a database. It supports many programming languages, such as C++, Java, and many more. It has tons, I mean tons, of great
features for analyzing code, for example data flow analysis, which we're going to review today, but it has many more, and I encourage you to read about them later. When working with CodeQL, there is a general workflow we usually follow. First, choose an open-source project and make sure you can build it. After that, clean the solution and rebuild it, but this time using the CodeQL CLI. Next, select a bug class you want to discover, analyze it, understand that bug class very well, and understand how it might appear in the codebase you chose. Then write a query that finds the
bug class, and make sure it works by executing it against a prior, known-vulnerable version of the project you chose. That's a pro tip: make sure the query finds all the vulnerabilities of that same bug class in prior versions. Finally, execute your query against an up-to-date database and hopefully find new bugs. Query structure: each query consists of three parts, which are very similar to SQL. First we have the from clause (in case you're not familiar with SQL): this is where we define all the CodeQL elements we are going to use. Next is the where clause. Although this clause is optional, it is the very essence of your query.
This is where all the logic of your query lives, so although it's optional, it's practically mandatory if you want complex queries. Finally, there's the select clause, where we choose what our output looks like. Now that we've learned the general purpose of CodeQL, we can start our training. Okay, who is this lovely NPC? This lovely NPC says: "Hello there, do you mind helping me with some CodeQL queries?" Sure thing, let's help her. Our first quest requires us to find all the variable declarations in a given codebase. This is the first mini quest, so
this task is very simple. First, we import cpp, which tells CodeQL to use its standard library for C++. We don't need a where clause in this query, since we want to find all the variable declarations. So let's define the from clause: from Variable var. Variable is a CodeQL element from the C++ library; it represents a variable declaration from the abstract syntax tree that represents our code. Then we simply have to select var, which selects all the variable declarations. Amazing. So let's see. I use VS Code to run my queries, and
this is what the output looks like when executing that query against the database created from WebKit. We can even click the results and review them. Let's click, for example, host: it takes us to the actual declaration of host in the WebKit project, and indeed we can see that this is the declaration of host. Awesome. In our second quest we need to find all the calls to the malloc function. This is a bit more complicated than the previous one, but not by much. Okay, import cpp, which we already talked about. Now we know that we need to extract all the calls, so let's use the
CodeQL element that represents every call in our code: from FunctionCall callToMalloc. callToMalloc now represents a function call in our codebase. Before adding any constraints, let's select callToMalloc. If we ran this query as is, it would extract all the calls to all functions, so let's narrow it down by adding a where clause on callToMalloc.getTarget(). getTarget() returns the actual function that the function call element calls, and then all we have to do is use hasName. This way we make sure that the function being called has the name
malloc. Then we execute the query, and here are the results: there are over 500 calls to malloc in the WebKit project. Select one, click it, and here is the call to the relevant function, malloc in our case, of course. Okay, we have a boss fight: "You think you're clever, don't you? Find every expression that controls an allocation size." Every expression that controls an allocation size, okay. To win the boss fight we'll have to use some data flow analysis. Don't get intimidated by the phrase "data flow analysis"; it's very simple. We just need to find which expressions flow into the first argument of a call to malloc. Let me remind you: the
first argument of a malloc call is the size. malloc receives one argument, which tells it the size of the buffer to allocate, so we want to find every expression that flows into that first argument. Again, import cpp. The first part of the query is practically the same, except that we add a third element, Expr, which represents every expression in a codebase. It's very abstract and matches a lot, I mean a lot, but we're going to use it in a more precise manner later on. So that's the first part: right now the function call element
fc holds all the calls to malloc, like we did previously. Now let's use a predicate called localExprFlow. This way we can define the source and the sink of our data flow analysis. The predicate says: find all the expressions that flow into the first argument of fc. fc is a call to malloc, the first argument of fc is the size of that allocation, and we want to make sure that source represents an expression that flows into that first argument. If that's not clear yet, let's look at an example of what the results look like. In the left column we can see all the expressions that control the
allocation size of a call to malloc, and in the right column we can see the corresponding malloc call. Let's click one of the results. Here we chose a call to JSStringGetMaximumUTF8CStringSize that flows into the size argument of the malloc call. Let's click the result, and this is indeed what it looks like: the value returned by that function flows into the variable jsize, which is then passed as the argument to the malloc call. Awesome. I think that was it: that was the first part of our talk. Congratulations, that's the first checkpoint.

Let's move on to the second part of our
journey by entering the realm of JavaScriptCore. That's me: hello there. So what is JavaScriptCore? JavaScriptCore is the JavaScript engine for WebKit, meaning each JavaScript instruction executed via Safari is actually handled and executed by JavaScriptCore. Before we dive into the fundamentals of JavaScriptCore, I want us to learn a little bit about side effects in JavaScript; you'll understand later why we have to do it now. So what are side effects in JavaScript? A JavaScript operation causes side effects if it modifies the state of variables outside the local environment of that instruction. That's a long-winded definition, so let's see a more concrete example.
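The example walked through next can be reproduced in a few lines of TypeScript (which runs as the same JavaScript). This is a sketch, not the slide's exact code: the names plain, sneaky and sideEffects are illustrative, and the side effect is recorded in an array rather than only printed, so that it can be checked.

```typescript
// Collects side effects so they can be inspected after the concatenations.
const sideEffects: string[] = [];

// No side effect: the empty object is converted with the default
// Object.prototype.toString, which yields "[object Object]".
const plain = "hello" + {};

// An object with its own toString: the "+" below ends up calling it.
const sneaky = {
  toString(): string {
    sideEffects.push("side effect here"); // code unrelated to the concatenation runs here
    return "BSides";
  },
};

const a = "hello" + sneaky;

console.log(plain); // hello[object Object]
console.log(a); // helloBSides
console.log(sideEffects); // [ 'side effect here' ]
```

Nothing here is a bug: the engine is simply required to call user code while converting sneaky to a primitive.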
Let's review it. This is an example of not triggering side effects in JavaScript: concatenating a string with an empty object. I'm sure you know the answer: a now holds the string "hello[object Object]", since an empty object is converted to the string "[object Object]". But, and this is very interesting, concatenating a string with an object that has a function property named toString sets the value of a to something completely different. The toString function does two things.
First, it prints the string "side effect here" to the console, and then it returns the string "BSides". Concatenating "hello" with this object triggers the call to toString, sets the value of a to the string "helloBSides", and of course prints "side effect here" to the console. This is where the side effect occurs: the mere fact that we were able to execute a different function, one that has nothing to do with the concatenation, means that we managed to trigger side effects in JavaScript. Now, I want you to understand that this is perfectly normal. It's not a bug, it's not, I don't know, a
security flaw, nothing like that. Side effects are perfectly normal: they're part of the way JavaScript works, and they're crucial to it. Okay, keep that in mind. Now let's dive into the implementation of jsAddNonNumber. jsAddNonNumber is a function written in C++ that's part of the JavaScriptCore engine. When we use the + operator on values that aren't both numbers, JavaScriptCore executes C++ code, and that code is the function jsAddNonNumber, which implements adding non-numbers in JavaScriptCore. First, jsAddNonNumber checks whether the first argument is a string. Let me remind you, the first argument
is the string "hello", and it is a string, so execution continues. Next it checks whether the second argument is not an object. Let me remind you, the second argument is indeed an object, so this check fails. When it fails, jsAddNonNumber calls a different function, jsAddSlowCase, which calls the toPrimitive method. toPrimitive is first called on the first argument: it tries to convert the first argument, the string "hello", into a primitive. A primitive in JavaScript is a value such as a number or a string, and the first argument is already a string, which means it's already a
primitive, so nothing happens here. Then it tries to convert the second argument to a primitive, and this is where the interesting part happens. When we concatenated an empty object, we got the string "[object Object]" back, meaning the call to toPrimitive converted the object to that string. But now the object has its own function called toString. toPrimitive looks up the JavaScript property toString (after valueOf, which for a plain object just returns the object itself), and if that property exists and is a valid function, JavaScriptCore calls it. That function is one we completely control, and this is where the side effects
are triggered.

Okay, now that we've learned a little bit about side effects, we can focus on JavaScriptCore as promised. To understand the guts of JavaScriptCore, we should first be familiar with the differences between JavaScript and C. C is compiled ahead of time, while JavaScript is interpreted and can sometimes be compiled by a just-in-time compiler. Also, C is statically typed, meaning we must declare the argument types and the return type when writing a new function. JavaScript, on the other hand, is a dynamically typed language, meaning we don't have to declare anything when writing a new function, and the same function can handle whatever types it wants. Finally, C gives us low-level control, such as direct memory management and even
control over the stack or heap layout, but JavaScript is pretty powerful for everything else. And JavaScript can go fast: this is where JIT compilation comes to the rescue. There are four tiers an instruction can live in. The first tier is the Low Level Interpreter (LLInt): basically, the JavaScript instructions are compiled into bytecode, which is not actual machine code the processor can execute but bytecode that the JavaScript engine executes. Then we have the Baseline JIT. It has no additional logic regarding the relations between instructions; it simply takes an instruction and finds a matching template, and there are plenty of templates in the
JavaScriptCore codebase. For example, if we want to add two numbers, it takes the template for the add operation and produces the machine code that represents that very operation. Next we have the data flow graph (DFG) JIT compiler. First, JavaScriptCore turns a given piece of code into a data flow graph (we'll see what one looks like in a moment), then it applies several optimizations to that graph depending on the relations between its nodes, and finally it compiles the data flow graph using the JIT compiler.
And the final tier is the FTL (Faster Than Light) JIT, which focuses heavily on optimizing the compiled code: it goes beyond the optimizations the DFG JIT applies and applies every optimization it can to a given piece of code, without regard for how long that takes. Okay, let's see an illustration of how JavaScriptCore creates a data flow graph for a simple function. Here we have foo. foo is very simple: it takes an argument and returns the product of the argument's first indexed property and the property y of this. Now, in order to compile the function
using the DFG JIT compiler, we need to execute it approximately a thousand times, so we simply put it in a for loop that calls it a thousand times, and by doing so the function gets compiled by the DFG JIT. After that, let's look at the pseudo data flow graph created for the function. At the beginning we set arg as the argument. Next we make sure that arg is an array; the reason is that we access the first indexed property of arg, so it must be an array. If it is an array, we save the this object into v0, and then we fetch the property y
from this and save it in v1. Finally, we load indexed property 0 of arg into v2, multiply v1 by v2, and return the result. As mentioned earlier, one of the biggest advantages of the DFG JIT compiler is its optimizations. Redundancy elimination is a core optimization in the DFG JIT: using the relations between the nodes in the data flow graph, it can determine which guards are redundant and which are crucial. Each redundant guard is removed before compilation, which saves time while the code executes. So let's take a look at the following function, which returns the division of two properties fetched from arg.
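To make the guard discussion concrete, here is a toy model in plain TypeScript of that division function, with the checks from the pseudo data flow graph written out by hand. It is not JavaScriptCore code: Cell and its shape field are hypothetical stand-ins for the type information a JIT guard checks, a thrown error stands in for leaving optimized code, and y / x is an assumed order since the exact expression isn't spelled out. The last lines preview what happens if reading y runs a getter and the second guard has been dropped.

```typescript
// Toy model only. `shape` is a hypothetical stand-in for the type/structure
// information a JIT guard checks; it is not a JavaScriptCore field.
type Shape = "object" | "something-else";

interface Cell {
  shape: Shape;
  x: unknown;
  readonly y: unknown;
}

// A failing guard abandons the optimized code (an "OSR exit" in JavaScriptCore).
function guardIsObject(c: Cell): void {
  if (c.shape !== "object") {
    throw new Error("guard failed: leave optimized code");
  }
}

// Both guards from the pseudo data flow graph are kept.
function divBothGuards(c: Cell): number {
  guardIsObject(c);
  const y = c.y as number; // reading y may run a getter, i.e. arbitrary code
  guardIsObject(c); // re-check: catches anything that getter changed
  const x = c.x as number;
  return y / x;
}

// The second guard removed as "redundant": only sound if reading y has no side effects.
function divSecondGuardRemoved(c: Cell): number {
  guardIsObject(c);
  const y = c.y as number;
  const x = c.x as number; // still trusts the first guard
  return y / x;
}

// A cell whose y getter changes the cell while it is being read.
function makeTrickyCell(): Cell {
  const cell: Cell = {
    shape: "object",
    x: 2,
    get y(): unknown {
      cell.shape = "something-else"; // the side effect
      cell.x = "not a number";
      return 10;
    },
  };
  return cell;
}

const honest: Cell = { shape: "object", x: 2, y: 10 };
console.log(divBothGuards(honest), divSecondGuardRemoved(honest)); // 5 5

try {
  divBothGuards(makeTrickyCell());
} catch (e) {
  console.log((e as Error).message); // guard failed: leave optimized code
}
console.log(divSecondGuardRemoved(makeTrickyCell())); // NaN
```

In plain JavaScript the unguarded version only produces NaN; in JIT-compiled native code, the same stale assumption means treating memory as the wrong type, which is the type confusion the talk goes on to describe.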
So the data flow graph looks like this. First we set arg as the argument. Then we make sure arg is an object, because we fetch properties from it. Then we fetch one property, say y, and check again that arg is still an object. If it is, we fetch x as well, and finally we return the division of the two. If property y simply exists, the second check is redundant. Why? Because nothing has changed in our object, so there is no reason to make sure it's still an object. Nothing happened, y exists, everything is good, so let's save some time during the
execution. That's exactly what JavaScriptCore does. But what can go wrong? If fetching y can cause side effects, an attacker can change arg from an object into something of a different type. JavaScriptCore will still think arg is an object, because the check was removed, even though it is not, and this can lead to type confusion bugs, which can help an attacker achieve remote code execution. So removing necessary checks can allow an attacker to achieve remote code execution. To deal with that issue, JavaScriptCore must know which operations can cause side effects and under which conditions. For example, adding two numbers cannot
cause side effects, but as we saw earlier, adding a string and an object can indeed cause side effects, and JavaScriptCore must be aware of that, so it must take the argument types into account as well. The side-effect modeling is done by the function executeEffects (you can see the full path on the slide). That function holds a giant switch statement in which each case represents a JavaScript operation. Under each case, JavaScriptCore determines whether the operation can cause side effects, and if it can, JavaScriptCore calls a function called clobberWorld. Let's take a look at an example. This is an actual snippet from
JavaScriptCore, from the executeEffects function. You can see that both of the upper operations can cause side effects according to JavaScriptCore's modeling, because under both cases there is a call to clobberWorld, one here and one there. But for the third operation, JavaScriptCore assumes it cannot cause side effects, because there is no call to clobberWorld. All right, this was the second part of our talk. We've learned that the DFG JIT compiler can be an interesting surface for an attacker looking for new bugs in JavaScriptCore, especially when focusing on the redundancy elimination optimization combined with side effects
in JavaScript.

Now let's try to make the world a safer place by hunting these bugs ourselves using CodeQL. Perfect, that's me. Before writing new and complex queries, we should understand the bug we're hunting as well as possible. This tip applies to every bug you hunt, especially if you're using CodeQL. So let's analyze together a vulnerability found in 2018 that allowed an attacker to achieve full remote code execution via Safari. This vulnerability is triggered by bad side-effect modeling of the instanceof operation. Let's start from the end by answering the question: how did they patch this bug? Let's understand the patch.
It's very simple. The slide is divided in two, and both snippets were taken from executeEffects, the function that, as we mentioned earlier, is responsible for side-effect modeling in JavaScriptCore. On the left side we can see the vulnerable version, and on the right side the patched one. The developers simply added a call to clobberWorld under the case that represents the instanceof operation. This is the only line that was added to fix the bug, which tells us that the vulnerability is indeed around the side-effect modeling JavaScriptCore was doing for that operation. You can see they only added a call to
clobberWorld in order to fix the bug. We know that the instanceof operation wasn't modeled correctly, so we simply need to find out how it is possible to trigger side effects using that operation. To do so, let's take a look at part of the exploit. First, we just create two empty classes. Then we create a new object named handler that has a single function property called getPrototypeOf. It doesn't matter what the function does; this is where we'd put the malicious exploit code, but eventually it returns a class prototype. Then we replace the prototype of TriggerClass with a proxy object. This is very important: we replace the prototype
object with a proxy object, and the proxy object now has the handler we defined earlier. Finally, all we need to do is compile instanceof using the DFG JIT and then perform instanceof as follows. This triggers the bug, because JavaScriptCore already thinks instanceof cannot cause side effects. So in order to trigger side effects using the instanceof operation, we need to replace the prototype with a proxy object; that's the end goal of this code snippet. Let's try to understand how the instanceof operation is implemented. This is its implementation inside JavaScriptCore, meaning it's written in C++: it's not a JavaScript implementation, it's JavaScriptCore's own. First it calls defaultHasInstance, which eventually calls the method getPrototype on the object passed as the argument. TriggerClass is our object, and the method that will be called is getPrototype. TriggerClass is an object, of course, so we'd expect to go to the implementation of getPrototype from JSObject. And then, okay, what happened here? If you remember, we replaced the prototype of TriggerClass with a proxy object.
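The setup described above can be sketched in TypeScript. This is not the original exploit: the class names are illustrative, the handler does nothing malicious, and there is no DFG warm-up, so on a current engine it only shows that instanceof ends up running a user-defined getPrototypeOf trap. The trap is reached because the proxy sits in TriggerClass's prototype chain, which instanceof walks one getPrototype call at a time. toyHasInstance is a simplified, hand-written version of that walk modeled on ECMAScript's OrdinaryHasInstance, not JavaScriptCore's defaultHasInstance.

```typescript
// Records every time attacker-controlled code runs.
const trapCalls: string[] = [];

class OtherClass {} // illustrative names: the talk only mentions two empty classes
class TriggerClass {}

// Proxy handler with a getPrototypeOf trap; in an exploit, arbitrary code goes here.
const handler: ProxyHandler<object> = {
  getPrototypeOf(_target: object): object | null {
    trapCalls.push("getPrototypeOf trap");
    return OtherClass.prototype;
  },
};

// Put a Proxy into TriggerClass's prototype chain.
Object.setPrototypeOf(TriggerClass, new Proxy<object>({}, handler));

// Simplified prototype-chain walk, modeled on the ordinary has-instance loop.
function toyHasInstance(value: unknown, proto: object): boolean {
  if ((typeof value !== "object" && typeof value !== "function") || value === null) {
    return false;
  }
  let current = value as object;
  for (;;) {
    // For a Proxy, this invokes the handler's getPrototypeOf trap.
    const next = Reflect.getPrototypeOf(current);
    if (next === null) return false;
    if (next === proto) return true;
    current = next;
  }
}

const viaOperator = TriggerClass instanceof OtherClass; // the trap runs once here
const viaToy = toyHasInstance(TriggerClass, OtherClass.prototype); // and once more here
console.log(viaOperator, viaToy, trapCalls.length); // true true 2
```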
The getPrototype function that actually gets called as the chain is walked is therefore ProxyObject's, not JSObject's, and that function has a completely different implementation: it fetches the proxy object's handler and calls its getPrototypeOf, a JavaScript function that we control. This is where the side effects occur. Great. I hope that bug is clear now, so let's try to define it using CodeQL. We are looking for an operation that (a) JavaScriptCore does not call clobberWorld for in executeEffects, and (b) can trigger side effects even though JavaScriptCore assumes it
cannot. How do we define side effects at all? There are three major challenges we need to solve to make our query useful. First, we need to model each operation for side effects ourselves using CodeQL. Although this challenge is very interesting to solve, I've decided not to cover it in this talk due to time constraints, but rest assured that the entire query is available in a public repo, and I added a link to that repo at the end of the slides. We'll focus on solving the second challenge: finding which operations can trigger side effects. Specifically, we're looking for operations that can trigger side effects
using a proxy object, the same way the instanceof operation triggers side effects. After this presentation, I encourage you all to modify the query so that it finds other ways of triggering side effects. Defining the side effect is the crucial part of the query. The side effect we're looking for is made up of three parts. First, the operation receives a parameter of type JSObject, or somewhere during the flow of the operation a parameter is converted into a JSObject. Then a method is called on that JSObject. Finally, there must be a different method, in the class ProxyObject, that shares the same name as the called
method from JSObject. Remember, we want to find operations that deal with a JSObject. The simplest scenario is the following, where the operation receives a JSObject as one of its parameters, like here. Let's write the query real quick. We import cpp, and we have two CodeQL elements: the first is Variable, which represents all the variable declarations, and the second is VariableAccess, which represents all the accesses to a variable. This way we tell CodeQL that we are looking for a variable of type JSObject, and then all we have to do is link the variable access to that variable, and now jsObjectAccess represents an
access to a JSObject variable. But what if the operation receives a generic type and then converts that argument into a JSObject, like this? Here we see that the parameters are not JSObjects, but somewhere in the code we convert base, which is a parameter, into a JSObject. Okay, let's do it with CodeQL. First, let's define all the CodeQL elements: FunctionCall, which represents all the function calls, and VariableAccess, which represents all the accesses to a variable. Now that we've defined the expressions, let's make sure that asObject holds all the calls to asObject, like here.
And then we can extract its argument. We're interested in the argument, so let's link the argument expressions and make sure that they are passed as an argument to the asObject function. Then we'll use data flow analysis again, this time to find the flow from the return value of asObject to all the JSObject accesses. The first data flow is from a parameter of the operation to the argument of asObject, like here, and we simply need to find that flow using data flow
analysis. The second flow is from the return value of asObject, like here, to every JSObject access, like this one or this one, and again we have to find the flow. If there is a valid flow in a certain JavaScriptCore operation, we know that somewhere during its execution a parameter is converted to a JSObject, and that's exactly what we're looking for. Now that we've found all the interesting JSObject accesses, all we need to do is make sure there is a method called from that JSObject, and that there is a different method in the class ProxyObject that shares its name with the called method we found.
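Stripped of the CodeQL specifics, this final condition is a join on method names plus the clobberWorld check from the bug definition. The TypeScript below models only that logic over hand-written, hypothetical facts: OpA, OpB, OpC and someOtherMethod are made up, and getPrototype is borrowed from the instanceof case. In the real query these facts come from the CodeQL database, not from arrays.

```typescript
// Hypothetical, hand-written facts standing in for what the CodeQL database provides.
interface JSObjectMethodCall {
  operation: string; // the JavaScriptCore operation the call was found in
  method: string; // name of the method called on a JSObject
}

// Made-up sample data, for illustration only.
const jsObjectCalls: JSObjectMethodCall[] = [
  { operation: "OpA", method: "getPrototype" },
  { operation: "OpB", method: "someOtherMethod" },
  { operation: "OpC", method: "getPrototype" },
];
const proxyObjectMethodNames: ReadonlySet<string> = new Set(["getPrototype"]);
// Operations whose executeEffects case already calls clobberWorld.
const clobberingOperations: ReadonlySet<string> = new Set(["OpC"]);

// Operations where a JSObject method call shares its name with a ProxyObject
// method, yet the operation is modeled as side-effect free.
function candidateOperations(
  calls: readonly JSObjectMethodCall[],
  proxyMethods: ReadonlySet<string>,
  clobbering: ReadonlySet<string>,
): string[] {
  const found = new Set<string>();
  for (const call of calls) {
    if (proxyMethods.has(call.method) && !clobbering.has(call.operation)) {
      found.add(call.operation);
    }
  }
  return Array.from(found).sort();
}

console.log(candidateOperations(jsObjectCalls, proxyObjectMethodNames, clobberingOperations)); // [ 'OpA' ]
```

OpC is filtered out because its modeling already calls clobberWorld, which mirrors how the patched instanceof case would no longer be reported.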
Let's do it with CodeQL. A quick recap: by now we can identify every operation that has a JSObject argument or that, somewhere in the flow of the code, converts one of its arguments into a JSObject. Now we need to find whether somewhere in the code there is a call to a method on that JSObject, and not only that: the method must share its name with an existing method of ProxyObject. Let's start by defining the CodeQL elements. As usual, jsObjectMethod represents a method from the class JSObject, and this is how we do it with CodeQL: we basically bind the JSObject method call
using getQualifier() to the JSObject: the qualifier of the getPrototype call is object, and object is a JSObject. Next, let's bind the fromProxy element, which is supposed to represent a method from the ProxyObject class. By the way, this is not the only way to bind fromProxy; it's just the way I chose. We make sure that the scope of that function is ProxyObject, but you can choose a different approach to achieve the same result. Finally, we want both methods to share the same name, so we compare the JSObject method's name with fromProxy's
name, like so. And then we execute it. This finds all the JavaScriptCore operations that handle JSObjects where those JSObjects call methods that share a name with a ProxyObject method. We can now extract all the operations that can cause the side effect we defined earlier but that JavaScriptCore assumes cannot cause side effects, and this is exactly where the bug lies. So finally, after running our query against the database built from the vulnerable version of JavaScriptCore, we got this result: there are four potentially vulnerable operations, one of which is
indeed the vulnerable instanceof. So this is already a success, but what about the rest? Although I could not prove that all four operations are vulnerable, I did find that the CreateThis operation is indeed vulnerable, and this is awesome. This vulnerability was discovered by saelo, and it shares the same characteristics as the instanceof bug, so it definitely strengthens my intuition regarding the correctness of the query.

Let's have a quick recap, a summary of what we've talked about so far. That's me. We had a CodeQL tutorial: we talked about what CodeQL is, the structure of a CodeQL query, and the building blocks of CodeQL, for example Variable, FunctionCall, and
we even talked about data flow analysis in CodeQL. Then we talked about JavaScriptCore: we had a JavaScriptCore 101 session and talked about the differences between JavaScript and C, JavaScript side effects, and JavaScriptCore's execution tiers, all the JIT compilers that exist in it. And finally we talked about bad side-effect modeling bugs: what their potential is, how to look for them, and how to translate them into CodeQL queries. If you're interested in further reading, you can of course read a blog post I published that covers CodeQL in more detail, the blog post I
posted about the mysterious realm of JavaScriptCore, and the GitHub link to the query itself. And that's it. Thank you very much, BSides Budapest. I hope you enjoyed the talk, and I'm available for any questions you might have via Twitter or email, so feel free to contact me. Thank you very much, have a nice day, and enjoy the rest of the conference.