BSidesSF 2022 - JavaScript Obfuscation - It’s All About the P-a-c-k-e-r-s (Or Katz)

Name: BSidesSF 2022 - JavaScript Obfuscation - It’s All About the P-a-c-k-e-r-s (Or Katz)
Uploaded: 2022-07-06
Duration: 45 min 38 s
Description: Or Katz - JavaScript Obfuscation - It’s All About the P-a-c-k-e-r-s. The usage of JavaScript obfuscation techniques has become prevalent in today’s threats, from phishing, Magecart, and supply chain injection to malware droppers. This talk will introduce a technique that focuses on the detection of

BSidesSF · 2022 · 45:38 · 510 views · Published 2022-07

Speakers

Or Katz

Tags

Category: Technical

Topic: Supply Chain Security, Web AppSec

Style: Talk

Mentioned in this talk

Tools used

JavaScript obfuscator

About this talk

Or Katz - JavaScript Obfuscation - It’s All About the P-a-c-k-e-r-s. The usage of JavaScript obfuscation techniques has become prevalent in today’s threats, from phishing, Magecart, and supply chain injection to malware droppers. This talk will introduce a technique that focuses on the detection of JavaScript packers in order to detect obfuscated files. Sched: https://bsidessf2022.sched.com/event/rjqr/javascript-obfuscation-its-all-about-the-p-a-c-k-e-r-s

Transcript [en]

Okay, bingo. Hey, this is the last talk of the day, I believe. Yeah. Mr. Or Katz has a tremendous security background, and, well, I'll let you talk about yourself. I want to know what you've got going on with JavaScript obfuscation.

Sure thing, thank you very much. Hi everybody. I'm very excited to be here today. It's been over two years since I last gave an in-person presentation, and the last one I gave was at BSides San Francisco 2020, so it's closing a circle in that sense. I'm here today to talk to you about JavaScript obfuscation: it's all about the packers, right? And I wrote "packers" with dashes. We will get into that later, but remember it.

So let's start at the beginning with a short introduction of myself. I'm a former OWASP Israel chapter lead, and I really enjoyed doing that. I'm a data-driven security researcher, which means I work with a lot of data, and data is something I use on a daily basis to create a lot of my research. We will see some examples of that here today. When I try to define my role, I say that I'm trying to move security challenges into the science and solution space, and hopefully we'll see that here today as well. And I have a very boring social network persona, so I encourage you not to follow me on Twitter or on any other platform. Don't do that, right? Boring.

Okay, let's continue. How did I end up here today? Over 18 months ago I was doing some research on JavaScript obfuscation, and in that research I released three different blogs. In the first one, I was looking at dictionary-based techniques used to do obfuscation. For the second blog, I took some phishing data that I had, a really good data set, and looked into it, which let me say, in the context of trends, how often JavaScript obfuscation is used on those phishing websites. In the third one, I took one sample of obfuscated JavaScript, a relatively small example, because they can be huge and I needed to take snapshots of it. This is the snapshot of that sample. I reversed it step by step, breaking it into smaller pieces to better understand, and better explain, what happens when something is obfuscated. While I was doing that, I started to ask myself a question. Well, that's a nice thing, and it took me a couple of hours at the time, but I don't have the capacity to keep doing that, and the volume of obfuscated JavaScript samples I'm seeing in the wild is way too much for me to analyze. So I asked myself: maybe I can do a research project around that.

I had a few objectives for that research project. The first objective was to figure out whether I could find a technique that enables me to detect JavaScript once it has been obfuscated. More than that, it would be nice if I could say that the obfuscated JavaScript is actually malicious. So that's the first objective. The second objective was that I need to do it at scale. It's not a matter of one or two, or ten or twenty.

I need to do hundreds or even thousands in a given time frame. The third objective, which is actually related to the second one: I want to analyze those files, but I have some limitations. With JavaScript, you can take the file and render it to see what is actually being executed, but that's something I didn't want to do. From my point of view, I need to look at the file as it is transferred from the server to the client, before it is executed, and make a decision based on that. That's a static analysis approach, which has better performance when it comes to analyzing those files, because when you need to render a page and look at its execution, that obviously has a performance impact and can take some time.

Now, the first objective assumes the Pareto principle, which says the following: if I take the 20% most used JavaScript packers (a JavaScript packer is software that obfuscates a given JavaScript file) and look into them, I will see that 80% of the samples out there were produced by those 20%. In other words, I'm not trying to solve the entire spectrum of the JavaScript obfuscation problem; I'm trying to make a minimal effort and solve as much as possible of the problem space. Here are the links to those blogs from over 18 months ago.

I started the project with a couple of questions in mind. The first question is why and how JavaScript is obfuscated, and we'll go through that and answer it. The second question was: what are the numbers behind the usage of JavaScript obfuscation in the wild? That's an interesting one. I didn't know the volume or the scale of the problem, and as part of the research I wanted to know the answer and be able to report on it. The third question I had in mind: will I be able to find a solution for the problem, a technique to detect obfuscated JavaScript? And the fourth question, which I'll answer now although we'll get into the details later on: does JavaScript obfuscation mean malicious? The answer is no. We'll go into that in detail, but obviously JavaScript obfuscation can be used for malicious purposes, and it can also be used for benign purposes.

So let's touch base and figure out what JavaScript obfuscation means and what its challenges are. Here you can see three lines of JavaScript, a hello-world kind of example. If you take those three lines of code and run them through an obfuscator, a packer, software that knows how to take given JavaScript code and obfuscate it, it will create... well, you cannot see the example itself, but believe me, it's not readable. It's very hard to understand, and it became much bigger. And if you take those three lines of code and do the same obfuscation with the same tool one second after you did the first one, you will get a different file. When you compare those files, they are not the same.

So here is one of the problems I experienced with JavaScript obfuscation. Take a file that was identified as malicious, so we know it's malicious, and try to create a signature for it, meaning a text-based or hash-based signature, to detect that file. That will not work, because if we take the same file and obfuscate it one second after the previous obfuscation, we'll get a different file and a different signature. That's the problem, right? That's part of the problem I want to address.

So that's a bit of an introduction to how JavaScript is obfuscated, and here are the most common techniques used out there for obfuscation. Using repetitive or meaningless function names or variable names makes the file itself unreadable. Using anti-debugging code means putting code in that makes debugging much harder; it's not impossible, but it makes it much harder. The third element is consuming computing resources and time. In other words, if you put dead code into the code, meaning code that is executed but doesn't do anything, or you put in timers, you make the code run very slowly, and in terms of resources that takes a lot of time, and we don't always have those resources.

So we talked about the how; let's talk about the why. Why is JavaScript obfuscated? The trivial thing is that JavaScript is client-side code. It runs in our browser in most cases, and as such it's code that is exposed to us: we can see the code in our browser. That's versus server-side code, where once it has been executed we don't really see the code, and we don't really know what functionality is being executed. With JavaScript, we have the code and we can analyze it, and that's a challenge. From an offensive or defensive point of view, depending on how you look at it, if you want to make sure that code is unreadable or very hard to understand, which is called security by obscurity, making the code harder to read or harder to debug makes sense. Again, it's not impossible to understand the code and what it executes, but obfuscation makes it much harder. The second reason, as I mentioned before: if you obfuscate, you create many more challenges from a defensive point of view, because if you had some sort of signature on that code, whether text-based or hash-based, it will not be successful.

Now, I have my own interpretation of why JavaScript obfuscation is used in the wild from a malicious point of view. That's my perspective, and hopefully you will agree with me. From my point of view, it's all about an equation: when the defender needs more resources and more time, the result is a lower detection rate. If I'm attacked by someone who creates an overwhelming amount of work for me, that creates the need for more resources, whether human or computational, and it takes more time. As a result, the detection rate will be lower. The attack will not necessarily evade detection, but detection will be lower, and from the adversary's point of view, they will have a higher success rate. That equation is what drives the obfuscation of JavaScript in the wild.

So at that point in time I had some questions and some assumptions, and I started to look into a lot of JavaScript samples. When I looked into those samples, four different samples caught my attention, and I took a snapshot of each of them. When I first looked at those four snapshots, I told myself: well, they are not the same. If you go letter by letter, character by character, they are not the same, but they look very similar. And why are they not the same? Because they represent different threats. The two upper ones represent phishing pages: one targets financial services, the other storage services. Of the two lower ones, the right one is a malware dropper, and the left one is Magecart JavaScript trying to steal credit card information, stuff like that.
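To make the signature problem concrete, here is a minimal Python sketch. The two strings are hypothetical stand-ins for two runs of an obfuscator over the same hello-world snippet (they are not output captured from any real tool). Only the randomly generated _0x identifier differs, yet the SHA-256 hashes, and therefore any hash-based signature, no longer match. The regex normalization at the end is a deliberately crude illustration, not the method from the talk: real obfuscators also vary string encoding, array order, and dead code, which is why a structure-based approach is needed.

```python
import hashlib
import re

# Hypothetical outputs of obfuscating the same snippet twice; only the
# randomly generated identifier name differs between the two runs.
RUN_1 = "var _0x1a2b=['log','Hello World'];console[_0x1a2b[0]](_0x1a2b[1]);"
RUN_2 = "var _0x9f3c=['log','Hello World'];console[_0x9f3c[0]](_0x9f3c[1]);"


def sha256_hex(text: str) -> str:
    """Hash-based signature: any single-byte change yields a new digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def normalize_identifiers(text: str) -> str:
    """Crude normalization: rename every _0x<hex> identifier to one placeholder."""
    return re.sub(r"_0x[0-9a-fA-F]+", "_ID", text)


if __name__ == "__main__":
    print(sha256_hex(RUN_1) == sha256_hex(RUN_2))  # False: signatures differ
    print(normalize_identifiers(RUN_1) == normalize_identifiers(RUN_2))  # True
```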

So they are definitely not the same. These are not the same files and not the same threat, but they are very, very similar. The similarity shows when you zoom out: you start to see a pattern being executed. You can see an anonymous function with two parameters, and in that function you can see a variable that contains a function with a while loop that decrements a counter, and there's some pushing and shifting, which rotates a given array that is passed to that function. That's a structure, and that structure repeats itself. The values of the parameters change, and some code can be added to that function, but the structure is the same.

So I asked myself: maybe I can work with that structure; maybe that structure can help me do some sort of detection. How can I do that? Take a given JavaScript file and represent it in AST format, the abstract syntax tree. An abstract syntax tree takes given code and creates a JSON kind of representation of it. Now I have a JSON document with some structure, and I can go over it and look for the pattern I was looking at in the beginning. In our case, taking this structure and looking only at a specific part of it enabled me to detect a structure, or pattern, of the code that does not depend on the values or the parameters of the functions used in that code.

So I created a proof of concept for that. Before creating it, I took five different packers, as I said before the most used packers I was able to see, and I created a signature for each one of them. We will go through them really quickly, but these are some of the examples of those packers. A very straightforward one basically does, in most cases, some sort of payload manipulation: we had original JavaScript code, that code was obfuscated and produced a much larger output containing a payload, and when that code runs, the payload is reversed and decrypted to get back the original code that was obfuscated at the beginning. So we have three of those.
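The structure-based signature idea can be sketched in Python as a walk over an ESTree-style AST, the JSON shape that JavaScript parsers such as Esprima or Acorn emit. This is a minimal illustration under stated assumptions, not the talk's unreleased proof of concept: it assumes the AST has already been produced by a parser, hand-builds one for while (--n) { arr['push'](arr['shift']()); }, and flags a decrementing while loop whose body calls push(... shift()) on an array. The helper names (matches_push_shift_rotation, rotation_ast) are hypothetical. Identifier names and the counter are never compared, only node types and the two method names, so renamed variables still match.

```python
from typing import Any, Iterator, Optional

Node = dict  # an ESTree node, e.g. loaded from a JavaScript parser's JSON output


def walk(node: Any) -> Iterator[Node]:
    """Yield every dict (AST node) reachable from node, depth first."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


def member_name(node: Node) -> Optional[str]:
    """Property name of a MemberExpression, for both a.push and a['push']."""
    if node.get("type") != "MemberExpression":
        return None
    prop = node.get("property", {})
    if prop.get("type") == "Identifier" and not node.get("computed"):
        return prop.get("name")
    if prop.get("type") == "Literal" and isinstance(prop.get("value"), str):
        return prop["value"]
    return None


def is_push_shift(node: Node) -> bool:
    """True for X.push(Y.shift()), whatever X and Y are named."""
    if node.get("type") != "CallExpression" or member_name(node.get("callee", {})) != "push":
        return False
    args = node.get("arguments", [])
    return (
        len(args) == 1
        and args[0].get("type") == "CallExpression"
        and member_name(args[0].get("callee", {})) == "shift"
    )


def matches_push_shift_rotation(ast: Node) -> bool:
    """Structural signature: a while loop with a decrementing test that rotates an array."""
    for node in walk(ast):
        if node.get("type") != "WhileStatement":
            continue
        test = node.get("test", {})
        decrementing = test.get("type") == "UpdateExpression" and test.get("operator") == "--"
        if decrementing and any(is_push_shift(n) for n in walk(node.get("body"))):
            return True
    return False


def rotation_ast(array_name: str, counter_name: str, method: str = "push") -> Node:
    """Hand-built ESTree for: while (--counter) { array[method](array['shift']()); }"""
    def member(prop: str) -> Node:
        return {"type": "MemberExpression", "computed": True,
                "object": {"type": "Identifier", "name": array_name},
                "property": {"type": "Literal", "value": prop}}

    call_shift = {"type": "CallExpression", "callee": member("shift"), "arguments": []}
    call_outer = {"type": "CallExpression", "callee": member(method), "arguments": [call_shift]}
    return {
        "type": "Program",
        "body": [{
            "type": "WhileStatement",
            "test": {"type": "UpdateExpression", "operator": "--", "prefix": True,
                     "argument": {"type": "Identifier", "name": counter_name}},
            "body": {"type": "BlockStatement",
                     "body": [{"type": "ExpressionStatement", "expression": call_outer}]},
        }],
    }


if __name__ == "__main__":
    print(matches_push_shift_rotation(rotation_ast("_0x1a2b", "_0x3c4d")))  # True
    print(matches_push_shift_rotation(rotation_ast("_0x9f3c", "i")))  # True
    print(matches_push_shift_rotation(rotation_ast("arr", "n", method="unshift")))  # False
```

Matching on node types and method names rather than on text or hashes is what lets one signature survive re-obfuscation. A fuller signature would presumably also check the surrounding anonymous function and its two parameters, as described in the talk.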

And we have these two, which are more common. You can see that example number four is the one I named "packer with dashes", and the reason is that the function takes six parameters whose names are p, a, c, k, e, r. That's why I call it that, and we'll address it later on.

So now we have a proof of concept: five different JavaScript packers for which I created structure-based signatures. The question is, what are the results? What do we see out there when I use that proof of concept, and what are the numbers behind those detections?

Here we go. The first data set I used was a phishing data set with over 100,000 different phishing pages. I want to say that this data set was not as clean as I wanted it to be. Some of those websites were not phishing websites, or used to be phishing websites and had since been mitigated. So it was not a clean data set, but I was still able to see a 2.1% detection rate on it. In other words, the concept, a proof of concept that detects the structure of JavaScript, is working: I'm seeing detections. More than that, I validated those detections, and they were the detections I was expecting to see.

The second data set I used was a mix of malware, cryptominers, Magecart, and some phishing websites, but it was of very high quality. On that data set, 26% of the JavaScript pages examined were actually obfuscated, and that's a really big message. The message is that we are seeing a shift in the threat landscape, where threat actors understand that they can challenge us from a defensive point of view by doing obfuscation, and, as we mentioned before, obfuscation makes their attacks much more successful. It's important to say that this 26% is a realistic number. It's not an exaggeration and not an overly optimistic number; if anything, I would expect the real numbers to be much higher.

Now, we tested data that was classified as malicious. Then comes the question: what about data that is not malicious? Would we see some detections? The answer is yes. We took the 18,000 most popular websites on Alexa, checked them, and saw that 0.5% of those pages were also using JavaScript obfuscation. More than that, we took a random set of URLs and checked them as well, and the numbers were very similar. What that means is that in some cases JavaScript obfuscation is used for benign purposes. How do we know that? Because I went over some of those samples to make sure they were true detections and not false positives.

Now, I assume you have a question: why are they obfuscated? Why does a benign website use obfuscation? That's a really great question, and we'll address it in a few seconds.

So, a summary of what we have seen so far. JavaScript obfuscation detection is accurate: I created a technique, it works, and I have really good results, which is a good start. Using that technique, I have over 4K new detections of malware and phishing pages. Given my objectives, which were to increase detection and to detect all kinds of obfuscated attacks that in most cases evade traditional detection mechanisms, that's a nice outcome. We also saw that some of those packers are mostly associated with malicious activity. For example, the AES-CTR crypt packer is in most cases used for phishing websites, and we didn't see any true detection of that packer on benign websites. That can lead us to a decision-making approach: if we see pages obfuscated with that packer, those pages are most likely malicious. And finally, we saw that obfuscation is not equal to malicious, and that's a problem.

That problem leads to two questions. The first one is the one I just asked: why are benign websites, or their JavaScript files, obfuscated? The second question is: now that we know some are malicious and some are not, can we introduce an approach that solves that problem and enables better detection? I will try to address both questions.

So, the first one: why are benign websites obfuscated? Here are a few of the answers, based on the files I was able to see; I would assume there are other reasons as well. For example, email address masking: some websites want to include an email address but don't want to make it available to scanners or search engines that crawl their website, so they obfuscate it. Client-side cookie functionality: don't ask me the reason for that, it's not good practice, but we saw some of those examples. A lot of the examples were third-party scripts, such as advertisements and translation services, using obfuscation. What's the reason for that? Imagine you have a website that consumes a third-party script providing a translation service. That third-party vendor obfuscates its code and pushes it to your website, and that's how it's used. They have their own reasons; I would assume security by obscurity is why they do it. And the last example, which I will try to avoid addressing, is that a lot of adult content sites also use obfuscation.
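For the packer whose function takes the parameters p, a, c, k, e, r, the payload-reversal step described earlier amounts to a dictionary substitution. The sketch below assumes the widely copied Dean Edwards-style layout, which the talk does not name: p is the packed source in which each word was replaced by its index written in base a (digits 0-9, then a-z, then A-Z), c is the word count, and k is the word list. It performs the substitution in a single regex pass, a simplification of the packer's own loop. It only illustrates how such packers restore the original code at run time; it is not part of the detection technique presented in the talk, and the function names are hypothetical.

```python
import re

# Digit alphabet assumed for Dean Edwards-style packers (radix up to 62).
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base(n: int, radix: int) -> str:
    """Encode a word index the way the packer's e() helper encodes it."""
    if n < radix:
        return DIGITS[n]
    return to_base(n // radix, radix) + DIGITS[n % radix]


def unpack(p: str, a: int, c: int, k: list) -> str:
    """Replace each base-a token in p with its word from k; empty entries keep the token."""
    lookup = {to_base(i, a): k[i] for i in range(c) if k[i]}
    return re.sub(r"\b\w+\b", lambda m: lookup.get(m.group(0), m.group(0)), p, flags=re.ASCII)


if __name__ == "__main__":
    words = ["var", "greeting", "hello", "alert"]
    print(unpack("0 1='2';3(1);", 10, len(words), words))
    # var greeting='hello';alert(greeting);
```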

And I will avoid addressing that issue. So we tried to answer the first question. The second question is: can we build a mechanism that, after we detect that a given file was obfuscated, takes it to the next level and differentiates between malicious and non-malicious obfuscated files? To do that, I used two approaches that are complementary in a sense. The first is a false-positive approach: let's take those benign websites, which in most cases are highly popular, because that's the problem we're trying to address.

And let's figure out whether we can find techniques that let us say that at least some of them are obfuscated, but we know they are benign and not malicious. That's the first approach. The second approach is a true-positive approach, and in it I'm trying to show, again at a high level, a machine learning approach for classifying malicious files. We'll go through it; again, it's a high-level example, and hopefully it will make sense.

So the first approach, false positives. We have highly popular websites: the gray circles are the Alexa and random data sets. For each domain in those data sets we have JavaScript files; we hashed that JavaScript code and associated each hash with the actual domain names, which are the blue circles. So orange means the hash value of a JavaScript file, blue means a domain name, and gray is the big circles of the Alexa and random data sets. What we saw is that in a lot of cases the same obfuscated file appears on many domains with the same hash value. That can be explained by the translation-services example we saw before: the vendor takes its code, obfuscates it, and pushes it to many different websites, so you see the same hash value for that code. In other words, a false-positive, whitelist approach would say: let's take the files that are pushed to many different highly popular websites and whitelist them. If we see a file with one of those hash values, we can classify it as non-malicious, and by that reduce the problem we are facing. In my case, which was a very small and limited data set, I was able to reduce the problem by 20%, and obviously if we use a much larger data set of benign files, we will get better numbers. So that was the first approach, the whitelist, or positive security, approach.

The second approach was true positives, and that's machine learning. At the beginning we saw a JavaScript file and how we broke it into small pieces and tried to deobfuscate it. Now you can look at that file from a machine learning point of view, as a set of features that can be extracted from it. For example, if the file has an array that contains the actual payload being deobfuscated as part of that file, we can look at that array and say that the number of elements in it can be considered a feature we can use later on. The length of the array's element values can be used, and the entropy of those values can also be used. We also saw that a lot of those obfuscated files contain a lot of parameters and function names. I call them identifiers, and counting them is also a feature that can be used in a machine learning approach to classify what's malicious and what's not.

So here is an example. I took the samples that match the push-shift packer, one of my five structure-based signatures, and put them on this graph using two different features. The first feature, on the left, is the number of unique identifiers.
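The false-positive (whitelist) approach can be sketched as follows: hash every script collected from popular domains, count how many distinct domains serve each hash, and whitelist the hashes that appear on enough of them, as with a translation vendor pushing one obfuscated file to many sites. The min_domains=3 threshold and the function names are hypothetical choices for illustration; the talk does not state the cutoff it used.

```python
import hashlib
from collections import defaultdict
from typing import Iterable, Set, Tuple


def sha256_hex(source: str) -> str:
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def build_whitelist(observations: Iterable[Tuple[str, str]], min_domains: int = 3) -> Set[str]:
    """observations: (domain, script_source) pairs crawled from popular sites."""
    domains_by_hash = defaultdict(set)
    for domain, source in observations:
        domains_by_hash[sha256_hex(source)].add(domain)
    # Keep only hashes served by many distinct domains (e.g. a shared third-party script).
    return {h for h, domains in domains_by_hash.items() if len(domains) >= min_domains}


def is_whitelisted(source: str, whitelist: Set[str]) -> bool:
    return sha256_hex(source) in whitelist


if __name__ == "__main__":
    vendor_script = "var _0xabc=['translate'];"  # same obfuscated file on many sites
    observations = [
        ("a.example", vendor_script),
        ("b.example", vendor_script),
        ("c.example", vendor_script),
        ("d.example", "var _0xdef=['x'];"),  # seen on a single site only
    ]
    whitelist = build_whitelist(observations)
    print(is_whitelisted(vendor_script, whitelist))  # True
    print(is_whitelisted("var _0xdef=['x'];", whitelist))  # False
```

Because this relies on exact hashes, it only helps for files that are obfuscated once and then distributed unchanged; a script re-obfuscated per site would still need the classification step.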

That means the number of parameter names and function names. On the bottom you can see a different feature, the number of elements in a given array in that file. Using only those two features, a very naive approach, we start to see different classes: green, red, and blue. Green means malware, blue means random, and red means Alexa. So we started to divide the space into two different groups, on the right side and on the upper left side, and those groups by themselves represent malware, using only the two features we chose here. That gives us the ability to make decisions, and we will see those decisions on the following slide. But wait: in the center of this graph we have a mixture of colors. We can focus on that group as well, and when we do, we can see that even within it a smaller set of samples can be separated, and we can apply a machine learning approach to make decisions there too.

So what does it look like in the context of machine learning? I took three different features, the two I just mentioned and a third one I will name in a minute, and created a malware decision tree model that enabled me to make decisions. Again, it's very high level, and I didn't spend a lot of time tuning it; it's more of a proof of concept to show that it can be done. Take only the rightmost side of that tree. We are using three features. The first one, at the top, is the number of identifiers, function names and variable names, that start with _0x. The second feature is the number of elements in a given array. The third feature is the number of unique identifiers, where identifiers, again, are parameter names and function names. Those three features alone, with more than 20 identifiers starting with _0x, more than 17 elements in the array, and more than 92 unique identifiers, eliminate about 50% of the problem: 757 of the 1,541 samples we checked. So that's an example of how we take a file, create the right features from it, build a decision tree on them, fine-tune those features, and make the right decision. By that, we can classify a file that we know has been obfuscated but don't know whether it's malicious or benign. That's the approach I'm suggesting here.

Now, we had four questions at the beginning. Why and how is JavaScript obfuscated? I think we addressed that. What are the numbers behind the use of JavaScript obfuscation in the wild? We saw, for example, that 26% of the malware data set was classified as obfuscated.
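The three features behind the reported branch of the decision tree, plus the entropy and element-length features mentioned earlier, can be computed with a short Python sketch. It assumes the identifier names and the payload array's string elements have already been pulled out of the AST, and extract_features and in_reported_branch are hypothetical names. The thresholds (more than 20 _0x identifiers, more than 17 array elements, more than 92 unique identifiers) are the ones quoted in the talk, but here the rule is hard-coded rather than learned, and the talk does not state which label that leaf carries. A trained tree would normally come from a library such as scikit-learn's DecisionTreeClassifier.

```python
import math
import re
from collections import Counter
from typing import Dict, List

HEX_IDENTIFIER = re.compile(r"^_0x[0-9a-fA-F]+$")


def shannon_entropy(text: str) -> float:
    """Bits per character of text (0.0 for empty or single-symbol strings)."""
    if not text:
        return 0.0
    total = len(text)
    return 0.0 - sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def extract_features(identifiers: List[str], payload_array: List[str]) -> Dict[str, float]:
    """identifiers: every function/variable/parameter name occurrence in the file (assumed)."""
    n = len(payload_array)
    return {
        "hex_identifiers": sum(1 for name in identifiers if HEX_IDENTIFIER.match(name)),
        "unique_identifiers": len(set(identifiers)),
        "array_elements": n,
        "mean_element_length": sum(map(len, payload_array)) / n if n else 0.0,
        "payload_entropy": shannon_entropy("".join(payload_array)),
    }


def in_reported_branch(f: Dict[str, float]) -> bool:
    """The three-feature branch described in the talk, with the thresholds as reported."""
    return f["hex_identifiers"] > 20 and f["array_elements"] > 17 and f["unique_identifiers"] > 92


if __name__ == "__main__":
    names = [f"_0x{i:x}" for i in range(100)]  # 100 distinct _0x-style names
    payload = ["aGVsbG8=", "d29ybGQ="] * 9  # 18 hypothetical encoded strings
    features = extract_features(names, payload)
    print(features["array_elements"], in_reported_branch(features))  # 18 True
```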

We addressed the question of whether obfuscation equals malicious, and more than that, we addressed the question of a technique that enables us to differentiate between something malicious and something benign. So that part of the research is done. There are some steps in the research that I would like to take in the coming months, and here are some of the things on my to-do list. I want to release the code; I need to allocate some of my time to do that work. I want to add signatures for additional packers. I want to refine some of the data sets I was using, to get better results and better functionality. I want to use more features in the machine learning model to make better decisions and classify things much better. And I want to explore algorithmic extraction of patterns into signatures.

So I named the presentation "It's all about the P-a-c-k-e-r-s", with the dashes, and I told you, hey, what about those dashes? I also showed you the reason for that. I'm not sure you can see it, but the names of the variables in that function are p, a, c, k, e, r, right? Part of the approach I was using was saying that the names of the variables are not interesting to me, because they can easily be changed. That was the right decision, because after a while I saw a different example using "packed" instead of "packer". But as I continued to look into more samples, I suddenly saw the following example, which again has the same structure, but this time the variable names are _0x176 and so forth, making it very readable, right? So I said to myself:

Maybe I should change the name of the presentation to "It's all about the _0x". I don't know; we'll consider it, we'll see. I think that's it for me today, guys. I hope you enjoyed the presentation, and if you have any questions, feel free.

Here we go. I think he wants you to speak into the mic.

Very good. Well, I have lots of questions, but I'll start with one. You detected 26% of the malicious samples as obfuscated with your five packers, right? How many obfuscated samples were not detected because you didn't have code to recognize the obfuscation in that data set?

That's a great question. I don't have a clear answer for that; I have my intuition, because if I knew what I was missing, I would easily add a structure-based signature for it. I know the numbers are much higher, towards 50%, but it becomes more niche, smaller packers, and I didn't have the resources and time to take each of those packers and figure out the structural signature I want to pull out of it. So the numbers are high, but I don't have a realistic number to share.

But in the machine learning model, the arbitrary features that you recognize would probably apply to the rest of the packers as well?

It's correct, but not 100% correct, because the machine learning approach was to take a data set that was already classified as obfuscated by one of the packers we have, and decide based on that whether it's benign or malicious. So in that sense, creating a machine learning approach that can generically detect obfuscated code would be a bit prone to overfitting, right, because the features might differ between packers. And where did your data come from? That's a good one. Uh, a lot of the data, like part of the data that we have in my work environment
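The answer above argues that one generic model tends to overfit because features differ between packers, so the benign/malicious decision is made per packer, on samples already attributed to a packer. Below is a minimal sketch of that routing, assuming a multinomial naive Bayes over string feature tokens; the talk does not say which model or features were used, so the classifier and the names `train_per_packer` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over string feature tokens (an assumed model choice)."""

    def __init__(self):
        self.class_counts = Counter()             # label -> number of samples
        self.token_counts = defaultdict(Counter)  # label -> token -> count
        self.vocab = set()

    def fit(self, samples):
        for tokens, label in samples:
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # Laplace smoothing
            score = math.log(n / total)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_per_packer(samples):
    """samples: iterable of (packer_name, tokens, label); one model per packer."""
    grouped = defaultdict(list)
    for packer, tokens, label in samples:
        grouped[packer].append((tokens, label))
    return {packer: NaiveBayes().fit(rows) for packer, rows in grouped.items()}


def classify(models, packer, tokens):
    """Route a sample to its packer's model; None if that packer has no model."""
    model = models.get(packer)
    return model.predict(tokens) if model is not None else None
```

A sample from a packer with no trained model gets `None` instead of falling back to a generic model, which mirrors the per-packer framing in the answer.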

Thank you, I really enjoyed the talk. Thank you for that. Thank you. Um, you showed us a few packers, and you identified them via their signatures, via the patterns in the code, um, but clearly everyone is using similar things. Do you have to know what vendors, or, um, I don't know if there's free software out there that everyone is using for this? So it's a great question. First of all, I didn't go into the vendor space of obfuscation; that's different. As I mentioned at the beginning, the Pareto principle, 80/20: I'm not looking into those creating APT-style, you know, specially crafted obfuscation, or someone using a vendor's kind of obfuscation. It

was more the open-source or online kinds of packers that I was able to track, and those were the ones that were massively used in terms of numbers. Uh, so I was more focused on those. Um, did I address the question? Yeah, thank you. I was mostly just curious if there are, I don't know, popular projects that represent what these packers generate, um, different use cases, different, you know. Thank you. Most of them are online; I would assume that if you do a Google search for online obfuscation, you will get to some of them really easily. One other question that I had, uh, if

that's okay: um, you mentioned at the beginning of the talk taking this obfuscated code that was, you know, packed, which has a similar structure but dissimilar identifiers, and turning it into an AST to look for the patterns that way. And then I didn't understand whether you went with that approach or used a machine learning model, which was something different, and why didn't you go with that? I did take that approach. That was the first phase: creating a proof-of-concept tool where, for each packer, I created a structure-based signature, right, using the concept of an AST and looking into the patterns that are really relevant

and repetitive in the context of a structure. I took that proof-of-concept tool that I wrote, ran it on a lot of data sets, and that's the result that I was showing. In those results we were able to see that some of my detections show that obfuscation doesn't mean malicious, and that led me to move to the machine learning approach, to differentiate between malicious and benign. I see, thanks very much. Sure. Okay, hi, um, you mentioned some of the false positives that you saw. By any chance, did you see anybody actually using it not for malicious purposes, in a way that would show up in the false positives? Was it actually more for, well, security

reasons? So I need to frame the concept of a false positive in this research, because in the context of creating a proof-of-concept tool that knows how to detect those packers, I didn't have any false positives, meaning I didn't detect structures that matched the structure used for obfuscation but were actually not obfuscation, something entirely different. I didn't have that kind of false positive. I think what created the sense of false positives is that we were able to see some of the packers being used with malicious intent and some of those packers being used in a benign context

and that created some sort of a sense of false positives, but I didn't have any false positives in that sense. Oh, gotcha. So no, I totally get that; I was just curious if you saw other benign use cases for obfuscation. Um, the examples that I gave, the email and the cookies, those were the most used ones. But again, these are open-source kinds of services: you can go there, take your code, obfuscate it, and put it into whatever you want to do with it, and in that sense the sky's the limit. So, I question your belief that it's expensive to expand

these to the point of the eval, actually look at the content, and decide if it's malicious or not. Although it's great that you try to recognize obfuscation, you know, I think you actually have to take it to the point where you look at the content to figure out if it's malicious or not. But in theory, one other point: the ML proponents would say ML is adaptive when the adversary changes their pattern, so I think it's great that you went from the AST model to something with ML, because that might scale much better as the packers adapt. Anyway. Yeah, this is also my area of research, so I totally agree in that sense, and

obviously, when I was trying to do things that are more generic in the context of machine learning, I started to see results that didn't really make sense. But when I focused my machine learning per packer, the features were more associated with the packer and the actions that packer performs, and it makes more sense to make a decision based on that. And obviously these are statistical decisions, right? It's machine learning; it's a matter of statistics. So statistically I had good results, but that doesn't mean it can't also make mistakes, right? It depends on how the model works. Uh, thank you for the talk. I have three

quick questions. Uh, you kind of answered the first question: you want to extend the experimental part of your approach. I was curious to have more details about the data set, but I'm going to skip that. And the relationship you showed between different hashes: where did you get those relationships from? So think about it like this: we have a benign data set, and we know there are matches of obfuscated JavaScript in those benign files. For each of those files, I took the JavaScript components out, the sections of JavaScript on those pages, and created hashes for those JavaScript

parts of the files on those pages, created hash values for them, and saved them. Then I just plotted them to show the relationships between them, and that showed me that I can see two different websites, both highly popular according to Alexa (um, I work at Akamai, so we have a lot of data to back that up and figure out what a popular website in the world means in that sense), that actually have the same code being hashed. So it's a different use case from the

scenario, from the adversary's point of view, of taking a file, obfuscating it each time to create a new version of that file, and deploying it in different contexts. It was a use case of someone doing security by obscurity: taking the same code, doing the same one-time obfuscation, and pushing it to different websites, and that created the relationships I was seeing there. Thank you. And the last question: you said some packers are more associated with malicious activity. Why do you think that is? That's a good question. I hadn't thought about it, but I can think of the possibility that maybe they were created by adversaries and are therefore being used by

them. Some of them were probably introduced through, for example, phishing kits that include those packers as part of the kit and use them in that context, unlike other packers that are actually online websites, really good and very nice websites where you can create different configurations of the obfuscation you want to do, and everybody is using them, and that's a service, right? So maybe that was the reason we were seeing different results. Um, so I have a question. Of course, when you say malicious JavaScript, as you were pointing out, maybe phishing campaigns or something,

are you also looking at, uh, you know, use-after-free attacks on JavaScript engines within browsers? That's a direction I didn't go in. My assumption was that I'm focused on JavaScript files being obfuscated and how I can detect them while they're in transit; that was my approach. Okay. In that sense, other questions? Great.
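The hash-relationship analysis described earlier in the Q&A (hash the JavaScript sections of each benign page, then look for the same hash on different websites) can be sketched as follows. This is a minimal illustration, assuming only inline `<script>` bodies are hashed, SHA-256 as the hash (the talk does not name one), and grouping in place of the plot the speaker mentions; `shared_scripts` is a hypothetical helper.

```python
import hashlib
from collections import defaultdict
from html.parser import HTMLParser


class InlineScriptExtractor(HTMLParser):
    """Collects the text content of inline <script> elements."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self._in_script = False
            body = "".join(self._buf).strip()
            if body:  # skip empty bodies, e.g. external <script src=...>
                self.scripts.append(body)

    def handle_data(self, data):
        if self._in_script:
            self._buf.append(data)  # data may arrive in several chunks


def script_hashes(html):
    """SHA-256 digests (assumed hash) of each inline script on one page."""
    parser = InlineScriptExtractor()
    parser.feed(html)
    parser.close()
    return {hashlib.sha256(s.encode("utf-8")).hexdigest() for s in parser.scripts}


def shared_scripts(pages):
    """pages: {site: html}. Returns {digest: sorted sites} for digests on 2+ sites."""
    sites_by_hash = defaultdict(set)
    for site, html in pages.items():
        for digest in script_hashes(html):
            sites_by_hash[digest].add(site)
    return {d: sorted(s) for d, s in sites_by_hash.items() if len(s) > 1}
```

An identical digest on two unrelated sites fits the one-time-obfuscation, security-by-obscurity pattern from the answer; an adversary re-obfuscating each copy would instead produce a different digest per deployment.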

Um, yeah, so I arrived toward the end of the talk, so we may have covered this at the beginning, but from what I understand, most of the analysis that you did was static, right? Yes. At the beginning of the presentation I presented four different objectives that I had at the start of the project, and one of those objectives, in the context of my research, which is related to my work, was that I was looking for a solution that can do static code analysis, meaning I don't have the privilege of rendering the page and looking into the JavaScript

once it's been rendered; the decision needs to be made on the file as it's in transit. Okay. And therefore I used the static code analysis approach, which is much more efficient in terms of performance, because that was also one of the objectives I had. Okay, okay, that makes sense. Um, okay, other questions?
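Because the decision has to be made statically on the file in transit, the structure-based signature idea from earlier in the talk can be approximated cheaply without executing anything. A minimal sketch, assuming a rough regex tokenizer stands in for the AST the speaker described: identifiers outside a small kept-names list become `ID` and literals become `STR`/`NUM`, so the p,a,c,k,e,r, p,a,c,k,e,d, and _0x-style variants of the same packer header collapse to one token sequence. The tokenizer, `KEPT_NAMES`, and `PACKER_SIG` are hypothetical, not the speaker's tool.

```python
import hashlib
import re

# Rough JavaScript tokenizer (an assumption standing in for a real AST parser):
# strings, numbers, identifiers, then any other single non-space character.
TOKEN_RE = re.compile(
    r"""
      (?P<str>'(?:\\.|[^'\\])*'|"(?:\\.|[^"\\])*")  # quoted string literals
    | (?P<num>0[xX][0-9a-fA-F]+|\d+(?:\.\d+)?)      # hex or decimal numbers
    | (?P<id>[A-Za-z_$][\w$]*)                       # identifiers and keywords
    | (?P<punct>\S)                                  # punctuation / operators
    """,
    re.VERBOSE,
)

# Hypothetical list of names that carry structural meaning for packers
# (keywords plus a few globals); every other identifier is abstracted away.
KEPT_NAMES = {
    "function", "return", "var", "let", "const", "if", "else", "for", "while",
    "new", "typeof", "this", "eval", "String", "fromCharCode", "parseInt",
    "RegExp", "replace", "toString",
}


def structure(js):
    """Token sequence with identifier names and literal values abstracted."""
    tokens = []
    for m in TOKEN_RE.finditer(js):
        kind, text = m.lastgroup, m.group()
        if kind == "id":
            tokens.append(text if text in KEPT_NAMES else "ID")
        elif kind in ("str", "num"):
            tokens.append(kind.upper())
        else:
            tokens.append(text)
    return tokens


def fingerprint(js):
    """Hash of the abstracted token sequence: equal for renamed variants."""
    return hashlib.sha256(" ".join(structure(js)).encode("utf-8")).hexdigest()


def contains(haystack, needle):
    """True if `needle` occurs as a contiguous run inside `haystack`."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


# Hypothetical signature taken from one reference sample of the packer header.
PACKER_SIG = structure("eval(function(p,a,c,k,e,r){")


def looks_packed(js):
    """Static check on the raw file contents; no rendering or execution."""
    return contains(structure(js), PACKER_SIG)
```

Fingerprint equality only matches files whose whole structure is identical; the `looks_packed` containment check tolerates surrounding code, at the cost of a linear scan per signature.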

That's a good one, uh, LinkedIn. That would be a great approach; we will find a way to get in touch. Okay, others? Any, uh, comments, questions, stimulating ideas? It's the end of the day, you know, people. Yeah, hey, I woke up early on Massachusetts time, so I'm on my last few drops of fuel. Well, thank you very much for coming. Thanks a lot for being here today, that's for sure, and you've been a fantastic audience. I hope you enjoyed BSides; lots of things to do here in San Francisco, go to the Haight. Thank you for, you know, working with us the entire day and doing all those

presentations, and for that great introduction. And thank you, it's been fun. So come back next year.
