
Using JA3. Asking for a friend?

BSides DC · 2019 · 53:49 · 141 views · Published 2019-10
Category: Technical
Team: Blue
Style: Talk
About this talk
The number one question every single network detection person gets asked: how do you deal with encrypted traffic? Threat actors leverage encryption to obfuscate their activities, sneaking past the border guards in their enchanted cloak, leveraging legitimate certificates or, even worse, legitimate services to operate their C2. In 2017, a method for fingerprinting SSL clients and servers was released, titled JA3 and JA3S respectively, and with their release network detection engineers rejoiced. JA3/JA3S seeks to profile the client and server software involved in an SSL/TLS session by fingerprinting their "hello" messages and the involved cryptographic exchange. This method is not without its nuances, and in our experience putting it to use, the nuances are critical to understand. This talk will give insights into our challenges, failures and successes with JA3 and JA3S while sharing tips for those seeking to begin using it for network detection.

Justin Warner (Director, Applied Threat Research at Gigamon)
Justin Warner (@sixdub) is the Director of Applied Threat Research at Gigamon, where he leads a team of intelligence analysts, detection engineers, and security researchers who seek to dismantle a threat actor's ability to impact their targets. Justin is an Air Force Academy graduate and former USAF Cyber Operations Officer, and has private sector experience in both blue and red team roles, preferring to use his evil skills for good. In his free time, he can be found climbing with his wife and daughter or volunteering in disaster response organizations, bringing a nerdy edge to the mix.
Transcript [en]

everyone's on the same fundamentals. Huge shout-outs to those that came before us: there's a lot of rehashed material here from other presentations, and you should definitely check the referenced work. Then we'll talk about our experiences and some things we've done uniquely with this data and these data points. So, encryption is widespread. It's well known that the use of encryption has increased throughout the internet. You see charts like this all over the internet, from Mozilla and Cisco, basically showing that not only has the use of encryption increased, but malware's adoption of encryption has increased too. In our reality we see anywhere from 50% to 60% of our customer data whereas as

so we get to see all this is mostly over SSL, specifically in web protocols, and we've run into a number of actors who use a wide variety of encryption. So at this point no one here is surprised; everyone's like, yeah, this is well known. But the funny part is that encryption being used has been used as an argument for just disregarding network detection altogether. I've literally been in conversations with well-known researchers who are like, "Well, if they're using SSL, why do you even bother sniffing the network? Why do you even bother looking?" Which is appalling to me, and just mind-blowing, because there are a lot of detection opportunities even on

top of encryption. So first I want to start by going through how bad guys use encryption. One of the simplest ways, and one that's been around for decades, is just using a self-signed cert. For anyone who doesn't know, a self-signed cert is where an attacker creates a cryptographic certificate that's not signed or otherwise trusted by any notable authority. They can create these in a couple of seconds on a Linux command line, and they use it purely to get encrypted traffic. So again, there's no established trust in this certificate; it's just used for encryption. Any red teamers in the room? Hands raised? Yeah, we've got some. You would be like,

"Do people actually still do this? Is this really a thing?" And the fact is, yes. Last week we pulled the config out of a piece of malware, pulled out the IP address, and when we went and used internet scan data to look at that IP, you can see on the right a self-signed cert, and this self-signed cert is pretending to be Adobe Acrobat Reader. Definitely still a thing; bad guys like to pretend they're different people. The next level up is attackers using legitimate certificates. With the rise of low-cost or free certificate providers, we've seen a surge in criminal and threat activity using legitimate certs. So what does this look like? Basically,

you go out to Let's Encrypt, Comodo, or any number of legitimate certificate providers that are actually doing a good service by helping people secure their websites, or at least gain privacy over the traffic to their websites, but there is very little validation. Many people will assume that SSL equals trust, and that's not the case: you have to understand the levels of trust that are involved with certificate validation. In this left-hand config you can see another IP address. This is from last week, fresh new adversary infrastructure stood up on the internet, and on the right-hand side you can see it's using a Let's Encrypt certificate pretending to be a

CDN. So bad guys are registering legitimate certificates. Then we get a little bit more fancy: adversaries using legitimate services and domain fronting. Essentially the way this works is they register an account on a legitimate service, which could be anything ranging from social media profiles to email to CDNs, and they redirect or otherwise bend their malware traffic through it. Many people have probably heard of domain fronting. We've seen instances where bad guys are using Google Drive for command and control through, like, Excel sheets: literally they're pasting a command in a sheet, and their malware is pulling that command and exfiltrating data via the sheet. It all kind of looks the same: a trusted third

party that is being an intermediary broker for your malware. The really big benefit here is that if there is a tap or a defender listening in the corporate network, they're only going to see the SSL traffic destined for, and signed by certificates for, that third party. So it gets extremely hard at this point to distinguish legitimate from illegitimate. So let's go into fingerprinting 101. With SSL/TLS fingerprinting there's a long history, dating back to 2009. There are white papers and blogs by a guy named Ivan Ristić, who was with the SSL Labs project, where he originally hypothesized and did some initial research into the idea of fingerprinting the actual application. So rather than

just looking at certificates, is there some way, by looking at network traffic, that we can say this process or this entity made this connection? It was some of the earliest research into this, and it grew over the years. Every couple of years there would be another advancement in this research area, up until 2017, when a project named JA3 was released and gained widespread popularity within the industry, both defensive and offensive. So what do you actually see? I'm a network forensics nerd, so that's where I naturally jump. It's easy to say it's encrypted and you can't see anything, but is that actually true? Have you run Wireshark and looked

at the traffic? If you did, you'd realize that's not actually all that true. There's a whole protocol exchange that occurs well before the encrypted traffic starts, and it is mostly plain text: anything from legitimately plain-text strings like ASCII all the way to a well-known structure of metadata or details about the protocol that are important. The client always begins negotiations with what's called a client hello; we're going to go into that a little more later. The server then responds with the accepted negotiation, called a server hello, and then they exchange certificates and cipher suites and do a whole bunch of back-and-forth chatter before saying,

"Okay, we're done talking in plain text, let's black out, let's go encrypted," and that's all the application traffic, so all your application-level traffic is then protected. So what are these hello messages and why do they matter? The hello messages are the start of the exchange and agreement over which crypto protocols are going to be used. The client will announce a version and a list of cipher suites, as well as extensions that it will use or accept, so this will be a longer list. Basically, imagine two friends talking together and one friend saying, "Hey man, do you speak Spanish?" The

other friend says, "Yeah, I speak Spanish," and then you start speaking Spanish. That's essentially what's going on here. The client is saying, "I accept these cryptographic protocols," the server responds by saying, "Yeah, cool, let's use this protocol," and then application traffic begins. The picture on the right is an example client hello from PowerShell Empire. There are a number of specific fields in here in plain text: the specific SSL version, the cipher suites (a whole list of all the different crypto ciphers), and all the extensions, including an extension, in at least TLS 1.2, called server name, which

includes the name of the server that it's starting to communicate with; think of it like a DNS hostname. We talked about this a little bit: in 2017 something called JA3 and JA3S was released. Who has heard of JA3 and JA3S? All right, good audience for that. Basically, JA3 is the concept of picking out all of the cryptographic things that are exchanged and hashing them. The reason this is important is that the specific cryptographic options are chosen at the software layer. It's code: however the SSL call is made is what actually defines what crypto is

going to be used. I, as an application programmer, can say I want my application to use TLS 1.1, or 1.2, or 1.1 or nothing; otherwise it will default to whatever the application stack is set to use. So if I focus exclusively on understanding what's going on with the crypto negotiation, in theory I'm focused exclusively on how the application is making the decision. Very specifically, JA3, the client portion, is the TLS version, the ciphers, the extensions, the elliptic curves, and the elliptic curve point formats, all put in an ASCII string and then MD5'd into a hash. So it's a hash that is supposed to represent a unique application. JA3S, the server side, is the

MD5 of the TLS version, the cipher, and the extensions: a smaller list because of what's available and how the negotiation works. You can see in this example that this is literally what happens: you take all of these numbers, separated by dashes and commas, and MD5 them. So if you've ever wondered what's happening under the hood of your Bro parser or Wireshark, however you're seeing this data, it's not too magical. It's taking a bunch of hex constants and values and MD5ing them into a hash. You can see that's where the points get pulled from; literally you can map a client hello directly to a JA3. Pretty simple.
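As a rough illustration of the string-then-MD5 process just described, here is a minimal Python sketch. The function names are made up for this example and are not from any JA3 tool, and the field values used with it would be illustrative rather than the ones on the slide. It also drops GREASE placeholder values, which the published JA3 method ignores but which the talk does not mention.

```python
import hashlib


def is_grease(value):
    """True for reserved GREASE values (0x0a0a, 0x1a1a, ... 0xfafa)."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _join(values):
    # Decimal values joined by dashes, with GREASE placeholders skipped.
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Client side: five comma-separated fields taken from the client hello."""
    return ",".join([
        str(version),
        _join(ciphers),
        _join(extensions),
        _join(curves),
        _join(point_formats),
    ])


def ja3s_string(version, cipher, extensions):
    """Server side: version, the single chosen cipher, and the extensions."""
    return ",".join([str(version), str(cipher), _join(extensions)])


def fingerprint(raw):
    """MD5 the ASCII string into the 32-character hash that gets shared."""
    return hashlib.md5(raw.encode("ascii")).hexdigest()
```

Calling `fingerprint(ja3_string(...))` on the values parsed from a client hello gives the JA3; doing the same with `ja3s_string` on the server hello gives the JA3S. Parsing the handshake itself is left to whatever sensor produces those values.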

Let's think back to all of our examples of how threats work: we talked about self-signed certs, Let's Encrypt, and domain fronting. These are the JA3 and JA3S values specifically for these different threats. Starting with the self-signed cert: I basically ran PowerShell Empire calling out to, I think, an Ubuntu 18 box on a VPS, and I exchanged a whole bunch of malware traffic. I did the initial staging, then I let it beacon for a while and interacted with it, purely as an example. I know Empire's dead, for what it's worth, but this is a representative example. So let's start with

self-signed. All these values equate to a JA3 and a JA3S hash. I did this against all three. You'll note here that the JA3 and JA3S match on both the legitimate cert and the illegitimate cert. What that essentially means is that certificates don't play a factor here and IP addresses don't play a factor here: we were able to identify an application irrespective of where it was hosted, what type of certificate we used, or anything of the sort. They match. So in theory maybe we're finding PowerShell Empire and the associated server. The last one is particularly interesting. I hear a

lot of people talking about, "How do we identify domain fronting? How do we detect it at the network level?" And this shows a little bit of hope. Even in the domain fronting example our JA3 matches exactly, and that's because it's the same client, the same application, making the connection in the same way. Now, you'll note that the JA3S in domain fronting is completely different, which, if we recall what we talked about with domain fronting, makes complete sense: we're not talking directly to our server, we're talking through an intermediary which is acting as our connection. So the server in domain fronting is in fact different, which our JA3 and JA3S validate. One

more example I really like, another one where I've seen a lot of people say "it's undetectable" (my favorite word as a blue teamer), is DNS over HTTPS. Everyone's been very scared of this: what if we lose visibility? What if I take my DNS RATs, PoshC2 or Empire, Cobalt Strike or Meterpreter, and just pipe them out over HTTPS, so now they can't even see all my long DNS magic because it's encrypted? Well, again, if you start using HTTPS/SSL, you now have a JA3 and JA3S hash, or just generically a fingerprint, and that fingerprint is useful. So I ran godoh, which is a particular DNS over HTTPS

proof-of-concept tool. I ran it on Windows x64, called out to Google and Cloudflare, using some of the common providers, and no matter where I called out to, the JA3 was always the same. So this is no different from any other legitimate-traffic C2. What was really funny in this particular case is that this hash was 100% unique across our entire data set, talking years of network data from multiple large corporations. It was because this is some unique Go library which pretty much no one had been using in our data set, ever. So Go stood out like a sore thumb here, and it actually proved to be a really valid and useful JA3 indicator.
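The "stood out like a sore thumb" observation amounts to a rarity check: how many distinct hosts have ever produced a given JA3? A minimal Python sketch of that idea follows. The record layout (source host, JA3 hash) and the function name are assumptions for illustration, not how our tooling actually stores or queries the data.

```python
def rare_fingerprints(records, max_hosts=1):
    """Return JA3 hashes seen from at most `max_hosts` distinct source hosts.

    `records` is an iterable of (source_host, ja3_hash) pairs, a
    hypothetical flattening of whatever the network sensor logs.
    """
    hosts_by_ja3 = {}
    for host, ja3 in records:
        hosts_by_ja3.setdefault(ja3, set()).add(host)
    return sorted(
        ja3 for ja3, hosts in hosts_by_ja3.items() if len(hosts) <= max_hosts
    )
```

A fingerprint that only one host in years of data has produced is not automatically malicious, but it is a cheap place to start looking.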

The last one I want to cover is something called Cisco Joy. In, I think it was 2019, Cisco released a package called Cisco Joy, and it is another SSL/TLS fingerprinting project. It builds upon a lot of the previous research, including JA3 and JA3S. What's interesting here is a couple of aspects. One, the fingerprints they use are not hashes; they're actually reversible values, kind of compressed in a way, so that if you only collect the indicator you can go back and understand how the indicator was derived, unlike a hash, which is a one-way function.
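The reversible-versus-one-way distinction can be shown without Cisco Joy's own format, which is not reproduced here. The sketch below uses the JA3 pre-hash string instead (version, ciphers, extensions, curves, point formats, separated by commas and dashes) as a stand-in: that string can be parsed back into its fields, whereas its MD5 cannot. The function name is hypothetical.

```python
def parse_ja3_string(raw):
    """Recover the handshake fields from a JA3 pre-hash string.

    This only works on the raw string; the MD5 of that string carries
    none of this structure and cannot be turned back into it.
    """
    parts = raw.split(",")
    if len(parts) != 5:
        raise ValueError("expected 5 comma-separated fields, got %d" % len(parts))

    def ints(field):
        # An empty field means the client sent nothing for that list.
        return [int(x) for x in field.split("-")] if field else []

    return {
        "version": int(parts[0]),
        "ciphers": ints(parts[1]),
        "extensions": ints(parts[2]),
        "curves": ints(parts[3]),
        "point_formats": ints(parts[4]),
    }
```

Keeping the pre-hash string alongside the hash costs a few hundred bytes per fingerprint and lets an analyst see what was actually offered.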

Another neat aspect here is that they went in depth on how they correlated endpoint data with network data to enrich JA3s with process information. So rather than just staring at a fingerprint and wondering what it represents, they actually pulled all the process information from their endpoints and were able to correlate it. Then you can start building a library mapping fingerprints to executable names, or at least to the processes that caused them. In this data set there were about 1,500 unique fingerprints, which were associated with about 2,400 unique process names, for a total of just shy of 13,000 process hashes. This was a really interesting

thing that drove some of our additional research later on. Has anyone seen the "No Easy Breach" talk by the Mandiant/FireEye folks? It's a fantastic talk; if you haven't, I'd highly recommend it. Basically it details their year-long IR engagement against APT29 and all the different creative ways they had to battle this state-sponsored actor. What was really interesting to me, if you watch the Twitter exchanges on it too, is that they actually used a primitive form of SSL/TLS fingerprinting as a way to track this actor when they would roll their malware. They had basically built a little in-house version of SSL fingerprinting and were tracking the actor across the enterprise despite

changes in the tool set and despite rolling infrastructure, using this method. When I saw this it definitely inspired motivation to keep researching here as a potentially valid approach. So let's talk about the nuances of fingerprinting. Is it really that easy? Obviously we already said there's no silver bullet, so it's not. But to get deeper into that, the first question a lot of people have is: can we just blacklist these fingerprints? And no, you really can't. Here's a great example. The abuse.ch SSL blacklist here has a bunch of JA3 fingerprints that have come from confirmed malware executions. The example here is TrickBot, and so I

took the JA3 fingerprint from the database, searched against all of our data, and certainly found a few legitimate TrickBot cases, which are at the top of the list there. But down at the bottom there's actually a WhatsApp client running from a Windows Phone. Surprising to find a Windows Phone in 2019, but that's a legitimate device, with a legitimate SSL cert for the WhatsApp server as well as confirmed WhatsApp IP space. This was not TrickBot using WhatsApp; this was legitimately WhatsApp traffic. So you can't just blacklist straight up on the JA3, unfortunately. But further, what does the fingerprint really tell you? Justin already mentioned that it covers the various parts of the hello

handshake, and really those fingerprints line up to an application, or in some cases more like a library or the API that's in use. We took a bunch of cases from stock Windows machines, and there will be five or six JA3 fingerprints from fresh out-of-the-box Windows. I was a little surprised that IE and Edge overlap; maybe that's just because I'm ignorant about how Edge actually works, but on Windows 10, IE and Edge both have the same fingerprint. Then PowerShell has a separate fingerprint, which is actually also shared with the Background Intelligent Transfer Service, and the Windows Script Host has yet another fingerprint. So these fingerprints are all from

the same machine, just different ways of accessing SSL.

So he mentioned JA3S earlier, and really this is better in pairs. If you have a JA3, you know approximately what that client might be. But a given server will respond to the same client hello the same way every time, and will generally respond differently to a different client, because that's how the handshake works. In this case the graph above shows our fingerprint for, I think, another TrickBot case, where the client JA3 is shared across a bunch of different things, but paired with the JA3S for the actual confirmed TrickBot servers there are really only a few results. So it cuts down on the noise, and those are legitimate TrickBot cases.
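The pairing idea reduces to a filter on two fields instead of one. A minimal Python sketch, assuming connection records are dicts with hypothetical `ja3` and `ja3s` keys (real sensors name these fields differently):

```python
def match_fingerprints(records, ja3, ja3s=None):
    """Return records matching a client JA3 and, optionally, a server JA3S.

    With `ja3s=None` this is the noisy client-only search; passing a
    JA3S narrows it to one client talking to one kind of server.
    """
    return [
        r for r in records
        if r["ja3"] == ja3 and (ja3s is None or r["ja3s"] == ja3s)
    ]
```

Running the search both ways and comparing the result counts is a quick way to see how much a given JA3S narrows a noisy JA3.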

Thinking about it very logically: let's say you're investigating a PowerShell or Python Empire talking to the internet for command and control. If you're just looking at JA3, you're just looking for PowerShell in your environment, which is going to be a lot. I'm here to tell you, if you've ever seen a real corporate environment, you would think that would be a good indicator, but in reality it's trash. But if you look at the JA3 and JA3S, you're not just looking for PowerShell; you're looking for PowerShell talking to a very specific Python lib on very specific Linux flavors that tend to be used by red team folks and adversaries. And so again it just

narrows the potential for collisions when you start talking about these things as indicator pairs, and that's how you look at it. The same thing is true for Python clients. I did this search trying to find Python Empire cases, but as mentioned, Empire's kind of dead; there's no live Empire, at least among our clients at the moment. So the Python JA3 produced tons and tons of hits, all kinds of one-off Python scripts that query different APIs and stuff, and that's not super interesting. Overall, over two months, for what we called a representative customer (they had a stable environment and a good number of devices), they had 25,000 unique JA3s

over a roughly 60-day period, and on each individual day there were up to 6,000 active. Not a whole lot new, but there's a long tail here, and that's kind of the message of this: there are lots of unique JA3s, and if you're just looking at the fingerprints, you've got a lot of data to look through. The number of new ones ranges anywhere from 10 or 12 up to, I mean, there were days where we would see a thousand-plus new JA3s that we had never observed in that two-month data set. So it definitely varied. I actually consider that, potentially, it's not a lot new compared to what it could be. It's

significantly boiled down from just the amount of network traffic we're seeing in bulk, but it's still a lot to track. If you're a threat intel guy, that's a lot of magically new indicators that you have to worry about, and that's only two months at one customer. This data grows significantly when you look across the base. Yeah, we'll get into that.
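The "new per day" numbers come from a first-seen calculation: walk the observations in date order and count the fingerprints that have never appeared before. A minimal Python sketch, assuming observations are (day, JA3) pairs with days as ISO date strings so they sort chronologically; the function name and record shape are illustrative only.

```python
def new_fingerprints_per_day(observations):
    """Count never-before-seen JA3 hashes for each day.

    `observations` is an iterable of (day, ja3_hash) pairs; `day` must
    sort chronologically (for example "2019-08-01").
    """
    seen = set()
    counts = {}
    for day, ja3 in sorted(observations):
        counts.setdefault(day, 0)
        if ja3 not in seen:
            seen.add(ja3)
            counts[day] += 1
    return counts
```

The first day of any data set counts everything as new, so the interesting part is the tail after the baseline settles.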

You know, about the number of devices: throughput-wise, we're monitoring multiple gigabits of traffic. How many devices, man, that ranges really drastically. I think this data is derived from a customer that's on the scale of thousands, but not tens of thousands, of devices. So rough, wiggly numbers for you. We actually surveyed this across all our customers, and this was, I would say (we're not statisticians), statistically representative enough for us to show. We felt good. Hacker con, not math con, luckily. So the design spec that the guys at Salesforce put out was that the fingerprint

must fit in a tweet: it's got to be something that can be easily shared. But like anything that's on Twitter, there's not a lot of context you can fit into 280 characters, so it loses a lot of information that might be helpful for defenders to hunt or to spot unique and interesting fingerprints. There are a few examples here that are more or less relevant, but the SSL ping scanner is one that has a few different aspects that make it look unique. First of all, it's longer than the average JA3 string, I would say. On top of that, the strictly incrementing order of the cipher suites is very unusual. In the second example, TrickBot, the cipher

suite list is much more typical, where it's basically ordered from stronger cipher suites to less strong cipher suites. But then the TBot example (sorry, not TrickBot, the TBot example) again has a strictly incrementing list of elliptic curves, so that one stands out as well, because almost none of the legit ones have either that many elliptic curve options or that ordering. And then the two Dridex examples are both very short for a JA3. All this context would be lost if you just had a 32-character MD5 hash, but with this, if you're

just scanning the data, doing manual hunting, these things will stand out much more. So it's less usable given the requirement of being able to tweet it, but it's much easier for an analyst to identify. And again, the Cisco Joy fingerprints keep all of this context, although it's all in hex so it's slightly different, but you have that context rather than a single hash, which is slightly more useful. As mentioned before, some of this can actually be configured. I don't know if the code is very visible to everyone in the back, but this is a basic PowerShell modification of the SSL

handshake, and it specifies in this case just which SSL version we're using. You can go deeper and specify cipher suites and so on, but in this case just changing the SSL version completely changes the fingerprint. This is something you can very easily add to any PowerShell tooling and just make it round-robin or randomly change on every callback. Then if people are purely looking at JA3 from a blacklist approach, good luck hitting that block list. I would add, as a former red teamer, I always thought I was super clever, like, "Ooh, I'm being evasive, I'm dodging how everyone's looking at this." But by being

evasive, you're also sticking out. If you set yourself to SSLv3 with every cipher suite on the planet in your magic configuration, you might look worse than if you just looked like Empire, as far as people who are doing analysis or analytics-based approaches to this. So I would say, if you're going to do this in practice, use some creativity and logic in how you can avoid the fingerprint but not make yourself unique in your own way. So really what we wanted to get into after covering the basics was: what was our journey like doing this? Like, you

know, over the past couple of years, how have we seen this grow and really put this into practice? I can tell you I was one of the first people, when this came out, who was like, "We must do this now, this is super cool." I had known about SSL fingerprinting, but it really wasn't packaged and well supported in a lot of tools. It felt proprietary: a lot of vendors were doing it, but there was not a lot of open source. So when I saw it I was hyped, ready to go, and wanted to dive right in, and I went off and started using

Empire and Cobalt Strike and Metasploit and calculating all their hashes across all the OS versions and architectures and possible server versions, and starting to build a list. Then when I started to use it, I was a little bit dismayed. What I found myself falling back to after about six months of initial research was, "Well, it's not really good for detection." I'm going to caveat that: I learned some ideas later, but at that point my view was that it's not really good for detection. Where it has been really successful for us initially was post-compromise, so forensics. We find some evil, we find what JA3 that evil has, and then we pivot

across our data to find other cases of that evil. So, a very real-world story: we were working with a team who had brought in a red team (thankfully not a real adversary, or not an illegitimate adversary, I should say), and they were using a custom little Python script in an environment. We saw the C2 traffic from this Python script out to a host, let's say the host was evil.evil, over SSL, and it was using a Let's Encrypt cert. We found it using just SSL artifacts, and then we were like, "Well, it's only one host talking to one server. This can't be that well scoped. They must be elsewhere." Usually when we find red

teams, we find one thing and then discoveries explode. So we decided to pivot on JA3, and we actually found the same thing using Azure CDN as a domain front. Same tool; we just pivoted on the JA3. There were collisions, and we certainly had a lot of work to investigate and work through those collisions, but it did lead to successful forensic discovery. Usually by the time you reach forensics you're willing to sort through collisions, because there's an incident and you want to make sure you run it to ground, whereas prior to an incident you might not be willing to sort through 10,000 collisions. That's not an acceptable work-to-value

payoff. I'll note that this has been really a personal narrative, I would say. I've felt that the post-compromise hunting piece has been particularly successful when dealing with scripting-language backdoors, because it's just a quick way to down-select the traffic we're looking at to only things from, likely, a scripting host of some kind. And while I said there's a lot of it, you generally see a lot of legitimate traffic, and the things that stick out are pretty illegitimate. So again, the way we've most often found this useful right off the bat is post-compromise hunting: just pivot on it like it's any other indicator. There's nothing particularly special about predicting the application, other than

there's a link or association between the events. Fun fact: the sandboxes are starting to adopt JA3 and JA3S, which is really, really helpful. As a blue teamer, one of the challenges is taking a known tool and trying to find all the possible fingerprints for it. I am not a DevOps guy; I'm less than hip in the blue team space. My version of this is literally to boot and run seven concurrent VMs across OS stacks and architectures, run the thing in each, capture with Wireshark, and then run a script. Our research ops guys giggle at me and make jokes, but you could DevOps this for

all the spin-up and automation capability, or you could just use a sandbox, which is built to do just that, and they already have this data point. Definitely a call-out to Hatching.io; it's really awesome to see the JA3 in there, and I believe that one is doing JA3S soon. Yeah, talk to jooheon about the JA3S, he's Billy, and I think as of Friday Triage has JA3S. So that's really helpful, again because JA3 is kind of a tough indicator on its own. And in VirusTotal Jujubox they have the JA3 right in the UI there. It's a little bit buried, but you can find it, and it

can be really useful for doing this post-compromise work, or for generating these ahead of time for a kind of context list. That's VirusTotal Jujubox; they have a few different sandbox options, if you're familiar with their behavioral reports, so not every report is going to have that, but if you find it, they will have JA3. So, dialing back to why we're here: we got here because over this time, working with other researchers, talking to customers, and working with corporations who had heard of this technology, they think it's just so easy to leverage an application fingerprint for pure detection. Obviously that's a goal. We don't want to stop at "we have to find evil

some other way and then use this." We want to use this unique advancement in network forensics for detection. So our hypothesis here was that we could use this fingerprint in some way, probably for procedural detection, not quite tool detection, to find unknown threats. That's the sexiness: we can find the unknown, the known unknowns. And here we're going to assume a clean fingerprint, or one with context but a lot of collisions: it's either not previously known, or it's in every customer environment we have and therefore not particularly useful with the fingerprint alone. When I started down this road, it's easy to drop the mic

and say, "Here we go," and then you say, "Okay, to actually start, what does the process for this look like?" It looks like pretty standard hunt processes, depending on what you call hunt. So I started with: let's look at all the data sources that are available out in public to get me context. The first thing I need is to pull all these JA3s and understand what we know about them already, so that I don't have to go do proprietary research. There's a bunch out there, and it continues to grow. I would say, for this list, you can

google and find a lot of offshoots, but these are the three that we found to be, like, you know, they contain a bulk and they've been useful. The first one is ja3er. It's one of the newest ones, and the fun part about this is when you visit it, they tell you your own JA3, which is always enjoyable. But they also provide a downloadable JSON list of every JA3 they've seen and the corresponding user agents they've seen it from. And basically they're generating this by having people visit their website, so it's really good for browser JA3s. It's crowdsourced and it stays up to date automatically, which is always a plus. It's not like a human running malware

and, like, copy-pasting JA3s into a file and pushing to GitHub, which introduces lots of delays. There's Trisul NSM, and they were one of the earliest guys that kind of explored JA3s. They initially did a conversion of the original FingerprinTLS work from, like, 20, or sorry, 2010 or 12, I think, and they converted all of the known prints to JA3s and then added some of their own research and work in there, which was really great. And then there's the SSL Blacklist. For those who don't know the abuse.ch SSL guys, they basically run all this crimeware, known malware, and they generate IPs and so on for you. They're also generating JA3s now

from all this, so you can go there and see, like, TrickBot JA3s. And that's where, you know, we obviously use that list. What I found interesting after this initial study: so, like, step one, enrich all the data, great, fantastic. But in one day across our entire environment (we showed you data before; this is one sample day across our entire customer environment) we had 30,000 hashes. And of the 30,000 hashes, enriching with these lists left 97% that we knew nothing about. So despite the largest public bases of knowns, there's still a whole lot of unknown that we found in our customer base. And so after this I was, I would, I

want to say discouraged, maybe motivated, and, a little bit knowing that this was going to be a bigger-scope project, decided to move on. So when Cisco Joy was released, I was particularly excited because their database was much bigger than all the other sets, so they had a lot more fingerprints and, funny enough, they had the processes. It's just a much cooler set of knowledge: correlating network traffic to process names is a lot more fun from a defensive perspective. And so when we pulled their database list, we actually wrote scripts to convert every Cisco Joy fingerprint into a JA3 fingerprint. So they're not directly the same: Cisco Joy actually

supports GREASE, which is, like, an SSL or TLS feature from Google, whereas JA3 doesn't; they drop GREASE. And so if you're going to do this work, it actually takes a little bit of kung fu to extract the right data points from a Joy fingerprint, format them as a JA3, and generate the list. And when we did that, we went from 97% unlabeled to 91% unlabeled. So pretty considerable progress given, like, one additional data source, but still a lot of unknown. And even in that labeled set, it's labeled, dot dot dot, with collisions. It isn't just, like, labeled perfectly. There's obviously going to be collisions and things you have to deal with. And so we

kind of leveraged the idea of what they were doing with Cisco Joy with our own data. So this is where you enter data scientists, which I am not. We have a guy on our team, Lindsey lock, who does fantastic research for us, more in the data engineering, data science space, and he decided to build what he named the JA3 synergy data set. And basically what he tried to do was correlate HTTP browsing and traffic from the same host to the same locations with SSL traffic that could be related. So essentially he's trying to leverage plaintext data, HTTP, to help gain context on encrypted data. So imagine I browse in my web

browser to HTTP google.com, right? I'm going to talk to Google, who's then going to redirect me, because they're going to be like, fool, we don't serve anything over HTTP. And so I'm going to have an HTTP request out to Google, and then it's immediately going to be followed with a bunch of HTTPS. But if you're listening on the network, you see both, and you could potentially make a hypothesis, or I would say a fairly strong assumption, that if they're time-bounded, and they're from the same source and dest, and the server name and the cert match, with some fuzziness, the passive DNS of the destination or the host header of the HTTP, then maybe the user agent we saw maps to the JA3 that we saw follow.
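
The correlation idea just described can be sketched roughly as follows. This is a minimal illustration, not the team's actual data set code: the record field names (`ts`, `src`, `dst`, `host`, `sni`, `user_agent`, `ja3`), the 5-second window, and the exact host-to-SNI match are all assumptions made for the example. The talk mentions fuzzier matching against passive DNS and certificates, which is left out here.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 5.0  # assumed time bound; the talk does not give a number


def correlate(http_events, tls_events, window=WINDOW_SECONDS):
    """Map each JA3 to the user agents seen shortly before it.

    An HTTP request and a TLS session are paired when they share source,
    destination, and server name (Host header == SNI, a simplification),
    and the TLS session starts within `window` seconds after the request.
    """
    # Index plaintext HTTP requests by (source, destination, host name).
    http_index = defaultdict(list)
    for ev in http_events:
        http_index[(ev["src"], ev["dst"], ev["host"].lower())].append(ev)

    ja3_to_agents = defaultdict(Counter)
    for tls in tls_events:
        key = (tls["src"], tls["dst"], tls["sni"].lower())
        for ev in http_index.get(key, []):
            if 0 <= tls["ts"] - ev["ts"] <= window:
                ja3_to_agents[tls["ja3"]][ev["user_agent"]] += 1
    return {ja3: dict(agents) for ja3, agents in ja3_to_agents.items()}


if __name__ == "__main__":
    # Made-up records using documentation IP ranges and placeholder hashes.
    http = [{"ts": 100.0, "src": "10.0.0.5", "dst": "203.0.113.10",
             "host": "example.com", "user_agent": "ExampleBrowser/1.0"}]
    tls = [{"ts": 100.8, "src": "10.0.0.5", "dst": "203.0.113.10",
            "sni": "example.com", "ja3": "ja3-placeholder-1"},
           {"ts": 500.0, "src": "10.0.0.5", "dst": "203.0.113.10",
            "sni": "example.com", "ja3": "ja3-placeholder-2"}]
    print(correlate(http, tls))
```

Counting user agents per JA3, rather than keeping a single label, keeps the collisions visible: one hash that maps to many unrelated user agents is weaker context than one that maps to a tight cluster.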

So it's wiggly, but it's kind of like you're trying to gain context on your own data, and that was wildly successful. It actually mapped a huge amount of data for us. I don't have the exact percentage because this is, like, an older research project and so I didn't rerun it, but at the time it was somewhere up in, like, the tens, 20s, or 30s of percent. More than 50% was still unlabeled, but we gained massive contextual visibility mapping user agents to JA3 hashes in this process. Obviously, you know, we could run this all the time, and so that's what

we're working on doing now: getting it set up and running on all our data all the time, to be a contextual batch process and provide that. These pictures are kind of fun; they show what we saw. So what you want to see is clusters here, to, like, show there's, you know, lots of user agents of a similar kind that are associated with a particular JA3. So on the left you can see we found what looks roughly to be some sort of Microsoft Mac thing. You can see all the different variants and user agents, but they're all pretty much a Microsoft application of some kind running on Mac OS X. That was

an interesting little cluster where you're able to pick it out really easily, like, you know, visually. And on the right-hand side, the picture is horrible, but you see New Relic Java agents, and that stuck out. Like, there were, you know, dozens of these New Relic Java user agents that were associated with a little cluster of JA3s. So pretty interesting to be able to tie similarities in user agents to particular fingerprints and cluster them better. But afterwards we're like, are we any closer to finding evil? So I'm like, I mean, we're down the data nerd front. Like, data, it's cool, doing correlations is cool, but what does this actually buy us? And so I kind of had a

mind context shift, where I went from thinking about what a thing is to what a thing does. So I kind of started down the path just thinking about what are these things, and what I realized was that that's cool, but it's going to be very, very difficult to actually find evil. So I need to change my mindset to think about what are these things doing. Like, what are these applications actually doing, based on the other data that we can see? And like most things in detection, no approach is mutually exclusive. We should be doing both, and we should be doing them equally as well. And so what else can we focus on? And so

we're still working on this. This is research that's ongoing, and anyone who's worked on, like, super generic C2 detection knows that this will probably never end, that it's a very difficult road to go down. But anytime we look at representative features or statistics by grouping on these different characteristics, it's really good. So an example is: is the application, represented by a JA3 and JA3S pair, associated with lots of servers and lots of destinations? So can I group by JA3 and JA3S, and how many SNIs are there? How many destination IPs do we see? Do we see one SNI and one destination over 30 days of data?
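
As a rough sketch of that group-by, assuming each TLS session has already been reduced to a record with hypothetical fields `ja3`, `ja3s`, `sni`, `dst_ip`, and `src_ip` (the field names and the thresholds are illustrative assumptions, not from the talk):

```python
from collections import defaultdict


def pair_stats(sessions):
    """Roll sessions up by (JA3, JA3S) and collect what each pair talked to."""
    stats = defaultdict(
        lambda: {"snis": set(), "dst_ips": set(), "src_hosts": set()}
    )
    for s in sessions:
        entry = stats[(s["ja3"], s["ja3s"])]
        entry["snis"].add(s["sni"])
        entry["dst_ips"].add(s["dst_ip"])
        entry["src_hosts"].add(s["src_ip"])
    return dict(stats)


def non_prevalent(stats, max_snis=1, max_dst_ips=1):
    """Return pairs seen with at most `max_snis` SNIs and `max_dst_ips` destinations."""
    return sorted(
        pair for pair, e in stats.items()
        if len(e["snis"]) <= max_snis and len(e["dst_ips"]) <= max_dst_ips
    )


if __name__ == "__main__":
    # Made-up sessions: one common pair and one pair with a single SNI/destination.
    sessions = [
        {"ja3": "ja3-a", "ja3s": "ja3s-a", "sni": "one.example", "dst_ip": "198.51.100.1", "src_ip": "10.0.0.1"},
        {"ja3": "ja3-a", "ja3s": "ja3s-a", "sni": "two.example", "dst_ip": "198.51.100.2", "src_ip": "10.0.0.2"},
        {"ja3": "ja3-b", "ja3s": "ja3s-b", "sni": "odd.example", "dst_ip": "203.0.113.7", "src_ip": "10.0.0.9"},
    ]
    print(non_prevalent(pair_stats(sessions)))
```

Running this over a long window, such as the 30 days mentioned, only changes which sessions are passed in; the roll-up itself stays the same.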

If so, that's a pretty non-prevalent application in our data set, and that is particularly interesting. Is the destination or SNI associated with many applications? So, like, do we see this IP address, which might be shared hosting; are there lots of things talking to it? Because if not, it begs the question of, like, why not? Like, you know, we have a lot of customers, a lot of data; generally we see things across more than one, so if you only see it in one, it's usually fun. When was the application, destination, or SNI first seen? Particularly, newly observed JA3S is an absolutely good place to start your filtering, just

because it's new, it's naturally interesting. I'll be the first one to tell you, like, I hate the concept of pure anomaly detection, where, like, you surface every anomaly to a user, because I've been in that role of having to triage those anomalies, and they're a lot less anomalous than most people think. They happen every day. The Internet is weird, I'm here to tell you, and anomalies are an everyday occurrence. But I do like it as a start, because, not most, some malicious things are anomalous, and so it's a decent filter to start with, especially if you're working POC research, like, you want to run it

across smaller sets of data and see if you can find some results. And then there's the classical things. Like, everyone jumps to JA3, but there's a lot of value you can find in just the traditional SSL flow characteristics: data sizes, destination and source IPs, periodicity, regularity in the data, like distributions, idleness. One of my favorite things to hunt for as an analytics person is idle beacons, because, you know, I, as a red teamer, like, 5:00 p.m., sleep 60 minutes, and I'm going to go eat dinner, right? Like, that was my process at the end of every adversary day. But what does that mean in the data? That means every 60 minutes, plus or

minus some jitter, there's a packet, and over the next 10 hours it's going to be the same thing: for 10 straight hours, roughly exactly 60 minutes. And then data sizes that are, like, the same. Almost every adversary toolkit has jitter in their sleep intervals, but they don't have jitter in their sizes, byte sizes. So, like, everyone thinks that, you know, all the blue teams are finding evil by periodicity, and you don't even bother to adjust your size, and so it's, like, 97 bytes exactly at the same rough interval. And so idleness is one of my favorite things to hunt for, and you can do that in SSL, whether it's encrypted or not. And then I love certificates.
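
The idle-beacon hunt described above (a roughly regular interval with some jitter, but identical byte sizes) can be approximated with a simple check per source and destination pair. This is a minimal sketch under stated assumptions: the input is a list of `(timestamp_seconds, byte_count)` tuples, and the `max_jitter` and `min_events` thresholds are arbitrary illustrative values rather than tuned ones.

```python
from statistics import mean, pstdev


def looks_like_idle_beacon(events, max_jitter=0.2, min_events=6):
    """Return True if events are evenly spaced and all the same size.

    events: list of (timestamp_seconds, byte_count) for one src/dst pair.
    Jitter is measured as the standard deviation of the gaps between
    events divided by the mean gap (coefficient of variation).
    """
    if len(events) < min_events:
        return False
    ordered = sorted(events)
    gaps = [later[0] - earlier[0] for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        return False
    jitter = pstdev(gaps) / average_gap
    same_size = len({size for _, size in ordered}) == 1
    return jitter <= max_jitter and same_size


if __name__ == "__main__":
    # Roughly hourly check-ins of exactly 97 bytes, as in the talk's example.
    beacon = [(0, 97), (3590, 97), (7210, 97), (10790, 97),
              (14400, 97), (18010, 97), (21600, 97)]
    # Same sizes, but irregular timing.
    bursty = [(0, 97), (10, 97), (500, 97), (510, 97),
              (4000, 97), (4100, 97), (9000, 97)]
    print(looks_like_idle_beacon(beacon), looks_like_idle_beacon(bursty))
```

Requiring strictly identical sizes is the strictest reading of the point about toolkits not jittering their sizes; in practice a small tolerance would likely be needed.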

Almost every red team I see is Let's Encrypt. Like, very few are paying for it, especially consultants, because they have to pay across a lot of engagements, so a lot of volume, so why not just register with Let's Encrypt? If you go look at, like, killswitch-GUI, Alex Rymdeko-Harvey's tools, he has, you know, a lot of scripts to automatically set up servers. Same thing with Jeff Dimmock and Steve, who is in the room: on GitHub they have tools that help set up servers and infrastructure. A lot of them will have built-in scripts to automatically register with Let's Encrypt. That's a good choke point to go look at. Why don't we just

start by looking at Let's Encrypt certs? And I like to track these over time and aggregate. So, as with any analytics approach, this is what we found in an initial approach: I found security products, I found financial analysts scripting their access to investment sites, found lots of Tor, and more security products. It turns out security products look pretty evil in nature, and so we pretty much see them all the time and they hit on all our initial detection and analytics. But we also found evil. So in this case this was a red team that we discovered doing this approach. So we found this JA3S and JA3 pair,

and these are all the labels that kind of got applied based on our analytics. So it was a new application and server; it was on a single host in our environment; it had a single SNI, a single destination; non-prevalent across our customer base; the SNI was only coming from a single host; it was a low-cost cert or issuer; it was young in pDNS, so when we correlated the IP to pDNS, it was young; it was non-prevalent, like, there was only one pDNS record. Just tons of labels that, when you look at this, you're like, that is ugly. But as a red team

you're not thinking about all these aggregate roll-ups and stats when you deliver your malware. Like, you're not a data guy, usually, so, you know, you don't think about what you look like in the environment. Funny enough, the domain that was used by the adversary was actually one they had registered after it expired from a legitimate company, and so if we were just doing certificate stuff or, like, by-name stuff, it would actually look pretty legitimate. It was a medical provider website that had expired, and they had registered it, like, a couple weeks after. And when we even looked at it after the fact, it was hilarious that, you know, it was hosted

and, like, on the yellow books. Like, this adversary domain was in the yellow book associated with a medical provider. But if you looked at the actual hosting characteristics, that stuck out like a sore thumb: it expired and was re-registered, moved from Akamai to DigitalOcean, moved from an Akamai cert, or no, no certs, to having a cert, and it moved from its earlier registrar to Namecheap. So pretty fun. So, kind of diving into final thoughts here: I always encourage people, when you see the hype, when you see something new released and advancements in this field, don't stop there. Listen to everything that's being said. Many of the things we told you today were in

the original JA3 talks. Like, the JAs are awesome folks; they understand all these characteristics, they work in detection. But what happened is, out of what they said, most people only walked away with five percent, and the five percent was: look, SSL fingerprinting finds evil. And so it's really important, as technical stewards in this community, to not stop there, to fully understand the nuances and caveats associated with these technologies, and to educate about that. This talk was not intended to say that fingerprinting isn't useful or anything. It's a new idea that needs to continue being explored, particularly on the behavioral front. And I would encourage everyone to be creative in detection in general. Clever

blue teamers find the red. So being creative with how you approach it, not just going for the easy low-hanging fruit, has been very valuable for me and my community. At the end of the day, this is a human game: human defenders defending against human adversaries. And so it's often how you can think of that humanity that gets you that leg up in playing the game. And I like to say it's all a balance. This is a personal rant or pet peeve for me, with the crowds moving towards, like, just find it all behaviorally, it's so easy, generic is the only way you should be going. It's all a balance, and they're not mutually

exclusive. Polarizing approaches have no place in the industry. You should be doing all of it. If I can get an IOC from X vendor, and it's associated with APT29, and they reuse that IOC and I don't catch that, like, that's negligence. Now, I'm not saying that that approach is the only one you should be doing. Quite the contrary, it's, you know, 1% of what you should be doing, but it is useful, and we can do it all. Technology allows us to scale this approach. And honestly, the more generic you move, the less threat-specific you are, and the more effort it requires in terms of man-hours and technology. And so

it's really good to find your sweet spot on the balance on this spectrum. This is kind of a Pyramid of Pain shout-out, wherever I pay that tax. But yeah, at this point I think we're near done on time, so we'll open up for questions if there is time, and if not, we'll certainly be out back. Yep.

Absolutely, yeah. So they've got a callout for information or validation. Yeah, I agree with everything you just said: it's configurable. One thing I find interesting: I think a lot like that, of, like, well, why is this useful, because I can do X and Z? But the reality is that most of the bad guys we see don't do any of that. Yeah, they're just lazy. It's, again, humans versus humans. And so what's interesting is when ideas like this come out. Like, I'm hoping, with this talk, I'm going to go watch all the red team traffic that we see in our environments, and I'm going to watch all of them disappear, but I'm not going to watch

any of the real bad guys disappear, because they do follow some of this, but every hour they invest in their ops is money they're losing. Real adversaries operate like a much more efficient business than most of the legitimate pen testers and red teamers, and so they make those calls. But yeah, totally agree. I would love to see more research on the red side of, like, building custom servers to configure these parameters, like, do it in Apache and do it in the different Python server applications, and start masquerading, making that more common. It'd be fascinating.

That's a good question. It's on the order of thousands, if not higher. I mean, it depends on what specific application. So Cobalt Strike's a really fun example: it uses WinINet underneath the hood, so anything else that uses WinINet looks pretty similar, with the difference being Cobalt Strike has a Java server versus a lot of, like, Apache stuff. But we still see WinINet talking to Java, like, all the time in our real environments. There's a lot of Java web servers out there, my friends. And so that's one that has been collision-prone that we wanted to hunt for. Like, we were hoping that it would be a silver bullet on that one.

Yeah, because of the malleability. These toolkits that have a lot of malleability, like, we were hoping these fingerprints would be the bullet, but that one has a lot of collisions against repeating languages. They collide on the JA3, but not so much the JA3S. Yes.
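
To make the collision point concrete, here is a small sketch that counts how many distinct labels share a key when keying on JA3 alone versus the JA3 and JA3S pair. The observations, hash placeholders, and label names are invented for illustration; as noted in the answer above, the pair narrows things down but still collides in real environments.

```python
from collections import defaultdict


def labels_per_key(observations, key_fields):
    """Group observations by the given fields and collect the labels seen per key."""
    grouped = defaultdict(set)
    for obs in observations:
        key = tuple(obs[field] for field in key_fields)
        grouped[key].add(obs["label"])
    return dict(grouped)


def colliding_keys(grouped):
    """Return the keys that map to more than one label."""
    return sorted(key for key, labels in grouped.items() if len(labels) > 1)


if __name__ == "__main__":
    # Hypothetical data: one client library fingerprint talking to two server stacks.
    observations = [
        {"ja3": "ja3-client-lib", "ja3s": "ja3s-server-x", "label": "tool-one"},
        {"ja3": "ja3-client-lib", "ja3s": "ja3s-server-y", "label": "business-app"},
        {"ja3": "ja3-client-lib", "ja3s": "ja3s-server-y", "label": "updater"},
    ]
    print(colliding_keys(labels_per_key(observations, ["ja3"])))
    print(colliding_keys(labels_per_key(observations, ["ja3", "ja3s"])))
```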

ETA being Encrypted Traffic Analytics from Cisco? Yeah, yeah, totally. I mean, there's been research here in the past (I think we've actually spoken in the past) about encrypted traffic analysis and ways to do it, and there's a lot of white paper research out there in this space too, in terms of machine learning and modeling on encrypted traffic. And the slide I did on the behavioral stuff, a lot of that's in there. There are segments that aren't in there, and I would say the mixing of applications into those encrypted traffic analytics is still kind of new-ish. Like, we're starting to

see it in vendor spaces, and we're a vendor, but I think there's still a lot of myth around applications and how unique they really are. And so we're seeing a lot of what might not be called false positives, but in the terms of a defender sitting at the keyboard, they're a false positive. So, not much. The essential data for the JA3 fingerprints is still there: yeah, the TLS version, you have the cipher suites, you have an SNI field potentially, maybe an ESNI. Yeah, it's there sometimes, okay. But the core is the same, because it's still the cipher suites, still the TLS version, still the elliptic curve points. I also think

it's funny about 1.3, because it's 1.3 in the spec versus 1.3 in the implementation and what's been seen. And so, like, I've seen presentations that say, like, with 1.3, you know, fingerprinting should be broken, it's not working anymore. But the JA3 folks will tell you they've seen supposedly 1.3 where everything was intact, and we've seen it too. And so implementation versus spec is always a fun one to look at. But there are fields in the spec that could be challenging. I think it's particularly with the SNI and certain extensions being encrypted. So not a lot,

honestly. We've seen very slow 1.3 adoption across our customers; most of it is all 1.2 still, and so we haven't quite gotten there. I mean, quite honestly, we're still early in some of the behavioral pieces here. So we did do a few 1.3 via the APIs on Windows, but like I said, you still get a JA3 fingerprint. Yep. Yeah, I mean, there's multiple scripts for JA3. JA3 on their GitHub actually has a Python script: you feed it a pcap and it spits out all the JA3s and JA3Ses it sees. There's a Wireshark Lua script as well. Yeah, post that out there.
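
For readers who have not looked inside those scripts, the hash itself is simple once the ClientHello has been parsed. The sketch below follows the publicly documented JA3 recipe: decimal values joined with `-` within each field, fields joined with `,` in the order TLS version, cipher suites, extensions, elliptic curves, and elliptic curve point formats, GREASE values dropped, then MD5 of the resulting string. Parsing the pcap is out of scope here; the function names and the sample values in the demo are illustrative, not taken from the talk.

```python
import hashlib

# GREASE values (RFC 8701) follow the pattern 0x0A0A, 0x1A1A, ... 0xFAFA.
GREASE = {(b << 8) | b for b in range(0x0A, 0x100, 0x10)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the comma-separated JA3 string from already-parsed ClientHello fields."""
    def join(values):
        # GREASE values are ignored so they do not change the fingerprint.
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(curves),
        join(point_formats),
    ])


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Return the MD5 hex digest of the JA3 string."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Sample values only; 0x0A0A and 0x1A1A are GREASE and get dropped.
    fields = (769, [0x0A0A, 47, 53], [0, 10, 11, 0x1A1A], [23, 24], [0])
    print(ja3_string(*fields))
    print(ja3_hash(*fields))
```

The same GREASE filtering is the step that matters when converting fingerprints from a format that keeps GREASE, as with the Cisco Joy conversion mentioned earlier in the talk.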

Yep. Yes, totally, thank you for that. Yeah, we're actually using the Bro config, or the Bro module, so it's in there. I think that's it for time. We'll be out back for questions, so please join us, and we're happy to talk. Thank you. [Applause]