
BSides DC would like to thank all of our sponsors, and a special thank you to all of our speakers, volunteers, and organizers.

Good morning, everyone. Give us one minute while we settle in with the AV fun. Awesome. Check, check. Can't turn it on again? No? Check, check. There we go.

I'm Myles. I've been doing malware and threat stuff for about nine years now. I'm a twin dad, I'm a drum and bass DJ, and I love dogs.

My name is Justin Warner. I'm the director of the Applied Threat Research team at Gigamon. I work with that team doing lots of fun stuff, primarily focused on network forensics: finding bad guys using passive network
data. Prior to this I was a red teamer for a couple of years in the consulting world, where I got to be a bad guy and thought very little about how I was being detected. Prior to that I was in the government space, an Air Force computer nerd. My wife's a hacker, my daughter will hopefully be a hacker, and for fun away from computers I do disaster response volunteering. It's my reprieve and my excitement away from Twitter.

So why are we here today? We really, really like the idea of SSL and TLS fingerprinting. This came out a couple of years ago with the idea of JA3, and before that there were several pieces of previous research. But like a lot of things in
the information security industry, once a small advancement is made it oftentimes gets run with, and sometimes overblown. Mostly that's not because the discoverers or the people who listened hyped it in any way; they're very factual and honest and they tell it just like it is. It's most often because people walked away with 5% of the content and then tried to apply it. In the presentations about JA3 there are quotes exactly like "JA3 is not a silver bullet," yet in the field, talking to other researchers, to customers, and to others, basically all we heard was, "Why aren't you just focusing on doing JA3 fingerprinting?" So there was a disconnect here, and we wanted to
share our experiences. We'll go through the introduction, which shares some basic knowledge and covers the introductory aspects, just to make sure everyone has the same fundamentals. Huge shout-outs to those that came before us: there's a lot of rehashed material here from other presentations, and you should definitely check the referenced work. Then we'll talk about our experiences and some things we've done uniquely with this data and these data points.

So, encryption is widespread. It's well known that the use of encryption has increased throughout the internet. You see charts like this all over the internet from Mozilla and Cisco, basically showing not only has encryption use increased
but also malware's adoption of encryption has increased. In our reality, we see anywhere from 50% to 60% of our customer data is SSL, so what we get to see is mostly over SSL, specifically in web protocols, and we've run into a number of actors who use a wide variety of encryption. So at this point no one here is surprised; everyone's like, "Yeah, duh, this is well known." But the funny part is that encryption being used has been used as an argument for just disregarding network detection altogether. I've literally been in conversations with well-known researchers who say, "Well, if they're using SSL, why do you even bother sniffing the network? Why do you even bother looking?"
Which is appalling to me, and just mind-blowing, because there are a lot of detection opportunities even on top of encryption.

So first I want to go through how bad guys use encryption. One of the simplest ways, and one that's been around for decades, is just using self-signed certs. For anyone who doesn't know, a self-signed cert is where an attacker creates a cryptographic certificate that's not signed or otherwise trusted by any notable authority. They can create these in a couple of seconds on a Linux command line, and they use it purely to get encrypted traffic. So again, there's no established trust in this certificate; it's just used for encryption. And most
people will... like, any red teamers in the room? Hands raised? Yeah, we've got some. You would be like, "Do people actually still do this? Is this really a thing?" And the fact is, yes. Last week we pulled a piece of malware, pulled the IP address out of the config, and when we used internet scan data to look at that IP, you can see on the right a self-signed cert, and this self-signed cert is pretending to be Adobe Acrobat Reader. So it's definitely still a thing. Bad guys like to pretend they're different people.

The next level up is attackers using legitimate certificates. With the rise of low-cost or free hosting and certificate providers, we've
seen a surge in criminal and threat activity using legitimate certs. What does this look like? Basically you go out to Let's Encrypt, Comodo, or any number of legitimate certificate providers that are actually doing a good service by helping people secure their websites, or at least gain privacy over the traffic to their websites. But there is very little validation. Many people assume that SSL equals trust, and that's not the case; you have to understand the levels of trust that are involved with certificate validation. In this left-hand config you can see another IP address. This is from last week: fresh, new adversary infrastructure stood up on the internet at this IP address,
and on the right-hand side you can see it's using a Let's Encrypt certificate pretending to be a CDN. So bad guys are doing this, registering legitimate certificates.

Then we get a little bit more fancy: adversaries using legitimate services and domain fronting. Essentially, the way this works is they register an account on a legitimate service, which could be anything ranging from social media profiles to email to CDNs, and they redirect or otherwise bend their traffic through it. Many people have probably heard of domain fronting. We've seen instances where bad guys are using Google Drive for command and control through Sheets: literally, they're pasting a command in a sheet, and their malware's
pulling that command and exfilling data via the sheet. It all kind of looks the same: a trusted third party acting as an intermediary broker for your malware. The really big benefit here is that if there is a tap or a defender listening in the corporate network, they're only going to see the SSL traffic destined for, and signed by certificates for, that third party. So it gets extremely hard at this point to distinguish legitimate from illegitimate.

So let's go into fingerprinting 101. SSL/TLS fingerprinting has a long history. Dating back to 2009 there are white papers and blogs by a guy named Ivan Ristic, who was with the SSL Labs project, where he originally
hypothesized about, and did some initial research into, the idea of fingerprinting the actual application. So rather than just looking at certificates, is there some way, by looking at network traffic, that we can say this process or this entity made this connection? It was some of the earliest research into this, and it kind of grew over the years; every couple of years there'd be another advancement in this research area, up until 2017, when a project named JA3 was released and gained widespread popularity within the industry, both defensive and offensive.

So what do you actually see? I'm a network forensics nerd, so this is what I naturally jump to. It's easy to say it's
encrypted, you can't see anything, but is that actually true? Have you run Wireshark and looked at the traffic? If you did, you'd realize that's not actually all that true. There's a whole protocol exchange that occurs well before the encrypted traffic starts, and it is mostly plain text: anything ranging from legitimately plain-text strings, like ASCII, to well-known structures and metadata about the protocol that are important. The client always begins the negotiation with what's called a client hello; we're going to go into that a little bit more later. The server then responds with the accepted negotiation, called a server hello, and
then they exchange certificates, settle on cipher suites, and do a whole bunch of back-and-forth chatter before saying, "Okay, we're done talking in plain text, let's go black, let's encrypt," and that's all the application traffic. So all your application-level traffic is then protected.

So what are these hello messages, and why do they matter? The hello messages are the start of the exchange and agreement over what crypto protocols are going to be used. The client will announce a version and a list of cipher suites, as well as extensions, that it will use or accept. So this will be a longer list, and it generally is just
like... basically, imagine two friends talking together, and one friend says, "Hey man, do you speak Spanish?" The other friend says, "Yeah, I speak Spanish," and then you start speaking Spanish. That's essentially what's going on here. The client is saying, "I accept these cryptographic protocols," the server responds by saying, "Yeah, cool, let's use this protocol," and then application traffic begins.

It's important to note, so the picture on the right is an example client hello from PowerShell Empire. There are a number of specific fields in here that are fun: the specific SSL version is in plain text, the cipher suites, which is a whole list of all the different crypto ciphers, is in plain text, and a number of
other things, all the extensions, including an extension, in at least TLS 1.2, called server name, which includes the name of the server that it's starting to communicate with. So think of it like a DNS hostname.

We talked about this a little bit: in 2017 there was something called JA3 and JA3S released. Who has heard of JA3 and JA3S? All right, yeah, good audience for this. Basically, JA3 is the concept of picking out all of the cryptographic things that are exchanged and hashing them. The reason this is important is that the specific cryptographic exchanges that are chosen are chosen at the software layer, so
it's code; however the SSL call is made is what actually defines what crypto is going to be used. So I, as an application programmer, can say I want my application to use 1.2, 1.1, or nothing, and otherwise it'll all default to whatever the application stack is set to use. So if I focus exclusively on understanding what's going on with the crypto negotiation, in theory I'm focused exclusively on how the application is making the decision.

Very specifically, JA3, the client portion, is the TLS version, the ciphers, the extensions, the elliptic curves, and the elliptic curve point formats, all put in an ASCII string and then MD5'd into a hash. So it's a
hash that is supposed to represent a unique application. JA3S, the server side, is the MD5 of the TLS version, the cipher, and the extensions. It's a smaller list because of what's available and how the negotiation works. You can see in this example, this is literally what would happen: you would take all of these numbers, separated by dashes and commas, and MD5 it. So if you have ever wondered what's happening under the hood, whether it's your Bro parser, Wireshark, or however you're seeing this data, it's not too magical. It's taking a bunch of hex constants and values and MD5ing them into a hash. All right, so you can see that's where the data points get pulled from. Literally you can
map a client hello directly to a JA3. Pretty simple.

So let's think back to all of our examples of how threats work. We talked about self-signed certs, Let's Encrypt, and domain fronting, and these are the JA3 and JA3S values specifically for these different threats. Starting with the self-signed cert: I basically ran PowerShell Empire calling out to, I think it was, an Ubuntu 18 box out on a VPS, and I exchanged a whole bunch of malware traffic. I did the initial staging, then I let it beacon for a while and interacted with it. This is purely an example; I know Empire's dead, for what it's worth, but this is a representative example.
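As a rough illustration of the JA3 and JA3S construction described above, here is a minimal Python sketch. The function names and the sample handshake values are made up for illustration and are not taken from the slides. It assumes the fields have already been parsed out of the client hello and server hello as integers, writes them in decimal as the public JA3 project does, and skips GREASE placeholder values, which JA3 ignores.

```python
import hashlib


def is_grease(value):
    """True for GREASE placeholders (0x0a0a, 0x1a1a, ... 0xfafa), which JA3 skips."""
    hi, lo = value >> 8, value & 0xFF
    return hi == lo and (lo & 0x0F) == 0x0A


def join_values(values):
    """Dash-join decimal values in their original order, dropping GREASE entries."""
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Client side: five comma-separated fields taken from the client hello."""
    return ",".join([
        str(version),
        join_values(ciphers),
        join_values(extensions),
        join_values(curves),
        join_values(point_formats),
    ])


def ja3s_string(version, cipher, extensions):
    """Server side: version, the single chosen cipher, and the extensions."""
    return ",".join([str(version), str(cipher), join_values(extensions)])


def md5_hex(text):
    """The shareable fingerprint is the MD5 hex digest of the ASCII string."""
    return hashlib.md5(text.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    client = ja3_string(771, [49195, 49199, 47], [0, 10, 11], [23, 24], [0])
    server = ja3s_string(771, 49199, [65281, 11])
    print(client, md5_hex(client))
    print(server, md5_hex(server))
```

Keeping the pre-hash string alongside the digest costs little and preserves the ordering and length details that the 32-character hash throws away.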
Here, let's start with self-signed: all these values equate to a JA3 and a JA3S hash. I did this against all three. You'll note here that the JA3 and JA3S match on both the legitimate cert and the illegitimate, self-signed cert. What that essentially means is certificates don't play a factor here and IP addresses don't play a factor here. We were able to identify an application irrespective of where it was hosted, what type of certificate we used, or anything of the matter; they match. So in theory, maybe we're finding PowerShell Empire and the associated server.

The last one is particularly interesting, because I hear
a lot of people talking about, like, how do we identify domain fronting, how do we detect it at the network level? This shows a little bit of hope. Even in the domain fronting example, our JA3 matches exactly, and that would be because it's the same client, the same application, making the connection in the same way. Now, you'll note that the JA3S in domain fronting is completely different, which, if we recall what we talked about with domain fronting, makes complete sense: we're not talking directly to our server, we're talking through an intermediary which is acting as our connection. So the server in domain fronting is in fact different, which our JA3 and JA3S validate.

One
more example I really like, another one I've seen a lot of people talk about like it's undetectable (my favorite word as a blue teamer), is DNS over HTTPS. Everyone's been very scared of this. Basically, what if we lose visibility? What if I take my DNS RATs, PoshC2 or Empire, Cobalt Strike, or Meterpreter, and just pipe them over HTTPS, so now they can't even see all my long DNS magic because it's encrypted? Well, again, if you start using HTTPS and SSL, you now have a JA3 and JA3S hash, or just generically a fingerprint, and that fingerprint is useful. So I ran goDoH, which is a particular DNS over HTTPS
proof-of-concept tool, on Windows x64. I called out to Google and Cloudflare, some of the common providers, and no matter where I called out to, the JA3 was always the same. So this is no different than any other legitimate traffic situation. What was really funny in this particular case is that this hash was 100% unique across our entire data set. I'm talking years of network data from multiple large corporations. It was because this is some unique Go library which pretty much no one had been using in our data set, ever. So Go stood out like a sore thumb here, and it actually proved to be a really
valid and useful JA3 indicator.

The last one I want to talk about, or kind of cover, is something called Cisco Joy. In, I think it was 2019, Cisco released a package called Cisco Joy, and Cisco Joy is another SSL/TLS fingerprinting project. It builds upon a lot of the previous research, including JA3 and JA3S. What was interesting here is a couple of aspects. One, the fingerprints they use are not hashes; they're actually reversible values, kind of compressed in a way, so that if you only collect the indicator you can go back and understand how the indicator was derived, unlike a hash, which is a one-way
function. Another neat aspect is that they went in depth on how they correlated endpoint data with network data to enrich JA3s with process information. So rather than just staring at a fingerprint and wondering what it represents, they actually pulled all the process information from their endpoints and were able to correlate it. Then you could start building a library mapping fingerprints to executable names, or at least to the processes that caused them. In this data set there were about 1,500 unique fingerprints, which were associated with about 2,400 unique process names, for a total of just shy of 13,000 process hashes. This was a really interesting
thing that drove some of our additional research later on.

Has anyone seen the No Easy Breach talk by the Mandiant/FireEye folks? It's a fantastic talk, and if you haven't, I'd highly recommend it. Basically it details their year-long IR engagement against APT29 and all the different creative ways that they had to battle the state-sponsored actor. What was really interesting to me is that, if you watch the Twitter exchanges on it too, they actually used a primitive form of SSL and TLS fingerprinting as a way to track this actor when they would roll their malware. They had basically built a little in-house version of SSL fingerprinting and were tracking the actor across the enterprise despite
changes in the tool set and despite rolling infrastructure, using this method. When I saw this, it definitely inspired motivation to keep researching here as a potentially valid approach.

So let's talk about the nuances of fingerprinting. Is it really that easy? Obviously we already said there's no silver bullet, so it's not, but just to get deeper into that: the first question a lot of people had from this is, can we just blacklist these fingerprints? And no, you really can't. Here's a great example. The abuse.ch SSL blacklist here has a bunch of JA3 fingerprints that have come from confirmed malware executions. The example here is TrickBot, and so I took
the JA3 fingerprint from the database, searched against all of our data, and certainly found a few legitimate TrickBot cases, which are at the top of the list there. But down at the bottom there's actually a WhatsApp client running from a Windows Phone. It's surprising to find a Windows Phone in 2019, but that's a legitimate device, with a legitimate SSL cert for the WhatsApp server as well as confirmed WhatsApp IP space. So this was not TrickBot using WhatsApp; this was legitimately WhatsApp traffic. So you can't just blacklist straight up on the JA3, unfortunately.

But further, what does the fingerprint really tell you? Justin already mentioned that it covers the various parts of the hello handshake, and
really those fingerprints line up to an application, or in some cases more like a library or the API that's in use. We took a bunch of cases from stock Windows machines, and there will be five or six JA3 fingerprints on fresh, out-of-the-box Windows. I was a little surprised that IE and Edge overlap; maybe that's just because I'm ignorant about how Edge actually works, but on Windows 10, IE and Edge both have the same fingerprint. Then PowerShell has a separate fingerprint, which is actually also shared with the Background Intelligent Transfer Service, and the Windows Script Host has yet another fingerprint. So these fingerprints
are all from the same machine, just different ways of accessing SSL.

He mentioned JA3S earlier, and really these are better in pairs. If you have a JA3, you know approximately what that client might be, but against a given server the same client JA3 will always map to one server JA3S, and a different client JA3 will map to a different server JA3S, because that's how the SSL handshake works. In this case, the graph above shows our fingerprint for, I think, another TrickBot case, where the client JA3 is shared across a bunch of different things, but paired with the JA3S for the actual confirmed TrickBot servers there are really only a few results. So it
cuts down on the noise; those are legitimate TrickBot cases.

Thank you. Very logically, if you imagine, let's say, you're investigating a PowerShell or Python Empire talking to the internet over command and control: if you're just looking at JA3, you're just looking for PowerShell in your environment, which is going to be a lot. I'm here to tell you, if you've never seen a real corporate environment, you would think that would be a good indicator, but in reality it's trash. But if you look at the JA3 and JA3S together, you're not just looking for PowerShell; you're looking for PowerShell talking to a very specific Python lib on very specific Linux flavors that tend to be used by red team folks and
adversaries. So again, it just narrows the potential for collisions when you start talking about these things as indicator pairs, and that's how you look at it.

The same thing is true for Python clients. I did this search trying to find Python Empire cases, but as mentioned, Empire's kind of dead; there's not a lot of live Empire, at least among our clients at the moment. So the Python JA3 produced tons and tons of hits. We found all kinds of one-off Python scripts that query different APIs and stuff, and that's not super interesting.

Overall, over two months for one, we'll call it a representative, customer, with a stable environment and a good number of devices, they had 25,000
unique JA3s over roughly a 60-day period, and on each individual day there are up to 6,000 active, with not a whole lot new. But there's kind of a long tail here, and that's the message of this: there are lots of unique JA3s, and if you're just looking at the fingerprints, you've got a lot of data to look through.

And "not a whole lot new" ranges anywhere from 10 or 12 to, I mean, there were days where we would see a thousand-plus new JA3s that we had never observed in that two-month data set. So it definitely varied. I actually consider that, potentially, not a lot new for what it could be;
it's significantly boiled down from the amount of network traffic we're seeing in bulk. But it's still a lot to track; if you're [inaudible], it's a lot of magically new indicators you have to worry about. And that's only two months at one customer. This data grows significantly when you look across our base.

Yeah, we'll get into that. "Do you know about the number of devices or throughput?" Throughput-wise, we're usually monitoring multiple gigabits of traffic. In terms of devices, man, that ranges really drastically. I think this data is derived from a customer that's on the scale of
thousands, but not tens of thousands, of devices. So rough, wiggly numbers for you. We actually surveyed this across all our customers, and I would say, we're not statisticians, but it was statistically representative enough for us to show. We felt good. Hacker con, not math con, luckily.

Okay, so the design spec that the guys at Salesforce put out was that the fingerprint must fit in a tweet. It's got to be something that can be easily shared. But like anything that's on Twitter, there's not a lot of context you can fit into 280 characters, so it loses a lot of information that might be
helpful for defenders to hunt with, or to spot unique and interesting fingerprints. There are a few examples here that are more or less relevant. The SSL ping scanner is one that has a few different aspects that make it look unique. First of all, it's longer than the average JA3 fingerprint, I would say. On top of that, the strictly incrementing order of the cipher suites is very unusual. In the second example, the TrickBot one, the cipher suite list is much more typical, where it's basically ordered from stronger cipher suites to less strong cipher suites. But then the TBot example (sorry, not TrickBot, the TBot) again has a strictly
incrementing list of elliptic curve points, so that one stands out as well, because almost none of the legitimate ones have that many elliptic curve options, or they're not ordered in such a way. And then the two Dridex examples are both very short for a JA3. All of this context would be lost if you just had a 32-character MD5 hash, but with the full string, if you're just scanning the data, doing manual hunting, these things will stand out much more. So it's less usable for the requirement of being able to tweet, but it's much easier for an analyst to identify. And again, the Cisco Joy
fingerprints keep all of this context, although it's all in hex, so it's slightly different; but you have that context, over a single hash I should say, which is slightly more useful.

And as mentioned before, some of this can actually be configurable. I don't know if the code is very visible to everyone in the back, but this is a basic PowerShell modification of the SSL handshake. It specifies, in this case, just which SSL version we're using. You can go deeper and specify cipher suites and stuff, but in this case just changing the SSL version completely changes the fingerprint. This is something you can very easily add to any of the
PowerShell-based post-exploitation frameworks, just a quick modification, and really any of the API-based frameworks.

Fun homework for any red teamers: go take your framework, go look up how to configure the SSL options, and just make it round-robin or randomly change every callback. Then if people are purely looking at JA3 from a blacklist approach, good luck hitting that blacklist. I would add, as a former red teamer, I always thought I was super clever, like, "Oh, I'm being evasive, I'm dodging how everyone's looking at this." By being evasive, you were also sticking out. So if you set yourself to, like, SSL
v3 with every cipher suite on the planet in your magic configuration, you might stick out; you might look worse than if you just looked like Empire, as far as people who are doing analysis or analytics-based approaches to this. So I would say, if you're going to do this in practice, put some creativity and logic into how you can avoid the fingerprint without making yourself unique in your own way.

So really, what we wanted to get into after covering the basics was: what was our journey like doing this, and over the past couple of years how have we seen this grow and really put
this to practice? I can tell you I was one of the first people, when this came out, who was like, "We must do this now, this is super cool." I had known about SSL fingerprinting, but it really wasn't packaged and well supported in a lot of tools. It felt proprietary; a lot of vendors were doing it, but there was not a lot of open source. So when I saw it, I was hyped, ready to go, and wanted to dive right in. I went off and started using Empire and Cobalt Strike and Metasploit, calculating all their hashes across all the OS versions
and architectures and possible server versions, and starting to build a list. And then when I started to use it, I was a little bit dismayed. What I found myself falling back to after about six months of initial research was, well, it's not really good for detection. I'm going to caveat that: I learned some ideas later, but at that point my view was that it's not really good for detection. Where it has been really successful for us, initially, was post-compromise forensics. We find some evil, we find what JA3 that evil has, and then we pivot across our data to find other cases of that evil. So, a very real-world story:
working with a red team who had written [inaudible], so an adversary, but not a legitimate adversary, I should say. They were using a custom little Python script in an environment, and we saw the C2 traffic from this Python script out to a host, let's say the host was evil.evil, over SSL, and it was using a Let's Encrypt cert. We found it using just SSL artifacts, and then we were like, well, it's only one host talking to one server; this can't be that well scoped. I'm sure they're elsewhere. Usually when we find red teams, we find one thing and then the discoveries explode. So we
decided to pivot on JA3, and we actually found the same thing using Azure CDN as a domain front. So it was the same tool, and we just pivoted on the JA3. There were collisions, and we certainly had a lot of work to investigate and work through those collisions, but it did lead to successful forensic discovery. Usually by the time you reach forensics you're willing to sort through collisions, because there's an incident and you want to make sure you run it to ground, whereas prior to an incident you might not be willing to sort through 10,000 collisions; that's not an acceptable work-to-payoff ratio. I'll note that this has been really like a
personal narrative. I would say I've felt that the post-compromise hunting piece has been particularly successful when dealing with scripting-language backdoors, because it's just a quick way to down-select the traffic: we're only looking at things from likely scripting hosts of some kind. And while I said there's a lot of it, you generally see a lot of legitimate traffic, and the things that stick out are pretty illegitimate. So again, the way we most often found this useful right off the bat was post-compromise hunting: just pivot on it like it's any other indicator. There's nothing particularly special about predicting the application, other than that there's a link or association between the events.
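The post-compromise pivot described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`src`, `server_name`, `ja3`, `ja3s`), the placeholder fingerprint labels, and every host other than the talk's own "evil.evil" stand-in are hypothetical, and a real deployment would query whatever store holds the parsed TLS metadata rather than an in-memory list.

```python
from collections import Counter

# Hypothetical connection records; the ja3/ja3s values are short labels
# standing in for real 32-character MD5 fingerprints.
RECORDS = [
    {"src": "10.0.0.5", "server_name": "evil.evil",
     "ja3": "ja3-script", "ja3s": "ja3s-c2"},
    {"src": "10.0.0.9", "server_name": "front.example-cdn.net",
     "ja3": "ja3-script", "ja3s": "ja3s-cdn"},
    {"src": "10.0.0.7", "server_name": "api.example.com",
     "ja3": "ja3-script", "ja3s": "ja3s-api"},
    {"src": "10.0.0.8", "server_name": "www.example.org",
     "ja3": "ja3-browser", "ja3s": "ja3s-web"},
]


def pivot(records, ja3, ja3s=None):
    """Return records sharing a client JA3, optionally narrowed to one JA3S."""
    return [
        r for r in records
        if r["ja3"] == ja3 and (ja3s is None or r["ja3s"] == ja3s)
    ]


def summarize(hits):
    """Count hits per (source host, server name) so collisions can be triaged."""
    return Counter((r["src"], r["server_name"]) for r in hits)


if __name__ == "__main__":
    # Start from the JA3 seen on the known-bad connection to evil.evil.
    for (src, name), count in summarize(pivot(RECORDS, "ja3-script")).items():
        print(src, name, count)
```

Pivoting on the JA3 alone returns the known connection, the fronted one, and a benign collision that still has to be worked through by hand. Passing the JA3S as well narrows the result to the directly hosted server, which is why a fronted connection only shows up when the pivot is on the client fingerprint.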
Fun fact: the sandboxes are starting to adopt JA3 and JA3S, which is really, really helpful. As a blue teamer, one of the challenges is taking a known tool and trying to find all the possible fingerprints for it. I am not a DevOps guy, I'm less than hip in the blue team space; my version of this is literally to boot and run seven concurrent VMs across OS stacks and architectures, run the thing in each, capture with Wireshark, and then run a script. Our research ops guys giggle at me and make jokes. You could DevOps this for all the spin
up and automation capability, or you could just use a sandbox, which is built to do just that, and they already have this data point.

Definitely, yeah, some of them. Definitely a call-out to Hatching.io; it's really awesome to see the JA3 in there, and I believe that one is doing JA3S soon. Yeah, I talked to Jurian about the JA3S usability, and I think as of Friday, Triage has JA3S enabled. So that's really helpful, again because JA3 is kind of a tough indicator on its own. And in the VirusTotal Jujubox, they have the JA3 right in the UI there. It's a little bit buried, but you can find it
and it can be really useful for doing this post-compromise work, or for generating these ahead of time for kind of a context list. That's the VirusTotal Jujubox; they have a few different sandbox options, if you're familiar with their behavioral reports, so not every report is going to have it, but if you find it, they will have JA3.

So, dialing back to why we're here. We got here because, over this time working with other researchers, talking to customers, and working with corporations who have heard of this technology, they think it's just so easy to leverage an application fingerprint for pure detection. And obviously that's a goal; we don't want to stop at having to
find evil some other way and only then use this. We want to use this unique advancement in network forensics for detection. So our hypothesis was that we could use this fingerprint in some way, probably for procedural detection, not quite tool detection, to find unknown threats. That's the sexiness: we can find the unknown, the known unknowns. Here we're going to assume a clean fingerprint, or one with context but a lot of collisions; it's either not previously known, or it's in every customer environment we have, and therefore it's not particularly useful with the fingerprint alone. When I started down this road, it's easy to drop the mic
and say here we go, and then you say, okay, where do we actually start? What does the process for this look like? And it looks like pretty standard hunt processes, depending on what you call hunt. So I started with: let's look at all the data sources that are available out in the public space to get me context. The first thing I need is to pull all these JA3s and understand what we already know about them, the things I don't have to go do proprietary research on. There's a bunch out there and it continues to grow. I would say this list, you can google
and find a lot of offshoots, but these are the three that we found contain a bulk and have been useful. The first one is ja3er. It's one of the newest ones, and the fun part about this is when you visit it they tell you your own JA3, which is always enjoyable. But they also provide a downloadable JSON list of every JA3 they've seen, corresponding to the user agents they've seen it from. Basically they're generating this by having people visit their website, so it's really good for browser JA3s. It's crowdsourced and it stays up to date automatically, which is always a plus. It's not like a human running malware
and copy-pasting JA3s to a file and pushing to GitHub, because that introduces lots of delays. Then there's Trisul NSM. They were one of the earliest guys to explore JA3s. They initially did a conversion of the original FingerprinTLS work from, sorry, 2010 or 12 I think, and they converted all of the known prints to JA3s and then added some of their own research and work in there, which was really great. And then there's the SSL Blacklist. For those who don't know the abuse.ch SSL guys, they basically run all this crimeware, known malware I mean, and they generate the IPs and so on for you. They're also generating
JA3s now from all this, so you can go there and see, like, TrickBot JA3s. We obviously use that list, but here's what I found interesting after this initial study. So step one, enrich all the data, great, fantastic. But we showed you data before this: in one sample day across our entire customer environment, we had 30,000 hashes. And of the 30,000 hashes, enriching with these lists left 97 percent that we knew nothing about. So despite the largest public bases of knowns, there's still a whole lot of unknown that we found in our customer
base. And so after this I was, I want to say discouraged, maybe motivated a little bit, and knowing that this was going to be a bigger-scope project, I decided to move on. So when Cisco Joy was released I was particularly excited, because their database was much bigger than all the other sets. They had a lot more fingerprints, and funnily enough they had the processes. It's just a much cooler set of knowledge, because correlating network traffic to process names is a lot more fun from a defensive perspective. And so when we pulled their database list, we actually wrote scripts to convert every Cisco Joy fingerprint into a JA3
fingerprint, JA3. They're not directly the same: Cisco Joy actually supports Google's GREASE, which is like an SSL or TLS feature, whereas JA3 doesn't, they just drop GREASE. So if you're going to do this work, it actually takes a little bit of kung fu to extract the right data points from a Joy fingerprint, format them as a JA3, and generate the list. When we did that, we went from 97% unlabeled to 91% unlabeled. So pretty considerable progress given one additional data source, but still a lot of unknown. And even in that labeled set, it's labeled, dot dot dot, with collisions. It isn't
just labeled perfectly. There are obviously going to be collisions and things you have to deal with. And so we kind of leveraged the idea of what they were doing with Cisco Joy with our own data. This is where you enter data scientists, which I am not. We have a guy on our team, Lindsey Lock, who does fantastic research for us, more in the data engineering and data science space, and he decided to build what he named the JA3 synergy data set. Basically what he tried to do was correlate HTTP browsing and traffic from the same host to the same locations with SSL traffic that could be
related. So essentially he's trying to leverage plaintext data, HTTP, to help gain context on encrypted data. Imagine I browse in my web browser to HTTP google.com, right? I'm going to talk to Google, who's then going to redirect me, because they're going to be like, fool, we don't serve anything over HTTP. And so I'm going to have an HTTP request out to Google, immediately followed by a bunch of HTTPS. If you're listening on the network you see both, and you could potentially make a hypothesis, or I'd say a fairly strong assumption, that if they're time-bounded, and they're from the same source and dest, and the server name in the cert matches, with some fuzziness, the
passive DNS of the destination or the host header of the HTTP, then maybe the user agent we saw maps to the JA3 that we saw follow. So it's wiggly, but you're trying to gain context on your own data, and that was wildly successful. It actually mapped a huge amount of data for us. I don't have the exact percentage, because this is an older research project and I didn't rerun it, but at the time it was somewhere up in the tens, 20s, or 30s of percent. There was still more than 50% unlabeled, but we
gained massive contextual visibility mapping user agents to JA3 hashes in this process. Obviously we could run this all the time, and so that's what we're working on doing now: getting it set up and running on all our data all the time, to be a contextual batch process and provide that. These pictures are kind of fun, they show what we saw. What you want to see is clusters here, to show there's lots of user agents of a similar kind that are associated with a particular JA3. So on the left you can see we found what looks roughly to be some sort of Microsoft Mac thing. You can
see all the different variants and user agents, but they're all pretty much a Microsoft application of some kind running on Mac OS X. So that was an interesting little cluster, one you're able to pick out really easily, visually. And on the right-hand side, that picture is horrible, but you see New Relic Java agents, and that stuck out. There were dozens of these New Relic Java user agents that were associated with a little cluster of JA3s. So pretty interesting to be able to tie similarities in user agents to particular fingerprints and cluster them better. But afterwards we're like, are we any closer to finding evil? We're down the data nerd front, like
data is cool, doing correlations is cool, but what does this actually buy us? And so I kind of had a mind context shift, where I went from thinking about what a thing is to what a thing does. I started down the path just thinking about what these things are, and what I realized was that that's cool, but it's going to be very, very difficult to actually find evil that way. So I need to change my mindset to think about what these things are doing, what these applications are actually doing, based on the other factors that we can see. And like most things in detection, no approach is mutually
exclusive. We should be doing both, and we should be doing them equally well. And so what else can we focus on? We're still working on this, this is research that's ongoing, and anyone who's worked on super generic detections knows that this will probably never end, that it's a very difficult road to go down. But anytime we look at representative features or statistics, grouping by these different characteristics is really good. So an example: is the application, represented by a JA3 and JA3S pair, associated with lots of servers and lots of destinations? Can I group by JA3 and JA3S and ask how many SNIs
are there, and how many destination IPs do we see? Do we see one SNI and one destination over 30 days of data? Because if so, that's a pretty non-prevalent application in our data set, and that is particularly interesting. Is the destination or SNI associated with many applications? Do we see this IP address, which might be shared hosting, with lots of things talking to it? Because if not, it begs the question of why not. We know we have a lot of customers and a lot of data, and generally we see things across more than one, so if you only see it in one, it's usually fun. When was the
application, destination, or SNI first seen? Newly observed JA3s in particular are an absolutely good place to start your filtering, just because if it's new, it's naturally interesting. I'll be the first one to tell you I hate the concept of pure anomaly detection, where you surface every anomaly to a user, because I've been in that role of having to triage those anomalies, and they're a lot less anomalous than most people think. They happen every day. The Internet is weird, I'm here to tell you, and anomalies are an everyday occurrence. But I do like it as a start, because, not most, some malicious
things are anomalous, and so it's a decent filter to start with, especially if you're doing POC research, where you want to run it across smaller sets of data and see if you can find some results. And then there's the classical things. Everyone jumps to JA3, but there's a lot of value you can find in just the traditional SSL flow characteristics: data sizes, destination and source IPs, periodicity, regularity in the data, distributions, idleness. One of my favorite things to hunt for as an analytics person is idle beacons, because, you know, as a red teamer, it's 5:00 p.m., sleep 60 minutes, and I'm going to go eat dinner, right? That was my
process at the end of every adversary day. But what does that mean in the data? It means that every sixty minutes, plus or minus some jitter, there's a packet, and over the next ten hours it's going to be the same thing, for ten straight hours, roughly every sixty minutes. And then the data sizes are the same. Almost every adversary toolkit has jitter in its sleep intervals, but they don't have jitter in their sizes, their byte sizes. Everyone thinks that all the blue teams are finding evil by periodicity, so you don't even bother to adjust your size, and so it's ninety-seven bytes exactly on the same rough interval. And so idleness
is one of my favorite things to hunt for, and you can do that in SSL whether it's encrypted or not. And then I love certificates. Almost every red team I see is Let's Encrypt. Very few are paying for it, especially consultants, because they'd have to pay across a lot of engagements and a lot of volume, so why not just register with Let's Encrypt? If you look at killswitch-GUI, Alex Rymdeko-Harvey's tools, he has a lot of scripts to automatically set up servers. Same thing with Jeff Dimmock and Steve, who's in the room: on GitHub they have tools that help set up servers and
infrastructure. A lot of them will have built-in scripts to automatically register the Let's Encrypt cert, so that's a good choke point to go look at. Why don't we just start by looking at Let's Encrypt certs? And I like to track these over time and aggregate. So, as with any analytics approach, this is what we found in an initial pass: I found security products, I found financial analysts scripting their access to investment sites, I found lots of Tor, and more security products. It turns out security products look pretty evil in nature, and so we pretty much see them all the time and they hit on all our initial detections and analytics. But we
also found evil. In this case, this was a red team that we discovered using this approach. We found this JA3 and JA3S pair, and these are all the labels that got applied based on our analytics. It was a new application and a new server. It was on a single host in our environment. It had a single SNI and a single destination, non-prevalent across our customer base. The SNI was only coming from a single host. It was a low-cost cert issuer. It was young in PDNS, so when we correlated the IP to PDNS it was young, and it was non-prevalent, like, there was only
one PDNS record. Just tons of labels, where when you look at this you're like, that is ugly. But as a red team you're not thinking about all these aggregate roll-ups and stats when you deliver your malware. You're not a data guy, usually, so you don't think about what you look like in the environment. Funnily enough, the domain that was used by the adversary was one they had registered after it expired from a legitimate company, and so if we were just doing certificate stuff or by-name stuff, it would actually look pretty legitimate. It was a medical provider website that had expired, and they had registered
it a couple weeks after. When we went and looked at it after the fact, it was hilarious: it was listed under, like, Yellow Book. This adversary domain was in Yellow Book as associated with a medical provider. But if you looked at the actual hosting characteristics, it stuck out like a sore thumb: it expired and was re-registered, it moved from Akamai to DigitalOcean, it moved from Akamai certs, or no certs, to being cert'd, and it moved to being registered at Namecheap. So pretty fun. So, kind of diving into final thoughts here: I always encourage people,
when you see the hype, and when you see something new released and advancements in this field, don't stop there. Listen to everything that's being said. Many of the things we told you today were in the original JA3 talks. The J's are awesome folks, they understand all these characteristics, they work in detection. But what happened is, out of all they said, most people walked away with only 5%, and the 5% was: look, SSL fingerprinting finds evil. And so it's really important, as technical stewards in this community, to not stop there, to fully understand the nuances and caveats associated with these technologies, and to educate about that. It is useful. This talk was not intended to say
that fingerprinting is not useful. If anything, it's a new idea that needs to continue being explored, particularly on the behavioral front. And I would encourage everyone to be creative in detection in general. Clever blue teamers find the red, so being creative with how you approach it, not just going for the easy low-hanging fruit, has been very valuable for me and my community. At the end of the day this is a human game, human defenders defending against human adversaries, and so it's often how you think about that humanity that gets you that leg up in playing the game. And I like to say it's all a balance. This is a personal rant, or pet peeve, for me, and with
the crowds moving towards, like, just find it all behaviorally, it's so easy, generic is the only way you should be going: it's all a balance, and they're not mutually exclusive. Polarizing approaches have no place in the industry. You should be doing all of it. If I can get an IOC from X vendor and it's associated with APT29, and they reuse that IOC and I don't catch that, that's just negligence. Now, I'm not saying that approach is the only one you should be doing, quite the contrary, it's maybe 1% of what you should be doing, but it is useful, and we can do it all. Technology allows us to scale this
approach, and honestly the more generic you move, the less threat-specific you are and the more effort it requires in terms of man-hours and technology. So it's really good to find your sweet spot on this spectrum. This is kind of a Pyramid of Pain shout-out, wherever I pay that tax. But yeah, at this point I think we're near done on time, so we'll open up for questions if there is time, and if not we'll certainly be out back. Yeah?
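The grouping analytic described in the talk (group by JA3 and JA3S pair, then count distinct SNIs, destination IPs, and hosts) can be sketched roughly as follows. This is a minimal illustration, not the speakers' implementation: the record fields (`ja3`, `ja3s`, `sni`, `dst_ip`, `src_host`) and the thresholds are assumptions chosen for the example.

```python
from collections import defaultdict


def pair_prevalence(records):
    """Group TLS session records by (ja3, ja3s) and collect distinct values.

    Each record is a dict with hypothetical keys:
    ja3, ja3s, sni, dst_ip, src_host.
    """
    stats = defaultdict(lambda: {"snis": set(), "dst_ips": set(), "hosts": set()})
    for record in records:
        entry = stats[(record["ja3"], record["ja3s"])]
        entry["snis"].add(record["sni"])
        entry["dst_ips"].add(record["dst_ip"])
        entry["hosts"].add(record["src_host"])
    return stats


def non_prevalent_pairs(records, max_snis=1, max_dsts=1, max_hosts=1):
    """Return pairs seen with very few SNIs, destinations, and hosts.

    The default thresholds of 1 mirror the "one SNI and one destination"
    example from the talk; they are illustrative, not tuned values.
    """
    return sorted(
        pair
        for pair, seen in pair_prevalence(records).items()
        if len(seen["snis"]) <= max_snis
        and len(seen["dst_ips"]) <= max_dsts
        and len(seen["hosts"]) <= max_hosts
    )
```

Run over a long window (the talk mentions 30 days), a pair that only ever shows one SNI, one destination, and one host is a candidate for a closer look, not a verdict on its own.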
Absolutely, yeah. So, they added a call-out for information, or a validation. Yeah, I agree with everything you just said, it's configurable. One thing I find interesting: I think a lot like that, of, well, why is this useful, because I can do X and Z? But the reality is that most of the bad guys we see don't do any of that. Yeah, they're just lazy. It's, again, humans versus humans. And what's interesting is when ideas like this come out. I'm hoping, after this talk, I'm going to go watch all the red team traffic that we see in our environments, and I'm going to watch all of them disappear, but I'm not going to watch
any of the real bad guys disappear, because they do follow some of this, but every hour they invest in their ops is money they're losing. Real adversaries operate like a much more efficient business than most of the legitimate pen testers and red teamers, and so they make those calls. But yeah, totally agree. I would love to see more research on the red side, like building custom servers to configure these parameters, doing it in Apache and in the different Python server applications, and starting to masquerade, making that more common. It'd be fascinating.
That's a good question. It's on the order of thousands, if not higher. I mean, it depends on what specific application. Cobalt Strike is a really fun example. It uses WinINet underneath the hood, and so anything else that uses WinINet looks pretty similar, with the difference being Cobalt Strike is a Java server versus a lot of Apache stuff. But we still see WinINet talking to Java all the time in our real environments. There's a lot of Java web servers out there, my friends, and so that's one that has been collision-prone that we wanted to use to hunt for. We were hoping that it would be a
silver bullet on that one, yeah, because of the malleability. For these toolkits that have a lot of malleability, we were hoping these fingerprints would be the bullet. That one has a lot of collisions. Again, scripting languages, they collide on the JA3 but not so much on the JA3S. ETA being Encrypted Traffic Analytics from Cisco? Yeah, yeah, totally. There's been research here in the past, I think we've actually spoken in the past, about encrypted traffic analysis and ways to do it, and there's a lot of white paper research out there in this space too, in terms of machine learning and modeling on encrypted traffic,
and the slide I did on the behavioral stuff, a lot of that's in there. There are segments that aren't in there, and I would say the mixing of applications into those encrypted traffic analytics is still kind of newish. We're starting to see it in vendor spaces, and we're a vendor, but I think there's still a lot of myth around applications and how unique they really are. And so we're seeing a lot of what might not be called false positives, but in the terms of a defender sitting at the keyboard, they're a false positive.
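As one concrete instance of the behavioral features mentioned on that slide, the idle-beacon pattern from the talk (a roughly fixed sleep interval with some jitter, but identical payload sizes) can be sketched like this. It is a hedged illustration only: the input shape, the feature names, and the thresholds `max_cv` and `min_size_share` are assumptions for the example, not values from the speakers.

```python
from collections import Counter
from statistics import mean, pstdev


def beacon_features(events):
    """Summarize timing and size regularity for one source/destination pair.

    events: list of (epoch_seconds, payload_bytes) tuples.
    Returns None when there are too few events to compute intervals.
    """
    events = sorted(events)
    if len(events) < 3:
        return None
    times = [timestamp for timestamp, _ in events]
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    average = mean(intervals)
    # Coefficient of variation: low values mean regular intervals despite jitter.
    interval_cv = pstdev(intervals) / average if average else float("inf")
    size_counts = Counter(size for _, size in events)
    # Share of events carrying the single most common payload size.
    modal_size_share = size_counts.most_common(1)[0][1] / len(events)
    return {
        "mean_interval": average,
        "interval_cv": interval_cv,
        "modal_size_share": modal_size_share,
    }


def looks_like_idle_beacon(events, max_cv=0.2, min_size_share=0.9):
    """Flag flows with regular intervals and near-constant sizes (illustrative thresholds)."""
    features = beacon_features(events)
    if features is None:
        return False
    return (
        features["interval_cv"] <= max_cv
        and features["modal_size_share"] >= min_size_share
    )
```

Because the size check does not depend on exact timing, a flow whose sleep interval is jittered but whose byte count never changes still stands out, which is the point made in the talk.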
So, not much. The essential data for the JA3 fingerprints is still there. Yeah, you have the TLS version, you have the cipher suites, you have an SNI field, potentially maybe an ESNI, it's there sometimes. But the core is the same, because it's still the cipher suites, still the TLS version, still the elliptic curve points. I also think it's funny about 1.3, because it's 1.3 in the spec versus 1.3 in the implementation and what's been seen. We've seen presentations that say 1.3 fingerprinting should be broken, it's not working anymore, but the JA3 folks will tell you they've seen supposedly 1.3 traffic where everything was intact, and we've seen it too.
So implementation versus spec is always a fun one to look at. But there are fields in the spec that could be challenging, I think particularly with SNI and certain extensions being encrypted. So, not a lot, honestly. We've seen very slow 1.3 adoption across our customers, it's almost all 1.2 still, and so we haven't quite gotten there. Quite honestly, we're still early in some of the behavioral pieces here. We did do a few tests of 1.3 via the APIs on Windows, but like I said, you
still get a JA3 fingerprint in the end. Yep. Yeah, I mean, JA3, there's multiple scripts. JA3 on their GitHub actually has a Python script where you feed it a PCAP and it spits out all the JA3 and JA3S hashes it sees. There's a Wireshark Lua script as well. Yeah, plus that, out there. Yeah.
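For readers who have not looked inside those scripts, the hash itself is simple. The sketch below follows the published JA3 description: take the decimal values of the ClientHello TLS version, cipher suites, extensions, elliptic curves, and elliptic curve point formats, join values with `-` and fields with `,`, drop GREASE values, and MD5 the result. It assumes the ClientHello has already been parsed into integer lists; the function names are made up for this example and are not the API of the JA3 project's script.

```python
import hashlib

# GREASE values follow the pattern 0x0A0A, 0x1A1A, ..., 0xFAFA and are
# ignored by JA3 so that they do not change the fingerprint.
GREASE = {(b << 8) | b for b in range(0x0A, 0x100, 0x10)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string from already-parsed ClientHello values (integers)."""

    def join(values):
        return "-".join(str(value) for value in values if value not in GREASE)

    return ",".join(
        [str(version), join(ciphers), join(extensions), join(curves), join(point_formats)]
    )


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Return the MD5 hex digest of the JA3 string."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()
```

JA3S is built the same way from the ServerHello, using only the version, the selected cipher, and the extensions. Dropping GREASE is also the step that makes converting fingerprints from sources that keep GREASE, as described earlier for Cisco Joy, more than a straight copy.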
Yep. Yeah, totally, thank you for that. Yeah, we're actually using the Bro config, or the Bro module, so that's in there. And I think that's it for time. We're going to be out back for questions, so please join us, and we're happy to talk. Thank you. [Applause]