
It's my pleasure to introduce Ian and Scott, and they're going to talk about certificate transparency logs. All right, cool, working? Awesome. All right, thank you all for coming. My name is Ian Haken, this is Scott Behrens, and today we're gonna be talking about cats and certificate transparency logs. We've got a lot of stuff that we want to cover, so I'm gonna jump straight into it and start with the question: what is certificate transparency? Because most people I've talked to about this have kind of heard of certificate transparency. They know it's a thing that's got something to do with the public record of certificates, but then if you probe them on it they're just
like, "I don't really understand how it works or what it's supposed to do." So let's answer that question. To understand what certificate transparency is and why we need it, let's start by making sure we all understand the basics of public key cryptography. Let's illustrate how public key crypto works with our classic example of Alice and Bob. So here's Alice. She's a friendly cat, she makes lots of friends, and one day she runs into Bob. They're kind of different: Alice is more of a homebody, Bob is more of an outdoorsy type, but they become fast friends anyway and decide that they're gonna be pen pals. So the next day Bob
sends Alice a message, and this is great and all, but Alice knows that messages get passed through a bunch of different hands before they get to her, and she wants to know: did this message really come from Bob? That's the kind of question that public key cryptography was built to help answer. The way public key crypto helps us with this problem is that Bob can generate a public/private key pair and use his private key to sign messages, and now Alice, as long as she has a copy of that public key, is able to verify that signature. So that's basically all public key crypto is, and
that's as technical as we're gonna get on how it works. It's really just the idea that you can sign things with a private key, and then with the public key you can verify that that signature is good. All right, that's how public key crypto works when you've got two parties involved, but what happens when Alice wants to get on the internet? Well, here's Alice again. She's gonna fire up her browser and go to google.com. She wants to check her email, and the first thing that Google's gonna do is be like, "Hey, please give me your password, I need you to log in to check your email." And this is kind of a problem again,
because Alice is like, "Whoa, who's asking for my password? I think it's Google, but the internet's a scary place. I've seen some of that stuff out there. How do I know this is really Google?" And we solve this problem the same way, right? Google has its own public/private key pair, and it signs the messages that it's sending back to Alice with that private key. But the problem is Alice doesn't have Google's public key, and it's impractical for Alice or her browser to have all the public keys for all the websites on the internet, because there's hundreds of millions of those and they're constantly changing. So there's no way for Alice to
have all those public keys in advance. The way we solve this problem is with third parties known as certificate authorities. So here's our example certificate authority. This is Digit Cat, and he takes the internet very seriously. He's filled out a bunch of paperwork, and he has these policies posted that all say, "I take the internet super seriously, you can trust me to be a good citizen in this internet ecosystem." So Digit Cat has a public/private key pair just like all of our other players in this game, and because we trust him to take the internet seriously, we embed his public key in the browser. And so because Digit Cat is one
of these trusted certificate authorities, Google knows that when they want to start a website they should go talk to Digit Cat and be like, "Hey, Digit Cat, can you please help me create a trusted website on the internet?" And what Digit Cat does is create one of these certificates. It says google.com has this public key, and that's all a certificate is for our intents and purposes. So Digit Cat signs this certificate with his private key and hands it over to Google, and now Google has a copy of the certificate and can include it whenever they send Alice or anyone else a message. Alice has the public key of Digit Cat, so
Alice can verify that this google.com certificate was really signed by Digit Cat. Then Alice can pull out that public key for Google and verify that the message was really signed by Google, and so now Alice has confidence: hey, this really came from Google, this message is trustworthy, I'm gonna send Google my password. So that's awesome. That's how the internet works, that's what makes this all safe when we're sending our passwords around. The problem is not everything is awesome, right? There's stuff about this whole infrastructure that makes us sad, and the reason is that you don't just have this one super awesome certificate
authority in your browser. You have a whole herd of them, right? There's all of these different certificate authorities that your browser trusts, and some of them are a bit better about being serious internet business cats than others. So here's an example of a cat that's maybe a little overwhelmed with his responsibility. This cat is Simon, and he runs a company called Simon Tech, and he's maybe not so great at being a CA. Sometimes it doesn't work quite right, but he is a CA just like anyone else: he's got his own public/private key pair, and his public key is embedded in the browser. But because his CA isn't always working properly, every so often what he
does is generate some random keys and create a google.com certificate with those keys, and then he's able to look at that certificate and verify that Firefox works properly with it. He's like, "All right, my CA works, I'm just gonna throw this away." But there's a problem there, because we don't know if there's an employee, some malicious employee like Mallory, who instead of throwing those keys away after the test is done just runs off with them. And now if Alice tries to go to google.com and Mallory is in a position to be in the middle of that traffic, to be a man in the middle, like maybe Mallory's just sitting next
to Alice in the coffee shop, then Mallory can intercept that traffic and sign back a "please give me your password" message with this random public/private key pair. Because that public key is in the certificate that was signed by Simon Tech, Mallory is now able to get Alice's password and all her emails and everything in Google Drive and all that. And the biggest problem with all this is that Google was not involved in this transaction at all. They had no idea Simon Tech made the certificate, and they have no way of knowing Alice got duped by a malicious certificate. So this is the problem with the CA infrastructure that exists today. So now
we come to certificate transparency. What is certificate transparency? Well, it's a project started by some really smart folks at Google to solve this problem. It's trying to remedy certificate-based threats by making the issuance of certificates more open to scrutiny and more auditable. It's got three main goals. The first is to make it impossible for a CA to create one of these certificates, like one of these google.com certificates, without Google knowing. It's going to do that by making the whole issuance of certificates a public activity, something that's a matter of public record that everyone can look at and audit. And by having these pieces in place, it's going to ultimately
protect the users who want to go to Google and know that when there's that little green lock icon in their browser, it's actually something meaningful and not a malicious actor. So it's a project to make us safe. Awesome. What is a CT log? A CT log is the key component that makes the certificate transparency project work. It's basically just a simple network service. It's a RESTful service, it's got an RFC, it's got like a dozen different endpoints, it's not that complicated, and all it really does is log certificates. It's a record of all the certificates that are getting issued, but it's got some really important qualities
that actually make it useful. The first is that these logs are append-only. There's some crypto built in, and the API only has endpoints for adding new certificates, and because of these properties you can't retroactively remove, modify, or delete entries in the log. They're cryptographically assured of this property: whenever a new entry is added, you take a hash of the previous entry along with the new one and then you sign that hash. If anything were to be modified anywhere back in the log, this chain of hashes wouldn't match up, and everyone watching these logs would be able to realize that this is not a trustworthy
log anymore: they've removed or modified some entry. And also, very importantly, these are all public services. They're out there on the internet, and there are no restrictions on who can retrieve or even insert data. There are many CT logs. Like I said, it's an open RFC, so Google runs a bunch, DigiCert runs a bunch, Cloudflare runs a bunch, there are Comodo certificate logs. So great, that's what a certificate log is. How does having that log actually protect us from malicious certificates? Chrome, in the last year or two, started requiring something called SCTs to be presented with certificates. An SCT is a signed certificate timestamp, and all that is is
a promise, signed using the private key of one of these CT logs, that says "I promise to put this certificate in my log." That's all it is, and as long as your certificate is presented with one of these promises, Chrome is able to trust it. Now, the other piece of that is that Chrome in the background is going to check that these SCTs have actually been honored by the certificate transparency logs. It does that in the background, as you're out there browsing for pictures of cats on the internet, by taking all the certificates that it sees with these SCTs and then going and asking the log, hey,
prove to me that you've actually added this certificate to the log. If a log doesn't put one of these certificates in there, we know that it signed this SCT, we can verify that signature, we see that the certificate is not in there, and now we know that log is misbehaving. So Chrome is sitting there making sure these CT logs are actually doing what they're supposed to, and all the hundreds of millions of installations of Chrome are verifying that these CT logs are behaving properly. All right, that's like five or six slides all about how CT works, but what's the really short version, especially for the purposes of this talk? Certificates have to have an SCT
presented with them in order to actually be trusted and usable. An SCT is a signature from one of these logs promising that the certificate will show up in the log, and Chrome is gonna notice if it doesn't actually get added. So logs do have to behave when they're creating these SCTs. And because of crypto, a log can never remove or alter an entry once it's been added. Again, if an entry were to get removed or modified or deleted or redacted, then none of these hashes would work out, all these installations of Chrome would realize this CT log is misbehaving, and then that log would no longer be trusted. We'd no longer take SCTs from that log, et cetera.
So CT logs have to behave in order for them to be trusted by Chrome. That's how CT works, that's how it's making us safer. But we're not here to spend all day talking about how CT is awesome. We're here to talk about how CT can be abused. After learning about this whole ecosystem, I sat down and was like, "All right, this is a new toy, what can I do with it?" The first thing, and the thing that we've heard people talk about a lot, is that CT logs provide a way for attackers or pen testers to do infrastructure reconnaissance. So anytime
a certificate is created for some kind of web service, whether that's an internal service or an external one, it has to show up in these public logs. So you can enumerate internal or external domains that are potentially interesting. You can search for admin microsoft.com and see what shows up. In particular, you can probe all of these domains from an external vantage point and see which ones don't respond, so if an attacker ever gets a foothold in a corporate network, they know which ones are probably internal and may have nothing more than that network perimeter protecting them. So I ran a search for admin microsoft.com and lots of things
popped out. Here's a bunch of billing admin consoles that showed up in the CT logs. I didn't actually go probing these things. I don't know if they're internal or external, I don't even know if they still exist, and some of these had expired, but they're there and they are a matter of public record. Something else that's important about the CT logs is that they include the entire certificate, and that includes the public key that corresponds to that certificate. So if there's any sort of problem with that key, if there are any weaknesses in that key, attackers can go looking through the logs and find the keys that they can ultimately break. There
was a talk presented at DEF CON last year that was about breaking keys at scale and using big data to break RSA keys. What they did, among many other things, was scrape a whole bunch of certificates by scanning the internet and scanning CT logs, and run some GCD algorithms to factor RSA keys. Ultimately they were able to break the keys of over 200 certificates that they pulled out of the CT logs. This is a little terrifying, right? If there's anything wrong with how you're generating the keys for your certificates, whether it's because you're running in the cloud and had a weak entropy pool, or you're running on an
embedded device that's just started up and didn't have a great entropy pool, there are attackers out there looking for these keys, and they're gonna find out if your domain is vulnerable to just factoring the key and then masquerading as your website. So that's a little terrifying, right? You should be paranoid that your crypto has to be perfect now, otherwise attackers are gonna be on the lookout for it. But what I thought was one of the most interesting consequences of the CT logs is that they're a persistent data store of all the certificates that are ever created. CAs have to publish the certs into the CT logs in
order to get those SCTs. A CT log has to add the cert to the log and can never remove those entries, and the CT logs have to be available and present those certificates back to anyone who queries, otherwise that log is considered to be misbehaving. So I was thinking through what all this meant, and I had a "whoa" moment. I'm like, okay, so anyone can create a certificate with one of these CAs, and then that certificate has to go into the logs, and then that certificate can never get removed from the logs. All right, so that means I can put something in the CT logs and then it
never goes away. So what does that mean exactly? Here's an example certificate. I'm actually gonna pull this up and look at the CT logs right now. It's issued by Let's Encrypt, it's got a public key, yada yada yada, and then you get to the domain names and it's like, that's interesting. There's clearly some structure to these domain names. What exactly is going on here? Let me play a quick video that goes through pulling down the certificate and munging those domain names a little bit, so that we can see what's actually been encoded into that certificate. First we're just pulling that thing down and getting all the host
names out of it. So here's exactly the same set of host names you just saw before. We're gonna clean it up a little bit and replace that "DNS:" with a newline so we get each host name on its own line. First of all, we've got that common suffix on the end of all these domains. That's clearly not useful information, so we'll throw that away. Then we've got these leading numbers, which are basically there to order these host names, because when a CA makes a cert it's not necessarily gonna honor your ordering. So we use that number to figure out how to sort these.
We sort them according to that numbering. Now we're going to remove any trailing noise and get rid of those numbers. Last thing, we'll remove any dots or commas that are left in here, and what we end up with is that cleaned-up data set: those host names with the prefix and the suffix stripped off. Last up, we're gonna take that data and translate it from lowercase to uppercase, because the base32 decoder is picky (I'll talk about why we're doing base32 encoding later), and what we get out is a JPEG. Cool. So that's an example of what I mean by you can put data into CT logs and then it's there, right?
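The decoding steps walked through in the video can be sketched in a few lines of Python. This is only an illustration under assumptions: the hostname layout `<number>.<base32 labels>.<suffix>` is a guess at what the slide showed, and `decode_hostnames` is a hypothetical helper name, not part of any tool mentioned in the talk.

```python
import base64


def decode_hostnames(hostnames, suffix):
    """Recover bytes from names shaped like '<index>.<base32 labels>.<suffix>'.

    The layout is an assumption made for this sketch.
    """
    chunks = []
    for name in hostnames:
        name = name.lower().rstrip(".")
        if not name.endswith("." + suffix):
            continue  # e.g. the bare common name, which carries no data
        body = name[: -(len(suffix) + 1)]        # throw away the common suffix
        index, _, payload = body.partition(".")  # leading number orders the names
        if not index.isdigit():
            continue
        chunks.append((int(index), payload.replace(".", "")))  # drop label dots
    # Sort by the leading number, join, and uppercase for the strict decoder.
    text = "".join(payload for _, payload in sorted(chunks)).upper()
    text += "=" * (-len(text) % 8)               # restore base32 padding
    return base64.b32decode(text)
```

Calling it with the SAN list of a certificate and the shared domain suffix returns the raw bytes, which could then be written to a file and inspected with `file`, much as the video does with shell tools.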
Exactly how interesting is this? How much data can you really put into CT logs? Because that was a bunch of crazy encoding magic, and certificates are only so big. So how much data can you put in a certificate? Let's Encrypt, which is the free CA we all know and love, lets you put up to a hundred host names in a certificate. A host name can be up to two hundred and thirty characters according to Let's Encrypt's limit, minus the dots, and any character that's alphanumeric can be put into a host name, but it's not case-sensitive. So if you're trying to encode data using just alphanumerics in a case-insensitive way, that's why you'd
end up using base32. Given that the shortest domain name you can buy nowadays is gonna be a six-character suffix, you can basically encode 230 characters, minus four dots, minus that six-character suffix, times 99 SANs in a cert. I subtract one because one goes in the common name, and let's just ignore that. With base32 encoding you get a five-over-eight ratio, five bytes of data for every eight characters, so basically you can get about 13K into a certificate. All right, that's kind of okay. It's enough to get a low-quality JPEG into a certificate, like I just showed you, but maybe it's not actually that interesting a thing to do. But, you know, we're computer
scientists, right? We know how to address problems of limited size per unit: you use chunking. You can chunk your data up into multiple pieces and then throw all those different pieces into your storage mechanism. What's important is that when we mint a certificate through a CA, they have to include those SCTs with the certificate that comes back, and the entire point of an SCT is that it tells you how to look up that certificate in the certificate transparency log later. So the next time we mint a certificate that has our next chunk of data in it, we can include that information on how to go find the previous certificate in the CT logs. So
the first piece of data you put up there is just the first data chunk, but the next one includes a reference back to the previous certificate, about how to find that certificate in the CT logs, and so forth. So you can end up shoving arbitrarily large data up into the CT logs, chained this way. And all right, that's a bit more complicated than the sort of vim, cat, grep-fu that we showed you in that video. You really need a tool to intelligently pull down the chain of certificates. So I created a tool for that. It's called catlog, and it's for putting cats into CT logs.
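As a rough sketch of the capacity arithmetic and the chaining idea described above (not how catlog itself is implemented), the following Python uses the figures quoted in the talk and two hypothetical callbacks, `mint` and `fetch`, that stand in for "get a certificate issued" and "look a certificate up in a CT log". The record layout is also an assumption.

```python
MAX_NAME = 230       # per-hostname character limit quoted in the talk
DOTS = 4             # label separators that carry no data
SUFFIX_LEN = 6       # shortest registrable suffix assumed in the talk
SANS_PER_CERT = 99   # 100 names minus the common name

chars_per_cert = (MAX_NAME - DOTS - SUFFIX_LEN) * SANS_PER_CERT
bytes_per_cert = chars_per_cert * 5 // 8   # base32: 5 bytes per 8 characters


def chain_chunks(data, chunk_size, mint):
    """Store data as a linked chain; each record points at the previous one.

    mint(record) is a hypothetical callback that "issues a certificate"
    holding the record and returns a reference for finding it again.
    A real encoding would also spend some of the capacity on that reference.
    """
    prev_ref = None
    for start in range(0, len(data), chunk_size):
        record = {"prev": prev_ref, "payload": data[start:start + chunk_size]}
        prev_ref = mint(record)
    return prev_ref  # reference to the last record; follow 'prev' for the rest


def read_chain(ref, fetch):
    """Walk the chain backwards from the last record and reassemble the data."""
    parts = []
    while ref is not None:
        record = fetch(ref)
        parts.append(record["payload"])
        ref = record["prev"]
    return b"".join(reversed(parts))
```

With an in-memory dictionary standing in for the logs, `mint` can append a record and return its key, and `fetch` can be the dictionary lookup; only the reference to the final record needs to be remembered to recover everything.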
What does catlog do? It's an open-source tool, it's written in Python, you can go check it out and play with it. It abstracts the CT logs as a data storage provider. It's got a push command, kind of modeled on git, that lets you push something up there, and then it's in there forever, and then you can pull files down later. And it builds an additional layer of abstraction on top of that, which lets you put all of your files into a box, or all of your cat pictures into a box, as the case may be. So you can run a command like catlog init and give it a domain name, then you
can push a few files, then you can run catlog commit, which takes that index, that collection of files and those references, and puts it into the CT logs, and then later you or someone else can run the catlog clone command to get all those files back down. So let's do a demo real quick. I've pushed a bunch of files up into CT logs over the last year, so I'm gonna start with an empty directory and run this catlog clone command. This is a live demo, so hopefully the internet's good and this actually works. And we're waiting, waiting... Hey, all right, it didn't give me an error. So, catlog status, here's the... oh,
I cloned the wrong thing. Let me try again: catlog clone demo catlog PW. All right, trying one more time. That was a different demo that I'm going to show you in a minute. And it's still taking a while. All right, so catlog status, there we go. These are all the files that I've been uploading. There's a lot of them, so let's pick one at random and do catlog pull on it. What you can see here is that it's now actually hitting all the CT logs and pulling down all the certificates it needs in order to reconstruct this data. None of this is actually hitting any domains I own or host. This is all just
going directly to the CT logs. In fact, when these domains expire, and I have no intent of renewing them, this data will still be in the CT logs. I don't have to own these domains anymore for this to be available. So there's my cat file, and I can open it up, and it's a picture of a cat, right? Let me do one more. Let's start pulling this down. This is a video file, about five megs, and it's spread over around three hundred and fifty certificates, so this is actually gonna take a while. I'm gonna flip back to the slides, and if we've got some time at the end we'll come back to it. But those
are a bunch of ideas about how you can abuse certificate transparency logs, and now I'm gonna turn it over to Scott to talk about whether there are other people doing this. Hey everybody. Okay, so when Ian was originally discussing this idea with me, the first thought I had was, we can't be the only people who thought of this, right? Because we've seen folks abuse blockchain-style technologies to embed data permanently in the internet before, so it seemed really similar to that. I figured we probably weren't the first folks. So when I was approaching the research component of this, to see if we had any evidence of folks doing this, my hypothesis going in was to look for
large certificates with many SANs, right? Let's look for ones that have a lot of data, that are kind of outliers, that if we were to run a distribution would show up on the high end of the z-index. I wanted to do two things with those large certificates. One was to calculate entropy to look for evidence of encryption. I thought this would be a potentially interesting mechanism for C2: maybe you had your command set stored in these persistent logs, but you might encrypt it because you don't want folks to know what you're doing. And then I was also looking for magic numbers. For those who aren't familiar with magic numbers, these are like a constant number that you'll find in the
file format that's used to identify that file. So here's a couple of examples for GIF, JPEG, and ELF binary formats. I wanted to make sure we were checking for those things within the SANs of the certificates, in the event that they were storing some sort of media. In order to help me do this research, I used an open-source tool called Axeman that uses the asyncio library in Python to pull down all the certificates from a certificate transparency log. It worked pretty well for my use case, but I had to make a couple of modifications to really make it shine for what we wanted to do from an analysis perspective. So we made a couple
of modifications to Axeman. I wrote all those certificates I was pulling out of the Icarus log into Hive, where we did our analysis with some support from Rekha, one of my colleagues who's sitting up front here. We also calculated the entropy in advance, we plucked out the public key in case we wanted to do something with that later, we stored the cert size in bytes for filtering, and then we looked for the magic numbers in the SANs. Okay, so let's take a look here. Here I'm running a count against the table we've loaded up that has the certificates in it, just to show how many certificates I was working with.
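The two checks described here, entropy and magic numbers, might look something like the following in Python. This is a sketch of the general technique rather than the modified Axeman code; the function names and the small `MAGIC` table are illustrative choices, and the byte signatures are the well-known ones for JPEG, GIF, and ELF.

```python
import math
from collections import Counter

# Leading bytes ("magic numbers") for a few formats mentioned in the talk.
MAGIC = {
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\x7fELF": "elf",
}


def shannon_entropy(data):
    """Shannon entropy in bits per symbol; values near 8 for bytes suggest
    compressed or encrypted content."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def sniff(data):
    """Return a format name if the decoded bytes start with a known magic number."""
    for magic, kind in MAGIC.items():
        if data.startswith(magic):
            return kind
    return None
```

In an analysis like the one described, `sniff` would be applied to whatever bytes come out of decoding a certificate's SANs, and `shannon_entropy` could be computed either on those bytes or on the raw SAN text to rank candidates.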
It was roughly four hundred and fifty-five million certificates that we used for our analysis. The next thing I did is I ran a query to look for file types that match a magic number. I used a Python library to help with the matching, and we ran that query across the data set, and sure enough, we showed up, so at least we knew that our research worked. We found our BBB dirtfish, which is our cat certificate, the one we demoed earlier in that first video, which was awesome. So I was pretty excited, our stuff showed up, and then our second test cert showed up. Except there was this third cert, cat dot
JPEG, that we have the tech, and we're like, this isn't us, we didn't do this. And wait, there's numbers in the certificate, and there's a ton of SANs, and it looks like it has an ordering system. There's no way that the only other person that did this put a cat in the logs, right? That would be way too strange. So we kind of figured it was maybe a red herring or whatever, but when I pulled the certificate and looked at it with my own eyes, I was like, this looks a lot like what we'd put in the logs. So curiosity got the best of us, and we went back in and decided to take
a deeper look. Interestingly enough, this part of the video looks really similar to the first one, so I'll fast forward a little bit to save you, but we're doing all the same slicing and dicing kung fu here. We ultimately get a little bit further down the line, deleting the numbers, cleaning everything up, and then we change the case and pipe it into a file. I ran file on it because I didn't believe it. It says it's a JPEG. Is it a cat? It's a cat. It's a cat. Okay, give me just a second here to get back to presenting. Okay, escape, present. Thanks, everybody. Okay, so
I was really curious who this was. Thanks to privacy guard, I can't find any information about who owns the certificate. It was issued in Australia. I don't know a ton of Australian security researchers. We asked some folks, and unfortunately we don't know who owns this. If you own it, please raise your hand or come talk to me, that would be fascinating. Really strange, but of all the analysis we did, this is the only other person that was putting cats in CT logs. And this person has done some really interesting stuff. They have a number of certificates that have some numbers and text, and we think they might have some binary data in one of the
certificates. So for all of you out there who are really good at analysis, please take a look at their certificates and let us know what you find, because I was super interested in trying to figure out who this was. I want to do a quick summary of my analysis. We're running a little short on time, so I'm going to try to breeze through this. First, an assumption that we worked with, with Rekha, was that we wanted to look for certificates that had a hundred or more SANs, and then we were looking for things that had really long SAN lengths. Interestingly enough, when we found certificates that had a hundred SANs in
them, there weren't a lot that had anything near the max length. Like, the two hundred and thirty characters we talked about wasn't even really something that showed up. So this ended up being not a great search. We refined it a little bit and said, all right, let's look for certs that have at least 75-ish SANs where at least one of the SANs has a length greater than 200. This ended up being a really good search, and we found some really interesting certificates this way. The ones that I've called out here were the ones that ended up being interesting. Some of these other certificates ended up not being that interesting. So this was the first certificate I found. I
will let folks take a second to read the certificate. So this is pretty awesome. This person loved Star Wars, so that was a good certificate. We found a couple of certs with good entropy. This POC that we have the tech, this had the highest entropy for the greatest number of SANs. It looks kind of like binary data. We're not sure how to sort it. I'll zoom in a little bit. It almost looks like it has a pattern. I don't know, we couldn't quite crack this one, but it's one worth checking out if you're interested in the space. This was a pretty cool one. There was a ton of SANs
in this certificate. It's pretty convincing: on low-resolution displays it looks like account.google.com, and if it's a valid certificate it's gonna show up as such. We saw a ton of these in the CT logs, a lot of this being a common phishing pattern: have account.google.com and a giant max-length string afterwards to trick your victims. This was a cool one: multi-domain prober. I was actually mostly interested in the name of that, so I googled it, obviously, like a good laser cat would, and it came back with this thing that looked like a blog written by someone named Martin Wallace. I don't know this person, but I figured they
probably have a LinkedIn, so I sent Martin a note and said, "Hey, I'm working on the AppSec team at Netflix, I'm doing some research on the cert logs, what is this thing you're doing?" He goes, "You got us, our prober might be somewhat chatty," and then he links me to some public docs about the managed SSL product that they've built for GCP, which looks like it's for Google App Engine. So that was kind of cool. I don't know exactly what the prober does, he wouldn't tell me, but I did catch him. Cool, so I'm gonna toss it back to Ian to do the wrap-up here. Yeah, just gonna wrap up the last few slides
here. Before we do, let's check our video. It looks like it finished downloading, so there's our yelling cat WebM. It's about four and a half megs, it's a WebM file, and if we play it, it's about two minutes of this cat yelling at this person. So yeah, you can get some pretty high-quality video shoved up into CT logs. All right, final thoughts. Certificate transparency is awesome. I know we've spent the last half hour being like, here's some weird stuff we can do with it, but don't get me wrong, CT is awesome, and it is an amazing thing that we have this, because it's making us safer. But there are bad things
about it, right? It puts all of your internal domain names on display and gives attackers a roadmap of what to do when they get in your network. It puts your weak crypto on display: if you're doing things badly, attackers are gonna be looking for it and are going to abuse it. And lastly, it's a non-redactable, unmodifiable, append-only public data store that anyone can write to, and if you were to change any of those adjectives in that sentence, it would significantly weaken the whole point of having the CT project and the CT system. And you know, we're being non-malicious about this. We threw a bunch of cat pictures and cat videos in
there, but if someone's putting pirated content up there, if someone's doxxing someone's personal information, if there's any kind of bad material in there, the people that run these CT logs don't have the liberty to take it down. They cannot redact information, and if they were to pull down their logs, the CT project would just die. So it's potentially a problem. All that said, I want to give a special shout-out to Rekha, who's over here. She helped us do a bunch of this big data analysis and showed us how to shove things into Hive and do good queries on it. I want to thank everyone who has posted pictures of cats on the internet for making the slides
super easy. I want to thank Google and all the CT log owners and Let's Encrypt. Their services were used, and possibly abused, a great deal in the ongoing production of this research, and I also want to say sorry to all those people. Hopefully they appreciate that we're security researchers trying to do awesome stuff. And one last thing before I let everyone go: if you thought this was interesting, if you'd like to talk, I'd encourage you to download the slides and check out the links. If you're looking for the slides, they're in the CT logs. So with that, we will take any questions. Thank you. Thank you. Thanks, Ian and Scott. It's lunchtime, so people who want to leave
can leave. There are a couple of questions that came in on Slido. The first one is: are you aware of any developments that will legally survive, given the existence of newer right-to-be-forgotten legislation that's coming up? Yeah, so one thing that we didn't really get into is that something CT log providers have done is they've started temporally sharding their logs. So there's one log that all the certs expiring in 2019 go into, there's one that all the ones expiring in 2020 go into. Once 2019 is over, none of those certificates that expired in 2019 really need to be kept around and be in logs. So there is a way where
sort of logs can eventually roll off and no longer need to be publicly hosted. So that's at least a way to mitigate the fact that you can't take these things down on demand, but they'll at least roll off as certificates expire. In terms of any other answers for how to mitigate these issues, we're not really sure. Is there one other one? All right, that was it. All right, we'll be around if you want to chat with us. Thank you. Thank you.
[Applause]