
Low Level TLS Hacking

BSides Manchester · 2015 · 55:28 · 363 views · Published 2015-10
About this talk
An exploration of TLS protocol internals at the binary level, covering record structures, handshake mechanics, and implementation differences across SSL/TLS stacks. The talk demonstrates fingerprinting techniques to identify TLS implementations at scale, introduces PyTLS for crafting custom protocol probes, and reveals that OpenSSL dominates the deployed ecosystem, accounting for roughly 60% of internet-facing servers.
Show original YouTube description
Slides - http://www.westpoint.ltd.uk/papers/low-level-tls-hacking.pdf Code - https://github.com/WestpointLtd/ Simple inputs can conceal an expansive attack surface. Feature-rich web applications often embed user input in web templates in an attempt to offer flexible functionality and developer shortcuts, creating a vulnerability easily mistaken for XSS. In this presentation, I’ll discuss techniques to recognise template injection, then show how to take template engines on a journey deeply orthogonal to their intended purpose and ultimately gain arbitrary code execution. I’ll show this technique being applied to craft exploits that hijack four popular template engines, then demonstrate RCE on two corporate web applications. This presentation will also cover techniques for automated detection of template injection, and exploiting subtle, application-specific vulnerabilities that can arise in otherwise secure template systems.
Show transcript [en]

I hope you all had a good lunch. I hope you threw me definitely better than last year. I'd like to introduce Richard from another local company, Westpoint, and it's good to see Manchester companies involved in this. So this is all low-level TLS hacking. So I'll hand over to Richard to let you know what that's all about. Okay, good to see you all. Glad there's a decent turnout. What I'm gonna be talking about is SSL or TLS. So, like everyone, I'm gonna get the two mixed up. Technically, what you should be using now is TLS, but everyone calls it SSL still. We're gonna talk about the protocol, and then a bit about some tools for dealing with it,

how we can use the tools to write exploits, and then some fun things we can do with them. So initially, I'm gonna give you an introduction to SSL, but it will be more talking about it as a protocol rather than the kind of introduction you give to someone who's a system administrator. We're going to be looking at the binary format and what it actually looks like inside. We'll move on to looking at PyTLS which is a library for creating these records and manipulating them. And then we'll build a fingerprint which lets us tell what kind of SSL we're talking to. And then we'll look at what happens when we look at the wide landscape. So to start with, SSL

is a layered protocol. So the lowest layer is the record layer. And this is basically a fairly standard thing. So it's got a version number and a length field. And it's like any binary protocol. So this can encapsulate any of the types of SSL message. It's actually the same in SSL 3 and TLS 1, 1.1, 1.2, et cetera. So it also looks like it's not going to change in TLS 1.3. They still haven't finally decided on that, but the discussions are pointing towards the fact that the record format is going to stay the same. And that's quite important for compatibility with existing devices. Now, like any low-level protocol, it's binary. That makes it a pain in the neck to deal with. So that's annoying,

but it's inevitable given that we want it to be efficient. The protocol's symmetrical. The records the client sends to the server are different from the ones the server returns, but the actual format is the same. So you get a stream of records in each direction, one going from server to client, and one from client to server.

So this is the structure of a TLS record. We've got the content type, and that's saying what type of message this is. So for example, it might be a handshake message. Every record also has a version number in it. That's quite important because that says what type of record format we're using. Now in practice, that's almost always going to be the same thing. The length field is pretty obvious. It's how big the message is. And the message itself could be anything. So you could encapsulate Hello World in a TLS record, and it would be valid at the record layer, but it might not actually be a valid SSL message. But you can put anything in it. And the different types of record all use that field in

different ways. But the overall structure always stays the same.
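The record layout described above (content type, version, length, then the message) can be sketched in a few lines of Python. This is only an illustration using the standard `struct` module; the function names are made up for this example and are not PyTLS's API.

```python
import struct

# Values from the TLS specifications: content type 22 is "handshake",
# and version (3, 1) is how TLS 1.0 is written on the wire.
HANDSHAKE = 22
TLS_1_0 = (3, 1)

def pack_record(content_type, version, payload):
    """Prefix payload with the 5-byte record header: type, version, length."""
    major, minor = version
    header = struct.pack("!BBBH", content_type, major, minor, len(payload))
    return header + payload

def unpack_record(data):
    """Split one record off the front of data.

    Returns (content_type, version, payload, remaining_bytes).
    """
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    payload = data[5:5 + length]
    return content_type, (major, minor), payload, data[5 + length:]

# "Hello World" is a perfectly good record payload, even though it is
# not a valid handshake message.
record = pack_record(HANDSHAKE, TLS_1_0, b"Hello World")
print(record.hex())
```

The record layer does not care what the payload means, which is why the same five-byte header can carry every message type discussed in the talk.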

Once you've actually established your encryption, then the messages are protected with a MAC, which is obviously optional because initially you don't have any cryptographic parameters that you could use to actually validate the messages. And then you've got some padding. You only need that if you've negotiated a block cipher, so it won't always be present. And this overall record structure is always the same.

So let's look at the different content types. The initial ones are the handshake messages. These are the ones that are used at the start of the connection. The main purpose of them is to make sure that the client and server have agreed a set of cryptographic parameters which they can use to talk securely to each other. So at the point when these are being sent, you generally don't have any cryptographic information. And that's the whole purpose of it, is you're trying to get that stuff set up. When you actually have the information, then you announce that you're ready to go with the change cipher spec message. So what that's saying is, everything I say from here on

in is going to be encrypted. Now, another important one is alerts. Alerts are error messages. So basically, either the client or the server can send an alert, effectively at any point. And there are a range of different alerts that can be sent. Now, initially they had loads of different ones and they were very specific. So there might be, for example, one saying that there was a padding error. Or there might be one saying the MAC's wrong. Now, what they actually found out was that having that many alerts was a problem because it kept leaking information about the implementation.

These days, there are actually a lot fewer alerts used than there were. Generally, they just say things went wrong. Now, there's an odd edge case with alerts, which is that there are also warnings. So alerts have two levels. They're either fatal, or they're a warning. Fatal alerts mean that you should stop. But if you're testing an implementation, does it? It's an interesting question to ask. The answer is not always. Likewise, if it's a warning, it shouldn't stop. Does it? Actually, sometimes yes. Again, it's very buggy. This is like most of SSL, because what the spec says is wonderful. What happens in practice is something completely different. And then application, that's the main bit. That's the whole point of the exercise. So the application message

is one that's actually got real payload data in it. So if you were sending an HTTPS request, for example, then the GET would be inside one of these application data messages, and these should always be encrypted. Except of course, if you're using one of the cipher suites which doesn't do encryption. And finally, you've got the famous heartbeat. It's actually pretty much useless in terms of TLS. Its only real purpose was in terms of DTLS. But, you know, OpenSSL being OpenSSL, somebody threw the code in and they used it. Hence Heartbleed.
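As a rough illustration of the content types and the two alert levels just described, here is a small Python lookup. The numeric values come from the TLS and heartbeat specifications; the alert table is deliberately partial, and the helper name is invented for this sketch.

```python
# Record-layer content types.
CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
    24: "heartbeat",
}

# An alert payload is two bytes: level, then description.
ALERT_LEVELS = {1: "warning", 2: "fatal"}

# A small subset of alert descriptions; many more are defined.
ALERT_DESCRIPTIONS = {
    0: "close_notify",
    10: "unexpected_message",
    20: "bad_record_mac",
    40: "handshake_failure",
    70: "protocol_version",
}

def describe_alert(payload):
    """Decode the two-byte body of an unencrypted alert record."""
    level, description = payload[0], payload[1]
    return (
        ALERT_LEVELS.get(level, "unknown(%d)" % level),
        ALERT_DESCRIPTIONS.get(description, "unknown(%d)" % description),
    )

print(describe_alert(bytes([2, 40])))
```

Whether a peer really stops after a fatal alert, or carries on after a warning, is exactly the kind of behaviour a tester has to observe rather than assume.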

So what happens at the start? The initial messages are all used in the handshake protocol and the two important ones are the client hello and the server hello. So, the client hello is the one that's sent first. That's got a bunch of stuff we're going to see in a minute. The server hello is the one the server first sends. Now, another important message is the certificate message. That's generally sent by the server and it contains the SSL certificates. Basically big lumps of ASN.1, which identify the server, have some cryptographic information, that kind of thing. Now, it is possible for the client to send a certificate message. It happens when you're using mutual authentication, so basically a client certificate. That's relatively rare,

but it does happen. The whole point of these messages is to give you all the data you need in order to start talking securely. So the client hello says the version of TLS you want to talk. So that's the preferred version. So you may have noticed that we had a version number in the record layer. That's the minimum version you're willing to talk. So this one is the maximum generally. So you might have a client hello that says 1.2 in the actual hello message and 1.0 at the record layer. That's saying I will talk 1.2 and that's what I'd like to do, but I'm willing to go down as far as 1.0. The client hello has a list of all of the cipher suites that the client supports.

There can be a lot of those. It's actually disadvantageous if you include too many because it makes the packets bigger and you end up with fragmentation so everything slows down. You've got some random data. This is part of what's used in the actual crypto setup. And then you've got extensions. These are only in TLS 1.0 or above. They're not in SSL3, and that's the only real difference between the handshakes in those two. The main ones are server name indication, and that's basically name-based virtual servers, but for SSL. So basically it's announcing to the server which site you're actually trying to talk to so that it can present the correct certificate chain. That one's almost universally supported on the client side these days. You've got secure renegotiation.

That's one of the hacks to work around one of the earlier bugs in SSL, where you could just renegotiate and there was no binding between the initial connection and the renegotiation. There are lots and lots of other ones. There's even an extension to allow you to pad the header to be bigger, which you need in order to work around some buggy F5 routers. So these things, there's millions of them and basically anyone could throw any of them in and servers have to handle them. But of course, it's SSL, so they don't. There can be other information included, like a session ID. That's basically used to make it so that if you make a second connection to the same server, things go faster. So

after we've sent the client hello, the server responds, and the message is actually very, very similar. But unlike the client, the server knows which of the ciphers it's gonna talk. So the server only sends one. Likewise, it knows which version of TLS it's gonna talk, so it only sends one. It also sends some random data, it's basically the same as the client. So the main distinction is that the client is announcing, I can do all of these things, and the server is selecting which ones it wants to use out of those and responding. The server can also send back its own bunch of extensions. It can send empty extensions in response to the ones the client sent, which basically say, yeah, I know about that.

And basically say, yes, we've negotiated support for that, but it can also send its own ones. So for example, if you're using elliptic curve cryptography, then you'll get a points format extension, which says how your elliptic curves are set up, and you'll get a bunch of others, basically. Most of the time, you don't need to worry too much about those. So this is a basic handshake. It's not the only possible handshake, incidentally. This one is not using perfect forward secrecy, so we don't have some of the key exchange messages, but it's fairly straightforward. So we start off, if you can see, does that work?
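To make the client hello fields concrete (preferred version, random, session ID, cipher suite list, and a server name indication extension), here is a hand-rolled Python sketch. It is not PyTLS code; the function names are hypothetical, `example.com` is a placeholder host, and the two cipher suite numbers are just common RSA/AES suites picked for illustration.

```python
import os
import struct

def build_client_hello(hello_version, cipher_suites, server_name=None, random=None):
    """Serialise a minimal client hello handshake message."""
    random = random if random is not None else os.urandom(32)
    body = struct.pack("!BB", *hello_version)       # preferred (maximum) version
    body += random                                   # 32 bytes of client random
    body += b"\x00"                                  # empty session ID
    suites = b"".join(struct.pack("!H", c) for c in cipher_suites)
    body += struct.pack("!H", len(suites)) + suites  # cipher suite list
    body += b"\x01\x00"                              # one compression method: null
    if server_name is not None:
        host = server_name.encode("ascii")
        entry = b"\x00" + struct.pack("!H", len(host)) + host  # name type 0 = host name
        name_list = struct.pack("!H", len(entry)) + entry
        ext = struct.pack("!HH", 0, len(name_list)) + name_list  # extension type 0 = SNI
        body += struct.pack("!H", len(ext)) + ext    # total extensions length
    # Handshake header: type 1 (client hello) and a 24-bit length.
    return b"\x01" + struct.pack("!I", len(body))[1:] + body

def wrap_record(record_version, handshake):
    """Wrap a handshake message in a record (content type 22)."""
    major, minor = record_version
    return struct.pack("!BBBH", 22, major, minor, len(handshake)) + handshake

# TLS 1.2 in the hello, TLS 1.0 at the record layer, as described above.
hello = build_client_hello((3, 3), [0x002F, 0x0035], "example.com", random=bytes(32))
record = wrap_record((3, 1), hello)
```

Passing a fixed `random` keeps the output reproducible for testing; a real client would let it default to fresh random bytes.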

So client talks to the server, the server in order sends these messages. So the hello messages are the ones we've just seen, then it sends the certificates, and then it says that's me finished, which is what the server hello done means. Then we get the client key exchange and the change cipher spec. Now, the change cipher spec, as we mentioned earlier, is saying, now I'm going to talk to you encrypted. And finally, the finished message says that the client is done with the handshake. The server is not yet done, just because the server hasn't announced that it's happening. So the server will then send its own change cipher spec back, again saying, now I'm going to talk to you encrypted. So

at this point, both sides are talking and they're both secured. Is that music to be clear? Okay, so TLS. People obviously recognise this, I'm sure you've all run it. That's the standard Heartbleed exploit. So everyone sort of saw that on the internet, ran the code, and most people were just going to get aligned. That's not, I think, a particularly good way to do things. And there's obviously some limitations to this. It's quite clear that this server is talking, or this client rather, is talking TLS 1.1. It's obvious, surely. We've got these magic numbers in here. And what happens if we're trying to talk to a server that only supports TLS 1.2, and we want to test that for heartbeat, or Heartbleed? Well, this

exploit won't actually work. So there's loads of people sitting there running this thing blindly, and yet they're not actually finding all the vulnerable servers. That's not very good. Now, the version you could actually technically patch in that without too much problem. There's only two numbers you need to change. Quite which two is anybody's guess. They're buried in there. But what happens if the server actually supports a different set of ciphers to the ones in this exploit? Then, yet again, you're going to get a false negative with your test. This is not a good situation for a tester. So that's why, personally, I don't think running these sort of hex-based exploits is a good

thing. So instead, how about we have something like this? This is a thing that will make the exact same hex messages. It packages it all up, except this time we can actually see what it says. So we can see that we're making a heartbeat extension. The original exploit also includes a status extension, so I've thrown that in, and then I'm making a handshake message out of it. My client random is not very random, it's 1, 2, 3, 4, 5, etc. And then I've got the same set of cipher suites that the original exploit had. This time, because it's actually in a readable form, if we wanted to tweak it, then we really easily can. We don't need to start editing hex,

we don't need to actually understand the format, we can just change a few values and then we can perform the test. We wrap that hello message up inside the TLS record and we're done. So basically those two slides are actually showing the same thing, but I think most people would agree that this one's a bit more readable. So what are we using now? We're using a library called PyTLS. It's a little library I've thrown together for basically doing exactly that, making it easy to deal with the SSL protocol. But rather than messing around with the binary, you just say, I want to make a record, or I want to make a handshake message, and you just let

the code actually do the heavy lifting. So it understands the binary format and wraps it all up for you. You deal with objects, and you don't have to worry about the serialization. It also works as a decoder, so if the message comes back from the server, then it unwraps it and makes it available to you as a nice object. And it's a very low level tool. So it doesn't try and prevent you sending things in the wrong order. It doesn't actually do any crypto. It's just about the records. So it lets you make valid messages. That's great. But it also lets you make invalid messages, which is much more fun. You can also send them

in the wrong order, which is also important. So here's an example of creating a handshake message. We've got a record that's TLS 1.1, we've got a client random, which again is not very random, we've got a few ciphers, and then we throw it together and wrap it inside the record layer. So that gets all wrapped up for us into the right binary format, and we don't need to deal with any of that. So yeah, we just create the object, it's that simple. For a lot of fields in SSL, you've got lengths. One of the things it does is, by default, it uses the length that you're supposed to use. But, because we're hackers, we don't always

want to use what you're supposed to use. So if you decide you want to send a different length, you can just override that. So that would, for example, be what you would do to make a Heartbleed exploit. You would say, I want this much data, but you send a record that's shorter. So this stuff is quite easy. It's a hell of a lot easier than dealing with the hex. You can make things that are valid or invalid in various ways. In the record, the message field is the one that gets populated with the actual content. You can actually just stick an arbitrary string into the records, but generally what you want to do is use the library to create

one. Hopefully you can see this.
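The idea of a length field that defaults to the correct value but can be overridden can be sketched as follows, using the heartbeat message as the example. This is plain Python with `struct`, not the PyTLS interface; the function and parameter names (`build_heartbeat_request`, `claimed_length`) are invented here, and the 16 bytes of padding follow the minimum the heartbeat specification asks for.

```python
import struct

def build_heartbeat_request(payload, claimed_length=None, padding=b"\x00" * 16):
    """Build a heartbeat request message.

    By default the length field matches the payload. Passing
    claimed_length overrides it, producing a message whose declared
    payload length disagrees with what is actually sent.
    """
    length = len(payload) if claimed_length is None else claimed_length
    # Message type 1 = heartbeat request, then 16-bit payload length.
    return b"\x01" + struct.pack("!H", length) + payload + padding

def wrap_heartbeat(version, message):
    """Wrap a heartbeat message in a record (content type 24)."""
    major, minor = version
    return struct.pack("!BBBH", 24, major, minor, len(message)) + message

honest = build_heartbeat_request(b"abc")
mismatched = build_heartbeat_request(b"abc", claimed_length=0x4000)
record = wrap_heartbeat((3, 2), mismatched)
```

Keeping the override as an explicit optional argument means the common case stays correct by default, while a test for a length-handling bug only has to change one value.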

This is another exploit. This is a test for the OpenSSL change cipher spec vulnerability. So there was a flaw in OpenSSL whereby you could send a change cipher spec message early. So you could send it before you'd actually negotiated the cryptographic parameters. Now, in most cases it wasn't actually exploitable, but it was in some versions. But it's still an interesting case because it's a flaw in the state machine. There have been some flaws in Java where, for example, you could trick it into talking in plain text. And again, PyTLS makes it very easy to write these. So it didn't take very long at all to write this test. And it's able to send the change cipher spec message. So we send a hello. Then we send the

change cipher spec, but we haven't actually sorted out any crypto. Now, this is also decoding the response from the server. It's just looking until we get the server hello done message, which we should not get. What we should be getting is an alert, or in some implementations, it might just send us a reset and drop the TCP connection. So another problem that came up recently was Logjam. So that was the result of people using weak Diffie-Hellman primes. And of course, as soon as we saw the issue, we needed to have a test for it very quickly. Because we had all this infrastructure, it basically took about five minutes. All we needed to do was write a little bit

of code to be able to pull out the primes from the server key exchange message, and that was it. It didn't take long at all. So we get out the prime, and yeah, if it's a short one, then we know that the server is misconfigured. If we wanted to be a little bit fancier, what we could have done was pull out the prime itself and see if it was one of the standard ones, because they're the ones that, if you're looking at state level attackers, for example, it's likely they may have already factored those. It means you could say, for example, see if a server was configured with its own custom primes.
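Pulling the prime out of a server key exchange message is mostly a matter of reading a length-prefixed integer. The sketch below assumes the caller has already isolated the body of a server key exchange for a Diffie-Hellman cipher suite; the function names and the 1024-bit threshold are illustrative assumptions, not the values used in the speaker's test.

```python
import struct

def dh_prime_from_server_key_exchange(body):
    """Return the Diffie-Hellman prime p as an integer.

    body is the server key exchange handshake body for a DHE suite,
    which starts with a 2-byte length followed by p itself.
    """
    (p_len,) = struct.unpack("!H", body[:2])
    return int.from_bytes(body[2:2 + p_len], "big")

def is_short_prime(p, minimum_bits=1024):
    """Flag primes below a chosen size (the threshold is an assumption)."""
    return p.bit_length() < minimum_bits

# A made-up 512-bit value standing in for a server's prime; it is only
# used to exercise the parsing and is not claimed to be prime.
fake_p = (1 << 511) | 1
sample_body = struct.pack("!H", 64) + fake_p.to_bytes(64, "big") + b"\x00\x01\x02"

p = dh_prime_from_server_key_exchange(sample_body)
print(p.bit_length(), is_short_prime(p))
```

Comparing the extracted value against a list of well-known standard primes, as suggested above, would be a small extension: a set membership test on `p`.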

So what else can we do? Well, because it's low level, we can actually do a few things that you can't do if you're just using the standard OpenSSL s_client, which is the one that people commonly use for testing. So, for example, if you're trying to connect to a server and the server wants a client cert, how do we test whether it supports particular ciphers? Because if we use OpenSSL s_client or any of the million wrapper scripts around it, we don't know what the problem was, we don't know what cipher was actually negotiated. But we've actually already been told this because that information was in the server hello message. So what this lets

us do is to actually get at the information so we can test the supported ciphers even though we can't connect. So that lets you do some interesting things. So if you've got, for example, a router that you don't have any credentials for when you're performing a test, you can still tell is that router configured securely so that you know if when the admin is trying to use it, is the admin connecting and using weak encryption to do so.

You can also test clients because the protocol is symmetrical and the record layer is the same. So you can actually tell PyTLS to listen for the connections and then pull information out of the client hello. So let's take a risk and try it and see what happens.

So, I'm running a PyTLS server on port 444. Hoping to god this demo doesn't go horribly wrong. So I'm going to connect with s_client. Is that big enough or bigger? So it's just a standard connection. So we connect. If we look, we've been able to decode messages from the client handshake. So the client sent its client hello. And we've ripped out the set of ciphers that the client supports. That can be very useful, particularly if you're testing a mobile application or something like that. You can do it with Wireshark, capturing all the packets, opening up Wireshark, using the SSL protocol decoder. But that's a real pain. Wouldn't it be easier just to get a list? So that's what this does. We could extend this, probably should in

fact, to also pull out which extensions the client is supporting. But this is enough to get us quite a long way. We can also see both the record layer version number and the version number in the client hello. So we know the preferred TLS version of the client as well as the minimum acceptable version. You have to take that information with a pinch of salt because it may do a later check and say, actually, that's not good enough. And that's actually quite common. So even though this information should be correct, like anything, you can't trust it. So that's a simple example.
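A rough sketch of what extracting that information from a client hello involves is shown below. It assumes the whole client hello arrives in a single record, which is common but not guaranteed, and it is written directly against the wire format rather than with PyTLS; all names are hypothetical.

```python
import struct

def parse_client_hello_record(data):
    """Extract versions and cipher suites from a record holding a client hello."""
    content_type, rec_major, rec_minor, rec_len = struct.unpack("!BBBH", data[:5])
    if content_type != 22:
        raise ValueError("not a handshake record")
    handshake = data[5:5 + rec_len]
    if handshake[0] != 1:
        raise ValueError("not a client hello")
    pos = 4                                   # skip handshake type and 24-bit length
    hello_version = (handshake[pos], handshake[pos + 1])
    pos += 2 + 32                             # skip version and client random
    session_id_len = handshake[pos]
    pos += 1 + session_id_len                 # skip session ID
    (suites_len,) = struct.unpack("!H", handshake[pos:pos + 2])
    pos += 2
    suites = [
        struct.unpack("!H", handshake[i:i + 2])[0]
        for i in range(pos, pos + suites_len, 2)
    ]
    return {
        "record_version": (rec_major, rec_minor),   # minimum the client will accept
        "hello_version": hello_version,             # version the client prefers
        "cipher_suites": suites,
    }

# A hand-built sample: TLS 1.0 record layer, TLS 1.2 hello, two suites.
body = bytes([3, 3]) + bytes(32) + b"\x00" + b"\x00\x04\x00\x2f\x00\x35" + b"\x01\x00"
handshake = b"\x01" + len(body).to_bytes(3, "big") + body
sample = b"\x16\x03\x01" + len(handshake).to_bytes(2, "big") + handshake

info = parse_client_hello_record(sample)
print(info)
```

In a listening tool, `data` would be whatever bytes the connecting client sends first; extending the parser to walk the extensions block would follow the same length-prefixed pattern.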

So what else can we do? It's nice to write little tests for vulnerabilities, but we've got a toolbox now that lets us build things. So can we build more fun things? Well the answer is, yeah. We can build, for example, a fingerprinting tool. So we can tell how does a server actually implement its SSL.

So there are only a few implementations of SSL that are actually in common use. The most common one is OpenSSL. And, you know, people can bring up LibreSSL and BoringSSL, but they're effectively at the moment just OpenSSL with some tweaks. There's no massive differences. Some features have been enabled, some have been disabled, they've reformatted the code, but most of the fundamental infrastructure is the same. That's actually becoming less true, and they're going to become more distinct over time, I suspect. But most of the differences are internal and not to do with the actual protocol implementation. BoringSSL has recently been making some more fundamental changes, but with LibreSSL, not really so much, so just cleaning up

the memory handling, that kind of thing. Another common one is SChannel. That's the Microsoft implementation, so it's used by IIS, Outlook, Internet Explorer, et cetera. Another one is the Java one. It's been around for ages. It's a pure Java stack. It has quite a large number of problems and it's responsible for lots of stuff still being insecure. So for example, it has limitations on the maximum length of primes it supports when you're doing Diffie-Hellman key exchanges. So it's unfortunate that that one's around, but they are at least fixing it. It's very, very common because of course we've got Android. Android has, it doesn't actually quite use it as far as I can see, but it has some

limitations that were based on it. So they kept compatibility even when that was probably a mistake. They've since changed that, so they no longer try to be compatible.

GnuTLS is another one, nowhere near as widely used, but it's quite a solid implementation and it's got a reasonable level of deployment. Then you've got a bunch of other ones. So you've got PolarSSL. That's not that common, but it did get standardized, in that the Dutch government accepted it as one of their certified implementations. MatrixSSL, wolfSSL. These are all minority products. They're mainly used in embedded systems. They've often got designs that are intended to be very small. And while the code bases for these things are quite clean, because they're relatively limited, they don't have the feature set of something like OpenSSL. You tend not to see them in practice unless you're dealing with

some quite specialist kit. Now, the TLS spec has, like any spec, it's got edge cases. TLS is quite complicated, so TLS has more than most. They've also gone with a bit of a kitchen sink approach of making all sorts of things possible. So, one example is in terms of the handshake messages, you can put as many handshake messages as you want into a single record. Now, you don't have to though, so you've got the choice. So you can send three handshake messages in three records, or you can send one record containing all three. Another problem is when there's an error. Different implementations all give you different errors. So these are the alerts that we mentioned earlier. So if something goes wrong, some implementations just go, yeah,

it went wrong. Others give you more specific errors. Some like IIS just tend to drop the connection. So you get quite significant variations there. They're also varying in terms of what they think is invalid. So some of them do quite a lot of error checking. Some of them do not much at all. Both have their advantages and disadvantages. So for example, if you check too many things, then there's a risk of introducing timing attacks. Whereas if you don't check enough, there's a risk that the cryptographic setup might be completely wrong. For example, Amazon recently released a new TLS library. That one completely ignores the record layer's version number. Now, that's odd, but it doesn't really seem to be exploitable.

It violates the spec, but since the version number's always the same, does it matter? But these are all significant differences. And by looking at these differences, we can determine for a particular server what kind of SSL stack it's running, even if nobody tells us. So let's make that a bit clearer. So we're going to send a bunch of probes, and we'll just look at one here. So this one is a handshake probe. So we do a connection, we send a client hello, and then we look at what comes back. If we get all three of the initial messages back in the same record, then immediately, you know, the chances are this is SChannel or Java. If we get it in three chunks, then the chances are it's

OpenSSL. Now, that's not definitive, but that is a significant difference that allows you to identify those implementations. Now, it's not the only difference. If we only had that one test, it wouldn't be very fine-grained. But because there's so many of these smaller differences, we can stick them all together and actually fingerprint the implementation. So what the prober does is it sends a bunch of probes, in fact there's 24 different ones at the moment, and then records a fingerprint for each system. I'm sorry, for each response. So it looks at the records that come back and pulls out the information that should be constant. For example, you can't keep the client random as part of your signature because

the whole point of it is that it's different each time. Likewise, the server random. And there are other things that are quite variable. Now, this format is intended to be read by the computer. It's not intended to be readable, but basically you've got a star at the start of each record, the version number, what kind of message we got. You can see that the handshake messages are all together in the top one, and there's more than one star in the second one, so you can see that it's actually been split over multiple records. And this is all generated for us from the network. We wouldn't write this by hand. So by sending all these different probes and recording the responses like that, we're able to build

up a database of what different servers talk and how they say it. So we do things like vary the version numbers, so we can vary it in the handshake, we can vary it in the record layer. Some of the implementations have limits, so if you give them a version number that's really high, they'll send you an error. Some of them go, well that's okay, but I want to negotiate it down to a version I support. They don't give you an error. Others completely ignore it, like the Amazon one. We can also do things like violate the state machine. So with the early change cipher spec vulnerability that I mentioned earlier, we can do the

same thing as one of our fingerprints because the behaviors are different on different implementations. Some of them will send us a reset, some of them won't. We can also do things like send two client hellos one after another. That's kind of a renegotiation in sort of old school. So some implementations say that's fine. Other implementations, no. We can mess with the length field, so send over-long records or over-short records. We can send complete garbage, because we're allowed to put anything inside a record and it'd still be parsed. The other things we can do are mess with the extensions. So there are a bunch of rules about how a server name indication extension is supposed to work. So for example, it must not have an IP address in

it. Well, do they check? Some do. And there are specific length limits. Are they enforced? Sometimes, but not always. So by looking at these differences, we get a whole bunch of different sort of metrics about a particular server. The important thing with all of these metrics is that we're probing the implementation rather than the configuration. So what you don't want to do is base your fingerprint on the cipher suites. Because the first thing an admin who's any good does is they go online, find what is the right cipher suite list for being secure today. They copy and paste it into their config file, and at that point, your servers all look the same, because everyone's using the same cipher suites. But what they can't do is configure the

actual implementation. So they can't say, actually I want you to split the records up this way, or I want you to split them up that way. And to be honest, even if they could, why would they bother? Because these differences, in large part, have no actual security implications. They're not about securing the system, they're just differences in the way they've chosen to implement things.
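One of the implementation differences discussed above, whether the initial handshake messages come back in a single record or in several, can be reduced to a small signature. The following Python sketch shows one way to do that; it assumes handshake messages are not fragmented across records, and the function names and signature shape are made up for illustration rather than taken from the speaker's tool.

```python
import struct

def split_records(data):
    """Split a byte stream into (content_type, version, payload) records."""
    records = []
    while len(data) >= 5:
        content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
        records.append((content_type, (major, minor), data[5:5 + length]))
        data = data[5 + length:]
    return records

def handshake_types(payload):
    """List the handshake message types packed into one record payload."""
    types = []
    while len(payload) >= 4:
        length = int.from_bytes(payload[1:4], "big")
        types.append(payload[0])
        payload = payload[4 + length:]
    return types

def grouping_signature(data):
    """One list per handshake record, each holding its message types."""
    return [
        handshake_types(payload)
        for content_type, _version, payload in split_records(data)
        if content_type == 22
    ]

# Helpers to fabricate sample server responses for the demonstration.
def hs(msg_type, body=b""):
    return bytes([msg_type]) + len(body).to_bytes(3, "big") + body

def rec(payload):
    return b"\x16\x03\x03" + len(payload).to_bytes(2, "big") + payload

# Types: 2 = server hello, 11 = certificate, 14 = server hello done.
together = rec(hs(2, b"x") + hs(11, b"yy") + hs(14))
separate = rec(hs(2, b"x")) + rec(hs(11, b"yy")) + rec(hs(14))
print(grouping_signature(together), grouping_signature(separate))
```

The signature deliberately ignores the message bodies, since fields such as the server random change on every connection and would make the fingerprint unstable.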

What are the strengths of this thing? Well, the strengths are that we can distinguish basically every implementation of SSL that I've found. Now, in some cases where I've actually been able to build a big database, so OpenSSL for example, we can even determine the version. We're not actually able to do that so well on some of the commercial implementations, mainly because we don't have access to every single version of an F5, for example. But it's still pretty good. We can add new fingerprints quite easily, so if anyone wants to fingerprint the server, they can actually just record the fingerprint and add it to the database. It's not affected by the common configuration changes. So if a user has changed the cipher suites, for example, then,

you know, we don't care. We can still tell what it is. But there are still, of course, some improvements to make. So at the moment, if the server's got limitations on the versions that it's willing to talk, then the fingerprinting is a bit iffy. I've got some ideas on how to fix that, but I haven't done that yet. Of course, we can always use more fingerprints. So let's see this in action.
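Matching a set of probe responses against a database of known fingerprints can be pictured as counting agreements per implementation. The sketch below is a guess at the general shape of such scoring, not the tool's actual algorithm; the probe names, signature strings, and database entries are placeholders invented for the example.

```python
def score_matches(observed, database):
    """Rank known implementations by how many probe signatures agree.

    observed: {probe_name: signature}
    database: {implementation_name: {probe_name: signature}}
    Returns a list of (implementation_name, match_count), best first.
    """
    scores = {}
    for implementation, known in database.items():
        scores[implementation] = sum(
            1 for probe, signature in observed.items()
            if known.get(probe) == signature
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Placeholder data: these signatures are invented, not real fingerprints.
database = {
    "stack-a": {"handshake": "one-record", "early-ccs": "alert"},
    "stack-b": {"handshake": "three-records", "early-ccs": "alert"},
}
observed = {"handshake": "one-record", "early-ccs": "alert"}

ranking = score_matches(observed, database)
print(ranking)
```

Adding a new fingerprint under this model is just adding another entry to `database`, and several versions of one implementation can legitimately tie when their behaviour is identical.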

So here we've got Burp, and we've set up an invisible proxy. So this is basically where Burp is running, in this case, an SSL server, and we can just talk to it. And it's on port 1234. Now, Burp is written in Java and hopefully we'll be able to detect that. So

port 1234 on localhost. So we run the prober and it thinks for a little bit. That's because it's running a whole bunch of these probes. I could make it faster, but that would be at the expense of accuracy. I'm just hoping that it's actually running. Let's see. We can see some error messages. Oops, this is taking a long time. Let's move this up. Typical with demos. There we go. Okay, so. If we go up to the top, you can see straight away that the highest number of matches in our fingerprint database are for Java. And in fact, it's a match for this particular implementation, because I fingerprinted it and added that specific fingerprint to

the database quite some time ago. We can see the next matches are also Java, so we can see that it could be Tomcat. You can also see, with a much lower probability, that it could be Tomcat again. Ah, yes, with the native APIs. But what we can also see is that we've got very, very few matches for some of the others. So it looks very, very different from Microsoft IIS. And you can see that we're only getting one match with OpenSSL. So the scoring is working very well here, and we can tell that this is Java. We can be pretty confident of that. Now, it could be that I've just written some code to print the same thing all the time. So let's

try the same thing again, but this time we're going to run an OpenSSL server. So I'm going to run it on 4433. So there's our server.

And once again, we have to wait for a little bit.

So we're getting lots of fingerprint matches for OpenSSL 0.9.8.

Slightly less for OpenSSL 1.0. Now you might notice that it's actually saying that it matches several of those versions identically. And that's because OpenSSL generally, when they apply security fixes, make the minimum changes possible. So I couldn't tell you which one of those versions it was. But I could be pretty confident that it's OpenSSL. And equally, I'm confident that it's OpenSSL 0.9.8. And just to prove that,

we have quite a few different versions of OpenSSL there. So let's see what happens if we switch to using 1.0, I don't know, d, to pick one at random.

Now what we're hoping will happen here is that it will be able to distinguish it from OpenSSL 0.9, despite the fact that it's obviously the same product, so you'd expect it to be relatively similar. And as you can see, yeah, it's spotted it, and it knows it's OpenSSL 1.0. So it just goes to show you that even within the same product, over a relatively small set of versions, you've got quite significant differences in behavior. Now, they're small, but by sticking all those differences together, we're actually able to identify it. So, I don't think we need to try any more versions. We could do, but let's look at something else. So,

what else can we do? Well, Alexa helpfully provides a list of the top million websites. So that's the websites that they are seeing people using most often. Annoyingly, that list isn't free anymore; it used to be. But the University of Michigan have taken that list and they've actually port scanned it all. One of the data feeds they provide is which of the servers in the Alexa top million support SSL. So it's basically which ones have port 443 open.

Now that we've got a prober, why don't we just probe the lot? Well, I did. The interesting thing is the results are, well, not that surprising, but the nice thing is the actual measurements. So the prober is trivially parallelizable. So I ran 50 different instances of it in parallel, each one probing a different site and working its way through the full list, and then stuck all that data together at the end. 680,000 of the top million have 443 open. That's quite a lot of servers to crawl, obviously. So it took around two and a half days. And that generated me about two gigabytes' worth of fingerprint data. So what were the figures? Well, some of the servers didn't respond. That's quite

common because the data was a few days old, you know, the servers change over time, especially when you're talking about a million of them. We got 668,000 or so valid results. The reason I put the actual raw figures in here is because these are measurements. They're not me just sticking my finger in the air and making a guess. I've actually measured this stuff. So we got 17,000 where we couldn't fingerprint. We had 24 probes per server. We got over 16 million results. So 16 million probes came back. But when we look at how different those were, there were only just over 10,000 different answers. So basically that's 10,000 different configurations or 10,000 different setups. The most

common one matched 18%. So that means that with a single server version, it's doing 18% of that top million.
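As a quick sanity check on those measurements, the rounded figures quoted in the talk can be plugged into a few lines of arithmetic. The numbers below are the approximate ones from the talk ("668,000 or so", "around two and a half days"), so the derived rates are rough estimates rather than measured values.

```python
# Rounded figures as quoted in the talk.
sites_with_443 = 680_000      # Alexa top-million sites with port 443 open
valid_results = 668_000       # servers that gave usable responses
probes_per_server = 24
workers = 50                  # prober instances run in parallel
days = 2.5                    # approximate wall-clock duration

# 24 probes against each responding server: "over 16 million results".
total_probes = valid_results * probes_per_server

# Overall throughput and the average time one worker spent per site.
seconds = days * 24 * 60 * 60
sites_per_second = sites_with_443 / seconds
seconds_per_site_per_worker = seconds / (sites_with_443 / workers)

print(total_probes)
print(round(sites_per_second, 2))
print(round(seconds_per_site_per_worker, 1))
```

On these figures each worker spends on the order of 16 seconds per site, which fits the demo, where a single run of all the probes takes a noticeable pause.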

Unsurprisingly, it's OpenSSL. So even though people might dislike OpenSSL, OpenSSL is the important implementation of SSL. All the others are largely irrelevant. I mean, it'd be nice if we had a wider ecosystem, but we don't. And we have to live in the world as it is. So we need to get OpenSSL to be better, not just moan about it. So 60%, that's a hell of a lot. So IIS is quite common, we see a lot of it, but it turns out that actually it's not doing the SSL. So generally we're going to find that people are using content delivery networks, or they're using F5s, or otherwise offloading the SSL from the main server onto a dedicated

box. Now that's not that necessary these days, but that seems to be what people are still doing. So what you've got are... So that's really not much. The thing is biased; it's basically a power law distribution, but if you've got a CDN, that CDN is likely to be doing large numbers of high-traffic sites. So if you've got a single CDN that was using OpenSSL, basically if it wasn't doing lots of high-traffic sites, it would go bust. So we would expect there to be some implementations that are much more popular than others. If we look at Akamai, for example, we know they use OpenSSL; they quite freely say so. Google likewise use BoringSSL, which is basically

OpenSSL with a little bit of Google magic added. So that's what you'd expect. So we're not really seeing anything that we couldn't have predicted, but the difference is that rather than just making guesses or stabs in the dark based on our experience, we can actually directly measure this stuff. The fact is that in practice we have a monoculture for TLS, and the implementation of choice is OpenSSL.

So what have we covered today? We've looked at the basics of the protocol at a binary level, how it all works, how it all sticks together to make crypto. And then we've looked at how we can put the bits together in an easy, readable, and tweakable way, so that we can modify things when the answers don't come back. So rather than being stuck with the binary that some random bloke on the internet threw out there, we can actually make these exploits ourselves and tweak them when we hit something different. We can probe TLS implementations. We can see that we can do it on a large scale or on a small scale. So you can use it in a small pen test or you

can use it if you want to get a data set about the state of the internet. That lets us say that we know the actual situation in practice. You can also run it over your own internal network and find out what you're using. All that code is available on GitHub. I haven't actually released the code for doing the massively parallel scanning yet, basically because it's a bit of a mess, but I'm trying to tidy it up and then I'll do it. I've also got some code for taking the huge quantity of data and throwing it into a database. Don't try it with SQLite. I tried it. It is too slow to use. MySQL seems to be a bit better, but you've

still got queries that are taking like an hour or two to run. But, you know, it's not that much data. When I compress the Alexa data down, it's only about 20 meg. So I'll actually release the raw data for that and the scripts for importing it, but I'm not going to throw the 2GB data files onto my server because it will fall over. The PyTLS library is in the Westpoint GitHub repository. That works standalone and has some tests for things like getting the Diffie-Hellman primes, the server example that I showed you. It also has a few others, like the change cipher spec vulnerability test, and a bunch of others, POODLE testing, that kind of thing. And you can easily write new

ones. You can also easily add in support for new bits of the protocol it doesn't have. I'm very happy to have any pull requests anyone wants to send. There's another module, which is the TLS Prober one. That's the fingerprinting tool I showed you. Again, I'm more than happy to have pull requests; we've already had some third-party submissions of fingerprints for it. I would love to have more. The fingerprints are actually quite solid, in that we can upgrade the tool without having to upgrade the fingerprints. And even when I change the format a little bit to handle the handshake version issue I mentioned earlier, the format will be compatible. So basically feel free to add as many fingerprints as you possibly can. And yeah. And that's pretty much all I've

got to tell you. So, have you all got any questions? So you essentially created an Nmap for SSL? Not quite an Nmap, but the Nmap fingerprinting tool, yeah. Yeah, it's absolutely the same thing. You said that different implementations sometimes lie about which versions they support. Is that not a horrendous bug? No, because what they do is they'll announce that they support 1.0 to 1.2, for example. The connection will be established, they'll do the key exchange, but then at the end it checks the version and says, oh, hang on, that's not good enough, and errors. So generally it's not the implementation doing that, it's the application that's using the library, which

rather than sort of tweaking the config of the library, it's often easier just to say, did I negotiate the security parameters that I thought I'd get? It's generally applications taking the shortcut, usually because they don't actually know how to tweak the knobs in the SSL implementations correctly. So it's quite common to see a connection be established and then dropped afterwards because the server, for example, was expecting to get 1.2, but in fact 1.1 was negotiated. Are you saying on the second attempt it will negotiate correctly? No, it would just be unable to connect in that situation. You do get a situation where things do retries, but that's generally only done in browsers. There's a whole other talk about that if you really want. But yeah,

in a browser you can get it, so it attempts a connection, and it sometimes will also attempt a second connection with a different set of settings. In some browsers there are actually three levels. So it starts off at 1.2, then it tries 1.0, then it will try SSL 3, because there are some things that are sensitive to extensions, and if there are any extensions in the handshake, they'll fail to accept the connection. SSL Labs doesn't test for some of that stuff, but what the browsers do is actually even more baroque and complicated. Yeah, to be honest, talk to me afterwards about it if you want; it's too complicated to go into now. Do you test for keep-alive? Which one?

There's the TCP-level keep-alive. You've got the heartbeats, which are another keep-alive, and there's a probe for that. These days, unsurprisingly, most people have disabled heartbeat support. But yes, I mean, there is a test for that in there. There's a valid test for a heartbeat that is correct, and also an invalid test for a sort of malformed, Heartbleed-style heartbeat as well, and both are used in the probes. I mean, you've just talked about having lots more signatures. Are your signatures compatible with Nmap? Can you use their signatures? Can you contribute to theirs? Nmap doesn't have any facility for doing this kind of fingerprinting. You know, they don't have this tool. It would be possible for other tools to be written that would integrate

with it, but Nmap's fingerprints are at the network stack level, whereas this is independent of the network stack; it's specifically about the TLS implementation. And they do fingerprinting with some services? Yes, they have fingerprints for FTP servers and a few others. They don't have anything for SSL implementations. And the other question was, you're talking about active probing. Active probing, yeah. And same with SSL Labs. Could you fingerprint it passively, from the wire? Yes, you can, but not as effectively. So I can, for example, take a packet trace, open it up in Ethereal, and tell you, oh, that's either Java or IIS, or I can tell you if it's OpenSSL or not. But without sending a range of different

requests to elicit different answers, I can't get the detail of the fingerprint. So I can't, for example, tell you the version. I think it's sort of like p0f.

Yeah. Well, p0f has a facility for fingerprinting SSL clients, and that generally works by fingerprinting the cipher suites supported by the client. And because most clients are being run by consumers, the consumers tend not to configure the cipher suites differently. That's why that's effective in that scenario. I'm interested in fingerprinting the servers, and on the servers the cipher suites generally are being configured differently, which is why I have to omit that information from the tests that I'm looking at. Final question. You talked about lots of kit being SSL hardware accelerators. Is it really hardware, or is it just sort of specialist operating systems sold as hardware? Well, what they generally are is standard boxes,

and they'll have a little crypto accelerator. So it'll basically be hardware-accelerated AES. These days, commodity chips have hardware-accelerated AES. So you've got some... Well... You know that it's usually hardware-accelerated AES or something like it. I mean, the main risks in situations like that aren't so much the crypto as the key exchange. So it's the random number generators, generally, that are much scarier. Yes. But I mean, court orders are a much easier way to get the key in general. Much less efficient. Yes. Any other questions? The servers that were listening on 443, the numbers seem quite high. Did you look to see if they were actually talking SSL and even serving

for the domains? I didn't, basically because there are too many. I know that a significant number of the responses, even though 443 was open, were actually plain text messages saying, basically, I can't talk SSL. Those stand out quite well in the database. I also omitted all the ones where I got zero responses. So I was able to throw that data away. You also find some servers talk a mixture. So for example, nginx has a relatively simplistic heuristic to detect whether you are talking TLS. And basically, if that heuristic goes wrong, then it responds in plain text telling you, I talk SSL. So you do get some edge cases like that. What you'd really want to do, I think, in a future version of the fingerprinter is make it so that the

code that detects this basically records that that particular probe is invalid, because that's a useful data point for us anyway, and I think that would actually improve the effectiveness. Yeah? If there's an implementation that's using backported fixes, like a ..., will that produce a false positive? That's an interesting question. The answer is it depends, like most interesting questions. So some fixes, when they're backported, will actually make a change to the signature. So for example, the fix to the early change cipher spec vulnerability changes the behavior of the stack in a way that's detectable remotely. Other changes to the stack might not be detectable remotely. So for example, if they fixed a memory management error or something like that, that doesn't change the behavior on

the wire. So the only way for us to detect that would be to actively try and exploit that vulnerability. Whereas what we're doing here is not actively exploiting any vulnerabilities, just sort of poking it with a stick and seeing what it does. So the answer is sometimes we can detect it, sometimes we can't. There are only two probes in the current probe set where we can detect it, and those are the early change cipher spec one and the Heartbleed one.