
Okay, we start. So what is this? I come from security engineering, I don't do software, I don't care about software, at least not until this moment, and I don't trust anything or anybody. So when they say you have a SSL, TLS and crypto, I say, who? Can you prove it? And so this was beginning. So to do it, what I did is collect everything that has an RSA key, so a crypto key in it, for a couple of years. So why is the study? Everyone says there is encrypted communication, everyone is using crypto for whatever reason, but the amount of crypto stuff is increasing and there is no verification for that. There is no test and there is no assurance for
crypto stuff because we assume the math is correct. But what about implementation and what about how people use it? So what is a crypto key? Nothing magic, two numbers and usually a multiplication. So no math, no advanced stuff. If you can multiply two numbers, you can break crypto. Then what is the base of crypto? Prime numbers. Okay. Now, if we have this, these are the assumptions that are false. The assumption is people that do it know what they're doing, but it's not the case. Software is supposed to know this stuff, so people are supposed to know how to create it or to use it or to implement it. It's not the fault of anybody because over time
the standard changes, the reference changes, the math changes, and also the library changes. So I'm not blaming anybody, just saying the reality is not what it seems. Then another disclaimer: I could have done millions of things, But because I'm based in Europe, I cannot do it for privacy issues. So we are limiting, I limited the analysis only to numbers. I will not point finger, I will not say what is broken or where or who. Nothing related to correlation. So who has this broken certificate? I don't have it. I could have it, but I don't have it. What did that? Ignored it. Okay, so we focus only on the numbers. Now, what I did? For four years I dumped, because again legal problems scanning in
Europe is problematic, so I took the data of somebody else, I set up my own server, and I dumped all the PGP keys generated by everyone. The result is around 20 terabytes of stuff. It's not a lot, because we focus only on the numbers. And then I use Suricata to collect the certificate on the fly that we're passing through networks. Okay, now this is public stuff. So we have to start with the test. We don't want to start with everything. So we dump every PGP key, GPG key used by everyone generated since the beginning of time, at least published online. That is usually measured as how many keys, but actually they are not keys, they are keyset. So PGP
servers set a key and say the oldest one is key one. But inside that key you can have 20, 30, 40, infinite keys. So in the PGP side they say you have 4 million keys, in reality there are almost 9 million keys. Again, theory created using large primes, forget about that. What are we looking for? There are very very basic rules. 1. Every key must be unique, ever. And that means forever. If I use one today, nobody, nothing can use the same key ever again. If anybody uses the same key, both keys are broken, but it's undetectable, because you have to have the information that 10 years ago somebody used the same keys that they generated today. And
it happened, but you cannot detect it unless you have the data. Then you have to use a big number, at least 100 digits. Says who? Says theory. Is it true? We don't know. Then we have to use prime numbers. We are getting we don't know. We don't trust anybody, so we are coming from this assumption. zero trust, we check. Okay, the reality is after crunching there are around 900 keys broken. And I mean anything that touches that key, there is no crypto at all. Because anything can take the key, reverse the key, regenerate the private key and anything touched by that, forget it. It's like plain text. Amount of key diff is change over time,
but that is not important. What is important is that there is a timeline. This is also false because the PGP servers accept whatever you throw at them. There is no check, there is no security verification, it's like a dump where you take the stuff and you throw it inside and the limit is that file format must be correct. Nothing else. So you have keys with duplicated keys, keys with a copy of other keys, keys with a signature of one key, a piece of a key of somebody else, and data that pretends to be somebody else. but you cannot remove it from the site, you cannot remove it from the dump, it cannot be tested, and a lot
of keys you cannot even download it. You see them only if you dump the key set, but if you try to get the key with pgp or with any command line, it will throw you an error. But the key is there. So the point is, so far in 9 million keys, this is a baseline. The assumption is wrong. Keys are not properly generated, all it's needed here is prime number, it is nothing magic, Prime number verification is not there. And also key duplication is not there as a test. These are some statistics. What happens in reality is that you don't need magic. You just, for example, test if a number is divisible by three or divisible by two. It's not really rocket science, but
it's not been tested. Therefore, it's there. Anything that touches this, again, broken. Crypto is useless. Doesn't matter if the key is 16,000 bits, because if the assumptions are not properly interpreted, it's completely useless. You can generate as many numbers as you want, but still it's broken. The problem here is if anybody has the same data set and everybody has the same because it's public, they can take the key, reverse the key, generate the private key and start to sign things on behalf of whoever has the key. So all better off, you have no idea what's happened. Then, okay, now we start, we finish to play with this, now we go bigger. So I take half a billion certificates, so 500 million sets,
take half of it, deduplicate all the stuff, we remain with a bunch of stuff, okay? So, again, what you can put in a certificate, actually whatever you want. You can have... Unicode, you can have a printable character, you can have binary files inside it, you can have links to another certificate, you can put whatever you want. Why? Because nobody's checking. You can generate a certificate, you can get good certificates, but if somebody generates a bad certificate, they can't. Once the key is generated, again, important, once that key is generated, nothing can generate the same key. And again, A key is a product of two numbers. So if anything on the planet by chance uses one of the two numbers, then also that key is also broken. Okay? Because
one key is created by two numbers, if a second system generates another number that shares one of the numbers, both keys are broken. So there is an infinite possibility of how many keys you can have broken. You cannot test it. Why? Because you need all the keys. You cannot verify it because again you need all the keys over time. If you have it, you don't need to do anything. There is no hacking, no cracking, no magic. You take the key and divide them together. If you find a divisor and you have the log of the traffic, you can decode both traffic at 0.4 because you have the key and actually you can regenerate the secret
certificate. Again, at no cost. So I put a tool here that we have a workshop, it's OSAFT. So far, as far as I know, it's the only one that if you receive rubbish from servers, it allows you to decode whatever you receive, because it's not based on libraries. It has an internal decoder that is parsing the binary that it receives. If you're using Linux or Windows, don't trust the operating system library because they lie. And they lie, why? Because between versions you have changes. So for example, in the new system they say OpenSSL removes the option to generate bad keys. But on the reverse, you cannot test bad keys. Finished. Then what you do? You have to
recall the library, but then you have to remove all the options. And then they say, "Okay, we have a set of 200 object identifiers to say what is in the certificate." Because the certificate has pieces. I have a list of more than 10,000 object IDs. And I have no idea what the object ID is because anybody can put whatever object ID they want. They are not obliged to disclose what is in the certificate. So secret encryption, secret stuff, whatever. They are in a legal format, but we don't know what is inside of them. So after all the duplication, we remain with around 75 million certificates. Something very simple as a test. There is really no math here, no coding. It's just data analysis.
at the average level, okay? Because it's 20 terabytes. Okay, public key, what is public key? Again a multiplication of a product of two numbers. That number has to be unique, ever. Then what is the modulus? Public key, for we are talking about here RSA crypto, is modulus plus exponent. What is the modulus? The modulus is a multiplication of two numbers. The public key is a multiplication of two numbers and something else that we call exponent, but at the moment we don't care about that. So, is the module unique? Is the key unique? Do we have duplicate numbers? And then is the exponent valid? We are talking only about numbers. Okay, first test: public key of my test, 38% are duplicate. Why are they duplicate? A
lot of reasons: certificates cost money, therefore you can rekey your certificate. You generate a new one, use the old key. So you have a different certificate, but with the old key. Or you have duplication in the collection. There is a very, very big bias here. The data that I collected is coming from scans.io mainly. The other part is from my staff, but the part on scans.io, they scan the IP space. So if you have a server with one million servers, sorry, one million services, you get one certificate, but you can have a million websites on that server. Okay, so here you have a lot of duplication of system or server asserts. We don't have the clear picture, and this
is only a sample, it's not all the stuff that I have. Now take this number in mind: 38% should be zero, but we forgot that re-keying is possible, key porting is possible. So if I take a key in GPG, why should I have to regenerate my key? So I create a GPG key, I take the RSA key and I put it in SSH, VPN, SSL. Can I do it? Yes. Is something preventing me? No. So it's one key, five services. Okay, now this module is unique because we can rekey. Now we can see that we have 5% of certificates that share the same modules. This again, in theory, should be zero. But if you regenerate a key and you change the exponent because whatever reason, again you can
have duplication. So this is not a disaster, but it can happen. Should be much smaller as a percentage. Okay, now an example. We have a key, we have to multiply the key. We have two numbers to multiply. As I say, key one, we have two numbers, key two, we have two numbers. If the shared one number is n. What is happening? Because I have a lot of data, first I tried to use Python for this, it didn't work. Why? Mathematical libraries problem, they crash, or you try to check the numbers and you have only integers and they give you a negative number. Or you have only integers and you get a floating-point number. So I say, okay, forget it, take something that has been
created by scientists, So I took the software created by another group of scientists and I ran my stuff with their software, resulting in problems. More than 100,000 certificates are broken. And I mean useless. No, worse than useless. You think they're okay, but they're broken and you cannot test it. Because to test it, you need all the other certificates. So you have no idea if they're broken or not. What is important is not how many certificates are broken, but how many are impacted by the broken moduli. The modulus, the broken multiplication means something across the 75 million certificate is using the same numbers. The reality is this number 700,000 certificate impacted. It's close to a million in a sample data set. And
if we see the numbers again, key size doesn't matter. The problem is that the 700,000 certificates are close to 1% of the amount of certificates in the whole internet. This should be zero, I mean literally zero, because we have a key collision. Then we have small primes. Tester, can you divide it by 2, 3, 5? Should be zero. We have a lot of them. Then again, if we check them together, These are maybe RSA broken implementation. Maybe it's chance. We don't know. Why? Because it's very difficult to check this stuff and you have no reference for this. Again, on theory, we are using RSA. Okay, perfect. I put exponent 1. And then what happened? RSA crypto works. Math works. No alarm, no error, nothing. The
problem is with exponent 1, the crypto is equal to the plaintext. So you are actually applying RSI crypto, but there is no crypto. It's allowed, it's possible, it's permitted and it's used. And so what you do? This is a very good example of somebody that did it and fixed it very quickly. This is SaltStack. For years they have been all their crypto with 8.1. Why? Because it's super fast. And what they say: "We are not crypto, okay? We hired somebody from crypto and told us: change it!" And they change it on the spot. So, perfect answer. Problem is: this is only one. And all the others that are doing it don't say anything, because as we see in
the previous slide, GPG, PGP-SAT, they also have keys with one. So as a summary, it's close to a million keys and this is only a sample. I'm not testing all my stuff. I'm not testing all the searches that are online. I'm not testing the whole set. Again, it's a sample. It's compatible with other research. 1% is more or less what everyone finds broken. The problem is that my 1% is 700,000 something. Their 1% is 70,000 or 100,000. The difference is that I use 75 million instead of 10 million, but the percentage more or less is the same. Everyone will find the same, but there is nobody to blame. You see this only if you have long-term on collection. And you cannot say something is broken because
actually there is nothing broken. It's math, you have to check it. If you want more information, these are two super good, incredibly well done research by real scientists. And no, we shouldn't panic. But now what we have as a problem? And now I'm really running. The problem is we don't need any math at all to check if the key is broken. Why? Some test: if we have random keys, we should have uniform distribution. Says who? I don't care. I took whatever I want. Okay? Then I multiply two numbers and who cares about the number? I just have to multiply two stuff together. Okay, this is in theory. In theory I should have something like a
triangle. And this means everything is random. If I take the real keys and I take only a sample of an example of keys, this is what I have: real data. Now, what is the stuff at the beginning? There is an answer for that. That is real stuff. That is the product of an optimization. And what happened, if you see there are three movements in the interval. They go up and then they go down. Because we have big numbers, we have more prime numbers at the beginning than we have prime numbers at the end. Therefore it will go in one way and go down another way. Nobody checked that, because they check only numbers, multiplication numbers. Therefore the distributions will be triangular. The reality is the weight
of this side is 38%. This 12% of key lost and everything else goes in different format. Then if I try to use OpenSSL, PGP, GPG, whatever, this is a key that I have. They are designed to generate the key with a gap because it's an optimization. And the optimization is if I want something that works fast, I put first and second bit to one, last bit to one and multiply the number. And they're prime, most likely. Then it's faster. Yes, because you cut a 12% of the key. Super good explanation. A teacher from university. You can find all the details you want there. It's a lecture. It's free. I contacted the guy. The guy is
super nice guy and you have all the information there. Another part is, and again, you have all the slides online. Another part is why we even care about the stuff? We don't care. Take the keys that you receive and plot them. Do they start at the beginning or at the end or something? Everything is squashed to the end. That is a beginning of distribution. There are more than 7,000 keys that start all at the end. We don't care about anything else. That implementation, that system is broken. But it's not tested, it's not reported, because all the keys are valid. It's just a number. Conclusion for somebody who wants to read it? We have no idea. Why? Because it's fault of nobody. These are my contacts, by the way.
Nobody's at fault here. You cannot point the finger to anybody. It's not a fault of the CA if they generate a number and by chance in 10 years time somebody generates the same number. You don't want the same archive in the hand of a government. Why? Because if they archive, they literally instantly say "Oh, there is a connection broken. Let's decode everything." Okay? And if your key is broken, what do you do? "My key is broken?" Yes. How can you know it? You can't. But somebody else starts to sign your stuff or encrypt your stuff with your key. So have you been you? Have you been the guy encrypting stuff or not? You cannot prove it, because crypto should be the one proving it. So there is no
way. And the other part is where do you put your stuff? I have no idea. I have terabytes of this stuff if anybody has any idea, come to speak with me. And another part is also the mathematic part is broken, because I found the hard way there are no test libraries. I have no idea how to test the math of the program. I tried in Python, again, super good for some stuff, but I reached a limit where if I start to use a key, I got negative number with primes of key. It's impossible. Sometimes you get overflow, sometimes you get an error, sometimes the key is okay, but it's off by one number. So it's another key actually, but there is no alarm raised. So that's
it.