
BSIDESLV 2018 - Ground1234! - Day Two

BSides Las Vegas · 9:16:36 · 666 views · Published 2018-08


You're awesome. I promise this cannot possibly live up to all of the shit that has just gone down, but thank you all for sticking this out anyway. It's 10:10. Let's do this. Thank you all for coming. So: using lockpicking to teach authentication concepts, or, as the terrible pun I put earlier, "picks or it didn't happen." Yay. I'm Kat, by the way. I am the leading lady of lockpicking here at BSides, along with Wendy Knox Everett. So if you're curious about learning more, we've got a nice Lockpick Village in the chat room. Y'all should come by later. Sweet Kat on Twitter, feel free to tweet me. Pronouns are she/her. I'm cis, but I like to try and normalize the whole not assuming pronouns as much as

possible. So let's get to it. So what are we trying to solve for here? So... Security can be a very difficult thing to communicate to those who aren't familiar with it. So when we're teaching security concepts, a lot of what we're presenting to people who might not be as familiar with security is very abstract and very difficult to convey. This whole idea of "think like a hacker" - what does that even mean if you don't - if you're not already in that mindset of "think like a hacker"? So how do you get people into that mindset? So we need to be thinking about effective novel approaches. So there's been some really good work on this

both in, like, the security education community and just other security practitioners. There's The Analogies Project. Jesse Irwin gave a really good talk at ShmooCon a few years ago called "Speak Security and Enter" about using certain more universal analogies for teaching the security mindset, teaching security concepts. But not too many of these have gone on into a hands-on kind of thing. A lot of times we're still dealing in abstractions. So how do we solve for that? Big academic words: embodied cognition. I should just preface the fact that I am not an academic nor am I a psychologist, so this is very armchair. But embodied cognition basically is the educational and psychological theory that cognition is influenced by

more than the brain itself; it's also influenced by the entire body and the surrounding environment. So one application of this in educational theory involves using concrete objects and concrete metaphors to teach these abstract concepts. So that was kind of the idea behind this. It literally came out of late-night conversations with a friend of mine a couple years ago, Caroline Hardin, who was a PhD student in computer science education. She built a curriculum that she actually taught at r00tz a few years ago on teaching open source software to kids, using a metaphor of Legos: some of them were glued together, the proprietary ones, and some of them you could take apart, the open source ones. And, like, open source licensing is not an easy thing for

eight-year-olds to wrap their brain around, but they got it. And so I was thinking, huh, what other security, how can we apply this to security? So that was kind of the thought behind all of this. Turning the abstract into the concrete. So that's kind of the focus of this. So why lockpicking? A few reasons. One, it has a very low barrier to entry. I have taught kids how to lockpick. There are a lot of kids who came by the lockpick village yesterday. Well, I guess a lot by the standards of a security conference in Vegas. But it's something that people can generally pick up pretty easily if they have decent manual dexterity or even crappy

manual dexterity if they have use of their hands. Most people can pick up the lockpicking skills in a pretty short period of time, especially when we're talking about one-pin locks. It also has a very low threshold of gratification. Lockpicking is sexy. People want to learn how to pick locks. It's a thing that people enjoy saying they know how to do. And so it therefore incentivizes people's interest in learning more about the security concepts that it represents. If you are doing a thing that is fun, that's going to be more engaging than something that you think is just really, really dry. It also has possibilities of think like an attacker and think like a defender, because

locks are a barrier. They're a security control. And so you can think of that as a thing guarding something that you want to protect, as well as something that you're trying to break into. So think like an attacker is useful, but you also have to think like a defender. And one that I didn't put on the slide too is a lot of security professionals have experience with lockpicking and so it's an easy enough thing to teach people rather than trying to learn something new yourself and then go and spread that to an audience. So I don't want to make any assumptions about the audience's lockpicking levels, so I just want to give you a high-level overview of the basic theory of lockpicking, the super quick and dirty version. If

you want a slightly less quick and dirty version, the Lockpick Village has beginner sessions that they'll be holding at 11:30 and 2:30 today, and we're also just happy to teach lockpicking if you stop by, and there's the Wolf Contest at 4 p.m. I promise this talk isn't just a shameless plug for the Lockpick Village; it's more of a "if you want to learn more, go here." So basically, you have a lock. This is a sort of sideways view of the guts of it. And there are these stacks of pins: driver pins and key pins. The key pins are all cut to different sizes, which is why keys look jagged. And the driver

pins are all the same size. So when you have a key, you're pushing all the pins up at once and getting them all above the shear line so that the cylinder turns. With lock picks, you're doing them one at a time. And it's a lot of trial and error. You're going in blind. There are those clear practice locks, but for the most part, you can't see what you're doing. So it's just kind of figuring it out. And so... yeah, lockpicking basics. So how do you map that to authentication? So, for the Passwords track: you probably know the basics of authentication, the process of verifying who you are. God, it is really hard to come down from the nerves of having an AV

failure. That's okay, we'll make it work. So: basically verifying who you say you are by something you have, like a hardware token or something like that; something you know, like a password, or security questions (something everybody knows). So locks can be a form of physical security. You can have a key, something you have, but you can also think of it as an analogy for a password, something you know. And so you can take that idea of "this is a thing that I have that somebody else might not have" and use that to apply it broadly. So, evasion is the word I was missing on my slide, missed in Hacker Pyramid last night. But when I teach lockpicking workshops, I often start with: why would you

pick a lock when you could just break the door down or blow it up or bump the key? So a lot of it comes back to this. You don't want to leave evidence. Why would you want to leave behind what you're doing? And so this is something that is obviously very broad in security. Attackers don't just have a goal of gaining access by any means necessary. The more sophisticated ones aim to get in without leaving any evidence of having gained access and leave a back door so they can get back in. Hey, back door, locks are on back doors. Terrible puns. So if you're doing some kind of brute forcing, that's going to set off alerts and show up easily in logs. And so it's about going in stealth

mode. And so...
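That earlier point, that brute forcing "shows up easily in logs," can be sketched in a few lines. This is a toy illustration only: the log format and the five-failure alert threshold are made up for the example, not from the talk.

```python
from collections import Counter

# Toy auth log as (username, success) pairs; the format is hypothetical
log = [
    ("alice", False), ("alice", False), ("alice", False),
    ("alice", False), ("alice", False), ("bob", True),
]

# Count failed attempts per user; a burst of failures is an easy alert,
# which is exactly why a noisy brute-force attack leaves evidence behind
failures = Counter(user for user, ok in log if not ok)
flagged = sorted(user for user, n in failures.items() if n >= 5)
print(flagged)  # -> ['alice']
```

Real detection would of course window by time and source address, but even this naive counter shows why an attacker who cares about stealth avoids online guessing.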

By the way, I also forgot to say this at the beginning, because we had some fires we were putting out, but I'm happy to take questions at any time. I will not take five-paragraph-long statements, but if you have questions, feel free to ask them. I'll either hold the mic to you or just repeat your question for the sake of the recording, or for people who can't hear very well. So feel free. So yeah, basically: leave no trace. So we also want to talk about uniqueness. One of the most common problems of security hygiene that we come across is password reuse. And so how do you really drill home the importance of not reusing a password? Well, you think about it: even though keys might not be

like, even though locks might not have 50 pins and be super, super complicated, we don't use the same key for everything that we lock. In the event that one gets compromised, in the event that somebody breaks into your house, you don't have to replace every single key you own. And so similarly, in the event of a breach, an attacker could use a compromised password to access any account that was associated with it. And so we want to drill home that it's easier to clean up after a compromise. Complexity is another one. There are more pick resistant locks that exist, but they're more costly and difficult to manufacture, and so they often lock more critical assets. And so there's these different levels of difficulty with picking locks. And so

while we don't encourage weak password use, the fact of the matter is some people are going to use weaker passwords, and so if people need to balance what they use their strongest passwords for, we want to make sure they're focusing on accounts with the most sensitive information. A friend of mine put it really well. She said: never spend more on a defensive solution than the thing you're protecting is worth. Yes, risk. So when I taught security workshops, I often went a little bit into data classification. What is the thing you're trying to protect? How sensitive is it? How risky is it to the business and to the person if it gets leaked? And so we can think of the different levels of complexity of locks and the ability

to pick them and the resistance to picking as the strength of passwords based on the sensitivity of the information you're trying to protect. And not even just the strength of passwords, the strength of the various security controls you're putting around them. Also, defense in depth. One of the things I'm always trying to rail against is that there aren't any silver bullets, and when people learn how to pick locks, sometimes their first thought is: oh my god, I'm gonna go cry in the fetal position, because nothing I own is safe; if I can pick this lock, everything is fucked. But, again (and sometimes security people are the same way, especially pen testers, they're like, I got root in five seconds, oh my God, this entire organization could

be completely pwned), again, security is all about balances and tradeoffs. And so... it's still an extra layer of resistance, and that's what we try to emphasize: you still lock your cars, you still lock the doors to your house. And so there's no single silver bullet. There is a need for a layered approach. And so this is where we can talk about defense in depth. And we can talk about: maybe you put a lock on your door, maybe you've got a home alarm system. Authentication and access control will never be perfect on its own. We're trying to kill the password, but even then, like... there's always going to be stuff. So we want to try to build around things to minimize the attack surface in case

of bypass. Things like password uniqueness, other technical and physical controls, all of that. So basically it's a matter of saying: yeah, maybe everything is fucked, but we can still put layers around our various assets. So, some limitations to this idea of using lockpicking to teach these concepts of passwords and authentication and security. Lockpick laws vary by state. TOOOL has a list of lockpick laws by state on their website, and I think they have a few other countries as well. But there are some places where we might not be able to do this, especially other countries with more stringent lockpick laws. Like, I know in Tennessee, just having picks on you is considered intent.

Most of the other states it's not, but you never know. There can also be social consequences and legal consequences. Even in places where lockpicking is legal, certain populations, especially certain demographics, may face consequences if they're carrying picks around or if they learn how to pick. Especially, like, I don't know, I'll just say it: racism is a thing, and certain people are more likely to face consequences if they get stopped and they have picks on them than others, and it's shitty. Also, analogies in general can be a really useful teaching tool, but they can also fall apart if we don't draw solid connections to the concepts that we're trying to convey with

them. Or if we take an analogy a little further than it's applicable, like some people keep going down a rabbit hole and saying, how does this lock thing map to this security thing? And that's not always going to be possible. So it's important that we can sort of set boundaries around some of that, and use this hands-on thing to draw an analogy, but know that it's not going to be a catch-all. So, God, I really blew through this because I was afraid I was going to be strapped for time, because I started 10 minutes early. And, yeah. So... yeah, so what comes next? I had to throw in a Hamilton reference. Lockpicking to teach authentication is just one of many possibilities.

This hands-on approach can be applied broadly, and I encourage you to think about what you want to convey to people about security and how you can draw that to something that's hands-on, and not just them sitting in a lecture, because they're not going to soak up as much information that way as when they're actually playing with stuff, or at least interacting. As security professionals, we deal largely in these really abstract concepts, and just trying to repeat them over and over to somebody who is trying to learn is not effective. So if we can teach the importance of digital security by drawing these physical conclusions, these hands-on analogies, we become more effective at our

jobs and our users become safer as a result. So I really rushed through this talk, but if you have any questions, we've got a lot of time, so I will take any of them. Yes? Yeah, here.

You brought up a good point about the carrying the lockpicks around. Like, I was wondering if I bought a kit, bringing it back on a plane to where I'm, you know, in Santa Monica, how that would go. Yep, it's on. You've just got to kind of, like, hug the mic, practically eat it. So the question was, picks on a plane, basically. I'm not sick of these motherfucking picks on this motherfucking plane. If you're traveling domestically, you should be fine. Tools are allowed if they're under six inches. I've never had a problem with TSA stopping me for lockpicks. I haven't traveled with them to another country before, but it depends on the country's lockpick laws. As

far as locks, if you've got a buttload of locks in your suitcase, TSA might be a little freaked out because you've got this giant pile of metal in your suitcase, but you should be fine. Anyone else?

Yeah. Got you next. I was just thinking about his question on planes. I've only been stopped one time. I carry two lockpick kits with me in my wallet, and I've only been stopped one time, and the TSA agent just gave me a look. What are these picks for? It's the job. Yeah, exactly. Normally I don't have a problem carrying lockpicks themselves, but a few times I've had those little lockpick sets that are like a small pocket knife, and those have been challenged by TSA. So I'm not sure why they really focus on it. Pocket knife. They don't like those. All right, Urban. Hey. So I don't have, like, a great well-formed question, but I'm

really curious about sort of the embodied cognition aspect of this and just, like, if you've seen specific cases of people actually understanding password cracking or password manipulation better as a result of manipulating locks physically? Yeah, I mean, it's hard to draw causation from correlation, but I have seen that when I've talked about password cracking while doing it, then they do tend to get it. They're like, yeah. So, yeah. Yeah, I had my own experience of that. So I teach basic crypto using physical locks, symmetric and asymmetric, with different keys and bits and pieces, and I found when I teach, it's exactly the same setup, that the

physical hands-on really works. And you've got lockpicking teaching passwords; have you got anything, or have you tried any other physical analogy that works? Not yet, but I'm always trying to think about that. Security education is a big part of my role at work, and so I'm always trying to think of how we can incorporate anything that's interactive or hands-on, especially since they've found that even just doing something with your hands, even if it's completely orthogonal to what you're teaching, can be useful. Like, I don't know, like playing with a fidget toy or something while you're listening. And, like, in college I would knit during lectures, and they found that just doing something with your hands

can help with retention of information. So I'm always trying to keep that in mind. But if you have ideas, please, by all means, implement them, share them widely. Yeah. I'm sorry. Question for the gentleman who was just speaking. Do you have an actual asymmetric physical lock set that you can use to describe PKI? I've been looking for something like that. So the thing that I've seen used in videos for talking about PKI is actually a lock box where you can unlock the top with one and unlock the bottom with another. But I don't know if you've got anything to add? Very similar? Or paper and envelopes. Nice. Cool.

Thank you for your presentation. It's given me some ideas that I can take back when I do some general user awareness training. Question about the analogy for, do you have an analogy perhaps for multi-factor authentication, maybe the home alarm with a code that's like randomized, maybe something I can use? Yeah, this is kind of a bread and butter that I need to think about a lot more, but I work for Duo Security, so 2FA is one of our things. Oftentimes, the thing that I talk about is ATMs. You have a card, something you have, and then you have a pin, something you know. But there's got to be a lot of things out there for MFA. Cool. Anyone else?

All right. We finished super early. So if you have follow-up questions that you prefer to not ask in a crowd or you want to catch me afterwards, I'll be around for a little while. And then you can probably find me in the Lockpick Village for a bunch of the day. Again, I'm sorry for the AV issues. And I'm sorry that I then rushed through this talk. But we got through it. Thank you all for coming.


Hello everybody. Shit, I completely forgot your password's name, or your talk's name. How I Met Your Password. This is How I Met Your Password. Greetings everyone. We would like to thank our sponsors: Rapid7, Amazon, Oath, and Simmel. And one other one. We'd like to thank you guys. Well... we'd like to also ask you to silence your cell phones if you haven't. Good? Alright. This is Dimitri with Bitcrack. Thanks. Okay. Hello everybody. It's pretty quiet in here. So, my talk today is How I Met Your Password, basically covering a few things around password cracking, primarily from the offensive side of things: why it's sometimes becoming more difficult, mistakes that we can be making... sorry, before I continue, can everyone hear me

working around? Okay, thanks. And also some attacks; some of them are not new, we just haven't seen them for a while, but they're still very good. They can work very well, especially if you're new to cracking passwords; there are a few things that have evolved since they were first made. Then also what to do for the slow algorithms, the ones that are more difficult to crack. And then finally some new information about the WPA2 attacks that I'm sure you recently heard of, released by atom of hashcat. So we'll have a look at that as well, and how it's going to help us a lot with cracking those Wi-Fi passwords, of course. Everything shown here is for you to use in your

work. I assume no one would ever want to use it for anything bad; that thought would never cross your minds, so nothing mentioned here should be used for that. The key is to try and make things stronger and more secure by finding what's wrong with them and trying to fix those holes. That's where these approaches are coming from. I will hold the questions, because it's going to be pretty close to lunchtime when I finish. So what I'm going to do is, if you have any questions when I'm finished, then you can come up and we can discuss it. Because I don't want to keep those of you who don't have any questions

and may be hungry and want to get to lunch. Just having a quick look at the outline, then. "Why you're failing": having a brief look at why it's getting more difficult to crack passwords. "Leaks: best friend or worst foe": so there are quite a few leaks out there, and we're going to have a brief discussion on whether or not they're actually working for us or against us. As somebody who does password audits, does it make your life easier, does it make it more difficult? Some people may feel that with leaks becoming somewhat apparent, it's harder for them to crack passwords, because companies

are securing pretty well against the ones out there. We'll have a look at that as well. Then some attack methods: Markov, TMs, and PRINCE. Some of these, those of you who have been cracking passwords for a while may recognize; they're still very much capable of increasing the amount of passwords that you can crack if you use them properly. Attacking phrases in other languages: so we often just seem to default to cracking passwords in English, but if you come from different countries, different places, there are different kinds of passwords you can crack. Emoji may end up in a hash somewhere and you want to be able to crack that, so we'll have a look at that

as well. Cracking heavy-duty algorithms, and the new WPA2 attacks without client interaction. And then questions, finally. Okay, so why are you failing? So what's been happening a lot recently, I'm sure you've seen it out there in the news, is that, well, we're seeing two things increasing a lot. We're seeing real-time password blacklists finding their way into applications, into third-party tools, and major leak collections like Have I Been Pwned releasing the SHA-1 hashes that companies can then query in real time and say, well, the password you're trying to set is not secure, try and use something else. Then the user education side of things. So there's always an increase in the amount of education that's being given to users every year. Unfortunately, passwords are still relatively insecure,

and I think they're always going to be. But the level does increase as time goes by. As a new generation starts using the internet, they're a bit more attuned to those attacks that can happen. They're a bit more aware of what insecure passwords can be, and so they up the level a bit more as they start using systems out there. And then of course, better hashing algorithms have also come to the fore, and so that also makes it more difficult to crack passwords sometimes as well. Okay: salts, increased use of salts. So from a developer point of view, a lot more salting of passwords is being done, and stronger and longer salts as well. Unfortunately,

sometimes there's salts being used when hashes are created, but they're so small and so weak that they may as well not be there at all. On the other hand, you have some people who are building very robust, very good applications that are salting those hashes very well, making them very difficult for somebody to be able to crack.
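The salting point can be shown with a minimal sketch. This is illustrative only: `sha256(salt + password)` stands in for a real construction here, and a production system should use a dedicated, deliberately slow password-hashing algorithm instead.

```python
import hashlib
import os

def salted_hash(password: bytes, salt: bytes) -> bytes:
    # A per-user random salt means two users with the same password
    # still store different digests, so precomputed tables and
    # crack-once-reuse-everywhere shortcuts stop working
    return hashlib.sha256(salt + password).digest()

salt_a, salt_b = os.urandom(16), os.urandom(16)
same_password = b"hunter2"
print(salted_hash(same_password, salt_a) != salted_hash(same_password, salt_b))  # -> True
```

This is also why a tiny or constant salt "may as well not be there": it no longer forces the attacker to crack each hash separately.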

And then increased computing power, resulting in better uptake of slower or intensive hashes. So certain hashing algorithms were once considered too difficult to use because of the fact that they would have been too slow; the processing power to be able to do multiple transactions or multiple events in hashing a password was an issue in the past. It's becoming less of an issue now. Of course, the more difficult algorithms that have been introduced may still also be causing systems to slow down. But the fact is that processing power is getting faster and faster and faster, and so more advanced hashing algorithms can be used. Just to note at the bottom there, of course, that

the inverse is true. So the more processing power that we have to use stronger algorithms, we have also stronger processing power to crack passwords as well. So the two tend to rise together, but the fact is that we are seeing a lot more use of better hashes or better hashing algorithms out there than we did five years ago, maybe even four years ago or less.
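A rough sketch of that trade-off, using PBKDF2's iteration count as the tunable work factor. The iteration counts and inputs below are arbitrary examples, not recommendations.

```python
import hashlib
import time

def hash_pw(password: bytes, salt: bytes, iterations: int) -> bytes:
    # PBKDF2-HMAC-SHA256: raising the iteration count makes every
    # password guess proportionally more expensive for the attacker,
    # at the cost of the same extra work for the legitimate server
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations)

for iterations in (1_000, 100_000):
    start = time.perf_counter()
    hash_pw(b"hunter2", b"demo-salt", iterations)
    print(f"{iterations:>7} iterations: {time.perf_counter() - start:.4f}s")
```

The same knob cuts both ways, as the talk notes: faster hardware lets defenders afford more iterations, and lets attackers try more guesses per second.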

People also moving away from the easy hashes, mostly moving away from the easy hashes. Unfortunately, we still see some people trying to do authentication with MD5 in databases or SHA-1. If you're doing that, then I would highly recommend you, well, don't change it now, because if you get up and go, we'll know. But... As soon as possible you need to get over hashes like that. They're pretty trivial to crack and even if they're salted they're still pretty trivial to crack these days as well. So MD5 basically belongs in the history books or maybe for the odd check if a file that you've got matches a different file that you've got. That's about as far as I would use it these days when it comes to any

form of authentication. you want mp5 out the window fast and you want char1 out the window fast as well okay uh leaks so leaks are creating awareness for uh better password security um basically every time they go in the news people see these this terrible leak has happened and the passwords that came out of it were horrible also related to our previous slide because often those leaks are easy to crack And just one point I'd like to make here too, also, is that there's no point moving to a stronger algorithm. You can go all the way up to bcrypt if you want to. And then your security and the rest of your system and application infrastructure is so bad that it's pretty easy to get

hold of the hashes and start working on them. So you need to look at both sides. Because I've seen, for example, on a pen test, you can get hold of hashes pretty quickly in an application. So yes, the box was ticked to say we're not using MD5 or SHA-1 or anything that's insecure like that, or fast to crack like that. But the downside is that they weren't kept securely. The application had no way of protecting them. Everything else that should have holistically protected those hashes was not there. Back to leaks. They can be queried to prevent password reuse. I'm sure many of you have seen those lists going out there that you can query. There are

APIs. You can download the SHA-1s and other forms of the hashes. And what this is doing is allowing websites and applications to actually say to the user: well, the password that you've entered, please don't choose this one, it's been leaked somewhere. Now, that's very good, but it's not very effective if you haven't switched to a stronger hashing algorithm and it's possible to attack the application and get the hashes anyway, because otherwise you're just telling users to pick a new password when the rest of the system is insecure as well. So: leaks, best friend or worst foe. So leaks give us good word lists for cracking, right? Because they're real passwords. They're not

ones we've guessed. Maybe someone's chosen it, maybe they didn't choose it. It's real world passwords that people have used. We can get statistics from those. We can look up and do all sorts of fancy analysis and see, well, for this particular site, these are what the users are doing. When it comes to HR sites, this is what they're doing. When it comes to social sites or the dating sites, this is how users are choosing their passwords, common words used here and there. And then we can also block or ban these leaked passwords, as I mentioned, using the various APIs. So what is the downside? Well, the downside is that many of our word lists, when we're cracking passwords, now become negated. Because if you've got a few billion passwords

that users have chosen, chances are all those nice word lists you've got worth 10 or 15 gigs, at least 80% or more are going to be in that list. So it's kind of bringing down what you used to have to work with when it comes to cracking passwords with your word lists. Then the user experience in entry. So I don't know if any of you have used any sites that are using these ban lists to block certain passwords that you can use. Okay, it can be a bit frustrating, because you're typing and it's saying no, you're trying something else and it's saying no. Okay, those leaks are pretty big, so the chances of getting

or choosing a password that's gonna be in that list is pretty good. So what you have is two things happening. Either the usability of the site is affected, because users are like, well, I just don't know what to choose (keeping in mind that users don't sit all day planning the strongest password that they're going to pick), so it becomes frustrating for them. And then the other problem that happens with that is that some sites have a limit. So by the fifth or sixth time that it doesn't accept something, it'll give in and say, okay, because we don't want to frustrate you anymore, we'll accept what you've got,

but you need to work on it or change it at some other time. So now what's happened is a weaker password has been allowed into the system again. It also does not re-educate the user. So telling a user that password123 should not be used because it's in a leak doesn't provide them with any kind of education on how to choose a stronger password. It just tells them that, well, you can't use this one, try again. So they try password456 and they get the same thing. They try password789. By the time they're up to 10,000 or so, they still haven't been taught anything about how to construct a secure password. Yes, these leaks

have pros that they're making them useful on both sides for those cracking passwords, for those who need to have secure passwords on their sites. At the same time, though, there's certain things that need to be addressed. Simply querying those APIs and blocking users is not sufficient to get this thing to work. It's going to become frustrating. Business is going to ultimately force you to put some kind of a compromise in there. And then of course, the fact is that, as I'm sure some of you know the case of those SHA-1 lists that are out there, I think 99, was it 99.5% are cracked? I think more by now. 99 point something high. So already people

wanting to crack passwords will have those in their dictionaries. They'll be able to work with them. We'll touch on that a bit later on. So those are not going to be affecting them. They're not going to be trying them as false positives when they're looking to create a word list to attack hashes. And then finally, the user education is a problem as well. So to make password audits effective, then your word lists need to be efficient. And I think that's a given. But even if you've got the processing power, you don't want to just throw word lists that are wasting time. So if you know that a particular site or system is already using or

querying a certain number of leaks to check if the password's in there, get those out of the word list you're using, because otherwise you're just trying candidates you know are not going to be in there. One way to do that is the rli tool from hashcat-utils. Very simple: you give it a word list, then you give it a second word list, and it gives you the difference between the two — what in your word list was not in the other one. In this case, what in my word list wasn't in the pwned passwords. What you may also want to do there is

remove special characters and drop the case down, because you're going to use rules to attack the passwords later on anyway, so there's no point comparing those at this stage. What you want is a nice, streamlined word list that doesn't contain the leaks you know that particular target was blocking users from choosing when you want to audit it. Understand how users will bypass these pwned-password checks, and then try to make your password attacks — your rules and your hybrid attacks — follow the same patterns the users are using. For example, in
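As a sketch of what rli does, the same filtering can be written as a set difference. This is just an illustration of the idea, not the real hashcat-utils binary, and the sample lists are made up:

```python
# Minimal stand-in for hashcat-utils' rli: keep only the candidates in our
# word list that are NOT in the leaked/blocked list, since the target will
# reject (and therefore never contain) those anyway.
def rli(wordlist, blocked):
    blocked_set = set(blocked)
    return [w for w in wordlist if w not in blocked_set]

mine = ["June2019", "password123", "correcthorse"]
pwned = ["password123", "letmein"]
print(rli(mine, pwned))  # ['June2019', 'correcthorse']
```

The real rli does the same thing over files, which is what you want for multi-gigabyte lists.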

this example: June2018 and July2018. The user may have tried to set those as a password, and the site says no, it's in a leak, you can't use it; no, it's been leaked, you can't use it. So what is he going to do? He's going to try June2019, and maybe that one is not in the list. I'm not saying it won't be, but if it's not in the list now, the site will accept it — and is that any more secure than the other two? There are very simple rules and hybrid attacks in both John and Hashcat that would give you June2019 as output from just the one base candidate, June.

Think about how users are going to think and how they're going to work around this, because ultimately that's what users are pretty good at. As I was saying earlier, some are just going to get frustrated and go somewhere else, but those who stick it out will find a password that isn't in that leaked list, and then they will continue. That doesn't necessarily mean it's a secure password; it just means it may not be in a word list that is floating around on the internet somewhere, or one that has been leaked. Okay, common password attacks. So we're not going to go into detail on the very common

stuff; I assume most of you know that. Using a word list: very easy, you run your particular tool of choice and give it a dictionary. Brute force — well, that's not a dictionary, but trying different sets of characters. Built-in attack types, like the hybrid attacks and the combinator attacks: these basically combine word lists with rules, word lists with brute force, or word lists with each other. They've been around for a very long time and they still work very well, so we're not going to go into detail on them. And then rules and rule files — these are still out there and still very much in use. But let's have a look at some of the other ones that aren't as simple as

these ones. Some of them have been around, as I said, for a while, and some of them are quite good. We're going to look at Markov, Tmesis, and PRINCE. The layout I'm going to use: the blue box tells you what it does, the green box gives an example of the output, and the yellow box is basically just the commands you can use. Now, the Markov attack — a few years ago you needed quite a few tools to get it to work, or to plug its output into your hash cracking. These days it's built into most of the

hash crackers; for example, Hashcat automatically does a Markov attack when you're brute forcing. So what is Markov? It's a statistical prediction, or selection, of the next character based on the previous character, and/or the position of that character. For those of you who haven't heard of Markov before: he was a mathematician who came up with these very interesting algorithms, and Markov chains are basically tables that try to predict, sometimes with very good certainty, what is going to be next in the chain. So I'll give you an example. If I use the sentence "Humpty Dumpty sat on a...", your brain knows, because it's seen it before, that statistically the next word is 'wall'. Your brain is

not going to just pull out 'car' or 'bus' or something, because it's been trained: you've seen that sentence many times in your life and you know what comes after it. Markov works in a similar way. We can take words, sentences, and phrases from books, from publications, from anything really, and work out: if I have one or two words, what comes next statistically? If I have one word, what's after it? If I have two words, what's after that? And you can group them — it can become quite complex as it goes on. But the point is that it gives you a pretty strong prediction you can use when cracking passwords, so that you're

not trying things that would have a low chance of working. So if you look at the green box there: if you had the characters H-E-L, in English there's a 90% chance the next character is an L, and an 80% chance of an O — I left out the P, but that would have been in there as well — whereas there's only a 5% chance that the next character would be a W. So if you were trying candidates against the hash you're trying to crack, H-E-L-W would be the very last thing you'd try. It's not that you won't try it, but statistically you

want that at the bottom, to increase your chances of cracking the password closer to the top. Tmesis. Tmesis takes a word list and produces insertion rules that splice each word into preset positions of a base word. If you look on the right there in the output box, I've given it the word 'hello', and what Tmesis outputs are rules for Hashcat. So when Hashcat sees this, it knows to apply certain rules. Here it's telling it: in position 0 of whatever input word I'm getting, put an

H. In position 1, put an E. In position 2, put the L, and so on and so forth. When it's done that, it moves on to the next one: now in position 1, put the H; in position 2, put the E; in position 3, put the L, and so on. It carries on doing that all the way down, inserting the word at every offset it can in the target word. You're then left with a bunch of rules as output, which you can feed to Hashcat as a rule file on a dictionary. It can work very well, and it can insert things you may not have thought about. Obviously, it can also get pretty big, depending on what
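The stepping-through-offsets behaviour just described can be sketched in a few lines. This is an illustration of the idea, not the real Tmesis script, and it assumes Hashcat's rule syntax of `iNX` (insert character X at position N), with positions encoded 0-9 then A-Z:

```python
# Generate Tmesis-style Hashcat rules: splice `word` into a base candidate
# at successive offsets, one rule line per offset.
POS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # hashcat position encoding

def tmesis_rules(word, max_offset):
    rules = []
    for off in range(max_offset + 1):
        # one rule line = inserting the whole word starting at this offset
        rules.append(" ".join(f"i{POS[off + i]}{c}" for i, c in enumerate(word)))
    return rules

for rule in tmesis_rules("hello", 2):
    print(rule)
# i0h i1e i2l i3l i4o
# i1h i2e i3l i4l i5o
# i2h i3e i4l i5l i6o
```

Each line is one complete insertion, so applying the file to a dictionary tries 'hello' spliced at every offset of every base word.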

you're feeding it and how big the rules are that come out. It's up to you to see what your appetite is: if you've got quite a few GPUs and you want to keep them busy, give it a good word list and quite a few rules, and it'll happily keep going. Okay, then the PRINCE attack. PRINCE stands for PRobability INfinite Chained Elements. What this does is take one word list and create chains of words that fit into certain length positions. So, for example, we input a word list and, assuming a target length of six, what the PRINCE attack

will do is output 3-plus-3-character combinations to give us a length of 6. It builds a table of all the words, indexes them, knows what the lengths are, and then starts outputting based on that table. So, to give us a 6-length word, it will also output 4-plus-2-character combinations, 5-plus-1, and so on and so forth. That gives you a good way to generate quite a nice word list. The command line there is also pretty simple: once you've compiled the PRINCE processor binary, pp64.bin, you give it a

word list, or you can give it constraints about where it should start and stop generating these passwords for you. So if we go to the next slide looking at PRINCE, there's an example: I've got a words.txt file with just three words in it — 'hello', 'Bob', and 'aeroplane'. Now I'm going to ask PRINCE to generate words from this. I'm not going to give it constraints; I'll let it start from the minimum, which is three characters, and work its way up. And if you look at the output, you can see what it's doing here. For three characters it output 'Bob', because 'Bob' is a three-character word. Then it went

up through the lengths — 'hello' fits at five, then 'aeroplane' — and then it started combining. Now, one thing you need to keep in mind, and I'm not sure if you can see it in this example, is that you may end up with some duplicates that you'll need to filter out before you go on, because it keeps on adding and may switch the words around, and then you get these duplicate combinations coming in too. So it outputs a whole lot of possible words that you can use. Just some other tools to mention: there's the combinator attack. This is done by most
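The chaining logic can be sketched as a toy version of the idea. This is only an illustration — a real princeprocessor also builds longer chains and orders its output by keyspace:

```python
# Toy PRINCE: index words by length, then emit every one- or two-element
# chain whose total length equals the target length.
from collections import defaultdict
from itertools import product

def prince(words, target_len):
    by_len = defaultdict(list)
    for w in words:
        by_len[len(w)].append(w)
    out = list(by_len[target_len])        # single words of the right length
    for a in range(1, target_len):        # 1+5, 2+4, 3+3, ... element chains
        for w1, w2 in product(by_len[a], by_len[target_len - a]):
            out.append(w1 + w2)
    return out

print(prince(["hello", "Bob", "aeroplane"], 6))  # ['BobBob']
print(prince(["hello", "Bob", "aeroplane"], 8))  # ['Bobhello', 'helloBob']
```

Note the 8-length output already shows why duplicates and near-duplicates appear: the same elements get emitted in both orders.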

tools by default, but if you want to create combined word lists yourself, there's the combinator binary, also part of hashcat-utils. Very simple: you give it two files and it joins them; give it three files and it joins all the words across those files and gives you the output. So, for example, if you've got a word list with a bunch of nouns and a word list with a bunch of verbs and you want to join them all together, you can use combinator to do that. And then Hoorasort is a tool I wrote, I think, in 2015 or

so, basically to help manage the word lists you have — to make them more efficient and easier to work with. Sometimes we tend to add to our word lists to such a degree, without cleaning them, that we're sitting with quite big word lists that are actually ineffective. One very surefire way for a word list to become ineffective is if it's got entries that should be handled with rules instead of just the base word itself. For example, if your word list contains @hello, you don't need the @ in your word list, because there are rules and rule files that will tell the cracking

software to add an @ before every word, and so on and so forth. So you want to get those out, to keep the word list as small and as lean as possible. There are various options; I've just shown two there. One of them takes away the sentences: if you've got a word list with a bunch of sentences and you don't want them in there, you can say no-sentence, which basically compacts everything into one string without any spaces. The wordify option does the opposite: it breaks each word of the sentence out onto a new line. There are various other options you

can look at using there as well. It's a Python script, so feel free to change it or do whatever you want to make it work for what you need. It's not the only tool out there that can do this, so if you want to look on Google for something else that fits your needs, do that. The key here is to make sure your word lists are well maintained and efficient when you're cracking passwords; otherwise you're wasting resources on work that doesn't need doing. Okay, the next example is attacking phrases and other languages. So, looking at phrases,

there are various ways we can create phrases. Some simple ways: just taking books, taking text from sites, and cutting sentences out of them. Some other ways use the combinator tool — combining certain elements. For example, you can have a pronoun word list and a noun word list, and combine the two with combinator so that you start making some kind of sentence. Or have a bunch of starter words like 'I like', 'I hate', 'I do', 'I won't', etc., and combine those with a whole bunch of other standard words to get things like 'I won't stand', 'I won't sit', 'I don't like

this', etc., to get those phrases out. Tmesis, the tool we discussed, is also very good for creating various kinds of phrases. Remember too: when you're combining word lists with any of these tools, you can add a space on the left or right side of what you're combining if you want a sentence with spaces in it, instead of everything run together. Generally, users will just enter a phrase in one go; some may enter spaces, but certain sites won't allow that, in which case you don't need to worry about it. Then you can also use the combinator

attack with the space as well to create sentences there. And the Markov attack is also very handy for doing this. How to use the Markov attack for creating sentences? I'll leave it up to you to decide what tools you want to use, or whether you want to develop your own. What we have here are the n-grams from the Corpus of Contemporary American English — the top noun-plus-noun Markov chains. Basically, what this tells us is that if you look at the word 'health', there's a very high frequency that the next word is going to be 'care'. Looking at

'law', there's a very high frequency that the next word will be 'enforcement'. Now, if you go to the website and scroll down, you'll find the word 'law' quite a few times, but what changes is the frequency — what statistically would be the next word in the chain after 'law'. So you can use this, if you want, to customize your own tools to output sentences from these Markov chains. Then, attacking other languages. Often you may find yourself faced with hashes that need to be cracked that aren't in English. Now, if it's the English character set, that's not too difficult. But you might find that you're getting ones that are

in Arabic, for example, maybe in emoji, maybe in Chinese, and so on, and you want to be able to attack those as well. Now, obviously, first prize is to find word lists you can use along with rules. But you can also get Hashcat to brute force certain of these characters for you, and to do that we use an option in Hashcat called --hex-charset. That's basically telling Hashcat that instead of giving it characters, I'm giving them in hex. And the reason we do that is that in UTF-8, the non-ASCII characters

are represented as multi-byte hexadecimal values. If you click on that link there, it's a very nice website that person has set up: you can look up all the characters you want, and it tells you what the values are. So, for the example of brute forcing Arabic output: we have hashcat -m 0, which is attacking MD5, then the hashes. And then — remember, we need two hex characters per byte — we say that charset 1, or -1, is either d8, d9, da, or db. Remember, we've told Hashcat it's hexadecimal, so it's looking

at that in pairs; it's not looking at individual characters. Always remember to add that switch, because if you don't, Hashcat is going to assume you mean a 'd' or an '8' or a 'd' or a '9', which you don't want in this case. So that's the lead byte. Then -2 gives the following hexadecimal characters, and these are the ones it will brute force through to give us the various outputs. And then we have the mask at the end, which combines them — ?1?2, ?1?2, ?1?2 — to output these. The same thing with

emoji. Just one difference: with emoji, it's actually four of these hex bytes that you need to give. For example, if we look at — just to switch out here. What side is this on? This side. It might be a bit small; let's see. Right. So I've just put it in a script to make it easier to run. Let me first show you that file. OK, I don't know if you can read it — it's a bit small at the top there. Is it better now? A bit better? I don't want to go too big or it'll start cutting off. You can ignore the parts at the beginning. What you want to focus on is that

I've got those preset charsets I've given to Hashcat — f0, 9f, and 98 — which are the leading hex bytes for the emoji. -4 is the one that's actually going to change. Now, I didn't include everything in there; you can get that from the website if you want to include it all. And I've told it that it must treat those as a hexadecimal character set, and I've told it where to output. So if you look, I'm outputting two emoji at a time, which is basically two sets of four bytes. So if we run that, we get the output shown there. How many did I

have? More than four? Yeah — sorry, I had three. So then you get the ones that I put there. That's one example of what you can do with that. If you look at that table, you can output all sorts of characters in all sorts of languages — Greek, Chinese, Arabic, as I mentioned — and you can go on from there. Just keep in mind that this is more of a last resort after a word list, or use it with a hybrid attack to add certain characters to a word list, because brute force you generally want at the very end, when you've run out of all ideas and you're going to try

and brute force something.
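What the --hex-charset masks are doing byte-wise can be illustrated in a few lines. This just enumerates the candidates on the CPU for clarity — hashcat itself expands the mask on the GPU — using the Arabic lead bytes from the slide:

```python
# Enumerate the UTF-8 byte pairs that a mask like -1 d8d9dadb -2 <80..bf> ?1?2
# covers: a lead byte from -1 plus a continuation byte from -2, decoded as UTF-8.
from itertools import product

lead = bytes.fromhex("d8d9dadb")   # -1: lead bytes covering the Arabic block
cont = bytes(range(0x80, 0xC0))    # -2: valid UTF-8 continuation bytes

candidates = [bytes(pair).decode("utf-8") for pair in product(lead, cont)]
print(len(candidates))   # 256 single-character candidates
print(candidates[39])    # '\u0627' — Arabic letter alef (bytes d8 a7)
```

A two-character Arabic candidate is then the mask ?1?2?1?2, which is exactly the pairing behaviour described above: Hashcat treats the hex charset as bytes, not as the literal characters 'd' and '8'.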

Yeah, so that's how we can do that. Also keep in mind, when you're outputting this: check that your hexadecimal values are correct, because if you make a mistake you're not going to pick it up — you won't see what's coming out and what isn't cracking. So my recommendation would be to put it in a file, script it, do something like that, so that you've always got the same data there. Okay, the next one is the new WPA2 attack that you may have heard about. It's very interesting, very helpful, very nice — courtesy of atom at Hashcat, and also ZerBea. So what we have here is

a new way to attack WPA2 which doesn't require us to have visibility of the client's handshake to the access point. As I'm sure many of you who have tried it know, it's always frustrating, because you need to be in range of both; if you're not, you end up walking around with antennas facing all sorts of places, and in the end you start looking quite conspicuous. With this new attack, you only need visibility of the access point. How it works is that it looks at a certain part of the EAPOL frame called the RSN IE, or Robust Security Network Information Element. It uses the pairwise master

key: what was found is that there's an HMAC-SHA1 hash in there, using the pairwise master key as the key, run over the following data — the string 'PMK Name', the AP's MAC address, and the station's MAC address. With all of that, it then creates these frames. And from these frames, we can get everything we need to attack the network without having to go and get any clients or any handshakes. So atom very kindly put a new attack mode in. And

ZerBea very kindly created — or rather updated — his tool, hcxdumptool, and it works very well. I'll try and run it... I was trying just now, but with all the Karma devices running around, the attacks are not looking very good because they're messing everything up, so the screenshots will have to be sufficient. What's happening here is we basically run it: we give it a PCAP file for output, we give it an adapter, and the enable-status option is basically just to show us when it has found a PMKID — it controls what kind of output comes out of there. So when I ran this, it found certain

handshakes, and in some of them you'll notice it found the PMKID, and that's the key to what we're looking for in order to crack that WPA2. Next, the hcxpcaptool: what that does is read the PCAP file and dump these PMKIDs for us in a format that Hashcat can use. If you look at the bottom there, three of them were found. Coming back to this one — it may take a while. The tool also has options to filter on certain MAC addresses, so you're not just targeting everything in sight; you can target specific MAC

addresses, specific APs, and specific channel numbers as well. It'll then dump the PMKIDs that are captured. I've used the very descriptive file name of 'bla', and you simply have a look inside. What you have in bla is a hash comprised of certain elements: you'll notice the AP's MAC address is in there, as well as the station's, and also a hex representation of the ESSID. The attack is pretty simple after that — it's like you would have done with your WPA2 hashes in the past, except the new mode you give Hashcat is 16800. You feed it the file you just generated from that tool; the -w 3 is just to control

the workload; and then your word list, or whatever attack you're going to do. And what's going to happen then is it's going to output. Now, I've got it running on these — I don't think it's finished yet, but I'll show you what it's doing. If you have a look at this one, it's still busy running against the ones I was trying to attack. With certain access points it may not work: the one I was testing with, it didn't get, though it managed to get the PMKIDs of other access points. So you just need to give it some time and play around with it until you get the output you're looking for. But in this case,

I've given Hashcat that file, and it's busy trying to crack those passwords over there. If I do a status printout, you'll see it changing and going on. So that's mode 16800, and it's painfully easy to do. You just need to run the tool — with a Wi-Fi adapter in monitor mode, of course — get the PMKIDs when they're dumped, convert them into a format Hashcat can use, run Hashcat with a particular word list or whatever your attack is going to be, and it will give you the passwords once they're cracked. Just keep in mind, if you're new to WPA2 cracking: it's not magically going to just spit out

passwords. You still need word lists that are good; you still need attacks that are good at cracking these hashes. What's very good now is that you don't need to be in sight of the client capturing that handshake, and you don't need to de-auth clients to try to force those handshakes in. You just need your machine, your Wi-Fi adapter, and visibility of the access points that you want to attack.
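Putting the pieces together, the value Hashcat is testing in mode 16800 can be sketched in a few lines. The field order follows the construction described above (PMK as HMAC key over 'PMK Name' plus the two MAC addresses); the passphrase, ESSID, and MACs below are made up for illustration:

```python
# Sketch of the PMKID computation: PMK = PBKDF2-HMAC-SHA1(passphrase, ESSID),
# then PMKID = first 128 bits of HMAC-SHA1(PMK, "PMK Name" | MAC_AP | MAC_STA).
import hashlib
import hmac

def pmkid(passphrase: str, essid: str, mac_ap: bytes, mac_sta: bytes) -> str:
    pmk = hashlib.pbkdf2_hmac("sha1", passphrase.encode(), essid.encode(), 4096, 32)
    data = b"PMK Name" + mac_ap + mac_sta
    return hmac.new(pmk, data, hashlib.sha1).digest()[:16].hex()

# Cracking is then just: compute this per word-list candidate and compare it
# against the captured PMKID from hcxpcaptool's output.
print(pmkid("hashcat!", "TestNet",
            bytes.fromhex("aabbccddeeff"), bytes.fromhex("112233445566")))
```

The 4096-round PBKDF2 over the ESSID is also why good word lists matter here: each candidate is expensive, so you want every guess to count.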

Cracking the heavy-duty algorithms. Just a brief discussion on how we can approach algorithms that are very strong and very slow — especially salted ones, especially the ones where you're finding you're just not getting anywhere. Optimized word lists are a must. Anybody you speak to who cracks passwords will tell you that if your word lists are not good, you're going to go around in circles for a very long time. So you need word lists that are very precise and very scaled down. Remember, take away special characters, because your rules are going to handle those for you; you don't need them in there. You're just going to keep GPUs idling if

you don't give them the work you want them doing with the rules. Keep brute forcing for the end, especially with slow algorithms — with some of them you're just wasting your time even trying a brute force, because it's not going to go anywhere. Profile the target as well. Assuming you're doing this as an audit, think about what the target is doing. Are they filtering out leaked passwords, like we discussed earlier? What type of passwords would the users use? Is there a company name that I can use in a pattern, or that I can feed into

Tmesis to start adding into my dictionary, for example? Why would they pick certain passwords, and what enforcement does the company or the website have on character sets and password length — how short can it be, how long can it be, and so forth? Check that the GPU or CPU is being used optimally: if you notice that John or Hashcat is showing you that not much is happening — you've got six GPUs and one is doing the work — that isn't optimized, and you're doing something wrong that you need to change, especially on these slow algorithms, because you want to have everything working as

hard as it can so that you get good use out of it — and use rules as well to create a workload. So that brings me to the end of my part for today. I hope it answered some questions and showed some interesting new attacks for WPA that you can try. I just want to quickly show you where you can get those tools from. The internet's working — it looks like all the Karma devices have been turned off for a few seconds. So here's ZerBea's website with hcxdumptool; you'll need this one to do the WPA attack. You're also going to need hcxtools, which is right next to it, and compile these, of

course, as well. Then you'll need the latest version of Hashcat — if I can type it correctly. It's very simple: just go to the site there, and I'd recommend perhaps fetching it from GitHub and compiling it. And lastly, if you need the steps reiterated, with quite a good explanation as well — let me bring that up for you. Okay, so there's atom's excellent article. He's got links to the tools you'll need, and he also gives a very good description of what it's doing, along with the actual packet capture and the steps you need to follow. So

how to defend against this: as with the other WPA attack, have Wi-Fi passwords that are secure — long, complex, difficult to crack — or switch to a different authentication mechanism, like managed authentication instead of a pre-shared key on the clients. But if you can't do that, you need to make sure it's quite a secure password. Also note that you need certain adapters for this to work; it's not just going to work on, say, a Mac's built-in Wi-Fi adapter, or on most Wi-Fi adapters out there. I think on ZerBea's website he did show which adapters it's known to work with.

Or, if you do have one that you've been using to capture handshakes, try it anyway. For example, the TP-Link I've got was not listed as supported, but it works too. So I'm sure he'll just keep adding supported adapters as time goes on.

Okay, that's the end of the show. If you have any questions afterwards, you're welcome to come up here. I don't want to keep everybody who needs to go to lunch, so come up to the front and I'll answer any questions you may have. Thank you.


Good afternoon. Welcome to B-Sides Las Vegas: Abusing Password Reuse at Scale, Bcrypt and Beyond. A few announcements before we begin. We'd like to thank our sponsors, especially our Inner Circle sponsor Rapid7, and our stellar sponsors Amazon, Oath, Simul — geez, all of them, really. It's just amazing what they put on. It's their support, along with other sponsors, donors, and volunteers, that makes this event possible. On cell phones: these talks are being streamed live on YouTube, except in Underground, and as a courtesy to our speakers and our audience we ask that you check that your cell phones are set to silent. If you have a question, raise your hand and I'll bring the microphone over to you so that YouTube can hear you. And

with that, let's get started. Please welcome Sam.

Alright, everyone can hear me? All good? Alright. So my talk is Abusing Password Reuse at Scale: Bcrypt and Beyond. You probably don't know me as Sam, if you know me at all — you probably know me as Chicken Man, but we'll get to that. So, in this talk I'm going to assume you have some password cracking knowledge: that you've cracked a hash at some point, or thought about it, or done some amount of research, so that I can say things without having to explain every single word. But I will cover some simple stuff — I'll gloss over it a little fast, but I'll go through it. We'll be talking

about mega breaches — the news's favorite new term — database dumps, stolen hashes, things like that. We'll talk about the concept of password herd immunity in salted hash lists, and then efficiently reducing that herd immunity — that would be the attack. I'll do a demo of the attack itself, then talk about what can be done in the future to improve the attack, and then questions. So first, a little bit about me. Again, I am Chicken Man to almost anyone who knows me. I currently work at Terahash, formerly known as Sagitta HPC — you probably know them for the Brutalis, their cracking machines. I'm a password enthusiast, I'm a hash cracker, and I'm part of Team Hashcat; it basically consumes my life. And I'm a part-time

chicken enthusiast — so, chickens. So yeah, let's go over some of the basics real quick. What is a hash? You should know this by now, but just in case: password hashes are the way of storing passwords in a database so that you don't have plaintext passwords. Hash functions are one-way, deterministic trapdoor functions: for every input you have a single output, and from that output you shouldn't be able to recover the input. Never, ever store passwords in plaintext if you can help it — please, always hash your passwords. Okay, so how do we crack passwords? Because password hashes are one-way, you can't go

backward; you can't decrypt them like encryption. So the best way to figure out what the password was is to generate tons and tons of hashes until you get the same hash, in which case you either have a collision or the correct input password. The free tools we use to do this are Hashcat, John the Ripper, MDXfind, Hash Manager, InsidePro — there's a bunch of free tools — and then there are commercial tools like Hashstack; I believe Passware, AccessData, and several other companies have commercial software for this as well. The hardware we typically use is mostly GPUs, although some algorithms, as you'll see in a moment, are better on CPU. There are FPGAs and ASICs and other fancy specialized hardware that can

generate hashes very quickly, but those are typically pretty rare. You've probably heard of Bitcoin ASICs. They do SHA-256 times two, I believe it's two rounds of SHA-256, and they do it very quickly, but they're not useful for cracking SHA-256 password hashes; they can only do that one type of math. The attack methodologies you should probably be familiar with would be brute force, or mask attacks, where you start with, you know, AAAA and then AAAB and so on and so forth. Those are usually very inefficient. The keyspace is very large, so they're not very good for complicated passwords or long passwords; things with lots of symbols do not fall easily to brute force. Dictionary and rules is probably the most used attack, I would

say, past brute force. With a dictionary, you take known passwords, English words, things like that, and you run those against the list. And then, because you're unlikely to have every modification of every password in your list, rules allow you to modify each candidate: adding a number, replacing an "a" with an "@" symbol, an "s" with a five, doing very basic stuff like that to try and mimic human behavior and how people create stronger passwords. Stronger these days tends to mean longer, not necessarily more complex, because most of those patterns are very predictable; people add ones and zeros to the ends of passwords all the time. And there are other attack methods: combinator, hybrid, PRINCE. I'm sure there are more that I'm missing, but most of those tend to be kind

of on the edge. I know a lot of people spend a lot of time on them; I tend not to, so I'm just going to gloss over those. So, how do we make hashes stronger against this cracking? The first method would be salting, where you take a random string, usually unique to each hash, and you append it, or prepend it, or add it somehow to the input plaintext, so that two identical plaintexts generate two unique hashes. This solves the problem of rainbow tables, but it does not solve on-the-fly cracking of just a few hashes. It does create a higher workload, as we'll see in the talk, but on its own it's not really considered sufficient anymore. Everything should be salted, but

that's not all you need. We have iterations: if you take a hash and you hash it again a couple of times, you can create, you know, a thousand MD5 hashes layered over each other, which is a thousand times harder to crack in theory. That being said, if it's a fast hash, a thousand may not be very much; you could be up in the trillions of hashes per second on your system, and a thousand iterations is not going to be noticeable. So iterations are good, they're just, again, not really what you need to be focused on anymore. These days, the suggested hashing algorithms are the slow hashes: bcrypt, scrypt, Argon2id, I believe. The slow hashes are not just high-iteration and not just salted,

but typically they are memory-hard, which means that because password cracking is a parallel operation, if you have an algorithm that requires a sufficiently large amount of memory, you can't start that many threads. Your device will be limited by local memory size, so even if you're not consuming the whole processor, you're limited in how many things you can do at once. That makes it much harder to run these algorithms. Plus, they are very high-iteration. bcrypt's cost value is actually an exponent: two to that number of rounds. So if you have 8-cost bcrypt, as you'll see later, that's two to the eighth rounds. A lot of rounds. So it

makes it a lot harder to run these hashes, and you'll see that they're much slower. All right, so let's talk about mega breaches and database dumps. The news went crazy about mega breaches sort of recently, but they've been around for a while. A lot of the popular ones were dumped in 2011, 2012, 2013, and only really came to light in the past two or three years. So they've been out there for a while, if you've known where to look. MySpace was a very popular one. It was one of the largest for a long time, and it's still one of the larger ones. There were 360 million accounts that got dumped, and those were accounts with passwords. Now, they

hashed their passwords with SHA-1, but they made some mistakes. The passwords they were hashing were truncated and forced to lowercase, so we were able to complete the keyspace on the 10-character passwords in, you know, a couple of weeks. No big deal. There were other passwords in there that were full passwords: some account lines had a truncated hash and a second hash of the full password as well, so you could take the password cracked from the first hash and use it as the known first 10 characters when attacking the second hash. LinkedIn was a very popular one; it got very famous because small portions of it came out right as it got dumped, and then later on the whole thing came out. LinkedIn was 164 million accounts,

but not 164 million password hashes. LinkedIn had closer to 62 to 63 million password hashes, because lots of people like to log into their LinkedIn account with Facebook and Google and other accounts that use OAuth, I believe, so those people didn't have passwords on LinkedIn. So we're going to be focusing on LinkedIn and Dropbox. Dropbox was interesting because it was dumped around the same time as these other databases, but they had already upgraded to bcrypt. They were already using 8-cost bcrypt, a plenty good enough hash for the time; it's still pretty decently strong. About half of the 68 million account database was in bcrypt. The other half was a 40-character hex hash of some kind. It's believed to be SHA-1, but if it is,

it's either got a pepper or an unknown salt that makes them impossible to crack right now; no one's ever successfully cracked any of the SHA-1-formatted hashes. The bcrypt hashes, however, are possible to crack. There are about 32 million of them, and they're a lot harder than these other databases, as you'll see. And that's where our attack plays in. According to Have I Been Pwned, Troy Hunt's website, there are about 300 pwned websites listed, totaling about 5.3 billion accounts. Now, clearly there are more than 300 websites getting hacked, so 5.3 billion is probably pretty conservative. We're probably looking at, you know, closer to 10 billion accounts floating around in dumps and underground forums and

anywhere else they would show up. So we can make use of that data. LinkedIn, again: 164 million accounts, 61.8 million SHA-1 password hashes. And according to hashes.org, who host some of these password lists, usually just the hashes, 60.5 million of them have been cracked. That's a 97.92% recovery rate for LinkedIn passwords. Now, SHA-1 is a very fast hash. As you'll see with a benchmark, a GTX 1080, an NVIDIA graphics card, will do about 8.5 gigahashes, or 8,500 megahashes, per second. That is 8.5 billion hashing operations every second. So even if your password is a little stronger than most people's, it's still probably in reach of someone with enough dedicated hardware and enough know-how to run a solid attack. Dropbox, on the other

hand, has been around just about as long. It had 31.8 million bcrypt hashes. Currently only 6 million have been cracked, according to hashes.org, and that gives us an 18.98% recovery rate, much worse than the LinkedIn recovery rate. And that's because bcrypt is so much slower and so much harder to run. These dumps are from around the same time, and I'd bet the passwords are roughly similar in strength and format, but actually doing the cracking is just so intensive that no one has the power or the time to make it feasible. As you'll see there, the GTX 1080, the same card as before, is only doing about 2,000 hashes per second. So we went from 8.5 billion hashes

per second to about 2,000. That's 4.5 million times slower, which makes it 4.5 million times harder to crack this list. And they're salted, so it gets even harder, as you'll see here with the concept of password herd immunity. When loading and cracking a large password list like the LinkedIn SHA-1s, even though it's an unsalted list, for the math you can treat it as having one salt: a null salt that doesn't exist, but still one salt you're generating against. For each new salt in a list, you have to do all of the hashing again and again and again as you add more salts. So the workload for a basic dictionary attack on the LinkedIn SHA-1s would just be the number of lines

in the dictionary times one. You only have to hash each candidate against that one salt, and the rest is comparing against all of the hashes in your list, regardless of how many there are. With uniquely salted lists, like large bcrypt lists, or vBulletin forums, which I believe use a fairly unique salt that's 30-some-odd random characters, for each individual unique salt you must do all of that hashing again for every candidate in your dictionary. For Dropbox's 32 million salts, that same very basic dictionary attack just got 32 million times larger in its amount of work. So if we have the very basic RockYou top-10,000 password dictionary, I have 320 billion

hashing operations to do for the same attack that would have taken 10,000 operations on LinkedIn. And that's on the slow hash: we have 320 billion operations to do at 2,000 per second at best. That's far from feasible; it would take years. Whereas with the LinkedIn SHA-1s, it would take a fraction of a second. The cost of attacking the whole list together because of these salts is what makes salting good. It's what makes it useful and makes the actual password hashes stronger. But it really only stands up within that list, and that is where the herd immunity concept comes from. If you have a list with lots and lots of accounts, and each one has a unique salt, the

more accounts you add, the more work it is for an attacker to crack the whole list together. A single account still has one salt, so if you're targeting one person, that's not a big deal. But if you are an opportunistic attacker trying to crack as many as possible, as fast as possible, more salts make it much harder. If you're a targeted attacker and you want to attack one person at a time, then you probably know a good bit about your target. You probably have one or two hashes to attack, depending on how many accounts they have, and that really works out to one or two salts. You probably have quite a bit of time and money to invest in attacking this person. Whereas an opportunistic attacker, someone who

steals large dumps of accounts and then uses them to scrape as many as they possibly can out of it, and then, I don't know, cracks Netflix accounts and sells them online. Someone who's doing this maliciously but isn't really dedicated to attacking any single person. These attackers are typically looking for the low-hanging fruit: the easy passwords, passwords that are heavily reused, you know, 123456. They don't care who it is, they just care about getting as many accounts as they can possibly get. They're not going to spend much time on a strong account or a strong password hash, because it's just not profitable for them. They only want as many as possible, as

fast as possible; any that they miss, they just throw away. Because a large salted list is so much harder for an opportunistic attacker, they tend not to go for them. The Dropbox list has seen very little attacking; no one's really been running it because it's just so difficult. Of those six million cracks, I believe most actually came from a cluster of FPGAs owned by Tycho Tithonus. He's a friend of mine. He cracked most of those over the course of, I believe, a month. It took a very long time, and that was FPGAs specialized for bcrypt, so he was moving pretty quickly. With large lists of salted hashes like this, the time spent on them is just

not feasible. You can't crack them classically. So the way to solve that is to remove as many of the easy passwords as you possibly can from the list, to reduce that herd immunity, to remove as many of the salts as you possibly can. Because the more salts you remove, the more you can do to the remaining hashes. They get weaker the less there is in the list. So that's where the attack comes in. I'm calling it offline credential stuffing, because it's very akin to the idea of, you know, correlate a password with a username, shove it into, I don't know, the Netflix login page; if it works, great, sell the account online for some Bitcoin. The classic credential-stuffing malicious activity that people do. The way

this works is: you have source data, databases with fast hashes, usually things like LinkedIn, where you've cracked 97% of it. You have plenty of accounts with a username or email and whatever else on the line, plus the actual plaintext password. And you have target data. Typically this is a database that no one else is attacking. It could be something private to you, it could be something you're doing for an audit, it could be anything, but these are going to be slow, uniquely salted lists, just like the Dropbox bcrypt. The larger the list, the better this actually works, surprisingly. And what you do is you take data from the already-cracked lists and you correlate it with the uncracked list, the target data. When you do

that, you take a plaintext password and a slow password hash, you stick them together, and then you run just that candidate against just that hash. To see the reduction in workload from doing one candidate per hash: say you had a perfect scenario, 10,000 lines in your source data set and 10,000 lines that you were targeting. If you just took that source data set as a dictionary, threw it in Hashcat, and said "target these 10,000 hashes," your total amount of work would be about 100 million hashes. That's a lot of work, especially because each one's uniquely salted, so you have to run those 10,000 lines against all 10,000 targets. If you correlate data between the two data sets,

a source and a target, you bring that down to maybe two or three candidates per hash at most. And in a perfect world where 10,000 lines match 10,000 lines, you can bring it down to one candidate per hash, and that's per salt as well. So your total workload goes down to about 10,000 from 100 million. It brings it down incredibly, and it does so very quickly. And as you crack hashes through this correlation, the ones that you miss, you can now spend more time on, because there's not a whole lot in the way. They're not blocked up by other hashes with other salts causing more work. They're easier. It makes

the other hashes in the list weaker. So what I did is I took LinkedIn, all of the cracks that I had. It actually came out to 112 million cracked lines; I believe the version of the data I got had been mixed with something else, not sure what. And then I took the Dropbox bcrypt. There are 32 million, but they were split into two files and one of the files is a little messed up, so I skipped those. So I ended up with 24.8 million lines of email and bcrypt hash. And I correlated the emails in each list, and I found that between LinkedIn and Dropbox there are 3.5 million perfect email

matches. So 3.5 million accounts exist across both LinkedIn and Dropbox. I then took all of the accounts that had cracked passwords in LinkedIn; that came out to 3.2 million. I took those accounts, took those passwords, and put those passwords next to my bcrypt hashes as my targets. And as you'll see, I can crack lots and lots of bcrypt hashes on my laptop CPU, a lot faster than you'd typically be able to on GPUs, on large clusters, etc. And it'll speed up in a moment here as it catches up on its own workload. It goes much more quickly than I expected when I first did this. When I first did this, I thought I broke it. I thought it was just

spitting it back at me. But yeah, what this is doing is testing each one of these hashes against only its one candidate. It's only doing one operation per salt. And those candidates tend to be right, because people reuse passwords like crazy; the number of people reusing their own passwords, or variations thereof, is surprising. So we took a list of 32 million, we found 3.5 million matching, 3.2 million of those with already-known cracked passwords, and we threw them in here. I actually know how many it will find: about 2.2 million will be correct. So in minutes, not hours, days, years, on a CPU in a laptop, you can crack 2.2 million bcrypt hashes without really

doing anything. And then you have 30 million hashes left, not 32 million. You have effectively reduced the amount of work you have to do on the rest of them. So if you take the rest of the list and attack it classically, it's easier. You can crack more, and the more you crack, the easier it gets. So by reducing the list in this way, you've made everybody else weaker. These people reused their passwords; maybe you didn't, but your account is now a bigger target for me. And this is cost-effective for an opportunistic attacker. If I want to attack a whole list and I don't really care to target anybody specific, I just

made it cost-effective to actually keep attacking that list and doing a lot more to it than I could previously. And this is one database: my source was LinkedIn, my target was Dropbox. If I had five, six, seven billion lines of source data, I could have a match on every email in a list. And then if 70% of those people are reusing their password, I've cracked 70% of that list without doing much at all. So, you know, it's not the most serious attack, it does require there to be commonalities, but it does crack a lot of passwords without doing a whole lot of work. And it reduces, again, the herd immunity

of the rest of the list. We're gonna stop that, because my laptop is quite hot. Oop, I stepped all the way back through. Right, so, again: of the 24 million lines I was targeting, I pulled 3.5 million out that had commonalities with my source data, and I cracked 2.2 million of those in a couple of minutes, not on this laptop, but on a beefier CPU. And that was a couple of minutes of just running it straight through. This is very akin to John the Ripper's single mode, if you're familiar with that, but instead of applying rules or anything else, all this does is straight-run the hashes.
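The correlation step described above can be sketched in a few lines of Python. This is a minimal sketch with hypothetical data: the salted SHA-256 below is only a self-contained stand-in for bcrypt (a real run would check each candidate against the stored bcrypt string with a bcrypt library), and the field layouts are made up, since real dump formats vary.

```python
import hashlib

def fake_slow_hash(password: str, salt: str) -> str:
    # Stand-in for bcrypt so the sketch is self-contained; a real run
    # would call something like bcrypt.checkpw() on the stored hash.
    return hashlib.sha256((salt + password).encode()).hexdigest()

def correlate_and_crack(source, target):
    """source: (email, cracked_plaintext) pairs from a fast-hash dump.
    target: (email, salt, slow_hash) rows from the salted list.
    Tests exactly one candidate per matching hash."""
    known = {email: pw for email, pw in source}
    cracked = []
    for email, salt, stored in target:
        candidate = known.get(email)
        if candidate is not None and fake_slow_hash(candidate, salt) == stored:
            cracked.append((email, candidate))
    return cracked

# Hypothetical data: one reused password, one changed, one with no match.
source = [("a@example.com", "hotdog22"), ("b@example.com", "Stephanie123")]
target = [
    ("a@example.com", "s1", fake_slow_hash("hotdog22", "s1")),   # reused
    ("b@example.com", "s2", fake_slow_hash("different", "s2")),  # changed
    ("c@example.com", "s3", fake_slow_hash("qwerty", "s3")),     # no source match
]
print(correlate_and_crack(source, target))  # → [('a@example.com', 'hotdog22')]
```

The point is the shape of the loop: exactly one hashing operation per matching salt, instead of every candidate against every salt.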

This can be improved surprisingly well, actually. Of course, adding more source data: if you've got billions of lines, you're suddenly Troy Hunt and you want to be a bad guy, you can very easily source more and more data and use that data to correlate more lines. And if more lines come with more cracks, you reduce the security of the list even further. You can correlate across different columns. This actually works pretty well; I've already tried this some. That was just emails, but not everybody uses the same email everywhere. Sometimes they use the same username or the same phone number. So if you correlate across other columns in the data, depending on what data you have, you can find collisions between lists that you maybe

wouldn't have initially found. And then, of course, you reduce the password list even more. You can stem emails. I actually thought about this and didn't really play with it much; I talked to Ryan C. the other day, and he said, hey, why don't we stem the emails? You could do that. I haven't really personally done this, but a lot of people, especially more technically inclined people, will use custom emails per account. So they'll have something like test+linkedin on their LinkedIn account. Now, Gmail ignores everything after the plus, so that's technically the same address as test at Gmail. Same with adding a bunch of periods. So in theory, those are the

same email address. So you can stem emails, correlate that way, find more collisions. You can reduce the total keyspace: say you have a ton of data and you've correlated accounts like crazy, and now your workload is too big again. Reduce it. You can throw out site-specific passwords. Say you go through the password list and everybody's using (why does it do that?) say everybody's using LinkedIn as part of their password, and you're trying to crack accounts on MySpace or something else. There's no reason to have LinkedIn in there; it's not going to be useful for you. They're not going to reuse LinkedIn123 on their Google account. They're not going to do that. So you can

throw out accounts that have that problem. You can throw out candidates. You can add rules. This is something I've already explored, but it's decidedly slow; it's not very easy. If you have a large list of candidates and you run through it and only 30% of the correlated candidates crack, that's probably because you're only doing exact, identical matches. If someone has password123, maybe they're using password1234 or password12345. So if you add rules to the individual candidates, you can still keep it to 5, 10, 15 candidates per hash per salt, still massively reduced from running a normal classic attack against the list, and your recovery rates will go up. I did explore this, but again, very

slow; it didn't bring my rates up enough to be economical, but it does work. So yeah, there are a few ways you can make this better. More source data will always work better: if you have more source data, you'll always have more collisions, and you'll always be able to reduce the total list further. And once you reduce the list, of course, you can run it classically. You can put it in Hashcat and just let it run, do whatever it is you want to do with it. So yeah, that's the attack. I can actually make it faster, if you'd like to see the demo again. Give me a moment. I

forgot a flag. So that's how fast it normally runs. Those are real-time cracks on my laptop CPU, for 8-cost bcrypt hashes. So yeah. That should actually complete in a couple of minutes, I believe, about 4 or 5 minutes, and it will have found 2.2 million of the 3.5 million hashes in the list. Whereas if you took those passwords to Hashcat and put them in a dictionary, it would take you something close to 35 years. So nowhere near as quick as this. I'm still getting too hot, so I'm going to stop that. Probably not a good idea to run this on a laptop. It works, it's just, you know. So let's go

all the way back here. All right, questions. Go. Who's got a question? Do we need a mic for the question? So you mentioned filtering the list based on site-specific passwords, but have you thought about saying, okay, this LinkedIn password is linkedin123, how about I change that to dropbox123? So that would be a little hard to do from a technical standpoint, because what you're talking about there would be stemming the passwords themselves and saying, you know, this is the site name modified by 123, and so on and so forth. Your rules are the second portion of that; they are the "add 123." But the actual dictionary that you're pulling and adding to,

in that case, would be pretty easy. If it's just the site name, that's not a big deal. But in other cases, where maybe their password is a child's name or whatever, and you only know one of maybe five children, you really can't figure out what the other password candidates would be. I have thought about that, though, yes. It's just that, technically, you can't go backward: you can stem the rule modification off, but stemming the actual source is a lot harder. Anybody else? Over there?

So, kind of along the lines of Dropbox versus Facebook or something like that: what about character-substitution intelligence? Did you ever look at that, or see any correlation where people would use slight variations? So what you're talking about is where they have a normal password and it's slightly varied between sites? Yeah, that does come up a fair amount. If you're reusing a password, and we know you're reusing it, you're very likely reusing a portion of it elsewhere as well. The problem again with that is how many modifications is too many. If I start getting, you know, thousands of rules in, I haven't solved my problem, I've just recreated it. So the idea that

you want to reduce the number of candidates really means you have to assume it's either going to be identical or very close, or ignore it. And you can do that again later when you run the classic attacks, but for this, it really doesn't help too much to modify and add rules and things like that. It's better to just sweep as many as you can out now and then fall back to classic attacks. Someone over here? Over here? This might be a bit of a side note, but I was just curious about the FPGAs. You mentioned that they cracked some bcrypt. Do you know what kind of performance, like hashes per second, you could get there these days? And are there commercially available ones? So I do

know how fast the FPGAs that he was running were. They are ZTEX 1.15y boards; they have four Spartan-6 chips on them, and I believe they do anywhere between 43 and 45 thousand attempts per second. That was across a cluster of 15 boards, I believe, so 15 times 4 chips. Now, these are older FPGAs. They're not new, they're not fancy, they don't have fast memory; newer FPGAs could be much, much faster. The major problem is cost, and getting someone to write the software for them. The 1.15y boards, the old ZTEX boards, are supported by John the Ripper. They have bcrypt support, DES I believe, and a couple of the SHA-256 Unix crypt formats, I believe. They're just out-of-the-box

supported by that tool. There are commercially available FPGA units, or ASICs, or otherwise. The only one I can think of right now is the Tableau accelerator. It's a forensic accelerator; it works with Passware and a couple of other software packages. It does not do bcrypt, I don't believe, so I don't know how fast it would be. And it's also, again, kind of outdated. Making a new one costs a whole lot, and they're not very cost-effective, so most people tend not to. They could be fast, though. If you have the money, they could be fast.
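The feasibility arithmetic that keeps coming up in this talk (workload equals candidates times unique salts, bcrypt's cost as an exponent, time equals workload divided by rate) can be sketched as a rough calculator. The numbers below are the ones from the talk: a 10,000-word dictionary, 32 million salts, and roughly 2,000 bcrypt/s on a single GTX 1080. They're illustrative estimates, not benchmarks.

```python
def bcrypt_rounds(cost: int) -> int:
    # bcrypt's cost parameter is an exponent: cost 8 means 2**8 rounds.
    return 2 ** cost

def salted_workload(candidates: int, unique_salts: int) -> int:
    # With per-hash salts, every candidate must be rehashed from
    # scratch for every unique salt in the list.
    return candidates * unique_salts

def seconds_to_run(workload: int, hashes_per_second: float) -> float:
    return workload / hashes_per_second

print(bcrypt_rounds(8))                  # 256
w = salted_workload(10_000, 32_000_000)  # the 320 billion from the talk
print(w)                                 # 320000000000
years = seconds_to_run(w, 2_000) / 86_400 / 365
print(round(years, 1))                   # ≈ 5 years on a single GTX 1080
```

The same dictionary against the unsalted LinkedIn list is a workload of 10,000, which at 8.5 billion SHA-1/s is a fraction of a second, which is the whole asymmetry the talk is about.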

To the extent you checked, were these reused passwords of the 123456 variety, or were these some actual solid passwords? So actually, these were solid. I was very surprised by the strength of these reused passwords. We can go ahead and... well, I'll just start it back up, and then we'll stop it and look at some of them. So if we stop it: these are decent passwords. I mean, Stephanie123, maybe not the world's strongest, but some of these are just absolute gibberish. Hotdog22 is definitely a very easy password. There are definitely passwords in here, though, that I was very surprised I was cracking. Some of these... actually, most of these look pretty simple. There's a couple of hard ones here and there. Again, for this list,

maybe not, but there were times when I was testing this. I also tested this on the Edmodo data set, which was a recent, very large bcrypt list at 12 cost. It was very effective there. However, the passwords there tended to be site-specific: that website was mostly students and teachers, I believe, and most of the passwords we ran across had Edmodo in them. So instead of the source being site-specific, the target was. And the ones we were getting were very, very strong, but that was abnormal: people using a password manager, generating a password, and then reusing it everywhere, or something like that. That's not typical. Typically, they're a little on the easier side, but I have seen it

where only the strong ones get reused because everybody else uses something so trash that it's not going to come up. I believe you had a question. Okay. Anyone else? We got one in the back.
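The "rules on correlated candidates" idea discussed a moment ago, testing a handful of mutations of each reused password instead of exactly one, might be sketched like this. The rule set here is a tiny, made-up one for illustration; real Hashcat or John the Ripper rule files encode thousands of such mangles, and the talk's point is that even a handful per candidate keeps the per-salt workload small.

```python
def expand(candidate: str) -> list:
    # A tiny, hypothetical rule set mimicking common password tweaks.
    variants = [
        candidate,
        candidate + "1",
        candidate + "123",
        candidate.capitalize(),
        candidate.replace("a", "@").replace("s", "5"),
    ]
    # Deduplicate while keeping order, so the per-salt workload
    # stays at a handful of candidates instead of a full dictionary.
    seen = []
    for v in variants:
        if v not in seen:
            seen.append(v)
    return seen

print(expand("password"))
```

Still only a few candidates per salt, versus millions in a classic dictionary-plus-rules attack against a uniquely salted list.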

So, recognizing the ways that things can be broken: other than using absolutely random, totally different passwords that have no basis in reality, is there anything where there can be some level of reuse that isn't going to be easily cracked, or does it have to be absolutely, totally random to get what you're trying for? So, in theory, the best option is always going to be complete randomness, longer than 13 characters (actually, roughly exactly 13 characters). You can do things to make it easier. You can use formats like Diceware, where you have four or five words that build a passphrase that's very long, very hard to attack, very hard to brute-force. Diceware passwords are very strong, and the goal is that they're easy to remember. They're very easy

to remember: four or five random words that have no correlation; read it four or five times and you can probably recite it fast enough to type your password in. For things like that, you can have a couple of words be the same and maybe change the last one, but really you shouldn't be reusing any part of a password across websites. Websites should have unique passwords that are long and random. And that's because you don't know if the website is hashing your password. If tomorrow the website gets breached and everything's in plaintext, as strong as your password was, everybody now knows it. So you're trusting the site to be secure, and taking that security into your own hands the best

you can. Anybody else? I have no idea how much time it's been, because my timer got reset, but I think we're good on time. Okay. Well, if no one has any questions, we can stream the passwords by and... okay, sure. Now, you mentioned that there are these lists with strong bcrypt, where it's not easily crackable at an entry level by just common people. Have you heard any inference that there are some attackers playing the long game and continuing to attack some of these more difficult patterns? Or is it really something that only someone like you would really have the endeavor

to go into? So there was one list that I have seen come up time and time again that's very difficult, very slow hashes, and that was a Bitcoin-related forum that got breached. What I've seen is mostly people who are malicious, looking to break into accounts, trying to crack these hashes, and they do it on a one-by-one kind of basis. They will pick a target, pull their hash, try to crack it, and move on. So it's still not quite the massive-list, opportunistic-attacker style. They're still targeting, but they are coming back to the same lists over and over and over again. And with something like this, if

they could mass-crack that list, it would be much more devastating, because it would take them much less time to attack many more people. But it hasn't happened. Most people have left Dropbox alone; that's why I picked it. Edmodo came out, and I don't think anyone really noticed. The hashes have sat untouched on the websites for a while now. I don't think hashes.org even loaded it, because it's just so difficult to crack. I mean, if you're doing it for sport, great: you can spend weeks doing it and spend a bunch of money for nothing. If you're doing it maliciously, you can't. I mean, there's not a whole lot you can do.
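The email-stemming improvement mentioned in the earlier part of the talk, collapsing Gmail-style plus-tags and dots so that variant addresses correlate to the same account, could be sketched as follows. This normalization is Gmail-specific behavior; other providers treat dots and plus signs differently, so applying it everywhere is an assumption, not a universal rule.

```python
def stem_email(address: str) -> str:
    # Normalize a Gmail-style address: strip a "+tag" suffix and any
    # dots in the local part, so test+linkedin@gmail.com and
    # t.e.s.t@gmail.com collapse to the same correlation key.
    local, _, domain = address.lower().partition("@")
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"

print(stem_email("Test+linkedin@Gmail.com"))  # test@gmail.com
print(stem_email("t.e.s.t@gmail.com"))        # test@gmail.com
```

Stemming both the source and target email columns before the correlation join would surface matches that an exact string comparison misses.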

And anyone else, I don't know why you'd be doing it. - And now, the eight-card rigs for Hashcat: about how much are those? If you wanted to build one retail right now, how much would it go for? - You would probably want to talk to sales at my company about that, because I know I don't know the right number, so I'm not gonna guess. I mean, even if I built one, I'd have no idea. But yeah, the eight-card and ten-card rigs, and however many cards there are now, are great. It's just that with things like this, where bcrypt comes into play, you can only throw so much power at it at a time before you have to find something new. And so that's what we did. Thank you. Sure. Anyone else?

No? Okay, we're going to watch a bunch of passwords stream by. I'll make it a little slower so it's easier to read them, but there are usually some good ones.

Oh, that's not slow enough. There we go. So these are roughly the level of password that we're seeing in the Dropbox list that came from the LinkedIn list. These were all cracked in LinkedIn and then correlated over and then re-cracked in Dropbox as the bcrypt list. Some are stronger than others, some are quite weak. I imagine maybe some of these accounts are bots. A surprising amount of cross-person password reuse for passwords that don't make sense. Every now and then there will be a random password where there are 50, 100, 200 accounts with that same random password. So I have to imagine that's bots. But it does show up and it does crack pretty well. Yeah, most of these are pretty weak. I would show Edmodo, but

that one is layered and has a number of issues. Edmodo chose to do 12-cost bcrypt, probably a couple years after they chose to do MD5, and to fix their database they upgraded by just hashing the MD5 hashes with bcrypt. So whenever you log in, your password is hashed with MD5 and then again with bcrypt. So when they come out, you can either do that layering internally or you can just crack them with a list of MD5 hashes. You don't even have to have the password, so that usually just looks like gibberish when I run it, because this tool will automatically parse that out for me. This tool was written by the creator of MDXFind, Waffle. I believe it was edited a little bit here and there by Hopps, one of

the other developers. It is custom to this, and they actually used it in the Ashley Madison talk that they gave. This is a new revised version of something they call BCVAL; it's a bcrypt validator. Right now, to slow it down, what I've done is I've had it do case twiddles. It is modifying the beginning and end character cases up and down, and it is doing 25 different variations throughout the password before skipping it and moving on. So that creates enough work that you don't see just a blur on the screen. You have a question? Is this tool open source? So the breaches, do they generally include the usernames associated with those hashes? So to

do the correlation, you have to have the usernames. So, um... Can you repeat the question for everybody watching? So, do the breaches generally include the user IDs associated with these hashes? I'm not sure I understand. I'm not sure I understood it the first time, and now I've revised it. Like you mentioned, you have these various breaches on the internet with hashes. Do they include the usernames associated with those hashes? Because otherwise you end up saying, okay, someone in the world used Pepsi7 as a password, and then what? Typically, for research and the things that I do, I strip all of that information out and I throw it away. There is some... kind of gray area, kind of a legal gray area, on whether or not you

can have this data. It is still technically stolen data, even if it's been posted publicly. So typically information like that, especially among the more legitimate researchers, is stripped out and thrown away, and what you are left with is just hashes, or just passwords, or, if you're Troy Hunt, just emails. As far as I know, he doesn't really save anything else from the databases. But you can find them. Typically, wherever the dump came from, before it got parsed by as many people as wanted to play with it, all the researchers and so on, the original versions, or at least nearly original versions, will have all of the data in them. And you'll have emails, usernames, IDs,

phone numbers, anything that got dumped. Sure. Thanks. Mm-hmm.

Looking at the list of the passwords that's scrolling across, I see clumps of very similar passwords. Can you explain? So that's an artifact of when I cracked the LinkedIn list. These passwords, as they appear in the correlation list, are sorted by that column, not the hashes, and it was sorted in the order that they cracked in Hashcat. So I cracked the SHA-1 from LinkedIn, then left it in that format, and then correlated across using the hash and plain, and the email hash, and then the email hash from Dropbox. So what you're seeing is a little bit of leftover clumping, because that's how Hashcat broke them the first time, and now these hashes are being broken by those same passwords in

the same order. It is interesting, though. I probably wouldn't have noticed that. Yes? Over here.
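The correlation step described here can be sketched as a simple join: plaintexts cracked from the fast SHA-1 breach become per-account candidates for the slow bcrypt breach, keyed on the email hash. The function and field names below are mine, not the actual tooling:

```javascript
// Hypothetical sketch of cross-breach correlation.
// crackedA: [{email, plain}] in the order Hashcat cracked them (breach A, fast hashes)
// targetsB: Map of email -> bcrypt hash (breach B, slow hashes)
function correlate(crackedA, targetsB) {
  const candidates = [];
  for (const { email, plain } of crackedA) {
    if (targetsB.has(email)) {
      candidates.push({ email, plain, hash: targetsB.get(email) });
    }
  }
  // Preserving crackedA's order is why recovered plaintexts later stream by
  // in the same "clumps" the first cracking run produced.
  return candidates;
}
```

Each candidate only has to be tried against one bcrypt hash, which is what makes the slow hash tractable at all.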

So obviously I'm seeing that a lot of the passwords are a lot shorter. And one of the passwords I saw scrolling by that caught my attention was just "Pikachu". And I was thinking that if that wasn't in the dictionary used as a word, it might as well just be a random array of characters. And obviously it's still too short to be a strong password. But would you say that something like that is generally better than, obviously, just a straight, normal English dictionary word followed by numbers? So in theory, if you have a rare word, however you might come about classifying a word as rare, a normal dictionary attack, and maybe even dictionary attacks with rules, won't find it. If it's short, it doesn't matter, though, because then

you fall under mask and brute force and everything else. The other thing is, sometimes words look a little more random or a little more rare than they are statistically. There are many attacks, like some of the cut attacks, where you take two halves of the same word, split them on the whole list, and then start shifting that list; you'll find words that are valid that you would never have thought of. So there are attacks that may find it here and there. Of course, if your word really is a word, it's probably in a dictionary. But again, if it's not, then yes, you should avoid most of the attacks that will attack your password. All right. Thank you. Over here, over there. So you mentioned that

one of the lists was not able to be cracked because there's a pepper in it. Yes. Based on your experience going through all these passwords and how they break down, do you think that generally, in your personal opinion, developers should, even if in theory it's not very imposing, have a hard-coded pepper in their code that maybe won't be dumped in a database dump? I don't personally believe in hard-coded peppers, because if I'm an attacker and I've broken in enough to steal your database, I've probably broken in enough to steal the pepper. Do you see that in practice? I have seen a few databases where a hard-coded salt or pepper was used. In almost all of them, it was either

short enough that we eventually broke it, or it was found by the hackers and then dumped with the database. Can you speak to how you found the pepper? I think we're a little out of time, but I'll talk to you outside about it. Yeah, they're typically pretty easy to find, and the dumpers have found them for us. Thank you. Of course. So I think that's it. ♪ ♪ ♪ ♪ ♪ ♪ ♪ Good afternoon and welcome to B-Sides Las Vegas. Deploying WebAuthn at Dropbox Scale. This is Brad; he's going to give the talk today. A few announcements before we begin. We'd like to thank our sponsors, especially our inner circle sponsor, Rapid

7, and our other sponsors, Amazon, Oath, and Simmel. It's their support, along with the other sponsors, donors, and volunteers, that makes this event possible. As you've probably heard before, these talks are being streamed live. As a courtesy to our speakers and audience, we ask that you please make sure your cell phones are set to silent. If you have a question, raise your hand and I'll bring you the microphone. Ask a question. Easy. Any other questions? We'll get started now. Excellent. Thanks for coming, everyone. We're going to be talking about WebAuthn today: how Dropbox supports it for second factor, what comes next for login, and generally looking forward. First, a quick disclaimer: I do

work for Dropbox on the product security team and I worked on the WebAuthn project, so that certainly informs my talk, but all opinions are still my own. I'm not actually here speaking on behalf of my employer. With that out of the way, let's start with a quick user story. So let's say I'm working on a presentation. Perhaps I want to get some feedback on it. Could be based on a true story here. Share it with my friend. They receive an email. Might look something like this. Nice blue button to click on. Go ahead and click. What happens? Come to this website. Username, password. Click sign in. Now I can view the file. So how did

this website actually figure out this was my friend, that my friend had access to my slides? Well, it's that password. That was really the only secret part of this. And if we start thinking about what we actually want out of a secret like this that authenticates us to a website, controls our accounts, one key element is that we don't really want to tell it to people. That's kind of a strong secret. You're not telling it to everyone. You're keeping it really restricted to only where it needs to be. And unfortunately, passwords are not so great secrets. Since you're in the passwords track of B-Sides, you may be familiar with this, but you're always entering them

on websites. You're essentially telling this web form, this website, all the time, "This is my password. Please authenticate me." So this, of course, leads to problems because it relies on me, it relies on all of you to not mess up, to not just once put your password into the wrong place. So if we go back to our example, maybe you're used to putting in a password if someone shares a file with you so you can sign in to your cloud provider of choice. So someone else, an attacker, can rely on that to send you an email that looks like someone is sharing a file with you, looks like it redirected you to the right place,

but it's of course really a phishing site. So all of a sudden, your password has been told to the wrong set of people, to the wrong server, to an attacker. They can compromise your account. That's one problem with passwords. The other problem is just that you have too many of them, and we're not actually good at remembering long random strings of letters and numbers. So typically, you might choose an easy-to-remember password, which often means easy to guess. So with that, you end up with both phishable secrets and weak secrets. You know, some smart people thought about this and were like, we can do better. We can add a second secret: second factor authentication. So we're probably

mostly familiar with this. You have an authenticator app on your phone. You have a code sent to you over SMS. And this really helps with this second case of having weak passwords or reusing passwords. Because now you have this second secret that's a little harder to get at. It's actually separate for each site potentially. Your authenticator app has a separate secret for every single site. Or the SMS infrastructure generates a new one-time code for you on every login. So we have this second secret where the part you're telling, it only works for a little bit. It's one-time use. It's a minute long. So we've reduced the risk here and really helped with the second case

of password reuse and weak passwords. But we haven't really solved the problem of phishing, because that part you tell still works. It's still working for a whole minute. So someone can just as easily ask for that part as well as your password if they're trying to phish you, then turn right back around and once again gain access to your account. The key lesson here, though, was that computers can actually help us to design stronger authentication schemes. So we have this human problem of not being able to remember passwords. One solution is a password manager, of course. Or in the second factor case, we can rely on our phone to actually remember

those secrets for us. And for phishing, if we want to solve that problem, we can see how a browser might be able to help us with that too. Your browser does actually know the difference between a phishing site and a real site. It might not know which is which, or that one is trying to masquerade as the other, but it does know they are different origins, different domains. And so we can actually use that information. TOTP was a pretty good start, the authenticator apps, because they remembered a different secret for each site for you. And the weak link was really that you had to actually be the one to enter the code

into the browser, and there was no guarantee that you were entering it in the right place. Instead, we can have our browser talk to our devices directly and handle that for us. As long as we keep credentials scoped to a particular domain, our browser can make sure that phishing sites can never gain access to a real site's credentials, because they're just fundamentally different domains. So enforcing that separation, using our computers, our browser, to help us: that's kind of the key idea of what WebAuthn is bringing to strong authentication. And then really the rest of the talk, we'll see how that actually works and then how you can build on that to have strong second factor authentication

and even potentially passwordless login and seeing in general where do passwords fit into this new world if we have strong authentication with WebAuthn. You can see here kind of a roadmap for what I just said the rest of our time here. And I'll go ahead and dive in to a bit more detail on what WebAuthn actually is.

So we have a nice, you know, technical-looking specification here. And this is kind of the most specific version of what WebAuthn is: a specification by the W3C that outlines a set of browser APIs that websites can use to authenticate users on the web using public key credentials. The great thing about public key credentials is, remember that secret part? We want a secret that we never tell. Here, the secret never needs to leave the authenticator. Only a public verification part is sent to the server. Of course, the other nice thing about this is those credentials are also going to be scoped to a website, as I mentioned. So if you're looking

at the spec, it can be a little intimidating, and it helps to just know first, you know, what's the overall structure of what WebAuthn actually lays out for us. So it's going to give us as, you know, where us would be, say, server developers, it's going to give us two operations here. We can register new credentials to a user and we can authenticate with them. And we have three main parties who we have to figure out how they're going to talk to each other and how they're going to behave. So we have this server that wants to authenticate users. It's also called the relying party, as you'll see that in the spec. Now the browser

mediates communication between all of these elements and the authenticator devices themselves. So this could really be kind of anything we'll just see going forward. You were talking about like YubiKeys, hardware tokens, also authenticators built into your laptop, maybe a TPM, maybe a fingerprint reader, maybe your phone. So this is kind of a wide class of things. But basically anything that can talk to the browser and actually undertake this protocol. Now if you're looking at the spec, or you're just getting a bit lost if I talk through things, it's helpful to remember that as a server developer, you can ignore a lot of this, and really as a user you don't have to worry about too many of the details, so it depends which angle you're coming in from,

but the spec really has to describe, you know, both, for all three of these parties what they need to do. So you can really focus in on, especially at the beginning, the party you're most interested in. So in this case it would be the server developers. And instead of kind of going more in depth and like specifically the messages and protocols, just going to kind of go at a little bit of a higher level of what are some key elements of both of these operations. And that'll set us up for how Dropbox actually deployed WebAuthn and what it looks like for different kinds of these options. So for registration, we're going to generate a request

for a new credential on the server. We're going to pass that to the browser, call navigator.credentials.create. This is this new API that WebAuthn specifies. The browser is going to go ahead and take that and pass it off to the authenticator along with what origin the user is currently on. And then they're going to undertake their own protocol. It's specified some in WebAuthn, but the real detailed specification that looks at this side of things, the browser to authenticator, is called CTAP2. So that's the one to look at if you're really interested in the hardware side of this. In any case, it'll come back. The authenticator will go ahead and create a new key pair, pass the

public key and some other information back to the browser, and the browser will go ahead and pass it back to the server where it can go ahead and parse and save this credential with whatever user is registering. So next, you've registered a credential with your favorite site. You go, you want to log in, maybe it's second factor, maybe it's login. In any case, the server's going to go ahead and generate ourselves a request, call this API again. The browser passes that to the authenticator, again with that origin, so it's only scoped to that specific origin. So if you're on a phishing site right now, the authenticator will not see the real site. It'll see the

phishing site and won't use the real site's credentials. Then the authenticator signs this request using the saved credential, passes it back to the browser, the browser passes it back to the server, or your JavaScript passes it to your backend, and now you can actually verify the response. So this is of course a key piece for security. You have to check that it's for the right origin, using the right key pair, one that the user you think it is has actually registered with you. So there's a series of verification steps in the spec for both registration and authentication, actually. They're most important in authentication. For registration, a lot of the information is actually not

signed or not as important because the real security comes here where you're checking the origin and credentials are correct.
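Part of that server-side verification can be sketched as decoding the clientDataJSON the browser produced and checking its type, challenge, and origin. This is a hedged sketch, not the full spec procedure: the real steps also verify the signature and authenticator data, and real challenges are base64url-encoded.

```javascript
// Check the clientData half of a WebAuthn assertion on the server.
function checkClientData(clientDataB64, expectedChallenge, expectedOrigin) {
  const clientData = JSON.parse(
    Buffer.from(clientDataB64, 'base64').toString('utf8')
  );
  if (clientData.type !== 'webauthn.get') throw new Error('wrong ceremony type');
  if (clientData.challenge !== expectedChallenge) throw new Error('stale or wrong challenge');
  // The browser fills in the origin itself. This is the check that stops a
  // phishing site from replaying an assertion minted for the real site.
  if (clientData.origin !== expectedOrigin) throw new Error('origin mismatch');
  return clientData;
}
```

The origin field is the phishing protection in miniature: the authenticator and browser bind the signature to the origin the user was actually on, and the server rejects anything else.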

That's the overview of how this protocol works. Before we kind of get further, I want to interject, if anyone has been researching WebAuthn and is confused by the names out there, or if you do later, it can be helpful to kind of see at an overarching level how these different keywords or organizations have interacted in the development of this. So you overall have the FIDO2 project that's WebAuthn plus the CTAP2 spec, which as I mentioned is the piece that describes the hardware-to-browser communication. These actually overlap a lot, they really work together, but they're kind of a collection there that was developed by both the FIDO Alliance and W3C. These came out of some earlier FIDO

specs that you may have heard of: U2F, UAF, and the hardware parts of those, which is now called CTAP1. So you can see WebAuthn evolved out of basically U2F and UAF, and CTAP2 is the newer counterpart to CTAP1. Where this gets a little trickier is that you can actually mix and match these. So you can have the browser API, you know, you're calling the WebAuthn browser APIs, but the browser is actually talking over CTAP1, maybe to an old legacy U2F-based authenticator. So there was a lot of care put in, actually, to how these could all be as compatible as possible. What you miss out on in that case is that some new features of WebAuthn only work if you have a CTAP2

authenticator. So hopefully that helps if you see someone marketing, oh, this is a FIDO2 or a WebAuthn authenticator. If it's a hardware device, that means it speaks CTAP2. If you see a website saying, hey, we now support WebAuthn, that means they've moved from the U2F browser APIs to the WebAuthn browser APIs. So hopefully that's, yeah. Sorry. Let's.

Is that better? Not as loud, but yes. Sorry about that. Thanks. Great. So, recap of this section before we move on. You know, if you were feeling a bit sleepy cat, hopefully you get a wake cat for the next section here. But in any case, kind of essentials are just WebAuthn, you know, lets websites authenticate you with some sort of authenticator device. Authenticator devices could really be anything that talks to the browser over this protocol and can store credentials. and the browser and authenticator are what's working together to make sure a phishing site never gets real site credentials. Now I want to talk about where Dropbox comes in. So if you look at, you know,

2014, 2015: Dropbox had second factor authentication, TOTP and SMS, and for all the reasons mentioned, wanted to do better. We've seen even recently the pitfalls of SMS second factor. There was a recent breach Reddit announced, saying, hey, we had SMS 2FA, but attackers were still able to bypass that; it wasn't as secure as we thought. As a developer at a cloud company like Dropbox, you would say, oh, this U2F thing seems quite promising. Fundamentally, U2F is what actually introduced a lot of the ideas that I've been talking about so far. So if you've been wondering what the difference is, we'll get to that, but they're pretty similar in the fundamentals, especially for second factor authentication. So Dropbox went ahead and implemented that, announced we have security key

support, quite exciting. And what happened? Well, some users did adopt it. They improved their security. This is good. Definitely encourage all of you to turn on security keys for whatever sites do support it, whether it's U2F or WebAuthn. But there was limited adoption overall for our user base. It's only supported in Chrome. And later Firefox 58 plus but still behind a flag, not on by default. And generally another key point was that there weren't that many hardware devices that users just already had that they could use for security keys for U2F. You had to purchase, go out and independently purchase something like a YubiKey or any of these other hardware authenticator devices. And so this

is a big barrier for getting kind of mass adoption. And what's really hopeful about WebAuthn is kind of taking this initial effort of U2F and getting the whole ecosystem on board, saying we're going to have more and more devices users already have, like their phones, like their laptops, have WebAuthn authenticators built in. We're going to support this protocol in all the major browsers, not just Chrome. So that's really the biggest reason to go from, you know, if you have U2F support and you're thinking about going to WebAuthn, like, that's the biggest reason is it's the future. That's where the strong authentication is going. And, you know, this might seem at first like, is this just

marketing? Is there, like, you know, anything of substance to this? Like, there also is as well. But it's not to be, like, discounted, this kind of ecosystem adoption, because that's really going to be the thing that actually makes it feasible for users to use this on a broad scale and actually be, like, the commonplace way of authenticating to sites on the web. It's basically connecting websites to hardware devices in a scalable kind of mass adoption way. Of course, we're not quite there, but that's the hope of the future. And then there are also some differences in the protocol. There's some additions, some new features. So kind of the two main ones are that you have

more options over credential creation in WebAuthn. So you have a little more control over what kinds of devices or credentials users might use. And you can also have authentication that works without usernames. It works without needing to say, hey, these are the devices you've registered before. Which do you want to use to authenticate? You can just send a request to the browser saying, please authenticate a user. And it will come back with some sort of cryptographic proof of some user device that they previously registered with you, including their user ID. So, with that in mind, you know, if you don't have U2F, a site doesn't have U2F, it's really time to just skip to WebAuthn. There's already more browser and device support. And it avoids

some complications we'll get into when you support both at the same time. So I have a quick video here just to demonstrate that there are already starting to be new devices out there. What this specifically shows is Touch ID as a second factor on a MacBook Pro, the new Touch ID bar. Today, if you get beta Chrome and turn on a flag, you can actually register Touch ID as your second factor. And eventually you can imagine using WebAuthn for that to be your primary login as well, or second factor on other sites. So this is a kind of screencast of what that looks like on the actual

Touch ID bar, as well as the prompt that came up. So that's a pretty exciting thing WebAuthn unlocks for you, you know, already coming up in the next couple months. And to see how to get there and how, you know, we had a solid U2F implementation, what really changed, I'm going to go into a little bit just like what is actually new at a little bit more detailed level. So even if you're not, you know, planning to directly do this, hopefully it still can kind of illuminate what really changed here and what WebAuthn is all about on a more detailed level. So in U2F, you had some identifier for your website app ID. You had,

you know, a registration request, so it has a challenge to make sure that request is fresh. And you had some registered keys, because you didn't want to duplicate people registering the same device multiple times. And in WebAuthn, you know, it's lined up, color-coded to what kind of corresponds there. So you have, you know, a relying party; it has an ID that replaces the app ID. You have a challenge. You have some credentials. And then you have a lot of new options; that's the part in purple. So we'll talk about some of those a little later. Signing: also pretty comparable. One new option at the bottom there

is user verification. This is a nice addition where you can actually request that an authenticator has done some sort of check that the user who's using it is the one it expects. So if you imagine like a YubiKey or hardware token, if anyone steals that, they can still just use it and press the button. The button doesn't check that it's actually, you know, you pressing it. Whereas you just saw Touch ID, that'll actually look at your fingerprint. So there was some verification by the authenticator there that you are who you say you are. If someone steals my laptop, they can't just use my Touch ID authenticator because they won't have the same fingerprint. So that's

a new option in WebAuthn to request. The other thing to look at is that you might see the app ID and the relying party ID are different formats. And this actually causes some backwards incompatibility because from the device perspective, since those are different formats, it looks like a completely different website. And since the protocol is pretty privacy-preserving across origins and also to prevent phishing, it'll treat those as totally isolated sites and won't let you use credentials from one with the other. So to get around that so people don't have to all re-register their old devices, there's actually an extension in WebAuthn called the AppID extension, so you can specify a legacy AppID to use so

users can still keep using those previously registered devices. But that extension doesn't exist for registration. So if you have a device registered with WebAuthn, it's not going to be usable in older browsers with U2F. But if you already have U2F-registered devices, those are going to be usable in both, with this app ID extension. So this does have a few confusing edge cases. The main one: let's say you have a user, they had a FIDO U2F key registered, and they lost it, so they got a new key. Makes sense? They go in, they register their new key, now it's on a new browser, we've switched to WebAuthn, so it'll get registered with

WebAuthn. And, you know, in this case, we'll say they forgot to delete their old key. Now, a little bit later, say they're at a library, they're at a friend's computer, they need to log in and get some file or get something out of their account. They log in on an older browser that only supports U2F. So, you know, if we're using Dropbox in this example, we'll see this and we'll say, hey, they have a U2F registered key. So let's go ahead and prompt them to use that. If they don't have any U2F registered keys, could send a separate error message and say, hey, sorry, your key doesn't work in this browser. But in this case,

they do have one that'll work. So let's prompt for that. But now the user sees this. They think, well, I have my security key. I'll tap on my new WebAuthn-registered security key. And, okay, so what happens? Now it comes back, and it looks like a completely different site to that WebAuthn key, because the app ID is a different format. So it'll say, hey, there's not actually a credential registered; it wasn't found, it's not registered. And so if you just present that same error to the user, it's kind of confusing. They're like, I just registered this key, what do you mean it's not found? So in this case, you can actually detect this might

be happening. You can't know for sure. It's possible they just actually did, you know, try to use a key that's not registered at all. But if you think it might be happening, you can go ahead and give some other error message explaining this case. So that was something that Dropbox did for that as well. The other thing to think about when you're switching from U2F to WebAuthn is looking at something called attestation. So this is cryptographically verified information about the device, and U2F, it's provided by default. If you have a U2F implementation, you've been receiving this attestation. And it can be useful to see device capabilities, like will this device be useful on mobile? Or

also, let's say you have an enterprise deployment and you've given all your employees a particular model of authenticator that you trust, and you only want them to use that one on your internal site, you can actually use attestation to require that. So there's some uses there. But on broad public-facing sites, there's less need to have it be cryptographically verified or to enforce things like that. So it's actually not provided by default. If you look at what Dropbox actually does here, it does request this information. You'll see, like, do you want to provide your make and model of your key? But it doesn't enforce that it's provided or cryptographically verify it, because there's not a need

to actually restrict the devices users are using. It's more to guide the product user experience around warning them if their device won't work on other, a mobile device or something like that, or just providing better UI to show what devices are being used.
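The two additions discussed over the last few minutes, the user verification option and the legacy AppID extension, both show up on the authentication side, in the options passed to navigator.credentials.get. A sketch with invented values; the AppID URL is a made-up example, not Dropbox's real one:

```javascript
// Illustrative assertion options for navigator.credentials.get({ publicKey }).
function makeAssertionOptions(challenge, credentialIds) {
  return {
    challenge,
    rpId: 'example.com',
    allowCredentials: credentialIds.map((id) => ({ type: 'public-key', id })),
    // Ask the authenticator to check the user itself (fingerprint, PIN, face),
    // so a stolen token alone isn't enough.
    userVerification: 'preferred',
    // Legacy U2F AppID, so keys registered under the old U2F format still work.
    extensions: { appid: 'https://example.com/u2f-app-id.json' },
  };
}
```

Note the asymmetry described above: the appid extension rescues old U2F registrations at authentication time, but there is no equivalent to make a WebAuthn-registered key usable from a U2F-only browser.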

The nice thing about WebAuthn, if you're even thinking, "Okay, this is great. I'm going to have more authenticators, Touch ID, YubiKey, things like that." Do I have to do a lot of work to actually support these new devices? And the answer is no. It's really nice. The browser actually does almost all of the work, especially if you don't care as much about attestation, because that is one area that can differ between device types. The only other real thing you might have to do is just add support for a new public key, private key signing algorithm. So usually that's just a few lines to translate the WebAuthn version of it to your favorite cryptography library's version

of it. So one example: pretty recently, Microsoft released a preview edition of Microsoft Edge that integrates Windows Hello, face sign-in, and PIN sign-in with WebAuthn. To make this work on Dropbox was just a few lines of code to add a new signature algorithm, and then it was ready to go. So if you have a preview build of Windows, you can also try this out for second factor. So now we have this implementation: new wrapper libraries that choose U2F or WebAuthn based on browser support, tweak the formats a bit for each case, and do all the processing. Time to roll out. So Dropbox went ahead and rolled out second factor on both Firefox

and Chrome. And you generally want to do second factor auth first, not the registration part of it, because if you already have U2F especially, you can always turn off WebAuthn auth and go back to U2F. But if you have people registering new WebAuthn keys, you won't be able to use those with U2F authentication or signing. So it's not compatible that way. So typically you'll roll out your authentication piece first, test it with, you know, some internal registrations, things like that. And then when you're feeling solid, go ahead and roll that out. In this case, Dropbox rolled out to Firefox and later Chrome when WebAuthn was stable in Chrome as well. Most recently, as I mentioned,

the Microsoft Edge preview edition is ready to go. So you might be looking at this fancy new WebAuthn experience and see some link at the bottom that says, hey, can't use your device? Want to get a TOTP or SMS instead? And you're thinking, well, OK, that seems like an issue. Couldn't someone just tell me to fall back to those and I'd still be tricked? The answer is: it's not ideal, I agree. But unfortunately, WebAuthn isn't actually everywhere yet. Hopefully in the future it'll be able to be WebAuthn-only everywhere. But for now, on a mobile app, it's still pretty difficult to get WebAuthn working, though that'll be coming soon. Dropbox has a desktop app, so

that's its own kind of separate piece of work to actually integrate WebAuthn into. Safari still doesn't have support for WebAuthn either. So, you know, this is something that's hopefully coming soon, but still, there has to be a balance between this increased security and just locking people out of their accounts when they need to have access. But if you are determined, and you know you only need to use WebAuthn on desktop, for example, you could sign up for a TOTP and then just throw away your TOTP seed. This, of course, is pretty risky, in the sense that you would want to really understand where am I able to use WebAuthn devices

and where aren't I? So stay tuned here for updates. But that's the current state. So just to recap what Dropbox has done so far and then what we'll talk about for the last part of the presentation here. Rolled out WebAuthn for second factor. You know, you can too. It's pretty well understood based on people's experiences with U2F, kind of how these flows should work. So generally pretty good there. There's a few tricky parts if you already have U2F with this backwards compatibility stuff. If not, just skip to WebAuthn. So that's what Dropbox has done so far. And then now I want to, you know, kind of, okay, that's where Dropbox is. Now more of my

own opinions, speculating on, like, okay, what comes next? How do you get from second factor authentication to an actual primary login experience? How do you potentially go passwordless? What's the future of passwords there? First, just to look at it at a technical level: WebAuthn 2FA is actually pretty useful for validating that you want to do WebAuthn sign-in, because at a technical level they're almost the same. The specification doesn't actually make a hard distinction between the cases in its basic operations of registering and signing. It's really about using the options it provides you differently in some cases. So one example of this is in authentication, or signing. In the second factor case, you would generally always provide this

allowCredentials field, because a lot of older devices, to save space, would offload storage of their private keys to the server. They would first encrypt them with a symmetric key, and that symmetric key was the only thing the device had to remember. The device encrypted each new private key it generated and passed it off, encrypted, to the server, which could store it. Then the server would pass that back in the allowCredentials field, and the device would decrypt it, recover the private key, and actually use it. So this was kind of a nice hack to save onboard storage on these little authenticator devices. This, of course, doesn't work if you

don't even know who the user is yet. You can't pass in these allowCredentials. But WebAuthn and CTAP2 specify something new called a resident key, a credential that's resident on the client. So this means the private key is generated on the device and actually stored on the device; it never needs to be offloaded to a server in this way. And if you do that, you can actually receive a user ID back in the response that lets you see, okay, which user actually tried to sign in here. Of course, you can also do sign-in without passwords but with usernames. So ask for the username first, then look up the keys, and, you

know, if you're just looking at taking your second factor code and turning it into login code, one thing you might forget is that in the second factor case, you've already authenticated the user with the password, so it might be okay to include something about their account to have better UI. It might be a reasonable trade-off, but you want to make sure you reassess that trade-off if you're returning it just after a username, because there's really been no authentication done at that point. But the spec itself shouldn't pose a problem here. It's just something to be aware of if you're porting that second factor code directly to the login case.
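The distinction being described, second factor with allowCredentials versus a resident-key login where the user isn't known yet, can be sketched as two ways of building the assertion-request options. These are plain Python dicts mirroring the spec's PublicKeyCredentialRequestOptions shape; the helper names are mine, not from any particular library:

```python
import os
import base64

def second_factor_options(credential_ids):
    """2FA case: we already know the user (their password checked out), so we
    pass back their registered credential IDs / key handles. Older U2F-era
    devices need this, since their wrapped private key travels in the ID."""
    return {
        "challenge": base64.urlsafe_b64encode(os.urandom(32)).decode(),
        "allowCredentials": [
            {"type": "public-key",
             "id": base64.urlsafe_b64encode(cid).decode()}
            for cid in credential_ids
        ],
        "userVerification": "discouraged",  # password already verified the user
    }

def passwordless_options():
    """Usernameless case: no user known yet, so allowCredentials is empty and
    a resident (discoverable) credential on the authenticator supplies the
    user handle in its response."""
    return {
        "challenge": base64.urlsafe_b64encode(os.urandom(32)).decode(),
        "allowCredentials": [],
        "userVerification": "required",  # no password, so verify the user here
    }
```

In the passwordless flow, the server identifies the account from the user handle returned in the assertion rather than from anything it sent down.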

Yeah, but in general, these are basically the same registration and signing operations. The harder part is policy. That's where it gets more interesting, because you want to balance users' choice, like how they sign into their accounts and what devices they want to use, against their account security, and also the risk of them locking themselves out of their account. So, one example of this: maybe you've built a WebAuthn signup flow where you want to say, hey, the user doesn't even have to make a password for my site anymore, I'm just going to use WebAuthn. So your user goes, registers a device with you, registers an account. And maybe

they registered, you know, the Touch ID sensor on their laptop. Well, now they're never going to be able to sign into your site unless they have that same laptop. This is what's called a platform authenticator. So if they ever go to a different device, they won't have the same platform to use, so they won't be able to authenticate. In this case, there's clearly a high risk of lockout if they lose that device or they're just somewhere else. So you might want to then say, hey, you need to register what's called a roaming authenticator, and actually, you know,

force them to register some sort of hardware token that can be removed and moved around between devices. Or maybe eventually a phone would serve this purpose, although right now most of the phone operating systems aren't quite there yet. So, that's one example there. The other is that if you have a YubiKey that just does what's called a test of user presence, where you press the button, it doesn't actually verify the user is who they say. You would want to be hesitant before letting users have that as their only authentication factor, because it's really easy for that to be lost or stolen. And in the case where it's

a second factor, that's not too bad, because they also have to know the password. Like, if someone, you know, if you have login through just plugging in a YubiKey and pressing a button, someone could go ahead and, like, try that on all of the major sites for, like, you know, common accounts and just, like, see what they find. And so it's a bit riskier to have, like, no form of user verification at all. So that's -- those are some examples of policies. The other one I mentioned was, like, in an enterprise deployment, you might want more control over, say, you know, if you're using biometrics, maybe you don't trust biometrics and you want to say,

okay, actually, don't use that. Or maybe you only trust how the biometrics work on certain devices that you've already handed out. So you can go ahead and use some of the other options in WebAuthn, like attestation, or a few other registration options to select different types of authenticators, to enforce some of those policies. So, just to summarize, really the two main questions are: do you require user verification? I would tend to say you should; otherwise, it's pretty risky. But, you know, it's still something you could think about. And also, what sort of restrictions on device type, whether it's platform or roaming, whether it's a specific model or

brand. In general, for like a large public deployment, you probably don't want to get too in the weeds with having to keep up with all the different devices. But, you know, there are cases where, you know, certain categories of devices and distinctions based on them would make sense. Another example is like on iOS, authenticators like hardware keys are only going to work if they're Bluetooth enabled because right now there's no NFC support. So you might want to make sure someone registers a device that's usable on iOS. But that's some questions that would be up to whoever implements it. So we have some ideas of you've got to set some policies, but it's understood we're working out what the best policies are. We could get to a

world where it'll definitely be possible to sign into a lot of websites using WebAuthn without a password. So what happens to the password? What are the use cases that are still hard to answer right now, or hard to directly get rid of? So one is recoverability. What happens if you lose access to all your devices? Like, something catastrophic happens, there's a fire, or you were carrying everything that authenticated you to a site in a backpack that got stolen. Who knows? You lost access to everything. How do you get back into your account? The nice thing about a password is that it's in your head. If you have

managed to remember it, and it's possible to remember a few strong passwords, even if it's hard to remember one for every site, it's still in your head. You can use that to log in from anywhere. So if you're without any of your devices, you can fall back to that more easily than what do you do in the WebAuthn case or in the passwordless case. So there are some solutions for certain cases here. Like you can fall back to your email if you're not an email provider, of course. There's another new specification that's pretty cool that's being developed that's called delegated account recovery. So this is something right now that like Facebook and GitHub support where

you can recover your GitHub account through your Facebook account. And it's actually also like privacy preserving. So the only thing the two providers know about each other is that you have an account on the other provider, but they don't actually know like what the user name is or what account it is. So it's a nice privacy preserving way to link accounts for the purpose of recoverability. So you could imagine the future of this, you know, is more widespread and you're able to kind of have, you know, one, you know, trusted account. Maybe it even has your real physical world identity attached to it. And you can use that one to recover all of your other

accounts. Maybe through actually like showing up in person, you know, if it was like a bank, you could show up in person with some other form of identification or a police report or et cetera. You could use the physical world's responses to these kind of catastrophic circumstances to bootstrap your online identity as well. So that's kind of one future-looking thing you could think about it. But in general, it's, you know, we'll have to see kind of how this ecosystem evolves. But at first, it'll at least reduce the frequency and number of passwords that are needed. Because at a minimum, a lot of sites, you know, you're probably fine recovering through an email account or something

like that. So you could just have them as WebAuthn only, and if you lose a device, maybe you recover through something else. The other kind of thing people are exploring in the WebAuthn ecosystem is having better ways to handle registering, say, a backup device that you could keep somewhere safe. And then if you lose the primary device, automatically transferring all the credentials from the primary one to the backup one in a less painful way than having the user go to every single site and do it manually, because that's obviously going to be quite painful if you're actually using these in a lot of places. So that's another kind of promising

thing that people are looking into. But I think recoverability is one of the biggest reasons that passwords might stick around for a while, just to have this ability to bootstrap from losing all your devices. The other area just to think about is that now that we have something like WebAuthn, what value does the password provide? Like, clearly if it's a weak password or something like that, it's not providing much, but it's still some additional factor. Like, do you still need it? Is there still a benefit there worth the usability cost? A lot of people, it seems like, are trending towards a no answer here. But I could imagine maybe for a few sites you're particularly

concerned about, you might make the effort to invest in remembering a strong passphrase in addition to your device. Or the other way this could go is that maybe, instead of using the password directly on the website, you use the password on your device to unlock it. So you have a very strong password for your device. That's probably better long term, because then that password doesn't need to go over the network, but it's still something to think about. And the implication this has is, how do we encourage users to adopt WebAuthn? First, should they adopt it as second factor authentication? Or should we just directly say, hey, you know,

use this to sign in? Does that answer change if they already have 2FA enabled? If a user already has a TOTP code and they turn on WebAuthn for sign-in, do you prompt them to get rid of their TOTP? Do you leave it there? Do you get rid of it for them? You just want to think through -- and I don't necessarily have the answers here exactly -- how much security any of these factors provide when multiple ones are being used. Because obviously sometimes it could be totally redundant and you don't need both. But there may be cases where you actually want these second factors even in the WebAuthn world. Yeah, so now

kind of just want to bring it back to all of you and say, you know, WebAuthn has kind of ecosystem support to bring, you know, the strong authentication to the masses, like browsers are on board, hardware vendors are getting there, but we still need more just websites adopting it, users turning it on, showing that, you know, people actually value it. So it's now on to all of us to kind of push this forward and to push WebAuthn to be more and more adopted. It's relatively easy and well understood for implementing for 2FA, so go ahead and do it. The login with device, you know, passwordless flows also, you know, seem to be getting closer to at least being tried out places. But the best practices around,

like, usability and policies, we're still figuring that out. So, you know, help figure those out as well. And just generally, we can all look forward to fewer passwords, less phishing, and more account security. So, I'll end there if anyone has questions. Thanks. Yes.
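The policy questions raised in the talk (require user verification? restrict platform versus roaming authenticators? require a resident key?) land concretely in the authenticatorSelection portion of the registration options. A minimal sketch of how a site might express them; the helper and its defaults are mine, but the string values are the ones WebAuthn defines:

```python
def authenticator_policy(require_uv=True, attachment=None, require_resident=False):
    """Build the authenticatorSelection dict for WebAuthn registration.

    require_uv:       insist the authenticator verifies the user (PIN,
                      biometric) rather than just testing presence.
    attachment:       None (any), "platform" (built-in, e.g. Touch ID),
                      or "cross-platform" (roaming, e.g. a USB key).
    require_resident: ask for a discoverable/resident credential, which is
                      needed for usernameless login.
    """
    selection = {
        "userVerification": "required" if require_uv else "discouraged",
        "requireResidentKey": require_resident,
    }
    if attachment is not None:
        selection["authenticatorAttachment"] = attachment
    return selection
```

For example, the "avoid lockout" policy from the talk might pass `attachment="cross-platform"` to force registration of a roaming key, while a passwordless flow would pass `require_resident=True`.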

Thanks. I'm curious if you see the need to either keep track of a whitelist or blacklist of devices in this case, just given that the devices are so new, if you find a vulnerability in one, what have you guys done about that kind of thing? Yeah, so far we haven't seen the need for that. There haven't necessarily been widespread vulnerabilities that have been publicized for a lot of these. It is a good question and kind of an example of why some of that attestation information is useful sometimes to potentially be able to do that. But it's definitely a big cost that in general you don't want to undertake unless you feel like you have to. So far it hasn't

been something that we've needed, or that I would think is really necessary at the moment. - Thank you for the great presentation. The problem spaces that you identified here are problem spaces that I've traditionally seen addressed with password managers, right? Phishing and passwords being shitty. How do you see this working with password managers? What is the strategic relationship you envision with this product? Yeah, that's a great question. Because they do solve some similar problems, especially if you look at autofill and relying on your password manager to make sure it's only autofilling on the right domain. So one thing that's certainly possible to imagine, and some people have talked about this actually, it's not necessarily my

idea, is to have sort of a password manager, but instead of storing passwords, it stores your private keys for the web, and then communicates over WebAuthn. And the advantage there is that you still don't have to have this symmetric authentication with passwords; you can actually have more usable public key authentication. So on the server side, there's less risk from a breach, because now the only thing stored there is the public keys. So that's certainly one path this could take to have more adoption in general and maybe be easier to use. I'm not sure how much that'll take off versus just signing in with devices like your phone that you already have. I think there'll probably be

a mix of both. I'm sure someone will create something like you described. And it will also be up to websites to decide do we really care if users are storing keys in hardware versus software. And for many websites the answer is probably no, actually. Like we're just happy that users are storing secure things somewhere instead of not having secure passwords. But there could be some cases where you do want to actually still enforce some sort of hardware security as well.
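The breach-resistance point in that answer, that a server holding only public keys has nothing an attacker can use to sign in, can be illustrated with a toy Schnorr-style signature scheme in pure standard-library Python. This is a sketch of the principle only: the parameters are arbitrary and unvetted, and real WebAuthn uses standardized algorithms like ES256 over P-256, not this construction. Requires Python 3.8+ for `pow` with a negative exponent:

```python
import hashlib
import secrets

P = 2**255 - 19   # a known prime; a toy group, not a reviewed parameter set
G = 2             # toy generator
Q = P - 1         # exponents reduce mod p-1 (Fermat's little theorem)

def keygen():
    x = secrets.randbelow(Q - 1) + 1        # private key stays on the "device"
    return x, pow(G, x, P)                  # server stores only the public key

def sign(x, message):
    k = secrets.randbelow(Q - 1) + 1
    r = pow(G, k, P)
    e = int.from_bytes(hashlib.sha256(r.to_bytes(32, "big") + message).digest(),
                       "big") % Q
    s = (k + x * e) % Q
    return e, s

def verify(y, message, sig):
    e, s = sig
    # g^s * y^(-e) == g^(k + x*e) * g^(-x*e) == g^k == r
    r = (pow(G, s, P) * pow(y, -e, P)) % P
    e2 = int.from_bytes(hashlib.sha256(r.to_bytes(32, "big") + message).digest(),
                        "big") % Q
    return e2 == e
```

The "website" sends a fresh challenge, the "device" signs it with `x`, and the site verifies with the stored `y`. Stealing the site's database yields only public keys, which cannot produce signatures.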

I have a very similar question to that. My business is very interested in WebAuthn, but our challenge is the portability of the keys between devices. Because we have entire help desks dedicated to just swapping out replacement laptops. I know this is more of a... But what is your personal opinion of how that portability of these keys should be approached? Because I've seen in the past people have said, oh, you know, we'll store it on the device. It'll be hard locked into a TPM or something. And you'll have to re-register. But, you know, we have employees that are probably on, you know, 30 different sites and we can't... We can't have them locked to a device like that. So I was just wondering if you could go even

further into your thoughts on that kind of problem. Yeah. Definitely a difficult problem. There are people doing work on it now. I was reading a cool paper where someone proposed a scheme to help with this problem, essentially, like you said, where you could imagine signing chains: have the old device be able to sign for the new device, and maybe have that preregistered, or some scheme like that, so the keys can still stay on each device but you get some sort of chain of trust. I'm not describing this in great detail, obviously, or exactly how it would work, but people are thinking about things like that

to help with it. Personally, I guess I don't have a strong opinion on like keys have to be, you know, in hardware to have like improvements over traditional password security. But at the same time, I do like when keys are stored in hardware because then they're less vulnerable to things like malware, which is a bit outside the main threat model I've been talking about. But it is still something that's quite nice to protect your credentials as much as you can. So personally, I'm definitely willing to have a bit of usability cost to have my keys on something like a YubiKey where they're hard to extract. But at an ecosystem level, then I would hope we

would be able to develop some sort of solutions like where maybe you can chain these devices together somehow or have it be a bit easier to automatically re-register a new device using an old device. Is it okay if I have one more question? Yeah. I've seen so many times, the first iteration of 2FA is often the SMS. And I also often see websites fall back, say, oh, you don't have your key, you don't have this other fancy thing. We'll just have SMS. I know it's out of scope, but... What is the response to how common just fallback to SMS is when it comes to WebAuthn? - Yeah, I think it's really the fact that it's still new enough that SMS just has a broader usability kind of

support out there. Like we have a phone, the cellular network will get a text to the user and that's going to work, whereas maybe their phone just has no way to talk to a WebAuthn authenticator that they have. So, it's a big challenge right now to kind of develop things that are both usable in all of these cases and secure. So, my ideal would certainly be that you can enforce WebAuthn only, U2F only. It just takes more work to get there to make sure any users who say opt into that are aware of like what it means for what devices they can log in on. And also hopefully pushing all of the hardware vendors or browsers that don't yet support it to

support it so it's not a problem. So WebAuthn will be fairly ubiquitous and you won't need to have some sort of fallback like that.

It's a quick question, really. Is it possible to identify if the website is using U2F or WebAuthn? Because probably hardware tokens are now available for both. Yes, so it's possible just by looking at basically the JavaScript APIs they're using. Now in practice, given how JavaScript code is minified, etc., I'm trying to think if there's a really quick way you could immediately see it. But it's definitely visible to the user, because it's code running on their browser. And oftentimes, you can look in the network responses and see what the challenge format looks like, or something like that. Or just search for: are they using the U2F API or navigator.credentials.get? The other way to test it is just-- actually, the easier

way, I guess, is in Firefox. You can toggle U2F on, WebAuthn on, WebAuthn off, U2F off, and see what works. But you might have to do some user agent, you know, spoofing too, depending on the site. I don't know if that means I should stop. No, that actually means I have a question. Oh, excellent. Yeah. By the same token, you've got about five minutes. So is there a way... I'm noob on this. Would it be a good thing if you could have your hardware token but also have a way to back up so if your hardware token gets destroyed, you can go to your vault and rewrite a new hardware token with the correct data? Yeah, that's, I think that's definitely one reasonable solution to the problem with the

caveat that now there's a new area to attack on that hardware device, right? If there's some way to back up the keys, malware maybe can also do that. So how do you prevent that? Maybe there are solutions here, like putting the device in a certain mode that unlocks this functionality, or pairing them device to device. A device could support that where you buy two and they automatically pair in some way, and after that you can't pair again, or something like that. So I think there are ways to work around some of the security implications. But in general, that's definitely

one possibility, yeah.

I recently read a write-up of Google's Advanced Protection. And then they have like a Titan version of that or something. Anyway, I think the advanced one requires that you have two tokens. And now that I know the difference between WebAuthn and the older U2F, I went and checked, and they're actually supporting U2F, the original FIDO spec, but with Bluetooth, so it doesn't do the YubiKey NFC. Anyhow, I guess sort of a generic question: do you have thoughts on Google's approach to that, and that they're not supporting WebAuthn, it looks like, for that higher level of account security? And... yeah, just don't know. Yeah, no, I think it's great. Google Advanced Protection, that is kind

of a U2F-only mode. I think they did kind of make some good trade-offs around the usability issues, like forcing users to register a key that is gonna be usable by their iOS device because it supports Bluetooth, and Google doing the work to add support into their apps on mobile phones to speak the U2F protocol from a mobile device. So overall, I think it's a pretty good approach. Clearly, I would like to see it move to WebAuthn to get more browser support on desktop, just generally for all the reasons I've mentioned. I think when it was released, WebAuthn wasn't ready to be used for this sort of thing; U2F was still really the only thing to use. So that is why

it started there. I think also, if you look at WebAuthn versus U2F support on mobile, there's more existing mobile library support right now for U2F. It's a little easier to do U2F on mobile than WebAuthn right now, but I think that'll change pretty soon. So I would hope they're moving towards WebAuthn, yeah. Cool, I think we're out of time for questions. So thanks so much. I'll be around as well if you have more thoughts. Thanks.


We're going to talk here at Guardians of GitHub with Dilip and Josh. Just to get us started, good afternoon. Welcome to B-Sides Las Vegas. This talk is Guardians of GitHub, given by Dilip and Josh. A few announcements before we begin. We'd like to thank our sponsors, especially our inner circle sponsor, Rapid7, and our seller sponsors, Amazon, Oath, and Semele, as well as another one at Random. This is the part I mess up. I'm sorry, I do not know. Oh, on the bag. Humio is another sponsor at Random. Thank you so much. It's their support, along with our other sponsors, donors, and volunteers, that makes this event possible. These talks are being streamed live, so please, as a courtesy to our

speakers and to everyone on the livestream, please turn off your cell phone ringers. That's the camera back there. And if you have questions, use the microphone so that those on the livestream can hear. And with that, let's get started. Cool, thank you. So the talk for today is Guardians of GitHub, and we'll be diving into a few of the security, I'd say assurance, controls we saw that are missing in some of the default GitHub accounts. But a brief introduction before we get things kicked off here. Myself and Dilip. Hi, myself, Dilip. I'm an application security engineer. I work on many projects that revolve around different areas of security. And I'm more passionate about vulnerability management and solving or designing security solutions. Thanks for welcoming me to B-Sides Las

Vegas. And this is my first time here. Yeah. And so Dilip and I are both part of Copart. Dilip focuses on the application security side; I'm heading up the overall InfoSec function, so definitely not as technical as Dilip. So he'll be answering some of the questions you guys have, probably towards the tail end of things there. To better describe the problem we're trying to solve: right now if you have GitHub, particularly the non-Enterprise version, there's some core security functionality that's actually missing from that. You can get those features by getting the GitHub Enterprise version, but MFA is one of the things that we absolutely had to have. I mean, there's some things you can probably give on when it comes to credential scanning and

some other pieces, but you have to have MFA, at least from our use cases that we had internally. Also, on-prem is a strong nice to have that you don't have that option unless you have enterprise. And then also, probably maybe the kicker, for the number of users we had within our environment, it was about $100,000 a year in order to be able to maintain GitHub Enterprise. So this is the problem that we were trying to solve, and I think we have a few others that were probably trying to have the same struggles as well. To give a little bit more color to this, one issue we came across was with credential management, where developers and

the like would be embedding plain text credentials and also access keys right into the code. Also, leakage of intellectual property, where a lot of times our developers, not being malicious, but sometimes just forgetful or just not practicing good data hygiene, would open up the repo and it'd be open to the entire public. So that was another risk we had. I think there have been a lot more breaches like this in the past year or so, particularly around Amazon S3. But I think there's a few issues. I think Apple had an issue with GitHub, if I remember right, maybe about six months ago, and there's probably going to be a few more

in the next few months or so. And then also, how do you actually secure access to GitHub itself? So this is where that multi-factor authentication piece comes into play. Talking about credential management: as I mentioned before, a lot of developers, not being malicious, just don't have a lot of solutions other than embedding the credentials, usernames and passwords, in plain text into the repositories. So that was one of the biggest problems. I think we saw a lot of movement in this space because we developed this probably about a year and a half ago. In the past year and a half now, you probably see a lot more options for this if you want to pay.
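The kind of commit scanning being described can start from a handful of regexes run over each committed blob. A toy sketch follows; these patterns are illustrative guesses, not GHG's actual rule set, and real scanners add entropy checks and many more rules to cut false positives:

```python
import re

# Illustrative patterns only; a production rule set would be broader and tuned.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "assigned_password": re.compile(
        r"(?i)\b(password|passwd|pwd)\s*[:=]\s*['\"][^'\"]{4,}['\"]"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan(text):
    """Return a list of (rule_name, matched_text) findings for one blob."""
    findings = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((name, match.group(0)))
    return findings
```

A CI hook or webhook handler would run `scan` over each new commit's diff and raise an alert on any finding, which matches the alert-on-commit behavior described later in the talk.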

But we wanted to solve this problem as well. How do you ensure that the credentials are actually secured? So you can make sure that maybe they're encrypted, but how do you know that someone's not using a different solution instead of your authorized one? So this was another problem we were looking to solve too. And last but not least, we have a team of several hundred developers. So how do you ensure that each developer is actually compliant? Sometimes they have their own rules. They may have one set for non-prod, another set for prod, or developers have their own solutions, right? If you've grown a lot from mergers and acquisitions, they

have their own products and solutions in place too. So you have to be able to come up with some customized reporting. That was one of the other issues we had with an IT group that's about 500 people strong. And leakage of intellectual property. As I already kind of touched on before, but right now there's tons of repos that are being exposed to the internet right now. And they're leaking everything from credentials to access your environments, but also any type of intellectual property as well. Depending on your organization, this may be more pertinent than the credentials. It just kind of depends. I think I already kind of touched on that point. There really was no great

solution to secure it by default. In some cases there are features that let you click a button and say, sure, no public repos at all, but that's not really manageable for a lot of organizations. You may have some repos that have to be public. So you don't really have what I'll call a manageable solution for most companies. And insecure access to GitHub accounts: we were finding a lot of issues with password reuse, and this is where we were actually able to use GitHub Guardian to help us out with that piece, particularly when it comes to MFA. That was one of the biggest pieces,

but also API key management. We had some service accounts that aren't able to use MFA. How do you ensure that your developers are properly managing their keys, that you're actually rotating them out, maybe on an annual basis, or per whatever policies you stand up internally? And managing stale accounts, which I think I already touched on too. So there are a few solutions available. The first one I talked about: you can pay $100K a year for GitHub Enterprise. I don't know about you, but I think I'd rather spend my money somewhere else if I've got $100K. There are plenty of other products, or even people, you can

probably use there. Second, third-party solutions. We saw, especially in the last year, a lot more players in this space that are able to solve some of these problems, but none of them are doing all of them. We found a lot of players, and you can just Google them too, that are doing credential management for GitHub and charging like $2 a user. I thought that was actually kind of expensive for just that one function. And there are tons more players in this space just in the past few months to a year. And then third is to develop our own in-house solution. So we

went with number three. So the solution that was developed: GHG. For legal reasons we had to change our name multiple times, but we settled on GHG when things were all settled. With GHG, we're able to ensure that no plain text credentials are available in code. Right now, as soon as a developer makes a commit that has plain text credentials in the code, we can send an alert. We're working on some improvements there, but this has been really helpful for us. In fact, when we first started this, we found quite a number of plain text credentials. We had to turn this off before we could

get everyone else into place, and then we were able to start actively monitoring for that piece. Private repositories: I already beat this one to death, talking about ensuring repositories are not publicly available, right? And then enforce multi-factor authentication. This kind of goes back to the other piece, because if you're familiar with GitHub right now, you can enforce multi-factor authentication, but you have to do it for all the accounts; I'd say it wasn't manageable multi-factor authentication. So this is one of the other problems we were looking to solve with GHG as well. So to recap, the features we included within GHG, so you can see everything that we're including: multi-factor authentication,

credential scanning for both usernames and passwords,
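As a rough illustration of how checks like the public-repo and MFA pieces above can be automated: GitHub's REST API reports a `private` flag on each repository (`GET /orgs/{org}/repos`) and accepts a `filter=2fa_disabled` parameter on the org members endpoint (`GET /orgs/{org}/members`, org owner permissions required). The `audit_org` helper below is a hypothetical sketch over already-fetched API responses, not GHG's actual implementation.

```python
def audit_org(repos, members_without_2fa, allowed_public=()):
    """Summarize policy violations from GitHub API responses.

    repos: list of dicts as returned by GET /orgs/{org}/repos
           (only the "name" and "private" fields are used here).
    members_without_2fa: logins returned by
           GET /orgs/{org}/members?filter=2fa_disabled.
    allowed_public: names of repos that are intentionally public,
           since "no public repos at all" isn't manageable for everyone.
    """
    unexpected_public = [
        r["name"] for r in repos
        if not r["private"] and r["name"] not in allowed_public
    ]
    return {
        "unexpected_public_repos": unexpected_public,
        "members_without_2fa": sorted(members_without_2fa),
    }
```

Keeping an allow-list of intentionally public repos is what makes this check manageable, as opposed to the all-or-nothing button mentioned earlier.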