
You're all set. I'm [inaudible]. So I did all the research, all the research, and it still
works. Good, all good. You did all the work. I didn't work on that [inaudible]. I'm talking about how this can work in the offensive world. So, I really suck at picturing people in their underwear, so feel free to get naked. I'd like to ask the women to come up front. Men, please... Going back, I mean, to Egypt: we weren't the first to do this. Mavituna Security did this first. They went through and downloaded over 500,000... uh, 5,000 repositories, with 400,000 words in their database. The only problem with copying that is that they used Google Code, and they did a lot of it manually. We tried... uh, Wick tried to do
this automatically, which sucks when you're talking about a website with CAPTCHAs. Yes. So Google Code is not the way to go.

I am a very lazy individual, so this is the only reason why I'm up here: someone in IRC posted this link. Wick said, "Nice find." pass V said, "I wish I had thought of it." I said, "That's awesome, I wish someone had done this with GitHub," and then he's like, "Next time." I could do it now. Remember, this is 2:00 in the morning, so I'm thinking I'm going to hammer this out and then go to bed by 3. So here's the first problem: I am half asleep, and the only thing... the only way I
could think of at this point is to just go through the top-rated repositories that they list on their website. It changes weekly, and I can easily scrape it. Problem solved, no CAPTCHA. Yes. So what I did was I took some basic Python functions, pretty much: wget'ed the page, pulled out all the usernames and repositories, dumped them into an SQLite database, and just started cloning them. Any questions so far? Understand? Pretty simple. So now I have all these repositories, which is a lot, and I have no idea what to do with them. So I took some Python, os.walk'ed all the [inaudible], did some sed command-line fu, lots of manual review, and then got the word list I
wanted. The problem is I spent about 17 hours. Notice how my 30 minutes at 2 a.m. went to... yeah, the next day, with a few naps. So I did a whole manual comb-through with sed scripts to get this all to work. Notice I didn't have to do any work so far. Awesome. So, os.walk took forever, and trying to get the counts into the SQL database was taking like 5 seconds per insert query, so there had to be a better way to loop through all these things. So I found BetterWalk, which claims that os.walk makes unnecessary API calls while it's looping through all the files and folders: it needs to figure out, is this a file, or is
this a directory? Please tell me, 'cause I'm too lazy to figure it out. Also, if you see on here, the link is... For all the links that we're going to be talking about, you can take a quick picture with your phone and use the QR code to go to those links. We swear there's nothing malicious on them, because he created them. Yes. So there's a nice little graphic chart showing the speed-up, and honestly I didn't really understand what they're talking about, but the chart impressed me, so I went, hey, let's use this instead. So now, the good news: I've got wordlists, they're interesting, and people seem to like them. But I only had what GitHub claimed was the
top repositories, the SQL transactions were horrible, it was way too much manual labor, and my hard drive was now full. We're talking maybe two terabytes at this time; my OS didn't even want to boot. How did you do that? What's your internet? Actually, my internet for this is a 756 DSL line. That is what I'm working with. So I actually used Python with threading and downloaded like 10 at a time. I started doing 30 at a time, but then I couldn't watch my Netflix, my kids were pissed, and my wife was angry because she couldn't watch her stories. It's just not in here.

So now let's get serious. My first problem is, how am I really
going to store this data? My options were Bitcasa: very cheap, and it had built-in indexing so that you could easily search it; it indexed on the fly. At the time this was done, six months ago, it was a Windows-only client. I crashed it every single time I launched robocopy, and it was very slow because it encrypted the traffic as you sent it over. Security features suck. Yes. So my other option was a NAS: way expensive, and they just don't grow on trees. But it would have been perfect, 'cause it's central and local. I could fire up 50 VMs and start downloading using Tor, from different IP addresses. I really had fun with this. Now remember, I've already got
three terabytes of data that I'm trying to shift around here. So there was my solution. That's what my computer looks like. There's a [inaudible], and these were all hooked up as external drives. I've done my job. Yes, thank you, Van. Awesome. So back to... that's just so awesome. Pictures of that: that's literally all the data on each drive, 10 terabytes in a database. Anyway, so at 2 in the morning, these are what I started with. I had them lying around, I had another drive lying around, and the other two were... Once I filled up what I had, I was going to stop the project, but then, thank you to some NoVA Hackers, they donated
extra drive space. They really liked the idea of what I was doing.

So now I need a better way to download this stuff. Here comes the API. I wish I'd seen it at two in the morning. This is after he went and just git cloned everything by scraping. Yes, this is after I spent the 17 hours doing all that. So here's all the nice, neat information they give you. They give you pretty much everything except for the original fork URL; if they give that to you in there, I couldn't find it anywhere. So I could keep track of where there was a fork of some other project, but, exactly, I couldn't tell who was the
master. So now I need to get rid of SQLite. It takes way too long to get the word list and to get the data in there; it just kind of sucks. Solution: MySQL. Now, I'm not a programmer, and I'm sure there are better solutions for trying to save all this data, but I'm lazy and I know MySQL. Now let's put it all together. Is it still on those? Yes, as of right now it is still mostly on those hard drives; we'll get to that later. So here are the upgrades I made to my code. I merged the modes into one Python script, so now I have one script that can either download or process repositories. I've added the multi... added
better threading, and added the MySQL code. I created a separate script, which you'll see later, that just takes a list of words and adds the counts for those words into the database. That was extremely unfun, with lots of manual stuff.

Did you try to use SQLite as an in-memory database, or was it on the file system? It was on the file system. Okay. Three terabytes doesn't work in memory. Oh, okay, good point. So even if I had thought of trying that at 2 in the morning, yeah, it would have worked for maybe an hour, and that would have been it.

So the database got the most upgrades. I said I want to know everything, so instead of keeping just the usernames and the repositories, I've
started keeping everything. I've got a count for the directories, a count for the projects, the emails that I manually pull out, unfortunately the usernames and passwords, and I keep track of the last seen ID so that when I call the API it knows [inaudible], all that good stuff. It's a lot simpler than my original way to download these things: it downloads, it clones, and then it dumps all the information into my MySQL database. Simpler, yes. Here's the processor, which looks incredibly [inaudible], but 90% of this was manual before this. So now I've got all these repositories. I am using the processor to os.walk and grab all these repositories, and I then run a script that dumps whatever's in the database
out into frequency wordlists. And the fun part is that I manually have to grep these repositories for keywords like password, username, internal only, and email address; I have a regex for email addresses. So it pulls all those out and dumps them into files. I cut and paste a whole lot of sed scripts after that to try and get rid of a lot of the unnecessary stuff, like "password", "password = " this, and try to get the passwords to the front. Then I do a cut, a sort, all that crazy fu, and then I manually have to clean them up. About a day's worth of downloads takes 3 days to manually go through and clean up all the passwords
that I've grepped out, so that I can add them to the database. So now I'm getting all the public repositories, and I am now generating my output: once I do the manual work and all that stuff, I can just press a button and get updated wordlists in a matter of minutes. And I keep track of what hard drive each project is on, so now I can plug in all the hard drives externally and then go find what I want. Hey, this is what I have; that's great. The bad news is that carving out all the data takes forever. I am estimating, based on the 10 terabytes I had and the ID numbers for the repositories, that it's going
to take about 30 terabytes to get everything, but with the number of GitHub repositories that are added daily and weekly, I may never get to the end of these things.

Do you create separate word lists depending on how you got the content? Like, these are passwords that we found, these are email addresses that might be passwords, things like that? Like, how many wordlists did you come up with? How many... Repeat the question. Can you repeat the question for the recording? He asked if I kept track of, or made separate word lists based on, how I found the content, such as: here's an email address that might be a
password. And in fact, I actually went through, and if there's an email address in a password field, I just deleted that line, because I knew it would show up in my email list. If you really want to use emails for passwords, you can just brute-force with the email list.

Nothing really needs to be said here. Have people actually seen this movie? Only the ones with kids? Okay, 'cause last time we did this, nobody knew what this was, and that's my favorite part of the movie. [inaudible] All right, so here are some of the lists. Alldirs is all of the directories that were found, and Allfiles is obviously all the files that were found.
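The Alldirs and Allfiles lists boil down to counting every directory name and file name across the cloned repositories and sorting by frequency. Below is a minimal Python sketch of that idea, not the speakers' actual script: it assumes the repositories are already cloned under one root directory, skips `.git` internals as an assumption (they are not part of a deployed web app), and the helper names `count_names` and `write_wordlist` are hypothetical. It walks with `os.scandir`, the standard-library descendant of the BetterWalk project mentioned earlier, whose `DirEntry.is_dir()` can usually be answered from the directory listing without an extra `stat()` call.

```python
import os
from collections import Counter


def count_names(root):
    """Count directory names and file names under `root` (hypothetical helper).

    Older os.walk (before Python 3.5) called os.path.isdir(), i.e. stat(),
    on every entry; DirEntry.is_dir() can usually answer from the listing.
    """
    dir_counts, file_counts = Counter(), Counter()
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as it:
                entries = list(it)
        except OSError:
            continue  # unreadable or vanished directory: skip it
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                if entry.name == ".git":
                    continue  # assumption: git internals are not web paths
                dir_counts[entry.name] += 1
                stack.append(entry.path)
            else:
                # regular files, plus symlinks (which are not followed)
                file_counts[entry.name] += 1
    return dir_counts, file_counts


def write_wordlist(counts, out_path):
    """Write one name per line, most frequent first."""
    with open(out_path, "w", encoding="utf-8") as fh:
        for name, _count in counts.most_common():
            fh.write(name + "\n")
```

Since Python 3.5, `os.walk` itself is built on `os.scandir`, so a plain `os.walk` loop gets the same speed-up; the explicit stack here just makes the mechanism visible. Usage would look like `dirs, files = count_names("/mnt/repos")` followed by `write_wordlist(files, "all_files.txt")`, where both paths are placeholders.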
Usernames and Passwords are the things that were grepped out for users and passwords. Does anyone see anything that is at all surprising on here? Yeah, foobar is actually up there. I don't really test for foobar normally, but it's there. So we're going to have a link to the word lists later on, but that's the link if you want to take a picture of it right now, because I'm going to go faster. Yeah, to keep this really, really interesting, all these word lists are stored on [Laughter] GitHub. All right, so, pretty graph. You know, this is where it gets interesting to me. All the obvious stuff: wordlists for forced browsing. I got "forced browsing"
from a web app guy, so I don't really know what it means, but I know that it's something I use a brute-force list for, so that's kind of the answer. So, the wordlists: forced browsing, just like the SVNDigger project originally stated; creating a smaller default passwords list, since I love the RockYou list for cracking, but it sucks for trying to brute-force anything; and static salts, which we'll kind of get into later. Which is awesome: the number 22 file in the all files list, the thing that shows up the 22nd most times, is exceptions, or exception.php. How many of you who do web app testing look for
exception.php? One. Not after today, right? And guess what: exception.php usually handles exceptions or errors, and guess what it usually outputs? Awesome stuff. So I started looking for that. Also, I didn't normally look for file.php, just because I'd look for upload.php instead, but it turns out file.php is more common than upload.php. So, file.php. And password.txt in web applications? Yes, please. Wow. Yes. So just keep in mind this is out of 19 million, right? So even though it's 4,819 down the list, this is out of 19 million words, or files, I'm sorry. So this is how you do forced browsing. It's a technical
term. Yeah. SSH auth keys, number 37,000, stored on the web application. Thank you. ntlm SSO magic: I just thought this was funny. It's not really high up on the list, but the file really just has a password in it and a [Music] username. Magic. Were there any other crazy ones? You just did them. Yes. All right: could have, would have, should have, right? So what's it about in the real world? Did everybody see the whole secret token thing about GitHub, where if you store your secret token there, I get command execution? So there's a great list of where you can find those secret tokens now, and then you get command execution on it, unless you change your secret token.
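To make the secret-token point concrete, here is a minimal Python sketch that flags committed Rails secret tokens in cloned repositories. It is an illustration under stated assumptions, not the speakers' code: it assumes the conventional Rails 3.x/4.0 file `config/initializers/secret_token.rb` with generated `config.secret_token = '...'` or `config.secret_key_base = '...'` lines, the helper names `find_secret_tokens` and `scan_repo` are hypothetical, and the 30-character minimum is an assumed filter for short placeholders. A known token lets someone sign their own session cookies, and on Rails versions whose cookie store deserialized sessions with Marshal, that is the usual route to the command execution described above.

```python
import os
import re

# Matches generated Rails initializer lines such as
#   Blog::Application.config.secret_token = '3f1c...'
#   Blog::Application.config.secret_key_base = '9ab2...'
# The 30-character minimum is an assumption used to skip short placeholders.
TOKEN_RE = re.compile(
    r"""config\.secret_(?:token|key_base)\s*=\s*['"]([^'"]{30,})['"]"""
)


def find_secret_tokens(text):
    """Return every quoted token assigned in `text` (hypothetical helper)."""
    return TOKEN_RE.findall(text)


def scan_repo(root):
    """Yield (path, token) for each secret_token.rb under `root` (hypothetical)."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip git internals
        if "secret_token.rb" in filenames:
            path = os.path.join(dirpath, "secret_token.rb")
            with open(path, encoding="utf-8", errors="replace") as fh:
                for token in find_secret_tokens(fh.read()):
                    yield path, token
```

A hit only shows that a token was committed at some point; it stops being exploitable once the token is changed.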
Which everyone does, right? Everyone changes their secret token now. So this is where, um, I can't pronounce his last name, Joe or Chon or something like that, sent a thousand emails, a thousand-some-odd emails, to all of the people who had secret tokens stored on GitHub that he could find. I'm not doing that for anybody. And there's the Metasploit module going to be released, and blah blah blah. So yes, a pull request? What, it's already a pull request? Yes, it's already there, so you can... It's not committed, like every other pull request, but hey, just saying. All right, so, love you. So, start parsing every... So, the not-so-obvious
stuff. I talked about, you know, the files and stuff, but the not-so-obvious stuff is to parse every file from the git history. The great thing about git is that when you clone something, you get the entire history of that repository there for you to look through. Well, if you store a password or a key and then you delete it later, because you thought, oh damn, I should have believed the secret token thing, and you commit the change, it's still in the commits, and he can look through it. It might be on my list. Another one is static code analysis. If you just want to find a ton of, like... you take a PHP analyzer and you
go through every single repository and say, oh, here's one that has 500, or 5,000, problems. The other great one is finding .svn in git repositories, where they store credentials in the SVN for some reason; .hg works as well. Another one is .settings. So, parsing .gitignore: what's great about parsing .gitignore is that now I know which files you want ignored or which are automatically generated by your web application, and now I can request those files, like configs, so things that you don't want stored up there. Anyway, you guys get it. And the interesting one that I kind of found was on GitHub: if you want a directory to stay there, there are a
couple of different ways, but one of the most common ways is just to put a file called "empty directory" in there. Now, say I'm looking at a website running your application, you have a bunch of empty directories, and you have at least some security on there, so I get a 404 when I don't know what application you have. If I look for the empty directory file in there and I get nothing, or at least a 200 but it's blank, now I know that that directory is there. Now I know what web application you're using, and now I can do a lot more. The same goes for .DS_Store, but pretty
much everyone who's ever used a Mac knows how annoying a .DS_Store file is. More fun stuff: running OCR on all of the image files. We've actually found people who took screenshots of default usernames and passwords because they thought they were cool and secure like that, and yeah, OCR finds them pretty well. Using the list of text files to do some intelligence gathering, and grabbing all the email addresses; I talked about that. [inaudible] ideas. That's it. Now, here's a little update since the first time I gave this talk: I now have a file server with 16 drive bays, so now I've got 10 terabytes that I'm trying to slowly move
to a file server. I think so far I have 12 terabytes in it and 11 more slots open, so send me your old hard drives. Yeah, NoVA Hackers donated this thing to him. Yes. If you have hard drives, or just want to give him money for hard drives to help him with this project, please do; there's awesome stuff on there. And I said this at NoVA Hackers, and he's like, what? I won't do that. If you give him stuff, he'll do two queries for you. And I actually set up something on my web page: there's a little form that you can fill out if you have ideas of
something that I'm not grepping out and you want to see a word list for it. Just submit it as a regex, or if you just have an idea, like, hey, why aren't you pulling this out? 'Cause I haven't thought of it, and I'll start pulling it out for you.

So have you used these in the real world? Like, have they been useful for you? I mean, it's totally awesome the way it is now, but I'm just curious if they've been useful for you. Yeah, for web applications especially, they've been awesome for me, especially the top 10 directories list. A lot of the stuff I just wouldn't have even thought to look for,
like the exception.php, which isn't a directory, but it's one of the things where I was just like, oh, now I know where all this is. And it actually resulted in code execution... one time? Since this is being recorded, I'm going to say [Music] no. But the most interesting stuff... Like, it's cool to see what's mostly used, but the really interesting stuff is when you scroll down to the bottom of the web page and you start seeing the ones: one, one, one, one. It's like, oh, these all look legit. Has no one ever done malware analysis? Found a one.exe right there. [inaudible] All the feds in the room are like, oh! I found... The email addresses I
have in the list for email addresses: I pulled out all the domains, because spammers aren't getting this list unless they give me lots of money, if you're out there. But there are a whole lot of references to botnets and command-and-control code. I found some malware code in it, so I've actually had to turn off my malware scanners on the VM that I'm processing this from, or it causes all kinds of problems, and I'll go back and go, why do I only have half of this repository? Yeah.

Are you looking for licenses, like license files or license keys? Not yet. I've got them. Are you looking [at those] files? Not yet. No, no. And actually, the static salts, I
actually stopped pulling them out, because there were so many static salts being used, and I wanted to... I wanted to submit for the CFP, so I just concentrated on emails, usernames, and passwords. But there is so much stuff to pull out. I've had an idea to create a web page that's kind of like a... the CAPTCHA? Well, that too, but it's kind of like a CAPTCHA, where you'll go to it, you'll get maybe 10 lines of a raw file, and you say these are supposed to be passwords, so I can kind of crowdsource this and get through it, because it's just me sitting, watching TV, going through it. I have got so many shortcuts for
my F keys in Vim, it's not even funny: say, oh, this line, this line. No one uses Vim? I use vi. I use vi too. It's all about the... I have also used nano. I am not afraid to say that. It's all in the box; I will use it. All right, cool. [Applause] Interesting. That's it. Thanks. Thank you. [Applause]
That worked out really well... minutes.