PW - Harvesting Passwords from Source Code Repositories - Philippe Paquet

BSides Las Vegas | 36:55 | 22 views | Published 2016-12
About this talk
Passwords BSidesLV 2015 - Tuscany Hotel - August 05, 2015
Transcript [en]

Good to go. Good morning. My name is Philippe Paquet, and you are at the talk Harvesting Passwords from Source Code and Code Repositories. We have 10 minutes at the end for questions and answers. The presentation and tools are going to be available on GitHub. I don't have them on GitHub right now, but by this weekend, I promise you, everything is going to be there. So why this talk? I've been stumbling on passwords in source code very, very regularly and getting some amount of frustration about it, and because I'm curious by nature, I started to dig in that area and found a lot more than I was expecting. You know, I was expecting to find in some

repos just a few passwords or some credentials, and what I found was actually very impressive. If you want to own a desktop network, you go after the domain controller. If you want to own the data center, go after the code repository: everything is there. To illustrate the issue, let's see what we can find on GitHub. A couple of considerations: what you're going to see was done in the last 6 months, so if you do the search today you're going to find different things. The same results can be replicated in about every kind of public source code repository. And I specifically avoided searching for generic, broad terms, just because of

the sheer volume of information there. To deal with that volume you need tools, and we're going to talk about that later. When you do just a search on GitHub, only the default branch is considered, so if someone branches and puts a password there, you're not going to see it. If it's a very large file, you're not going to see it either. So there's a lot more to find than a simple search is going to show. One way to search GitHub without actually using GitHub is to use a search engine like Google: you just need to restrict the domain to the githubusercontent domain, and you're going to be

able to search with expressions that are a lot more sophisticated than what GitHub provides. So let's start with Authorize.Net. Authorize.Net is a payment processor. If you look at their sample code (I've got some sample code there), you can find some keywords: API login ID, transaction key, MD5 setting. A simple search for transaction key, API login ID and MD5 setting actually gives you a bit more than 1,100 results, and that gives you 31 valid sets of credentials. That's kind of a problem if you have your payment processor credentials available on the public internet. You could process payments, of course, but you could maybe process

negative payments, or do a purchase and refund yourself after. Shodan HQ: Shodan is an interesting example because it's a tool really geared toward the security community. It's a search engine that allows you to search in HTTP headers, so it lets you find and identify devices on the internet. If you want to search for a specific brand of router, you can use Shodan for that. Because it's geared toward the security community, you would expect the results to be a lot better. Their sample code has a keyword, SHODAN_API_KEY, so let's search for that. We only find 114 results, but we still find 15 valid keys.
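The two searches above come down to looking for a provider's sample-code variable name sitting next to a string value. Here is a minimal sketch of that idea in Python, under stated assumptions: the keyword spellings are guesses at the names the talk refers to, the pattern and the `find_candidates` helper are illustrative, and the sample text and its values are made up. It is not the speaker's tool.

```python
import re

# Hypothetical keyword list: spellings are assumed from the names mentioned in the talk.
KEYWORDS = ["API_LOGIN_ID", "TRANSACTION_KEY", "MD5_SETTING", "SHODAN_API_KEY"]

QUOTE = "[\"']"  # either quote character

PATTERN = re.compile(
    r"\b(\w*(?:" + "|".join(KEYWORDS) + r"))\b"  # keyword, optionally prefixed
    + QUOTE + r"?\s*[=:,]\s*"                     # optional closing quote, then = : or ,
    + QUOTE + r"([^\"']+)" + QUOTE                # the quoted value
)


def find_candidates(text):
    """Return (keyword, value) pairs for every keyword assignment found in text."""
    return [(m.group(1), m.group(2)) for m in PATTERN.finditer(text)]


# Made-up sample input; none of these values are real credentials.
sample = '''
define("AUTHORIZENET_API_LOGIN_ID", "example-login");
SHODAN_API_KEY = "example-key"
greeting = "hello"
'''

for name, value in find_candidates(sample):
    print(name, "->", value)
```

Every hit is only a candidate: as the result counts in the talk suggest, most matches are placeholders or dead keys, so each one still has to be checked for validity.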

So the results for a tool geared toward the security community are about the same as for something very sensitive geared toward the general public. One anecdote: if you search for that keyword, you're also going to find the Hacking Team Shodan key, because their source code was leaked and put on GitHub. Amazon Web Services: as most of you know, it's an infrastructure provider. What is particular about them is that the keys start with AKIAI, so if you just search for AKIAI you get 100 code results, you get two valid keys, and you get something kind of interesting: people trying to hide their Amazon key in public source

code by breaking the key into parts. The only problem with the technique is that the characteristic part of the key was kept intact, so it's very ineffective. Don't do that. We can also check for default passwords. Default passwords are very interesting: most of the time they don't get changed. If you have a default password, the chance that your users are going to do something about it is very, very small, unless they have to and you oblige them to do so. If we search in files with the extension properties for default password, we get about 208 code results, 28 default passwords, and some of them are going to look very,

very familiar. Don't do that; it's not a good way to start the day. If you search for default password in YAML files: 72 code results, 23 default passwords. Something very similar, but you also find some stronger passwords there. The only problem is that they're on the public internet, so a stronger password is not going to help you. A bit more fun: let's search for backdoor passwords. I mean, if you're putting a backdoor in your source code, you're probably going to do something special to hide it. Yet if you search for backdoor password, you still get 16 code results, and for backdoor passwords, yes, that's a lot more

than I was expecting. That's just to give you a few examples; you can expect a lot more. It's possible right now to find keys and credentials for every major provider, and that's on public source repositories. If you go to an enterprise source code repository, because of the sense of security and privacy that people have, you're going to find a lot more. If you want an API key for FedEx or UPS (I don't know, maybe you want to ship things for free; don't do that, please), just search GitHub. For about every major payment provider, you're going to be able to find keys. Amazon Web Services: there are thousands of keys. They are actually very

good, because they search public repositories for their keys to be able to disable them. Consider that if you're in the business of providing a public API. If we want to get more out of this than a simple search, we need a systematic approach, and it's actually really simple. Harvesting passwords is trivial. What do you need? You need to search for definitions and string assignments, and you can also search for particular function names. To do that you need regular expressions, which are very easy to develop, and you need keywords. If you're not familiar with regular expressions, they are a way to match a specific pattern. So keyword = password is a specific pattern where you

have one word, some space, an equals sign, some space, and another word. The regular expression is going to match that pattern and get you the information. When you develop your regular expressions, you have to develop them per language. I'm going to give you some examples. We're going to go through them very quickly, because this is not a talk about regular expressions and I'm going to give you all of them anyway. For JavaScript you have different patterns you can match with regular expressions; you can see they are very simple. With PHP it gets a bit more complex because strings can use single quotes or

double quotes, so you have to consider all the possibilities when developing your regular expressions. Python is the same as PHP; just consider that you can have Unicode strings, and because of Unicode strings you need to account for that little u there. For C, I'm going to give you an example where we have a function that usually contains credential keywords. So you have your set of regular expressions for a good set of languages. Now you need a set of keywords. Harvesting keywords is very easy too: you choose your target service provider, you download the sample code, and you look at the names of the variables. Programmers are usually going to take the sample code, copy and paste

it into their source code, and they're not going to change the names, so it's a very good way to get a set of keywords. For example, Amazon Web Services: that's the sample code, that's the keyword you're going to get, that's a function call from the sample code, and that's a regular expression to find that function call. To do all that, I'm going to give you two tools to automate harvesting. They are very simple tools; they don't take much time to develop. The first one is called password search. It goes through a file system and, depending on the type of file, applies a set of regular

expressions and uses a set of keywords with every regular expression you have there. I'm going to provide the regular expressions and keywords. The second tool is called password crawl. It's very similar, except that it integrates a web crawler, so when you have a private enterprise GitHub repository, you point the tool at it, push a button, and you get an extraction of all the passwords or API keys it can find. Now, these are auditing tools. I am not going to provide a way around GitHub's restrictions. GitHub has abuse restrictions, so if you point the tool at github.com you're going to have some problems and you will have to modify the tool. But it works

actually really well for your internal repository. Now that we've talked about getting the passwords, there's a part of this talk that is dear to me: how did we get there, and what are we going to do about it? It's a terrible situation to have that amount of credentials not only publicly available but in every enterprise's source code repository. That's not something that should happen. There are a few problems that led us to this situation. The first problem, credentials in source code, comes from the development process. If you are developing source code, a tool, or an application, you're going to use an iterative process: you're

going to start with something relatively simple, and there's nothing simpler than putting hardcoded credentials in your application. You don't read a file with your credentials; you just put them there. The problem is that you're going to push that version, even if it's not the final one, to your source repository, just for backup. That's one way you end up with credentials there. That's a problem: with historical versions of credentials in a code repository, you can find a lot of things. Now that you have hardcoded your credentials in the repository, what's going to happen is that you're going to have pressure to put more features in your

application. So what do you do? Are you going to clean up something that is already working, or are you simply going to add the feature that your management is pressing for? You can bet that a certain number of developers will continue working on new features and leave that for later. They believe they will be able to clean up that code; most of the time they are not. Six months later, a year later, you end up with a mass of credentials in your source code that the developers have the goodwill to clean up, but they don't have the time and it's not a priority anymore. So what are you going to do

about it? What is really important is to design a solution, and by that I don't mean a technical solution. It's more a process for how developers are going to deal with credentials. If you standardize where they're going to store all the credentials and exclude that kind of file from your code repository, you're already going to be way ahead of everybody else. That will make sure you don't have hardcoded credentials in your source code repository. You should also consider that you want separation of duties with separation of credentials. If you're developing with your production credentials, that is a really large problem. Development happens with development credentials, then you move your application to QA and use QA

credentials, then you move it to production with production credentials. That seems simple, but I've seen people developing with production credentials time and time again. There are a few problems with that. If you write a bug that drops a database table and you have production credentials, you're going to drop your production database. There are a lot of good reasons you can give developers not to do that; it's a safeguard for them. But you cannot go to them without a full solution, and that full solution includes involving the IT team or the technical operations team. Everybody needs to know how it's going to work. You go to the developers, you educate them, you say: okay,

we're going to put these credentials in a properties file, and properties files are going to be excluded from the repository. Now, that's done on the developer side. If your technical operations team doesn't know that the credentials are going to be in the properties file, if they're not expecting that, you're going to have miscommunication between the development team and the technical operations team. That is going to cause frustration, that frustration is going to go back to the developers, and they're going to go back to their old habit of hardcoding credentials. You need to find a path of least resistance that they can use for their credentials. If you're proposing a path that is more complicated than hard

coding credentials, they will not use it. That's a significant problem. Not only do you need to have everybody involved, they also need to understand why. The main argument you can give them is that hardcoding credentials makes it incredibly difficult to cycle the passwords when they need to. You have an application in production, you have hardcoded credentials all over it, and now you need to change a password because it has been compromised or because it has expired. Your technical operations team is not going to be able to do it; they're not going to trawl the source code trying to find passwords. Your developers may not be

available: it's an application they finished a year ago. Are they going to be willing to go back to a year-old application and look for passwords to update? Probably not, and your time for cycling a password is going to be gigantic. So that's the main argument you can bring them to get credentials separated from source code. Another large problem is default passwords. That shouldn't be a problem, because default passwords get changed: people install the application and of course they change the password. But we know that they don't. So the main way to deal with that problem is actually not to use default

passwords. It's a bit more work for your developers, but if you can convince them that they should not have default passwords in their application, and that they should force the user to create an account when the application is first run, you're going to save yourself a lot of hurt. It's a small effort and it's not impossible; you can do that in every kind of application. Look at WordPress, for example: WordPress forces you to create an account. If it's good for WordPress, it's good for you, and it's good for every kind of application. The third problem is sample code. If you're working for a large provider that has sample code out there, that's

about you. On one side, this is your fault. Sample code gets integrated as is: developers copy and paste code into their application. I understand you want sample code to be concise, and because you want it concise you hardcode credentials, which means that hardcoded credentials end up in applications. It is your fault, and you need to do something about it. You should treat sample code as production code, because most of the time your sample code is going to end up in production. They copy the code, paste in their credentials, it works, and they're not going to look back. And six months later they're going

to be breached because of that. Now, the meat of this talk for me is the root of all evil. It is tempting, if the only tool you have in your hand is a hammer, to see every problem as a nail. So think about this: passwords are an authentication method that was designed for humans. It's something you know, and that knowing implies safe storage. I have a password in my head; I can access it whenever I want, but you cannot get it from me. You cannot read my mind. So I have an implied TPM that protects that credential and allows me to use it when I want. That's what we have. So we have a

developer who goes to work in the morning. He logs in with a password on his computer, he logs in with a password on a website, and he's writing an application. The problem is that applications are not humans. Applications do not provide safe storage. If you're putting your credentials in an application, you don't have a TPM there. Maybe you're one of the lucky ones who actually do have a TPM for your application, but most people don't. So right now your developers are going to use an authentication method that is designed for humans with applications, and to me that's the root of all evil: we're using an authentication method that is not adapted to the

problem. I don't have a solution for you, but with the community here I want to get to one. Maybe we could do something with white-box cryptography. If you don't know white-box cryptography, the idea is that I give a certain key to a tool, and the tool generates an amount of source code for encryption, decryption, and key verification, where the key is spread all around the code. That way it's very difficult to reconstruct the key. That may be an idea for a solution, but so far I haven't come up with anything good. You want to authenticate applications. If you're thinking about,

for example, "but I can use keys": the problem with keys is that you're not authenticating a particular application. Application B can use a key, application C can use the key as well. You're not authenticating the application; you're authenticating the key. So I'm relying on you guys to come up with a solution that is better for applications. As long as we keep credential systems designed for humans in applications, we will have a problem where credentials are spread all across applications, across source code, across infrastructure, and we will be able to harvest them and use them in a not so good fashion. I'm a bit early there. Do you

have any

questions? Oh, yes.
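The harvesting approach described in the talk (a regular expression per language for string assignments, combined with a keyword list, plus a search for a characteristic key prefix) can be sketched roughly as follows. This is a minimal illustration in Python, not the speaker's password search tool: the keyword list, the per-extension patterns, and the `harvest` helper are all hypothetical. The AWS pattern relies only on the fact that access key IDs are 20 uppercase alphanumeric characters starting with AKIA, and the key used below is the placeholder from AWS documentation.

```python
import re

# Illustrative keywords; a real list would be harvested from provider sample code.
KEYWORDS = ["password", "passwd", "secret", "api_key"]
KW = "(?:" + "|".join(KEYWORDS) + ")"

# One assignment pattern per language, keyed by file extension (all hypothetical).
PATTERNS = {
    # JavaScript: var dbPassword = "..."; or password: '...'
    ".js": re.compile(r"(\w*" + KW + r"\w*)\s*[=:]\s*([\"'])(.*?)\2", re.I),
    # PHP: $password = '...' or "...", so both quote styles are handled
    ".php": re.compile(r"(\$\w*" + KW + r"\w*)\s*=\s*([\"'])(.*?)\2", re.I),
    # Python: like PHP without the $, plus an optional u prefix for Unicode strings
    ".py": re.compile(r"(\w*" + KW + r"\w*)\s*=\s*u?([\"'])(.*?)\2", re.I),
}

# Characteristic prefix search: AWS access key IDs start with AKIA.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def harvest(source, extension):
    """Return (name, value) pairs matched by the pattern for this file type."""
    pattern = PATTERNS.get(extension)
    if pattern is None:
        return []
    return [(m.group(1), m.group(3)) for m in pattern.finditer(source)]


# Made-up inputs for illustration only.
print(harvest("$db_password = 'hunter2';", ".php"))
print(harvest('api_key = u"abc"', ".py"))
print(AWS_KEY_ID.findall('id = "AKIAIOSFODNN7EXAMPLE"'))
```

Walking a file system and dispatching on the file extension, as the talk describes for the first tool, would be a thin loop around `harvest`.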

I'm wondering to what extent we can help things by, instead of authenticating the particular application, authenticating the process or user that the application is running under. So, for example, if you've got a script that does some administrative task on your network, you have it run under a user, and that non-human user is authenticated through something like Kerberos or some system of that nature. Okay, so the question is: Jeffrey was wondering if, instead of authenticating the application itself, we could authenticate a user on whose behalf the application does work. Yes, it's possible, but it's not always

the case. Usually what happens in an enterprise is that you create service accounts or role accounts that are for an application, not for a human with an application doing work on their behalf. The problem is that if you are actually using the user's credentials, you have to store those credentials with the application, so you weaken your security infrastructure by doing so. Credentials are really well protected for humans: we use password derivation algorithms, we use central credential stores, and we lock all that down. But as soon as you do that, you're starting to store credentials in the clear somewhere else.
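The asymmetry described in this answer can be made concrete. Here is a minimal sketch, assuming PBKDF2 from Python's standard library as the password derivation algorithm (the talk does not name one) and made-up passwords: the verifying side only needs a salt and a derived key, while an application that must present a credential to another service needs the original value.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor, not a recommendation


def enroll(password):
    """What a credential store keeps for a human: salt and derived key, not the password."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, derived


def verify(password, salt, derived):
    """Re-derive from the typed password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, derived)


salt, derived = enroll("correct horse")
print(verify("correct horse", salt, derived))  # the human supplies the secret from memory
print(verify("wrong guess", salt, derived))

# An application logging in to another service cannot present a derived key:
# it needs the password itself, so the password ends up stored in the clear
# (file, environment, source code) wherever the application runs.
```

The derived key is useless for logging in elsewhere, which is why it can sit in a central store, and a service account password sitting next to the application has no such property.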

Hi, yeah, this is Nick Sullivan from CloudFlare. I'm actually tackling similar problems, and pretty much everyone else who's developing distributed applications, with multiple applications talking to each other, is trying to tackle the same thing. A couple of projects that have come out lately and are kind of in the right direction include Keywhiz from Square, and there's a project called Vault from HashiCorp. It's mostly based on trying to keep a central credential store to distribute things to applications, but the problem is bootstrapping your applications and getting to the point where you can actually authenticate. I'm working on some things that involve

I guess a lot of the things you mentioned in your talk, combining TPMs, white-box cryptography, and secret sharing, Shamir-type schemes, to do this. But we really haven't hit the right formula, especially for containerized applications. So I guess a lot of people are thinking about this, but there's not really a solution, and if anybody else is interested in working on this, talk to me as well. I'm interested in other people who have insights into this problem space. I would be interested to talk to you. One of the problems is that

even if you have a TPM, the application needs to be able to access that TPM. If the application has a way to access the TPM, you can write another application that accesses the TPM the same way. So it's better, but it's not a full solution. As long as you can substitute one application for another, you don't have authentication of the application. You cannot say: okay, I am that process running on that machine, and it's me. So the way I'm leaning is trying to authenticate the nature of the application itself. There are some differences between humans and computers. A human is not going to

be very good at doing mathematical transforms; a computer can. With white-box cryptography you can do something similar: you can do a lot of computing that a human is not going to be able to do, to verify that. There are still attacks where you can lift part of the code and reuse it to authenticate a different application, but it's one step in that direction, especially if you do anti-debugging and anti-tampering for your application. You have to armor it as if it were your brain. It's not DRM, because you're not... well, no, actually I guess it is DRM, because you

will have to tie the application to a specific host. If you don't tie the application to a specific host, you cannot say that it's my mail server running there that is authenticating; it could be a mail server running somewhere else. So it is DRM, it is anti-tampering, anti-debugging, and a way to authenticate that is different from just giving credentials. It has to include some kind of computing and processing. Another question? I'm sorry, but don't use white-box cryptography. It's all broken. We don't know how to do it securely. It will be broken if you use it, so don't use white-box cryptography. Yeah, it's a road bump. There's been a

paper, I don't know, one or two weeks ago on ePrint, and it breaks everything. So I want to talk to you. Yeah, I want to talk to you. For me that's a general direction: there has to be some computing that the application, and only that application, is able to do to identify its nature, and that computing has to be tied to the host. Yeah, that's a different problem then. But yeah, a TPM could do that, as long as you do processing on the TPM and you don't just do simple storage and retrieval.

That's what I want, but not everybody is going to get a TPM, so how do we do

that? A little different from source code: what about configuration files? In a former life I had to deal with a lot of router configuration files. People would post their config files in order to say, you know, "hey, I'm having trouble setting up my VPN, here's the config, can somebody tell me what's wrong?" And you get a lot of those that come with secrets, sometimes poorly hashed, but in other cases the protocols require that the secret be in a reversible form, so they can't contain a hash. So the question was: what about

configuration files? Actually, there's a lot to that. I consider scripts as source code, so one thing you're going to find is that there's a significant number of scripts used to deploy applications that integrate credentials. I'm going to give you regexes for that; that's included. I did not mention scripts and configuration specifically, but that's one of the places where you're going to find a lot of credentials. You can even find credentials because people are using tools like sshpass. If they want to write a script and connect to a server using SSH, well, SSH requires interaction with a user, so

unless you set up keys, a lot of the time you will have a technical operations team that uses a tool that lets them provide the password on the command line, so that they can write a script that deploys their application. I have a specific set of regular expressions to catch that. You just treat that as source code, because the same people are pushing the scripts to the source code repository. Your technical operations team is most likely going to use the same code repository as your development team, so just see that as an extension. And it's very, very, very

rich. There are so many things there. That's why I'm saying that if you want to own the data center, go to the source code repository. If you want to own the desktop network, the domain controller is good, but for the data center, you will find every credential you can think of there. I have just one remark. I have been scanning the Czech internet, Czech domains, for readable configuration files. I've done that twice, and of course I have found configuration files with credentials to access databases and payment providers. Roughly 1% of all the domains are actually leaking configuration data and credentials. So just one

remark. That's a big number, a very, very, very big number. So if you consider . which is probably 30 or 40 million domains, that's half a million domains. More questions? Okay, so, well, a round of applause again for Philippe. Thank you very much. And your tools will be out by the end of the week? Promise, promise, promise. So look him up on Twitter and look at the Passwords15 hashtag. If you're following me on Twitter, I will of course also tweet and retweet those tools as soon as they're out. I didn't bring my GitHub keys with me, so... Okay, so that's the reason.

Yeah, it's such a depressing slide. Yeah, the GitHub, yeah, your GitHub, yes. So we are a little bit ahead of schedule as well, so we will be back at 12, and this time we have two speakers, PhD students from Carnegie Mellon University, returning once again to PasswordsCon to do a pretty cool and interesting talk. And they look really, really good, both of them, today, with, you know, bow ties and everything on. So I will see everyone back at 12.