
So next we have "XXE for Dummies." This will be Bryan Meyers, who's been working in security for five years and in software development long enough for his resume to include Borland and Netscape. He's written books and articles on Windows programming, and in his current job as Director of Information Security at WebMD Health Services, Bryan guides multiple teams building SaaS web applications housing HIPAA-related data. He also serves on the CSIS industry advisory board at Western Oregon University. Please give him a round of applause.

Thank you all for coming. Let's see, is this positioned? Can you all hear me all right? Great. I'm warning those of you in the back that although I have tried to keep the
font size large, there are a few screens that are constant views of a console window for the demos, and you might want to be a little closer to read those. Up to you.

The title "XXE for Dummies" is a bit of a carryover. I'm not assuming you're all dummies; this is an extract from an internal training I developed at WebMD. Well, let's see if we can get there. Oh yeah, this slide: you've heard that I worked at some companies, right.

In 2017 the OWASP Top 10 list was changed and some new vulnerabilities came up. Those are the ones in yellow on the right, and the... did I just change over?
There we go. Okay, and that's the one I'm going to talk about. So this was given to teams of software developers who had just noticed this change and wanted to know what it was about. That's the goal: I wanted my .NET teams to leave the room understanding what XXE was, how it could be exploited, what danger it represented, and have some idea how to defend against it. Those are the goals.

Okay, so I'm going to start kind of slow, from basics, because I wanted people to be able to follow along and be sure they got what the risk was and how we got to that point. The first question is: what are entities? We're going to end
working up to understanding what XML external entities are we're starting with just entities entities you've seen before probably they're in HTML they exist there too these are some common ones they are characterized by being short strings that begin with an ampersand and end with a semicolon and their point is that the browser when it sees those will render them as something else it'll make a replacement and these are standard ones that are predefined and XML has the same ones these same strings work exactly in in XML as they do in HTML in addition XML and now it starts to get interesting XML lets you define your own entities so on this page there's a bit of XML
beginning with <!ENTITY ("bang entity"). That's a standard thing that can appear in the document type definition section of an XML document, the DTD section. The DTD section can either be a section within a single XML document, or it can be an external DTD file that has the definitions in it and is referred to in your XML document. That'll be important later.

So <!ENTITY is the beginning of an entity definition. XML, that small blue string there, is the name of the entity I am defining in this statement, and what follows, the green string in quotation marks, is the definition of my entity. The way that works is that in the XML document I can refer to my entity
beginning with an ampersand and ending with a semicolon, and the XML parser will do what a browser did: it'll take my little text macro and replace it with the string that I provided as the definition.

So I think at this point we have a demo. I taped them because I only have 20 minutes for this and I couldn't afford for anything to go wrong, but this is just running on my machine, or was when I taped it. It's a simple MVC app, and I've pasted some XML into it. It was really simple XML: it's all markup except for the string "foo." When I pressed Parse, this page
passed the XML snippet to the back end, where an XML parser in .NET went through it to extract anything that was just text and wasn't markup, and it passed that back to the web page, which is why it says "foo" at the bottom. I'm going to use this page a lot, so I hope that's clear. That's just a simple setup.

So let's look at what happens if you define an actual entity. Here I've put in the markup that defines the same entity I showed on a slide earlier; the entity name is XML. I think I should probably be running this; maybe it'll highlight for you. Yep, okay. So there is the reference to the entity, ampersand
XML semicolon (&XML;). That's the expression that's going to be expanded. Here's the definition of that expression: XML and the long string. Then when I press Parse, what I get back is not &XML; but the definition of the macro I gave. Should be pretty straightforward at this point.

All right, that's what an entity is, and a user-defined entity. Now let's look at what an external entity is. Remember, the name of this vulnerability is XML External Entity, so it's external entities where the risk comes in. XML lets you do things like this with entities. It's not the full definition, but you can see I've imagined markup where there are three nodes, each
called chapter. Each chapter node contains an entity reference, and I'll define one of them for you. There's a definition of one of those three entities, the chapter1 entity, and it introduces a new keyword that I didn't show you before, called SYSTEM. So the definition goes <!ENTITY chapter1, where chapter1 is the name of my entity, and then the SYSTEM keyword, which tells the parser that what follows isn't just a string to put in there; it's an external reference to a resource, a URI. So what the parser is going to do is go try to find that resource. It will look for a file named chapter1.txt, find the contents of that file, and
those contents will be the definition of the macro. What this markup would actually do is concatenate the text of all three chapters, and I'll show an example of that.

So here is a Notepad file. It contains the first couple of paragraphs of Anna Karenina. It's just a file in my file system, in the same directory as the code files for that XML validator page I keep using, so it's served up from the same web server that those pages are using. As a test, I am pasting in here a reference to that file. I've chosen to represent it as a URL this time, to localhost, the same place the page is
being served from. There's the full definition of my external entity using the SYSTEM keyword, and there is a reference to that entity, &chapter1;. When I press Parse, I get the text from the file.

Okay, so that, I hope, is clear. Everything I've shown you is just the way XML is intended to work, and I'm going to summarize it with this. There are three things interacting in what I've been showing you: a user, a web page, and, on the back end somewhere, an XML parser. The user, me, entered some XML in the web page. The XML contains an external entity reference, so when the
page passes that back to the parser, the parser attempts to resolve that and looks for the external resource. It reads a file, then it takes the contents of that file and sends it through the web page back to the user. What could go wrong?

Okay, I'll show you what could go wrong. Have a look at this markup. It's very, very similar. I've changed the name of the macro to "phishing" instead of chapter1, and I have changed the name of the file I've asked for to windows/system.ini. If I press Parse, which I think I'll do here in a minute, I get the contents of the Windows system.ini file. If you aren't a Windows
person, this is like /etc/passwd: a standard, well-known system file with some private information that has nothing to do with a web server, absolutely should not be accessible to a hacker, and certainly should not have been served up in this web page. So you have now seen an XML external entity vulnerability demonstrated.

I'm going to do a better job than that; that's just a start. I was trying to make it easy so you see how it works. This is a very contrived example, and it's contrived in the following way: the thing that sort of cheats is that this web page I have designed returns the parsed value back to you on the web page. Most
web pages that receive XML, or more likely most web services that receive XML, parse it on the back end but don't return it to you. They'll do something with it and then they'll give you back an error code or a success code. So it would be reasonable to assume, given what I've told you so far, that any web service or web page that isn't kind enough to pass back the parsed value for you is not vulnerable. But that turns out not to be true.

So now let's look at a real exploit. What I'm going to set up slowly is this; here, in advance, is what we're aiming for. It's a very similar setup, with the same
three things interacting. The user puts some attack XML that contains an XXE into the web page, which sends it back to the parser. The parser goes and looks up whatever was referenced and sends it to another server. So it doesn't matter what it returns in the web page where I execute the attack. I set up another server, which I've called evil.com in this slide, and get the parser to resolve a reference, in the course of which it sends me the contents of another file. That's what we're leading up to.

Okay, let's look at how you would do that. This little string at the bottom is the basic logic. It won't work; it's not the attack,
but I'm just trying to show you basically what we're trying to do. I want the parser to do sort of what this suggests: look up the contents of the system.ini file, put those contents on a query string for a web page on my server, and send it to me that way. What's written here would actually only send the name of the file, the string. I've got to do something else: I've got to put that in an entity so that the parser will go look up the contents for me. And in fact I have to put evil.com in an entity too, so that it'll try to look up a web file on evil.com that has the query string I
want in it. What I have on the screen here is close to the attack. It doesn't quite work the way you'd expect, because a reference to an entity inside the definition of another entity doesn't produce the result you hope for when the DTD section where this all occurs is in the body of the file. You end up having to create a separate DTD file. And of course, how can the attacker pass that in? All you have is that one little box on the web form; there's no place to put a separate file. So the DTD file that has part of your attack has to be up on a web server, the evil.com web server, and your attack
XML has to ask for that to be resolved. Some magic happens. I'm not actually going to be able to explain, in the time I have, every single thing that leads me to this solution, but I will show you what the solution is. It is only using things you already understand about XML; it's using them in clever ways.

So this is the external DTD file that I'm going to put on the evil.com server. It's got a nested set of entity definitions in it, but the core one in the middle is absolutely recognizable: that's the evil.com payload, with an entity that can expand to the contents of the file I want to steal. And then the
other part, this is what I will actually put into the page as the attack. It refers to the DTD file, that's the "read" macro, and gets that resolved. The parser will actually pull the DTD file off of the evil server back to the back end where the processing is happening, and so whatever was in that becomes part of the definitions here. It all resolves nicely, trust me; I will show you that it works. And there's nothing in that except nested and separated cleverness and definitions of external entities.

Great, here's the last spectacular demo. I hope it's spectacular; it's fun. So we're looking at the evil.com server. It's running on Kali Linux in a VM on
my machine. There are the contents of the evil DTD file, in the same directory as a default HTML file. I am going to start up the HTTP server, which is a Python script, and now let's go over to the browser and convince ourselves that the evil.com server is up and running the way you'd expect. There's the URL; instead of saying evil.com, it's actually got the IP address for this VM, and it served up default.html. So great, the server is working. More importantly, over here you can see that the Python script that is the HTTP server is showing the request: it got a request for default.html. That's important: you'll see the
requests it gets echoed over here, and we'll look at what those requests are.

Okay, here is the attack XML. This is the same thing I showed you on the previous slide. It's going to refer to the system.ini file, that's the one I want to steal; it's got a reference to the DTD file on the evil server there; and it's got the "send" macro at the bottom, which is actually defined in the DTD file, not here in this snippet. I press Parse. Yeah, I get some text back. Who cares? That's not the important thing. We said we're going to look at our server, where we've had some more traffic. There is the request for evil.dtd;
there is the request for default.html with the payload on the query string. That string is the payload, and that was the contents of the system.ini file. I think that was clear; I've heard a few gasps, I'm happy. But just to be really sure, what I did there was: over in the validator page I crafted some XML and put in it the name of a file I chose, one that I hoped existed on a target server, and I passed that into the target server, which very kindly and blithely looked up the contents of that file for me and sent it to my server. So if the demos
went too quick, there's a summary of what that was and why it's dangerous. [Applause]

That was the climax; the rest is just, you know, winding down. Some obvious things to ask: how can you tell if you are vulnerable? What you need to do first is ask, are you parsing XML? Obviously, if you're not parsing XML, no problem. But there are some other questions to ask too. If you are parsing XML, does your parser allow DTD sections? If it does, then is it also resolving external entities? These are separable questions, and I'll say more about that in a minute. If you are allowing DTD files and you are allowing external entities, whether you're using them or not in your own
code, if the parser you are using will allow them, then you have to ask: are you validating untrusted input? Just like with SQL injection, the question is, are you validating what you get? Do you know that the external entity definitions you might receive are safe? Validating those can be painful, so it's a good thing to turn off those other features if you can.

How do you defend against this? Well, don't use XML if you have the choice. If what you need to describe in markup can be described in something simpler like JSON, do that. That's a good defense. I'm sorry, I didn't mean to page. There we go. Second, if you have to use XML, be sure
you have a current version of your parser, because parser companies got better at it and put some safety features into modern versions, which I'll say more about. If you have a modern XML parser, then see if it has DTD support disabled, and if it doesn't and you don't need DTD, disable it if you can. If you do need DTD and it is enabled, then the next question is: can you separately disable the ability to resolve external entities? Because there are lots of uses for DTD that have no use for external entities, you may very well be able to disable that one feature and be safe. If you actually need XML external entities, then
you have to validate your input. Some parsers, like the .NET one, let you attach an object to the parser whose job is to resolve external URLs, and you can then implement a policy that says, you know, I'll only resolve things if I recognize the string, or if it's part of a subdomain, or something like that.

I was writing this for a .NET audience, so here's a little more .NET detail. Most of the online examples are from open source stacks, so it was fun for my team to see this actually demonstrated in the stack they use. On the right is a list of .NET objects that are capable of parsing XML, with information about whether each one is safe by default and,
if it is not, in what version it became safe by default. In general, the answer to the safe-by-default version is 4.5.2, which came out in 2014, so if you're using modern stuff, that's good. What does safe by default mean? It means that the properties on the parser that allow the dangerous things are set to null or false by default. So if you want DTD in a version from 4.5.2 on, you turn it on; before that it was on by default.

In .NET code this kind of logic looks like this. You create an XmlDocument in those first two lines, that's something that can parse XML, and it has a property called XmlResolver, and if you
set that to null, then it will not be able to resolve external entities. That is good. In the second two lines I have created another object, an XmlReaderSettings, which configures a reader that can also parse XML, and then set a property on it to false. The property is ProhibitDtd. If I set that property to false, I am allowing DTD, which is not enough to know for sure if it's vulnerable. I still have to check to see whether it will resolve external entities, but I can't say for sure that it's not vulnerable at that point. Just giving you a sense of what it's like to read the code.

Closing words. We always put resources here: where do you
look for more information? And I want to give a shout-out to Tim Morgan, who I saw here yesterday; I don't expect he's here now. We hired him, he's a local guy, to come and give some security training to our team. He's really good, and he talked to us about XXE; that's where I first learned about it. If you start googling for information, you'll quickly find his AppSec USA talk and a paper he published on the topic, and I absolutely used these in putting together my work. I'm happy to share these slides. That's it. There's contact information for me if you have questions, or if there's some way I can be useful.
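
The "disable DTD support" defense described in the talk can be sketched outside .NET as well. What follows is a minimal illustration in Python using the standard-library expat bindings, not the .NET objects shown on the slides. The names extract_text, DtdForbidden, and allow_dtd are hypothetical, the entity value "Extensible Markup Language" and the file URI are assumed stand-ins for what was on screen, and the function only mimics the demo page by returning the text content of the markup. With the default setting it rejects any document that carries a DOCTYPE, so no entity can be declared at all.

```python
import xml.parsers.expat


class DtdForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def extract_text(xml_text, allow_dtd=False):
    """Return the character data of xml_text, like the talk's demo page.

    Hypothetical helper: with allow_dtd=False (the default) any DOCTYPE is
    rejected, so no entity, internal or external, can be declared.
    """
    parser = xml.parsers.expat.ParserCreate()
    chunks = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called before the declarations inside the DOCTYPE are processed.
        if not allow_dtd:
            raise DtdForbidden("DOCTYPE declarations are not accepted")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.CharacterDataHandler = chunks.append  # collect text between tags
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return "".join(chunks)


# Sample inputs; the entity value and the file URI are assumptions.
PLAIN = "<doc>foo</doc>"
INTERNAL = (
    '<!DOCTYPE doc [<!ENTITY xml "Extensible Markup Language">]>'
    "<doc>&xml;</doc>"
)
ATTACK = (
    '<!DOCTYPE doc [<!ENTITY phishing SYSTEM "file:///c:/windows/system.ini">]>'
    "<doc>&phishing;</doc>"
)
```

Calling extract_text(ATTACK) raises DtdForbidden before the entity declaration is even read, while extract_text(INTERNAL, allow_dtd=True) shows the ordinary internal-entity expansion from the first demo. Rejecting the DOCTYPE outright is the bluntest of the options listed in the talk; a service that genuinely needs DTDs would instead have to restrict external entity resolution separately.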