
OWASP Amass Beyond Subdomain Enumeration

BSidesROC · 2019 · 47:05 · 1.1K views · Published 2019-03
About this talk
Talk Description: Today, large organizations face the challenge of running their infrastructure across many networks and namespaces due to the use of cloud and hosting services, legacy environments, and acquisitions. This can make it difficult for an organization to maintain visibility into its Internet-facing assets and to track down systems that pose a risk to its security posture. The OWASP Amass Project helps organizations perform network mapping of their attack surface and better understand how their assets are distributed across the networks of trusted partners. During this talk, contributors to the project discuss how OWASP Amass takes subdomain enumeration to the next level, providing both attackers and defenders better visibility.

Bio: Jeff Foley, Project Leader of the OWASP Amass Project. Jeff has spent the last eighteen years as an innovative technologist and technical leader taking on challenges in the area of cyber warfare. He started the Amass project after noticing the need for robust and practical OSINT tools that aid information security professionals in mapping complex networks.

Anthony Rhodes, Contributor to the OWASP Amass Project. Anthony has over five years of industry experience as a penetration tester, red teamer, and software engineer. He has been following the OWASP Amass Project since its inception and recently joined as a contributor to help enrich its functionality beyond DNS enumeration and network mapping.
Transcript [en]

So this talk is about the project OWASP Amass, which does network reconnaissance and open source intelligence, and this presentation is focused on showing you what it can do, hopefully a little touch of where we're going with it, and opening up discussion around that. A quick bit about who I am: I'm the project lead for OWASP Amass, I'm also the purple team manager for National Grid, and I've done this kind of work for quite a while. You can find me at these two places on the internet. We also have some other places that this project exists, like our Discord and things like that, so if you have ideas that come out of

this discussion and you want to talk with us more about it, please feel free to become more a part of our community. I'm Anthony Rhodes. I'm recently a contributor to the Amass project, I also recently became a senior purple team member at National Grid, and I've been doing pen testing for, I think, five or so years. I can be found on Twitter, GitHub, Discord, all those things.

Forgive us, I think we're gonna be playing with this thing a little bit, see how well we juggle it. All right, so like I said, Amass is a network reconnaissance tool. When this got started, I guess you could say it was subdomain enumeration; it's definitely grown into more than that now, and I hope you'll all see that by the time we're done. What it really means for organizations that are using this is that it's automating all these techniques that are typically done manually or with separate tools to map your organization's, or any organization's, attack surface on the internet, or to understand the exposure of an organization on the internet, which is perhaps a better way to say it.

And what's so powerful about this, once you start using it, is really all the data you end up collecting from it. We try to provide that data in a lot of different formats that make it useful to different audiences or users. One of my favorites is Maltego; I love being able to do this and then see the whole entire network come alive in front of me. But really, the graph databases are probably the most powerful data source it creates, since honestly we're just getting started with what we can do with that data. I threw this up here immediately just to show you that, unlike some of these

capabilities where you just get a bunch of text coming out and you still have to go through the work of making sense of all that and deciding what you want to do with it, I'm a very visual person, and I find it empowering to be able to just see what was discovered and start doing some exploratory analysis on it, to start deciding where I want to dig deeper. This is just one of my favorite visualizations; it came out of, I would say, a medium-size target, but I thought it looked kind of nice, and some of them do, so I threw this slide in here. I want to talk about, in a

lot of ways, how this is becoming more, again, than just subdomain enumeration, or just a piece of a pen tester's job to understand an attack surface. So many organizations that I engage with, when I ask them, do you know what you look like to an attacker, their answer, it seems, is almost always either no, or well, maybe not completely, or we don't have that all documented. But usually the answer is that they don't know what they look like if someone like myself or others were to say you're in the crosshairs, and if they wanted to put funding behind this. It's getting harder to do this with the use of third-party services, the

cloud, all these different ways that you can be fragmenting your exposure or your attack surface. It's getting difficult to stay on top of this with all the different teams that are involved in creating exposure and then not documenting it properly, and it's causing more companies, I think, to have to use OSINT almost as part of their asset management program, which is interesting. That's one place where I see people practically desperate for this kind of capability, but of course it's useful to so many other groups as well, which we'll talk about later. That's really the interesting thing that I learned from this project: when I started this,

it was all for me, honestly. It was just that I wasn't getting enough from what was out there, so I created something that did what I needed to do, because I wanted to be able to answer this question for myself: that organization, what do they look like? Do it for me, I don't want to sit here and do it all day. But I'm learning, I'm engaging with organizations and finding out there's a bigger need for this, a larger need for this. So one thing I'm not sure I said yet is that we've tried to make this presentation so that you're not listening to me the whole time; instead you're gonna be seeing what this

can do. We have a lot of demonstration video content to show you, but I want to talk first about what you should maybe be thinking about as the workflow for using this. That could of course vary depending on how you use it, but I'm gonna, I guess, start all the way at what could be the beginning, and go to the end, so you can see how this tool suite can handle your entire investigation, whatever that means to you. At this point in time it's spread across about four different tools. Not necessarily the end of the story, but that's what it looks like right now, which are these right here.

I put a little summary here just so that when we're talking about this you'll know what the dividing lines are between these tools. The point of amass.netdomains is to go from information you could have, like an organization's name, details about their networks, the ASNs that they're on, or ASes, forgive me, and come up with domain names, because the Amass proper tool pretty much just takes domain names and then gives you infrastructure, I would argue. From there, there's a visualization tool, and we've seen what that can do, and the tracker is what allows you to watch what's changing between your observations or enumerations. All right, so

this is where I'm gonna show you some.

There we go, all right. So first we're gonna go through amass.netdomains. Say you're not given a scope; you just know the organization name, and you don't know any domains, IPs, or ASNs. So it's loading... oh, there it goes, okay. So for this example we just gave the organization name, Utica College, with the -org flag, and that gives us the ASN information for any matching ASNs. Then we take the ASN number and pass it through the -asn flag, and that should give us some domains that are associated. I think it'll be done soon. And then, okay, there are also whois capabilities. I don't know if anyone's familiar with reverse whois; there's

a number of services out there that will actually collect whois information, and then you can search by some of the registration information, like the registrant email address or the registrant organization name. So what this run of the command is doing is calling out to one of those services and giving you all the domains that are registered by the same person. Now I'm showing off, just, you know, too fast. So I resolved one of the domain names, got an IP address, and then did the whois on it to try to get the entire netblock, because Utica College owns the entire netblock, and then you could also use that, you know, to pass it back into the

program to get more domains based on the IP addresses in that netblock. Yeah, it really depends on what information you're starting with, and this is just showing off all the different, I guess, entry points. Yeah, I think that's it for this one. Okay, so now we're moving on to the amass tool.
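The last step of that workflow, taking a whois-discovered netblock and turning it back into more lookups, can be sketched in a few lines. This is an illustrative sketch, not Amass code; the netblock below is a documentation range, not Utica College's real one.

```python
# Sketch: once reverse whois hands you a netblock for the target, expand it
# into candidate addresses that could be fed into reverse-DNS (PTR) sweeps.
import ipaddress

def candidate_addresses(cidr: str, limit: int = 8):
    """Return up to `limit` host addresses from a netblock, as strings."""
    net = ipaddress.ip_network(cidr, strict=False)
    return [str(h) for h in list(net.hosts())[:limit]]

# A real run would pass each of these back into reverse-DNS lookups
# to discover more domain names owned by the organization.
addrs = candidate_addresses("198.51.100.0/29")
```

A /29 yields six usable host addresses, each a fresh entry point back into the enumeration.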

Mac's being finicky.

Did I play it before? All right. You said you copied the videos? Can everyone see this? Well, I have a play button here... get out of here... it takes forever. I did this on my laptop yesterday, okay, yes, okay. So Amass has a number of modes you can run. This is the passive mode: you just give it a domain name and it'll go out to a bunch of data sources and pull whatever domains or subdomains they have for that domain name.

So I guess I can talk about some of the sources: CertSpotter, HackerTarget, crt.sh.

There's like over fifteen, yeah. I think this is... yeah, it doesn't give you any more, okay. And then at the end it gives you a summary of what types of data sources were pulled, so you can see there's some cert and archive sources. Oh yeah, that's a big one: we pull from web archives and kind of crawl pages to uncover subdomains. I guess we'll move on to the active one.
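The passive step just described boils down to querying many sources, deduplicating, and remembering which source reported each name. Here is a hedged sketch of that merge logic; the source names and `merge_sources` helper are illustrative, not part of Amass.

```python
# Sketch: map each unique subdomain to the set of sources that reported it,
# normalizing case and trailing dots so duplicates collapse.
from collections import defaultdict

def merge_sources(results):
    """results: {source_name: set of reported names} -> {name: set of sources}"""
    seen = defaultdict(set)
    for source, names in results.items():
        for name in names:
            seen[name.lower().rstrip(".")].add(source)
    return dict(seen)

merged = merge_sources({
    "crt.sh": {"www.example.org", "mail.example.org"},
    "hackertarget": {"WWW.example.org.", "dev.example.org"},
})
```

The per-name source sets are what the summary at the end of a run (and the -src output discussed later) is reporting.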

Okay, so the active mode. Okay, so there's a config file you can give Amass that allows a little more configuration of the tool than you can find in the command line flags. In particular, the alterations min_for_word_flip setting isn't implemented as a command line flag. But also you can add a bunch of things like API keys, because there are a lot of data sources, like Cisco Umbrella and VirusTotal, where you can throw in an API key and it will pull subdomain data out of those. So for this run I have brute forcing on, I'm specifying a custom wordlist, and the active mode also reaches out to some of the IPs and tries to

pull a cert, so you give it the ports. All right, and it also does DNS zone transfers, and then we're going to enable alterations, which we'll explain later, and yeah, we're gonna do it on owasp.org. The -src command line flag gives you which data source each particular subdomain came from, like CertSpotter, and -ipv4 just prints out IPv4 addresses. A big difference between the passive mode and the active is that the passive mode doesn't do any DNS resolution, so if you want IPs you can't use the passive mode. So you can see it does brute forcing, and, you know, a major part of the architecture

is that everything, almost, feeds off of everything else. So when a data source finds a new domain or subdomain, it'll try to resolve it; if it resolves, then it gets fed back into the other data sources so that they can try to generate new names off of that. Like brute forcing, for example: if it finds a new subdomain of owasp.org, it'll try brute forcing subdomains off of that. And as you can see, it's pretty fast; that's actually slower than we usually see.
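That feedback architecture, resolved names flowing back into the name generators, can be sketched as a simple worklist loop. This is a toy model, not Amass's implementation: the resolver is a stand-in set and `generate` is a trivial stand-in for brute forcing and alterations.

```python
# Sketch: every name that resolves is pushed back through the generators,
# so the enumeration keeps feeding itself until nothing new resolves.
from collections import deque

KNOWN = {"owasp.org", "www.owasp.org", "test-www.owasp.org"}  # stand-in resolver

def generate(name):
    # trivial stand-in for the brute forcing / alteration generators
    return [f"www.{name}", f"test-{name}"]

def enumerate_feedback(seed):
    resolved, queue = set(), deque([seed])
    while queue:
        name = queue.popleft()
        if name in resolved or name not in KNOWN:
            continue  # skip already-seen or non-resolving names
        resolved.add(name)
        queue.extend(generate(name))  # resolved names feed the generators
    return resolved
```

Starting from the seed domain, the loop discovers every "resolvable" name reachable through the generators and stops when the queue drains.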

Yeah, and then at the end, like the passive mode, it tells you which categories they all fall into, but it also gives you a lot of the IP and ASN information that it extrapolates from the IPs that it resolves. And then we're going back into the slides, right.

Yeah, so Anthony covered this a little bit, but we kind of break these functionalities, or pieces of the architecture, into what we call data sources, which is where most of this comes from. But we also have some other techniques, like what we call alterations, which is creating permutations of the names, or trying to guess new names, and brute forcing. There's a handful of these things that we see as generating names that then need to be resolved, or that we need to query for, and as Anthony already said, a lot of this gets fed back into these same sources or techniques. So for instance, all the resolved names go back into the web

archive data sources and they all use those as potential new targets for crawling new pages to get more names brute-forcing uses the sub domain proper names to then do a recursive brute-forcing and alterations you know takes all the names actually considers them for making modified versions or to create we even have we'll talk more about that in a minute so we we break the tech techniques into these categories I think maybe DNS could have been up here because really there's so much that we get just from DNS but yeah we're definitely doing scraping we're using R STP is some of which require API keys DLS certificates web archives it's pretty standard but that's what you're

seeing when I use -src: what types of sources came up with this information. Actually, I was gonna have Anthony talk about a lot of this; he reworked some of it recently. Yeah, so alterations: it'll basically look at common patterns in the subdomains that Amass is enumerating and then try to modify them in ways that will generate more results. So for example, if it sees test1.owasp.org, it's kind of hard to see, but it'll try replacing the number with, you know, two through ninety-nine. And then if there are keywords like -prod or -test, it'll try switching those with other common words like qa and uat and origin. It's

common stuff like that. It'll also try adding those keywords to normal subdomains, so test.owasp.org will turn into test-prod.owasp.org, and then there's also kind of a fuzzy alterations engine. Right now the dictionary... yeah, sorry, he's asking if you can specify a dictionary for the keywords or variables for the alterations. The answer is not yet; currently it's hard-coded, but we took a lot of lists, common lists from other tools and other sources, and just experience too. So yeah, there's also an aspect to it where it tries to look at the word and switch characters, and then just recently we added predictive
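The alteration rules just described, swapping trailing numbers, swapping environment keywords, and appending keywords to plain labels, can be sketched like this. The keyword list and the `alterations` helper are illustrative; the real rules and wordlists are hard-coded in the tool and differ in detail.

```python
# Sketch of the alteration rules: number swaps, keyword swaps, keyword appends.
import re

KEYWORDS = ["prod", "test", "qa", "uat", "dev"]  # illustrative subset

def alterations(fqdn):
    label, _, rest = fqdn.partition(".")
    out = set()
    m = re.match(r"^(.*?)(\d+)$", label)
    if m:  # test1 -> test2, test3, ... (the tool goes up to 99)
        out.update(f"{m.group(1)}{n}.{rest}" for n in range(1, 5))
    for kw in KEYWORDS:
        if label.endswith(f"-{kw}"):  # www-prod -> www-qa, www-uat, ...
            base = label[: -len(kw) - 1]
            out.update(f"{base}-{k}.{rest}" for k in KEYWORDS)
        else:  # test -> test-prod, test-qa, ...
            out.add(f"{label}-{kw}.{rest}")
    out.discard(fqdn)  # never re-emit the input name
    return sorted(out)
```

Every generated candidate would then go back through DNS resolution; only the ones that resolve survive.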

name guessing using Markov models. This is where I give it back to Jeff, since he wrote that. Yeah, not that it's exactly rocket science or anything, but as more of these names are discovered, it takes the names and breaks them down into n-grams. Any of you can go dig this up if you're interested; it's very similar to how password cracking works, where you break it up into pieces, say what characters tend to come after what pieces, and start building probability around this. And then the more of this data you collect, training data essentially, the more accurate your guessing can become.

And the idea, what this addresses that some of these other techniques do not, is that it finds very similar yet just slightly different names. I don't think we have it in here, but we have another technique I call fuzzy label searches. It does a little bit of this, where we have a very small edit distance and we just make small changes to the labels and see if we get hits. But it doesn't quite do what the Markov models do, which allow us to see more of the patterns that are showing up and say, okay, based on what we're seeing more of, give us more of that, but with differences, and

let's try it. And it tends to hit things that the other techniques just don't, so we kept it in there. And obviously, with that particular technique, the larger the namespace of the target, the more accurate your model becomes, and the results get better. All right, back to Anthony; like I said, juggling. Yeah, right before I get into this, I did want to mention the config option I showed off before, min_for_word_flip. That's the minimum number of times that Amass has to see a certain keyword after a dash before it tries it on everything else. So if you set it to zero, it'll try it on all
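The n-gram idea Jeff describes, counting which characters tend to follow which pieces of observed labels, can be sketched in a few lines. This is a toy model for illustration only; Amass's actual Markov implementation differs in detail.

```python
# Sketch: count which character follows each 2-gram in observed labels,
# then use those counts to score or extend candidate name guesses.
from collections import Counter, defaultdict

def train(labels, n=2):
    model = defaultdict(Counter)
    for label in labels:
        padded = "^" * n + label + "$"  # markers for label start and end
        for i in range(len(padded) - n):
            model[padded[i : i + n]][padded[i + n]] += 1
    return model

def most_likely_next(model, context):
    """Return the most probable next character after `context`, or None."""
    counts = model[context[-2:]]
    return counts.most_common(1)[0][0] if counts else None
```

As the talk notes, the more resolved names you feed into `train`, the better the guesses get, which is why this technique shines on larger targets.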

new names, but if you set it to, say, two, it has to see it twice before it'll try it, and that's just a way to reduce the number of DNS queries we're doing. Yeah, and the min-for-recursive flag essentially just specifies how many times a label needs to be seen before recursive brute forcing kicks in... okay. So Amass stores all its data in a graph database, and the amass.viz tool will then pull that data out and generate a visualization. In this case I'm doing the D3 visualization; is everyone familiar with the D3.js library? That's what it uses. In a few seconds there's gonna be a graph that

shows essentially the internal state, how it's all stored. So you can see here, the big red dot in the middle is the domain you started with. The yellow dots, I think, are PTR records; the purple, that's the ASN, and it goes to netblocks, addresses, MX records; there's another domain; and the green are subdomains. So this is a good way to actually visualize, I guess, your external attack surface. In our experience, sometimes, maybe as a result of an acquisition, there'll be a huge blob kind of off the rest of the graph, so this is an easy way to visualize that. And also you can see there's a lot of

strands that come out; those could be different cloud providers. Like, I think one of these was DigitalOcean, and another one was Google, because they use Google Mail. So yeah, it's just a really easy way to explore through the data, and there are also different output formats: D3.js, Graphistry, Maltego, and the one that starts with a G, that's written in Java, the GEXF format; I can't think of the name of the tool off the top of my head, but it's a popular visualization tool. So yeah, speaking of the Graphistry visualization, this is an example. This particular visualization has over 30,000 nodes, so as you can see, it might be very difficult to just manually sift through

that data, but looking at it this way you can definitely see patterns, and just how some things are related to others. I mean, most of these other formats don't do very well rendering 30,000 nodes, so Graphistry is a rather special service in that it can handle that. The other thing is, to Anthony's point about why even look at this, or what it does for you: it's the structure, I would say. Again, back to what I said in the beginning, it could give you leads as to what here looks interesting to pursue further. It might actually be the smallest pieces, you know, the pieces that are clearly not the main part of

their network, but the dependencies that they have. It'd be a lot easier to figure out what those are using something like this than to go through all the data yourself. Now we're gonna go over the tracking tool. Okay, so because that data is stored in a database in the Amass output directory, every time you run an enumeration that database is updated with the new enumeration, and then the tracking tool will be able to show you the differences between all those enumerations. So the default view is just the latest enumeration you did compared to every previous one, so you can

see here we ran it on the OWASP organization, and the only differences are a lot of these third-party services; I think they use Google Mail and Google Docs and things like that. So you can see up there, they're all cloud resources; their IPs are changing all the time. It'll show additions and removals, and there's just not a lot of movement on owasp.org other than those cloud services. Then there's also a history feature, and that'll show you the differences between all of the previous enumerations, so you can kind of get an idea of the timeline, how an organization is changing. Say you run this every week; you can get an idea

of how an organization changes, week by week.
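At its core, the tracking step just described is a set difference between enumerations of the same target. Here is a minimal sketch of that idea; `diff_enumerations` and the example names are illustrative, not the tool's actual output format.

```python
# Sketch: diff two enumerations of the same target so week-over-week
# additions and removals in the attack surface stand out.
def diff_enumerations(previous, latest):
    return {
        "added": sorted(latest - previous),
        "removed": sorted(previous - latest),
    }

week1 = {"www.example.org", "mail.example.org", "old.example.org"}
week2 = {"www.example.org", "mail.example.org", "new.example.org"}
changes = diff_enumerations(week1, week2)
```

Run weekly, this kind of diff is what surfaces the churn the speakers mention, such as cloud-hosted records whose IPs change constantly while the rest of the namespace stays stable.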

Yeah, so as I mentioned earlier, when I first started this, all the reasons for doing it were just driven by myself, but quite a community seems to have built around this, people that are using it every day, and they're from all different kinds of teams. Blue teams that are trying to keep track of their exposure; red teams and pen testers, as you can imagine, that are just trying to find more opportunities; same thing with bug bounty hunters. They keep telling me how much time this is saving them, because now they don't have to do this themselves or write their own scripts to automate using lots of tools and hope that they can get all the

data to come together. But also the OSINT community seems to like this tool: investigators, threat intelligence. Without getting into too many details, you can imagine hunting down people on the internet can be a lot easier if you can actually know what they look like. For instance, with all these tools we've shown you, you can watch campaigns or operations unfolding. You can keep watching someone's network where they're performing these activities, and you can see what they're doing; you can see them setting up for upcoming attacks. So I would say it takes predictive or pre-emptive methods to a whole new

level, kind of, because it's like having a scout that is actually watching this, whereas if you're only going off of feeds where someone else is saying, well, we think this is happening, or it's likely, I think it helps to be able to say, well, we're seeing it too, we're watching the same thing, and we can map it to what these feeds are telling us. Yeah, if you would like to play with this, you can go to the project page, see what we've done with this and where we're going with it. You can get it at the repo, OWASP/Amass, and if you're playing with Kali, like so many other people, people seem to

find it easier to install this using Snapcraft, even though Snapcraft is not, I don't think, installed by default on Kali. But it's pretty easy to get Snapcraft installed and then get this installed on your Kali machine. So I guess I would say, unless you're fine with setting up your own Go environment to pull it down, Snapcraft seems like a quicker way to do it. That's what we have for you. Questions? It's a good question; so the question was, how has this tool suite impacted the day-to-day work that we're doing? He specifically asked how it is impacting your work at National Grid. Right now I'm not in a position to

answer your question completely, because I haven't spoken with National Grid about how they feel about discussing those details. I can share with you some things I've heard from other organizations... right, oh, well, not exactly, or not yet. But I would say I kind of answered the question earlier. Have any of you noticed that Adobe recently announced this project called Marinus? Okay, that's where they said they've been working on a project for, I think, two years or something like that, addressing the same exact problem, which is that larger organizations, or organizations with large infrastructure that are

making changes, just can't wrap their arms around their exposure on the internet. And they included Amass in Marinus. So this tool is really great for addressing that problem. I would say there are so many other organizations out there facing the same challenges, where they're trying to use cloud resources more, they're using managed services, or they're trusting partners to do some of their work for them, and it's becoming hard to keep track of where all this is going on, and whether everybody is keeping the documentation up to date, things like that. So for blue teams, I would say it really helps. I mean, I kind of look at this like it's

helping everybody. It's visibility; that's what this is at the end of the day. This represents an ability to see your attack surface on the internet better, regardless of what your motives are for wanting to know. You had a question as well? So his question was, can we use the tracker's history information to then visually represent the results, and, right, for DNS changes, anything really, whatever caused it to move in cyberspace or pop up. So we do not right now have the ability to go from tracker changes to visualization, and it would be interesting, because in order for the graph to remain linked, there would be pieces that would have to

still be in there, that would have to stay constant or static, in order to show the other pieces properly; excuse me, you couldn't just do it off of the changed nodes or elements. So that's an interesting idea; it could be done in the future, it'd probably just require a little bit more work. Yeah, it's a good question. So his question was, how much data do we need to see before the Markov models really kind of kick in and show some results, and it's a hard question to answer exactly. I haven't done enough work with it to say, oh, I know it's always 100 or something like that, but it definitely needs a larger target for the results to

start popping up. I would say at least 100 to 200, but when you start getting to larger targets, where you're looking at over a thousand results, it definitely shows its quality more. I would just say I'm sure it's more like a curve, where the more you can give it, the better it can do, absolutely. And some of the other techniques are not dependent on that whatsoever, right, like some of these other alteration techniques, but we threw this one in there really to try to find the leftovers that aren't getting picked up by the other ones, and when it did, we were pretty happy about that. Someone else? Oh, I'm really glad you asked; I

should have mentioned this myself. So his question was about the enumeration data, right, like the graph databases: is there a way to link up the results with something more centralized that could then be shared with other capabilities? Correct, all right. Yeah, so a shared database, perhaps, where others could benefit from your findings. With the way we did these demonstrations today, the answer would be no, because, as we showed you, all the output was kept in that directory we showed you. But there's an implementation already, and it's being expanded right now in the project, to use Gremlin as an alternative way to communicate the graph

database insertions and updates and things like that, which works really nicely with Azure Cosmos DB. That has been really powerful for just what you're talking about: being able to put the data out there and then have anyone else across the globe be able to get it. So the short answer would be yes; we're looking into what other methods would be beneficial or desirable for doing that very same thing. Anyone else? Right, go ahead.

Hmm, well, since the findings are specific to your target, we'd have to keep it that way for them to be meaningful; or at least, I guess it would depend on how similar the target where you collected the data is to your new target. But there's no reason why we couldn't dump the data, as you put it. Yeah, I'm sorry, I didn't reiterate the question: the question was, can we share the Markov model findings for a target with others? And we could; it wouldn't really be that hard. You'd just have to be careful, I guess, in where you decide to use it, since it could just as easily start giving you

bad guesses as good ones if it's not trained on data that is relevant to what you're targeting. So, good questions. More? This is exactly the kind of feedback I love getting for this project. I get some really great ones on Twitter and Discord; it's really been essential that I get all this feedback, because it drives our future directions for this project. The bug bounty hunter community has been very useful, very helpful, since they seem to use this a lot and they want results. They want their time to be well spent, and they find all the problems, and they are quick to ask for

more, but it's good. It's great.

All right, so the question was, can this tool, or tool suite, be used internally at an organization? Correct. Yeah, and it is not designed to be used that way. For instance, I would say typically BloodHound would be used for something like that, maybe not always, but it's definitely a good tool for that kind of work or task. But this has been designed, and I think we're keeping the scope so that it is only for internet-facing assets; that's the only thing it's looking for, that infrastructure. Well, if that's it, then I guess we will conclude here. Thank you. [Applause]