Crawling Metadata with Recon-ng

Name: Crawling Metadata with Recon-ng
Uploaded: 2014-05-06
Duration: 12 min 20 s
Description: Grant Willcox presents metadamn.py, a Recon-ng plugin that leverages Bing's API to discover and extract metadata from publicly available documents on target websites. The tool automates reconnaissance by identifying operating systems, software versions, user information, and network details embedded

BSides London · 2014 · 12:20 · 699 views · Published 2014-05

Speakers

Grant Willcox

Tags

Category: Technical

Topic: OSINT Tooling

Style: Talk

Mentioned in this talk

Tools used

Recon-ng

Service

Bing

About this talk

Grant Willcox presents metadamn.py, a Recon-ng plugin that leverages Bing's API to discover and extract metadata from publicly available documents on target websites. The tool automates reconnaissance by identifying operating systems, software versions, user information, and network details embedded in document metadata, while filtering noise irrelevant to penetration testing.

Original YouTube description

This talk will discuss my plugin for Recon-ng, metadamn.py, that aims to use Bing's API to scrape target sites for documents and download and extract metadata from them. Along the way I will discuss what metadata is, some of the difficulties I experienced, and my future plans for the project.
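
As a rough illustration of the document-discovery step described above, the sketch below builds search queries with the `site:` and `filetype:` operators that Bing's web search understands. It is not taken from metadamn.py: the function names, the `SUPPORTED_TYPES` tuple (with `ppt` standing in for the PowerPoint option), and the `candidate_hosts` helper are all assumptions made for this example. It only constructs query strings and host names; sending them to a search API is left out.

```python
# Hypothetical helpers for illustration; not metadamn.py code.
SUPPORTED_TYPES = ("doc", "pdf", "ppt", "xls")


def build_queries(site, doc_type="all"):
    """Return one search query string per requested document type."""
    if doc_type == "all":
        wanted = SUPPORTED_TYPES
    elif doc_type in SUPPORTED_TYPES:
        wanted = (doc_type,)
    else:
        raise ValueError(f"unsupported type: {doc_type!r}")
    return [f"site:{site} filetype:{ext}" for ext in wanted]


def candidate_hosts(site):
    """Return the host as given, then a www-prefixed fallback if it lacks one."""
    hosts = [site]
    if not site.startswith("www."):
        hosts.append("www." + site)
    return hosts
```

Calling `build_queries("example.com", "pdf")` yields a single query, while the default `"all"` yields one query per supported type, which can then be run one after another.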

Transcript [en]

Last speaker before lunch is Grant.

Okay. So, go for it.

Okay. So if everyone's sat down, I'll just go ahead and start. Today I'm going to be talking about crawling metadata with a plugin that I developed for Recon-ng.

So what is metadata? Does anyone in this room not know what metadata is? Okay, I'll just briefly go over it. Essentially, metadata is any data that describes the properties or attributes of another object. Say you have a ball: the metadata for that ball might be that it's red, or that it was manufactured in China, or something like that. So essentially it's just data describing other data.

So where can you find metadata? Well, you can find it pretty much everywhere. There are some really interesting examples. Libraries use it a lot to organize their books in databases, and databases use it to organize the entries within them. There's metadata in books, such as the author, the title, where it was published, the year it was published, stuff like that. Because it's such an abstract concept, it comes into play in really interesting ways, since there are many different types of data that metadata can apply to. Which brings me on to my next slide.

So, the origins of metadamn.py. Essentially, the project started out because I wanted to develop an open-source alternative to FOCA. Does anyone in this room not know what FOCA is? Okay, I'll briefly explain. FOCA is essentially an online metadata crawler. You type in the site, and it will go off and say, "Oh, look, here are all the publicly available documents for this site. Let's download them, extract the metadata, and then we can start forming a map." So just from the publicly available documents we can see, okay, there's a Windows XP machine running in the target network, which has this user on it, which is running this software, and you have these printers in the network. You can start building a really accurate map of a target corporation just based off of their public data.

Now, the problem is that I was going to develop that sort of tool, and then I realized there was another tool called metagoofil.py, which essentially did the same sort of thing. Has anyone not heard of that one? So I spoke to Rob and we started talking about what other things we could do. So he and I decided... oh, well, there was meant to be an image here. I'll just show you a demo later. But essentially, I decided to use Recon-ng because it's open source and modular in nature, and if I want to expand the tool later on, it will allow me to incorporate the features of other modules. There are also features already built into it, such as the Bing API, which I'm using to query for my results: there's already a built-in method of querying the Bing API that I'm able to take advantage of in my tool. For these reasons I decided to move it there, so that it can be expanded on later.

So this is what metadamn.py looks like at the moment. First we have the important option. A lot of the information in the documents isn't really important: you've got stuff like how many pages are in the document, or what the document security is, and for a typical pentest you don't really need to know that. So this option filters that information out and only displays the information that's important for a pentest.

The next option is the site option. Basically, you set the site. You can either specify the www in front of it or leave it out; it's entirely optional. If my tool can't find the site from this, it will try prepending www in front of it and see if it can reach it on that site.

The next option is the text option. That will save the output to a file name. If you leave it out, it will just output onto the screen.

And lastly, I have the type option. There are several different types that I can query for right now: the options are doc, PDF, PowerPoint, or XLS. Alternatively, you can specify the all option, which will just query them successively, one by one.

So, I'm going to try to give a quick demo and pray to the demo gods that this actually works. I have recorded a video, which I'll briefly try and explain. Let me minimize the bottom of VLC. Okay.

So, essentially, I've loaded up Recon-ng here. Can you guys see that at the back? Okay. Right now I'm just loading up the tool, and we're going to show the options. I'm going to forward the video a little bit, because we've already gone through the options here, and skip over this somewhat. Okay. Oops. Go back a little bit. Sorry about that.

Okay. So here I'm running this against homedepot.com and querying for the doc files. Right now it's just going through and saying, "Hey, we're downloading this file, and this is where it's located if it doesn't download or if something's wrong," so you can go back to it. And now it basically just prints out all of these different documents. You can see we've got the operating system. If it's an old file format, such as the doc file format, there actually is a field within it that specifies the operating system it was created on. The new XLS file formats don't have that, but they do have the application version of the creation software. I'll just pause this for five seconds. You can also see other stuff, like who last saved the file, the author that created it, and the last save date and create time. So that's essentially the sort of information that you'll get from my tool. I'll just carry on with the PowerPoint.
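
The kind of fields shown in the demo can be illustrated with a minimal sketch for the newer Office Open XML formats (.docx, .xlsx, .pptx), which are zip archives that keep document properties in `docProps/core.xml` and `docProps/app.xml`. This is not the metadamn.py implementation: the function name, the `IMPORTANT_FIELDS` set, and the `important` flag are assumptions for illustration, and only Python's standard library is used. Legacy .doc files use a different binary container and are not handled here.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumed selection of fields worth keeping for a pentest; not the tool's actual list.
IMPORTANT_FIELDS = {
    "creator", "lastModifiedBy", "created", "modified",
    "Application", "AppVersion", "Company",
}


def extract_ooxml_metadata(path, important=True):
    """Read document properties from an Office Open XML file (a zip archive)."""
    metadata = {}
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
        for part in ("docProps/core.xml", "docProps/app.xml"):
            if part not in names:
                continue
            root = ET.fromstring(archive.read(part))
            for element in root:
                # Tags look like "{namespace}creator"; keep only the local name.
                field = element.tag.rsplit("}", 1)[-1]
                if element.text and element.text.strip():
                    metadata[field] = element.text.strip()
    if important:
        # Drop noise such as page counts, keeping only the assumed useful fields.
        metadata = {k: v for k, v in metadata.items() if k in IMPORTANT_FIELDS}
    return metadata
```

With `important=False` the function returns every simple property it finds, which roughly corresponds to leaving the filtering option off.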

So, some of my future plans. At the moment, if someone has encrypted a PDF with a password, I can't decrypt the file and extract the metadata without knowing the password itself. So I am planning to potentially add some basic dictionary attacks to see if I can guess the password. I haven't asked whether this is legal or not; I might try and look into that before I add it in.

Another thing is that I'm planning to add support for additional file formats, starting with images; a lot of people have been asking about that. I'm also planning to add threading. At the moment I've got threading into the tool, but it runs so fast that nothing gets downloaded, so I do need to work on that. If anyone's interested in helping with that, please do let me know. That is a major feature I'm planning to add.

So, the final note is that the tool is not yet in Recon-ng. I tried to ask Tim, and he basically said, well, "I have a three-week religious vacation," so I haven't been able to get back to him on that. He was very interested in the tool and in getting it included, but like I said, plans are being made to do so. It's just still being worked out at the moment.

Okay, so Robert Mccardi was a great mentor on this; I just want to give a quick shout-out to him for that. The files will be hosted here. I will be sending out a tweet, you can check tech123, and I'll send out the link on Twitter. Otherwise, if you want to just copy that down quickly, that will be where all the files are hosted until they're included in Recon-ng. So, are there any questions?

How long did it take you to code up?

This project took, on and off, about three months. A lot of it was actually taken from the metagoofil.py code, which was open source, so I was able to copy some of the code across. But this was actually a university research project, and then I decided to expand it and port it into Recon-ng.

Which file formats do you support?

File formats at the moment are... sorry, I did try and cover that, but I don't know if I went too fast. It's doc, PDFs, XLS formats, and PowerPoints. I was going to support rich text format files, but there's not enough metadata to really support that, so I've taken support out for it. Any other questions?

How does it differ from an open-source search engine like YaCy or something like that?

What do you mean by that?

There's an open-source search tool, YaCy, that just does web crawling. How does it differ from something like that? Is it just that it collects more metadata?

You're going to have to be a bit more specific on that. It does collect metadata, but...

I just didn't know whether you'd come across that project, YaCy.

No, I haven't seen it.

Okay. I imagine it's going to be on your list now.

Yeah.

It's built in Java and does some similar things. It's like a peer-to-peer open-source search engine. YaCy.

Okay.

Is there anything else, with regards to future file formats, that you'd consider implementing?

Sorry, can you speak up a bit?

With regards to future file formats: going through archives, for example, stored on sites, and extracting documents from there.

So going through, like, the archives of a site? Do you mean the cached results?

Sometimes caches, or, for example, archives of documents on sites. Some sites put out zip files or RAR files containing a whole library of documents, which can also be crawled as well. Would that be something you'd consider implementing?

So, implementing extracting metadata from archives. Yeah, that might be a good idea to try and implement. I'll see if I can get that implemented. Anyone else? Okay. Well, thank you very much.
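
The archive suggestion from the Q&A could be sketched roughly as follows for zip files, using Python's standard `zipfile` module. This is an illustration only, not something the tool is stated to do: the function name and the extension list are assumptions, and RAR archives would need a third-party library that is not shown.

```python
import zipfile

# Assumed set of document extensions to look for inside an archive.
DOCUMENT_EXTENSIONS = (".doc", ".docx", ".pdf", ".ppt", ".pptx", ".xls", ".xlsx")


def documents_in_zip(archive_path):
    """Yield (member name, raw bytes) for each document found inside a zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Compare case-insensitively so "REPORT.PDF" is also picked up.
            if info.filename.lower().endswith(DOCUMENT_EXTENSIONS):
                yield info.filename, archive.read(info)
```

Each yielded document could then be handed to whatever metadata extractor handles that file type, in the same way as a directly downloaded file.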
