BG - TAPIOCA (TAPIOCA Automated Processing for IOC Analysis) - Ryan Chapman & Moses Schwartz

Name: BG - TAPIOCA (TAPIOCA Automated Processing for IOC Analysis) - Ryan Chapman & Moses Schwartz
Uploaded: 2016-12-14
Duration: 47 min 41 s
Description: BG - TAPIOCA (TAPIOCA Automated Processing for IOC Analysis) - Ryan Chapman & Moses Schwartz Breaking Ground BSidesLV 2015 - Tuscany Hotel - August 05, 2015

Mentioned in this talk

Tools used

Bacon CRITS pandas Pivote Scriptifier

Service

OpenDNS VirusTotal WhiteScope

Frameworks

scikit-learn TAPIOCA

Languages

Python

Transcript

yeah all right folks thanks for showing up you're in the breaking ground track if you are not looking for the breaking ground track maybe you could go look someplace else uh or stay because I think we got something interesting here uh one last thing uh for anybody who hasn't heard a blue cell phone was uh left in a cab this morning it's been turned in to hotel security if you're missing your blue cell phone talk to hotel security or if you just would like a blue cell phone maybe go talk to hotel security um I'd like but I'm sure that would never happen uh uh I'd like to welcome uh Ryan J Chapman and Moses Schwartz from the Bechtel

Corporation they're here to talk to us about their framework for automating stuff put your hands together please thank you you at the bar High you already clapped so we can't we can't stck now all right so I'm Ryan this is Moses I'll learn not to smack my mic as we talk so just to get into it here the agenda for today we're going to talk about who we are where we're from what we do our goals and intent for the talk we're going to discuss the concept of an Intel shop and try to get everyone to move toward being an Intel shop if you're not one already then we're going to go over the process at least that we use to

analyze indicators of compromise and then we're going to talk about our tool cassava and get into that it's our official tool release day yay and then we have discussion wrap up and questions and such okay so my name is Ryan Chapman I work incident response with Bechtel Corporation I have some degrees and some certifications that really no one but me cares about but the important things are I'm a retro gamer so I have quite a few old retro consoles like if you're into 8 bit 16 bit like yeah let's do all that right my wife hates my collection it's too much stuff I'm a husband and a father I've got a four-year-old daughter who runs around and touches all my

things and does whatever she wants and my wife does the same thing so so I'm Moses Schwartz can everyone hear me we're good uh I work with Ryan at Bechtel we're both on the incident response team I've uh got a degree I've got no other acronyms I recently moved over to industry I was with government national labs before did a lot of work in industrial control systems and I think that's the end on my slide all right cool next up so what is a Bechtel anyone here heard of Bechtel except for the people I work with back there you don't count okay so Bechtel's one of the largest construction companies in the United States the numbers flip-flop right uh one

of the largest in the world engineering and construction excuse me we build mega projects when I say mega projects I'm referring to projects in the tens of billions you know potential of dollar value like mega targets and because of that and because of the things that Bechtel builds we end up being a high value target so we deal in the nuclear sector we deal with in fact we'll we'll show you some of our projects going up I want to go into a whole rant on that but the idea is very simple when you're dealing with government contracts and dealing with things where you have these like nuclear facilities with your name under them under your umbrella you

become a target okay which is actually kind of cool for uh no I didn't say that but it's actually kind of cool for us because you get to deal with maybe adversaries that other companies don't get to so all right so some about Bechtel we built the Hoover Dam I'm sure you're all familiar with that right local Palo Verde Nuclear Generating Station and this is actually really cool this is the Pueblo Chemical Agent-Destruction Pilot Plant what we call PCAPP so we have all these chemical agents built up for different wars in the past right and because of certain treaties they have to be dis- we have to get rid of them right so we take care of those types of

things and then we have Ivanpah which is these huge collectors uh built from mirrors and we harness the power of the sun yeah renewable energy right so that's just a little smidge of the things and projects that we we deal with okay speaking of we our team is comprised of well quite a number of smaller subsections or arms of our overall global security operations and engineering but primarily I want to talk about these three teams okay and one of the things that I really want to talk about here is what we do but then I also want to get some feedback from the audience so hopefully you're willing to do that for me but we have a security

operations center which is a 24x7 incident response team okay this team is primarily responsible for incident response within our organization which you may be a little confused because we have the team one down from that which is the computer incident response team and you're thinking well wait a minute incident response is in that title so what's going on there so our team the CIRT we do the engineering so we stand the tools up and we maintain them so that the SOC can use them right that's what we do we're also we provide expertise in the sense of if we have a real like oops situation we jump in and do like incident command for the incident and

stuff like that but our SOC handles pretty much 98% of any incidents the tickets that come in from our SIEM from our log management utilities I'll leave the name out right all those so we also have a TMT a threat management team and what they do is they try to mitigate threats and what I mean is they run the vulnerability scans they perform patching do service account password resets stuff like that our overall goal is to deter detect respond and remediate and one of the best ways to do that is through utilizing intelligence I'm sure you're all hopefully on the bandwagon the intelligence bandwagon right otherwise you wouldn't be in this talk jeez okay so

do you have these types of teams at your work anyone want to volunteer if you have this type of segregation you do do you have the same three or what do you

have oh you have a cloud team team just did Cloud that's a good thing I kind of like that all right um anyone have an intelligence team like just a team for intelligence vetting and Analysis really you do back there a couple of them huh all right we should talk later then see how that works out all right but how many of and be truthful here how many of you have like five guys who do all the above that's it come on more hands to don't lie all right yeah that's so we're we're lucky in in this regard that we have these teams in this fashion so if you want to talk after about team

formation and how you guys are running through intel and so on so forth you know we'd be definitely happy to do that so all right our goals and intent this like we'd like this to be an ongoing discussion I'd like feedback from the audience there's a mic right in the middle right there I don't know if we have a floating wireless one I don't think we do right now but uh if you want to just grab that and if you want to just jump in or just yell something out I'll repeat it for the camera you know that's perfectly fine BSides they really want to engage the audience and I I do too we do too so let's do

that so the Intel shop the idea of the Intel shop producers Not Mere consumers we don't just want to consume intelligence take it for face value and say that's cool block it in the firewall and move on we do not want to do that okay how many of you do that you're like I do that right all right so what we want to do is we want to exploit the information two primary ways number one is to derive additional Intel from that so we can further analyze it and the other one is to vet that Intel honestly I probably should have flip-flopped that we want to vet it first and make sure it's even valid intelligence okay

so speaking of vetting intel does anyone here you don't have to say the names right are you part of intelligence sharing groups like right yeah open source intelligence sharing groups right can you throw any names out of those groups start exchange cool I was like no no no no shut up Ryan keep going next slide next slide all right so when I say pivoting I want to discuss this so it paints a clear picture of what we're dealing with all right so an example of data pivoting if you're like what what are you even talking about say we're given a domain name all right that domain name is an atomic indicator and a lot of people just take it at face value

and say this is the domain block it in the firewall move on no no no no no no check the passive WHOIS what's the registrant name how many what name server does that use and for that matter if you check a name server associated with a malicious domain and that name server hosts only two other domains you probably want to look into those two domains right that's the concept here so if you have an IP address well how many domains are hosted on that IP address and what registrant information did they use and so on and so forth so that's kind of the goal we're going to get into in the discussion of data pivoting
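The name-server pivot described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are made-up in-memory tuples standing in for whatever passive DNS or WHOIS source you use, and `pivot`, `build_index`, and the `max_siblings` cutoff are hypothetical names, not part of any tool from the talk.

```python
from collections import defaultdict

# Hypothetical passive records: (domain, name_server, registrant).
# In practice these would come from a passive DNS / WHOIS provider.
RECORDS = [
    ("bad-domain.example", "ns1.shady.example", "John Doe"),
    ("sibling-one.example", "ns1.shady.example", "John Doe"),
    ("sibling-two.example", "ns1.shady.example", "Jane Roe"),
    ("popular.example", "ns1.bighost.example", "Big Corp"),
]


def build_index(records, field):
    """Map the value in column `field` to the set of domains that share it."""
    index = defaultdict(set)
    for rec in records:
        index[rec[field]].add(rec[0])
    return index


def pivot(domain, records, max_siblings=5):
    """Return other domains on the same name server as `domain`.

    A name server hosting only a handful of domains is interesting;
    one hosting thousands is probably shared hosting, so it is skipped.
    The cutoff of 5 is an arbitrary assumption.
    """
    by_name_server = build_index(records, 1)
    related = set()
    for rec_domain, name_server, _registrant in records:
        if rec_domain != domain:
            continue
        siblings = by_name_server[name_server] - {domain}
        if len(siblings) <= max_siblings:
            related |= siblings
    return related


related = pivot("bad-domain.example", RECORDS)
print(sorted(related))
```

The same index can be built on the registrant column to catch reused fake registrant details.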

and when you pivot you're able to hunt throughout your network all right anyone here use log aggregation utilities yeah you want to throw your names out no no one wants to share names whatever what's that ELK awesome cool Elasticsearch cool good deal okay before we start getting into how to pivot through intelligence there's a very important discussion to be had here regarding operational security or OPSEC okay so I already see you're laughing already you know where I'm going with this okay there's two different types of threat intelligence gathering and if you think of it just think of like standard recon right you have passive and you have active so the basic difference is passive the threat

actor against whom you're accumulating this data or whatever right they have no idea you're doing it you're checking passive DNS stuff you're not actively making requests out to their infrastructure whereas passive or excuse me active is the exact opposite right you're like hi I'm looking at your stuff okay and we have to be very very careful on how we deal with this whole thing because our operations team might not want us to be putting our names out there like oh we're looking through your stuff yay you know you have certain APT groups and you're like doing all this recon on them you might attract attention right like oh you have what are you doing and then of course you

also have sharing agreements so like when I asked which sharing or intel feeds you're all using and like no one's like shut up Ryan right in that case that's the same thing maybe they don't want you to actively probe out and let people know that you know you have that information all right and overall just tipping the hat is bad all right it can be very very bad the simple thing is if you have a spear phishing campaign that comes into your organization and it has targeted just your organization and all of a sudden you start pivoting off that data in that intel they know you're doing that right and so I don't know if you want to do

all that so we can talk about that more after and how you actually what network do you come from right does anyone here do that type of thing you receive intelligence you try to pivot back and learn more about them all right for those of you that are doing that are you going through Tor are you going through your internal network to make it look like a regular user what are you doing what's that cell card all right that works what's that one more time what sorry the last one oh yeah okay cool lines maybe you have a a non-business line right in your security operations center or something yeah anonymizer all right many different

anonymizers out there cool hopefully going through different exit nodes almost every time or something right yeah awesome kill the circuit bring it back up all right so IOC analysis process we're going to talk about some of these atomic indicators now when I say atomic indicators I mean just the primary indicator types email domains IPs and file hashes okay so right now you may work with these in certain ways we're going to discuss some of the ways we work with them hopefully you have ideas we join forces and Voltron it up here so email we have an email that comes in and quite often people just take like the from email address they pop it in

their not the vendor we use uh their email blocking utility but also say like they they put it in their whatever Exchange whatever is sitting in front of Exchange right they block it and then they just move on with their life okay but that email has associated with it email addresses from and reply-to might be different why are they different go look at that right SMTP servers they're coming from like check the HELO like where did this really come from where it was purported to come from or somewhere else X-Mailers are you guys all familiar with X-Mailers what's say the term for an X-Mailer like in a web browser what do you refer

to that string what's that yes the user agent string is what the X-Mailer is right so it's identifying the mail client like when you visit a website and it knows you're coming from a phone because you provide yeah right so the X-Mailers you can see if they're using a specific X-Mailer well then maybe they're using that for all of their malicious activity they commonly do subjects of the email so on and so forth okay so here's an example of an email that we kind of broke down technically this is a screenshot from CRITs anyone familiar with MITRE's CRITs project yeah awesome okay cool so this screenshot is just it's kind of a funny screenshot this is Moses sending me an

email it's in reference to our DEF CON 23 submission or our I should say rejection for DEF CON 23 so I'm just showing here that we have the reply-to it does happen to be Moses so I mean you know that's good right and then we have the HELO over here so we see it did come from a Google server and then we've got down at the bottom down here hello our iPhone X-Mailer so I know it came from right all right domains and IPs so when a domain or an IP is provided to us through intelligence gathering or for that matter it's not all just shared intelligence could be a spear phishing campaign that comes in from which we

derive the indicators right we we extract them out okay so I want to find out is this domain reputable or is it not in general okay so some of the things I would do is I would check WHOIS results okay we like to use so many tools I'm just going to name a couple in these slides DomainTools Robtex what else are you using Dion I haven't used that I'll check that out now all right anybody else Whoisology other I'm gonna go check these out now all right that's the point of this conversation Whoisology was the other reference you also want to check for reputation scores and blacklists so is this domain a known bad domain all

right so we can check Web of Trust URLVoid OpenDNS right what are the tools you're using for reputation scores

anyone what's that Emerging Threats cool okay and then malware site hits you want to see if they have hits on various malicious software sites of course VirusTotal's on there Malc0de right and the idea there is has this domain been associated with any known bad samples okay all right so anyone have any feedback any additional things that you're doing for domains or IPs in this yeah
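The three domain checks just listed (WHOIS, reputation and blacklists, malware-site hits) each give a partial answer, so it helps to fold them into one record. Below is a minimal sketch, assuming each source has already been queried and reduced to `True` (flagged), `False` (clean), or `None` (no data). The source names in the example and the two-source threshold are illustrative assumptions, not values from the talk.

```python
def summarize_vetting(domain, results, threshold=2):
    """Combine per-source verdicts into a single vetting summary.

    `results` maps a source name to True (flagged), False (clean),
    or None (the source had no data for this domain).
    `threshold` is an assumed cutoff for "worth a closer look".
    """
    flagged = sorted(name for name, verdict in results.items() if verdict is True)
    answered = [name for name, verdict in results.items() if verdict is not None]
    return {
        "domain": domain,
        "flagged_by": flagged,
        "sources_answered": len(answered),
        "worth_a_look": len(flagged) >= threshold,
    }


# Hypothetical verdicts for one domain; the keys are placeholders.
example = summarize_vetting(
    "bad-domain.example",
    {"reputation": True, "blacklist": True, "malware_hits": None, "whois_age": False},
)
print(example)
```

A single hit from one source stays below the threshold, which matches the point that any one source may be unreliable on its own.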

yes oh yeah oh yeah to repeat that basically the feedback we got here was just the note on whenever you're using VirusTotal whatever malware samples that come down to you just do hash lookup stuff don't actually submit the sample we we have a very very very strict rule on on getting that approval for that because if it's a targeted sample that's just written for your company and your silly butt's like whee VirusTotal then they're obviously sitting there watching it going oh you're analyzing it yes change what change a byte yeah give them a different hash value right yeah well if you change one byte though you might run into some other problems which we're

going to talk about when we go over the file hashes import hashing and but you're probably familiar with that seem like you are yes PassiveTotal yes I just signed up for an account crowdsource
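The "look up the hash, don't submit the sample" advice only needs local hashing. A minimal sketch using Python's standard `hashlib` follows; the function names are made up for illustration, and the actual lookup against a service is deliberately left out.

```python
import hashlib


def hash_bytes(data):
    """Return MD5, SHA-1 and SHA-256 hex digests for a bytes object."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def file_hashes(path, chunk_size=65536):
    """Hash a file in chunks so large samples never sit fully in memory."""
    digests = {
        "md5": hashlib.md5(),
        "sha1": hashlib.sha1(),
        "sha256": hashlib.sha256(),
    }
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


# Changing a single byte yields a completely different hash, which is
# why a plain file hash only ever matches byte-identical samples.
original = hash_bytes(b"sample payload A")
modified = hash_bytes(b"sample payload B")
print(original["sha256"] != modified["sha256"])
```

Only the resulting digest strings leave your network, so the sample itself is never exposed to the third party.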

cool the one reference was PassiveTotal so if you haven't checked that out and by the way one of the important things of having these passive databases it alleviates the you making all these queries and reaching out it's already passively obtained by this third party okay click all right for WHOIS results we have some data pivot points how we doing on time we're good WHOIS results we have some data pivot points dates for created updated and expired for domains okay so if you see a domain that comes in that's quote unquote malicious and being leveraged as C2 right command and control infrastructure for like a known APT actor and you look and the site was created

like literally hours prior to you receiving the intel that's not good but then again if you have one that maybe was created 5 years ago and renewed ever since then what does that probably tell you they yeah they've been act- right yeah it's not necessarily saying oh that means it's a good domain it means well it's currently not a good friend of ours so we have name servers the domains associated with the registrar uh registrant I should say contact information and then of course you have like names organizations and addresses now a lot of people when I go over this part they're like why would you look at the registrant contact names they're going to be fake right well yeah often

times they are fake but these groups this is a human error type of thing this is when the threat actors fail essentially right and they do things like they reuse the same fake information so if I'm looking at passive information right and I have this name like uh John Jacob hhk some random name right but that exact same name is associated with four of the domains that were all registered around the same time I'm probably going to need to look into those so it's still definitely valid okay file hashes we're going to get into file hashes now so you have a file hash you run uh whatever you prefer in your shop MD5 SHA whatever uh you can

get association to other related samples through signature-based association is one method so this one it's kind of a with all the generic signatures that the antivirus vendors are using these days it's very difficult to to get a lot of utility out of this so are you all familiar with some of the more generic signature or detections that various AV vendors are using anyone throw one out Trojan.Gen yes yeah if you go just search that on VirusTotal see what you get man yeah dude it's a Trojan just toss it in there uh Trojan.Insight McAfee has got their own I can't remember the name right now so when you see one of those generic names like you're not

really going to go look at every other sample that has that name okay but if it does come up with something a little more threat actor related like Duqu or something you're like wait a minute right that's a family from a known attacker well known set of attackers typically and it gives you an idea of where to look next so we also have import hashing anyone here familiar with import hashing yeah you want to explain

it perfect perfect cool for those of you that couldn't hear it basically saying that the imports the import address table within a portable executable Windows executable in in our case right now it imports various Windows library Windows API functions right well if the malware is just slightly modified like maybe to say to attack one group to the next you're probably importing the same functions you know you're importing your urlmon for internet based calls and so on and so forth so when you have that table built up you can hash just the imports so any of the text-based strings that are passed to those as arguments or whatever if that's the only thing that

changes you still can actually find out that the samples are related if some of you are looking at me funny if you have a VirusTotal account uh with VirusTotal like an actual account you know with them an API key associated account you go into the actual intelligence section you click on your sample and then you'll see by the hashes right so you're going to have your hashes like underneath I think it's the ssdeep section right underneath there around it says import hash and if you click on it it'll run this the hash correlation for you or just obviously we're using API calls to do the same thing so that's actually a newer thing pretty cool another thing

that's awesome is this uh tool called pivote here by one of our co-workers and he has gone through it's available at the GitHub location here and so what he has done Rick is he's gone through and he's utilized import or excuse me uh section based hashing so sections within the PE so if you're familiar with the resource section the code section the text section things of how the PE itself once it's compiled down uh those things can be hashed themselves so for example say is a okay this might work say there's a .NET application right it's compiled in C# and it has a GUI and it uses very like a very specific look to it right that's

all going to be probably in the resource section okay but they changed the code in the what technically would be the text section but the hash for that resource section will be the same so yeah that works yeah I like that I made that up on the spot all right I'll take that I feel good about that so check that out if you want to take a look into that and then of course malware sites we want to run the hash against and this goes right back to the don't upload the sample unless you've had a big discussion about doing so right but you want to look into VirusTotal Malc0de what other sites are

you using for file hash lookup Malwr M-A-L-W-R yeah you spelled it out cuz you didn't want to pronounce it like I tried yeah yeah yeah Malwr I don't know all right yes so malwr.com what else Totalhash yep what else anyone use Bit9 what two okay two like I use Bit9 I do I do awesome and then we have known software association lists right so the National Software Reference Library the NSRL acronyms are difficult for me and then WhiteScope which is something that Moses found you want to explain WhiteScope oh uh Billy Rios I I forget the name of his company put this together it's a a hash set of known SCADA software that

he's collected over the years so you can look up hashes and see not not if something is bad but if it's a known uh piece of ICS related software that's in the database which can be cute when you're scanning your like general use VLANs and you find all this SCADA software you're like what's going on there dude why are you doing that all right okay tools so who here is pivoting off of intelligence at this point okay what tools are you using to do it when I say tools we talked about resources thus far right so what tools are you using to do that Python all right you're just writing your own script you have a

script that just kind of runs through or yeah okay what else it's it's difficult right you have all these different sources of intelligence some of them have APIs some of them don't have APIs and you kind of want to put it all together but then you run into some problems right so what we're going to talk about right now is Moses is going to talk about right now is a tool that we're going to be releasing in fact I'll turn it over to you man okay we on that so we've got this clever acronym TAPIOCA and uh I'll tell you what the tool's called on the next page but this comes from a lot of the work that we put into doing

this kind of analysis we're a very script driven group like we were just talking you know everyone has their own Python scripts you tie into APIs you pull in other tools where you can you you know try to automate this domain and IP vetting process we've got a script we call bacon that our SOC is actually using for their vetting we've got tapioca the the original version which was like 40 different scripts that I wrote because we would periodically get these big dumps you know five thousand IP addresses they might be bad what do you do with that you you can't block them so you know I started writing these scripts that do API queries and pull in Virus

Total and OpenDNS and whatever other services you've got put in a big spreadsheet you can start to sort and maybe at least find a few that are interesting that you definitely want to block because they're on you know already known bad lists and it can give you an idea of where to keep looking to hunt and find other information uh Scriptifier is a tool that I put together for our group that's open source on GitHub that was intended to make it easier to take these Python scripts that everyone has with their various dependencies and put it all into a single web server give it a really minimal autogenerated web interface and also expose those scripts via REST API
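The idea of collecting everyone's scripts behind one interface and exposing each by name can be sketched with a small registry. This is not the tool's actual API: the `script` decorator, the `/api/<name>` path layout, and `count_domains` are all hypothetical, and a real deployment would sit behind a web framework rather than a bare `dispatch` function.

```python
import json

# Registry of script name -> callable; filled in by the decorator below.
REGISTRY = {}


def script(func):
    """Register a function so it can be called by name through dispatch()."""
    REGISTRY[func.__name__] = func
    return func


@script
def count_domains(domains):
    """Toy script: de-duplicate a whitespace-separated list of domains."""
    unique = sorted(set(domains.split()))
    return {"count": len(unique), "domains": unique}


def dispatch(path, params):
    """Route '/api/<script_name>' to a registered script and return JSON text."""
    prefix = "/api/"
    if not path.startswith(prefix):
        return json.dumps({"error": "unknown path"})
    name = path[len(prefix):]
    func = REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": "unknown script: " + name})
    return json.dumps({"script": name, "result": func(**params)})


response = dispatch("/api/count_domains", {"domains": "a.example b.example a.example"})
print(response)
```

Because every script returns plain JSON-serializable data, the same function can back a web form, a REST call, or a direct import from another tool.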

let me let me jump in there actually went faster to my part than I thought that I would talk all day so let me ask you how many of you what he referenced right now was a modular platform where you take the scripts that you you you you you you and you create and you pop them on one web-based interface and everyone just accesses it okay how many of you have a problem because you don't have that type of setup how many of you have your own scripts and then that guy has his scripts and that girl has her scripts and you try to which version are you running did you do a clone did you

pull it recently so you get into all these problems right well it's awesome whenever I write something I just go hey Moses can you do the thing it goes on in Scriptifier and I'm like yay it's right there for everyone to use so yeah it's really really useful to check that out thank you Ryan so the tool is called cassava another name for tapioca it's the same root it's a a Python library and the it ties into Scriptifier for the web interface and REST API stuff basically we're just trying to pull together a bunch of APIs right now we've got OpenDNS VirusTotal um a couple system utilities whois and dig wrapped and we are wrapping the tool Automater

which goes and pulls from what IPVoid URLVoid a whole bunch of other sites to look up domains and IPs it puts all of them into kind of a consistent interface massages the output so it's a little bit easier to work with a lot of these APIs you know you've got to know that you've got to look for the the XYZ element in their JSON that they return to actually get to the data this simplifies it puts it all in one place and it also stores this in a local database so you've got a a cached storage of all these lookups you've done that'll let you see you know have we done a lookup for this domain before it also

makes it a lot easier to do uh like live analysis because usually you don't want to have to save everything out when you're doing all these calls and then piece it back together when you want to do your analysis it really cleans up your code if you can just leave it the API calls as they are but actually have the results returned from a cache so you don't have to go out and reach out to VirusTotal every single time you want to look at this same file while you're doing development uh this is up on GitHub at our search repo name is cassava let me jump in real quick that cool one good thing about having that cache is for

those of you who have a team of more than one person if you have intelligence that comes in or for that matter just tickets right that one of your malware analysis runtime environments may fire off whose name you probably know who I'm talking about so they shoot a ticket into your SIEM and another ticket comes into your SIEM analysis is going on here and analysis is going on here but all of a sudden this person runs cassava and then this person runs cassava and says wait a minute that was cached like two minutes ago and well who's doing that and then you can find out right away that there's a correlation with these new incoming quickly firing events so

that can be very helpful now I mean we we don't actually have that in the interface yet but that's that's where it could be helpful yeah let's not oversell it okay all right so uh but when putting this together my goal was really just or our goal sorry sorry it was to to make IOC analysis easier to make it simpler to pull these APIs together to and not just that but to be able to you know easily get information from a multitude of sources because one may not be reliable if you've got a lot of hits on a lot of sources that something that's suspicious then it's probably worth looking into I'm also really focused on you know

this being a library that can be reused one of my pet peeves in this whole security industry really is a lot of tools are written just as tools and the only way to integrate it into another project is to actually do a system call to execute it and read in the output and that makes it a pain in the ass to actually build new applications that use those capabilities so I did everything I could to make this you know modular extensible and through the the Scriptifier framework it's got that web interface and the REST API so you know you can use the network or do API calls over the network to build your other tools you

don't even have to import cassava directly wait wait leave that on the screen leave that you see his rant down there right was he just talking about about the monolithic programs so one of the guys formerly from our team Chris Brewers his name designed the idea for the tool called bacon and I took it ran with it and I ended up writing this huge monolithic script like freaking so Moses is like what about the talk and tapioca and like oh just you know decompile you know not decompile just take the the composition the parts you like and just pop them in there so he goes through the bacon code for a couple days and then all of a sudden he starts

this rant about I hate people he starts going off it's so horrible it's so hard to port I'm like all right all right no it's not just you Ryan I mean has anyone ever tried to use Wireshark or or tshark you want to get the dissectors use them as a library like you'd like to pass it a packet and get something back right has anyone ever done that successfully they I I ran tshark and had it export to XML and then read that in that's the best way I could see to do it and that it just every single tool is written that way so I'm not saying I'm changing it

but I'd like to mention that like anytime possible modules libraries it's really cool so on to what what this web interface actually looks like we've got a couple examples here and then there'll be a demo if we're lucky at the end uh so this is what the the autogenerated script input looks like we've got a number of GUI elements you can put in there you can define checkboxes radio buttons uh multi-line input and single line input right now I've just got bsideslv.org you hit totally and it looks it up in VirusTotal spits it out as JSON the database and then displays it on the web page in an HTML format a

table. You can also click the tabs for JSON and get directly at that, so you can copy and paste it out of there, plus CSV, Python, and a few other formats. This is really just making it easier to integrate this into your workflow. A lot of the work we do is, you put in a bunch of domains and then you copy and paste the results into a ticket or into another tool. So the goal there is just to make this as painless as possible and provide all these utility functions that otherwise you'd have to open up Python and run yourself, or write a small script to do. It should all just be in

one place. There's also the REST API that I mentioned. If you can see the URL up above, we're at slash API, version 2, virustotal_domain. That's just the name of the script, and then apparently that input box is named domains, and you just give it a list of domains and it returns JSON right there in the web browser. You can do this from curl, you can do this with other tools. You can also import it as a library. Right here I'm doing the exact same example: importing Cassava, doing this VirusTotal domain lookup, and then just pulling out a couple of fields from there, showing, yeah, it's in there. We've got zero positives, 63 total scans. There's

the permalink to the VirusTotal report. So we've mentioned the database a few times. Right now, if you go and install Cassava, it's SQLite, which I love for development. For production, I think Postgres is going to be the way to go if you want to actually put this on a server that other people use. There's actually a really solid rationale for that. I'm using SQLAlchemy to interface with the database, which makes it pretty much database agnostic, but it means that you can't use most of the NoSQL solutions. Postgres is awesome because they actually have a JSON data type now, so when we start to build more analytics on this database, looking at

our previous history, you'll be able to just query directly on elements within that JSON blob if it's in

Postgres. I feel almost like we've said everything on this slide. Yeah, pretty much. Yeah, okay. So once we've got this data, what do you do with it? We have already mentioned CRITs. It's an open source project out of MITRE to help with this IOC analysis. You enter in IP addresses, domains, emails; it parses a lot of things out, and it has a number of modules they call services that'll do actually the same kind of lookup we're doing. They'll look it up in VirusTotal, there's an OPSWAT module, I don't know what other services there are. But you can't easily get those services into other applications. Again, you know, back to my

rant. So we want to integrate a little better with CRITs: expose our library as a service to CRITs, but also have some functions to export directly into CRITs. So every time you run a query, it's cached in the database, but it could also automatically be sent to our CRITs instance and put in some kind of suspicious or looked-at-once bucket, and you can do analysis on that side. We didn't want to replicate work. It could be kind of fun to start talking about correlations and grouping domains or other IOCs, but really CRITs is a great tool, so we just wanted to integrate with that and kind of settle on what's becoming a

standard. We're trying to make everything we do talk STIX and related XML formats. But there are some other things that we can do with that data. It doesn't all have to go into CRITs, and the CRITs services aren't necessarily going to get all the information we want. Some of the stuff we've done is taking these open API lookups but then also looking into our own logs and pivoting from there. We've got a database of seen IPs and domains and when they hit our network, down to the second. So we will see an IP, we'll look it up and all this, but then also take that timestamp and go look in our SIEM and see what kind of NetFlow

data we can pull out, what protocol that domain was on, pull this all into a big table, and then we can analyze it. You can dump it into Excel and try to sort by size of files transferred or results from VirusTotal, number of positives. I've also been playing a little bit with data science, as it were. I don't want to make any overly large claims here, but Python is awesome. The pandas library is a data frame manipulation library. Basically it gives you a MATLAB-like matrix and all sorts of transforms and operations you can run on it. scikit-learn is a collection of machine learning

algorithms, so it makes it super easy to whip up a Bayesian classifier or something to try to come up with whether an IP or domain is suspicious based on what we've already seen on our network, like what we've already got in our blacklists. Now, I've got a big dump of data here in a spreadsheet; this is anonymized. There is a prediction column there: targeted, probably good, malware. Turns out the prediction was terrible, but that's the way I want to go. Once you've got this kind of large data set and it's accessible, you can really start to do some very powerful analysis on that. So if anyone wants to get started

and run this tool, it is super easy, or I hope it is. It's up on PyPI, so you can do pip install cassava. I highly, highly recommend you set up a virtual environment before you do it, because it has a lot of dependencies, and really I just recommend virtual environments for all Python development. Then run the run cassava script with this init config argument. It'll dump out a minimal config file, you can put in your API keys, and it'll read that back in and you can actually start using it. Okay, it's demo time. We'll see how it goes. So I mentioned Scriptifier. I'm using it as a library in this Cassava tool, so the

splash page says welcome to Scriptifier. That might get a little bit confusing, but that's the way it is. This is the tool that we use to consolidate scripts and have them all run in the same place. So let's see. I'm going to go ahead and go to the TAPIOCA tab and put in a bunch of domain names. This is pre-prepared; these are in the cache, so hopefully this is going to work. And summarize output, this narrows it down to only a few tables. Oh look, it didn't fail. Okay, that's a good thing. All right, cool. So here's this HTML table. I apologize for the user interface. I know horizontal scrolling is bad, but it's a

lot of data. We've got a few fields pulled from VirusTotal, the positives and the total. The status is OpenDNS, security rank as well. And then most of these others are tools that Automater is pulling from, which we're using. You can grab that in JSON, or CSV and then open that up in Excel. Python, you could paste this into a terminal if you like. And a couple of others that are probably less useful: Trac is useful if you happen to be using Trac for ticketing, Markdown is useful if you want to drop this into your GitHub or something. YAML I haven't used in a really long time, but someone

might. See, and you can also access each of these parts of the API individually. So we can do the OpenDNS related domains call, put those same things in, and see that they think the bsideslv.org related domains are Bid l.org, bluecoat.com, blackhat.com, defcon.org. That's pretty reasonable. Apparently wired.com is related to Black Hat. We've also got these active lookups, whois and dig. Those are wrapping the system tools, so that's probably only going to work on Unix systems. Automater: these are all the sites that Automater can access, and we access them through it like a library. VirusTotal: right now we've got part of the VirusTotal API implemented. The full

OpenDNS API from their open source Investigate Python script. Like I said, Automater, dig, whois. We'd like to add a lot more. DomainTools is probably top of the list. Every other tool that we mentioned, I'd like to get it in here and be able to go to just one single place whenever you want to query an API. Because every time before, when I would try to write a script to do something, I'd go Google, you know, what VirusTotal API was I using before? And there are 14 of them on GitHub for Python only, and it's unclear which is the best, and they're all a little bit different, and half the time I just ended

up writing my own REST query using the requests library, because that was easier than dealing with the API. Hopefully this makes life a little bit easier for everyone. I'd like to see this become something really useful. Show the TAPIOCA results again. Hey, you've got to show the TAPIOCA results again. Oh, you want to do the full verbose output? No, no, don't click. Do the BSides Las Vegas Fortinet classification. Oh, I want to do summarize output. Okay, BSides LV.

What are you looking for? Which one's BSides? See, BitDefender category: hacking. Hacking, and no category for the other two. Which one's BSides? Which BSides, the top one? Oh, it was DEF CON then. Okay, DEF CON. Go back over to the right, to the blacklist. DEF CON's currently on the scumware blacklist. I just find that funny. If you're familiar with that particular blacklist, you probably know why. It was just funny, right? Scumware. So that's about all we've got. If anyone has questions, yep, you know, we'd love to talk. Let me come back

here. We have a microphone that's live up in the middle there. If you want to ask questions, do it.

So right now there's no integration with Splunk in this tool. We're using the Splunk Python API if we want to talk to Splunk, and writing separate scripts. Does that answer that question? Basically, a roadmap item is really what we're getting at. Right now, everything that you're seeing still requires a human element on the back end to take the data, analyze it, and do something with it, right? So the goal, once we have this, is to eliminate as much of the human element as you can, so that your people are just focused on the right things and they're not going out and collecting all this information. Well,

right now one of the things they're still doing is crafting their own Splunk queries based on that. So once you get to that point where that's automatically generated also, it starts to become prettier. Have you used CRITs before? No? Okay. So in CRITs, when it does the data correlation and shows, like, oh, these are related domains, it generates a Splunk query for you. You just click the button, right? So if we could have something like that eventually, that'd be fantastic. That's it.
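As a rough illustration of the roadmap idea in this answer (turning a set of related domains into a ready-made Splunk search), here is a minimal sketch. It is not code from TAPIOCA or CRITs; the function name `build_splunk_query`, the index name `proxy`, and the field name `dest_host` are all hypothetical and would need to match your own Splunk data model.

```python
def build_splunk_query(domains, index="proxy", field="dest_host"):
    """Build a Splunk search string that matches any of the given domains.

    The index and field names are placeholders (assumptions), not values
    taken from the talk.
    """
    unique = sorted(set(domains))
    if not unique:
        raise ValueError("need at least one domain")
    # Quote each value and escape embedded double quotes so the search stays well formed.
    terms = ['{}="{}"'.format(field, d.replace('"', '\\"')) for d in unique]
    return "search index={} ({})".format(index, " OR ".join(terms))


query = build_splunk_query(["blackhat.com", "defcon.org", "blackhat.com"])
print(query)
```

An analyst could paste the resulting string into the Splunk search bar, or a script could hand it to whatever Splunk client it already uses.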

So the question is, is the cache clean enough to import into Splunk? Yeah, I mean, you could dump it out as JSON and put it in Splunk or any other SIEM if you want to be able to query it. Actually, you could drop that right into Elasticsearch; that would work perfectly. But you could also keep it in Postgres and do queries as you like, and integrate it through the API. There are any number of options. It's mod
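The "dump it out as JSON" option mentioned in this answer could look something like the sketch below, which writes one JSON object per line, a layout that line-oriented log indexers generally accept. The record fields (`domain`, `positives`, `total`) and the helper name `to_ndjson` are illustrative assumptions, not the actual Cassava cache schema.

```python
import json


def to_ndjson(records):
    """Serialize an iterable of dicts as newline-delimited JSON, one event per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


# Hypothetical cached lookup results; the real cache layout is not shown in the talk.
cached = [
    {"domain": "example.com", "positives": 0, "total": 63},
    {"domain": "example.org", "positives": 2, "total": 63},
]

print(to_ndjson(cached))
```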

go. So right now we don't have all of the hash stuff implemented. That's on the roadmap. That's something I've been developing for our internal use, actually, so that'll make its way in here. And if anyone wants to contribute, you just volunteered, buddy. Yeah, you

did. Gotcha. Okay, the tool that I wrote, Bacon, it does take hashes. So butchering that, basically, and taking his expertise with the awesome way he does things, the right way, apparently, whatever, and putting that in there. Yeah, that'd be, so, roadmap item. Yep, there's one in the

back. Ah, so basically a data aging out question. Aging out is not really handled with any depth right now. It defaults to, if a query is over 30 days old, redo it. You can force it to update. But that's kind of an interesting question. Something that's on a blacklist today may not be there tomorrow, and that part's up to the user. Well, let me ask you: in your firewalls, whatever you're using, when you block a domain, what are the intervals for your block? Do you block forever? Do you block one month? Do you block three months, a year, two years?
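The aging rule described in this answer (re-run a lookup once the cached result is more than 30 days old, with a way to force an update) can be sketched as follows. This is an illustrative reconstruction rather than Cassava's actual code; the name `needs_refresh` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Default taken from the talk: cached results older than 30 days are re-queried.
MAX_AGE = timedelta(days=30)


def needs_refresh(fetched_at, now=None, max_age=MAX_AGE, force=False):
    """Return True when a cached result should be looked up again."""
    if now is None:
        now = datetime.now(timezone.utc)
    # A forced update always wins; otherwise compare the result's age to the limit.
    return force or (now - fetched_at) > max_age


now = datetime(2015, 8, 5, tzinfo=timezone.utc)
fresh = needs_refresh(datetime(2015, 7, 20, tzinfo=timezone.utc), now=now)
stale = needs_refresh(datetime(2015, 6, 1, tzinfo=timezone.utc), now=now)
forced = needs_refresh(datetime(2015, 8, 4, tzinfo=timezone.utc), now=now, force=True)
print(fresh, stale, forced)  # prints: False True True
```

Making `max_age` a parameter leaves room for the per-source or per-block expiry windows raised in the discussion that follows.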

Okay, so can you actually provide a time frame for expiration, like custom, based on each

block? What do you base that decision on? All right, yeah, got to be. So the idea is, like, say there's a consultant that gets, you know, hacked, quote unquote, and all of a sudden they're redirecting users to an exploit kit or something, right? But you're a big company and you need people to be able to hit that consultant's website. You block it for three months, but it's cleaned after two days. Like, whoops. All right, so, yeah. And a reminder, we have a microphone up here if you want to talk. So it's Postgres, and for dev purposes you mentioned SQLite: are those primarily your data stores, or are you

involving any graph databases as well, given the highly, like, deep relations between some of the data? So right now this is not tightly integrated with much of anything else, I'd say we've got, telling the truth, dude, yeah, we're going to do a talk, so we're like, database, go. That's the truth. Yeah, that's going to be fleshed out more, to be honest. Any further questions? All right, thank you very much, gentlemen. Cool. Please be sure to pack it in, pack it out. I have to pick up anything that you don't, so be kind.
