
Keith Gilbert Security BSides Boston 2013 - Malformity

BSides Boston | 33:36 | 233 views | Published 2013-06
About this talk
"Malformity" with Keith Gilbert at Security BSides Boston 2013 in Cambridge, MA All video links are available at http://www.bsidesboston.org and http://bit.ly/BSidesBOS Twitter: https://twitter.com/bsidesboston Website, Biographies & Agenda: http://www.bsidesboston.org http://www.securitybsides.com/w/page/12194141/BSidesBoston http://bit.ly/BSidesBOS Don't forget to follow us on Twitter at @bsidesboston or tweet to us about event using #bsidesbos Video created and edited by Peter Larson (c) 2013 http://vimeo.com/user4206417 Posted by Roy of Security BSides Boston 2013 Team
Transcript [en]

So presumably if you're here, you are all potentially interested in Malformity. This is the first talk today that uses, or is focused primarily on using, Maltego. There's another one

in forensics and incident response. I started out in dot-gov, got out of there pretty quick, and went to MS-ISAC, which is the Multi-State Information Sharing and Analysis Center; some of you may be familiar with the various other ISACs for several different industries. Now I am a member of the Verizon RISK Team. I'm very happy that data breach report season is over for the time being. And then I'm the primary author and maintainer of Malformity, which is what I'm presenting today. Most important piece on this slide: all opinions are my own and do not reflect or represent those of my current or previous employers. Quick overview: I'm just going to do a basic "what is Malformity."

Before I start that, how many people here have used, or use on a regular basis, Maltego? Okay, pretty good. How many of you are familiar with Canari? Great. Also a few notes. Maltego is produced by Paterva. The guys at Paterva are great, it's a great product, and it is not all that expensive considering how expensive a lot of commercial products can get, so if you're using it in a commercial capacity, please buy a license. Canari is developed and maintained by Nadeem Douba; if you ever run into him at any conferences, feel free to buy him a treat. A special thanks goes out to oday, including his entity packs, which we'll

get to later in Malformity. And like I mentioned, David Bressler has the talk after this one on using Maltego in the enterprise. So Malformity is a local transform package for Maltego, developed using the Canari framework, to assist in gathering data about malware and malicious infrastructure. But what does that actually mean? Starting from the beginning, and a bunch of you already know this: Maltego is an open source information gathering tool. A lot of people jump straight to "open source intelligence tool", but I try not to use that, because collecting data does not automatically turn it into intelligence. If you're not doing analysis on it, it is not intelligence. So this tool can help you collect it, it can

present it in such a way that you can do analysis on it, and in that way it may help you produce intelligence based on the data collected. You can collect this data on several things out of the box: websites, companies, network infrastructure, social media accounts, a vast array of standard features. It's cross-platform and customizable: you can run it on Windows, Linux, and OS X, and as you'll see, it can be customized. And then it provides a very nice GUI that will give you a graphical representation of the links between the data that you collect. So when you generate things like this, in certain companies management may get really excited about that. There are a few building blocks within

Maltego that everybody needs to understand for the rest of this. The first are entities: basic things like a company, IP address, or domain; it would be those things over on the right. Each of those is the basic building block and represents essentially one piece of data on the graph. Transforms are then applied to entities: a transform will take in one piece of data, do something to it, and return another piece of data. In some cases it may not return a new entity; it may alter the current entity. Common ones: if you have a domain, go out via a variety of methods and grab an email address associated with that domain; if you have an IP address,

grab all known domains associated with that, et cetera. And then a newer aspect are machines, which are used to run multiple transforms at one time. I like to think of these as kind of moving closer down the stack: instead of pulling every piece of information separately and maybe doing some analysis to tie them together, you can use machines to skip ahead if you have a common task that you do a lot. For instance, say you want to go from an IP to a domain and then to all the IPs previously associated with that domain. Instead of having to run two transforms, you can run them in a chain, do both at once,

and you're going to generate more data points on your graph quicker. Now these come in two basic flavors. There's macro, which is a one-time run: you can create a machine, select one or more entities, and then run it once. Or there are timed machines, which are good for automatically updating data. So for instance, if you're pulling somebody's Twitter feed, you can create a timed machine and then on a regular basis pull any updates to throw in.
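The entity, transform, and machine ideas described above can be pictured with a small sketch. This is plain Python, not the Maltego or Canari API: the `Entity` tuple, the two lookup tables, and `run_machine` are all hypothetical names invented for illustration, and the addresses are documentation-range placeholders rather than real data.

```python
from typing import Callable, Dict, List, Tuple

# An entity is modelled as (type, value); this is an illustrative stand-in,
# not Maltego's entity format.
Entity = Tuple[str, str]
Transform = Callable[[Entity], List[Entity]]

# Made-up passive DNS style data standing in for a real lookup service.
IP_TO_DOMAINS: Dict[str, List[str]] = {"192.0.2.10": ["example.test"]}
DOMAIN_TO_IPS: Dict[str, List[str]] = {
    "example.test": ["192.0.2.10", "198.51.100.7"],
}


def ip_to_domains(entity: Entity) -> List[Entity]:
    """Transform: IP address in, associated domains out."""
    kind, value = entity
    if kind != "ip":
        return []  # transform does not apply to this entity type
    return [("domain", d) for d in IP_TO_DOMAINS.get(value, [])]


def domain_to_ips(entity: Entity) -> List[Entity]:
    """Transform: domain in, previously associated IP addresses out."""
    kind, value = entity
    if kind != "domain":
        return []
    return [("ip", ip) for ip in DOMAIN_TO_IPS.get(value, [])]


def run_machine(start: Entity, steps: List[Transform]) -> List[Entity]:
    """Run transforms as a chain: each step consumes the previous step's new entities."""
    graph = [start]
    frontier = [start]
    for step in steps:
        produced = []
        for entity in frontier:
            for result in step(entity):
                if result not in graph:  # keep each entity on the graph once
                    graph.append(result)
                    produced.append(result)
        frontier = produced
    return graph


if __name__ == "__main__":
    for entity in run_machine(("ip", "192.0.2.10"), [ip_to_domains, domain_to_ips]):
        print(entity)
```

Running the chain once corresponds roughly to the one-time "macro" flavor; a timed machine would repeat the same chain on a schedule and add whatever is new.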

So a little bit more in depth for the transforms: you have local and remote transforms. Malformity, using the Canari framework, is by default a local transform package. Now there are pros and cons of each; naturally they are somewhat the inverse of each other. Pros for the local transforms: you get big control over what's run; if there's an error, you can change it or edit it immediately on your box. They're machine specific, so if you're using a service that is limited to originating from a specific IP or is otherwise restricted, you can tie it to local aspects of the box it's being run on. And they're language independent: as with the operating system independence, the local transforms can be written in

Python, Ruby, Perl, basically anything you need. Cons: if you have a package with a lot of dependencies, every one of those dependencies has to be installed on every box the transforms are installed on, which leads to version control issues down the road with updating, which can then lead to missing features. If a transform in the package is updated and not everybody using that package has updated it, you then have some people that may not be running the most up-to-date feature set. And then data sensitivity: with the local transforms, if you're using something that uses an API, then that API key is going to have to be on every box that the transforms are on, so

in some instances this obviously isn't ideal. Same thing goes if you're using something that requires any type of credentials: you may not want to store them on every user box that may be using this. And then it's kind of the opposite for the remote transforms. Setup is a lot easier: anybody can configure the remote transform server, it then automatically finds all the transforms, and the user basically doesn't have to worry about them. That results in universal updating: you change it on the server once and it automatically gets pushed out to all of the clients. And then there are some features of remote transforms that aren't supported for local transforms. In my opinion the biggest one, the one that causes the

most issues, is limiting the number of results that you get from a transform. If you're running a local transform, for me this most often happens when I'm pulling passive DNS: if I use an IP that belongs to a hosting provider and I pull passive DNS, my graph dies. You're going to get 10,000 results, and you basically have to wait until it populates and hope that you can select all the results and delete them all. If you're running a remote transform, Maltego has a built-in capability to support a limit, so I could say I only want 200 results max, or for this transform I only want 20 results. Cons: a different type of data sensitivity.

If you don't own the transform server, or the network you're on is going out over the internet, everything that you are looking up is getting sent to the transform server. So if the entity itself is sensitive, you would not want to send it out to a transform server you don't own. Single point of failure: if the server goes down, there could be potential issues with connectivity for updating, or, if you have a new client coming on, with grabbing the transforms. And then integration and control is harder: if you're using a service that is rate limited, then coming from a single IP for that one API, and you have

a bunch of people using it, you're going to run into issues quicker than if you were running a local transform. I know VirusTotal does this: especially if you are using a community account, it's a four-per-minute limit. If you're running four people off of one IP from one server, then you're going to have a hard time doing VirusTotal analysis. So then, a little bit more in depth with machines. Besides being a one-time run or a recurring timed run, you also have serial and parallel machines, and essentially this allows you to run multiple lines of execution with the same machine. So with a serial machine, say I had an input entity address,

it would first run all of this block: it would run the first transform, then the second transform, and you would end up with our resulting entity. Now there are also going to be entities returned in here from each of these. The biggest thing for machines is that you have to make sure whatever transform it's going into can accept the entity returned from the previous transform. So like I said, in this example these are going to be run, and then it's going to go back to the original input and run this stack. Now with the parallel, it's going to run both transform A and B, and then on those results it will run transform D. So the

big catch with that is you have to make sure transform D is applicable to the resulting entities from both A and B. If transform D isn't applicable to what's returned from A and B, you can run into issues, although Maltego is pretty good: when you're editing machines in the editor, it will tell you if it's invalid. Now this doesn't necessarily mean... like, I have some transforms in Malformity that will return more than one type of entity. Now there are also filters in machines, so if you have something returning more than one entity type, you can say, okay, throw this one away, keep just this one. That allows you to

customize it a little bit more and get different lines of analysis. Moving on to Canari: it is a framework for transform development. If you start developing transforms, or other things like entities or machines, directly for Maltego, and you don't do it in the GUI, it is not fun. They use a lot of XML, and doing the development on a large scale outside of the GUI basically becomes a pain. They provide some tools to deal with it, but it's still pretty labor intensive compared to using Canari. It's cross-platform, so it works on everything that Maltego works on. It does support remote transforms. Now Malformity is currently a local transform package; one of the goals you'll see later is to convert some of

the local transforms to remote transforms; others basically won't work as a remote transform. Multiple language support, so you can write in whatever you want. And what does that mean? Simplified development, distribution, and installation. If I have local transforms that I want other people to be able to use, and they're not using something like Canari, then every one of those transforms has to be installed into each Maltego client. With Canari I can run one command and they all get installed at once, along with any entities and machines in the package, so it's a one-stop process. And basically that means Malformity, and there are several other projects using it now at this point. The original, the

predecessor to Canari, was Sploitego; not sure if anybody has heard of that. It was also developed by Nadeem, and that was presented at DEF CON a couple of years ago. So moving on to Malformity itself. The name comes from "malware transforms", and excuse that; it was prior to including machines, so that's what stuck. It's a collection of transforms to assist in conducting analysis, and currently these are the sources: maybe one or two others, but these are mostly the ones represented. The biggest build-out at this point is VirusTotal, just because their API is really easy to use and they return really useful data. Also the ISC passive DNS; there are quite a few transforms.

Another is the Team Cymru hash check. Basically it'll check, and it's not going to return anything, but it's going to update the entity to tell you whether or not it was detected in the Team Cymru database. And then things like Malwr, ViCheck, and ThreatExpert can pull back additional information if the samples are in their databases. This is the portion I was talking about where Canari simplifies the install. Basically you git clone the project, change into the directory, and run the Python install. That will take care of all the dependencies; there aren't too many at this point. I think there are actually more for Canari than there are for Malformity

right now. One hitch with the dependencies is that previously the newest version of Canari could be automatically downloaded and installed, but that is not currently available through the normal easy_install methods, so you would also have to git clone Canari from Nadeem's repo, and I have a link for that at the end of the presentation. Very important point: if you're setting this up for the first time and you have not used Maltego on your system, you must start Maltego and initialize it before installing Malformity. Otherwise the environment that it sets up is not going to be initialized and the install will not work. Once that's done, basically this is it:

canari install-package Malformity, and all the transforms, entities, et cetera will be applied to your install. If you have been using more than one Paterva product, a little prompt will come up and ask which one you want to install it to. Then something like this will show up on the command prompt, and within Maltego you'll have access to the various transforms and entities, which you can view with either of these buttons in the top control bar. If you see those, then it is successfully installed. [Audience question about which hash types the malware hash entity uses, partly inaudible.] So the VirusTotal ones can accept anything that VirusTotal accepts, which is

basically... most of it is source dependent. By default I try to return MD5 values, just because those are the ones accepted everywhere. Right now the VirusTotal transforms are returning SHA-256 by default, I think; however, I was told there is a way to force the API to return MD5s instead, so I'll probably be implementing that on those so that it's consistent. But the entity itself can accept them. So, some example use cases. The first one is an individual hash lookup. Depending on your day job, there may be any number of reasons that you'd like to get some more information about a particular binary. So just an example:

in this particular one, an easy button may be "All Transforms", and that'll just go run them all. The problem you're occasionally going to get is that if there are a lot of results, you may not know which result came from which source, if you care. I prefer to go source by source. And occasionally, if I know there's a good chance that one of them is going to return a lot of entities, I will try something else from that source first just to see if the hash is present, and if it's present in that source I will then run all the transforms from that source. So in this case

this hash was present on ThreatExpert, and the example goes from a hash entity to two different types of resulting entities: you have a URL and a phrase. Now these phrases are actually mutexes; I don't yet have a mutex entity, so those are what I was able to successfully pull from the ThreatExpert report. Now looking at these, we don't necessarily know... yeah, it's a no-ip domain, and probably most businesses don't want dynamic DNS services, and the mutexes look kind of funny. In this case there was some other information available on the report that I don't have transforms for at this point. That was the only source that hash was present in, so if I go directly to that

site, I can also take a look real quick and see that the registry keys match some of the mutexes. You probably don't want Xtreme RAT running in your organization, and it gives you some more information to go look for. Now the biggest thing here is the time it cuts down. Say I got this hash initially: if I don't have an internal method of doing it, I don't then have to go search VirusTotal, then go search Malwr, then go search ThreatExpert. I can run them all real quick here, figure out if it's present in any of them, and if it is, go directly to the one where it's present. So basically a big time saver. Case two,

for a network piece of information, in this case a domain. The two domain transforms I have for VirusTotal at the moment are using their newly introduced passive DNS feature. In this particular case, this one is scraping the web results, so you do not need an API key to use that one. For the domain search you do need a private API key, because it uses the API search functions. So if I search that domain, these are the results I get. Right off the bat, if there are that many samples in VirusTotal communicating with that domain, there's a good chance it's bad, but we're going to go ahead and confirm.

I want to know what it's recognized as, what it is detected as. Because there were so many results, I wouldn't normally select all of them; in this case there's a good chance they're related, and it's just going to clutter the graph if I select all of the results to run. But if I go to malware name down at the bottom... this uses the publicly available API, the community API, so it's limited to four per minute, but we did that so that you didn't have to have a paid API key in order to use it. In general I like to keep anything free that I can. If it's a

function only available in the paid version, then obviously that's what I have to use, but if it's something that's available in both, I try to go with the free one so that more users can use it. And so this probably doesn't require a deep dive: most of the detections you see are very general, but the middle two there... good chance it's some kind of adware or gaming type of malware. So if I wanted to get some additional indicators to look for... I know that I have this; depending on how you got it, you may not have them offhand. Running one more transform on the same hashes, or selecting other random hashes, I've got

the user agents the samples were reported communicating with, the IP address at the time of those reports, and the URL requested. And in this case the different samples were using different user agents, so there are a couple of others on the other side.
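The four-requests-per-minute limit on the community API mentioned above is easy to trip when a transform is run across many hashes at once. A minimal sketch of client-side throttling follows; the `RateLimiter` class is a hypothetical helper written for illustration, not part of Malformity, Canari, or any VirusTotal client, and the only figure taken from the talk is the four-per-minute limit.

```python
import time
from collections import deque


class RateLimiter:
    """Sliding-window limiter: allow at most max_calls per period seconds."""

    def __init__(self, max_calls=4, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._max_calls = max_calls
        self._period = period
        self._clock = clock    # injectable so the limiter can be tested
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed; return the seconds waited."""
        now = self._clock()
        # Forget calls that have aged out of the window.
        while self._calls and now - self._calls[0] >= self._period:
            self._calls.popleft()
        waited = 0.0
        if len(self._calls) >= self._max_calls:
            # Sleep until the oldest call in the window expires.
            waited = self._period - (now - self._calls[0])
            self._sleep(waited)
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)
        return waited


if __name__ == "__main__":
    limiter = RateLimiter(max_calls=4, period=60.0)
    for sample_hash in ["hash-1", "hash-2", "hash-3"]:  # placeholder values
        limiter.wait()
        print("would look up", sample_hash)  # a real lookup would go here
```

A limiter like this only helps per process. As the talk notes, several analysts sharing one IP or one key would still collide, which is one argument for keeping rate-limited sources as local transforms.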

Use case three, my personal favorite, and the one I use the most, is threat tracking and mapping. So basically assume we start, in this case, with one verified domain. A lot of times that's probably not the case; you'll probably start with more, or you may start with something that's unknown. So I'm going to take that domain and real quick use the passive DNS to see what IP addresses it was associated with in the past. In this case we have only two, so somewhat limited, but results nonetheless. Not sure if anybody in the room recognizes that top-level domain. One thing that

I can't really see on this screen, but right here, in a real faint outline, are dates of first seen and last seen for that IP and that domain. However, if I take those two IPs and I continue... you can occasionally get carried away. This is primarily just passive DNS results starting from that first domain. A lot of people may say, wow, that's really cool, we know all the IPs and domains used by whoever was using that original domain. However, you should not assume results from any of your transforms are infallible, especially when dealing with network-type information. It should be an iterative process when doing that type of analysis. Basically,

building in other data sources is one thing that I would highly recommend. That graph was primarily IPs and domains. You're going to get a lot more corroborated links when you start building in things like email addresses that have registered those domains, when you start building in malware samples that are known to communicate with those domains, and then groups of those malware samples that share similar characteristics. So you're going to get more defined groupings as you build more different entities in. The biggest thing is watching out for false positives. If you get an entity that returns a ton of results, especially with DNS or other network data, you're going to get sinkholes on the

graph, guaranteed. [Audience question:] Can you put date range parameters on the passive DNS? That is one thing that I have... I have a script that does it; it's not currently a transform. The primary issue is that it's not done when sending out the request, right, so it's going to have to pull them all in. It's basically going to increase processing, because you're going to pull everything and then, based on what the client wants, only select some of them. But that is something that people have asked about before and I do have it on the roadmap. Like I said, I have a script; I just need to port it over to a transform.
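The date-range idea from that question can be sketched as a filter applied after the passive DNS records have been pulled, which is the approach the answer seems to describe. This is not the speaker's script: the `PdnsRecord` fields and the `within_range` function are hypothetical, and the sample records are invented placeholders. The sketch assumes each record carries the first-seen and last-seen dates mentioned earlier in the talk.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class PdnsRecord:
    """One passive DNS observation (illustrative field names)."""
    domain: str
    ip: str
    first_seen: date
    last_seen: date


def within_range(records: List[PdnsRecord], start: date, end: date) -> List[PdnsRecord]:
    """Keep records whose first_seen..last_seen window overlaps start..end."""
    return [r for r in records if r.first_seen <= end and r.last_seen >= start]


if __name__ == "__main__":
    records = [
        PdnsRecord("example.test", "192.0.2.10", date(2012, 1, 1), date(2012, 6, 30)),
        PdnsRecord("example.test", "198.51.100.7", date(2013, 2, 1), date(2013, 5, 1)),
    ]
    for record in within_range(records, date(2013, 1, 1), date(2013, 12, 31)):
        print(record.ip, record.first_seen, record.last_seen)
```

Filtering on overlap rather than strict containment keeps a resolution that started before the window but was still active inside it, which is usually what matters when mapping infrastructure over time.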

You're going to get parking pages, especially common with a lot of malware. If you all of a sudden get multiple types of malware, or malware that's attributed to different sources, anything like that, that all points to the same IP, make sure you do some research on that IP, because there's a good chance that it's going to come back to a sinkhole or a parking company, or it's going to be a Google IP or a Microsoft IP. Those are ones that may be interesting to know, as far as whether a specific group or entity uses the same parking IP over and over, but for the purposes of your graph and generating more information and further

research, it's going to hurt. And then there are other random abnormalities: just once in a while something will show up that doesn't really apply. A lot of times you'll need to weed out results that you don't want. Again, that goes back to it being an iterative process. So, future development plans. Right now it's primarily transforms. I have a list of machines that I'm working on that I want to get running to automate common analysis steps, and that, like I said earlier, moves one step closer to generating intelligence versus just data. However, it still does require analysis; it's not going to be a one-stop answer. Building out additional host-based and

local network transforms: the one big area that I want to move into with this package is actually doing binary analysis and graphing the results in Maltego. So one example that I'm probably going to start with on this front is something like Pyew, which is a Python package for some basic static analysis of binaries. Give it a hash or a file name, essentially go out and run whatever particular commands using Pyew on the binary, and then return the results on the graph. I don't know if that's going to turn out beneficial. My theory is that with certain families or groups of malware, you would be able to see correlation among a lot of the

information returned, and that's one of the areas that I really want to move into and look at. However, with those host-based ones, obviously you have to have the binary available; those are not good candidates for remote transforms, just because of what they're doing. Along the same lines, integrating with other tools specific to malware analysis could be another route. I would like to include more vendor tools and APIs. The biggest catch with that is if I don't have access to it, then it's hard to develop for it. So especially if I get a request from somebody, I am more than happy to develop the transforms for whatever product or API they are using. The catch is they

have to give me access to it, because otherwise, unless the particular vendor has extremely good documentation and they are willing to test it, there's not much I can do. And then building out the web sources. If anybody has suggestions for new online sources for malware or network intelligence that are widely available, shoot them my way; I would be very grateful. I want to get this applicable to as wide an audience, and as useful, as possible, so I want to gather more sources. I have a couple in the hopper that I'm working on: instead of just ISC passive DNS, there's VirusTotal right now; I also have

a transform in testing for CIRCL, the Luxembourg CERT. They have a passive DNS service that you can get access to if you apply for it, but it's not paid access, so more people have access. And then the biggest thing: community suggestions. I'm going to keep going on things that I would probably use, but I would love to develop based on what the community and users of Malformity are going to use; that makes it most useful. And getting more testers. The biggest thing is a lot of the web scraping ones... this happened recently: I had Malwr, which is run by the guys that put out Cuckoo; I had a bunch of transforms for that, and

they just changed their web interface, and obviously that kills them all. I found that out while prepping for my last presentation, so that kind of put a kink in things. I hadn't used them for a couple of weeks, and that was the time period when it changed. So the more feedback I get, the better; usually things get fixed faster. It happened last week too: there are a few VirusTotal ones; VirusTotal is changing up some of their web interface, and there were a couple there that stopped working. I got those fixed right off. Like I said, if people are using it and send that feedback my way, it's easier for me to get it fixed and pushed out.

The links I was talking about: the git repo for Malformity; Maltego, if you haven't used it before; Canari. The project forums are a great place to go if you have any questions about any projects using Canari, are trying to develop for Canari and have questions, or are having issues with the install. Basically it's supposed to be a one-stop shop, so there's a Malformity forum within those forums to post any questions. Or GitHub issues are great, emails are great; basically I'll accept it whichever way you want to send it. And I had a quick post on my blog about installing Malformity, a quick walkthrough to try to hit some of the important points that

can be somewhat of a roadblock in the install. I hope people have questions. These are the various methods to contact me or look for additional information. Offhand, does anybody have anything they think would be a good candidate to include that is not already in Malformity? Yes, that would be... I got it. The biggest thing with the web sources is if they randomize their URLs for grabbing reports, it makes things more difficult. However, if there is a search function, it's still doable using mechanize and a few other Python libraries. So I think you can easily search with a URL query, right? So that should be still doable

without any issues. Any others? I'm curious: I've had, I think, one request for some reputation services, and I think that's kind of where the Team Cymru hash check falls, and the Bit9 File Advisor. Is a yes-or-no answer from a reputation service worthwhile? Basically, if I want to submit something to IPVoid or URLVoid, are those results interesting? Okay. That's one thing I was curious about, as far as people usually wanting it to be one stop, if they want to get information and just immediate results. That is good to know. Some people building out a big graph, yeah. Yeah, definitely.

Any questions or comments, feel free, like I said, to reach out by any method up here, and if anybody has any questions, feel free to come up and talk to me after. I'll give you a couple minutes of your time back to enjoy this lovely afternoon.