OSTrICa – Open Source Threat Intelligence Collector

Name: OSTrICa – Open Source Threat Intelligence Collector
Uploaded: 2016-07-04
Duration: 41 min 15 s

BSides London · 2016 · 41:15 · 5.0K views · Published 2016-07

Speakers

Roberto Sponchioni

Tags

Category: Technical

Topic: DFIR, Threat Intel, Tooling

Style: Demo Talk

Mentioned in this talk

Tools used

BeautifulSoup

Service

VirusTotal

About this talk

An introduction to threat intelligence fundamentals and the OSTrICa framework, a free, open-source, plugin-oriented tool for automatically collecting and visualizing threat intelligence data from diverse sources. The talk covers attack lifecycles, incident response workflows, and demonstrates how to develop custom OSTrICa plugins to correlate indicators of compromise and build threat profiles.

Original YouTube description

Current approaches to protecting sensitive data, such as Intrusion Detection Systems, Anti-Virus programs and traditional Incident Response methodologies, are by themselves no longer enough to face today’s relentless threats. Cybercrime used to be a hobby. Now it is highly organized, more financially driven and in many cases operates much like a legitimate business, complete with organizational charts, C-level executives and even human resources departments. Cyber-threat actors are constantly improving their tools, techniques and procedures (TTP) to gain access to companies’ valuable data. According to the “2015 Verizon Data Breach Investigations Report”, in 60% of cases attackers are able to compromise organizations within minutes, and 75% of attacks spread from victim 0 to victim 1 within a day. Cyber-attacks have changed, and it is extremely important to implement additional levels of protection to identify incidents and malicious events. Organizations need a holistic view of the threat landscape to proactively fight the multitude of new threats they can face every day. This is where Threat Intelligence comes into play: whether you are a SOC analyst, an incident responder or a cyber-security analyst, knowing more about attackers’ actions and correlating IoCs (Indicators of Compromise), network traffic patterns and any other collected data can give you a real advantage against cyber-enemies. Unfortunately, not all companies have enough budget to spend on Threat Intelligence Platforms and Programs (TIPP); that’s why OSTrICa has been developed. OSTrICa is a free framework that allows everyone to automatically collect and visualize any sort of threat intelligence data harvested from both open source and commercial sources, allowing anyone to create a relevant and accurate threat profile based on the information collected.
Moreover, OSTrICa is Open Source, plugin-oriented and already comes with a set of plugins capable of collecting highly valuable information regarding suspicious domains, IPs, malware hashes, malware behaviour and much more. If attack investigation and threat intelligence are on your agenda or among your top priorities, then this talk is for you. In this talk I will describe:
• The lifecycle of an attack (including APT - Advanced Persistent Threat)
• What Threat Intelligence is and why it is so important and useful for an organization, especially during Incident Response operations
• How powerful Open Source Threat Intelligence data can be
• What OSTrICa is and how important an open source, plugin-oriented framework can be during threat intelligence collection and attack investigation
• How OSTrICa works and what it can do with the current plugin set
• How to develop OSTrICa plugins
• Different scenarios where OSTrICa could be used
Note: the alpha version of OSTrICa will be released in June 2016 and will be available for download from my GitHub page (https://github.com/Ptr32Void/) with all the currently developed plugins.

Transcript (English)

Thank you very much. Hello, everyone. Today I'm going to talk about OSTrICa, which is an open source threat intelligence collector. This is the agenda: I'm going to do a very quick introduction to threat intelligence, show you a few scenarios where threat intelligence can be used, explain what OSTrICa is, show you a demo (hopefully it's going to work), and show you how to develop OSTrICa plugins. Who am I? I am a senior anti-malware engineer at Symantec in Dublin. I also work as an independent security researcher, developing tools and doing malware analysis. Some time ago I worked as a security consultant, mainly doing incident response and penetration testing. Those are my contacts and my GitHub page, where I will upload OSTrICa and where you can find some of the tools I developed and some of my research.

So let's start with a brief introduction to the big problem that lots of companies are having. Data breaches are a very big issue for everyone, because, as you can see from this graph, from 2012 up until now there have been lots of data breaches: eBay, Anthem, even Adult FriendFinder, Ashley Madison, and Hacking Team, the Italian company that developed the Galileo malware.

The main problem is that when they get breached, lots of information is stolen and released to the public, even the source code of the Galileo malware, which has been reused: attackers basically copied and pasted some of its code into other malware. That's a very big problem for the companies, because they're losing their data and their money, but it's also a very big issue for us as customers, because if they steal credit card information, email addresses or physical addresses, we are affected too. And of course the companies lose lots of money.

At the moment we are using different things to protect our networks, such as firewalls and IDSs. They work okay, but I think that another thing that should be implemented in our networks is threat intelligence. According to Gartner, threat intelligence is basically information, including context, mechanisms and indicators, about a specific threat, whether a new threat or an old one. If you can link this information together, you can profile the threat and try to proactively protect your network.

What can you do with threat intelligence? You can do attack investigations, so you can track down cyber criminals. You can provide context to reconstruct an attack quickly: if you are an incident responder, you can use threat intelligence to identify new domains or IPs that the malware can use, or any file that has been created by the malware. You can proactively block new attacks, and of course you can prioritize specific alerts in your systems. And if you provide these alerts to your C-level executives, they might be able to make business decisions: for example, they can invest time, or invest money in training or in new products.

So let's have a look at a few scenarios where we can actually use threat intelligence. The current solutions that most companies are using are AVs, firewalls, IDSs, log collectors and so on. That works fine, but this is actually just data: we can create AV signatures, we can do everything we want with it. Now suppose that we have our internal data and we can also grab data from external sources. If we can merge and link the information together, that's what threat intelligence is. You can, for example, identify the attack, say a PDF. From there you link the information together and identify where it came from, for example a phishing email. From there you can identify the victims and, for example, the kind of data that has been stolen from them, and sometimes, with the help of law enforcement, you can even identify the attackers behind it. Not only that: you can also identify the attackers' infrastructure, because you can see the servers and the kind of network they're using. You can see lots of interesting things, and some of their servers might be vulnerable, so you can identify additional information, like stolen information or interesting data about the specific gang behind a specific attack.

Another thing is that threat intelligence can be used to proactively protect your network. What does that mean? Suppose that one machine in our corporate environment has been breached, and we see from our logs that it is connecting to a specific server in the US and sending data. It is also sending data to another server in South America. So we have this information. Now suppose that the same exact behavior appears in some external sources: say, on VirusTotal (VT) we find out that this specific behavior is associated with another behavior, where a server located in Russia is pushing an update to that compromised machine. If we have this information, we can proactively protect our corporate network, for example by adding a firewall rule to block that specific update. That's the main point about threat intelligence: it's very powerful. The main problem is that you need to link together lots of information.

And where can you get this information from? If we are talking about indicators of compromise, we can talk about MD5s, domains, IP addresses, AV detections, mutex names, file names and so on. We can grab this information from different sources. There are free sources, which of course are free; they contain generic information and come from public systems or honeypots. There are community systems, which contain generic and sometimes specific intelligence, for example from sandboxing products; sometimes they are free, sometimes you need to pay a small fee to get access to their feeds. Then there are commercial sources, which are very expensive (we are talking about thousands of dollars, even millions of dollars), but they have very specific intelligence: they can give information about specific TTPs, they do surveys and they analyze underground marketplaces. The main problem, of course, is that they are very, very expensive. The last one is internal sources, like our appliances, log collectors, firewalls and AVs; this is very specific intelligence related to our own network, and it's free, of course.

So what if we could grab information from all of them: internal, commercial, community and free sources? That's the end goal of the tool that I developed and that I'm going to release on my GitHub page. It's called OSTrICa. So what is OSTrICa? OSTrICa is a plugin-based system with a plugin loader that loads every single plugin you have. You can create any plugin you want; the loader loads it and executes it (I'm going to show you this later), and then the information can be visualized. For example, suppose we found an MD5 in our network and we give it to OSTrICa. This slide shows the basic structure of my tool: there is VirusTotal; DeepViz, an Italian company that developed an online sandboxing system; Safe Browsing; Whois; and many others. You can develop any kind of plugin you want to collect data. In this case we ask OSTrICa: what do you know about my MD5?
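The plugin-loader idea described above (drop a module into a folder and the loader picks it up) can be sketched with Python's standard importlib. This is a minimal sketch, not OSTrICa's actual loader: the function name load_plugins and the convention that a plugin is any .py file exposing a callable run are assumptions made for illustration.

```python
import importlib.util
import pathlib


def load_plugins(plugin_dir):
    """Import every .py file in plugin_dir and keep those exposing run().

    Hypothetical loader: the 'callable run' contract is an assumption
    modelled on the talk, not OSTrICa's exact plugin interface.
    """
    plugins = {}
    for path in sorted(pathlib.Path(plugin_dir).glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helper modules
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the plugin's top-level code
        if callable(getattr(module, "run", None)):
            plugins[path.stem] = module
    return plugins
```

Calling load_plugins("plugins") would return a dictionary keyed by module name, so adding a new source is just a matter of dropping another file into the folder; no central registry has to be edited.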

You give it the MD5, and the MD5 goes through all the plugins. If a plugin can collect information about an MD5, it is executed, starts collecting information and can then show it to you. In this case, VirusTotal can handle MD5 and SHA256, and DeepViz can handle MD5, so both of these plugins will be executed, but Safe Browsing and Whois don't know anything about MD5, so they basically do nothing. Similarly, if we give OSTrICa a domain, VirusTotal will be executed, DeepViz will not, Safe Browsing will be executed, and the last one will not.

How does it work? The tool has been developed basically in my free time at home, so I didn't want to pay a single euro to access services like VirusTotal or any other service I'm currently using. That's why most of the plugins use web scraping: I scrape the web page, extract the information and collect it. Only one plugin uses an API, DeepViz, because it was free, so I implemented API access to DeepViz. Once the plugins have executed, they produce a JSON report where analysts can see all the information collected, and the visualization module creates a visual report. Only some of the information is actually shown, because I'm collecting loads of information and showing only part of it.

So, why OSTrICa? First of all, because it's free and open source. I'm going to release it soon; I need to fix some stuff, and the code of course has not been reviewed by anyone, so it'll probably be full of bugs. It has a plugin architecture, so you can fork the project, create your own plugins and use them basically for free. At the moment it cannot proactively protect your network, but threat intelligence can be used for that, so if you write some code, you may be able to proactively protect the company's information by adding specific rules. And, as I said before, it produces a JSON report containing all the information, plus a visual report. These can be used by SOC analysts inside a SOC, by incident responders to identify very quickly what's going on based on specific indicators of compromise, and by attack investigators who have to investigate a specific attack or identify specific information behind it.

A quick note: all the information collected is taken from open sources, so it is publicly available. I haven't used any Symantec internal data. Of course you can do that with your own data in your network, but for this demo the information is taken only from external sources. So now I'm going to show you a demo of how it works.
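The dispatch rule from the MD5 and domain examples (a plugin runs only if it declares support for the indicator type) and the JSON report step can be sketched as follows. The plugin table, its type sets and the report layout are hypothetical stand-ins mirroring the slide example; a real plugin would scrape a page or call an API instead of fake_lookup.

```python
import json

# Hypothetical plugin table mirroring the slide: name -> supported indicator types.
PLUGINS = {
    "VirusTotal": {"md5", "sha256", "domain"},
    "DeepViz": {"md5"},
    "SafeBrowsing": {"domain"},
}


def fake_lookup(plugin, value):
    """Stand-in for a real web-scraping or API call (assumption)."""
    return {"queried": value, "source": plugin}


def collect(intel_type, value, plugins=PLUGINS):
    """Run only the plugins whose declared types include intel_type."""
    results = []
    for name, types in plugins.items():
        if intel_type not in types:
            continue  # e.g. Safe Browsing knows nothing about an MD5
        results.append({
            "plugin": name,
            "extraction_type": intel_type,
            "intelligence_information": fake_lookup(name, value),
        })
    return results


def write_report(path, intel_type, value, results):
    """Store everything collected as a JSON report for the analyst."""
    report = {"requested_intelligence": {intel_type: value}, "results": results}
    with open(path, "w") as fh:
        json.dump(report, fh, indent=2)
    return report
```

With this table, collect("md5", ...) runs VirusTotal and DeepViz, while collect("domain", ...) runs VirusTotal and SafeBrowsing, matching the two cases walked through in the talk. Keeping the supported types next to each plugin means the core never needs to know what any individual source can do.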

So let's see here. This is the directory structure of OSTrICa. This is the main executable, written in Python. There is a report directory, which contains the JSON output of the collected intelligence, and there is a visual report directory, which contains the HTML files.

Thanks. All right. To run OSTrICa, there is a command-line interface, which is this one. At the moment you see "No module named bs4", because, as I said before, lots of plugins use web scraping, and I'm using Beautiful Soup to scrape the web pages. I also installed other libraries, like the DeepViz sandbox library and whois; you don't have them here because I didn't develop everything on this machine. So how does it work? There is a help command here (I don't know if you can see it). Basically, you can provide different kinds of intelligence: IP, MD5, domain, SHA256, ASN and email. Let's suppose that this domain has been seen in our firewall logs. We type something like domain= followed by the domain, and OSTrICa automatically knows that this is a domain. Then you type run. I'm not running it right now because I am not connected to the internet and I don't have the libraries to collect the information, but if you run it, it will generate a report in the report directory, like this one for example.
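The command-line flow described above (enter indicators as type=value, then run, then graph) relies on turning each typed entry into a validated indicator. A minimal sketch, assuming the six indicator types listed in the help output; the regular expressions are illustrative sanity checks, not OSTrICa's own validation.

```python
import ipaddress
import re


def _is_ip(value):
    """True for a valid IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


# Illustrative format checks only; OSTrICa's real validation may differ.
VALIDATORS = {
    "ip": _is_ip,
    "md5": lambda v: re.fullmatch(r"[0-9a-fA-F]{32}", v) is not None,
    "sha256": lambda v: re.fullmatch(r"[0-9a-fA-F]{64}", v) is not None,
    "domain": lambda v: re.fullmatch(r"(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}", v) is not None,
    "asn": lambda v: re.fullmatch(r"(?:AS)?\d+", v, re.IGNORECASE) is not None,
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}


def parse_intel(line):
    """Turn 'domain=example.com' into ('domain', 'example.com').

    Raises ValueError for unknown indicator types or malformed values.
    """
    intel_type, sep, value = line.partition("=")
    intel_type, value = intel_type.strip().lower(), value.strip()
    if not sep or intel_type not in VALIDATORS:
        raise ValueError(f"unsupported entry: {line!r}")
    if not VALIDATORS[intel_type](value):
        raise ValueError(f"malformed {intel_type}: {value!r}")
    return intel_type, value
```

For example, parse_intel("domain=example.com") returns ("domain", "example.com"); each parsed pair could then be queued until run is issued. Rejecting malformed values at entry time avoids wasting remote queries on typos.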

So this is the JSON report that the analyst can look through, to see what kind of information has been collected. Let's page down. In this case, the requested intelligence was an MD5, and the plugin that has been executed is DeepViz. If you scroll up, you see all the information about this specific MD5: the connections, the UDP connections, the DNS lookups and, if a DNS name resolves to a specific IP, it shows you the IP. The cool thing is that you're not actually executing the malware: you are relying on an external source to provide this information. So you're not doing anything on your network, except querying specific remote services like DeepViz. In this part I was using VirusTotal, because all the information here is related to specific detections: the AV result is the AV name, the detection name and when it was updated. You can go through all the information and decide whether there is some additional data you want to extract or collect using OSTrICa, and you also have the behavioral information, if there is any.

The main problem, as I said before, is that I'm not using the API; I'm scraping the web page. VT have probably noticed, because I ran lots of queries for testing; they didn't block me, I don't know why, but it's still working quite well. The main problem with web scraping is that if someone changes the pages, for example some HTML tags, the output will not look like this; it will be something weird.

Now the cool part. Going back to this domain, suppose we have run our tool. If we type graph, it automatically generates a graph based on the information it has collected previously. So first you provide the intelligence, then you use the run command, and then you generate the graph with graph. Where are the graphs? They are inside the viz directory. In this case it should be this one. Sorry, it's this one. Here we have the domain, and then the information collected from the different plugins: for example, the IP related to this domain, the URL, and the email that has been used to register this domain. OSTrICa can also identify whether that email has been used to register other domains, so you can provide that specific intelligence to OSTrICa as well. Then you have other TLD information. The colors here relate to the legend. The original intelligence, which is a bigger node, is this one; then there is VirusTotal, which contributed the IP address because it has been detected by VT, and the domain that has been detected by VT. Then we have DomainBigData, another external source I'm using, with all the information related to this domain, and TCPIPutils, which returned no information, and the same for the others.

Let's move to the second part. Suppose that we found this information from our network appliances, and we want to know something about this MD5 because it connected to this IP, which is related to the original intelligence. So we copy it and type md5= followed by the hash, and then ip= followed by the address.
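Because most plugins scrape web pages, every plugin is coupled to a page layout it does not control. The sketch below pulls the AV name, detection name and update date out of a results table. The talk's plugins use Beautiful Soup; this sketch swaps in the standard-library html.parser so it runs without extra packages, and the table id av-results is a hypothetical layout, not VirusTotal's real markup.

```python
from html.parser import HTMLParser


class AVTableParser(HTMLParser):
    """Collect cell text from <table id="av-results"> (hypothetical layout)."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_cell = False
        self.rows = []
        self._row = []

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == "av-results":
            self.in_table = True
        elif self.in_table and tag == "tr":
            self._row = []  # start a fresh row
        elif self.in_table and tag == "td":
            self.in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "table" and self.in_table:
            self.in_table = False
        elif self.in_table and tag == "td":
            self.in_cell = False
        elif self.in_table and tag == "tr" and self._row:
            self.rows.append(self._row)  # header rows (<th> only) stay empty and are skipped

    def handle_data(self, data):
        if self.in_cell:
            self._row[-1] += data.strip()


def scrape_detections(html):
    """Return [{'av': ..., 'detection': ..., 'updated': ...}, ...].

    An empty list usually means the page layout changed, which is the
    main fragility of scraping compared with an official API.
    """
    parser = AVTableParser()
    parser.feed(html)
    return [
        {"av": r[0], "detection": r[1], "updated": r[2]}
        for r in parser.rows
        if len(r) == 3
    ]
```

Returning an empty list rather than raising lets a collector note "no data from this source" and carry on with the other plugins, while a sudden run of empty results is a useful signal that a scraper needs updating.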

If we run the tool again (I'm not running it now), it starts collecting information, which takes a little time depending on how long the servers take to reply. Then if we type graph again, the previous graph is overwritten and the output is something like this. It's kind of messy, but you can zoom in and move the nodes wherever you want. For example, suppose we want to know something about this specific URL node. We click on it, every other node is hidden, and you can see exactly what that node is related to; in this case, I suppose, the IP address. Let's see. Yes: you see here that this domain is a domain detected by VT and is related to this IP address. Another cool thing is that if you double-click on a node, it disappears. Say it is connected only to our IP: it's just one link and I don't need this information, so we can hide that node by double-clicking on it, and it goes away.

Another very interesting thing I'm going to show you with these other nodes here. These are three different MD5s. If we click on one node, like this one, that's a mutex (I suppose you can see it), and all of them share the same mutex. This mutex is kind of generic, so you cannot really base your detection on it, but it tells you something. If you try to double-click on it, it says: you're trying to remove a node that is related to an original intelligence you provided; are you sure? You might lose the links that were created for you. So you can decide whether to remove it or not.

Another thing is that the graph is quite noisy, because I'm collecting loads of information from everywhere; if you add, say, 10 more plugins, the graph will be huge. So you can, for example, search for something like Trojan.Cryptolocker: it automatically identifies the nodes related to CryptoLocker, hides everything else and shows you only the CryptoLocker nodes, so you can see the relations between them. Or, for example with this checkbox, you can hide specific information like the VT information, or hide all the DeepViz-related information and then show it again. That's the cool thing about this graph: you can filter out nodes you don't really want to see. The same goes for the other plugins: you can hide the mutex names, the domains and things like that.

There is another thing I implemented. See this node: it's bigger compared with this one, because if you click on it, you see that it is related to multiple other nodes. It grows because it may be useful to you: lots of other nodes are related to this IP address. And of course, if you double-click on an original intelligence node, it automatically removes all the nodes that are not associated with other original intelligence. In this case it's still kind of noisy, so let's remove DeepViz, for example. What do we have here? We have three original intelligence nodes. Two of them are detected as Cryptolocker.N; one of them is not detected at all, but it shares the same mutex name. As I said, this is a very generic mutex name, but it also shares an IP with another node that has been detected, and probably the same command-and-control HTTP request. So you can go back to this one and say: this is not detected, but it's connected to other malware that is detected. At this point you can, for example, send the file to your AV vendor so that they can detect it, or add your own rule to remove it. That's how you can proactively protect your network: the sample was not detected, but it has relations with other nodes that are detected.

Okay, so that's the demo. Now I'm going to show you very quickly how to develop plugins. The plugins are here, I suppose. Yes, these are all the OSTrICa plugins I'm currently using; all of them scrape web pages except this one. Let's take Safe Browsing and open it. It looks kind of complicated, and it's even difficult to explain, sorry, but you need to get the code to understand how it works. I'll show you very quickly; it's not that complicated. First, you need a specific function that OSTrICa calls from outside the plugin, which is run. In this case, run executes specific functions inside the main class, which is this one. What does it do? Basically, I'm just collecting the information from Safe Browsing, providing specific user agents and things like that. The easy thing here is that the response from the server is in JSON format, so you don't need to parse it, and it's very easy to add the JSON output to the report. There is some information you need to specify, like the extraction type: the types this plugin can handle, like IP, domain or ASN number. Then there is the visual data flag, which says whether the plugin has any visual data it can show to the user (yes, because there is a class that does that), and a flag saying whether the plugin is enabled or not. Once you collect all the information, you just need to send it back to OSTrICa. The data sent back is a sort of dictionary, where you say the extraction type is, for example, domain, and the intelligence information is all the intelligence you collected through the plugin. Then there is the data visualization function, which checks whether the plugin is actually the one currently in use. There is another class here, the Safe Browsing visual class, to which you pass the nodes, the edges and the JSON data collected here. This is done automatically by OSTrICa: when you run graph, this function is called and executed. This class is kind of complicated to explain here, but if you take a look at the code it's fairly easy. You pass the original output you got from here and decide, for each piece of information, whether you need it and want to process it or not. In this case I am processing the domains: I go through each domain collected from Safe Browsing, and if a domain is already inside our node dictionary, I just increase its size. This is the color of the node, which corresponds to the legend I showed you before, and this is the type of information that has been collected, in this case a detected domain. And of course you need to add the edges: if the domain is not in the edges already, so it's not connected to the original intelligence, you need to add it, otherwise you won't see the arrow pointing from the original node to the collected node. Yes, I think that's it.

The main problem, as I said before, is that some plugins use web scraping. If you have, for example, a VT API key (I personally don't have one), you can modify the VT class, which is quite big because I'm parsing every single thing from VT: the detection names, the mutex names, the behavioral information, file names, when the file has been seen, and things like that. So you can add your API key, and of course you probably need to create a new plugin or at least overwrite this one. Oh, another thing that I forgot to mention is this one. The problem is that we have lots of nodes here and some of them are overlapping, so they are not that easy to see.
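Putting the pieces of the Safe Browsing walkthrough together, a plugin needs a run entry point, a list of the indicator types it handles, an enabled flag, a visual-data flag, and a visualization step that grows existing nodes and links new ones to the original intelligence. The sketch below mirrors that description only loosely: every name, the colour values, the extra original parameter and the node/edge layout are assumptions rather than OSTrICa's actual plugin API, and query_service returns canned data instead of contacting Safe Browsing.

```python
# Hypothetical plugin module, modelled on the Safe Browsing walkthrough.
extraction_type = ["domain", "ip", "asn"]  # indicator types this plugin can handle
enabled = True       # a loader could skip plugins with enabled = False
visual_data = True   # this plugin contributes nodes to the graph

NODE_COLOR = "#3366cc"  # this plugin's legend colour (assumed value)


def query_service(intelligence):
    """Stand-in for the real HTTP request; ignores its input and returns canned data."""
    return {"detected_domains": ["bad.example", "worse.example"]}


def run(intelligence, intel_type):
    """Entry point the framework calls from outside the plugin."""
    return {
        "extraction_type": intel_type,
        "intelligence_information": query_service(intelligence),
    }


def data_visualization(nodes, edges, original, json_data):
    """Merge this plugin's findings into the shared graph structures.

    nodes: dict of node id -> attributes; edges: set of (source, target) pairs.
    """
    # The original intelligence is drawn as a bigger node.
    nodes.setdefault(original, {"size": 5, "color": "#000000", "type": "original intelligence"})
    info = json_data["intelligence_information"]
    for domain in info.get("detected_domains", []):
        if domain in nodes:
            nodes[domain]["size"] += 1  # already present: grow it instead of duplicating
        else:
            nodes[domain] = {"size": 1, "color": NODE_COLOR, "type": "detected domain"}
        if (original, domain) not in edges:
            edges.add((original, domain))  # arrow from original intelligence to new node
    return nodes, edges
```

Growing a node each time another source reports it is what makes heavily shared infrastructure, such as the IP address in the demo, stand out visually; storing edges in a set makes repeated graph runs idempotent.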

There is another library, called Cola, and the idea of the Cola graph is to spread the nodes so that no node overlaps. You see everything here and there is no overlapping. If you have lots of data, it's going to take a little time to generate this graph, but at least you don't have overlapping nodes. So let's move on.

All right, future work. It's open source and I'm going to release it on my GitHub page. As I said before, I didn't have time to finish it; I'll probably finish it around the end of June or July. You can fork it from GitHub and add your own plugins, or fix some bugs, because I'm sure there are some. I want to add a timeline, because at the moment I'm collecting information from everywhere, and if a domain was malicious, say, a year ago, I'm still collecting that node, so you might have lots of false positives in your report. So another thing is to add a sort of timeline, where available, so you can say: give me all the information about this specific domain from yesterday to today, or something like that, and see only the currently malicious domains, IPs or MD5s. I want to try to weight the intelligence based on the links, the connections, that it has, because if you can weight the collected intelligence automatically, you can probably try to proactively protect your network. Say a node has a very high confidence of being malicious: then let's automatically write a rule on our firewall so we can block the attack. Or, if we are an incident responder, we can say: this is very malicious, maybe we should look into it properly. I want to improve the graphical interface, because at the moment, if you mess up some nodes, there is no undo: to go back you need to refresh the page, and then all the nodes scatter everywhere. That's a big issue I'm having, so if you're using it, make sure you don't make any mistakes. Another interesting thing would be to add an interactive map, so you can see on a map where the attacks and IPs came from, and provide a fancy map to the SOC analyst, the incident responder or anyone else who is using it. I suppose that's it. Thank you very much for your time. If you have any questions, I'm here or around. So, any questions?

