
Open Source Intelligence Based Intrusion Detection System

BSides Lisbon · 2018 · 42:42 · 279 views · Published 2018-12
About this talk
Ivo Vacas presents a fully automated approach to enhance intrusion detection systems by integrating open-source intelligence (OSINT) feeds. The system autonomously collects threat data from 49 diverse sources, generates detection rules, and identifies real-time malicious activities including botnet communications, brute-force attacks, and phishing. Deployed at the University of Lisbon, the approach proved inexpensive, modular, and continuously self-updating without manual intervention.
Original YouTube description
The presentation shows the implementation of an Intrusion Detection System based on Open Source Intelligence and how it behaves. Cybercrime has steadily increased over recent years and is nowadays the greatest security concern of most enterprises. Institutions often protect themselves from attacks by employing intrusion detection systems (IDS) that analyze the payload of packets to find matches with rules representing threats. However, the accuracy of these systems is only as good as the knowledge they have about the threats. Nowadays, with the continuous flow of novel forms of sophisticated attacks and their variants, it is a challenge to keep an IDS updated. Open Source Intelligence (OSINT) can be explored to effectively obtain this knowledge by retrieving information from diverse sources. This presentation proposes a fully automated approach to update the IDS knowledge, covering the full cycle from OSINT data feed collection to the installation of new rules and blacklists. The approach was implemented and assessed with 49 OSINT feeds and production traffic. It was able to identify in real time various forms of malicious activities, including botnet C&C server communications, remote access applications, brute-force attacks, and phishing events.
Transcript [en]

Let's start. My name is Ivo Vacas. Right now I'm working at the National Cyber Security Center, as part of the CERT.PT team, which is the national team that responds to cybersecurity incidents. If you want, feel free to email me at this address, anytime you want. At the time I did this work I wasn't working at the National Cyber Security Center; I was working at the rectorate of the University of Lisbon, which is a building right in the middle of this campus, but right now I'm at the National Cyber Security Center.

Nowadays, our life is pretty much half online. We use Facebook for opinions, for photos. We use computers to communicate with each other. We even have household appliances with internet connectivity, and, as probably everyone knows, they are being used to do bad things. We even read the news online. Basically, the life we live online is pretty much the life we live in person. And this is probably the cost that cyber incidents will have caused by the end of next year: two trillion dollars, which is a really big number, and a number that is increasing every day. Steve Jobs once said that it's not a faith in technology, it's a faith in people. Right now we have the reverse paradigm: we must trust technology to help protect us from people, because they are the ones doing bad stuff online.

This is basically the state of the art on the internet right now. There are botnets. What is a botnet? It's a network of compromised hosts that is used by someone really mean, a hacker, who uses those systems to do bad stuff: to steal, to attack other systems, to recon for information. They come in three types: centralized, which means there is only one command and control; decentralized, where there is no single command and control and the communication is made between all the nodes of the botnet; and hybrid, which is a mix of those two. We have intrusion detection systems, which help us track the bad stuff done by these guys: the host-based ones, which are installed on the host systems we want to scan, and the network-based ones, which are always sniffing the network traffic, searching for patterns that can match some malicious activity. And here we have open source intelligence, which I'll explain, because I felt that most people I've talked to about this didn't know the full concept of open source intelligence.

Here we have a bad guy. He already looks mean, and like this one there are many others. These kinds of guys usually register malicious domains that they use for command and control, to send emails, to do a lot of bad stuff to us, people who are always on the internet living our lives like any ordinary citizen.

And like the other hackers, he has an idea, which is probably to attack people. And how does he do that? He sends an email to every user. This is really effective because, if it's done around Black Friday, probably everyone will open it if it's something with cheap products. They open it, and there's probably a bad thing in that email that infects everyone. Can you hear me? Okay, this is probably better. It won't fall off, I hope. But we can see here that this girl wasn't infected, and that's okay for him, because this guy has some other mean machines working for him, and these ones, specifically, he uses to recon data all over the internet. But what he doesn't know is that this girl is a researcher, and she even has a honeypot. A honeypot is a server with false data and applications that is only there to gather information from bad people doing bad stuff on the internet. Then she uses that information and shares it among lots of platforms, and if we look at what happened, she has really a lot of information: she knows the guys who did the scan, she has information about the emails, she has the email and the attachment in the email that probably compromised the other ones. And this is basically open source intelligence: lots of information that we find gathered all around the web. It can even be news, from any journal we can find.

And this is important. I've spent some time here because this is the baseline of my work; this is the most important thing, and it's the basis for what I'm going to show next. My objective is to show that it's possible to use this information to turn a normal IDS, an IDS that we already know, into an IDS capable of finding threats. For that, I have here a theoretical model I've made, which is based on three modules: information gathering, knowledge generation, and incident detection. To be more specific, during the information gathering we go to the internet and gather that information, but that's not quite simple, because some information comes in CSVs, some comes in TXT, some comes as news. So it's kind of hard to keep up with how the information is published on the web, and we also need to be careful because there is probably duplicated information out there. Then we take that information and turn it into an indicator of compromise, meaning we know that some machine probably got compromised because of it, and we turn it into what I call an indicator of attack, which means that if that pattern shows up on the network, there is probably an attack ongoing. For example, if there was a malicious IP that attacked someone, and I am using that IP to match patterns on my network, and there is a match, that's probably because an attack is occurring.
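
To make that matching step concrete, here is a minimal sketch of the indicator-of-attack idea; the feed entries and the connection records are invented for illustration, and in the real system this matching happens inside Snort rather than in a standalone script.

```python
# Minimal sketch: OSINT indicators of compromise (malicious IPs) become
# indicators of attack when they match live traffic. All values invented.
malicious_ips = {"203.0.113.7", "198.51.100.23"}   # collected from OSINT feeds

observed_connections = [
    {"src": "10.1.2.3", "dst": "203.0.113.7", "dport": 443},
    {"src": "10.1.2.9", "dst": "93.184.216.34", "dport": 80},
]

for conn in observed_connections:
    if conn["src"] in malicious_ips or conn["dst"] in malicious_ips:
        # A pattern match on live traffic is treated as an ongoing attack.
        print(f"ALERT: traffic to/from blacklisted IP {conn['dst']} "
              f"(src {conn['src']}, dport {conn['dport']})")
```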

During this phase we can also gather more information about the data we've already collected, with some commands. Then we represent it, as always, as indicators of compromise. Now we apply that automatically in the intrusion detection system, and that will probably help us raise the security of our organizations. As the internet says, if it works, it isn't stupid. I've implemented this, and I'll show you how I've done it and the results. First of all, we have the internet, where there is open source intelligence, like I've said before: some feeds where we can gather information, and there are also rules. Some vendors already have rules made, which we can use for free in our intrusion detection systems.

There is one thing that is very important, and sometimes it gets confused: my objective isn't to substitute those rules, it's to help make those rules more effective. So we can have those rules working alongside the rules I'll generate. Now, to do the information gathering I've used a threat intelligence platform called IntelMQ, which I'll explain next, to gather that information and store it in a database, so I can then use it to create rules and to generate information for the IDS, so we can detect threats on the network. This is done by this module, the rule and blacklist generator, which creates the rules. In this implementation I've used Snort and PulledPork. Why is this really cool? Because it let me fully automate the process of keeping the IDS updated and gathering the information from around the web. So with the system I'm showing here, we don't need to do anything to keep our IDS updated. The information comes from IntelMQ and from those feeds, the rules are generated automatically, and PulledPork is a program which is very cool for Snort because it takes the rules I've made, and even some rules from the web, manages them, and automatically applies them in Snort. So it's automatically updated.

During the implementation I ran this for eight consecutive days, and I'll show the results right ahead. So we have the feeds coming in right here, then the information goes there, it's turned into rules that PulledPork will take and check against the other rules to see if there are any duplicates, and it automatically applies them in Snort. I've done this every day. Actually, I gathered information from the internet during one day, and at the end of the day I used the rule and blacklist generator to generate rules that help protect the next day against the things that happened the day before. I did this gathering information over the whole day, but it could be done every hour, updating Snort every four hours, every eight hours, ten, whatever interval you want; that's easily configured.
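
As a rough sketch of that cycle, the daily driver could look something like this; the script names, file paths, and the exact PulledPork and Snort commands are assumptions on my part for illustration, not details taken from the talk.

```python
# One pass of the update cycle; in practice this runs from cron at whatever
# interval is wanted (hourly, every four hours, daily, ...). Paths and
# commands are illustrative assumptions, not details from the talk.
import subprocess

def daily_update():
    # 1. Regenerate Snort rules and the IP blacklist from the day's
    #    harmonized OSINT events (see the generator sketches further below).
    subprocess.run(["python3", "/opt/osint-ids/generate_rules.py"], check=True)
    # 2. Let PulledPork merge local and downloaded rule sets, drop
    #    duplicates and manage rule SIDs.
    subprocess.run(["pulledpork.pl", "-c", "/etc/snort/pulledpork.conf"],
                   check=True)
    # 3. Reload Snort so the new rules and blacklist take effect.
    subprocess.run(["systemctl", "reload", "snort"], check=True)

if __name__ == "__main__":
    daily_update()
```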

How does IntelMQ work? It starts with collectors, which go and collect information from around the web, from repositories, from news, whatever you want. We can even use systems we already have in our organizations to give information to be managed by IntelMQ. Then we have a parser. This is really cool because it helps us harmonize the information, so we have a standard output and the information always comes in the same data structure, which then helps us make rules. The experts are really, really cool, because sometimes when we gather information from the web we only have an IP, and that doesn't tell us much. With the experts we can run some commands so we get even more information than what we already collected from the web. The deduplicator checks whether there is any duplicate information and drops it. The information experts gather more information, and the protocol expert, which I developed, helps us know which protocol a given IP attacks: if we only have an IP, we learn that that IP is used to make attacks via HTTP or via SNMP, for example, and that's really good because it helps us focus our rules on the malicious action that IP is known for. The output: we basically store the data whenever we want and wherever we want. I've stored it in a database so it can be used afterwards.

Here is how the information looks. We have an event, we have a blacklist entry, we have an IP, we know what kind of protocol it attacks (HTTP), we know from which country, we know lots of stuff. And at the time I gathered this information on the web, it was just a simple IP. So, from one IP I could produce lots of information, which I can use the moment I'm making the rules.
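
For illustration, a harmonized event of this kind might look roughly as follows; the field names follow IntelMQ's harmonization scheme as far as I recall it, and the concrete values are invented.

```python
# Roughly what a harmonized event could look like after the parser and the
# experts have run. Field names follow IntelMQ's harmonization scheme as I
# recall it; the values are invented for illustration.
event = {
    "feed.name": "example-blacklist-feed",
    "classification.type": "blacklist",
    "source.ip": "203.0.113.7",
    "source.geolocation.cc": "PT",
    "protocol.application": "http",   # added by the protocol expert
    "time.observation": "2017-07-14T09:30:00+00:00",
}
```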

And now I'll explain how the rule and blacklist generator works, which is kind of simple. IntelMQ writes to the event database, and then we have our rule generator, which is really basic and really easy: it's basically a Python script that picks up the information I've shown before, applies a template in order to build a rule, and generates a rule that can be applied in the intrusion detection system, which in my case was Snort. As you can see, it's a really specific rule: we have a port, we have a domain. I've done some things in order to get better performance, because there are some feeds that actually just say: block that IP, because that IP is no good. Just block it, it doesn't matter what application protocol it is. And I've separated those IPs from the IPs that are known to perform attacks on a specific protocol.
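
A minimal sketch of what such a generator can look like is below; the rule template is only a guess at the shape of the rules shown on the slide, since the exact options of the original script aren't reproduced in the talk.

```python
# Minimal sketch of the rule generator: events with a known application
# protocol become specific Snort rules; the rest fall back to the blacklist.
# The rule template is a guess at the shape shown on the slide.
def event_to_rule(event, sid):
    ip = event["source.ip"]
    port = {"http": 80, "ssh": 22, "sip": 5060}.get(
        event.get("protocol.application"))
    if port is None:
        return None   # no protocol information: handled by the blacklist
    msg = f'OSINT {event.get("classification.type", "threat")} {ip}'
    return (f'alert tcp any any -> {ip} {port} '
            f'(msg:"{msg}"; sid:{sid}; rev:1;)')

print(event_to_rule(
    {"source.ip": "203.0.113.7", "protocol.application": "http",
     "classification.type": "blacklist"},
    sid=1000001))
# alert tcp any any -> 203.0.113.7 80 (msg:"OSINT blacklist 203.0.113.7"; sid:1000001; rev:1;)
```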

Why do I do this? Because of performance. Since we have lots of information on the web, we and the systems need to keep up with the performance. I've made two different databases, one for blacklists and another for rules, and then used PulledPork, like I explained before, to apply those rules in Snort. But there is a trick here in Snort. Snort has two different detection stages; it's the same engine, but there is a pre-engine, which is called the preprocessor. Why is the preprocessor cool? Because it allows a bigger throughput. So we handle the blacklist, where we don't know what kind of attack the IP performs, in the preprocessor. It's much, much quicker; it's just like a firewall, really, really quick.
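
The blacklist side of the generator therefore only has to emit one IP per line, along the lines of the sketch below, assuming Snort's reputation preprocessor is pointed at the resulting file (the snort.conf wiring itself isn't shown in the talk).

```python
# Sketch of the blacklist side: events with no protocol information become
# plain IP entries, one per line, in a file consumed by Snort's reputation
# preprocessor. The path and the snort.conf side of this are assumptions.
def write_blacklist(events, path="/etc/snort/rules/osint_blacklist.txt"):
    ips = sorted({e["source.ip"] for e in events
                  if not e.get("protocol.application")})
    with open(path, "w") as f:
        f.write("\n".join(ips) + "\n")
    return len(ips)

# Example: write_blacklist([event]) using the harmonized event shown earlier.
```

Keeping the bare IPs in the cheap preprocessor path and reserving full rules for IPs with a known protocol matches the performance trade-off described above.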

And then we have the detection engine, where we apply all the rules we've made with those scripts, from the information gathered from those repositories and from other feeds. And then, after sniffing the traffic, we have a database of alerts. To show that this implementation is really good on performance: I did this about a year and a half ago, and we had two Xeon CPUs, 12 GB of RAM, and a hard drive with 165 GB, which wasn't that much. And I used a really cool open source operating system, Security Onion, which has lots of network forensics tools, which helped me handle the generated alerts and do some research on them.

And, well, this was just stupid, because I simply had too much traffic. At the time I didn't know, because I had developed it and I was like: since it works on my PC, it will probably work on the network of the University of Lisbon, which was a totally stupid idea. The functional core I used was the Snort IDS and the threat intelligence platform IntelMQ. The feeds I used were a total of 49: 9 for malicious domains, 16 for IP blacklists, some for phishing and dangerous URLs, some for IPs that are known to perform attacks on VoIP servers, because the University of Lisbon uses VoIP a lot; probably all the telecommunications there are done over VoIP. We have IPs that perform attacks on email servers, so we have a lot of information we can use in our intrusion detection system to find out if there is anything malicious on the network. And here is the network. And why was this stupid? Because one of the things I was sniffing was eduroam, which is the network that everyone uses on the campus and almost everywhere in the university. So these are the students, who have lots of problems on their computers and generate too many alerts. And here we have the central services and the data center network, which was where I did the traffic analysis.

Now, the results. During those eight days the system generated blacklist entries covering a total of millions of IPs. Why millions of IPs? There shouldn't be that many. It's just because I used some feeds with bogon IPs, which are IPs that aren't supposed to be used by anyone, and those come as really large ranges, covering a huge number of addresses. If one of those matches, it's something bad, because it would mean the network is receiving traffic from, say, a 192.168 address coming from outside the network. So all those IPs are in here; the real number is probably just half, probably less than half, maybe two tenths of all these IPs. But I had to include them, because I needed to verify whether there was something wrong coming from outside the network.
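
As a small aside, a check like that is easy to express with Python's standard ipaddress module; the handful of ranges below are just well-known reserved blocks, not the full bogon feed used in the talk.

```python
# Sketch of the bogon check: flag traffic whose source address belongs to
# space that should never arrive from outside the network. Only a few
# well-known reserved ranges are listed here, not the full bogon feed.
import ipaddress

BOGON_RANGES = [ipaddress.ip_network(n) for n in
                ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
                 "100.64.0.0/10", "127.0.0.0/8")]

def is_bogon(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BOGON_RANGES)

print(is_bogon("192.168.1.50"))  # True: should never come from the internet
print(is_bogon("8.8.8.8"))       # False
```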

So I used them. And then we have a total of 25,000 rules generated by that system, by my system, during those 8 days, which is a lot of rules. And some rules were made with lots of IPs; if I recall correctly, I had probably 100, maybe 150 IPs per rule. So there are lots of IPs. And these ones, the phishing and malicious URLs, aren't IPs, they are URLs, so I needed to make a rule for each malicious URL. And everything was generated automatically and applied automatically by the system, which is cool. And now, this was the stupid part: this is way too many alerts. I couldn't analyze more than 50 or 100, and it was 5 million alerts in 8 days. And this was actually true: those IPs were inside the network and were communicating. Even though it is a really big number, during the analysis I saw some SNMP traffic going outside that shouldn't, so there really was someone inside the network sniffing SNMP traffic; and there were some VoIP communications being attempted inside that shouldn't have been happening, which were probably being done in order to steal information and to intercept some communications. It was really a big number, but it helped me find some problems inside the network, like some compromised servers that were still running but should have stopped being used three or four years ago, and they got compromised simply because everyone had totally forgotten they were there. I found them because of this system, and that helped us put more restrictive rules on the firewall.

Number of alerts by the category of open source intelligence I gathered: I have 3 million from blacklists; the majority of the traffic I analyzed generated alerts on blacklists. IPs that try to get remote access to servers, of course; this is the dish of the day. We always have someone trying Telnet, SSH, RDP to get access to the machines of our organizations. Malicious domains were also really numerous; they were probably from compromised eduroam clients somewhere inside the network that I managed to find and report to the system administration team, which took care of the business.

But the cool thing is that I had zero phishing or malicious URL alerts. This work was done during the summer vacations, so there were not that many people at the rectorate, and we also had a spam filter, so that probably helped us have zero alerts, which is cool and is what you expect in organizations. Number of alerts by threat category: we have traffic generated by blacklisted IPs; SSH communication with a malicious IP is probably the biggest one, 1 million alerts from SSH communications. We have SIP OPTIONS requests. This is really nice because it's more information; it's not just some communication over SIP, it's specifically a SIP OPTIONS request, and I could see the options that were in that packet. With the SIP REGISTER, SIP OPTIONS, and SIP communication by malicious IP alerts, I could trace them, and I saw they were using tools that are known to perform attacks on SIP servers. It's still a really big number.

So, I haven't said it before, but the bad thing about using the blacklist on Snort, well, in the preprocessor, where it isn't a rule, just an IP, is that I can't gather much information on what happened during the analysis of the traffic. But what we can know is that a malicious IP has hit some port, and the top destination port was Telnet, which was expected from those malicious IPs, because, like I've shown before, it's one of the biggest categories. Something really, really cool I saw here was port 0, which caught me by surprise, and it's something I unfortunately couldn't analyze, because I didn't have the packets; the machine was not powerful enough. So it was like: we had an alert, and then we had other alerts, and the previously generated alerts were just being dropped. I probably shouldn't have put this solution on such a problematic network. I would like to try it again on a network that doesn't have so many problems. It's also because it is a really, really big network.

Really, really big. What I found out about port zero is that some scanners do use port zero to gather information about hosts. I've also heard that port zero is used by malware to find out whether it is in a sandbox, so it can evade. If anyone here knows anything about this port zero communication, please say something at the end of this presentation, because I would love to know more about it. We have 1433, which is used by SCADA systems from Siemens; I don't know why it is here. And port 22; the top ones are obviously trying to get remote access. 5060, which is used for SIP communications, for VoIP, is also a huge number, because our infrastructure at the University of Lisbon was basically all supported on VoIP; we had more than 3,000 VoIP telephones there. 443, which is where the HTTPS servers are listening; there are lots of attacks, lots of communications from those IPs there. And this is where, unfortunately, the blacklists are bad: if there is something coming from an IP that was considered a bad IP, it may not represent an attack. The rules help us with that; these ones don't. And why do the rules help? Because we have the specific part. So if we have an attack from a blacklisted IP, and we know that IP belongs to a big organization like the University of Lisbon, which has just a few external IP addresses, maybe even just one, and it's NATed, we can't just block the IP of an entire institution, because we would be killing lots of machines. So, for that, the generated rules help us. The blacklist isn't that good, because I didn't have control over which port I wanted to block. At least I sniffed and I saw that they were trying to access ports that they probably shouldn't: these ones, this one, zero probably too. I didn't find out what this one was. 5060, probably not many of them, because we weren't using VoIP for outside communications unless we had established a protocol.

And that's basically it. The system works, and it helped the University of Lisbon feel safer. I guess nowadays they already have some more security equipment to help them find threats. This work was done with Ibéria Medeiros from the Faculty of Sciences of the University of Lisbon and Nuno Neves, who helped me write a paper that was accepted at the European Dependable Computing Conference. I want to humbly thank them; they are not present, unfortunately.

And, well, it works. It's a system that is quite simple. We can have it everywhere. We can use it with the systems that already exist. We can use it to add information to the rules we already buy from some private institutions; Talos, I guess, sells rules too. We can merge the best of the two worlds to help make our organizations more secure. It's inexpensive, because it's all open source; we only need the hardware, and that depends on the size of our network, so we can put this basically everywhere. I tried it on my own computer at home for some time, and it worked pretty nicely. It's modular, because we can use the feeds we want. We don't need to use only the feeds on the internet; we can use feeds from other intrusion detection systems. We can get information from lots of places, not only news and OSINT repositories; we can also use IntelMQ to collect information from applications that already exist in our organizations. And this is really cool, because I did this about a year, a year and a half ago, and right now it's fully updated with the threats that are out there today, and I haven't done anything. I haven't even touched the solution. Every day it goes on the internet, gathers that information, and it's always updated.

And, well, that's it. Thank you. Does anyone have any questions? Guys, any questions?

Hi, how can IntelMQ integrate with MISP? There is already a bot in IntelMQ that integrates with MISP, so that work is already done. You don't need to program a bot for IntelMQ to get information from MISP, because that's already done; the people behind IntelMQ already created that bot, so it can be used if you want. I could use that as a feed for my system and generate rules automatically.
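
For reference, wiring that existing MISP collector into IntelMQ is a matter of configuration; the snippet below is a rough, from-memory illustration of what such a bot entry can look like, and the module path and parameter names should be checked against the IntelMQ documentation for the version in use.

```python
# Rough, from-memory illustration of an IntelMQ runtime configuration entry
# for the existing MISP collector bot; check the module path and parameter
# names against the IntelMQ docs for the version in use.
misp_collector = {
    "misp-collector": {
        "module": "intelmq.bots.collectors.misp.collector",
        "parameters": {
            "misp_url": "https://misp.example.org/",
            "misp_key": "<API key>",
            "rate_limit": 3600,   # pull from MISP once per hour
        },
    }
}
```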

Any more questions? How do you keep up with all the new rules? How do I keep up? I don't. That's the cool thing: it's PulledPork. PulledPork keeps up with that and manages it for me. So, if there are rules from the previous day that no longer apply, PulledPork throws them away. And if we have rules that match the rules we've gotten from other sources, like Talos for example, it knows that it's the same information, so it only puts into Snort the information that's good, and we don't have any duplicates. And Snort uses IDs for rules; PulledPork handles that for us too, so we don't even need to bother with rule IDs, which is pretty cool.

Well, Ivo, first of all, congratulations on the work done. Thank you. Well done. I was just curious whether you measured the performance throughout those eight days. Yep. And what were the results? I mean, we were seeing the alerts... About the performance? Yep. Well, the bottleneck was the HDD. It was really, really poor; it was kind of hard to keep up with the alerts being generated. I even had to drop some VLANs so I could keep up with the other ones, so an SSD would probably have helped a lot with that. The performance was nice. This shouldn't have been run against the whole network like I did, because it's a lot of traffic: I used two 10 gigabit connections and one copper one, so it was a total possible of 21 gigabits per second. It was too much; I shouldn't have done that. But it could keep up really nicely, even with all that traffic.

Any more questions? Just wait for a mic, sir. Sorry, just wait for a mic. Hello. The method that you use to collect the information: do you actively sniff the network from one point, or do you use several distributed points along the network and do a kind of distributed collection? To collect the information over the internet? On the network, because we were talking about the network, the university network. Okay. Back then I had Cisco switches and I used SPAN, so I could sniff only the VLANs I wanted. Okay, so you went to the switch and put a port in mirror mode, something like that? Yeah, that's it. And you sniffed the network, so you had only one sniffing point? Yes, I had three: three network cables, two optical fibers and one copper, and on those connections I configured the VLANs I wanted to listen to. Okay, thank you.

Any more questions? Hi. Hello. How do you deal with the false positives? Oh, how do I deal with the false positives, that's a good question. It was really hard to keep up with them, because I couldn't look at the 5 million events; it was really too much. And that is something... I couldn't look at lots of the events, because some events that would have been interesting from my point of view to look at were already gone: they appeared during the night, and since we had too many alerts, by the morning they had already been erased, which was bad.

So the alerts I could see were the ones that fired while I was looking at that moment. Of those I saw, probably like 80% were something bad, because some of those IPs just shouldn't be communicating on port 23 inside the network, since there was no exception on the firewall; it's by inference. How did you check that the IPs were indeed malicious? Do you have proxy logs, application logs? To check if there was something malicious being done by that IP? Yes. Snort captures packets for a specific time: imagine that you have an alert; you can tell Snort to save the next 5, 10, 100 packets from that IP, and then, after that, you can analyze them. I'm getting the IPs from IntelMQ, and IntelMQ is getting the IPs from the internet sources. Okay, thank you. Thank you.
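
The capture-after-alert behaviour described here sounds like Snort's tag rule option, which keeps logging packets from the offending host or session after a rule fires; the example below is an inference from that description, not the exact rule used, and the numbers and port are illustrative.

```python
# Inferred from the description above: Snort's "tag" option keeps logging
# packets from the offending host after the rule fires, so the generated
# rule could ask for the next N packets. Values here are illustrative.
rule = ('alert tcp any any -> $HOME_NET 23 '
        '(msg:"OSINT blacklisted IP to telnet"; '
        'tag:host,10,packets,src; sid:1000002; rev:1;)')
print(rule)
```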

There's a question here. Hi, so I'm having a bit of trouble understanding how you scoped the rules to be introduced in Snort. So, anything that was flagged as a threat from the internet was immediately put into Snort, or was there some sort of, I don't know, some sort of rule to select which rules would go into Snort? You can configure the rule sources you get from the internet in PulledPork; it automatically goes on the internet, grabs those rules for you and applies them in Snort. For the rules you create, which will be matched against the ones you get from the internet and which come from IntelMQ, that decision is based on the trust you have in that feed. So, if you trust that feed, use it; if you don't, it probably shouldn't be used. Or maybe you should use it, but with caution. From the other point of view, well, we can't just block an IP because we have heard someone saying that it has probably done bad stuff. But that's the Portuguese perspective. If you look at, for example, the Iranian perspective, if there is an IP that could probably be doing something bad, I can assure you it will be automatically blocked. So, it depends on what we want, and on how we manage the sources the information comes from.

Sorry, just a follow-up, because from the results you got, you had around 5 million alerts from one week of scanning, and in a security incident response context that's unmanageable. So what could be done to triage or to limit that unmanageable amount of alerts so it is actionable by a SOC, for instance? I should have had a little less appetite. At the time I was doing my master's thesis and I wanted, of course, a good grade, and I was like: okay, I really need this to generate alerts so I can show them to my teachers. Which was just stupid, because when I put it on the network it generated too many alerts, and I couldn't just go back, because I needed that week of information so I could use it in my thesis, and it was right at the end of my thesis, so there was nothing I could do except keep that result. And it came at a really bad time, because when I gathered this information the University of Lisbon data center got a bit warm; at the time it was like 50 degrees, I guess it was the warmest day of the year, and the data center just couldn't handle it. I really had very little time to get results, so I just let it run. Could you have cut some VLANs? I could, and I did cut some during this capture: there is one specific faculty, which I won't name, that was generating an amount of alerts that was unbearable for the system, so I cut that VLAN in an instant. Any more questions? Okay guys, so thank you. Thank you very much. Thank you all. Thank you.