
Thank you for the introduction, and thanks for being with me. We were supposed to have two speakers for this presentation, but my colleague could not make it, so I will cover all the content.

I'm going to talk about two main topics. First, I will walk with you through telecom security in general, because I think it might be interesting for you to learn, at least at a high level, the risks and the threats in terms of security on telecommunication networks. In the second part of the talk, I'm going to take a deep dive with you into one new kind of threat, which is SMS
phishing: how we fight SMS phishing internally at POST.

A few words about myself, but actually I was introduced before, so I think I can skip this part. Just one small thing: as you know, at POST Luxembourg we have a new department in which we have different competencies, and I am working in the cyber security Cyber Labs, in which we foster innovation and develop new products, doing new experiments and new evaluations of security products, all the interesting stuff that we do over there in the Cyber Labs at POST Luxembourg.

So, like I said, we will cover these topics: telecom security threats and SMS
phishing, and a little bit of spam detection, everything built from the ground up.

Do you know about telecom networks? Excellent, we have a few experts here. Usually the picture is much denser, more sophisticated and more complex than this; this is really a bird's-eye view of telecom networks. We have different operators around the world, and they are connected to one another. They have a kind of private network: it may go through the internet, but usually the connectivity is done in a secure manner, so they connect with one another and they are not open to the internet. That was previously. Now, recently, there is a new kind of player coming into play: we have virtual
network operators, and these guys can leverage the existing network infrastructure of operators like POST. We have the antennas, we have the core network; these guys have a data center somewhere, they are connected to the infrastructure of the operators, and they operate on top of it.

And with these guys, of course, we also have bad guys within the telecom networks, right? We are not only talking about the networks in Luxembourg or in Europe; we are also talking about the guys in Asia, the guys in Africa. So there are bad guys, and especially with this new kind of player there are more and more, let's say, threats.
Telecom threats: I think you have heard about this stuff. We have to deal with this kind of attack daily: we have spam calls, we have SMS phishing and SMS spamming, and we also have a lot of telco fraud happening. Let me just highlight a few of the main threats.

The first one, number one, is spam calls, the so-called Wangiri, which comes from Japanese and means a drop call. Basically, some fraudster somewhere makes a massive number of drop calls, and we are talking about 10,000 within a few minutes, and then our customers call back. That's the point, because sometimes people
don't care about who is calling them. They just say, "OK, I missed a call, it might be important, I'll call back." If they call back, they are routed to premium numbers, and then the customer has to pay a lot. Think about it: one person pays one euro per minute, but if a hundred people call, it's like a hundred euros per minute, and if it is a thousand, the number gets bigger. So that is the first fraud.

The second one is PBX hacking. I'm going to give more detail about this case later, so I will skip it here.

The next one is subscription fraud: somebody might fake an ID, go to the shop, and
then ask for a new SIM card, and then they can use that SIM card to authenticate to other services.

Spoofing is also interesting, because thanks to the virtual network operators I mentioned earlier, they can actually spoof any number. It happened in Luxembourg that they spoofed the number of an elderly person, and then suddenly many people were calling back and reaching the real number, so the lady was bombarded with so many calls, and she complained. In general they use spoofed calls for advertisements or surveys: they choose some random numbers to spoof, and they happen to spoof some real numbers. Next
one: SMS phishing. Again, I'm going to go into detail on this, so I will skip it here.

The next one is quite interesting. Here we talk about interception of subscriber calls, SMS, and data, meaning that you send an SMS to somebody and the SMS can be intercepted by a third party, the attacker. Or you call somebody, your partner or your boss, et cetera, and the call might be intercepted. It is happening, and it is quite straightforward thanks to the vulnerabilities in the telco network, especially in the SS7 protocol layer. You might also be able to trace somebody's location. And of course there are other kinds of fraud.

So, the first example. I give you this example
because I think it is quite interesting from my point of view as well: when we detected this and visualized it, we said, yes, this is quite fascinating.

Do you know what a PBX is? In a company or an organization, you have to have a central switchboard, a telephone system. It routes the calls internally, and when you call outside, the extensions share some lines. Long ago, of course, we had this beautiful lady doing the connecting from A to B by hand. In the past ten years or so, we had this kind of telephone system on the premises of the company. And now we have this; it's very
interesting as well: the PBX in the cloud.

So what happens when these PBXs get hacked? Of course, I don't know whether this one can be hacked, but when a PBX gets hacked, the attacker can generate a massive number of calls from its numbers to a huge list of numbers, and these numbers are owned by the hackers. Then, if you are the owner of this PBX, you have to pay a lot of money.

I'm going to show you this in a graphical way, so you can really perceive it. Here, one dot is one number, and most of these gray dots are owned by hackers. You can see they
are all over the world, with many in Africa in this case, and this is what we observed. Let's play this, and I will explain as we go. We don't have sound, sorry, it is a muted video. Here you can see all of these colored dots; those are the extensions of one PBX. So here we have nineteen extensions, and the attackers use those numbers to call all these gray numbers owned by them. Our system can detect them quite quickly, so we detect and we block, detect and block. That's why we have this kind of chasing game. Let me play it back. You can see this guy is making a massive
number of calls to many destinations; we block them, stop. Next one: OK, we detect the attack, block, stop. It is done in an automated way: we detect the guy, we block him, but, like always, we are running behind him.

That was the first work that we did, more than two years ago, and it was quite impressive at the beginning, because we showed this visualization to our management, and they said, "Yes, we see some kind of good impact," and they sponsored us to do more.

So what have we learned from this kind of observation, from the experience so far? The fraudsters, and cybercriminals in
general, are smart people, and they are very well organized. They are globally organized, and they are evolving, and we are running behind them. Really, we are running behind them, because we are on the defense side. So we need to be fast and precise, and I will explain why.

What we have at POST is a combination of three, let's say, competencies. The first one is big data analytics: being able to ingest a huge amount of data. We are talking about more than 200 gigabytes per day currently, and we are expanding. We need to use machine learning capabilities to be able to, you know,
analyze such a huge amount of data. And then we also have some development setup in place to automate the mitigation actions, because otherwise we cannot handle things quickly and cannot cope with, for example, the night shift or the weekend. Usually the fraudsters start the attack when we are not there.

Sorry, this is the last part. Being fast is quite obvious, right? We try to catch the threat as fast as possible. The next point I would like to emphasize is that you need to be precise, because if we detect the wrong guy, we block the wrong numbers, and then the
customer will be affected, right? Also, if you are not precise, you will produce a lot of false positives, and then more people will have to be involved to deal with those false positives, and at some point they will say, "OK, I'm fed up." So it is quite important to be fast and precise.

Any questions so far? Then let's move on.

This is actually the main topic of today's talk: SMS phishing and spam detection using machine learning. I hope you will be interested in that. Our objective is to be able to catch the bad guys in real time, so really
real time: we are talking about a latency of, let's say, around a minute or less. So that is the goal.

SMS phishing, you know it. The sender is fake: the sender usually pretends to be a bank, a known brand, a commercial website, or a big company. They send a very short but attractive, very interesting SMS, like "Please, you have an update on your account," or "You have received, let's say, a gift from us," or whatever, and "please click on this link." And of course it contains malicious links. It has happened in Luxembourg many times: they send it to customers of banks, and then when they
follow the link they are forwarded to this page. You know what it is, right? Everyone knows it: it is the identity provider used in Luxembourg for authentication. Once you authenticate, the attackers have a valid token or session ID, and they can do whatever they want on your account.

But it is actually quite challenging to deal with this problem. The first challenge is that we need to be scalable. At some point people said SMS is dead, because nobody sends SMS anymore; they use social networks or whatever. But we are observing that SMS is actually coming back, because it has become cheap: sending SMS is now much cheaper than
previously, and it is easier. For example, like I mentioned at the beginning, we have MVNOs, the virtual network operators, basically data centers that provide the service on a website: you pay a few euros and you can send a few hundred SMS, regardless of the content. It is very easy. And actually many banks and many companies are using SMS for multi-factor authentication.

Another challenge we have to face is that, previously, there were SMS firewalls that work based on rules, and they don't work anymore, because the sender, the content, and the URL can change very, very quickly. So there is no way you can catch
the guy. You can block, for example, a bank's number, but if you block relying on the sender, next time you might block the wrong guy: you might block the real bank sending the SMS. If you block on the content, the next day the guy will change the content of the SMS, and again you might block the wrong content.

The second challenge we have to face is effectiveness. Again, we need to be precise, so it is very important to catch the right guy. First, we have to deal with multiple languages at the same time. This is very interesting in Luxembourg, because in other countries you might have more
or less one dominant language. For example, in France people mainly speak French, and I would say 99% are texting in French in SMS; in the UK, for example, people text in English, of course. But in Luxembourg we don't really observe a majority language, so we need to support multiple languages right from the start. We cannot say, "OK, let's stick with French and then move on"; we have to try to cover as much as we can from the beginning.

Last but not least is privacy. We have to be fully automated, because, first, we are not capable of analyzing the content ourselves, and second of all
it is not allowed. So it must be done by machine; everything must be automated.

So, like I mentioned, machine learning comes into play. We are able to ingest and decode all the data in real time, and the next thing is to use machine learning to detect the bad guys, so the bad SMS. To decompose the problem a little bit, we are dealing with three problems at the same time. First, we need to detect the right language: given a piece of text, you have to know whether it is in English, in French, or whatever else. So, different
languages. Second, we have to classify the content of the SMS. In machine learning we have a field called natural language processing, so we use all the advances in NLP to classify text into good text, so-called ham, which is not harmful, and spam. And third, we have the URL, from which we can of course extract information, and then we have to classify URLs into good, legitimate URLs and bad ones.

I'm going to dive specifically into the last topic, which is
URL classification. Given a URL, as a data scientist I have to ask myself: what characterizes a bad URL? Of course, we are not the first ones to deal with this kind of phishing problem; on the internet, people deal with phishing in emails, for example. They are in a better position, because they can analyze longer text. And something I forgot to mention earlier is that for SMS phishing detection we cannot analyze the content of the targeted web page, because, for
example, if we activate the URLs, we might invalidate a legitimate URL of the sender. For example, you want to reset your password, and the system sends you a link that is valid only one time: once you open the link, it is invalidated. So we cannot analyze the content of the pages; that is also one challenge.

So, looking back at the URLs: we have to analyze the URLs themselves, and from a high-level point of view we need to separate the bad guys from the good guys. If we look really deep into the URLs, we can actually extract a lot of
information. In machine learning we call this feature engineering: basically, you cook the URL in multiple ways to see what the characteristics of the bad URLs are. Given one URL, we analyze things like: is the top-level domain known or unknown? Does the domain belong to a known organization or brand? Is there some kind of repetition of characters? For example, sometimes the attackers just replicate some characters: for Google, instead of having two o's, they put three o's, or instead of "oo" they have "oc", so they end up with a word which is not meaningful. We
have to analyze the randomness in the URLs; for example, here we might have something not so meaningful, like "orange" or "info" with a "12", things like that. Also, in the URLs used by attackers, digits are used a lot: instead of letters, A, B, C, they use digits, 0, 1, 2, 3, more often than usual. And also special characters: do we see special characters in the URL, et cetera. In total we have more than 30 of what we call features, and those features characterize the URLs.

So, in terms
of machine learning, we need to provide the machine with some data for training. Thanks to open sources such as OpenPhish and PhishTank, where people report phishing URLs, we can download and retrieve a lot of bad URLs, and we use that as one half of the training data set. For the second half we need good URLs, and for that we can rely on search engines: we can, let's say, run some random queries on a search engine, and it returns a list of URLs. It is quite straightforward.

Something that I was actually very surprised by when I saw it the
first time is the error rate on the training data. Let me explain a little bit. Here we can see the training iterations: the more you train, the better you learn. And here is the error rate, meaning how many times I classify a URL wrongly: good becomes bad, or bad becomes good. We can see that at best the error rate is only 1%, and 1.5% at maximum, meaning that we can have an accuracy of, I would say, 98.5%. So it was quite impressive at the beginning. Of course, if you know more about machine learning, you know that this could be a problem as well, but I will skip that; if you are interested, we can
discuss it later.

Actually, we have had this whole system for SMS phishing detection in production for a few months, since April this year. We detect spam, but it is a limited kind of spam, because we only detect spam with URLs inside the text. We detect spam campaigns daily, but currently we don't take any action, not yet. Of course, if we have customers complaining, then we have the choice and can take some action. For phishing, we detect phishing targeting banks or targeting retailers about once per week, so it is quite often. At
the beginning, before, we did not see that, so we assumed that nothing was happening. But it was quite interesting to learn that there are these phishing campaigns, probably weekly at least. Here again, we have visibility only on our network, the POST network, so I don't know about the other networks; maybe they suffer the same thing, more or less.

So we have this system in production, and what you see here is the interface. We are using Splunk as the presentation layer. Of course, behind it we have all the machine learning and the processing of data. For the real-time, let's say, treatment
of data we use Kafka, if you are interested, and then we use machine learning to detect everything; only then do we send the alerts to Splunk for visualization and presentation. Here we can analyze the detected phishing attacks: we see the content, we see the number of targets, et cetera. We can also take a deep dive into a phishing SMS: here, for example, you can see the language, how many recipients there are, and the probability of it being a phishing SMS according to the model, which here is like 91%, et cetera. So we have all these kinds of details using Splunk.

So, in summary, I hope I
highlighted some threats in telecom networks. Like you have already seen, we are facing quite a lot of threats, but sometimes we don't know about them, one of them being SMS phishing. In-house, of course, we are lucky, because we have different competencies at the same time: we have a team of good, competent people with different skills, and we built everything from the ground up, from decoding the traffic, to processing the data, to running machine learning to detect attacks, to presentation. With all those competencies together in one team we have built this system, and it is up and running. That is something that we would very much like to share
with you. Thank you.

Thank you. So, any questions?

Thank you for the presentation, super interesting. I have a few questions, so I'm going to bring up the most relevant ones, and then we can discuss the rest outside. You showed a nice graph of tracking PBX scammers, or attackers, and how you block them very fast. Did you use machine learning there as well? Because it was super cool. You also mentioned that you used the data from PhishTank and OpenPhish to train the models that you are using for this system. Do you contribute your output back to those repositories? Because, as you mentioned, this is a very serious threat and it just keeps going on: every day there are new phishing URLs and new phishing
attempts, and people who enjoy doing this for various reasons. It looks really nice, and I think it works, from what you presented, so it could be very helpful to contribute the data back. Are you doing that?

Yes, we do actually; we contribute back. As you know, at POST we have a CSIRT team and also a SOC team. Those teams handle all the, let's say, incidents, and from the incidents, whenever we observe any bad URLs, we contribute back to PhishTank at the moment. But like I mentioned, we cannot activate the URLs
if it is not confirmed. So only when we receive, let's say, a complaint from the banks or the retailers saying, "Hey, we are suffering this kind of massive SMS phishing," only then can we release the URL, and then we submit it to CIRCL, who have URL Abuse, and to PhishTank.

Great. One last question, and super sorry for hogging the mic. Phishing URLs, IPs, PBX scammer locations: all of those are super IoCs that could be helpful for other incident responders. Do you use MISP in any fashion to store those kinds of things, and maybe share them either internally or with other ISPs around the region?

That's a
very interesting question. We have two instances of MISP. One is a MISP for IT, where we share the stuff related to the IT world. The second instance of MISP is shared by all telco providers; it is sponsored by the GSMA, a kind of telecom association, and there we share the kinds of fraud we detect: all the bad numbers, all the fraud, all the numbers owned by hackers and attackers, and also this information.

Hi, thank you for the presentation. I have a few questions. First, about the failure rate of your machine learning: how do you cope
with the failures? Do you accept frustrating customers by not having the SMS delivered? How do you do this?

In this particular case we don't take any action yet. We only detect, and then we store all the alerts in Splunk, and only when the customers, for example the banks, call us can we take action.

OK. And in the future, do you consider doing this proactively and blocking the threat directly?

Yes, that's the next step. As the next step we are trying to get the banks and retailers on board, so if we get the authorization, then we will take action. And as you know, as a telco
provider we have something called SMS firewalls, where we can put some rules to block.

Thank you for the great talk. My question is a bit related to the last one, about differentiation. In the first slides you mentioned that it would be possible, for example, to spoof numbers and to spoof SMS. So how realistic do you see the threat that someone spoofs the number of someone else and intentionally sends malicious links, so that the person gets locked out of the system and cannot make any calls or send any SMS anymore?

So you are saying that we, for example, block a bad guy, but the one affected would be the right guy,
right? Yes. Actually, apart from the information about the sender, we have more information, because we have the routing information: from which network element and from which network path the sender sends the SMS.

OK, so if, for example, it comes from a network provider in Africa and the real person would be located in Luxembourg, this would also be taken into consideration?

That's correct, yes, exactly. Thanks for your question. Any other questions?

Hello. Firstly, I would like to thank you for the well-presented material; it was great. My question is related, actually;
there are two questions, I think, and they are somehow related. In the first part of your presentation you mentioned one word with Japanese roots. I am not sure about the pronunciation of this word, for the drop calls, where you call back. This means, actually, drop calls?

Yes. Do you speak Japanese?

No, I'm from Bulgaria, actually. It is just interesting for me where the root of this word comes from.

Yes, the first one. I don't know who picked this word, but I guess it means, in Japanese, drop calls: you call and you drop. How it is exactly
pronounced: "Wangiri". I'm not Japanese, by the way, so don't trust me.

Maybe my second question will be much more interesting, because it is something that I am observing on my phone, and I would just like to consult you. I have received some kinds of drop calls, but they were actually from my known contacts, for example my mother or my brother, who are in my phone. Is it possible that this is actually some premium number that somehow hides itself and presents itself as somebody that you know?

Usually not, because the phone is smart
enough: when it receives a call, it checks the contact list, so what it shows comes from the local contact list, not from anything supplied by the network provider. So it was probably just a call you missed.

Actually, in the end I always contact the people back and ask what it was, whether it was some type of error, and it was an actual call, so it's OK.

Yes, and you do not call back random numbers, of course; you have to verify, to be sure.

OK, thank you.

Sorry, I have one question. There are of course limits to your analytics system. For example, I can show you, maybe
afterwards: I have an SMS. In Bulgaria, you know, we use Cyrillic letters, and I have here an example of SMS spam. Both of us received this spam SMS, and it does not use Cyrillic; it uses the Latin alphabet to represent the Cyrillic. So you cannot possibly have any chance of analyzing this language, because this language does not exist.

Yes. I will show you the example. It is really bad, because I cropped the picture, but actually we have something on this side that we call feedback. We have a feedback loop in which the user can say whether it is a false positive or it is not a false positive,
and the user can also upload some of their own data. For example, you send me a list of SMS, I collect them, and then I upload them to the system and label which is which. In this way we improve the model, and maybe one day we can detect those. Yes, a good example. Any other questions?

Will this work be in the public space? I mean, do you share your findings with other telco companies, or will it be only internal, for your own business needs?

We are in contact with other telco providers, because they are really interested in this, and we are kind of packaging a solution that we sell to
them. And it is not only this; this is one single model, but like I mentioned, we have really a lot of threats, and the tool that we have detects a lot of those.

OK, I do have one question. Earlier you said that you don't inspect the content behind the URL, right? So how do you deal with URL shorteners?

The shorteners, yes, that is quite tricky. That is why we combine many things together: not only the URL, but also the content and the language. On top of that, and I think that is also a good observation, we consider the senders: how many recipients they
send to and how similar the recipients are, because usually the fraudsters do some kind of enumeration of the recipients; they don't care. So we observe a lot of similarity among the recipients, and we take all of that into account. And the shortener is quite a good example, because most of them use random words, so by default the machine learning model detects those URLs.

So this also means that if I send some phishing to, let's say, a restricted number of persons, there is likely no chance that this system detects it, right? If I just send a shortened URL to, I don't know, not too many people, or
few people?

Yes, if you have targeted attacks; yes, that's a good one. But then we can say those are, let's say, not so impactful for the public.

And my last question, but I guess it is clear; just to have an idea. Your system in the first place always delivers the SMS to the person, right? And it is only after some people have already received the phishing that you can actually block the number, or do your, let's say, reaction to this?

Currently we wait even longer, because we work with, let's say, the banks or the retailers: if we have the authorization, then we can take
some action; otherwise we wait. What's more, we proactively inform them by email or by some channel that we have, and then we wait for their confirmation or action.

Sorry, I'm asking a lot of questions; this is the last one. It also means that you are, let's say, aggregating the SMS sent over a time window. So if I somehow identify that your time window is, I don't know, 30 seconds, and I send one SMS every 30 seconds, do I go through as well?

OK, right. To be fair, I thought about it, and we can actually have multiple models running at the same time. So one
can be very fast but deals with the noisy guys, and the other one can be slower, but then we can deal with these slower attacks.

Good. I have one more question, regarding when you detect messages and see a huge spam campaign, because you mentioned the public. What we would do at our company is inform our employees that there is currently a spam campaign going on and what it looks like. Would you also want to replace the spam message with a message from the provider, or maybe send a warning message to customers that there is currently a spam campaign and that they were targeted by it?

So usually,
what we do currently, for a customer with whom we don't have an agreement, is contact the security officer or some contact point in the security team and notify them about the incident. But of course, in such an email we don't mention any details at all. Only when they say, "OK, this is interesting, it is relevant for us," do we take action, and then we can share more information.

So I think it's time for the last talk. Thank you for your presentation.
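
As a rough illustration of the URL features described in the talk (known or unknown top-level domain, repeated characters, randomness, digit usage, special characters), a minimal Python sketch might look like the following. It is not the system presented: the function names, the tiny `KNOWN_TLDS` set, and the use of Shannon entropy as a randomness measure are all assumptions made for illustration, and the talk mentions more than 30 features where this computes five.

```python
import math
from collections import Counter
from urllib.parse import urlsplit

# Assumption: a tiny illustrative allow-list; a real system would use a full TLD list.
KNOWN_TLDS = {"com", "org", "net", "lu", "fr", "de", "eu"}


def longest_run(text: str) -> int:
    """Length of the longest run of one repeated character ('gooogle' -> 3)."""
    best = run = 0
    prev = None
    for ch in text:
        run = run + 1 if ch == prev else 1
        best = max(best, run)
        prev = ch
    return best


def shannon_entropy(text: str) -> float:
    """Character entropy in bits; higher values suggest random-looking strings."""
    if not text:
        return 0.0
    counts = Counter(text)
    return -sum(n / len(text) * math.log2(n / len(text)) for n in counts.values())


def url_features(url: str) -> dict:
    """Compute a few lexical features from the URL string only (no page fetch)."""
    # URLs inside SMS often lack a scheme, so add one before parsing.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    # Ignore a leading "www." so it does not count as character repetition.
    name = host[4:] if host.startswith("www.") else host
    tld = name.rsplit(".", 1)[-1] if "." in name else ""
    full = name + parts.path + parts.query
    return {
        "known_tld": tld in KNOWN_TLDS,
        "longest_char_run": longest_run(name),
        "host_entropy": shannon_entropy(name),
        "digit_ratio": sum(c.isdigit() for c in full) / max(len(full), 1),
        "special_chars": sum(not c.isalnum() and c not in "./" for c in full),
    }


if __name__ == "__main__":
    print(url_features("http://gooogle-login.xyz/a1b2?x=9"))
    print(url_features("www.post.lu/"))
```

Feature dictionaries like these would then be fed to a classifier trained on labeled good and bad URLs; because only the URL string is inspected, the link is never opened, which matches the constraint mentioned in the talk about one-time links.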