
Well, thank you very much for taking the time to come and see us today. We're going to talk about profiling VIP access patterns in user-centric data streams. The TL;DR is basically trying to find out whether a VIP account has been compromised. What a VIP account is can vary depending on the context, and at times this is very difficult. If you're able, for example, to do a lateral-movement type of attack like pass-the-hash or pass-the-ticket and impersonate a VIP account, it's almost impossible to tell if you follow the patterns of behavior of an administrator, a technique that you can use post-exploitation. That's what we're going to talk about today: how we're
trying to approach this problem. So who are we? My name is Rod Soto, and I've been in the industry for a little bit. I run a hackerspace here in Silicon Valley called Pacific Hackers; we also have a conference in November. I'm here with Joe today, who is my partner in crime. We've been working together for over four years now, since we started working together at Caspida. Caspida was a UBA startup that was acquired by Splunk; then we were at Splunk for two years, then we moved on to JASK, and now he's at Corelight. So, like I mentioned previously, the goal of this talk, the bottom line, is: can we
tell if a VIP account, whether it's an admin, a SQL SA, a root account, or a super admin (I've seen that in organizations; I have a very long tenure as a Windows engineer and I've seen a lot of admins of admins and super admins), can we tell if these accounts either have been compromised or are behaving in a manner that is dangerous? So we will also look a little bit at the malicious insider. So how is a VIP defined? The VIP is defined as the account that has the most privilege within a system or application. We try to include applications as well, because we try to make the definition a little more broad than just simply
root or system administrator, because when you go into certain environments there are levels and sublevels of system administrators or superusers. Usually some of the privileges that these accounts have are that they can create accounts, change user passwords, write or change log files, install software, add themselves to specific groups, create groups, and take over specific files. These accounts are needed because within an organization you don't want to have a flat structure where everybody has the same privileges when it comes to carrying out administrative tasks, creating users, or resetting passwords. So there's a sort of hierarchy where basically you have to, if
we follow the principle of least privilege, you have to segregate duties and assign each user privileges according to their tasks, and obviously you don't want to assign any user a task that would allow them to take over. That happens sometimes with administrator accounts; it's been seen in the past. I was part of a team that was protecting an organization, you've probably heard of it if you've been in the industry long enough, where the system administrator had all the passwords in an Excel sheet. This individual was targeted, he was phished, and it turned out to be a nation-state, and that nation-state basically took over the organization,
took over content, and what ensued afterwards was basically very stressful and difficult to deal with. So what happens sometimes, depending on the organization and the way things are structured, is that you want to prevent somebody from taking over everything, and you want to distribute tasks, permissions, and privileges according to whatever the users or groups of users are doing. That's at least the 101 of system administration, and basically these are the things that allow you to keep your organization protected. I'll give you an example. I had a customer that was attacked by ransomware, and because I had set a GPO that basically said the user
cannot write outside of their documents, the ransomware came in and executed, but all that was encrypted was the My Documents folder. So even though I had a NAS where the backup was written to, they couldn't write to the NAS. Even though I had software shares where they could pull software, or pull for example payroll, they couldn't get into them, because in the end they simply didn't have rights. And they couldn't pass the hash, because I do a little trick, which is to create an account just to join computers to the domain; then I disable it and get rid of it. So usually when you do that you have
to sort of remember what the password was for the Administrator 500 account; if you forget it, then you're screwed and you have to reset. But usually that's one of the things: in case somebody gets popped, the attacker can retrieve the hash, and then obviously you can be sure they'll pass the hash and access everything. And if they have something like EternalBlue, for example, which was one of the things that WannaCry was using, then they can encrypt everything. So with this example, what I'm trying to explain is the relevance of the isolation of privileges and the power that these accounts have. I think I've already been through this. The analogous account on
UNIX or Linux systems is usually the root account. So usually we look at either a system administrator, NT SYSTEM, or root. To put it a little bit more into the context of how these are defined, contextually and conceptually: usually what is applied is called role-based access control, or RBAC. RBAC basically allows you to create organizational units or groups where you can then apply rights and privileges that supersede those of the individual inside that group or organizational unit. It's very common in Active Directory environments. Active Directory is basically the operational application of RBAC, and it's usually done through a protocol called LDAP. LDAP can be used in
Linux environments and Windows environments, and our investigation mostly had data from Active Directory. So here's an overview of how the permissions are usually distributed. You have the directory service; the service basically checks the schema. The schema is this hierarchical information about who is who, what they do, and what they can and cannot do. It usually resides on the domain controller, and from there, using a diverse number of technologies, it gets distributed to the workstations, to the servers, and ultimately to the applications. Here's an example; many of you are probably very familiar with this. This is how you manage Active Directory: you have the users, the users can be moved into OUs, you can create groups, and you can create different types of special groups if you want. In this case, basically what Active Directory allows you to do is get very granular on who does what and where they can do it. Like I was saying before, this can get complicated, and we want to say this as a caveat, because this is an initial approach to this research: there are environments where there are super admins, there are OU admins, there are hidden OUs, there are service accounts. Within my career as a systems engineer, many times I was installing products where we had to test the connection to LDAP, and one administrator would see a
specific OU and the other one wouldn't, come to find out, and he didn't even know it, I guess because it was on a need-to-know basis. I was working with a company that was developing military helicopters, so there were a lot of OUs and parts of the schema that were not seen by everybody and were seen only by specific individuals. Obviously this makes the research a little more complicated and difficult. We tried to approach this research in the context of specific, reduced OU data, where we knew at least, based on the schema and on the description of these users, which accounts were administrators. And if you even go into
cross-domain trusts or forests, it gets even more difficult. So once we basically agreed on what the VIP accounts are, what are the activities or behaviors that VIP accounts have? We were able to break it down by certain tasks. In this case we looked at backups, password resets, creation of accounts, and software installs. The reason we did this is that we needed to see whether we can tell what they're doing: if we can establish a benchmark of what they do, can we tell if they have been compromised, either by an external actor or by a rogue insider? So it looks
like I've got ten minutes, right? Yeah. So that means we're going to have to speed it up a little bit. So here are some of the quick reasons why you target VIP accounts. You see here one tool called BloodHound that tells you almost like the degrees of separation that you have to get to domain admin, and obviously if you get domain admin you own pretty much everything. Here's a tool called Mimikatz, you're probably very familiar with this, where you can do things such as pass-the-hash and pass-the-ticket. It's a little more difficult nowadays, but it still can be done, mostly if your forest level is based on legacy 2003, 2012, or 2008 rather. So
these accounts are constantly targeted, and lately what most of us have seen is phishing and spear phishing, because, like I said before, I've been in situations where once the administrator was compromised, basically the organization was done. And there have been other cases of rogue insiders, where the administrator, or a specific administrator, leaves, takes the passwords or resets the passwords of others, and then locks things up or disables them to exfiltrate information. So here's the problem. Now Joe will take it from here, and he's going to talk about what we did with Kafka and how we approached this problem.

All right, thanks. All right, can you guys hear me okay? So yeah, now we're just gonna
kind of dive in, for the last few minutes here, into some of the technical implementation we were doing to prototype out this intuition that there are important users in enterprise data sets, and that there should be a way to fingerprint them and maybe assess automatically some notion of the risk that the account represents in the overall user hierarchy. So how we did this problem, from an implementation perspective, is based on the context I was originally exposed to it in, when I was still helping with professional services and
engagements, so you can think of this as a cookie-cutter pattern we used at a few large customers doing consulting work. The reason it's a fun story to tell, and to extrapolate some of the ideas from, is that this workflow we're talking about led to creating two prosecutable cases against insiders, people stealing data and stuff like that, in a really large bank card network, or, I don't know how to say it without giving it away, basically a really big financial organization. One of the interesting problems we had to think about this time, for modeling
important users in LDAP data, was how to do the anomaly detection, or the scoring of the user behavior, in a real-time framework. In our implementation we were using the Kafka Streams API, basically spinning up a little distributed app that can run some of the scoring metrics I'll show you at the end here. So what happens with this algorithm is that there are two steps. There's a ranking step, which is about deriving information from whatever data sets we are curating for the problem of identifying specific
user types; typically for us this is Active Directory in different forms. And how we derive a behavioral fingerprint for the user activity is by leveraging a lot of Kerberos in this case. Kerberos is something we spend a lot of time on, cherry-picking little pieces of that data into an analytic schema, and also the Kerberos ticket-granting data, which I'll talk about a little bit. But at the end of the day we're inferring this social graph, which is a relationship of inherited permissions and trusts and how they can access certain resources. And so, because there's a graph implicit
in the social structure of these user accounts and what their functions are, the first step in this problem was to create a nice scalable graph data structure to compute some classic graph metrics. So what we have here is the algorithm that we implemented, broken down into steps. Basically the point is to do some intuitive processing of a social network, like a website, or in this case it could be thought of as something like Facebook, where there's a bunch of users connected. They may be connected
to each other, or they may be linked by devices or something else other than another user account, but because it's a graph we can leverage some best practices and assign a couple of metrics to the ranking of the users in this large collection of related entities. The graph gives us a couple of different kinds of metrics that are important, and under the hood, that's where we had to spend some time building up our heuristic. So we're using PageRank and a couple of other
hand-built features to describe, for each user, the relative rank in the graph. That's the key intuition for the first step. Then from there we basically take a time-series point of view for each user account that we find important. I think maybe I forgot to say this, but the reason we ended up with these first two steps is that we were really thinking about what it means to be a high-risk individual in an organization, in terms of their potential to elevate trust or target different keys to
the kingdom. Walking an LDAP tree is a perfect example of an attack technique we'd see a lot when a campaign was successful and the target had all its user accounts exfiltrated. So we're just going after these accounts that are able to do a lot of high-profile permission changes. And in this metric we have an intuition for reducing false positives. It's not really said here explicitly, but about step one and step two: the first time I implemented this algorithm, we weren't doing any sort of
ranking of important individuals, and what happened was, when we get to the actual meat of this algorithm, which is this little graph here: for every user in the environment you get this graph. It's like a heat map, but really what it is is a nice simple picture that's going to be represented by a single number when we hit it with a special algorithm. This single picture of user activity is something we can maintain in a nice scalable, distributed fashion, so it's a really lightweight way to hack together a baseline of internal access
patterns. This is more like east-west behavior, where typically some of the stuff we'd be leveraging is Kerberos authentications and tickets being granted from Windows resources. This gives us visibility into those access patterns, where oftentimes in a large organization, if the traffic isn't routed through borders, north-south, we would lose visibility at scale. So this was a way to get around that and see some insights in the LAN. And then the modeling of the VIPs is a fancy way to say we've got to figure out ways to reduce
false positives that are clever. The issue we had when we ran this algorithm at some Fortune 10s was the time to validate: for an operator to basically escalate a ticket around this and say, oh, we might have the admin account compromised, we've got to go pull all the servers and do the triage associated with something like that, which is specific just to the LDAP and Windows environment and also the network-centric behavior in the LAN. It would take on average, for the operators in these large organizations, like 24/7 SOCs, I think it was 20 hours of working
time to open and close a case around this. So it has its drawbacks, but if you're able to, you know, and I think I picked the wrong slide deck, correct? I mean, there's basically, I missed one picture here. This is an older version, but
there was basically one more picture. What happens with the scoring step, for every person's individual set of access patterns, is we hit this with an algorithm called PCA, or SVD. All it does is take a matrix; under the hood this is a matrix of counts, and the x-axis is just the units of time, usually days or weeks. What we end up doing is only scoring matrices, or heat maps, of users that we think have a high risk. This lets us target a very small subset of the population that poses a larger risk, and it also can make
the number of alerts manageable. If you don't do some of this clever filtering, you can run into a lot of just strange access patterns. But the key idea here is that after you're running our algorithm, you build up a picture of, usually, weeks of time. For an average user, like, I've done tests on my account when I was at EMC, and I'd have about maybe 50 rows; each row is a share or an asset that's being accessed. So when you look at what happened with the people who were prosecuted because of this, so two
admins basically were fired at a really large financial organization. The day they were fired, or maybe the last two days, basically their heat map history, here I had a picture, and I am sorry, I think I just screwed the slides up at the last minute, but basically we add one more day of information and you see intuitively, in the coloring of the access patterns, that there's a really noticeable deviation just with the naked eye. And the way we're able to hack together an algorithm that targets that intuition is that for every update to this
matrix we rerun something that gives us a single number; it's the principal eigenvalue of the matrix. With a new day of information, we compute that number again. If the two numbers are different by a threshold that we define with the organization itself, we basically have an anomaly, and that's where we go after the investigation side of things. So this was just a way to build a prototype around this problem with some interesting architectural constraints; mainly, the real-time constraint makes us choose a really lightweight representation of users' access patterns. And yeah, so I think it's looking like
we're out of time, right? Okay, do I have Q&A, or is that it?

Without any questions from the crowd, because there are none online, well, you guys have two more minutes if you want to tell us some more cool stuff about your algorithm.

I'm a little confused: which matrix do you take the SVD decomposition of? Is it for the time series, this matrix?

Yep. So each one of these bins with which we bucket time, this could be an hour, just some resolution we choose. For the real production environment we have maybe 40 or 50 days of individual data, and then we build this graph, which is a
single matrix. Each one of the row-column entries is basically just a count or normalized count, and when we get a new day of data, or we have enough information to update this matrix, we build another one, and we compute SVD on both of those, the one that was from the day before and the one that's current. Then we just take the first eigenvalue from the result of that computation to give us an approximate number that represents the current state.

And the latest column could reflect the fact that there's a big anomaly?

Yeah, exactly, that's the key intuition.

Well, thank you guys, it's
been a pleasure. Great job.
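
The scoring step described in the talk and the Q&A can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the speakers' production code (which ran on Kafka Streams): each user's heat map is taken to be a matrix of access counts with one row per asset and one column per day, the score is the largest singular value from an SVD of that matrix, and an anomaly is flagged when the score moves by more than a threshold. The function names, the fixed-length sliding window, and the threshold value are all hypothetical.

```python
import numpy as np


def leading_singular_value(counts: np.ndarray) -> float:
    """Return the largest singular value of an asset-by-day count matrix."""
    # compute_uv=False returns only the singular values, sorted descending.
    return float(np.linalg.svd(counts, compute_uv=False)[0])


def is_anomalous(history: np.ndarray, new_day: np.ndarray, threshold: float) -> bool:
    """Score the heat map before and after one new day and compare the scores.

    Assumption: the window has a fixed length, so the oldest day is dropped
    when the new day is appended. The talk does not say how the window is kept.
    """
    before = leading_singular_value(history)
    current = np.column_stack([history[:, 1:], new_day])
    after = leading_singular_value(current)
    # Flag the user when the single-number summary shifts by more than the
    # threshold (in the talk, the threshold is agreed with the organization).
    return abs(after - before) > threshold
```

For example, with a history of uniform counts, a new day that looks like every other day leaves the score unchanged, while a day with a burst of accesses to a few assets moves the score well past a small threshold. Only users that the ranking step marks as high risk would be scored this way.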