
Attack of the Graph: Visual Tools for Cyber Analysis

BSides Vancouver · 2021 · 20:12 · 1.3K views · Published 2021-06
About this talk
Visual network analysis using graph theory is a powerful method for identifying cyber threats and building situational awareness. Christian Miles covers four key graph applications in cybersecurity: dependency analysis, threat intelligence, asset relationships, and user behavior analytics. The talk explores challenges in network visualization, available tools and databases, and practical techniques for applying interactive graph analysis to detect and understand threats.
Show original YouTube description
BSides Vancouver 2021

If you want to understand cybersecurity risks it pays to understand the various graph models that underpin systems and processes in use today. Visual analysis of these models using powerful graph theory is a vital tool to build situational awareness. Complex network analysis is used with machine learning to recognise threats and highlight risks. In this vendor-agnostic talk, Christian will focus on four key graph types:

- Dependency analysis: Tracing the downstream vulnerabilities for a process of interest
- Threat intelligence: Connecting disparate data points through knowledge graphs of related observations
- Asset relationships: Links between devices and the access rights (and wrongs!) of their users
- User and Entity Behaviour Analytics: user and device logs, the actions they take and the patterns formed

The visual presentation will cover analysis techniques used to identify threats for each of the models along with technologies used to communicate their scope and severity.
Show transcript [en]

hi i'm christian and i'm excited to be introducing the topic of visual tools for cyber analysis before i get started i'd like to thank the conference organizers organizing an online conference has its own challenges but i'm grateful to be in the company of some great speakers and sessions you probably know that this is being pre-recorded so i should hopefully be in the chat today to answer any questions as we go along and i'll also be posting some links to some of the content as you see it when used with care data visualization is an extremely powerful method of communication it allows us to take complex data and complicated scenarios and make them easier to understand

a good data visualization will illuminate inspire and inform my data visualization speciality is that of network graphs and visual network analytics this is a particularly interesting approach to data visualization as it helps us understand a world full of connections networks are all around us transactions communications family trees social networks circuits maps everywhere you look you'll see networks unfortunately the first result when building your own visualization of a network is often something that looks a little bit like this this is sometimes referred to as a hairball and it doesn't tell us much about the underlying data the only thing i can really determine is that everything is pretty much connected to everything else there's a wide range of colors used that

probably mean something but unfortunately the gray links are occluding the nodes which makes it harder to see what's going on here's a slightly better example this is one i made showing the links found in popular newsletters in the last half of 2020 we can see some structure in this network for example some of the large clusters of pink nodes are where certain newsletters have linked to one article that no one else has linked to so why are these visuals so common i see them in blog posts marketing materials and magazine covers one answer is that they're just cool using gradients and fancy color schemes and animations is a surefire way to turn heads and get

attention it can be a way to get people interested in your product and to attempt to show that you have a strong understanding of a domain the other answer for why these hairballs are so common is that it's simple to create something complex today i'm going to walk you through how we can move on from trying to impress people with pretty pictures to actually leveraging them to fight cyber threats across a number of use cases in particular i'll be advocating for the use of interactive network graphs tailored for use by specialists with a specific goal in mind here's a quick breakdown of the talk i'll give a summary of what i mean by visual analytics

i'll walk through the challenges of it in a cyber context before finishing with some examples of their use with some pitfalls and tips along the way it's a wide topic so i just want to whet your appetite today you can find sources and links on my site the url is in the bottom left-hand side of this slide

i focus on visual network analysis which is the use of visual representations of graphs augmented with other data to detect and understand cyber threats visualizations like this can sometimes be considered to be a bit of a distraction they sometimes get a bad rap i said it's simple to create something complex it's hard to go to that next step and make network graphs robust and useful in a reproducible way the important part of this definition is connected data but before i give more examples of what this can be i want to take you back to 2005. perhaps a few of you in the audience are familiar with the 2005 death cab for cutie album plans and one of the hits there are different

names for the same thing network graphs are a bit like that people use many different ways to refer to them in the field i've heard all these words to describe graphs networks maps charts spider webs which of these is correct well when it comes to network visualization i say they're all fine some may argue with me on the subtle differences between them but for our purposes today all that matters is that we're working with data that's connected in some way while i'm at it i'd like to clarify a couple of confusions in the space as well the less catchy 2021 graph b side for the death cab song could be same names for different things the word

graph and related topics are a touch overloaded take graphql there was a talk on this yesterday confusingly graphql isn't a query language for graphs nor is it a graph database itself graphql is a query language to build apis with its own ecosystem of api tooling it has graph in the name due to the nested nature of the queries the other confusion is knowledge graphs it's hard to pin down an exact definition of what a knowledge graph is the situation was muddied somewhat by google referring to their algorithm for organizing all human data as a de facto knowledge graph that's what you can see on the right hand side here this is an info box for

when you search for something on google in this case i'm actually surfacing the knowledge graph itself as the knowledge graph confusingly many people call what i'm showing today knowledge graphs but some people would argue that's not strictly true but for us today we don't really care about that we just care about showing where entities are connected there isn't much more to it than that so at a fundamental level this is what we mean by connected data here we can just about see three nodes connected via two links some people call nodes entities or vertices it doesn't really matter and in our domain these are likely to be hosts devices users or processes links can also be

called connections or relationships or edges and these could be say access log entries communications or even transactions once these connections are surfaced in a data set we get a bigger picture that can be a little bit more useful this is an example of a source code repository analysis it's actually a tree of a file system where larger nodes correspond to files with more lines of code there are many different ways to organize and visualize graphs i'll be mostly focusing on force directed graphs today but it's worth knowing there are other methods at our disposal with their own benefits and downsides on the bottom left we sometimes refer to this as a chord diagram and it's good

for showing pairwise and categorical comparisons meanwhile on the bottom right we have a matrix where each cell is colored if there's a connection between the row and the column these are arguably more readable than network graphs and commonly used in source code analysis and reverse engineering all the graphs on this page are a reasonable size for surfacing for end users in a visual environment a common method is to restrict the size of our network based on some sort of boundary so that we can show off the size and category of the information without overwhelming and to aid the analysis of the network itself i've alluded to a number of challenges in this space i'd like to summarize

seven of these today these challenges are faced by threat intelligence and soc analysts pen testers and security administrators looking to understand and defend against cyber threats they're also common if you're doing a quick one-off analysis project with your programming language of choice the first four challenges could be grouped into a bucket of data problems you can remember them with vvq lol just kidding i just thought that was funny to start we have the two v's of big data challenges we've been told for years that data is big nowadays i'm sure many of you in the audience work with large data sets on a daily basis and it's tough most tools will attempt to aggregate or summarize and it's hard

for analysts to trust that important data facets haven't been polished out to create a simple representation also as data volumes get larger it's anticipated that interaction feedback gets more latency this is important if we're aiming to use tools that aid people's flow rather than getting in the way the next v is variety we could be pulling data from various sources each with their own data dictionary and storage how do you pick the sources that are most important the promise of siems is to assist with this but that's still an unsolved problem quality whether it's log size limits truncating data invalid data or data corrupted in transit data quality is an eternal challenge garbage in garbage out it's common to

have bias towards more recent data sources which makes it ironic that big historical data doesn't necessarily imply big confidence finally lol is lack of links with network analysis we're always looking for connections to tie data sources together keys can be difficult but there's a trick i'll come back to later when discussing real sources visual analytics solutions are often described as having ai or artificial intelligence and to be cynical for a second it's often difficult to see through the hype and the marketing that's not to say there isn't fantastic ai research being done it's just that it gets diluted by the time it gets to industry usually because analysts reject algorithms telling them that something is suspicious by some opaque logic

cadence is the challenge of providing a tool to augment an analyst's understanding of a typical problem and threat in the network and our tools should help us identify user and system processes and aid this understanding not distract from it i like to call this ai augmented intelligence not artificial threat triage is the cycle of detect investigate resolve but taking a visual approach to this is fraught with pitfalls around collaboration creation of deliverables or consumable products the final challenge is that of risk and reward we weigh up the cost of bringing additional data points into our visualization to aid with the analysis against the potential benefit of the additional context explore versus exploit this isn't always

easy so there we have the challenges it's a lot unfortunately there's no free lunch visual network analysis will assist but it's far from a silver bullet nevertheless let's look at the tools available that help us hit these challenges head on it's hard to draw up a summary of the space as you could argue on the inclusion of databases programming libraries vendor platforms i've given it a go anyway and tried to plot things out on an axis we have diy from the bottom to the top to platforms and then from left to right we have permissive over to proprietary now in the bottom left we mostly have languages and libraries to build applications from scratch or work on a small scale problem gephi

is a very popular open source gui application that allows you to build networks from scratch whereas d3 is quite a low level javascript library for the browser on the bottom right we have more commercial projects and there are a number of toolkit vendors in this space if you're not looking to use open source i actually work for one of them cambridge intelligence and i help developers build interactive browser-based visual network analysis applications in javascript a little bit above that the solution vendors box is also a little vague i actually work with a number of vendors so i didn't want to call out any in particular but i made an exception for some companies offering interesting platforms

finally on the right i have two thick client applications maltego and analyst's notebook these are desktop applications with wide use across the industry and elsewhere there's a clear split between thick client applications that need to be installed and run on a server or a computer versus browser-based applications and suffice to say the browser is much more forward-thinking at this point when building scalable visual network analysis applications graph databases are databases that are well suited to inherently connected data a lot of people assume this means that a graph database is the only thing that can be used to store graphs this isn't true a purposely indexed postgres database is a perfectly reasonable backend one which you likely already know how to

query but if you're starting a greenfield project with connected data you could consider using a graph database just to highlight a few on this slide at the top we have neo4j neo4j is by far the most popular graph database and has a really good learning curve for new developers arangodb is a good alternative it's a multi-model database so a graph model is just one piece of what it offers it also has a document store and a key value store and over on the far right we have azure cosmos and aws neptune these are hosted solutions that may appeal if you're in those environments for each of these cyber applications i'll be answering three key questions

what is it what's the risk and how can visual analytics help you'll see that the three examples do overlap this isn't a prescriptive thing techniques used in each can help with the others so to start things off i'm going to look at dependency analysis software and processes often have a vast tree of dependencies often abstracted away into a directory or package manager take react the popular front-end framework it has a popular bootstrapping tool create react app and here's a dependency tree of all the packages pulled in when you first initialize a project this was drawn with an application called graphviz or a library called graphviz it's actually been around for over 30 years which is fantastic

and it does a really good job at laying out complex trees like this the downside of this library is that it creates very static visuals it's not really very modern in terms of how i could interact with it in the browser the power of visual network analysis is enabled through interaction with the browser so i would want to be doing some sort of zooming or selection in order to highlight more information on this network so here's an example of this in action i'm able to zoom in and look at something in more detail and by clicking on a link or node i can see the direct neighbors of it for more information which is really nice to take this one

step further here's an example application from academia i think this is a really amazing way that you could interact with a tree like this and you'll see here that i can pan around and i can select nodes and i can bring the neighbors to me instead of forcing my user to drag around the network and explore it on their own if you see here we'll also select some nodes and we're able to select the neighbor and then we'll fly over to where that neighbor is in the tree so you have this sort of mental model of the network that you can explore and understand where you are at any time so now when i click free

i'm taken over to there and i can sort of traverse the network on demand i'd love to see this in a modern cyber analysis tool and it gives you a glimpse of the sort of functionality that could be included in the future going perhaps a little too far there's a lot of interest in the use of 3d with large graph data sets this is a brilliant browser app where one can fly through a galaxy of dependencies in the node package manager ecosystem it's quite hard to imagine this actually being useful in a cyber context but perhaps with the rising popularity of vr and headsets it could be the cyber defense app of the future oh and here's an obligatory gif of

minority report you can't mention futuristic user interfaces without name checking this movie the next use case i'll look at is asset relationship maps visio is the de facto tool for building network diagrams manually that's what's used on the left the right is a modern alternative that's a web app the problem with this approach is that by the time you've drawn your intricate map of connections it's likely to be already out of date it's also not reasonable to expect a devops engineer to draw hundreds thousands or more devices manually and with the popularity of virtualization and orchestration technology like kubernetes the complexity of these environments has ballooned these maps are important for threat model assessments and overlaying

multivariate details like access rights can provide critical context on attack paths that could be leveraged here's an example of a dynamically generated network of assets connected from somebody's dorm room at school now this is starting to look a little bit like a hairball again but using the interactivity techniques i highlighted in the last section we can explore and understand this map in more detail and maybe find something off or a vulnerability we often rely on layout algorithms to do a good job at laying things out on the screen here's an example of one that's been run for a small network here we take advantage of physics simulations to pick apart complicated networks and display them in an appealing and

insightful way these physics simulations pretend that the links between nodes are actually springs while the nodes themselves are weights after initialization these will hopefully settle down now you are able to typically tune these parameters for these physics simulations and here's an example of me doing it live so now i'm actually tweaking the charge or momentum of the initialization resulting in a continuously updating graph of course you'd never put this in front of an end user it's a pretty extreme example but it isn't unusual to see overly active graphs these are confusing distracting and above all unhelpful there's nothing wrong with an interactive but static graph just because you're using physics doesn't mean you have to show it
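The spring-and-weight idea described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the algorithm used by any tool shown in the talk: the function name `force_layout` and every parameter (`spring_strength`, `charge`, `damping`, `max_step`) are assumptions chosen for readability. Links pull their endpoints toward a rest length, every pair of nodes pushes apart, and damping lets the system settle.

```python
import math
import random


def force_layout(nodes, links, iterations=300, spring_length=1.0,
                 spring_strength=0.1, charge=0.5, damping=0.85,
                 max_step=0.5, seed=42):
    """Return {node: (x, y)} after a simple spring-and-charge simulation.

    `nodes` is a list of hashable ids, `links` a list of (source, target) pairs.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed keeps the layout reproducible
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    vel = {n: [0.0, 0.0] for n in nodes}

    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}

        # Every pair of nodes repels, like same-signed charges.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 0.01)  # avoid division by ~0
                push = charge / (dist * dist)
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist

        # Each link acts as a spring pulling its endpoints to spring_length.
        for a, b in links:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = max(math.hypot(dx, dy), 0.01)
            pull = spring_strength * (dist - spring_length)
            force[a][0] += pull * dx / dist
            force[a][1] += pull * dy / dist
            force[b][0] -= pull * dx / dist
            force[b][1] -= pull * dy / dist

        # Damped velocity update; cap the step so nodes cannot fly off.
        for n in nodes:
            vx = (vel[n][0] + force[n][0]) * damping
            vy = (vel[n][1] + force[n][1]) * damping
            speed = math.hypot(vx, vy)
            if speed > max_step:
                vx, vy = vx * max_step / speed, vy * max_step / speed
            vel[n] = [vx, vy]
            pos[n][0] += vx
            pos[n][1] += vy

    return {n: (p[0], p[1]) for n, p in pos.items()}


# Hypothetical four-host chain: a - b - c - d
layout = force_layout(["a", "b", "c", "d"],
                      [("a", "b"), ("b", "c"), ("c", "d")])
for node, (x, y) in layout.items():
    print(f"{node}: ({x:.2f}, {y:.2f})")
```

Running the simulation to completion and drawing only the final positions gives the "interactive but static" graph the talk recommends; redrawing on every iteration produces the continuously moving view.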

for our asset relationship maps the use of hierarchical layouts can highlight the flow of data and information through a network the final use case i want to introduce today is that of user and entity behavior analytics this is where we blend access logs with other data to understand user behaviors and spot anomalous activity taking a network of hosts users devices with the intent of understanding risks i like to think of this at three different levels firstly we have entity level concerns this is some property of the entity or user that's anomalous in itself maybe it's an alert of a device with some sort of uptime or similar typically these are facts that we're interested in knowing if they just exist

at all at the entity level or on the node level next we have connections these can be the steps users take or data transfers that take place we take one step out of the network and look at the links themselves it's common to use threat intelligence sources like malicious ips or logins from a surprising country and these are things that could flag at the connections level of the ueba now remember the challenge of lack of links sometimes we find that keys aren't possible to be matched against when we're using ip addresses or other similar data sources it can be hard to have these common keys to link disparate data sources now one approach you could take is to

use the trick that everything really has a time stamp so there's nothing stopping you from overlaying quite disparate data sources using the same time series analysis so here's an example of a window of time in a timeline at the bottom along with a topological graph view of a data set at the top using this one could spot and drill down into activities that look suspicious and using things like tool tips we can get details on demand of facets that we want to look at in more detail in ueba we use machine learning models of normal activity to build and serve as a behavioral baseline in our visual network analysis we can take this one step further and we can

look at this at the network level namely by monitoring network velocity this means looking at how quickly connections are built up over time there's a rich field of graph and network analysis that underpins our work in this space and i'd like to finish up today by describing one small piece i know it probably seems like years ago but do you remember when that big boat blocked the suez canal here you can see other container ships making a long trek around the horn of africa due to the blockage one way to think of this is that it's a massive graph problem we have ships around the world looking to find the shortest path between two points

and invariably this results in the use of the suez canal in the graph world we could say that the canal has a very high betweenness centrality this is just a fancy way of saying we look at the structure of the network to determine nodes of particular importance now in a dependency tree asset map or a ueba this can be helpful to identify entities that could be the target for an attack so that's it for today i've covered three applications of visual analytics in a cyber context and given a flavor of the problems and solutions in this space if you enjoyed this talk today i have a newsletter on graph visualization and related topics called source target

you can sign up at sourcetarget.email and here's my contact details otherwise i'm happy to take any questions now and i'll see you around on the chat
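The betweenness centrality idea from the Suez Canal example can be sketched as follows. This is a minimal sketch using Brandes' algorithm for unweighted, undirected graphs, written with the standard library only; the helper names and the toy route map are assumptions made up for illustration, not data from the talk. A node scores highly when many shortest paths between other nodes pass through it.

```python
from collections import deque


def build_adjacency(edges):
    """Turn undirected (a, b) pairs into {node: [neighbours]}."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def betweenness_centrality(adjacency):
    """Brandes' algorithm: unnormalised betweenness for an undirected graph."""
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        visit_order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        path_count[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)

        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = {node: 0.0 for node in adjacency}
        while visit_order:
            w = visit_order.pop()
            for v in predecessors[w]:
                share = path_count[v] / path_count[w]
                dependency[v] += share * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]

    # Each unordered pair was counted once from each end, so halve the totals.
    return {node: score / 2 for node, score in centrality.items()}


# Hypothetical route map: two regions joined only through one canal node.
toy_routes = [
    ("rotterdam", "genoa"), ("mumbai", "singapore"),
    ("rotterdam", "suez"), ("genoa", "suez"),
    ("suez", "mumbai"), ("suez", "singapore"),
]
scores = betweenness_centrality(build_adjacency(toy_routes))
print(max(scores, key=scores.get))  # prints "suez"
```

In a dependency tree or asset map the same calculation highlights the packages or hosts that most paths run through, which are natural candidates for closer review.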