
Hello, I'm Robby Cochran, thanks so much for coming to my talk at the end of the day. This is my first time coming to BSides and it's been an amazing conference, so I hope you all had a great time too. The title of my talk is Listen to Your Engine: Unearthing Security Signals from the Modern Linux Kernel. Just a quick overview of my background: previously I did a PhD where I looked at applications of symbolic execution, and currently I'm working at StackRox doing container runtime security detection. This talk is going to be kind of a survey talk, an overview of what system calls are, why they're important for monitoring infrastructure,
what we can observe at the system call level during attacks, what features in the Linux kernel allow us to observe those system calls, and what the challenges and trade-offs are in this world. If you're a kernel developer, some of this may be a little high level, but if you're new to this then hopefully you'll come away with a flavor of these features. So first of all, a quick primer on what a system call is and why a system call is different from a function call. Traditionally, as you know, you have user space and kernel space.
Processes use system calls to communicate between these two worlds. To explain that, we actually have to talk a little bit about hardware. CPUs have a hardware feature called protection rings, and protection rings let you control access to particular resources. Ring 0 is the most privileged, and this is where the Linux kernel (and the VMM) lives; rings 1 and 2 aren't really used; and ring 3 is where user space lives. Ring 3 is least trusted, so instructions running while the CPU is in ring 3 have the least privileges: they can't modify page tables, and a user space process is restricted to accessing memory only in
its own address space. So a system call is really the transition from ring 3 to ring 0. A user space process needs a way to tell the kernel that it wants to access hardware or send some data on the network, and it uses a special instruction to do that. Traditionally this was done with interrupts; now there are faster ways to do it using CPU instructions like syscall and sysret. Once this instruction executes, the kernel handles the particular system call and then sends the data back to the user space process. Now, the reason I'm pointing this out in particular is that there's overhead associated with moving data from one ring to the other,
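As an aside, you can see this user-to-kernel transition from a high-level language too. Here's a minimal sketch (not from the talk) using Python's ctypes to call libc's syscall(2) wrapper with a raw system call number; the number 39 for getpid assumes x86-64 Linux:

```python
import ctypes
import os

# Load the C library; its syscall(2) wrapper is what ultimately executes
# the syscall instruction that crosses from ring 3 into ring 0.
libc = ctypes.CDLL(None, use_errno=True)

SYS_getpid = 39  # getpid's syscall number on x86-64 Linux (arch-specific)

# Invoke the raw system call and compare with the normal library wrapper:
# both end up trapping into the kernel the same way.
raw_pid = libc.syscall(SYS_getpid)
print(raw_pid == os.getpid())  # expect True on x86-64 Linux
```

Whether you go through os.getpid() or the raw number, the same instruction ends up crossing into ring 0, and that boundary is exactly what the tools in this talk hook into.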
and it also means that it's more challenging to actually monitor these system calls. Another overview topic I want to cover before we get into the meat of the talk is profiling and tracing, and how we can use these as security practitioners. Profiling is when we're doing diagnostics or statistical analysis of what our software is doing; it's the tool of choice to identify performance bottlenecks. You're looking at what the top function is, what the hotspot is. A profiler is going to tell you where the problem area of the code might be, but it's not
going to give you the exact details of how you got there. To do that, you often use tracing tools. Think of a tracing tool like a court stenographer: we're actually recording every function call that occurred. You can do tracing in user space as well as in kernel space; oftentimes you might want to diagnose a particular performance issue by looking at which kernel functions were executed. Tracing is often much lower level than logging: with an application log you can diagnose some problems, but with tracing you're looking at the function-level execution. And so we can combine and leverage profiling and tracing tools to do security monitoring.
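To make the stenographer analogy concrete, here's a toy user-space tracer (my own illustration, not a tool from the talk) using Python's sys.settrace hook; handle_request and parse_headers are made-up stand-ins for service code:

```python
import sys

calls = []  # the recorded trace: every function entry, in order

def tracer(frame, event, arg):
    # 'call' events fire on every Python function entry -- the
    # stenographer's full record, as opposed to a profiler's summary.
    if event == "call":
        calls.append(frame.f_code.co_name)
    return None  # no per-line tracing needed

def handle_request():        # hypothetical service code
    return parse_headers()

def parse_headers():
    return {"Content-Type": "text/plain"}

sys.settrace(tracer)
handle_request()
sys.settrace(None)

print(calls)  # the full call sequence, not just a hotspot summary
```

Kernel-side tracing with kprobes and tracepoints, which the rest of the talk covers, gives you this same kind of record but for kernel functions and system calls.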
We can't do this with these tools alone, we have to include them in our pipeline, but these tools can be leveraged for the collection of data. Of course, security monitoring is really the last line of defense in your infrastructure. You're going to want to have your systems patched and updated and configured properly, but ultimately, even if everything is done perfectly right, you can still have breaches, and by having monitoring infrastructure you're able to either detect those breaches or, once you determine there has been a breach, use some sort of auditing to say what exactly happened. If you have any kind of infrastructure that does any monitoring, you're familiar
with this pipeline: you collect logs, events, behaviors; you stream this data off-host, usually, so it's centralized; then you use the centralized data store to detect anomalies and to audit and investigate after a breach. What's really relevant to this talk is what you're going to collect and stream into your monitoring pipeline, and I want to talk about, if you're going to collect system call data, what that gives you, what you can learn at the system call level. One way to think about why system calls are useful: no matter what services you're running, what database you're running, what language you're using,
ultimately these tools are going to have to access the kernel and use a system call. So the system call level is a good place to introspect and actually identify the behavior of your services and apps. At the system call level you're going to get a really high-fidelity view of all system activity. This is going to let you know a lot of important things that are happening: process launches, file activity, network activity, hardware device access, information like whether a process is just trying to get the time of day; system calls are also used for inter-process communication. A caveat here: of course you
want to monitor all system calls, but there may be cases where some system calls have no legitimate use case on your infrastructure, and the best practice there is to just block them using tools like seccomp or seccomp-BPF. All right, so I hope everyone brought their steampunk goggles, because now is the time to put on your goggles: we're going to go into the lab. All right, get everything lined up here... actually, I'm not going to wear those, I can't see my slides. So this is a little experiment you can do at home, and I did this, so I can just walk you through what I did. Basically, I took a vulnerable version of
Apache Struts and ran it in a Docker container, and this version of Apache Struts has the remote code execution vulnerability that was used in the Equifax hack. So what we're going to do is load up Apache Struts, attach strace, which is a really useful tool that we can use to view all system calls, and then launch an attack against Apache Struts using Metasploit. If you want the instructions and more details on how to do this, the link here at the bottom points to a blog post that tells you how. All right, so first of all let's load up Struts. This is
kind of the default; we're not actually going to run any real applications, this is just the built-in app that Struts comes with. So we load up Struts, and this is the vulnerable version. Here is the actual bug in Struts in this version, where unfortunately we're executing data that was received in a Content-Type header. Then we're going to attach strace to this running version of Struts. strace is an awesome tool; if you've never used it before you should definitely check it out. It's a user-mode program which uses the ptrace system call; ptrace is a system call that allows you to attach one process to another and receive
events every time system calls occur. The one drawback to strace is that it adds extra context switches. There are different ways to run strace, you can run it in a blocking mode or non-blocking mode, but the key takeaway is that strace is great for diagnosing problems but not what you want to use for actual monitoring. Another thing I want to point out: if you ever want to read an amazing man page, check out the man page for strace. Whoever wrote it is a literary genius, it's amazing. Okay, so we've launched Struts, we've attached strace, and now we're
going to use Metasploit to run an attack against Struts. We're using just a reverse shell payload. Here's the actual payload that gets sent to Struts; you can see the shellcode highlighted in red. And then from strace we can observe all of the system calls. I'm not going to show you all 60,000 of the system calls that we saw in 30 seconds of running strace, but I'm representing them here in terms of their frequency counts. You can see that some system calls are much more frequent than others, and actually around 95 to 98 percent of these system calls you don't
need to observe; they aren't really security relevant. So let's trim down this giant log and look at the actual set of system calls where we can observe the attack occurring. Here we can see a sequence of system calls, and I'm going to highlight some of these: we're actually opening the payload that was dropped, we're setting it as executable, we execute the payload, that establishes a reverse shell, and then the attacker can read the secrets files. So, distilling this entire set of system calls down to four that are really useful from a security monitoring
perspective: we have a chmod on the payload, an exec on the payload itself, then a connect back to the attacker for the reverse shell, and then we're actually opening a secrets file. The important thing to point out here is that there's a lot of data, and only a few system calls are really security relevant. So what other types of attacks can we observe, thinking about it from the system call level? There's been this rise in cryptocurrency mining attacks, where the attacker isn't actually getting into your infrastructure and doing reconnaissance; the attacker is really just trying to stay under the radar and steal resources, and so oftentimes there are fewer signals. So anything
we can do to detect this type of resource abuse is useful. For the case of crypto miners, you can use process execution to observe the launch of crypto mining processes. So, in summary, indicators of compromise from a system call perspective are network, file system, and process. From the network, we can see an attacker open a reverse shell using a connect system call; on the file system, we can see an exploit payload written to a file or mining software being uploaded to the server; and from the process perspective, we can see anomalous processes being launched. So what kinds of attacks can we not detect with system calls? Usually information stealing or side-channel attacks are much harder to observe,
because they're not actually doing any kind of behavior where you're stealing files or launching processes or stealing resources; really you're stealing data that's in memory. For example, with Heartbleed the indicator is not at the system call level; instead you see different user behavior, in this case response size anomalies for the TLS heartbeat messages. And for Meltdown and Spectre, the attack is breaking the least-privilege principle in the kernel, allowing the attacker to read kernel memory they shouldn't have been allowed to read. Now, there has been some research showing that you can use CPU performance counters as indicators for these attacks, and I have some links here,
and again this is a case where security practitioners can use performance tools for security purposes. So with system call monitoring, indicators of compromise are buried within the system call events, particularly the system call parameters; the parameters are often more important than the system calls themselves, and observing all the events is impossible, there's too much data. Strategies for distilling everything down are to use whitelisting for files and processes, or even better, use some sort of machine learning framework: push all the data into an analysis stack and then analyze it there. All right, so now that I've shown you what we can observe at the system call level, how do we actually get
this data from the kernel? Like I said, a system call is not a normal function call, it's a little bit special, and so we want to be able to get data in a performant way, and we also want to get only the data that we really care about. There are a couple of different approaches for data sources. I mentioned that ptrace is the system call you can use to attach to a process, but it has a lot of overhead associated with it, so you probably don't want to use ptrace. Another approach is to just write a custom kernel module yourself. The Linux kernel allows us to write modules,
you're probably familiar with this, and if you want to really get your hands dirty you can write a module that is inserted into the kernel, changes kernel pages to be writable, finds the system call table, and inserts a shim or wrapper function for every system call. Now, this will work, but it's going to have a lot of maintenance cost, it's going to be really unstable, and you're not actually using any of the features that the kernel provides. A better approach is to use either kprobes or tracepoints. For kprobes, think of a watchpoint you've used in a debugger: a kprobe is a kernel feature that allows a user
space process to attach to pretty much any function in the kernel. You can attach to any location in the kernel other than some excluded kprobe regions (for example, you can't kprobe a kprobe), and this allows you to dynamically instrument the kernel at runtime. Kprobes are really useful because if you want to dig down even deeper than the system call level, if some kernel subsystem has a function that you really want to know about whenever it's executed, you can use kprobes to do that. Tracepoints are like kprobes that have been pre-assigned: these are statically pre-compiled events known to the kernel.
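As a small illustration of what those statically declared events look like: available_events is just a text file of subsystem:event names, one per line. Here's a sketch of filtering it down to the syscall tracepoints; the sample contents are illustrative, since reading the real file needs root and a mounted debugfs:

```python
def syscall_tracepoints(available_events: str):
    """Filter 'subsystem:name' tracepoint lines down to the syscall
    entry/exit events (the 'syscalls' subsystem in the kernel)."""
    hits = []
    for line in available_events.splitlines():
        line = line.strip()
        if line.startswith("syscalls:"):
            hits.append(line.split(":", 1)[1])
    return hits

# Sample contents in the format of
# /sys/kernel/debug/tracing/available_events:
sample = """\
sched:sched_switch
syscalls:sys_enter_execve
syscalls:sys_exit_execve
kmem:kmalloc
"""
print(syscall_tracepoints(sample))  # ['sys_enter_execve', 'sys_exit_execve']
```

In real usage you'd read the file itself instead of the sample string, and the execve entry/exit events shown here are exactly the kind of process-launch signal discussed earlier.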
The kernel developer marks a tracepoint and declares it in the kernel source code, and you can actually observe the available tracepoints in a running version of Linux by reading the file /sys/kernel/debug/tracing/available_events. So what are the trade-offs between kprobes and tracepoints? Kprobes allow you to attach to exactly the function you want to monitor, but stability is not guaranteed, since internal kernel functions can change over time. For tracepoints it's kind of the inverse: the pro is that they're statically defined, so stability is more assured, but the con is that the exact data you want may not be available. Another trade-off between the
two is that tracepoints actually have predefined offsets for the parameters. As I said before, the system calls themselves are important, but even more important are the parameters to the system call: what was the actual executable that was run, what is the file that was read? Tracepoints make it a little bit easier to get that parameter data. All right, so now that we've figured out how to set watchpoints in our kernel, how do we get the data from kernel space to user space, from ring 0 to ring 3? Two built-in kernel features we can use are ftrace and the perf subsystem. ftrace
is a kernel feature that exposes events through a virtual file system, and you can set tracepoints and use ftrace entirely through reading and writing files under /sys/kernel/debug/tracing: you enable a tracepoint, add a watch for it, and then just cat the trace file and view all of the events as they occur. Perf events are a little more full-featured: there's actually a system call where you attach either a tracepoint or a kprobe to a perf event, and then you can get the data from an mmapped region of memory. And you can do both: you can get a log of data through perf.
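Stepping back to the ftrace file interface for a second, that read-and-write-files workflow can be sketched as a pair of helpers. The tracing_dir parameter stands in for /sys/kernel/debug/tracing, which requires root to touch, so this is an illustration of the interface rather than a drop-in tool:

```python
import os

def enable_event(tracing_dir: str, subsystem: str, event: str) -> None:
    """Enable a tracepoint by writing '1' to its 'enable' file,
    exactly as you would with echo(1) under the tracing directory."""
    path = os.path.join(tracing_dir, "events", subsystem, event, "enable")
    with open(path, "w") as f:
        f.write("1")

def read_trace(tracing_dir: str) -> str:
    """Read the accumulated events, the equivalent of 'cat trace'."""
    with open(os.path.join(tracing_dir, "trace")) as f:
        return f.read()

# Real usage (root required, debugfs mounted):
#   enable_event("/sys/kernel/debug/tracing", "syscalls", "sys_enter_execve")
#   print(read_trace("/sys/kernel/debug/tracing"))
```

The point is that the whole interface is plain files: no library, no special system call, just open, read, and write.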
You can also get counts, if you want to know how often something has been executed; but usually for security monitoring you're more interested in a log of all the kernel events that have happened. Building on top of ftrace and perf events, we can use a feature in more recent versions of the kernel called eBPF, extended Berkeley Packet Filter. If you've ever used tcpdump or iptables, you've used Berkeley packet filters: this is actually a virtual machine inside the kernel, and it has just a few registers, it's very basic, but it allows you to inject bytecode into the kernel without actually writing
a custom kernel module. Why this is really cool is that it allows you to do the exact computation you want inside the kernel, because the built-in features of the perf subsystem and ftrace may not give you exactly what you want, especially since they were designed more for performance monitoring. With extended Berkeley packet filters, the instruction set for BPF is extended so that you have additional registers and you can have functions within your bytecode, so you can do somewhat more sophisticated work; however, you still can't do loops, and your code can't be super complicated. The basic workflow here is that you write an eBPF
program, you compile it into bytecode using a toolchain that's based on LLVM, and then you attach this bytecode to either a kprobe or a tracepoint using the bpf system call. Any time that tracepoint or kprobe is hit, the eBPF program you attached will be run. This is really great because no kernel module is required and you can construct much more sophisticated analysis within the kernel. The cons are that this is a more recent feature, so it's not supported in older kernels; there's a spectrum of BPF features starting back at kernel version 3.18 up to the latest versions of the kernel, which
have added additional support for tracepoints and more, and before that they added support for kprobes. Usually, just like with assembly code, you're not going to write the bytecode directly, you're going to use some sort of toolkit on top of it. There's an awesome suite of tools called the BPF Compiler Collection (BCC), a whole playground of really useful tools that you can try out, which use BPF code to get data from the kernel in a really performant way. In the attack that we looked at earlier, we saw that we were interested in open calls, execs, and TCP connects; in the BCC toolkit, opensnoop,
execsnoop, and tcpconnect are BPF-based tools that will give you that information. Looking at an example of an eBPF tool that can give us information during an attack: I ran a program called filelife during the Apache Struts example that we ran through earlier. What this filelife program does is show us really short-lived files, by watching for creates and then unlinks in the kernel, and we can see that in this case the payload that was dropped lasted only 0.01 seconds. This is something that could be a useful security signal for observing payloads that are written, executed, and deleted immediately. So in
summary, in terms of data sources, kprobes and tracepoints are the best approach; ptrace has too much overhead; and writing your own custom kernel module, while very flexible, isn't necessary with eBPF. The only issue with eBPF is that if you need to run on an older kernel version, you're not going to be able to use it. And once you've figured out what data you want to get out, you can use ftrace, perf, or eBPF. So what are some challenges that come up in using these tools? One challenge is containerization. A big benefit of grabbing data at the system call level,
versus the application log level or even the network level, is that at the system call level you have a lot of context about which process ran which system call. At the network level you don't necessarily know which process sent a particular packet, but at the system call level you can see that a given process is doing this network activity. So one thing we might want to attach to our monitoring is which container made a particular system call, and this is actually not easy to do out of the box: a container is really a user space concept which utilizes several kernel and user space features.
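For context on why this is awkward, the closest user-space workaround today is to parse cgroup membership out of /proc/&lt;pid&gt;/cgroup, since container runtimes encode an ID in the cgroup path. Here's a best-effort sketch; the Docker-style cgroup v1 path format in the sample is an assumption and varies across runtimes:

```python
import re

def container_id_from_cgroup(cgroup_text: str):
    """Best-effort: pull a 64-hex-char container ID out of the cgroup
    paths in /proc/<pid>/cgroup. The layout is runtime-specific (this
    matches Docker-style paths), so treat a None result as 'host or
    unknown runtime', not necessarily 'no container'."""
    for line in cgroup_text.splitlines():
        m = re.search(r"([0-9a-f]{64})", line)
        if m:
            return m.group(1)
    return None

# Docker-style /proc/self/cgroup contents (cgroup v1, illustrative):
sample = "12:pids:/docker/" + "ab" * 32 + "\n1:name=systemd:/user.slice\n"
print(container_id_from_cgroup(sample))  # the 64-char hex ID, or None on a host
```

This kind of heuristic is exactly the gap the proposals below are trying to close: the kernel itself gives you a PID and a namespace, and everything beyond that is user-space convention.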
A container uses namespaces, which provide isolation; cgroups, which give us resource control; and user space tools and file systems. The Linux kernel does not actually have a notion of a container ID, and there have been a lot of proposed solutions. One is to just use the PID namespace of the init process as the identifier for an event: every system call is going to be associated with some process ID, and every process ID will be in a namespace. Another approach that's been proposed, and there are some patches for this, is to allow user space to actually set the container ID; but because the concept of a container is still being defined, the
kernel doesn't really feel responsible for managing that. So this is something to keep in mind if you're interested in this type of monitoring. In summary: Linux system calls are the universal API for infrastructure; the security signal value is really high, but you have to filter through a large amount of noise; and finally, eBPF tools allow for focused detection in the kernel, but you have to use a more recent version of the kernel to take advantage of them. If you're interested in more information about this, I've attached some resources to these slides. I'm a user of these tools, I haven't been
developing them, and there are some awesome resources online. And that's it, thanks a lot.

[Applause]

We have a ton of time for questions, if anyone has any questions.

Hi, so how would you recommend mitigating some of the attack surface introduced by eBPF? So that's a good question. The question is: if you're using eBPF, you're introducing an additional attack surface into the kernel, and that's certainly something to consider. One approach is to use containers: you don't allow certain containers to use the bpf system call, you block them from doing that. Now, that's going to limit attacks that are actually using the bpf system
call, but if the attack is information stealing, then it's not going to mitigate that. This is something I think the BPF developers are very aware of, that you're opening up the attack surface, but it is something that should be considered. That's a good point.
Hey, I wanted to ask about detecting speculative attacks like Meltdown. I think we can use tracepoints and branch prediction to catch speculation misses; do you think watching missed branches with tracepoints is going to work? So, I don't know. The indicators I was discussing are very clear: you see a payload that's dropped, and it's black or white whether it's malicious or anomalous. The techniques for detecting Meltdown and Spectre are often statistical, where you're looking for anomalous page faults or that sort of thing. So it's
certainly doable, but I don't know how general-purpose those techniques are in all cases. Thank you.
Thank you for this talk, this was actually the talk I wanted to go to, so I really enjoyed it. I'm in an AWS environment, we're running Ubuntu, you know, 14.04 moving to 16.04; for what you've mentioned here, are there any restrictions or limitations in the AWS environment? Well, we don't control the hypervisor, but this is all above the level of the hypervisor, so the real restriction in the environments you describe is the kernel version you're running. If you're running a more recent version of the kernel, then some of these features are available. If you're going really far into the future
and using serverless, then you can't use these features. So it sounds like you're in the sweet spot where these are going to be available.

Hi, I had a question: in your example you're profiling the Apache Struts web server, and it seems like there's going to be a ton of calls. Do you have any advice, let's say, for automating some of this, so that you don't attempt to store every system call but develop your own signatures, and upon detection send a notification or an email or something, to really limit the amount of noise? Yeah, sure, so I
mean, for reducing the amount of data, first of all you don't want to record or listen to every system call; you want to focus on the ones that are really the best security indicators. Another thing you can do is, if you have a file that's being read multiple times, you only need to report that once. Another approach is to use whitelisting: you have a set of processes, files, or directories that you don't care about. A lot of the techniques that have been used in the audit subsystem can be used here, where you're looking for indicators that are more likely to be anomalous;
but there's really no one easy solution, often you're going to have to tune based on the service or application you're running.

So recently we were struggling with enabling FIPS on AWS, and I came across using signed modules; would that be a way of mitigating these kinds of risks, using signed kernel modules? Pardon? FIPS uses signed kernel modules. So in the case of the attack that I used here, we weren't actually exploiting anything in the kernel, so we weren't inserting a module, signed or unsigned; we didn't actually have to verify anything. So if you're using signed kernel modules, that's good, but I'm not really sure if
that's related to this exactly. Is there a question in the front?

One more last question: some work is being done to port DTrace to the Linux kernel; do you have any idea whether that's going to be in sort of the same space as eBPF, and do you have a prediction for which one's going to be better? So yeah, eBPF in some ways is kind of like a back-to-the-future version of DTrace. DTrace had a lot of features, including a higher-level language that let you more easily describe exactly the data you wanted to get out, and I think they will probably converge in some way. I know there's
work on higher-level tools, a higher-level language to describe what you want to get using eBPF, but I think most likely the eBPF-based tools are going to be more useful just because they have more momentum right now. But I think there definitely should be cross-pollination of ideas between the two projects. Okay, that's all the time we have for questions, let's thank the speaker again.