← All talks

"seL4 & Family: Fast, Trustworthy, Cheap, Deployed" - Ihor Kuz

BSides Canberra · 201832:01630 viewsPublished 2018-08Watch on YouTube ↗
Speakers
Tags
CategoryTechnical
StyleTalk
Mentioned in this talk
Platforms
About this talk
BSides Canberra 2018 Slide deck: https://drive.google.com/open?id=1EcboIaABt6jd52-OoSNJfAAPbOieWR-o
Show transcript [en]

we actually have a really great talk now by Igor coos talking about self force which is the formally verified microkernel and he'll be talking about self Loren family fast trustworthy sheep deployed so let's welcome a hall with a round of applause all right thanks everyone so yeah I me and I'm gonna talk to you about SEL for and the work were do well I'll introduce what it is in case people aren't aware of what it is and then I'll talk about the work we're doing in order to make it more widely used basically so there's this common common knowledge that if you want to build a system or you want to get a service then you want it to be fast you

want it to be good and you want it to be cheap but you can usually only pick two so we've got SEO for and family which is the platform around it and other tools related to it and what we want is all three of those so I'm gonna talk to you about why I see SEO for is fast and good and then I'm gonna talk to you about what we're doing to make building systems with it cheap so SEO for maybe show of hands who knows what it is who's never heard of it a couple of people all right cool so SEO for is a small fast capability based operating systems microkernel so what does that mean it's small so think

about the order of 10,000 lines of code of see and assembly what that means is that it has a small attack surface so there's not that much code to attack and more importantly it's amenable to verification to formal verification so you can actually do proofs about the whole code base it's not too big it's fast what does that mean it's the world's fastest operating systems kernel and that's designed for security and safety so what does that mean it means that it's suitable for use in the real world so our goal with developing a seal for was to build something that was pure that could be formally verified but that could be used for real-world systems we didn't want a toy system we

didn't want an academic system we wanted something that we could build real systems on top of it's capability-based so what does that mean it's built with the capability based security model and that means that unprivileged code needs permission to access resources and to do communication so by default you can't do anything unless you're given explicit permission to do something and so what that means is that the kernel can confine damage from attacks in unprivileged code and then finally it's an operating systems microkernel so what that means is code that runs in the up in the sorry it's the operating system kernel which means that it is the code that runs in the privileged mode of

hardware and that means it's the most critical part of the software because anything that the operating system kernel can do anything that's running in privileged mode can do whatever it wants in the whole system so there's a little picture of what SEO for is it's that middle layer between the hardware and the user level applications it's the part that runs in privileged mode and basically does virtual memory management does inter process communication dust read management and provides access control and then everything else in the system is written on top of the kernel runs in unprivileged mode and basically runs as components that can be isolated from each other so this is kind of an overview of what SEO for is bring

everyone up to speed ok so I mentioned it's fast so what does that mean well some numbers to just look at the goal of making a sale for fast is to make the IPC which is inter process communication performance fast so that when you need to switch from component process to process and use the level or when you need to do communication between the processes you can do that very quickly and that's important for a microkernel because it relies on this switching and this communication so it's the fastest microkernel out there has the fastest IPC interesting thing I was when we started developing a steel floor with this goal of formally verifying it and so on that we wanted to make sure that

it wouldn't be less than or more than 10% slower than the current version than the existing versions in the end it ended up being the fastest version ended up being faster than the existing versions so that was a good win so it's fast it's good what do we mean by SEO for is good well in this case we mean that the code is good the code is trustworthy and it's the world's most verified kernel so we have a lot of different ways of saying this but basically it means that we've formally verified that kernel so what does that mean we've got a mathematical proof that it's code is correct with respect to its specification we have a mathematical

proof that enforces strong security properties and we've proved safe from upper bounds on the worst case execution time of the kernel so let me explain this picture here if you look at the middle there's a green box that says specification so what we did was we started with the specification of the specification basically tells you what the API does so for every API function we specify exactly what it does what the preconditions are what the post conditions are how the state changes after you run that function so this is all formally specified in in Isabel which is a formal logic what we then do is go to the box underneath that and write code that implements that

specification we take that code and we translate it into the same kind of formal logic Isabel in Isabel in higher-order logic so we have this formal specification so then we've got this visa specification and this API specification and then we do a formal proof so a mathematical proof that the C's specification that the C code implements exactly the API specification why is that important well obviously it's important because you want to know your code is implementing the API correctly it's gonna do what you think it's going to do but more importantly it means that the C code implements exactly these the API specification so it doesn't do anything more and it doesn't do anything less and this is important

because that also means that your C code doesn't do things like buffer overflows access the wrong memory it doesn't do things like null pointer dereferences and so on so that takes away a whole category of potential problems in your code so you know that your code does the right thing and you know it doesn't do all the bad things that it could do so anyone who's paying attention will have a bunch of objections here at least two so one of them is well how do you know that the specification is right how do you know that you got the specification right and the answer to that is well you look at it really hard and you make sure

that it says what you think it says but we went a step further and so we said well we want to use this code to build secure systems so what kind of properties of secure systems need well there's the standard CIA properties confidentiality integrity and availability and so what we did was we made proofs on top of the specification that say that if you set up your system correctly that you will have confidentiality between different processes as long as they're not allowed to share any data or have any communication you will have integrity between these processes that you won't they won't be able to modify each other's data and so in this way we have

proved that the specification gives us the properties that we want so the other objection people have is well okay you've proved the C code but that's not what runs on your hardware you still have a compiler in the mix you need to compile the C code into binary code and that's what actually runs so that's true we didn't take the route of using a verified compiler but what we did instead was we verified that the binary code that the compiler spits out does the same thing as the C code and so we took a bind the binary code we took a model of the hardware that runs on so this was arm in this case

we read in the binary code we create a formal model of what the binary code does and then we extract control flow graphs from the binary code from the C code and we show that they do the same thing and so what that gives us is a proof that the binary code that's running as long as it's configured correctly gives us the desired security properties and so this was a big deal this happened over the over the last almost 10 years and and so these proofs are mathematical proofs and they're done in in Isabel which uses higher-order logic so it's so SEO for is fast and it's good which is awesome but it's not magic security fairy dust so just

sprinkling some SEL 4 on your systems won't make them secure so what's the catch why why not we have a bunch of assumptions in the proof so one of the big assumptions in the proof is that the hardware is correct and it does the right thing and that's not always the case and particularly we've seen recently with meltdown inspector that the hardware does really bad things sometimes not all of the code is is verified yet so there's some part of the kernel initialization there's a little bit of assembly code that isn't fully verified yet SEO for currently runs on a single core it's verified for a single core so a lot of systems want to use

multi core we're still working on that and then SEO for again is the microkernel that runs in privilege level there's a lot of code that needs to run at user level so we need to verify libraries and user level code and stuff like that so there's still more proof to be done it's a microkernel not a full operating system so most people aren't used to working with the microkernel you need to provide your own drivers you need to provide your own network stack your own storage stack it doesn't give you POSIX so if you want POSIX you need to implement POSIX support on top of that so kind of difficult to work with and that leads to a higher development

effort and cost for people using it you need to port it to potentially port it to a hardware that you're interested in it doesn't run on everything under the Sun you need to develop those OS services you may need to redesign your systems if you want to run them on top of co4 and it doesn't support every language under the Sun so we support see we have some C++ support we support rust pass well there's some other languages we support but you know if you come with your favorite language of the day probably you will need to develop support for that and then finally the verification effort and cost are still reasonably high in particular if if the

kernel changes if we make changes to that if we add things like to fix meltdown inspector or if we add features or if we change the API around then those need to be verified likewise if we port to a new platform and add platform features then those need to be verified as well so it's not perfect but we want to make it perfect and so this is a little bit of a story about how to do that before I talk about that a little overview of building systems with SEO for so if you want to build a system to run on SEO for this is approximately what it should look like so you've got your hardware your processor running

you've got the little microkernel running on top of that and then everything else runs on top of that in user mode so in unprivileged mode and so if you need to access file system you need to provide a process that deals with file system access if you need to access the network you need to have a process that gives you network access you need your device drivers so to access hardware you need to have device drivers process management and memory management and so on so you need to provide all of those some of those are already available some of them you need to write yourself and then you write your application so you need to take

your application you need to structure it in such a way to make it take advantage of the confidentiality integrity properties that SCF or it gives you and build your system in that way it needs to communicate with all of those other components on the left so the file system and so on to do things and then eventually you get tired of doing that and so the rest of the stuff you just run on Linux running in a virtual machine you don't care that much about that the main thing is that SEO for is a context switching engine so it allows all of those processes to run and switch between them it gives you this strong

isolation between the processes so if one of those processes goes goes crazy maybe one of your device drivers is buggy and it goes crazy it can't affect the other processes and then finally it gives you controlled communication between the processes so only if you explicitly allow communication between processes can you do that so that's awesome we want to use SEO for to build trustworthy systems this is I'm from the trustworthy Systems Group at data 61 our goal is to build trustworthy systems but what's a trustworthy system so let's start with software every system that needs that has some kind of critical functionality or needs secure functionality will have some trusted software and trusted software is the

software that must work correctly to ensure that security requires our men or another way of looking at that is it's the software that's able to break security of the system so if the software that if it misbehaves it can basically break the system and do whatever it wants so we take the collection of all the trusted software on the system and we define the trusted computing base and so this is just all this trusted software on the system next we look at trustworthy software trustworthy software is the trusted software that is worthy of the trust that it gets so we put all this trust in this software but should we do we know that it's bug free do do we know that it

doesn't have exploits in it do you know that it's not gonna do something malicious we don't know but when it's trustworthy we've done something to make sure that it is actually worthy of being trusted so there's high assurance and we can do that by doing good coding practice by doing a lot of testing by using awesome languages like rust and so on by doing formal methods so finally the a trustworthy system is a system whose trust works trusted computing base is trustworthy so we've actually looked at the trust word trusted computing base and made sure that it is worthy of the trust that it has so how do we build trustworthy systems with SEO for how do

we use it to actually build trustworthy systems like I said it's not magic security fairy dust so you can't just put something on SEO for and it'll be secur so there are a couple of ideals in in building a trustworthy system the first one is a minimal TCB so minimal trusted computing base you want to minimize the amount of code that needs to be trusted because the more code there is to be trusted the more likely it is that something will go wrong and it the bigger the attack surfaces and then you'd like to verify that TCB so make sure that the tcp is trusted apply formal methods or apply good coding or use good languages etc all right so

these are the two kind of drivers of building trustworthy systems and then in doing that they're a bunch of well-known principles for building secure systems to follow us or things like lease privilege make sure that a component or a process doesn't have any more privileged than it actually needs to do its work separation of privileged make sure that components don't do too much and have too much privilege to separate out tasks so that the amount of privilege a component has is reduced least common mechanism so make sure that you don't have all the components using a single component as a single other service to get something done spread that out as well so that there's no kind

of one point of compromise and then defense-in-depth have a bunch of different mechanisms to catch attacks or to stop attacks from happening and so we use the SI four mechanisms to do that so we use the fact that we have a microkernel we start with a really small trusted computing base compared SEL for size of ten thousand lines of code to something like Linux which is which is millions of lines of code and the starting TCB is a lot smaller already then we need to apply isolation so we need to apply the fact that SEO port does isolation really good or really well and apply that so that gives us things like separation ship privilege

again applying isolation helps us to reduce the TCB so we can take the trusted functionality out and isolate it from the untrusted functionality we use controlled communication so control who gets to communicate with who and of course verification to make the TCB trustworthy so have preferably formal verification to so here's an example we call this the security retrofit and this is an analogy this is basically taking existing systems and retrofitting them to run on SEO for to take advantage of SEO for its property to make them more secure and the name security retrofit is an analogy to a seismic retrofit when you take an existing building and you modify it so that it's better better resistant to

earthquakes so this is what we're doing with with Software Systems so we start with a regular system the example here is a UAV on the mission board on a UAV this is based on a project that we've done in practice we start with a regular system and the first step is to take that it runs on Linux so we take the whole system and run it on a virtual machine on top of SCL for now this doesn't give us any security but it's the engineering work needed for the next steps next we take that virtual machine or take that Linux instance and we split it up so we take some of the functionality and we run it in a

separate virtual machine or a separate instance of Linux that starts with the isolation already we have some isolation between different functionalities then we also take some of that functionality and we move it outside of Linux and outside of the virtual machine and implement it natively so in this example we've taken the network stack up and implemented it natively and we reiterate that process and take more and more identified the trusted components in the system and remove those outside of Linux into native components and so what that does is removes for example this huge Linux from the trusted computing base so you no longer have to rely on Lync it's being correct or no one attacking Linux

and so on and so you get this set of trusted components running preferably natively that you can then apply verification techniques to and make sure that they work correctly so in the project we did we stopped at moving the components out natively we didn't necessarily verify all of them although we use some good product good programming languages that give us a lot of properties already to implement those we've done a bunch of projects with SEO for so mostly on autonomous vehicles and it's it's open source so it's usable by by people outside so we're building up a community around that and we've seen people coming up to us and using this and so people are using SEO for

independent of us to build products to do research and so on so that's really good we're getting more and more interest from people doing IOT people doing automotive from defense etc etc so this is really good but you know when I asked who knew SEO for there were a couple of hands who here has actually looked at SEO for or done anything with SEO for okay one person so I expect most people have it right and why so I mean it's awesome it's great there's all these awesome things to do people are using it but it's not in the mainstream and so we've done a lot of work on SEO for the next step is to make it

mainstream and what do we need to do in order to make that happen well there's three main things we need to make it cheaper so it's expensive and development effort but it's also expensive to do the verification it's also expensive to make sure that that trusted computing base is actually trustworthy to make sure that we can verify these components and so on so what we would like is to get proofs for free we'd like to be able to write software and just have proofs that it's correct that it's good we need to make it more relevant so whenever we talk to anyone about using SEO for they've got all these requirements that we don't mean we want to run it on this platform

we want to run it on multi-core we want to use you know Ruby or something crazy so we need to make it more relevant we need to add more features to it and then finally we need to make it scale so the effort it took to verify SEO for huge pretty large especially for a group our size and we're constantly reeve arif eyeing things so as we modify a sale for as it evolves as it grows the the proof needs to evolve as well it wasn't a one-shot thing that we did and throw away we need to keep that active and so I'll talk about a proof engineering what it takes to actually do that so let me

start with making it cheaper when we developed a feel for we estimated that the cost of developing and verifying the kernel was about $400 per line of code so multiply that by lines of code and you get the total cost compare that to other software development so if you take low assurance software development basically using a traditional embedded kernel and developing a traditional embedded kernel you're looking at about one hundred two hundred dollars per line of code and if you look at high assurance software development so something that needs to meet Common Criteria requirements l6 plus is the stuff that you know you need to run you need to have to do top-secret work and

so on that goes up to a thousand lines of code so SEO for is not the cheapest but it's also not the more expensive and we argue that it gives you more than the Common Criteria el6 plus level of assurance so it's still pretty good but it's still expensive in particular if we need to verify all of the user level stuff that makes up the trusted computing base as well that's a lot of verification work that still needs to be done and so what we want to do is automate proof generation so get proofs for free so we've got two main approaches for doing that right now both of them involve programming languages we have a project called cogent which is

developing a high-level language for developing system services in so we're developing services like drivers we're developing services like network stacks like file systems and so on in this language and the idea with this language is that you take the language you compile it and that generates your executable code so your binary that you run but it also executes SAP roof of that code as well so you get your executable code and your proof at the same time which is really good this is work in progress you really take budget and do stuff with it yet we've got some examples the other approach is is based on K Kamel which is a verified optimizing compiler and the

idea there is that you write your theorem in a theorem proving language so for example in a higher-order logic and it and from that generates code that implements what you want so you actually don't write your program in a regular programming language you do that in logic and you get good code out of that and because the compiler is verified as well you know that the code matches your theorems both of those can be used to then hook into sd4 and systems running on top of that to verify your trusted computing bases so these are two things that we're doing to make it cheaper to make it possible for people who aren't us weren't in our group to actually do

this kind of stuff and build systems on top of SEO for we also need to make it more relevant so I mentioned a bunch of things before about why people probably aren't using SEO poor so things like multi-core we need to have multi-core everyone wants to run things on lots of cores people apparently have a lot of computation that they need to do so we do have an implementation of SEO for that works on multi-core so an SMP implementation the interesting thing is we use a big log for that which you shouldn't ever do except it's a microkernel and it actually is okay so that's good the hard part there is doing the verification of a multi-core kernel so

there's a lot of work that a lot of research work going into that right now figuring out how to verify concurrent systems that run on multiple processors it's it's a tough task we're doing it and hopefully we'll get there we want to run it on embedded systems we want to run it on real-time systems these are systems that have timing requirements they have strict deadlines that need to be meet met otherwise the quadcopter the helicopter falls out of the sky and we don't to happen so we've done a lot of work on that we've got a version of SEO poor that's and this is going to become the main version of SEO for that's for mixed

criticality systems for real-time systems so that exists and we're currently working on proofs of that system and we also need Prevert for real-time systems we need to have worst case execution time analysis so we need to know how long does it take when you invoke a system call because you that's part of meaning your timing deadlines and so we've done a full worst case execution time analysis on the kernel and it's one of the only kernels that's ever actually had that done so that's really good so we're doing really well on the real-time systems aspect timing channels so we've talked about SEO for providing isolation provides confidentiality but that doesn't include timing channels right so those are

explicit channels and so as we've seen time and channels are tough right Jen gave this talk about all these timing channels that are possible we have some approach that's also challenge work of dealing with timing channels in SEO for by doing cache coloring so making sure that things that are actually isolated don't share any cache which is awesome that works except that the hardware is broken and we can't rely on the hardware necessarily so we're doing work on that we're trying to change the hardware as well we're trying to get influenced to actually make the hardware work I guess one way of that doing that is finding things like meltdown inspector and embarrassing hardware met vendors

into changing it and then we need platform support so we want it to run on a lot a lot of hardware currently runs on a lot of ARM processors it runs on x86 we're we're doing verification for 64-bit x86 and we're doing support for 64-bit arm as well we need to expand that as people have more requirements and so on but also on the user level when you move to new hardware platforms you need to provide disk drive I'm sorry device drivers and so on and we're looking at using cogent and KML to do these kind of things to provide verified drivers and so and then finally we need to make it scale so currently the proof is about

half a million lines of proof code this is one of the largest mathematical formal proofs in the world nope which means nobody's really done this before and we're adding more features we're adding support for virtualization hardware we're doing this multi criticality kernel where´s we're adding support for multi core we're adding support for more platform 64-bit and alongside the 32-bit and so on and it's tough right now it resembles the early days of software development where people had no experience building large systems and so they didn't know what to do to actually make building designing and writing these systems maintainable and so this is what we're doing so we've invented a new field called proof engineering which

is analogous to software engineering so it's two proofs what software engineering is to software how do we improve the proof tools how do we improve automation how do we actually structure proofs such that they're expandable such that they can grow and not become unwieldy and unmaintainable so we need to learn how to you reuse proofs more we need to have more abstraction in our proofs we need to do all of these things to make it scale and once we do that we can start really kind of ramping up and verifying with a bigger and bigger systems more and more systems so these are the three things that we need to work on to make this

usable and hopefully when we've done this in a couple of years it can come back here and say who's building systems on SCO 4 and you know a lot of a lot more people will raise their hands so in summary takeaway message is that we produced verified technology that's high performance and has been deployed in systems around the world so people are actually using this to build things but our goal now is to radically change the way the world build systems so we want to get to a point where there will be no excuse for bad software in systems there will be no excuse for software that has vulnerabilities that has security bugs in it and we want the future to be

verified thank you

we've got time for a couple of questions if people want to ask any other one over here I get a tow got CA technologies I've worked with embedded software for a long time and most of you probably don't know that we built the face where I write us for the intake triggers in Australia so my question is are you planning on putting this to PC architecture the question is whether we're planning to port this to the PowerPC architecture and I laugh and I say that's an awesome question because we get that question a lot automotive people like PowerPC as well we resist maybe someday it'll happen but currently we're resisting we're hoping that eventually people stop using

PowerPC we won't have to do it I don't know if that will happen or not if it doesn't happen we probably will in the end any more questions in this presentation

[Music]

so the question is whether the risk 5 offers opportunities to do the hardware right and avoid a lot of these timing channels that exist in the commodity Intel and ARM processors yes we we're very interested in risk 5 and we're very interested in working together with the community to make sure that they do things right and to influence how they do things to influence that the hardware is written in such a way or is designed in such a way that the microarchitecture doesn't have all of the state that that leaks information and it provides the right functionalities for closing the channels so yes risk 5 awesome actually thinks elf horns one of the crowning achievements of Australia so let's let's

thank people for his presentation [Applause] [Music] [Applause]