
Thanks, everyone. I'm going to talk about formal verification and whether it leads to better security. I'm a systems engineer, not a formal methods person; however, I work in the team that developed the seL4 microkernel, the world's first formally verified operating system kernel. So I know a lot of formal methods people, and I'm even friends with some of them. I've heard a lot about formal methods, about how formal verification is great and how it leads to better security and so on, and slowly that message has sunk in, but it took me a while to really understand why that's the case as a systems person. So I'm going to try to distill some of that knowledge here and hopefully pass it on to you as well. Quick show of hands: who here considers themselves a formal methods person? All right, that's what I thought, not very many. That's good, because I'm not one either.

As I mentioned, we're the Trustworthy Systems group, and we developed a formally verified operating system microkernel. When that got released, we started seeing headlines like these: "hacker-proof coding", "hackers can't break this", "it's unhackable", etc. When you see these kinds of headlines, you either cringe with dread or get giddy with anticipation that you're going to break this hacker-proof code. So I want to talk about whether formal verification really does lead to better security, and whether that's the same as hacker-proof code. Is it really true that we have hacker-proof code or not? I'll give you a bit of a spoiler: yes, it does lead to better security, but no, that's not the same as hacker-proof code. I'm going to give an overview of formal verification as an introduction for non-FM people, then talk about why we claim that formal verification leads to better security, so what better security is and why verification leads to it, and then I'm going to talk about why that
isn't the same as hack-proof code, why it's not a silver bullet.

So, formal verification. Formal verification is proving properties about stuff with math. The properties can be a lot of different things: correctness, security, things like liveness, safety, and so on. The stuff can also be a lot of different things: code, protocols, hardware, processes, etc. People prove lots of properties about lots of different stuff. For this talk I'm going to focus on proving correctness of code, of software.

If you ask anyone when code is correct, when they consider code correct, mostly people will say: when it does the thing I expect it to do. I have some expectation of what the code should do; I write it; I run tests until it does what I think it should do. More formally, that expectation is the specification: you have a specification for what your code should do, and it's correct when it meets that specification. So if we want to do formal verification, we need a formal specification of the code. This is something precise, unambiguous, and abstract: it tells us what the code should do, not necessarily how.

As an example, if we want to talk about sorting lists, we can have a formal specification of what a sorted list is. We have a sort function, call it A for abstract, that takes an input list L and outputs a list L'. Informally, the specification for sorting a list is that L' is sorted, which is nice, but formally we want to be a little more specific, so we provide a more formal definition: the input list and the output list have the same length, the output list is a permutation of the input list, and the elements in the output list are in increasing order. We also deal with abstract data types rather than concrete data types: in our formal specification we have an abstract list, something you can index, something you can take the length of, and so on, but it doesn't tell us how the list is implemented. That's different from an implementation, which might be a linked list, for example. Then we have our implementation: code, for example in the C language or another language, and it's concrete; it tells us not just what we do, but how. We call the formal specification A, for abstract, and the implementation C, for concrete.
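To make that concrete, here's a minimal executable sketch of the sort specification in Python (my own illustrative names, not the actual specification language we use): it states the three properties a correct sort must satisfy, without saying how to sort.

```python
# A sketch of the abstract sort specification: what, not how.
def satisfies_sort_spec(l, l_prime):
    """True iff l_prime could be the result of sorting l."""
    same_length = len(l_prime) == len(l)                  # lengths match
    permutation = sorted(l) == sorted(l_prime)            # same multiset of elements
    increasing = all(l_prime[i] <= l_prime[i + 1]         # non-decreasing order
                     for i in range(len(l_prime) - 1))
    return same_length and permutation and increasing

# Any correct implementation passes the spec; broken outputs fail it.
assert satisfies_sort_spec([3, 1, 2], [1, 2, 3])
assert not satisfies_sort_spec([3, 1, 2], [1, 2])      # wrong length
assert not satisfies_sort_spec([3, 1, 2], [1, 1, 2])   # not a permutation
```

Note this checks one run at a time; the formal version quantifies over all possible inputs.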
Now, when we're doing formal verification, we use a formal model of the code: we don't use the C source code directly, but a formal representation of the C code, for example, and that comes with a whole bunch of rules about what the semantics of the C code are, i.e. what happens when you run certain instructions.

Great. Given this formal specification and this code, we want to prove that the code is correct. What does that mean, a little more formally? We want to prove that the state after executing the code fulfills the abstract specification. What do we mean by the state? We have some initial abstract state of the system, we execute the code, and then we have some state after executing the code. So S is the initial state, T is the state after execution, and execution is defined formally by the formal model of the code, which defines the behavior of each command and how it modifies the state. Once we run the code, we have some idea of how the initial state is modified into the final state. What we want to prove is that the final state T fulfills whatever specification we have. There are two common approaches to doing that: the first is Hoare logic, the second is refinement. I'll talk about both of those.

The idea behind Hoare logic is to introduce things called Hoare triples. A Hoare triple consists of a precondition P, some code C, and a postcondition Q. The intuition behind doing proofs in Hoare logic with these triples is that, in a correct program, if the precondition P holds before you execute the code C, then the postcondition Q will hold afterwards. As an example, we could say the precondition for our abstract sort function is that the input list L is a valid list, and the postcondition after we sort the list is that the output list is a valid list, that it's a permutation, and that the elements are increasing. This is basically our specification: when we're doing this kind of Hoare logic, the postcondition is typically what represents our specification, the desired property of the code.

When we want to prove such a triple, we need to prove that if P holds for the initial state S, and we then modify that state using the commands in C to reach the final state T, then Q will hold for that final state. In doing the proof, we use the behaviors as defined in the formal model of the implementation language to show, for each command, how it modifies the state, and that the final state matches Q. If we have a larger program with a lot of commands, we break it up into sub-goals, with intermediate postconditions and preconditions, and show that the overall chain matches the original pre- and postconditions.
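As a down-to-earth illustration, here's the triple {P} C {Q} for the sort example as a runtime check in Python. This is checking, not proving (a proof covers all inputs at once), and the helper names are mine.

```python
# Runtime sketch of a Hoare triple {P} C {Q}:
# if P holds before running C, Q must hold afterwards.
def check_hoare_triple(pre, code, post, state):
    """If the precondition holds, run the code and check the postcondition."""
    if not pre(state):
        return True  # the triple says nothing about states where P fails
    result = code(state)
    return post(state, result)

pre = lambda l: isinstance(l, list)            # P: the input is a valid list
code = lambda l: sorted(l)                     # C: the sort itself
post = lambda l, lp: (len(lp) == len(l)        # Q: our sort specification
                      and sorted(l) == sorted(lp)
                      and all(lp[i] <= lp[i + 1] for i in range(len(lp) - 1)))

assert check_hoare_triple(pre, code, post, [5, 2, 9])
```

A verification tool discharges this obligation symbolically for every state satisfying P, not just the ones we happen to test.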
The other approach is refinement. We say that the implementation refines the specification, the concrete implementation refines the abstract specification, and the intuition behind that is that any behavior in the concrete implementation is allowed by the abstract specification; equivalently, the concrete code exhibits only behaviors present in the abstract specification. Typically we separate the specification and the implementation, and the specification has an abstract version of every function that's in the concrete implementation. For example, our abstract specification could be an abstract function to sort the list, which again has the same properties we talked about before: the lengths are the same, it's a permutation, and the elements are increasing.

When we want to prove refinement, what we need to show is that when the concrete code and the abstract code start in essentially equivalent initial states and we execute them, the states they end up in are also essentially equivalent, which means they do the same thing. Now, we need to formalize this "essentially equivalent" idea, so we introduce a state relation, which says how the concrete state maps to the abstract state: what the relation is between the two states. Here's an example of a state relation: the lengths are the same and the elements of the lists are the same. So if we have two essentially equivalent states in the abstract and the concrete, the two lists have the same length and the same elements, and then we consider the states the same. We take this state relation and show formally that if the state relation holds between the initial states S and S', then after we execute C and A respectively, the state relation also holds between the final states T and T'. If that's the case, we can say that C refines A, and we get this nice commuting square.
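Here's the commuting square as a runnable sketch, again with illustrative names: a concrete linked-list implementation, an abstract plain-list one, and a state relation between them. We start in related states, execute both, and check the relation still holds.

```python
# Refinement sketch: concrete (linked list) vs. abstract (plain list).
class Node:
    def __init__(self, value, next_node=None):
        self.value, self.next = value, next_node

def to_pylist(node):
    """Walk the linked list and collect its values."""
    out = []
    while node is not None:
        out.append(node.value)
        node = node.next
    return out

def state_relation(concrete_head, abstract_list):
    """States are 'essentially equivalent' when lengths and elements match."""
    return to_pylist(concrete_head) == abstract_list

def concrete_sort(head):
    """Concrete implementation: rebuild the linked list in sorted order."""
    new_head = None
    for v in reversed(sorted(to_pylist(head))):
        new_head = Node(v, new_head)
    return new_head

abstract_sort = sorted  # abstract implementation: just "the sorted list"

# The square: related initial states stay related after execution.
s_concrete = Node(3, Node(1, Node(2)))
s_abstract = [3, 1, 2]
assert state_relation(s_concrete, s_abstract)
assert state_relation(concrete_sort(s_concrete), abstract_sort(s_abstract))
```

The real proof shows this for every pair of related states and every operation, not one example.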
So that's formal verification 101; hopefully everyone can now go off and prove code correct. But this still doesn't tell us how formal verification leads to better security. Before talking about that, what do I mean by better security? Bad security means we have vulnerabilities, so better security means fewer vulnerabilities. What do we mean by vulnerabilities? I don't really need to tell you much about them, but just to make it complete, there are three classes of vulnerabilities. There are design vulnerabilities, where you make a mistake in your design or architecture of the system such that it doesn't actually fulfill the security policy that you want. There are implementation vulnerabilities, where you make mistakes in your code that allow people to use those bugs to bypass your security policy and your security mechanisms. And there are operational vulnerabilities, where you make mistakes in how you actually deploy or configure your system, or people make mistakes: you get social engineering, you get people who just make mistakes or aren't very good at what they do.

Formal verification can help us with the design vulnerabilities and the implementation vulnerabilities, so I'll talk about how it does that. I've talked about how formal verification allows us to prove properties about code, which is nice, but typically those properties don't relate to security. They say things like "the list is sorted"; they're application domain properties, not necessarily security properties. However, there's an extra benefit that also helps us prevent implementation vulnerabilities: if we can prove these nice abstract, application-specific properties about our code, we can also prove that a whole class of
implementation vulnerabilities don't exist in that code. Why is that? Well, a lot of implementation bugs are actually undefined behavior. Things like array or list bound overflows: if you access past the bound of a defined array, that's undefined behavior, especially in your formal model. Integer overflow and underflow: if the integers in your specification don't have limits, you can't overflow or underflow them, so the problem doesn't really exist there. Following null pointers is undefined behavior, and so on. Our specification formalisms, the languages we use for the specifications, don't allow these things to happen: you can't even express them in that language, because they're undefined. So these kinds of implementation bugs that most of you love don't happen in the formal specs. Of course, they do happen in real code: writing past the end of an array is something C programmers love to do; they do it all the time. What we need to do is make sure that doesn't happen.

So what happens is that if you try to prove that this kind of code fulfills its specification, and again that specification may have nothing to do with array bounds, it could be something else, you'll fail: you won't be able to prove that code containing these kinds of statements fulfills its specification. For example, the end state after you try to access an array out of bounds is formally not defined: there is no state, and trying to prove properties about something that has no state is not going to work. Furthermore, there are no rules you can apply in your proof about "no state", so you can't continue your proof. Trying to prove something with these kinds of bugs in it just won't work; you won't be able to prove it. That forces you either to fix your code, if you want to continue your proof, or, if you're sure that your code is not broken (and, you know, none of us write broken code, so we're sure of that), to prove that the conditions that lead to the undefined behavior won't happen. If you're certain that your array indexing code will never index the array out of bounds, then you need to prove that in order to continue the proof. Once you get a fully proved system, you know these things don't exist.

The big benefit is that you're proving interesting properties about your system, and without having to specify all of these implementation vulnerabilities or bugs, the proof, for free, finds those vulnerabilities for you, finds those bugs for you, and forces you to do something about them. You don't have to tell the system "find me all the array bound overflows"; it finds them for you, because those are undefined. So whole classes of undefined behavior just disappear once you've proved your system.
That's really great. Now I want to go into a little more practical detail. I've said we have this verified operating system, and I want to talk about what we've done and how you apply all of this in practice: how you apply this Hoare logic, this refinement, this proving of properties, to a real system. seL4 is the microkernel we've developed. It's small, which means it's amenable to formal verification. It's fast, which means it's useful for real-world use. It's capability-based, which ultimately means it has an explicit access control model, so it's really good for building secure systems. And it's an operating system kernel, which means it runs in privileged mode, making it one of the most critical pieces of software, so it's critically important to get it right, which is why we formally verified it.

So what got verified about seL4? The seL4 verification showed functional correctness, which means the implementation refines the spec: that's the refinement I was talking about before. The implementation of seL4 refines the specification of seL4, which means the kernel does exactly what the specification says, no more and no less. That's nice, as long as the specification says something useful, hopefully something about building secure systems. Well, what does the spec say? It talks about the behavior of system calls: it defines what each system call does. For example, if you have a system call that sends a message from one thread to another, it defines how invoking that system call changes the state of the kernel: you have an initial state, you invoke the system call that sends a message, and the spec defines the state after the message is sent. That's nice, but it doesn't say anything about security: we don't explicitly talk about security in the specification that was proved about seL4. Okay, so what's the point? We want to have this kernel and verify it in order to be able to say
that it's secure, but we don't say anything about security in the spec. What's the point? Well, we still get the proofs about the absence of all that undefined behavior I talked about before. In proving the properties of the system calls, we've also proved that there are no buffer overflows, no null pointers being followed, no integer overflows or underflows, etc. All this undefined behavior doesn't exist in the implementation of the kernel, which makes it a lot harder for you to exploit the kernel. That's good for security. But we can also prove properties about the specification itself: we can go a level higher and start making interesting proofs about the specification, and because of refinement, because we've shown that the implementation refines the specification, we know that proofs about the specification also hold for the implementation of the kernel. So if we invoke a system call and we've proved it has some property beyond just being correct, that property holds for the actual implementation too.

So what are these properties we proved about the specification? We wanted the kernel to be secure, so we wanted to show that, given an access control policy, which you can think of as an access control matrix, a specification of what authority subjects have over other subjects (who can, for example, modify whom, who can read what data, etc.), the kernel will enforce that policy. This comes down to the classic security properties, the classic CIA properties: confidentiality, integrity, and availability. We've proved confidentiality about the kernel spec, we've proved integrity about the kernel spec, and we've partially proved availability about the spec.

What does integrity mean in this context? The general definition of integrity is that you can't make unauthorized modifications. More specifically and formally, we have this nice triple that says: given a precondition, the initial state s0 (that's what the precondition part means), if we make one of these abstract kernel calls, a system call, then the difference between the initial state and the state s afterwards will be authorized by the policy. There won't be any changes to the state that aren't authorized by the access control matrix. That's the property that's been proved about the kernel specification. I won't go into details about how that was proved; there's a paper about it you can read if you're really interested.
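Here's a hedged sketch of the shape of that integrity property in Python. The policy and state shapes are mine and purely illustrative; this is not seL4's actual access-control model, just the idea that every change between the initial and final state must be authorized.

```python
# Illustrative policy: which objects each subject may write.
policy = {("thread_a", "write"): {"obj1"},
          ("thread_b", "write"): {"obj2"}}

def integrity_holds(subject, s0, s1):
    """Every difference between s0 and s1 must be authorized for this subject."""
    changed = {k for k in s0 if s0[k] != s1.get(k)} | (s1.keys() - s0.keys())
    return changed <= policy.get((subject, "write"), set())

s0 = {"obj1": 0, "obj2": 0}
assert integrity_holds("thread_a", s0, {"obj1": 42, "obj2": 0})      # authorized write
assert not integrity_holds("thread_a", s0, {"obj1": 0, "obj2": 42})  # unauthorized write
```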
The other property, which is a little more complicated, is confidentiality. Confidentiality says that you can't read or learn unauthorized data, and there's this longer statement here that says: if two initial states are equivalent with respect to the security policy, and we do the same executions on both of those states, then the final states will also be equivalent with respect to the security policy. The picture shows, on the left, these two initial states, where the green parts are equivalent (these are the non-confidential parts) and the red and yellow parts, the confidential parts, are different. If we then execute the same instructions for this subject, say it's a process, then the final states they end up in will be equivalent, even though the confidential states might be different. What that means is that the confidential data doesn't leak into the non-confidential parts of the system. We proved that about the kernel as well, which is great.

This gives you a bit of an idea of how using formal verification can lead to better security: we can reduce implementation vulnerabilities, and we can also prove interesting, abstract, higher-level security properties that we can use to build our systems on top of.
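The two-run shape of that confidentiality statement can be sketched like this (again my own illustrative names, not the actual proof): if two states agree on the observable parts, then after the same execution they still agree, whatever the secrets were.

```python
# Noninterference sketch: observable output must not depend on secrets.
def observable(state):
    """The non-confidential ('green') projection of a state."""
    return {k: v for k, v in state.items() if not k.startswith("secret_")}

def step(state):
    # A leak-free step: the result depends only on non-confidential data.
    return dict(state, output=state["public_counter"] + 1)

s1 = {"public_counter": 7, "secret_key": "aaa"}
s2 = {"public_counter": 7, "secret_key": "zzz"}   # same greens, different secrets
assert observable(s1) == observable(s2)           # equivalent initial states
assert observable(step(s1)) == observable(step(s2))  # still equivalent: no leak
```

If `step` instead computed something from `secret_key` into `output`, the final observables would differ between the two runs, and the proof would fail.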
Now I'm going to talk about why this is not a silver bullet, why it doesn't mean we have hack-proof systems. If we want to build a verified secure system, we need to start with an overall threat model for the system, which leads to security requirements, which relate to properties the system has to have. Then we start with a high-level architecture or design, and we need to design the architecture such that these high-level properties hold. In this example, the property might be that information only flows from domain A to domain B, and always goes through some sort of trusted component that maybe filters it, or something like that. We want to prove that the architecture fulfills that property, and we can: we can do a high-level architecture proof. But at some point we need to implement it. That architecture gets translated into a bunch of code, which could be manually written code, existing tools, generated code, and so on, and we need to show that this code also fulfills some properties. The property might be that it's a refinement of the architecture: any data flows allowed in the architecture are also allowed in the code, but the code doesn't allow other data flows. We can prove that as well, which is nice. Then we need to run it on something: it runs on an operating system and a bunch of services it relies on, so we need to prove those as well. Now we've got a bunch of code that needs to be loaded into a running system, so there's also a bootloader; ideally we'd verify that too, to make sure it does the right thing. And ultimately it all runs on hardware, so we should also verify the hardware.

What I'm getting at is that building a secure system requires different layers of abstraction; each of those layers has different properties, and each of those layers needs to be shown to be correct, to fulfill those properties. Now we've got this idea of necessary versus sufficient: it's not sufficient to verify only some of those layers and expect that you'll have a verified secure system. If you just verify the top layer, the architecture, you can have a nice architecture but implementation bugs underneath it; you can have problems in the operating system, in the loader, in the hardware that allow you to bypass it, and then you don't have a fully verified system. If you want a fully verified secure system, you need to verify all of those layers, and if you don't, vulnerabilities can sneak in there.

The other thing is that with formal verification we're working on models, and models aren't necessarily the same as reality. There's always some gap between the model and the reality. You can try to make that gap as small as possible,
but there will be a gap, and it manifests itself as assumptions: assumptions about the real world that our models and our proofs rely on. Some examples of these practical assumptions: we may assume that boot is correct, that the hardware and software environment are initialized correctly, so that when our verified code runs it has the particular environment the model expects. We make assumptions about how the hardware behaves: we have to have a specification of the hardware, and we assume the hardware is implemented correctly, so that it actually matches that specification. We make assumptions about the semantics of our programming languages: if we take code in C and transform it into a formal model, we make some assumptions about the semantics of the language, and we need those to be true. We may also have unverified code in our system, third-party libraries and the like, so we need to make assumptions about that code as well. So there are lots of assumptions that might need to be made, and if you want your verification to be believable, to be good, then you need to validate those assumptions. As I showed before, they can be validated by doing formal verification as well, or by doing your testing and your hardening on those things, and so on. If you don't do that, this is another place where vulnerabilities can sneak in.

The assumptions are fundamental, because of this gap between the model and reality, but there are also other potential limitations that are more practical; they can be dealt with, but you need to put effort into dealing with them. One, as I've talked about, is that the specification should say something useful: you need to validate the specification, make sure it says something useful, for example like we did by verifying properties about the seL4 specification. The verified code also needs to actually be used correctly: for example, it needs to be configured
correctly. If you misconfigure the code, that might invalidate some of the assumptions, and then the proof might not hold. Again, this idea of necessary versus sufficient applies: if you use verified code but it relies on a bunch of unverified code, a lot of your assumptions might also break. Finally, scalability is a big issue. Verification is hard; one of the proof engineers in our group said that it punishes you for every mistake in a terrible, unforgiving way. A lot of people don't become proof engineers because it's hard, and it also takes a long time to do proofs, so you're limited in what you can verify. That means you need to make important design decisions so that the parts of the system you do verify support running unverified code on top. For example, in seL4 we have confidentiality and integrity proofs, which means we can run unverified code on top of seL4 and still have guarantees that there's confidentiality and integrity between components; we don't necessarily need to verify the code running on top. We can use that to build secure systems without having to verify all the code. But all of these things, the fact that you don't necessarily verify all the code, that you may misconfigure things, and so on, can also lead to vulnerabilities sneaking in.

So this comes back to the question: does formal verification really lead to better security? The answer is yes, it does. In particular, it can lead to better security because it makes you define specifications, it makes you define security properties, and you can have secure specs, which is an important basis for building secure systems. It also forces you, if you prove a system, to deal with all of the implementation bugs, the implementation vulnerabilities, in the system. Both of those lead to much better code and much better security. Of course, it's not a silver bullet. It does not guarantee hacker-proof code, because we've got assumptions, we've got the models, the coverage of the verification might not be complete, people might misuse the verified artifacts, and so on.

So that's the end of my talk. If you want more information about seL4 or what we do at Trustworthy Systems, those are the URLs. Otherwise, if you've become interested in formal verification and want to explore what it can do for you, there's a nice blog post on proofcraft.org about getting started in verification and how to become a proof engineer. All right, thank you. [Applause]