
Good afternoon, everyone. We are ready for the next presentation, by Joakim Kennedy, on "A Deep Dive into Go Malware: Using Metadata to Empower the Analyst." Joakim, all yours.

Good afternoon. So, the graveyard slot; let's get through this. A little bit about me before we start: my name is Joakim Kennedy. I'm a senior principal security researcher for Anomali, where I focus on researching and gathering the threat intelligence that we provide to our customers.

So the agenda for this talk: first we're going to do a little introduction to Go, the programming language, so that most of us here in the room will have roughly the same
background knowledge. Then we're going to look at how we can recover function information from stripped binaries that have been compiled by the Go compiler. Then we'll take a look at some malware, apply a tool to it, and extract information that is otherwise hidden. And then we'll take a look at what else we can find in the binaries and what conclusions we can draw from that.

So, to start with: Go was first worked on internally at Google, starting in 2007, by Robert Griesemer, Rob Pike, and Ken Thompson. It's designed to be a memory-safe language; it uses a garbage collector, and it has static
typing. The language was designed to be a new language for the 21st century, so it has built-in support for multi-threading, networking, etc. The language is relatively simple and easy to read, which is mainly by design.

So this is a hello world in Go. All code in Go is part of a package, and the package that gets executed is called main. So here we're looking at the main package, with the main function that will execute. We're importing a standard library package for doing formatting and printing, and it essentially just writes "hello world" to standard output.

Go has support for
most of the standard numerical types. We have both unsigned and signed integers of specific sizes, and it also has support for floats and complex numbers. It also has a couple of aliases: for example, a byte is a uint8, and then there is rune, which is used for characters. Go has support for UTF-8, and this is handled through rune: a rune holds a single Unicode code point. It also allows the programmer to use integers without a specified size, so you have uint and int; the sizes of these depend on the architecture you're compiling for. And then it also has a pointer
representation for these.

Strings: Go has first-class support for strings. You can either declare it as at the top, where you tell the compiler that this is specifically a string, or you can let the compiler resolve the type for you. Under the hood, Go strings are different from C strings: a string uses two words to represent the data. First you have a pointer to where the data is located, and then a length. This makes length calculations on strings constant-time.

And then on top of that, Go has a feature called slices. Arrays are the same as what they are
in C, whereas slices are more similar to Python lists. A slice can grow, and it's an abstraction on top of arrays. The slice is represented by, first, a pointer to where the data starts, then the length of the current slice, and then a capacity. The capacity is how far the slice can grow within its current backing array; beyond that, the runtime has to allocate a new, larger array.

Structs are very similar to how they are in C: the fields are just aligned and laid out in memory. And the way object-oriented programming is done is through interfaces, where you specify a function signature, and as long
as the type implements that function signature, it satisfies the interface. Under the hood this is done through a vtable that points to the specific functions, and then a pointer to the actual underlying struct.

So let's take a look under the hood at what happens when you compile it. This is a snippet from a piece of malware; it's one of the first bits that runs. What it does is first read in the argument list that it was executed with. What it's looking for is the first entry in the argument list, which will be the name of the binary that was
executed. It reads this in and moves it into RAX, and then takes the next parameter after it, which suggests that this is a string, and moves these onto the stack. So here you have the pointer to where the data starts, and then your length parameter. Then it calls a standard library function just to get the base element, and the return value is also returned on the stack. So you see the calling function then basically pulling that data off the stack and putting it into its local variables. The difference is that normally you would see a bunch of pushes and pops; the Go compiler is instead opting for a
faster runtime over a smaller binary size.

On top of this, since Go is designed for multi-threading and multi-core concurrency, it has the concept of a goroutine. A goroutine is similar to a thin thread. To execute one we have this keyword: you put go before the function call, and the call is then put on a scheduler that runs internally in the runtime. The runtime selects between the different goroutines and decides which will be executing when, so it's basically scheduling, but inside the actual application. Under the hood this is done through a call to newproc, which takes a size integer and then a
function pointer. The size tells the scheduler how much argument data this function takes, and it requires that all the arguments for the function being called have been pushed onto the stack before it's executed. So essentially you set up this structure and then kick it off.

OK, so now we know a little bit about what's under the hood. Let's take a look at how we can actually recover function information. Here we have a simple demo program, and essentially all we're going to do is panic, which is the way Go throws exceptions. We're just going to panic with "hello world". We compile that and tell the compiler to strip the binary
and remove all the debug information, and the output from file says this is a stripped binary. Then when we run it, we get the exception, which gives the file path where the exception actually happened, and it tells us that we threw the exception on line four. So even though we've removed all the symbols, Go still has the capability to figure out on which source line this exception was thrown.

So where does this come from? If you take a look, you can actually find the strings for these function names, and they're located in a table that's called gopclntab, which for ELF binaries is its own separate
section. This table is a heritage from Plan 9. This is the output of man a.out on Plan 9, and it talks about having a PC/line-number table in the binary. Further down, the man page says this is used to recover the source line number from a failure. This functionality was added to the Go compiler in version 1.2, and it's there to provide an accurate way of debugging, to figure out where the exception was thrown. And luckily for those of us who want to use this to extract information from
stripped binaries, the debug/gosym package in the standard library has functions to read this table and extract it. This is a snippet of code taken from the tests, which reads in an ELF binary. It looks like it's expecting the symbol table, but luckily the symbol table can be empty and it will still generate the table correctly. The table that's returned has, in most cases, an array of the functions and then this line table. Using the line table we can go from the program counter to the source code line number, or vice versa: you can use the source code line number to find the program counter in the binary.
So this is very interesting and useful when you want to look at malware, so let's take a look at some. I have a little tool here that reads this table. It extracts all the functions and looks specifically at the main package and at packages that are not imports, third-party vendored code, or the standard library, so it focuses on the malware code. And it will try to guess... no, that's not right. I'll have to start over, and hopefully it gets more text. So it's using this information and tries to guess the length of the functions in the source
code, and then also structures a source code tree. Here's an example: this is malware that was reported about a year ago by Cisco Talos. They reported something that was scanning, looking for SSH servers, then brute-forcing them and spreading that way. What we have, essentially, is that it's finding the package, which is main. This is the folder location where the file was located when it was compiled by the author. Here's the source code file, and under there you have all the functions, starting from the first line number, starting actually from line
188 to 181, and then it's doing some guessing: there are a bunch of structs defined above that which we can't see. This one was mainly targeting Linux.

At the end of last year there was a new variant of Zebrocy found; I think it was first seen in October. This is malware used by the Russian APT group APT28, and it's a downloader. They have a habit of writing it in multiple languages to throw off analysts. And from one of the samples you could get all of the functions that we
have here, so they hadn't done any obfuscation. There have been a few pieces of malware that have been obfuscated. For example, one way malware authors get around this is by renaming the functions, so we can't tell from the function name what it's doing. We do still have the package name, though, and all of the different libraries they're using are not obfuscated. So it's a bit of a hurdle, but still workable.

On top of function names, it's also possible to recover type information. This is a simple little demo, and in this sort
of code we're defining our own struct that contains just one string and one integer. Then our code instantiates a new instance of it and assigns a string and an integer to it, and then we just print out what the struct is. Compiling this gives this kind of code; I'm just highlighting the important parts. At the bottom we see the call to Printf, and here we see the format string being loaded onto the stack. Then we can see the two different values being added to the struct, and before that we see a call to newobject, and it's called on
this offset, which is used in the call. Here's a hex dump of this offset. There's a clear structure, and there are a bunch of structures afterwards that look pretty much the same. What it turns out to be is a type that's called _type, and this is the first three lines from that hex dump translated onto the struct. This is what's being used in that newobject call, which passes it on to malloc so it knows how much memory it needs to allocate. But the structure contains other information that is very, very interesting from a reverse engineer's point of view. One
really cool feature is the kind field. This is an enum, and if you look in the source code for the Go compiler, this value resolves to the kind struct, so we now know that what's referred to here is a struct. The other interesting part is this nameOff, which is an offset to a name type, which is an internal structure. So it's its own type, and it's just a structure that holds a pointer to a byte. Using that we can calculate where this name is located. In this sample it's based on where this list starts, and the list
starts at the beginning of a section, so we can just take the section, locate its absolute location, and add the offset, and we get where the name is located. That gives us the byte that's pointed to. This type has some methods to return its string representation, and if you look at that one, it basically jumps three steps ahead, and that resolves the actual string. So if this is where it's pointing, here is where the name starts, and we've basically found that the name of this item is main.myStruct. So we have both the package name and what the
author called the actual structure in the source code. There was more stuff below it, and now that we know it's a struct, we can look up what the real type is there. This is the real type: it's actually a _type that's been embedded in a bigger struct. Right after that we have a name type, which is again another struct, and then some struct fields. The struct field is a little bit of a mirror image of what the main struct is, so it will have a pointer to a name and then to its own type. So we can essentially just unwind this and
reconstruct what type this is. So taking the other section here, we're basically just translating. Here we have the location where the package name is, which in this case will be just main. We have the fields, which start from here: here's where the offset is for the data, and then two and two, so we know we have two fields in the struct. Looking at this offset, it just points down right below it, and there we can translate these values. We know there are two; we can get the name, and we can figure out
where the type is located. If we just dereference those, we get the actual names of the fields in the structure, and so finally we can reconstruct what the initial source code was.

So, to conclude: Go binaries, when you initially look at them as a reverse engineer, are beasts. It's not uncommon to come across a binary with 6,000-plus subroutines, so starting out, if you don't have the right tooling, it's really, really hard. But if you have the right tooling, or you develop the right tooling, it's relatively easy to deal with these. There's also a massive amount of metadata; this
is not the only thing that's there. The interesting part here is that we can recover the function and type information, and since this comes directly from the source code, even if there are slight changes to it we can still see similarities with other samples. You can also make an educated guess at the source code structure. There are a couple of samples I have come across that were compiled for different operating systems, but you still have the same source code structure, so you can map across that
they're the same. It also allows you to do a kind of metadata analysis: you can look at samples that were compiled for ARM or MIPS and compare them to other samples, and just using this source code analysis you can see that they're similar or the same, that they're essentially coming from the same source.

So with that, I'm taking questions, if there are any.
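
The educated guess at the source code structure mentioned in the talk can be sketched roughly as follows. This is a minimal illustration and not the speaker's tool: the funcInfo type, the groupByFile helper, and the sample paths and function names are all made up for the example. It assumes each recovered function has already been resolved to a source file and starting line, for instance through the line table.

```go
package main

import (
	"fmt"
	"sort"
)

// funcInfo is a hypothetical record for one function recovered from a binary.
type funcInfo struct {
	Name string // fully qualified name, e.g. "main.scan"
	File string // source file path recorded at compile time
	Line int    // line on which the function starts
}

// groupByFile buckets functions by source file and orders each bucket by
// starting line, approximating the layout of the original source tree.
func groupByFile(funcs []funcInfo) map[string][]funcInfo {
	byFile := make(map[string][]funcInfo)
	for _, f := range funcs {
		byFile[f.File] = append(byFile[f.File], f)
	}
	for _, fs := range byFile {
		sort.Slice(fs, func(i, j int) bool { return fs[i].Line < fs[j].Line })
	}
	return byFile
}

func main() {
	// Made-up sample data standing in for entries recovered from a binary.
	funcs := []funcInfo{
		{Name: "main.report", File: "/src/bot/net.go", Line: 40},
		{Name: "main.main", File: "/src/bot/main.go", Line: 12},
		{Name: "main.scan", File: "/src/bot/net.go", Line: 9},
	}

	byFile := groupByFile(funcs)

	// Print files in a stable order, then the functions in line order.
	files := make([]string, 0, len(byFile))
	for file := range byFile {
		files = append(files, file)
	}
	sort.Strings(files)
	for _, file := range files {
		fmt.Println(file)
		for _, f := range byFile[file] {
			fmt.Printf("  line %d: %s\n", f.Line, f.Name)
		}
	}
}
```

Because the file paths and line numbers come from the source rather than from the generated machine code, two samples built for different architectures from the same source should produce the same grouping, which is what makes the cross-sample comparison workable.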
What kind of tools do you use to reverse engineer Go samples?

So the tool that I use is my own tool, written in Go itself, and it has the capability of being used from other tools. So far I haven't written an IDA plugin for it, but it works with r2. If you use r2, then instead of printing out the structure it will reconstruct the symbols and inject them, as if it were a non-stripped binary.

Yeah, I think you partially answered my question. I was going to ask about IDA Pro support.

And it's
in the plans.
Hello. Does the runtime depend on that metadata? Couldn't an attacker strip it out if they wanted to?

So the type data, I'm pretty sure, is used by the reflection capability, so if you did strip it out you may end up crashing in random places. And I think it's the same with the information that's used to recover the function information, since it is used for the panic step. I remember reading a while back that people started out using just a normal strip tool to strip Go binaries, and it resulted in random crashes because it was selectively removing stuff it thought wasn't needed. And to be honest, I don't
know what happens if you remove that table. Now, since functions are first-class citizens in Go, there's also another representation of the functions in that type list. So if you walk the type list and get all of these different types, you can also find all the functions, and you should be able to recover them that way.

So, a quick question. Hey, I'm right here. Have you ever looked at a binary that has used the project gobfuscate, and do you have any insight into how that works, or does it provide obfuscation against some of the things that you showed
today?

So I haven't looked at that project, but I've come across a bunch of malware samples that are obfuscated. But they are only obfuscating their own code, so all the dependencies being pulled in are not obfuscated, and the standard library is not obfuscated. So it's just a bit of a hurdle: now you just have a random string, but you can see all the calls that it's making. So unless it obfuscated all of the functions you have, which may get somewhat problematic, you know.

OK, I think we're out of time. If you have any more questions, you can just talk to Joakim
after the session. On behalf of BSides, we thank you for your presentation. Thank you. [Applause]