
Bow Valley College aims to uphold the intention of the numbered friendship treaties from the perspective of Indigenous peoples. We also remain collectively accountable to respect Indigenous peoples' legal and inherited rights, recognizing we are all treaty peoples. Through our words and our actions, Bow Valley College honors the traditional lands of the Blackfoot Confederacy, which includes the Siksika, the Kainai, the Piikani, and the Amskapi Piikani First Nations, as well as the Îyârhe Nakoda Wesley, Chiniki, and Bearspaw First Nations, and the Tsuut'ina First Nation. We also recognize the connection and autonomy of the Métis Nation Region 3 within the historical Northwest Métis homeland. We join all nations in celebrating the unique histories, traditions, and cultures of Indigenous peoples
as we continue our journey on the road towards reconciliation together.

Good morning. My name is Vaughn Ravenscroft, and I'm the VP Strategy and CIO at Bow Valley College. I'm pleased to welcome you to the 2021 BSides Calgary conference, held virtually once again for 2022. The conference is a place where students, academics, and industry can come together to learn and share information on cyber security and related subjects. This blending of academia and industry is especially important in a field as rapidly evolving as cyber security is. Now, I'm sure that many of us are missing the ability to gather in person this year and look forward to doing so once again in the future, but one of the
benefits of a virtual conference is that it allows us to extend the reach, and I'm pleased to say that we've got over 65 students from across the globe in attendance.

So why is BVC hosting the conference? We're committed to providing a venue for sharing and learning and are growing our technology offering, so this is a great fit for us. We continue to expand our technology-related courses with the recently announced Alberta Center for Entertainment Arts, which will be opening in January 2022 and is offering courses in visual effects and animation. Other courses from the college include our cyber security post-diploma certificate, our IT systems diploma, the software development diploma and post-diploma certificate, the cloud computing
post-diploma certificate, and the data management and analytics post-diploma certificate. Our collaboration with BSides extends further in that we are hosting the Hacking 101 lab as a joint effort between BSides Calgary and the college.

I'd also like to take this opportunity to thank our sponsors for their ongoing support and involvement in this event. Please do make sure that you check out their technical deep-dive sessions and meet them at their virtual booths. The platinum sponsor is BVC; our gold sponsors are iON United and Secured Net Solutions; the silver sponsors are Proofpoint, We Hack Purple, SentinelOne, GlassHouse Systems, FireEye Mandiant, CyberArk, Trend Micro, and Red Canary; and the bronze sponsors are ESET and CrowdStrike. I'd also like to acknowledge
that we've been joined by a number of community and special interest groups today and tomorrow: the University of Calgary Information Security Club, Chic Geek, the Security Professionals Information Exchange, Alberta Women in Cyber Security, the Western Canada affiliate, Cybera, the (ISC)² Alberta Chapter, and the Rogers Cybersecurity Catalyst out of Ryerson University. Please do enjoy the conference, and hopefully we can do this once again in person in 2022. Thank you.

All right, I guess I will begin. Thank you, everyone. I'm happy to give this opening keynote for the BSides event. This is a talk about mobile phones, basically: these things that are great for us and give us access to lots and
lots of internet but come at a bit of a cost, in that from the point of view of a variety of ads and analytics companies that are invested in people's phones, the phone serves more as a vehicle to gather information about users and build behavioral dossiers about them. The reason for building these behavioral dossiers is that it works really well for targeted behavioral advertising. Basically, when you're using a device, if they're able to uniquely identify you, to have some way of recognizing who you are and recognizing when you're using your phone in a variety of different contexts, then when you use a particular app they can learn a bit of information about you: at that
moment in time you happen to be using this particular app, you might be at this particular location, and so on. Then later, when you're using another app, they are able to expand their dossier of behavioral information, because now they know the same person is doing this other thing at this other time. Through the collection of all of these reports they get a clearer picture of who you are, all linked to your identity. So this is the ecosystem that is running inside of these mobile apps and collecting all this information from users.

Now, we had done a study in 2018 where we found that the majority of children's apps that were part of a
self-certified Designed for Families program were in fact in probable violation of the law, due to the degree of behavioral tracking and in some cases collecting location outright, which is a clear violation of a legal statute in the United States called COPPA, the Children's Online Privacy Protection Act. It's of course particularly concerning when it's building these dossiers about children, who are unable to give a meaningful form of consent as they're interacting with their devices. This managed to get some attention, and we also got some letters from lawyers of some of the companies that were involved. It made us realize that we really should be absolutely certain
about our findings and results, because it's one thing in science when you make a claim like "50% of apps do X," but it's another thing when you take the list of all the apps and give it to the FTC and say, "Here, FTC, here are all the apps doing this bad thing." Now we want to be extra certain that everything we're saying is correct. So we came up with this idea to use the permission system to look for false positives in our data set, a false positive being that we recognize an app as sending location, but maybe it wasn't actually sending location and it's a bug in our code that was searching for
transmissions of location. The idea was to use the permission system, which is basically a security mechanism on the phone that says you can't access location unless you have the permission required to access location, and you can't access the IMEI, which is an unchangeable serial number, unless you have permission to access the IMEI. We could use this to build a system to look for false positives in our data set. The idea would look something like this: we take all of our apps and we look at what permissions they were allowed to access, that is, what data they could actually access on the device when they're running, and
this can be done with static analysis, because apps declare their permissions ahead of time. You can look them up; you can see whether or not an app is allowed to, for instance, access location. Meanwhile, we do our experiment where we're actually running apps on real phones and seeing all of the network transmissions that they're sending, and from this get an idea of what data they actually did access by virtue of it being sent out. So if we see that an app was accessing location because it was sending location, this might be a false positive, so we remove the transmissions that could not have occurred in the first place. The idea being that if we
see, for instance, a phone number being sent out, but the app didn't have permission to access the phone number, we have a false positive. Now, this method makes an assumption, and the assumption is that the security system on Android is sound: that is, it doesn't make a mistake; where there's a security mechanism, that mechanism is correctly enforced and there's no way to circumvent it. But of course in practice security can be hard, and so instead of false positives, what we ended up finding were apps that were actually cheating: apps that were accessing information and transmitting it even though they had no ability to access that information. So with a little bit of effort we did some
reverse engineering, and we figured out a variety of side channels and covert channels that were actively being exploited on Android by legitimate apps in the Play Store. For context, a covert channel is when two apps, an Alice app and a Bob app, are cooperating, where one, for instance, has access to some sensors, or can do some actions, or has access to data, and is allowed access through a security mechanism, whereas another app, the Bob app, is denied access. In this case the Alice app is sharing access with Bob, so Alice could just tell Bob the location and Bob doesn't have to go through the security mechanism. And a side channel is when an app simply
bypasses the security mechanism. Ideally the security mechanism is protecting the sensitive resources, but Eve has figured out a way to simply go around it, so that it is no longer in the way of the access.

I do want to point out, though, that we did find some false positives after all, so the original idea of this method did actually work. One false positive that stood out happened to be a transmission that we saw of the phone number of the mobile device. We had done some experiments in Berkeley, where the area code is 510 and the country code is 1, and so what we saw was our entire phone
number being transmitted. But if you look at this number a different way, this component of it is actually a timestamp: a particular time given in one of the greatest ways of measuring time ever devised, the number of seconds since January 1st, 1970. This timestamp with three more digits happens to be the number of milliseconds since the Unix epoch, and it just so happened, by coincidence, that it contained as a substring the phone number of the device. There is a one-tenth-of-a-second time window in which this entire phone number could have appeared, out of something around 60 or so years of Unix time that we'd have available. I worked
it out to about a one in 305 billion chance that a random arbitrary timestamp happens to exactly match our phone number, which seems astonishing, except when you realize that we are probably generating millions or tens of millions of different timestamps by running all of these apps over the course of years. So it becomes much more likely that this coincidence did occur. Nevertheless, it was quite astonishing to see that the exact phone number of the device happened to be grabbed in a tenth-of-a-second time window.

On the other hand, we found a number of side and covert channels being exploited. Some we've already talked about before, and then in this
talk I'm going to give four new ones that have just recently been found, because this technique just keeps giving us new interesting results.

One of the older findings involved the MAC address of a router. When your phone is connecting to the internet, it goes first to the Wi-Fi router. This router has a MAC address; this is just a serial number that's burned into the device, and you can't change it. As a result, the MAC addresses of all of these routers happen to be geolocated; there's a variety of different projects and data collectors that are doing this. The idea is that if you know what Wi-Fi routers you're near, based on them
having a unique MAC address, and you know how far you are from them through the signal strength indicators, you can actually get a pretty good idea of your geolocation, as long as you know where those routers happen to be located. Routers don't typically move around too much, so it happens to be an effective way of finding your location. In particular, it works quite well indoors, where GPS is more limited, because there you have more Wi-Fi routers and you can triangulate your location quite effectively. So the FTC views collection of routers' MAC addresses as a form of location. There was a settlement involving a company, InMobi, where they were collecting router information when they weren't
allowed to access location, to get location despite that. Android now views access to this information as a form of location and therefore requires a location permission: in order to access the SSID, which is the friendly name of the router, or the MAC address of the router, you need to hold a location permission. But we found an app that was accessing this information without a location permission, so we wanted to see how it was doing it. So here is the decompiled Java bytecode. This is in a language called Smali, and this is just what an Android app looks like when you don't have the source code: you decompile it
and it's pretty easy to read once you figure out the mechanics of it. It's basically an assembly-like language: there are registers, the registers can store objects, and there are function calls as well. So it's like assembly with function calls and objects. Here in the first few lines we see that in register v0 we're storing a string, ACCESS_WIFI_STATE. This is one of the permissions you need in order to get the Wi-Fi router. Then it's calling "is permission granted" on this string, and if the permission is not granted the result will be zero. You can see the next line, move-result,
so it moves the answer of this function, the return value, to register v0, and then if register v0 is equal to zero it jumps to condition zero. So condition zero here is like the error condition: the permission was not granted. In the next step it gets the WifiManager, and if the WifiManager is null it goes to condition zero. If the WifiManager is not null, it calls getConnectionInfo, which returns a WifiInfo object, and if that is null it goes to condition zero. Now at this point, if we passed here, it means that there is a valid, non-null connection info. So what happens next? It does the same process again: it gets the connection info,
it puts into v0 this WifiInfo object, then calls getMacAddress, which returns a string, and you can see that if the string is null it goes to condition zero; otherwise it returns whatever is stored in register v0, which is the MAC address of the Wi-Fi router. So this is how you would normally do this process on Android; this is the standard process if you want to access the MAC address. So what happens at condition zero? What is the error condition? Well, a function gets called which is named "get device MAC addresses from ARP", which returns a list of MAC addresses from the ARP table.
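
For context on the ARP fallback: on Linux, the kernel exposes its ARP cache as a plain-text table in the pseudo-file /proc/net/arp, with one row per neighbor. The sketch below is not the app's code. It parses a made-up sample in that layout, and the function name and sample values are assumptions for illustration only.

```python
# Sample text in the column layout Linux uses for /proc/net/arp.
# All addresses here are invented for the example.
SAMPLE_ARP_TABLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.1.1      0x1         0x2         aa:bb:cc:dd:ee:ff     *        wlan0
192.168.1.42     0x1         0x0         00:00:00:00:00:00     *        wlan0
"""


def mac_addresses_from_arp(table_text):
    """Return (ip, mac, device) tuples for resolved entries in an ARP dump."""
    entries = []
    for line in table_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or malformed rows
        ip, _hw_type, _flags, mac, _mask, device = fields[:6]
        if mac == "00:00:00:00:00:00":
            continue  # unresolved entry: no hardware address known yet
        entries.append((ip, mac.lower(), device))
    return entries


if __name__ == "__main__":
    for ip, mac, device in mac_addresses_from_arp(SAMPLE_ARP_TABLE):
        print(ip, mac, device)
```

On a home network the default gateway is usually one of the rows, which is why reading this table can reveal the router's MAC address without going through the Wi-Fi API.
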
So I want to point out here as well that when you're looking at decompiled Android apps, generally you don't get the names of the functions. Generally they're all replaced with just some single letters, in a process called obfuscation, so that you don't actually see what the developer called things. In this case it was nice that they actually gave us the name of the function. We went into the function "get MAC addresses from ARP" and we see that it's opening, on the file system, /proc/net/arp, which is just a pseudo-file that stores the ARP cache. If you open up this file on your device, you might see the exact name of
your Wi-Fi router along with the IP address. Here you have to remember that these MAC addresses of routers were never meant to be surrogates for location; they sort of evolved into that role. So files like /proc/net/arp don't have the same permission control that Android would impose on accessing router information, namely having access to location and having ACCESS_WIFI_STATE, because it wasn't thought that we really had to protect this information. So in this case this file was not correctly protected with the right permissions, and some developer discovered that and, instead of reporting it to Google, just exploited it.

Another example was on the SD card, which is
acting as a form of shared storage among apps, and it also stores your photo library. So you might go to the mountains and take a picture, and then later on you take a picture from a different angle, and then you're driving home and you have your GPS turned on. What ends up happening is that the photos you take become automatically geotagged with the location, the GPS coordinates where they were taken, and this is then just stored on the SD card in the form of EXIF metadata, that is, metadata associated with the photos. What we found was an app called Shutterfly which was simply wholesale uploading all of your photos'
EXIF metadata to their servers. It was going through all of the photos on your SD card, getting all of the metadata, and uploading it to them, because they're classy like that. It further had code to process location from this. So this app, which didn't have permission to access location, did have functionality to get the latitude and longitude from the photo metadata and put them into its own JSON objects to be uploaded to their servers. To me this matters because it signals intent from the developer, more than just simply: oh, we just uploaded all of this binary data, we didn't really realize what was
in it. Whereas here, no, they're intentionally getting these particular values from the photos.

Our next example comes from Unity. Unity is a game engine and advertising company, and one of the things that we saw them transmit to their servers is the hash of our phone's MAC address. We talked about router MAC addresses; router MAC addresses are a form of location information. But all networking devices have a MAC address, including the phone, which uses it when it's communicating over the internet. This MAC address is tied to the device effectively forever, and it forms a great way to identify a user that you can never opt out of. You
can never escape it, you can never change it, you can never do anything about it, because your MAC address is just tied to your device. So that number is going to uniquely identify your phone forever. We found that what Unity was doing was taking the MD5 hash of the MAC address and sending it to themselves as what they called a UUID, a universally unique identifier. So they were taking the MAC address, transforming it by hashing it, and sending it off. Now, the interesting thing here is that there's no permission on Android that allows you to access the MAC address. As of
version 6, which is seven years ago or so, they recognized that this number is too invasive: it tracks people without the ability for any kind of meaningful opt-out. So they forbade normal apps from learning the MAC address of the phone. Any access of this information therefore necessarily means that a side channel is being exploited, because it shouldn't be accessible in the first place. As well, according to Unity's privacy policy, they say that they generate a unique device identifier from the device's MAC, which they alter to limit the ability to identify the relevant device in the future. I just want to take issue with this particular claim,
because if they ever learn your MAC address, they can hash it again to get the same UUID. So it's really not altered such that you cannot be recognized: if you come back and they hash your MAC address, they'll get the same value, so they can easily recognize you. The other thing is, if you're familiar with hashing and the idea of pre-image resistance, MAC addresses aren't big enough; the domain isn't big enough. They're only 48 bits long, meaning that you could easily build a rainbow table to brute-force MAC addresses and effectively reverse this hash function. So it's neither effective nor correct to say that you can't identify the relevant device in the future.
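
Both objections can be illustrated with a small sketch. It assumes the identifier is the MD5 of the lowercase, colon-separated MAC string; the exact input encoding used in the real library is not stated in the talk, and the function names and the sample address are made up. To keep the demonstration instant, the search covers only the last two bytes (65,536 candidates) rather than the full 48-bit space of roughly 2.8 × 10^14 addresses, which is large for a laptop loop but small by the standards of hash cracking.

```python
import hashlib


def mac_to_uuid(mac):
    """Return the MD5 hex digest of a MAC address string.

    The input encoding (lowercase, colon-separated ASCII) is an
    assumption made for this illustration.
    """
    return hashlib.md5(mac.lower().encode("ascii")).hexdigest()


def recover_mac(target_hash, known_prefix):
    """Brute-force the last two bytes of a MAC whose first four are known."""
    for suffix in range(0x10000):  # 65,536 candidates
        candidate = "%s:%02x:%02x" % (known_prefix, suffix >> 8, suffix & 0xFF)
        if mac_to_uuid(candidate) == target_hash:
            return candidate
    return None


if __name__ == "__main__":
    mac = "02:00:00:aa:12:34"  # made-up address
    first = mac_to_uuid(mac)
    second = mac_to_uuid(mac)
    print("same identifier on every run:", first == second)
    print("recovered from the hash alone:", recover_mac(first, "02:00:00:aa"))
```

The hash is deterministic, so the same device always maps to the same identifier, and because the input space is small the mapping can be inverted by enumeration.
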
All right, so we wanted to understand how Unity was doing this. Now, the problem was that it was occurring in a 17 MB library called libunity.so, this giant compiled C++ binary object, and where we found "uuid=" somewhere in this library, it was just in a giant string section at the start. So we really didn't have much to work with; it would be quite an endeavor to figure out exactly how it was doing it. But this is when I realized that, because they're MD5 hashing it, we could look for the hash. We looked for "MD5" and couldn't find it, but any implementation of MD5 has some fingerprint characteristics that are going to be shared with any MD5
implementation: namely, there are round constants, there are initialization vectors; there's going to be some fingerprint that smells like an MD5 implementation even if they don't call it that. So one of the constants is 67452301; we searched for a similar one, 98BADCFE, and found it, and therefore at this point we're looking at an MD5 implementation. So whatever gets passed into this function is going to be hashed with MD5. We looked at what calls this, and we found that just a few steps earlier they were calling an ioctl. An ioctl is a system call, and
basically it creates a sort of hidden, numbered API for things like sockets and files. If you think of programming with sockets, you have connect, bind, send, receive, and so on, but there's lots more you can do with sockets: you can change buffer sizes, you can change congestion control algorithms, you can tweak them in a number of different ways. Instead of giving all of these options their own name, there's just one function called ioctl, and you pass in a number, and that number corresponds to what you actually want to do with the socket. One of the ioctls is a socket ioctl, "get interface configuration", which simply returns the MAC address. So
there's an ioctl in this API of numbered calls that allows you to get the MAC address, and Unity was using this ioctl to get the MAC address of the device, even though if they had done it through the Android API they wouldn't have been able to access that information.

All right, so now for the new findings. Our first one in this set is from CNN. CNN is a news-based entertainment company in the States, and one of their apps is about breaking news. When we looked at the traffic that CNN was generating, what we found was a large "data=blah" payload, and when you look at this it's like, okay,
well, this looks like Base64, so the first instinct is to Base64 decode it. When you do this enough, you also begin to realize that at the beginning, this "eyJ" signals the beginning of a JSON object. So whenever you see "eyJ", you're probably looking at the Base64 encoding of a JSON object, because "eyJ" corresponds to open brace, quote. So we Base64 decoded this data and ended up with the transmission that was being sent, and what stood out to us was the Wi-Fi SSID being transmitted. So it was gaining access to the SSID. This is the friendly name you give your router: not the MAC
address but the name you might give it. If you don't change it, like most people, then you end up with something like "BELL_" and a number or, in this case, a similar kind of construct. The interesting thing here is that the latitude and longitude were set to unknown, meaning that they did not have access to the location, and similarly you can see below that the Wi-Fi BSSID is also set to unknown. If this app had location, it would presumably have set the latitude and longitude here. In particular, we knew it didn't have location, which is why the transmission of an SSID stood out to us as something that shouldn't have been
occurring in the first place. So how are they actually doing this? We look for the Wi-Fi SSID string: we just decompile the app and then use grep. Grep is your friend. You search through all the code for the Wi-Fi SSID string, we find it, and it's being put into a variable also named for the Wi-Fi SSID. Okay, so where is that variable being used? We found it being used in this particular function. What ends up happening is that the string is stored in register v4, and then later on, at the end, we see register v4 being provided as input to a function, JSONObject.put. There are three arguments here; I have
them in red: v0, v4, v1. The v0 is the "this" pointer, so it's the actual object that you're calling the method on, and then v4 and v1 are the arguments. If you look at the method signature after the "put", you see that there's a String followed by an Object, and the return value is a JSONObject. So this is basically storing a key-value pair in a JSON object. The key is v4, which is the Wi-Fi SSID string, so that's the one we're looking at, and whatever's in v1 must be the actual value of the Wi-Fi SSID that we're interested in. So now we need to figure out what's setting v1.
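
As an aside on the "eyJ" observation from the payload decoding step, here is a minimal sketch showing why that prefix appears. The field names and values are made up and are not the app's actual payload. The prefix arises when the JSON text starts with an open brace and a quote followed by a letter, which is the usual case for an object with named keys.

```python
import base64
import json

# Made-up payload; only the shape (a JSON object with string keys) matters.
payload = {"wifi_ssid": "unknown", "lat": "unknown"}
encoded = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")


def looks_like_base64_json(text):
    """Heuristic: Base64 of text starting with '{"' plus a letter begins 'eyJ'."""
    return text.startswith("eyJ")


def decode_base64_json(text):
    """Decode a Base64 string (padding optional) and parse it as JSON."""
    padded = text + "=" * (-len(text) % 4)  # restore stripped '=' padding
    return json.loads(base64.b64decode(padded))


if __name__ == "__main__":
    print(encoded[:12], looks_like_base64_json(encoded))
    print(decode_base64_json(encoded))
```

The check is only a hint for a human analyst; a payload whose first key starts with a digit, or that has whitespace after the brace, encodes to a different prefix.
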
So at this place in the code v1 is already set, it seems. It's being passed into a function, isEmpty. If it is empty, then we see the const-string "unknown" is assigned to v1. The line above (move object v1 v2 v6) is basically saying that if the return value of isEmpty is equal to zero, then jump over the next line. So what's setting v1? Higher up in this function we see that it's being set to the return value of "get SSID". Okay, well, that's quite a helpful function name, so we should have just looked for that in the first place. So we go to "get SSID", and this is just a pure accessor, so it's
returning a member variable, SSID. Okay, so what sets SSID? We find it's being set in a function, "set SSID". Okay, great, so who's calling "set SSID"? We search through all of the code again (grep is your friend) and find that somewhere in a file called Android Pie network manager, the function "set SSID" is being called. We open that, and what's being passed in is register v0. What does that get set to? It is the return value of the line above, "get SSID using location permission check". Again, it's nice that the developers actually gave us the names of their functions, but in this case I disagree with their language of using
"location permission check", because actually they get it despite not having a location permission. When we went to the implementation of "get SSID using location permission check", the very first line says there's a member variable called "hacked SSID", and it stores a string. What ends up happening is that if location permission is not granted, it just returns the hacked SSID, as opposed to going through the legitimate way and getting the network SSID the appropriate way. So now we have our answer: there's a variable called "hacked SSID", somehow it's getting set, and if the app can't get permission it uses that instead. So how is "hacked SSID" getting set? Again, grep is your friend: we find that it's being set
in this particular function. This function is called by another one, called onCapabilitiesChanged. So who is calling onCapabilitiesChanged? Well, it turns out that the operating system is. This is a callback function that you can register for, expressing your intent to receive it, and whenever your network information changes, this triggers the callback. It just so happened that one of the additional fields (I believe it was even called "my extra data") happened to store the SSID, and the developers realized this and were storing it in the variable "hacked SSID" when this callback triggered, just in case you weren't allowed to access that information.
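
The control flow traced above can be summarized in a toy model. This is plain Python, not Android code and not the app's implementation; the class, the dictionary key, and the callable standing in for the permission-gated lookup are all hypothetical names chosen to mirror the description.

```python
class ToyNetworkManager:
    """Toy model of the fallback logic described in the talk."""

    def __init__(self, location_permission_granted):
        self.location_permission_granted = location_permission_granted
        self.hacked_ssid = None  # copy of the SSID cached from the callback

    def on_capabilities_changed(self, callback_fields):
        # The OS invokes this on network changes. One extra field happened
        # to carry the SSID, so the app stashes it for later use.
        self.hacked_ssid = callback_fields.get("ssid", self.hacked_ssid)

    def get_ssid_using_location_permission_check(self, gated_lookup):
        if not self.location_permission_granted:
            # Side channel: return the cached value instead of failing.
            return self.hacked_ssid
        # Legitimate path: ask the permission-gated API.
        return gated_lookup()


if __name__ == "__main__":
    manager = ToyNetworkManager(location_permission_granted=False)
    manager.on_capabilities_changed({"ssid": "ExampleNet"})
    print(manager.get_ssid_using_location_permission_check(lambda: None))
```

The point of the model is that the permission check only selects which source is used; the SSID comes back either way once the callback has fired.
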
So instead of realizing that there's a bug and informing the Android project that there's a bug in their code, they instead called their variable "hacked SSID", and I was the one who reported it, and I was the one assigned the CVE and the resulting bug bounty.

All right, our next example comes to us from devtodev. Now, if you don't know, devtodev is a comprehensive solution that analyzes your apps and games and gives you valuable insights, and one of the valuable insights that they give is whether or not you're a cheater, apparently. I was quite pleased that they indicated to themselves that I was not a cheater. I wish I could have said the same thing
about them, because when we ran their app we noticed that they were sending, again, the MAC address of the device, the phone's MAC address: this persistent identifier, this thing that can track you and that can never be changed or reset. As well, I would also like to point out that the line above, which says "is rooted", is set to zero. This indicates whether or not the phone is rooted, meaning it's running a custom operating system. I just want to point out that the fact that it's zero represents a lot of work on our part, because we intentionally evade all root detection mechanisms that are in place. The phone is rooted; they just
don't realize it. So actually "cheater" should have been true for me as well. Now the question is, how are they getting the MAC address? We search for "mac wifi", which is the key associated with that value, and we see what we basically don't want to see, which is that a binary file that's 20 MB big happens to be the one storing the string. Oh, this is going to be a lot of work, right? We open it up, and again it's sitting in some string part of the binary file, alongside every other string, without any sort of context. So we get the feeling that, well, this is going to be a lot of work.
But there is a technique that we have, which is to realize that we've seen this before with the ioctl. If it's happening in C, it's got to be happening somewhere in libc or something like that, or it has to be accessing it through the file system. There are only so many libc and system calls that it could be happening in, so we can just go through and instrument them. So we go to the Linux implementation. When you call an ioctl, there's actually a function called ioctl that gets triggered in Linux. It has an implementation; you can modify it, recompile it, and redeploy it. So we did that.
We just added a printf (in the kernel it's called printk) so that we could see what ioctls were being called, to see whether it was an ioctl that was triggering it. It wasn't an ioctl. So what about getsockopt, another system call of a similar nature? Maybe there's a socket option that allows you to get access to this. So we instrumented that; it wasn't that. But through this process we eventually discovered that there is a function called getifaddrs which you can call, and it returns to you all the information about the interface addresses on your device,
including, in one of the fields, the MAC address of the device. So at this point we did a proof of concept and realized that yes, indeed, this works: you can get the MAC address this way. Now, we didn't know whether or not this was the way that they were exploiting in order to get this information. So to test that hypothesis, we modified the return value so it would be conspicuous to us. We made it palindromic, so the last byte and the first byte would match, and then we ran the app again and saw that this palindrome had occurred. So
the only way that this MAC address would look like this is if our code had run, which intentionally changed it to this other shape. So not only is it the case that you could get the MAC address this way, but also they were in fact getting it this way.

Now, another interesting thing about devtodev is their use of the advertising ID. We talked about how the Wi-Fi MAC address is this persistent thing you can't change and can't reset, and the phone's IMEI is another example of such a thing. But typically what ends up being transmitted among all of these apps that are running on your phone is something called the
advertising id so if you have an android device you have an advertising id if you have an apple device you have a id for advertisers idfa and in this case if you go through 18 layers of settings you can find this thing that allows you to opt out of ads personalization and you can see here my advertising id is begins with ced now if you click opt out of ads personalization what ends up happening is this thing turns blue and well not too much else basically every app gets signal that you the user has intentionally set is add tracking enabled to true there's limit ad tracking enable to true so the user's saying please opt me out
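To make the opt-out mechanics concrete, here is a minimal sketch of what an SDK inside an app sees. The class and field names (`AdInfo`, `limit_ad_tracking`), the payload shape, and the placeholder ID are assumptions for illustration, not the Android API. The point is only that the identifier and the opt-out flag arrive together, and honoring the flag is left to the code that receives them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdInfo:
    """Hypothetical view of what the platform hands to any app."""
    advertising_id: str
    limit_ad_tracking: bool  # True once the user has opted out


def compliant_payload(info: AdInfo) -> dict:
    """A well-behaved SDK drops the ID itself when the user opted out."""
    if info.limit_ad_tracking:
        return {"limit_ad_tracking": True}
    return {"advertising_id": info.advertising_id, "limit_ad_tracking": False}


def noncompliant_payload(info: AdInfo) -> dict:
    """Nothing technical stops an SDK from sending the ID anyway."""
    return {
        "advertising_id": info.advertising_id,
        "limit_ad_tracking": info.limit_ad_tracking,
    }


# Placeholder value; only the "ced" prefix comes from the talk.
opted_out = AdInfo("ced00000-0000-0000-0000-000000000000", True)
```

With `opted_out`, the compliant function omits the ID while the non-compliant one transmits both the ID and the flag, so the flag works only as a request.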
But you see at the very top of this, the advertising ID is still being sent. There's nothing technical that prevents apps from accessing the advertising ID; only the "don't be evil" bit is set to true. And so, you know, be on the honor system: don't do any behavioral advertising with this information, but here's the information anyways. And this is in addition to sending the Wi-Fi MAC address anyway, which you can't even reset. But the advertising ID you can reset. So the other main mechanism that you have to opt out of all of this tracking is to go through, again, these 18 layers of settings and find where you can reset your advertising ID.
Click it, and it resets with a new random number. Now this one, you see, begins with 380. So you basically create a new random number that's unlinked to your old random number. Now every data collector who's receiving information about you is going to link it to this particular number instead, so you sort of create a new identity out of your old identity. Now the thing is, devtodev doesn't seem to mind if you reset your advertising ID, because in their network transmissions they just send the old one and the new one together. So if you reset it, they're like, "Oh, that's cute, the user wants some privacy." Okay, so the guy who used to have
the advertising ID beginning with ced, which they even call "prev", as in previous advertising ID, now has the UID, user ID, beginning with 380. So that's how they sort of get around that. And this happens even if you reboot the phone, meaning that they're saving your old advertising ID in their own local storage so that they can make sure that you don't get privacy if you try to trigger this little privacy provision that Android affords.
All right, our next example comes from Dub Music Player ("Free Audio Player, Equalizer", headphone emoji). Now Dub Music Player, when we looked at its network transmissions, we found something interesting: something going to mobile.measurelib.com, which was a 28 KB gzipped transmission; it was enormous,
and it detailed every app you have installed on your phone, the directory it's installed in, all the permissions that it requires, things like your location, and the scan of nearby routers. It includes an ARP survey, which has the MAC address of your router. It included a lot of information. It also asked itself a rhetorical question, "why report", which I found particularly interesting, unsatisfactorily answered with "app launched" and, a little while later, "because report interval expired". But more interesting for our case was that we saw that this app was sending an SSDP discovery message to our local network, meaning that it was basically doing a broadcast on the home network, and it triggered the router to answer back with its plug-and-play
configuration, which happened to include its MAC address. It doesn't look like this is how they were getting access to the MAC address, and it's probable that they were just reading the ARP cache like we had seen before. But this still stood out to us, because we saw the router MAC address being sent to this app, which didn't have location permission, as an inbound transmission as well as an outbound transmission. And when we looked at this inbound transmission, we saw that basically this app was doing a discovery of all the devices on your home network.
This brings up an interesting point, which is that these apps that are running on your phone, all of these apps and all of the code that's included in them that's serving the purposes of ads and analytics companies, and, for anyone you give Wi-Fi access to, all of the apps running on their phones, they're running on the trusted side of the network, right? The firewalls are designed to keep out traffic like this, yet here we are inviting it in. And so these apps are free to do scans of your home network for whatever purpose.
So this app did not have location permission, so it shouldn't have had the router MAC address, yet it did. So we looked at their privacy policy, because we wanted to see who was doing this, and they said, "We don't share your personal information with anyone," which is not true. They certainly share your location with various companies; it's interesting that they didn't think to mention that. This isn't the only part of their privacy policy; they go on to say, well, except in situations like a legal warrant, a corporate merger, stuff like that, but nothing along the lines of "we collect your location and sell it".
And so we searched for all of the clues that we could find in this app that might indicate who's responsible for this behavior. We grepped for "measurelib" and found nothing, and we grepped for "wifi mac" and found nothing, and we grepped for "why report" and found nothing. Usually grep's your friend, but in this case it's quite a mystery that none of these strings happen to be present in this app. So our next step was to find all of the other apps that we had seen contacting the same domain, and we came up with a little list of apps that were all talking to this mobile.measurelib.com. And what did they all have in common? Because these apps will have a whole bunch of third-party libraries, we want to isolate the particular one doing the behavior that we're seeing. So if all of these apps have a library in common, and they did, it's this Coulus library, the coelib.c library: they all had this third party in common.
We figured, okay, well, this behavior that we're seeing must be occurring inside this library. So we searched for it: literally five results. Not a good sign when the first two have "malware analysis" in the title. So it was not really obvious who the entity behind this is, but we ended up, through a lot of effort, finding some message board, kind of like a Stack Overflow, where someone said, "I'm trying to follow the instructions here to integrate the Coulus library and I can't get something to work." And this is then an unlisted link on a website that basically explains how to integrate their Coulus library into your app. So we finally have figured out who is behind
it. It turns out it is the company Measurement Systems, "the internet measurement authority", a Panamanian-based company, and this company is the one that's providing this Coulus library. So we looked at the strings inside the Coulus library, and then we realized why we weren't seeing "why report" or the other strings that we were interested in: they're all base64 encoded. And when we base64 decoded them, we didn't end up with anything meaningful either. We ended up with high-entropy randomness, which suggests that they're further encrypted.
So we looked at the code to see how these strings are being used. See here, at the beginning there's a const string beginning with 963, and then the next line calls the function d.g. So this is more like what I'm used to seeing, where you don't get the variable names and the function names; they're just replaced with letters, so you have to figure out their meaning on your own. So there's this d.g function: it takes a string and returns a string. The string it takes in is this base64 string; the string it returns is probably the useful string that the app actually wants to have. So we looked at the implementation there, and I won't go through all of it because it's a bit messy, but I've basically reconstructed it here in Java pseudocode.
So the d.g function takes a string. The first thing it does is base64 decode it, and then it creates some variables. I've given them friendly names like password and salt, based on how I have interpreted them. Password is set to the return value of o concatenated with the return value of l; o returns "mea" and l returns "sure", so it creates the word "measure". And then salt is similarly constructed out of five pieces to create "measure move measure". Then there is a variable called rounds, set to 10, with a length of 128. And what ends up happening is that a password-based encryption function is being used, where the password is this "measure", the salt is "measure move measure", and it's doing 10 rounds of password-based encryption to get a 128-bit key.
Then an IV is constructed by taking k.b and base64 decoding it, and concatenating it with k.q base64 decoded, resulting in an IV of some fixed value. Lots of bad practices in crypto here, so don't use this as a reference for how to implement crypto correctly. But what ends up happening is they're getting a key and a fixed IV, and they're decrypting the string. So we now have the encryption key that is used to decode all of these strings. So all of the string constants that we would have looked for when we're trying to figure out how this app is behaving are encrypted, making our job a little bit harder until we actually figured out how this was done.
Okay, so now we have been able to decrypt all of these strings. And I just want to point out here that it's quite fortunate that they're only doing 10 rounds of password-based encryption, because this is executing every single time every single string is used, without any kind of caching at all. So they don't generate the key once and use that; rather, every single time a string is used, which is an enormous number of times in this 30-kilobyte gzipped transmission that they're sending frequently, this password-based key derivation is done over again just to figure out what the string should be, like "why report".
Now this company, according to their marketing material, covers 94% of the connected population with more than 20 billion measurements per day. I don't know if this is true; all I know is what their corporate marketing material says, and the fact that the domain where their data is actually collected is ranked about 71,000 among websites according to DNS resolutions, so somewhere between discoversouthcarolina.com and armywarcollege.edu. I find it hard to believe that discoversouthcarolina.com is getting 20 billion hits per day, but maybe measurelib's marketing material is not entirely forthright. All right, our final example comes to us from Umeng.
So Umeng: when we looked at the network transmissions that they were sending, we found something interesting here. This umt, it happened to be storing the hash of the IMEI. The IMEI is another identifier tied to your phone. You can't change it, you can't remove it. It is typically used when you authenticate with networks like cell towers, and it's also used to blacklist stolen phones, so it serves a sort of security purpose in that sense. Generally, apps have no business knowing your IMEI. There's nothing interesting about it aside from the fact that it uniquely identifies a user and you can't escape it: you can't change it, you can't opt out, you can't reset it.
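A hashed IMEI like the one described above is still a stable identifier, and the hash offers little protection because the input space is small. A 15-digit IMEI consists of an 8-digit type allocation code shared by a device model, a 6-digit serial number, and a Luhn check digit, so a known 8-digit prefix leaves about a million candidates. The sketch below assumes the stored value is a plain MD5 of the IMEI's decimal string, which is an assumption for illustration; the function names are made up for this example.

```python
import hashlib
from typing import Optional


def luhn_check_digit(digits14: str) -> str:
    """Compute the Luhn check digit for the first 14 digits of an IMEI."""
    total = 0
    # Walk right to left; the rightmost of the 14 digits is doubled first.
    for i, ch in enumerate(reversed(digits14)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def recover_imei(md5_hex: str, tac: str) -> Optional[str]:
    """Try every serial number under a known 8-digit prefix."""
    for serial in range(1_000_000):
        body = f"{tac}{serial:06d}"
        imei = body + luhn_check_digit(body)
        if hashlib.md5(imei.encode()).hexdigest() == md5_hex:
            return imei
    return None
```

Given a digest and the device model's prefix, `recover_imei` needs at most one million MD5 computations, which is a matter of seconds on ordinary hardware.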
So how were they getting it? Again, you need a permission on Android to get it. This app did not have that permission, but other apps can get the permission to actually access the IMEI; it's called READ_PHONE_STATE. So this app, when we looked at the files that it was opening, we found that it was accessing on the SD card a variety of different files that were all hidden files beginning with a dot. And when we looked at what the files stored, they all stored the same thing: the string that we saw being sent. So what's happening here? This is a classic covert channel. You have one app which has access to the IMEI; it saves it to the SD card. Another app, which doesn't have access to the IMEI, reads it from the SD card. And that is how Umeng was able to circumvent the permission system that should have prevented that app from knowing the IMEI.
Now something interesting about this is that this is across six different phones. All of the phones that I was testing on happened to have these files stored, because eventually I ran some app that had Umeng's SDK inside. And the IMEI was hashed, so now we have the MD5 hash of the IMEI. And again, as when we talked about being able to reverse the hash of a MAC address, it's even easier to reverse the hash of an IMEI. If you look at the column on the left, they all begin, except for the last one, with 35922110, meaning that there's really not a huge number of different IMEIs. So if you know the manufacturer, if you know your phone's IMEI prefix, which is basically the phone's production run, then you're able to quickly enumerate all possible hashes of that IMEI.
And if we look at what's stored in Umeng's file, though, what's interesting is that there are some zeros removed from the actual MD5 hash. So it's not the MD5 hash exactly; it's some function computed on it which for some reason seems to remove zeros, but not always the first zero, not always the second zero, not always only one zero, but some number of zeros, and then it staples a four-digit hex suffix onto the end. So I spent, you know, 20 minutes trying to figure this out, and by 20 minutes I mean probably closer to three hours, but, you know, it's work I shouldn't have been doing as a professor. Not getting anywhere, eventually I gave up. So I leave it as an exercise to the class: if you want to figure this out, I would happily receive your email with a little explanation of how this is actually occurring, because it is curious exactly what they are doing and why, but I wasn't able to quickly find it in the code. So here's a bunch of apps that are exhibiting this behavior, that are able to obtain the IMEI in this way. So, yeah, have at it and have some fun if you want to get your hands a bit dirty doing this kind of interesting, fun stuff.
All right, that is the end of my talk. I hope you all enjoyed it, and enjoy the rest of this conference. I'm quite happy to take questions, and I just want to point out that this technique is the gift that keeps on giving, because developers keep finding side and covert channels in apps and, instead of doing the ethical thing, responsibly disclosing it and making sure it gets fixed, they exploit it, leaving it up to us to be the good guys to actually make that happen. All right, thank you again, and with that I will happily take questions.
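Looking back at the string decryption in the Measurement Systems library: the key derivation reconstructed in the talk can be sketched with the Python standard library. The talk only says "password-based encryption" with 10 rounds and a 128-bit key, so PBKDF2 with HMAC-SHA1 is an assumption here, as is the exact byte spelling of the salt. The decryption step is left out because the transcript does not name the cipher. The second function shows the caching that the speaker notes was missing.

```python
import hashlib
from functools import lru_cache

PASSWORD = b"measure"         # from the talk: "mea" + "sure"
SALT = b"measuremovemeasure"  # ASSUMPTION: exact spelling and spacing unknown
ROUNDS = 10                   # from the talk
KEY_BYTES = 128 // 8          # 128-bit key, from the talk


def derive_key() -> bytes:
    """Derive the key from scratch, as happened on every string use."""
    # ASSUMPTION: PBKDF2-HMAC-SHA1; the talk does not name the KDF.
    return hashlib.pbkdf2_hmac("sha1", PASSWORD, SALT, ROUNDS, dklen=KEY_BYTES)


@lru_cache(maxsize=1)
def derive_key_cached() -> bytes:
    """Same derivation, computed once and then reused."""
    return derive_key()
```

Because every input is a constant baked into the library, the derived key is the same on every call, which is why an analyst who recovers these constants can decode every string.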
I'm seeing lots of nice comments, so I appreciate the many thumbs up. Smashing your phone is probably a good idea as well. Oh yeah, as for mobile apps, one approach you can use is F-Droid, which is an open source repo. All of the open source apps are fine: I've tested them and I see absolutely nothing sketchy going on. So, yeah, the easiest thing to do is to just install open source apps for everything, and they're better anyways. It's a bit of a pity that we have this ecosystem where the Linux kernel is what's running on the phone, and then on top of it is this
AOSP project, the Android Open Source Project. All of this is open source code, and then all of the code that the developers use to write any of their software is open source code, and then there's this tiny veneer at the top, which is, you know, the flashlight app that tracks your location. Like, why is this not a solvable problem? Why are the apps that most people run not open source when they're already running on an open source operating system and an open source platform?
Have I tested Apple devices? So this is harder to do, but there was recently a study done, and they had similar findings. What's nice with the Apple platform is that when you opt out of behavioral advertising, you don't get the advertising ID; you get zeros. So you sort of limit that kind of tracking. And Apple's always been better at locking down identifiers, so you can't get the IMEI and stuff like that. But nevertheless, the same ads and analytics companies are present in these apps, and they can also track location if you have location on. So it's not the case that Apple is necessarily better per se, only that Android gets more scrutiny because it's
open, right? It's an open platform; you can do these sorts of large analyses quite easily, which results in it getting a lot more scrutiny, in the same way that an open society gets more scrutiny than a closed society.
What tool did I use to decompile? apktool. I can post it in the chat here. apktool, that's the standard tool. You just give it an APK. If you install an app on your phone, you can find the APK in the directory /data/app/, you know, com.blah, and then it's called base.apk. Then you can adb pull the path to the APK, and then apktool d -f,
and then you get a directory structure and you can start playing around.
LineageOS, probably. Any completely open source platform: if you're not including Google's code, then you're not having an advertising ID. But if it's based on the Android operating system, then you're still going to have these bugs, for instance if the ARP cache can be read, or the side channels like getsockopt or getifaddrs. But it's a matter of, well, when Android fixes it, they can fix it too, right? So it's basically running in parallel on the same project, but they can be more
aggressive in things like, "Oh, we won't allow this thing that, for some reason, Google does allow," right?
Do I have any theories as to why it removes some zeros? The only theory I can think of is to not make it obvious that the hash of the IMEI is being sent. Because basically, when I looked for apps sending the IMEI, I would normally have searched for the hash of the IMEI in its entirety, but for whatever reason I decided to just search for a suffix at the end of about 10 characters. And so it just happened that I caught them sending the IMEI, but had I not made this change, and I forget even why I made it, I wouldn't have caught them sending the IMEI.
So it's probably more hubristic than true, but in my imagination they don't want to get caught sending the IMEI, so they tweak it a tiny bit so it's not exactly the hash of the IMEI, and people searching for the hash of the IMEI aren't finding it. Yeah, I don't know why no one's noticed this before. It's there to be found, I guess, right? Part of it, though, is searching a lot of network traffic at scale: basically running lots of apps to find the handful that are actually cheating, and then doing that kind of analysis. But really it's all open, right?
You can just mitmproxy the apps on your phone, look at the network traffic, and see what's being transmitted.
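The search described in this answer can be sketched as follows: given an identifier you already know for your own test phone, precompute common digests and look for either the whole digest or just its tail in captured traffic. The choice of hash functions, the 10-character suffix length (taken from "about 10 characters" above), and the function names are assumptions for illustration; this is not the speaker's actual tooling.

```python
import hashlib

HASHES = ("md5", "sha1", "sha256")


def needles(identifier: str, suffix_len: int = 10) -> dict:
    """Map a label to each string worth searching for."""
    out = {"plain": identifier}
    for name in HASHES:
        digest = hashlib.new(name, identifier.encode()).hexdigest()
        out[name] = digest
        # A tail match still fires if the front of the digest was altered.
        out[f"{name}-suffix"] = digest[-suffix_len:]
    return out


def find_leaks(payload: str, identifier: str) -> list:
    """Return the labels of every needle that appears in the payload."""
    text = payload.lower()
    return [
        label
        for label, needle in needles(identifier).items()
        if needle.lower() in text
    ]
```

A payload carrying the unmodified MD5 matches both `md5` and `md5-suffix`, while one whose leading characters were changed matches only `md5-suffix`, which is the situation the answer describes.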
What is my thought process or methodology for looking for CVEs like this? So the original thing was basically to not have false positives. I'm quite sincere when I say I was not expecting to find so many side and covert channels. I was expecting to find bugs in my code that were saying, "Oh, you're sending the phone number" when you're not, or "you're sending location" when you're not, and to fix the code. And when I found them all, it was this aha moment where I realized that of course there are going to be side and covert channels, and of course the permission system isn't 100% sound. But at the time,
the original process was: I'm going to find mistakes in my code so that I don't go and tell regulators that companies are doing bad stuff when they actually aren't.
Is it only advertising companies that are exploiting these side channels? Well, it's unlikely, I guess. I don't intentionally run malware, and that's another thing: maybe I should start going to malware databases of apps and seeing what they're doing, because I'd probably find more examples of side and covert channels. Or maybe the ones in malware data sets are more scrutinized, and so the side and covert channels they're exploiting are just sort of known and fixed already, and what's left is that the thinking about ad companies and analytics companies is that they're sort of benign, you know, they're the good guys, so
they don't need the same kind of scrutiny. So, yeah, it would definitely be interesting to start running some actual malware. And certainly this mobile.measurelib.com, it smells like malware. If they're encrypting their strings, I don't think that they're doing it because they're a good-faith actor.
Have any of these companies disputed your claims or threatened legal action? Well, we got a letter from the chief counsel of one ad company, but it wasn't a demand letter; it was more of a scare letter. And we wrote back basically saying that everything we said is true. They were basically claiming that we misquoted their privacy policy, which they changed after we published, which is bizarre because the internet is archival, so the Wayback Machine has their old privacy policy. We quoted the privacy policy as it was at the time our paper was published, and then they said we were misrepresenting them because they don't say that. So I don't know, I don't think it was a really serious attempt to silence us, and certainly threatening academics working in the public interest is not in the interest of a company that's publicly traded. No Panamanian cease and desist
letters, because this is actually, yeah, you guys get the advance showing. I haven't even published this. I just reported it to Google recently, and I'm probably going to do a blog post about this on the AppCensus blog, which is my startup company that looks into the whole ads and analytics space in the Android world. But you guys get to find out about it first.
"I like your not-so-subtle dig at epoch time, Unity, and CNN." Yeah, I mean, thank you, yes. Yeah, Unix time really is, it's just so wonderfully arbitrary, but so is all kinds of time. All time is arbitrary, so the seconds since January
1st, 1970 seems as good as any other kind.
If I didn't answer your question, feel free to repost it, because I just can't go back through all the history. And I don't know if the moderator is going to come on and ask a live question, or if they're going to come on and boot me out, so I don't know what I'm supposed to do next. I'll just have some tea.
So, any more questions, put them in the chat, I guess. Otherwise I will wait a minute and then mute myself. "What do you think about Twitter's ongoing project of testing out sending ads to people's DMs, incentives to spying on their feed?" I don't know about that. That's interesting; I didn't know about that. Okay, maybe something to look into.
Happy that you guys all liked the talk. And again, I'm happy for the invitation, thank you so much, and I'm looking forward to the rest of the conference.