BSides Oslo 2023

BSides Oslo · 2023 · 5:13:29 · 382 views · Published 2023-09

um so this is just a screenshot showing that these three online uh tools that index all packages show different dependencies for the same package so this is still open there's more details in the blog post deps.dev sees urllib3 while Snyk and socket.dev see charset-normalizer okay so I talked about um manifest confusion and distribution confusion and resolution confusion is an attempt at the umbrella term again if you've seen this before let me know but it's all the ways to resolve a package into something else than you expected so of course we have dependency confusion where the package is installed from the wrong repository manifest confusion where there's multiple ways to resolve which dependencies

a package has and then distribution confusion where you might get the wrong distribution so briefly on how I approached this research so I was actually starting to implement a transparency log for PyPI and then I had some assumptions that led to some bugs and I started reading the PyPI source code and it turned out that the allowed inputs were greater than what was needed to do a regular um upload so I wondered if this was possible to exploit and then of course the you can only have one source distribution rule got in the way and I started to look for bypasses for that so the four blog posts that are released today are here uh I've touched on most of

these but the second one on reproducibility in PyPI has a lot of other ways where you might get unexpected results with um with Python packages and with that I can take questions [Applause]

anybody questions up top on the balcony downstairs yes mate thank you um now that you've seen that this possibility exists um have you had the opportunity to scan through PyPI for example to see if there are packages that could be vulnerable to this that do have these confusions in place or other um in the wild occurrences of this yeah so in general because the file name rules have been so lax historically there's a lot of garbage in the system and for the most part it looks like it might be accidental but I haven't looked closely enough to know if it's accidental or not those issues that are present and so that might be a

future research but luckily like this is easy to scan for right it's easy to detect much easier to detect than if a package is malicious

anybody else questions for Stian manifest confusion distribution confusion I like the rhymes where is the hand keep it up hello so the prerequisite that was listed uh what does that mean and how hard is it to get it so that basically means you need to compromise a package to be able to uh to publish as the maintainer so it could also be of course that the maintainer turns evil but you need to have permissions to actually publish packages which should be a high bar and basically when you are pulling in a project you are trusting that project to not do malicious stuff right so this is just one malicious thing you can do if you have those permissions

anybody else questions for Stian about these Python package distribution attack vectors

hi um I was wondering you mentioned detection do you have any examples of good tools or ways or mechanisms to implement to detect these scenarios yeah as far as I know nobody except the test code I have is is doing that detection

all right and uh you'll be here today still if anybody has anything else they want to ask you you can also ask him about quantum cryptography if you like so thank you very much Stian thank you [Applause] and all right so that was our last talk before lunch so luckily the talk with the sandwich in it was well placed let's uh find our way down and grab some food the food might not be uh out until 12 when the scheduled lunch was happening but it's coming and we'll be back after lunch for the rest of the program thanks everybody

welcome back everybody I hope you had a chance to grab some food and and chat with some some new and old friends colleagues acquaintances um a couple of comments before we get into the rest of the program if you pre-ordered a t-shirt um please pick it up um there are some T-shirts that have not been claimed if you did not get a t-shirt in your size due to our uh delayed shipment from our supplier please go downstairs and talk to penita at the reg sponsor area and work out a solution with her we can do refunds or we can get you your shirt when it arrives we also have some of these shirts from last year's event and uh you can pay

what you want if you would like one of those shirts as a donation uh up to you any shirts for this year's event that are not picked up by five we will put them up for sale as well because many people have asked and we're waiting for everybody who pre-ordered to get those so just shirt stuff okay uh it was inevitable that we would have a talk about AI so Preben is a PhD student at NTNU Charles and he's working on ways to break AI and hopefully some solutions for how to fix it fingers crossed take it away Preben thank you uh excellent yes uh my name is uh I'm at NTNU and also at Simula research lab here in

Oslo um and I'm going to be talking about ways that you can break AI systems using something called adversarial attacks um and some ideas for some solutions that you might want to implement which is what I'm researching in my PhD uh so first just a bit of housekeeping I'm going to start off with a brief introduction about how neural networks learn how they're trained I'm going to be talking about deep neural nets which if anyone doesn't know is sort of the main AI architecture that you'll find in ChatGPT and the algorithms on TikTok as well um so after that brief summary I'll go through what adversarial attacks are how

they work um and why they're a problem hopefully convince you of that um and then in order to introduce a solution we first need to talk a bit about how neural nets represent information you could say how neural nets think if you want to anthropomorphize a bit um and then I'll introduce what we call the causal neural network model which is sort of my area of research and end with some thoughts on the state of the art all right so first things first uh for the purposes of this talk I would like you to think about deep neural nets as very powerful pattern recognition machines they're very good at finding statistical patterns in data

um they could be very complicated patterns such as for example the patterns relating the pixel values in an image file to the label that you would like to apply to that image if you're building an image classifier um and I'm going to be talking about images in this talk um because they're quite nice and visual but everything I say is sort of equally applicable to AIS that deal with text or speech or you know stock prices um okay so we're trying to identify patterns between some some input X and some output y we have a very powerful machine but because this machine is so powerful it can spot patterns that are false patterns that don't generalize so I have

an illustrative example here um the um if you imagine training an image classifier on the cows in the center there and you train your AI to say these are pictures of cows then the AI will likely pick up on the fact that okay all of these they have like some green grassy stuff in the background that's probably related to what it means to be a cow um you know you train this you get good accuracy you deploy it and in production you encounter the image on the right and it's very likely that your uh your AI is going to fail um because that pattern no longer holds uh it was a false false pattern all right

that's all well and good it's a cute example um but it allows you to do some more malicious stuff um it allows you to do what we call adversarial attacks by by sort of exploiting these false patterns um and an adversarial attack uh for images consists of uh you start with the image there on the on the top right which is of a panda you have an AI system that correctly classifies that image as being of a panda and then you add some some noise to it some carefully crafted noise just a tiny bit to produce the image uh there on the bottom right um probably from the distance you're sitting you can't even sort of tell that

something has happened um it certainly looks quite similar it's just a tiny bit of noise but the AI system that you trained now fails completely to classify this as a panda I think it's a cat in this example with very high confidence and we think that the reason this happens is because much like the sort of grass cow example there are more complicated patterns in your data as well um that aren't as sort of obvious and explainable they can be to do with sort of fine structures in your pixels um you know all very complicated stuff and it means that you can add noise in just the right way to trip up the AI
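A minimal sketch of the "carefully crafted noise" idea, assuming a toy linear classifier rather than a deep network. The weights, pixel values and epsilon below are invented for illustration and are not from the talk. For a linear model the gradient of the score with respect to the input is just the weight vector, so nudging every pixel by a tiny amount against the sign of its weight is enough to flip the label.

```python
# Toy linear "classifier": score > 0 means "panda", otherwise "cat".
# All numbers here are made up for illustration.

def score(weights, pixels):
    """Weighted sum of the pixel values."""
    return sum(w * p for w, p in zip(weights, pixels))

def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)

def adversarial_example(weights, pixels, epsilon):
    """Shift each pixel by at most epsilon in the direction that lowers the score."""
    return [p - epsilon * sign(w) for w, p in zip(weights, pixels)]

weights = [0.5, -0.5] * 50      # 100 "pixels" with alternating weights
pixels = [0.51, 0.49] * 50      # clean input, classified as "panda"

clean_score = score(weights, pixels)
attacked = adversarial_example(weights, pixels, epsilon=0.02)
attacked_score = score(weights, attacked)
largest_change = max(abs(a - p) for a, p in zip(attacked, pixels))

print("clean:", "panda" if clean_score > 0 else "cat")
print("attacked:", "panda" if attacked_score > 0 else "cat")
print("largest per-pixel change:", largest_change)
```

No single pixel moves by more than epsilon, yet the small changes add up across all 100 inputs and push the score across the decision boundary.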

um so that's what an adversarial attack is um and it's obviously not that big of a deal if you're just classifying pandas but you know if this was your self-driving car or if this was supposed to diagnose like cancer tumors or something um then it's a bit of a problem um it's very interesting to note that although AIs are very susceptible to this type of attack humans are basically immune like hopefully no one in this room looks at the bottom picture and thinks it's a cat right all humans sort of naturally are able to defend against these attacks um and the hypothesis is that the reason humans are very good at this

um is because we are naturally good at extracting uh what we call the causal information in in this case an image and that means we're good at extracting basically the important information the important the information that affects um in this case the classification we're trying to make and we're not fooled by sort of random noise in the background um the way that these statistical pattern machines are um and there is a theoretical solution to this vulnerability and it is if you're trying to recognize pandas you just need to collect all images possible of a panda you need to go out with your camera and take pictures with all types of backgrounds all types of lighting

conditions and camera angles and lenses and and all that kind of stuff um that's obviously kind of Impractical so we would like a solution that we can do without having to leave the office um so to introduce sort of why or how we can go about solving this we first need to dig a little bit deeper into how how neural Nets uh think how they represent the information that you give to them uh so I've drawn a very simplified schematic here of a neural network um in this case processing images so this could be the the panda classifier um and we see that a neural net consists of a bunch of layers um and they process the information you

give it sequentially starting with the first layer and then moving up to the second and so forth so the first layer here um looks literally at the RGB channel values of every pixel so that's a lot of information you know you can have 200 by 200 pixels in an image three color channels that's 120,000 I think um but that's very low level information that's the raw pixel data um then there happens a bunch of processing and the information is fed to the next layer which is smaller um so this next layer has a smaller capacity for sort of storing information and it needs to sort of

disregard the low level details and then try and sort of extract some some more high level stuff uh so the next layer might sort of learn to identify basic lines and Corners maybe in your image do some like very basic processing and then you sort of move down the chain here to to deeper and deeper layers and at every stage the information is compressed down to a more uh sort of abstract high level representation um so at the very end you might have quite a small layer that encodes information about I don't know like faces and sort of fur and snouts or whatever for um for your animal classification and then at the very end a prediction is

made based on that sort of compressed information so you can basically think about this as a lossy compression algorithm aimed at sort of extracting um a particular type of information okay so this allows us to sort of reduce down from all the possible combinations of pixels to a smaller combination of sort of high level um codes or compressed uh representations of the same information right we now have all the pieces we need to think about what we call the causal neural network architecture um and instead of having a single processing pipeline like we saw in the previous example the causal neural net takes in the image and then produces two

independent sort of information streams one labeled C here for content the content of the image such as uh subject and shape um and there's one called S for all the other information the lighting the camera angle call it the style information um so this is um an architecture which aims to separate out all that important information to put it in the C stream and leave all the unimportant stuff all the style information um all the like grass information uh in the S stream um and if you're able to do this uh sort of correctly or sufficiently accurately then you can do something quite clever which is that you can introduce this little box here that

says perturb signal what that means is you're taking your style signal and you jiggle it around you add some noise to it uh you you flip a few bits here and there just to try and try and corrupt it a bit make some variations on the original style signal uh so you might you know produce sort of say 10 different variations on the actual style signal that was in the image um and then you sort of one by one recombine these with the uh your content stream uh and you have a have your neural net make a prediction at the end and very crucially you tell your neural net that regardless of what I do to the style

signal you should always predict the same label um so this sort of allows you to approximate that um gathering of all possible images in a much more manageable format and under certain quite lax mathematical conditions that I'm not going to go into this sort of converges to a uh to a system that um is able to do what humans do to sort of extract the causal important information um so this causal neural net architecture is um is quite a new thing I think yeah the first papers I saw discussing it are from like five-ish years ago maybe um and uh they've gained a lot in popularity over the years since

um because they are as we've talked about not fooled as easily by adversarial attacks um they're not fooled by sort of adding noise to images or if you're doing text you know they're not as easily fooled by swapping words and sentences um but we still have uh a long way to go um even though they have all these sort of uh desirable properties um so that's what I'm doing in my PhD um I got three years left so I'm sort of trying to make some progress on these things um there's a lot of interesting stuff that goes into that green box that's a separation mechanism that tries to separate these information streams um that's highly non-trivial to

to design um it's also a question of how you know that the uh the C and the S signal streams contain the information that you expect them to contain um and you know we've talked in very qualitative terms here how do you make this sort of mathematically rigorous um so a bunch of open questions but they have shown a great deal of promise um they're good against adversarial attacks they're good at generalizing doing stuff like training on one data set testing on another data set uh those types of tasks um and I'm quite a fan of them as you might have guessed otherwise I wouldn't have spent

four years of my life sort of researching them um but so hopefully that sort of piqued your interest in how these can be used to make more secure um AI systems thank you [Applause] all right this was a short talk but we finished early so anybody have any questions all right let's go hi um I remember reading a tweet from some cool guy in neural networks some time ago I think it was LeCun but I'm not sure yeah he tweets yeah probably so basically what he said is that any attempt to make your neural network smarter by trying to explain to it how to think is uh you know is bound to fail eventually the

bigger neural network with more data will win so it's a failing strategy so what do you think about that and also another question is uh when you showed the first adversarial attack with a some crafted noise um my first what I thought it was maybe it's about how we access this data humans that you know for us it's a little bit blurry we don't get access to the individual values of those pixels so why not just make the image a little bit blurrier with some random noise in other words just put your perturbation step as the first step and that's it why doesn't that work thanks yeah yeah uh okay so to answer the first question first which is

um can't you just solve this with sort of more data um the answer is yes eventually if you have sort of infinite images you're guaranteed to get a uh robust or a secure uh neural net but I think the question is more about how quickly you can get to that step because it's very possible that it requires more images than humanity will ever produce in its lifetime and then it's sort of unattainable um so I think the causal neural network architecture enforces sort of some guidelines on you know how you'd like the information to be processed to use that data more efficiently uh because you know with

perturbing your style you can basically turn one image into 10 or 100 so it allows you to sort of extend the data that you have um I also want to say that Yann LeCun has published a full manifesto about how you should make AIs think so you know who is he to talk um to answer your second question um about why don't you put the perturbation step as your first step that's a good idea it's an established strategy um which we call adversarial training and it basically means you do this to your training set and then you train um the issue with that is that it sort of makes you good at defending against a

specific type of attack it makes you good at defending against the attack you trained against um but there are sort of many different attack algorithms and you need to patch each one and if someone uses a different one that you haven't trained on then you know you're a bit screwed um so so that so that's a practical solution for sort of patching problems but it's not it doesn't solve the underlying issue
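The style-perturbation idea from the talk, as opposed to attack-specific adversarial training, can be sketched as a simple consistency check. This is only an illustration under strong assumptions: the content and style signals are plain lists of numbers, the two "models" are hand-written stand-ins, and the perturbation is a fixed set of offsets rather than learned or random noise. In an actual training setup the check would be a loss term rather than a boolean.

```python
# Hypothetical stand-ins for the content (C) and style (S) streams of one image.

def perturb_style(style, offsets):
    """Return one perturbed copy of the style vector per offset."""
    return [[s + o for s in style] for o in offsets]

def content_only_model(content, style):
    """Ignores the style stream entirely."""
    return "cow" if sum(content) > 0 else "not cow"

def style_reliant_model(content, style):
    """Stand-in for a model that latched onto the background (the green grass)."""
    return "cow" if sum(style) > 0 else "not cow"

def is_style_invariant(model, content, style, offsets):
    """True if the prediction stays the same for every perturbed style signal."""
    reference = model(content, style)
    variants = perturb_style(style, offsets)
    return all(model(content, variant) == reference for variant in variants)

content = [0.9, 0.7]
style = [0.2, 0.1]
offsets = [-1.0, -0.5, 0.5, 1.0]   # four variations on the original style

print(is_style_invariant(content_only_model, content, style, offsets))
print(is_style_invariant(style_reliant_model, content, style, offsets))
```

The model that reads only the content stream gives the same label for every variation, while the one that relies on style changes its answer as soon as the style signal is shifted. That is the behaviour the "always predict the same label" rule is meant to penalise.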

uh do you apply it to the uh the image and the image is the input to the as you well you're applying it to the pixels or to the label yeah yeah

oh yeah so um uh you mean like before you train you blur your image and then you train so that it's not sensitive to being blurred right is that okay okay we can discuss the ins and outs of adversarial training um yeah thank you very much Preben and he's here hopefully for the rest of the day yeah yeah so uh feel free to take up this and other discussions all right thanks [Applause] all right up next we have Swan Beaujard and Gautier Ben Aïm to talk to us about something completely different this is the fast-paced part of the day where we go back to back shorter talks

all right guys take it away

how do I get in there we'll make it work where's the pointer

yeah okay pretty good

all right um hello everyone so today we will talk about uh GraphQL security and we will dive into the report we published about the topic we also have a few surprises at the end so stay tuned um first I will introduce Escape so at Escape we are building a fully automated security platform so the interesting point is that we are totally agentless and so it works only with an input domain um and an authentication strategy if you want to go deeper in your application um Escape is known for its unique API traversal algorithm we will go deeper in the subject later and we are actually capable of testing business logic and not simple fuzzing through a reinforcement learning algorithm

um with the algorithm we will generate legitimate sequences of requests and exploit complex attack scenarios and chained attacks such as SSRF and stuff um we're also known for our recent work on attack surface management so you have a lot of input here but attack surface management consists of referencing any endpoints exposed in your organization just like Google Search would index an application at Escape we do have a team dedicated to finding new vulnerabilities zero days fingerprinting engines on GraphQL and REST applications and continuously improving this algorithm so you can see the result here all right so let's get right into the report on the GraphQL security topic so we scanned 1600 API endpoints each of these

endpoints were publicly accessible for legal reasons I guess and so that represents a cumulative duration of 460 the errors so that's a lot of computing power and a lot of cost if you are on AWS so we collected almost 47k security alerts which is quite a lot on average this is thirty alerts per GraphQL endpoint so this is really interesting if we do have some people in bug bounty here because we will present you some specific GraphQL findings and some CVEs we reported it happened that sometimes people tag us on HackerRank and HackerOne to say okay some kudos for the CVE you published I just got a bounty for that um so if you wish you can already scan

the QR code or take a picture of it if you want to access the complete report later on but let's introduce ourselves so here is Gautier he's a full stack engineer at Escape and he is also leading the web application team and I am Swan I am a security software engineer I am leading the cyber security and R&D part and if you are stuck in your CTF I am also experienced in binary exploitation well thank you Swan for this introduction uh first let me ask you by raising your hands who is familiar with GraphQL oh quite a lot actually that's great we'll start by introducing uh this technology named GraphQL and explaining what it is why it is vulnerable by

design GraphQL is a query language built around types and fields rather than endpoints this uh allows front-end developers applications to query exactly what they need effectively moving away from the backend-for-frontend paradigm sometimes some types sorry may expose other types through their fields effectively creating a graph data model it was designed and used internally by Facebook since 2012 and was made public starting 2015 and companies started to get attracted from Fortune 500 to startups there's another reason why GraphQL is getting popular the graph model allows creating some kind of API gateways effectively exposing underlying APIs built with different technologies GraphQL or REST whatever by grouping all the data the

underlying APIs expose inside a big graph with joint concerns such as authentication

here is a rather complicated GraphQL query which allows me to introduce some well vocabulary a GraphQL query starts with in the top left corner an operation there are two main kinds of operations queries and mutations we can loosely map those queries map to HTTP GET requests and mutations to HTTP POST requests we will detail all the other kinds of tokens in this query when we will talk about the vulnerabilities they imply there are two interesting metrics regarding GraphQL queries the depth and the width so depth is the level of nesting well the level of the deepest field in the query and the width is the total number of fields in the query
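The two metrics just defined can be sketched in a few lines. This is a minimal illustration, assuming the query's selection set has already been parsed into nested dictionaries (field name mapped to its sub-selection, empty for a leaf). The field names are hypothetical and this is not how any particular GraphQL engine represents queries.

```python
def depth(selection):
    """Nesting level of the deepest field in the selection."""
    if not selection:
        return 0
    return 1 + max(depth(sub) for sub in selection.values())

def width(selection):
    """Total number of fields in the selection, at every level."""
    return sum(1 + width(sub) for sub in selection.values())

# Roughly: { user { name posts { title author { name } } } }
query = {
    "user": {
        "name": {},
        "posts": {
            "title": {},
            "author": {"name": {}},
        },
    }
}

print("depth:", depth(query))
print("width:", width(query))
```

Here the deepest field is `user.posts.author.name`, four levels down, and the query contains six fields in total. Limits on these two numbers are a common way to bound how much work a single query can request.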

we said graph model here is a representation of a dummy GraphQL API and let's say for instance we have a type on the right hand side exposing a Stripe token field and a path in blue leading to this field well since our data model is a graph there may be several paths leading to this field for instance a shorter one here in red and a longer one in orange effectively creating some kind of access control nightmare I guess so now we will talk about the Escape magic and why is it hard to test production applications but it's a fundamental problem it's not specific to GraphQL any application is hard to test dynamically um so first only a very few companies

have integrated automated dynamic security testing pipelines this is becoming uh more frequent with uh compliance in payment systems and compliance in general secondly most security solutions use fuzzing algorithms so sometimes algorithms are really simple but sometimes are really advanced if you take for example um Ledger and stuff they will use a lot of fuzzing testing really advanced and anything that is coming to binary exploitation is about fuzzing but APIs are quite different and often a security solution does not pass the validation layer and so they never test the business logic where most of the critical vulnerabilities are present so that's why we developed a unique algorithm that we call feedback-driven

exploration that is specifically designed to learn about the application about its business rules and the way it actually organizes objects um this system was specifically designed to work on GraphQL but we reached the state of the art in that field and we are now battle testing it on different protocols such as REST or gRPC applications to explain it really simply we will go to the next slide so you can take a look on that part you will see that you have way more coherent data than on the other one so on the right side you will find a query that is totally randomly generated and on the other side you will find something which looks like

a UUID an email which is very likely to be a user from the application and you also have an injection in the password um so Escape is scanning in two different steps there is a first step where we gather a lot of data from the application trying to understand its business logic and we will change a very few things after that to test for vulnerabilities if some of you know Nuclei or do bug bounty testing you can see Escape as like Nuclei on really really strong steroids with this feedback-driven exploration algorithm um so as was just mentioned uh resources are all organized as a graph so it's becoming

um hard to automate that with software because applications can be very huge with field recursion and stuff we today have algorithms that are capable of operating both deep and wide and actually testing all paths all parameters and all objects in a GraphQL schema it is not an easy journey to manage this recursive behavior
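The point made earlier about several paths leading to the same sensitive field, and the recursive traversal this requires, can be sketched as a small graph walk. This is an illustrative sketch only and not Escape's algorithm. The schema below is a hypothetical dictionary mapping each type to its fields, with `None` standing for a scalar, and the type and field names are invented.

```python
def paths_to_field(schema, root, target_type, target_field):
    """List every cycle-free field path from the root type to the target field."""
    found = []

    def walk(type_name, path, visited):
        for field, field_type in schema[type_name].items():
            if type_name == target_type and field == target_field:
                found.append(path + [field])
            elif field_type in schema and field_type not in visited:
                # Follow the edge to an object type we have not seen on this path.
                walk(field_type, path + [field], visited | {field_type})

    walk(root, [], {root})
    return found

schema = {
    "Query": {"me": "User", "order": "Order"},
    "User": {"name": None, "billing": "Billing", "orders": "Order"},
    "Order": {"id": None, "billing": "Billing", "customer": "User"},
    "Billing": {"stripeToken": None},
}

paths = paths_to_field(schema, "Query", "Billing", "stripeToken")
for path in paths:
    print(" -> ".join(path))
```

Even in this tiny schema there are four distinct ways to reach `stripeToken`, and an access check that guards only one of them leaves the others open. Tracking the visited types per path is what keeps the walk from looping forever on the `User` and `Order` cycle.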

yeah so we built a complicated algorithm we made it run on the internet and what we found more than forty thousand alerts which we will review one by one right now just kidding here is what we found grouped by categories in this big list there are GraphQL-specific vulnerabilities and some kinds of vulnerabilities that have existed ever since APIs existed for instance publicly available stack traces unsafe routes and injections let's take a look at these first three to begin with uh here is a GraphQL server which has its debug settings uh still on in production and it's about 12 percent of all GraphQL endpoints we tested and why is it a vulnerability because

some stack traces uh just dump library versions for instance this server is using a deprecated library with a very critical CVE on it the next one is an unsafe route being public which means in fact that the mutations which can edit data on a server are missing some access control last but not least we found many SQL injections they are still there one bash command injection which means we can just run code on the remote machine and three server-side request forgeries we were able to intercept machine-to-machine communications to leak some API secrets more on that later now let's jump into the GraphQL-specific vulnerabilities we grouped them in four categories

API brute forcing denial of service API schema leak and unsafe public operations the first one API brute forcing leverages a feature or two features actually named batching and aliasing which can by design group several GraphQL requests in a single HTTP exchange making the whole exchange faster on the internet because some web application firewalls count requests at the HTTP level it allows an attacker to run some kind of brute force attacks to run for instance three or more login attempts in a single HTTP request effectively bypassing rate limiting
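To make the rate-limiting gap concrete, here is a hedged sketch of what an aliased document looks like and how a defender might count the operations inside one HTTP request. The `login` mutation, its arguments and the `token` field are hypothetical, and the counter is a naive regular expression rather than a real GraphQL parser, which a production check would need.

```python
import json
import re

def aliased_login_document(username, passwords):
    """Build one GraphQL document holding one aliased login call per password."""
    fields = [
        f"  attempt{i}: login(username: {json.dumps(username)}, "
        f"password: {json.dumps(password)}) {{ token }}"
        for i, password in enumerate(passwords)
    ]
    return "mutation {\n" + "\n".join(fields) + "\n}"

def count_aliased_calls(document, field_name):
    """Naively count occurrences of '<alias>: <field_name>(' in a document."""
    pattern = r"\b\w+\s*:\s*" + re.escape(field_name) + r"\s*\("
    return len(re.findall(pattern, document))

document = aliased_login_document("alice", ["1234", "0000", "9999"])
print(document)
print("login attempts in one HTTP request:", count_aliased_calls(document, "login"))
```

A firewall that counts HTTP requests sees one request here, while the server executes three login attempts. Counting resolved operations or aliases per request, instead of requests, closes that gap.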

so here's a quick conclusion of the consequences you can have brute forcing so stolen user accounts because you can actually brute force and say okay I will try uh 10,000 PIN or password tries in one request because I bypass the rate limit and you can extract sensitive data using such techniques um yeah and this feature aliasing coupled with another feature named GraphQL file uploads can lead to a modern version of zip bombs we named it GraphQL bombs for instance let's consider uploading a single file of one megabyte but if we reference it a thousand times in a request with a one megabyte HTTP exchange we are able to create one gigabyte of work on the

endpoint so we can add GraphQL bombs to the list of consequences this time's for you um okay all right for the second vulnerability we'll talk about denial of service and we will take fragments as an example so fragments are another feature of GraphQL you can see that as a function in regular programming languages so I may ask you what could go wrong if you have a language that is recursive by design if you implement a function in it so let's get an example of that um so this is a very simple example we are declaring a query which calls a fragment X we are declaring a fragment X which calls fragment Y and so on so

congratulations you made your first infinite fragment recursion loop um so this is an updated stack trace so most of uh JavaScript-based and Python-based engines don't want to actually cover these cases because it's handled by the language safeguard on recursion but still you get an interesting stack trace let's think about compiled languages because this thing is in the graphqlsa so compiled languages such as Golang and Rust uh do not support um this at all so if you use Rust here my team and I made Rust more secure because we published multiple CVEs so this is a different attack but still on fragments you can check the GitHub advisory if you want the detailed explanation
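The fragment recursion loop described above can be caught before execution by checking the fragment spreads for cycles. The GraphQL specification has a validation rule that fragment spreads must not form cycles, and the sketch below shows the general idea with a depth-first search. It assumes the document has already been reduced to a hypothetical mapping from each fragment name to the fragments it spreads, and it is not the code of any specific engine.

```python
def find_fragment_cycle(spreads):
    """Return one cycle as a list of fragment names, or None if there is none."""
    unvisited, in_progress, done = 0, 1, 2
    state = {name: unvisited for name in spreads}
    stack = []

    def visit(name):
        state[name] = in_progress
        stack.append(name)
        for spread in spreads.get(name, ()):
            # Spreads of unknown fragments are treated as already handled.
            spread_state = state.get(spread, done)
            if spread_state == in_progress:
                return stack[stack.index(spread):] + [spread]
            if spread_state == unvisited:
                cycle = visit(spread)
                if cycle:
                    return cycle
        stack.pop()
        state[name] = done
        return None

    for name in spreads:
        if state[name] == unvisited:
            cycle = visit(name)
            if cycle:
                return cycle
    return None

# fragment X spreads Y, and fragment Y spreads X again
print(find_fragment_cycle({"X": ["Y"], "Y": ["X"]}))
print(find_fragment_cycle({"A": ["B"], "B": []}))
```

Rejecting the document when a cycle is found means the server never tries to expand the fragments at all. Deeply nested but acyclic fragments, like the ones mentioned for the Rust case, need a separate depth or size limit.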

so we are only disclosing the ones that do not affect GraphQL RFCs and are related to the implementation itself depending on the languages so in that specific example just to be quick we nested a lot of fragments together and that led to memory overflow and took down any Rust server almost instantly so as I said we made a lot of contributions to the JavaScript engine and the Java engine which is used by most of the Fortune 500 um and so uh at Escape we do run into a lot of zero days but we are always reaching out to the GraphQL Foundation and maintainers first before disclosing consequences of denial of service could be API unavailability for some time and to start be shut down

so the third uh interesting vulnerability specific to GraphQL is the schema leak um so GraphQL provides some introspection capabilities but you can disable this feature to hide your schema but with another GraphQL feature which is suggestions you can rebuild the schema from scratch so with suggestions any misspelled query will be corrected to something that actually exists in the schema even if introspection is disabled so that is an example of a schema reconstructed through Clairvoyance which is an open source package that you might be using already if you are doing bug bounties on GraphQL we contributed a lot at Escape on this package this is totally open source [Music] example in that specific schema on this

side you have the schema that we found through introspection and someone just thought it was interesting to hide the update admin query because it wasn't protected so we changed the password and someone was not happy um yeah so we highly discourage relying on security through obfuscation because pen testers are always capable of reconstructing such information so we can see the consequences of a schema leak access to unauthorized functions that you previously hid and easy confirmation because if you have an update admin query you just have a backdoor in your API
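Editor's sketch (not from the talk): how field names can leak through suggestions even when introspection is off. It assumes error messages shaped like `Cannot query field "x" on type "T". Did you mean "y"?`, which is the style used by some JavaScript engines; other engines may word this differently. The helper name is hypothetical and this is not how Clairvoyance itself is implemented.

```python
import re

# Assumed message shape: ... Did you mean "fieldA" or "fieldB"?
SUGGESTION = re.compile(r"Did you mean (.+?)\?")
QUOTED_NAME = re.compile(r'"([_A-Za-z][_0-9A-Za-z]*)"')


def suggested_names(error_message):
    """Extract the field names an engine offers after a misspelled query."""
    match = SUGGESTION.search(error_message)
    if not match:
        return []
    return QUOTED_NAME.findall(match.group(1))


message = (
    'Cannot query field "updateAdmn" on type "Mutation". '
    'Did you mean "updateAdmin" or "updateUser"?'
)
print(suggested_names(message))
```

Repeating this with many near-miss guesses and collecting the suggested names is the general idea behind rebuilding a schema without introspection, which is why hiding a field does not protect it.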

we also found a shovelful of publicly available information in two big categories PII personally identifiable information and tokens API machine-to-machine tokens among those in the PII category we found a lot of emails and phone numbers which in this case could be false positives like support emails support phone numbers sales phone numbers etc but that was never the case for passport numbers and bank account numbers which are far more critical on the other hand we managed to dump a lot of machine-to-machine tokens like AWS uh Amazon Web Services tokens Google platform tokens private RSA keys and certificates and OAuth accounts to

Adobe Asana and GitHub and even one ChatGPT token all these GraphQL-specific vulnerabilities these four categories can be linked to the more common OWASP Top 10 vulnerability classification that Tobias showed a while ago for instance the DoS and complexity issues fall under the API4 category access control issues fall under all the access control OWASP Top 10 categories API1 API2 and API5 and security misconfigurations and information disclosure fall into the API7 category
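Editor's sketch (not from the talk): a minimal pattern scan of the kind that can flag PII and machine-to-machine tokens in API responses. The patterns are common public heuristics (the `AKIA` prefix for AWS access key IDs, PEM private key headers) and are assumptions, not the scanner used for the report. As the speakers note for support emails and phone numbers, matches can be false positives and need review.

```python
import re

# Heuristic patterns only; a real scanner would use many more and validate hits.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA )?PRIVATE KEY-----"),
}


def scan(text):
    """Return (label, matched_text) pairs for every pattern hit in `text`."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group(0)))
    return findings


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
response_body = '{"contact": "support@example.com", "key": "AKIAIOSFODNN7EXAMPLE"}'
for label, value in scan(response_body):
    print(label, value)
```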

to sum up what you could read in the report we found more than 100 000 public GraphQL APIs on the internet through GitHub searches Google searches only OSINT we narrowed this list down to a bit more than a thousand based on two criteria they need to be publicly accessible without any kind of authentication so that's the unauthenticated criterion and they need to have their introspection open which means that we can get the whole schema from the API without needing to reverse engineer it with the suggestions for instance of all the vulnerabilities you saw some are GraphQL specific which can be the most devastating ones but some are as old as the internet

and denial of service is possible quite easily by default because GraphQL engines feature no hard limits on query depth and query width which can lead in the case of a GraphQL gateway to a single point of failure because if you're able to take down the gateway you are able to take down all the infrastructure
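Editor's sketch (not from the talk): what a query depth limit does in principle. The query is modelled as a hypothetical nested dict (field name mapped to its sub-selection, with an empty dict for a leaf) instead of a real parsed GraphQL document, and the limit value is arbitrary.

```python
def query_depth(selection):
    """Depth of a selection set modelled as nested dicts ({} marks a leaf)."""
    if not selection:
        return 0
    return 1 + max(query_depth(child) for child in selection.values())


def within_depth_limit(selection, max_depth):
    """True if the query may run, False if it should be rejected up front."""
    return query_depth(selection) <= max_depth


# user { friends { friends { name } } } has depth 4
nested = {"user": {"friends": {"friends": {"name": {}}}}}
print(query_depth(nested), within_depth_limit(nested, max_depth=3))
```

Rejecting the query before execution is what keeps a deeply nested request from consuming resolver time on a gateway. A width limit would count fields per level in the same walk.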

you can scan the QR code to download the whole paper you will get more information on the 46 000 alerts we did not detail which makes a lot of pages and right so let's see how we are trying to make the web app ecosystem more secure so not specifically GraphQL and that is a few of the surprises we talked about at the start of the talk first we can tell you about GraphQL Armor GraphQL Armor is a completely free and open source package that is a one-line installation for JavaScript-based engines you can remediate a dozen GraphQL-specific vulnerabilities with features such as alias limits character limits depth limits and so on so check that

out if you are like a GraphQL developer or involved in bug bounty so just type GraphQL Armor if you want to make a contribution so as I said we recently extended our algorithm to REST applications so we did the exact same report on REST applications um so we published a website called apirank.dev we ranked every API on different criteria so we scanned the internet again trying to find unauthenticated and public APIs for example Spotify SoundCloud etc so we had in mind that if you come to choose your next payment solution you could choose between Stripe checkout.com and so on so we ranked every API using different criteria such as security

compliance with the OWASP Top 10 performance so pure speed reliability and design is your application documented and so on and we still found interesting tokens in all these APIs JWTs emails and because it's really interesting because it's super recent we found a few OpenAI API keys um for the numbers we scanned over 6000 APIs we found a bit fewer vulnerabilities this time because we did not add the GraphQL-specific vulnerabilities and uh if we have some DevOps people here you also have the technical specifications

and last but not least we are launching a free completely free API Security Academy which can be accessed through this link it's a platform to learn about API security using both approaches first a red team approach so you get to learn about the vulnerability you get to learn how to exploit it and then you take the blue team approach you have to patch it and effectively prove that the vulnerability is patched what's interesting about this site is that it runs entirely in your browser you get a full Node.js development environment working in your browser through web containers so it can work completely offline thank you for your no not yet oh yeah here are the details of what a lesson

looks like it's a complete integrated developer environment with a file system the explanation of what the vulnerability is usually small yeah what's wrong yeah small servers with one specific vulnerability yeah so take a look it's open source it's on GitHub and it's currently in beta please file bug reports so that's all for today um we're still hiring if you are interested in cyber security software engineering or full stack web development we are based in Paris so if you want to visit France that could be a good opportunity and here are our handles if you want to add us on LinkedIn and the mail and maybe meet again at some conferences thanks thanks [Applause] guys for traveling to be with us here today that

was great does anybody have any questions hand in the back coming back keep it up where was the hand yes thank you sir hi guys thanks um so first of all your research is based on the APIs which all have introspection open right yes so would you consider disabling introspection a must in this case as a step one to protect the API and how much would disabled introspection make your work more time consuming um so as I said with schema leaks um even if introspection is closed we can reconstruct the application so [Music] I won't go too deep in this part but we contributed and made Clairvoyance open

source and most of the code is written by Escape guys but um you could always with some complex NLP algorithm uh fall back to rebuilding a complete specification even if introspection is closed there are some heuristics available out there but that's another story so basically you consider disabling introspection as security through obscurity right yeah okay yes it's security through obscurity security by obscurity is not recommended at all you shouldn't rely on it also if your uh API has a front end the front end will contain some GraphQL queries that you can already take as a base of knowledge to create more complicated queries any other questions looking up at the balcony no okay are you here for the rest of the

the day and hopefully the evening thank you so much to Gautier and Swan [Music] [Applause] you're not uh you're not forbidden to receive gifts hopefully

thank you thank you there you go and next we are welcoming to the stage Veronica Schmidt and Emlyn Butterfield Veronica is an assistant professor at uh Noroff and Emlyn is head of computing and program lead for the bachelor in digital forensics and they're here to talk to us today about IoT medical device security if you've ever been left alone in a room in a hospital while you're waiting for a healthcare professional to arrive there are many gadgets to amuse yourself with

Veronica just said that's why some of us don't get left alone yeah just tell them you're a hacker and they will send the doctor right away they won't leave you alone

so as we move to exploit uh IoT and new technology to provide health care to the aging population how do we strike a balance with the security part all right and with that we are up and running let's have a hand for Veronica and Emlyn [Applause] thank you hello okay so he kind of used my real name which it's generally just V my mom calls me Veronica so if you come talk to me just call me V it's easier that way so like he said some of us don't get to be left alone in a cardiologist's office anymore I did three years ago when I came to Norway the first time I don't anymore

I think it's because they Googled me so we're very excited to come present some of the research we've been doing at Noroff University College uh some might say it's a bit of an unhealthy obsession that I might have well let's introduce ourselves so I'm V I'm an assistant professor how do I get time off of work I bring my boss with me I mean I am a PhD student currently actually studying how to build robust logging for forensics and incident response on medical devices it's a dirty job someone's got to do it I'm also involved with Defcon and the Biohacking Village and a lot of other things that don't really matter but I'm also a cyborg it's probably the

thing that I'm the most proud about I have my own walking talking ecosystem good afternoon I'm Emlyn Butterfield I'm head of computing for Noroff University College and program lead for our digital forensics bachelor program but Veronica and I come with industry experience working with law enforcement across the globe and really it's been a passion to try and find out what evidence can we get back from any devices we look at from the entire ecosystem not just from a computer not just from a mobile phone but from absolutely anything and that's where this kind of talk stems from I think it is uh I still remember the first time that I asked the question of my

cardiologist do we use pacemakers and defibrillators to determine time of death firstly he looked at me strangely because I'm asking questions about determining death and asking super technical questions only to be told no we do liver temp which I thought was very silly at the time because we have something connected to a person yet we use liver temp that is accurate in a ballpark so what can you expect I hate to disappoint you we do not have the tech and the equipment that the public sector has I'm on a budget Emlyn doesn't want to give me lots of money so no we do not have the technology the fanciest thing you're going to see is some AI generated

art but we would love access to all of that information and systems that were on the keynote this morning um but I think a lot what we've done at the minute is very much not on a budget but it's completely achievable by anyone you'll see as we talk to that too it's specialized but at the same time it's not specialized no so let's manage expectations right now right I'm not gonna probably show you something that's going to blow your mind I wasn't surprised as you were but we're here to share some of our research so what are we actually going to be talking about if you've Googled me seen some of my talks you know I have a soapbox I need

one because I'm short but there is a difference between a medical device and a clinical device we are not talking about the desktops laptops and equipment attached to medical devices that connect into the EHR no what we're talking about today the devices that we refer to are the ones that are classified under the EU MDR and IVDR what is that it's a long boring document that manufacturers have to follow but it does define what a medical device is now some of the things we've looked at is a patient monitor every hospital has them if you've ever been a patient you've been hooked up to one we're going to look at infusion pumps some of the devices we've actually forensically

looked at CPAP machines pacemakers and icds buy me coffee and I'll tell you why cardiac programmers very proud I now own three of them how well that's my secret but there's a whole host can we show you every single one no I'll probably take up your whole day but we are gonna try and cover some of them I think the key thing for me was I never thought about medical devices until Veronica came and she'll explain why and the kind of forensic significance that they might have but also just the security implications from such devices and kind of the vulnerabilities that might exist and how they could be exploited and the more you dig the more

you find and the more you find the more you want to dig to the point that as Veronica says she owns far too many devices now and each and every one of them has potential from both forensic and security perspectives I mean by show of hands and I won't judge who here has looked at medical devices in terms of vulnerabilities I see some hands too I'm not the only weird one welcome to my club it's the coy hands that come up that suggest that maybe we shouldn't be looking at them well we made friends of the public sector just in case I get arrested for what I have so now we're friends but what is medical device forensics and

yes I had to bring in some memes because why not any good presentation has them but medical devices seem like something that's foreign sexy new high-tech but it really isn't but before I dive into the forensics of it and the incident response of it potentially and what it holds I kind of need to just do a retrospective of where we are and the statistics here were published by the H-ISAC this year so this is a study that they conducted now these shouldn't surprise you medical devices have been vulnerable for years it's only now that we're really talking about it now 53 percent of these medical devices H-ISAC looked at and analyzed had at least one critical vulnerability

which was remotely exploitable which would impact patient care why is this important well people like me are attached to these devices we carry them with us now if they have critical remote exploitabilities well it becomes a little bit different when someone's standing in front of you with those devices because there are many people just like me that potentially are walking around with a vulnerable device caveat that doesn't mean I'm not going to have my device I will rather have it than not have it but I think a big thing about these is that nobody's really considered the technology before so vulnerabilities seem to have increased I think we've discussed this many times they seem to

have increased and there seems to be more security vulnerabilities now but in all honesty a lot of this is because nobody's checked them before and considered them as anything that needs to be tested anything that needs to be secure it was a medical device that gave support to different people in different ways but now when people start looking at the opportunities that they allow then you can start seeing where are the holes what can we do how can we exploit them

by 59 percent now that might seem like an alarming thing to me it's exciting because it means more people are looking at devices I am always in support of knowing what the vulnerability is so that I can fix it right better you know what's broken than not know now in a general healthcare delivery organization one of the devices used most is an infusion pump right they're in almost every room so 38 percent of the hospital networks are generally made up of infusion pumps now anyone here know what an infusion pump is right they deliver some serious drugs like morphine or pain medication so generally they are things we don't want to be

remotely exploitable now 73 percent of the infusion pumps on the market have at least one remotely exploitable vulnerability so you can kind of see that there are lots of reasons why we should be looking at these devices I told you I'm going to have some amazing art AI is fantastic so what are some of the vulnerabilities well these shouldn't shock you too much these are things we've been dealing with in hardware manufacturing for years right we have 64 percent that is built into the software but lucky for us if it's in software what can we do we can fix them if we know about them now medical devices some might say legacy will always be a problem and

there is a reason a real reason for it we're dealing with hardware components hardware components within a human being's body so how do you deal with a hardware vulnerability that's baked into a device are you going to call in the patient and say hey we're going to cut you open because we need your device even though it's a functioning medical device no there was a big recall on cardiac implants and pacemakers a few years back because a big vulnerability dropped and because of a knee-jerk reaction from the manufacturer they started patching without doing a real analysis and ended up breaking patients' devices security doesn't always get it right because that patch happiness ended up

in patients having to come in and have physical surgeries right thank goodness it's not rechargeable by means of that the patient has to recharge the device because some of us would die but you have to go in because it's a whole unit so again just finding a vulnerability is one component fixing it when it's hardware is much harder but lucky for us there are hardware vulnerabilities and we'll tell you why it's a good thing sometimes

now there are different classes of medical devices so Huawei has a health watch like Apple it's considered a medical device because it runs medical algorithms specifically the cardiac one that detects atrial fibrillation now in those specific class 3 devices there were vulnerabilities found but I think the more concerning ones are the classes of devices within people's bodies or attached to people's bodies or on hospital networks

you can see we practiced this really well we don't ever um so what we need to think about is why are these vulnerabilities important I mean it looks sexy from a security perspective saying that there's vulnerabilities in systems embedded in people's devices that are embedded inside their bodies but from a forensic perspective which is kind of a big part where we're coming from is these vulnerabilities allow evidence to be collected but also because they can be exploited maybe those devices hold the forensic evidence that maybe prove and this is maybe Hollywood style but prove how somebody died maybe somebody hacked into the pacemaker and managed to overpower the pacemaker and blow them up whatever it might be but

maybe that evidence exists somewhere can we capture that evidence is it obtainable you've really gone Homeland there on us so who here has seen the Homeland episode called Unbreak My Heart I think it is what it was called where they have a patient monitor in the president's room you know and they managed to find a serial number and that this whole sophisticated hack that hack is not as unplausible as one might think right so anyone that does medical device Security will tell you that that day people's heads were on fire but no one considers dfir no one considers that these devices that are on a hospital Network might be the foothold onto electronic Healthcare records

we know cyber crime is motivated by what monetary gain how do we get more money we sell data now electronic healthcare records are one of those things that's very lucrative for cyber criminals to sell and these devices are medical devices they're functionally built not to be secure but to give treatment and record clinical data therefore they're actually a rich treasure trove for cyber criminals now the challenge we have is where do you get the devices from you can get dead bodies and drag them into your office and chop them up and pull out what you need you can potentially go and source devices from different places and hope that they have information and hope they have

weaknesses that you can try and exploit so in South Africa it was fairly easy I had a friend at a coroner and by law they have to remove the pacemakers and ICDs because they go boom in the incinerator so have a friend call a friend is helpful otherwise your old friend eBay shockingly enough very easy to obtain devices now I think I've got the algorithm down to sit and wait now I am not saying do as I do in fact I'm saying don't do as I do at home right that's the disclaimer but most of our devices are sourced currently since moving to Norway from eBay and one might think that you don't get devices from there but you do I don't

know if they fell off a truck don't care I just want to be able to hack into them do some forensics and take it from there now you'd think forensics on medical devices if you've never performed forensics it might sound a little bit Fantastical if that's a word potentially but in terms of medical devices can we use the same techniques do we have to do Advanced Techniques do we have to use the the fancy things that we saw in a keynote this morning well the answer is kind of no realistically what we're looking at is normal technology normal being it's technology has been around for a long time it's just maybe used in a different

way and therefore we can extract the same data from them as we do from computers mobile phones applications whatever it might be and still get all that data back and make use of it we can use all techniques just in a new way I think that was probably one of the dis most disillusioned I was when I looked at pacemaker data the first time because it looks sleek it looks like it's got some new technology because it uses machine learning it is able to keep my heart beating it's able to keep me upright all that amazing thing and then you open it to come to disappointment that it just looks like a micro PC like

microcomputer with some components and it's not as fancy as you think it is but it's important understanding what's under the hood because people are intimidated by doing work on medical devices and that's the purpose of this talk is to kind of demystify these devices and say that it's actually accessible and easy well not easy but it is doable but don't tell everyone otherwise we won't be able to get funding for future research Okay so forensic Sciences we have to come up with a process that kind of sort of follows a pattern that's repeatable reproducible and that we can document so there's three things that with every device that we've looked at that we've kind of had to use

we've had to up skill to be able to do some medical engineering or just normal engineering but having to look at the device to figure out is it still working as it should there's no use doing forensics on a device if the data you're generating is coming from a flawed device it doesn't help us document what's on the devices but it serves another purpose it helps us reverse engineer the hardware to potentially understand how we can interact with it because obviously there's many vulnerabilities on these devices that can be exploited however we need to find the door in as was said this morning jump the fence that would leave the least amount of breadcrumbs behind because it is

forensics then we get into the forensic site how do we extract that information so that we can make use of it and present it and analyze it and use that in wherever we go in in the future now you're going to think these medical devices are implanted they're embedded if somebody dies because Something's Happened on their medical device then that's a criminal case and therefore we have to do things properly because it's going to go into the courts later on so we have to consider just as we did on the keynote this morning all the different steps of the forensic process how do we extract how do we analyze and how do we do it in such a way that it's

repeatable now medical devices offer an additional challenge in that it's not always repeatable sometimes we have to really destroy the device and we've seen destroyed devices to be able to get the data back and somebody else won't be able to replicate that later that is just a challenge that we have to deal with in the future but I think it's a challenge that IoT sometimes faces as well is that by means of actually getting into the device we end up destroying it we void warranties I've realized now that every device I have opened there's no way it will ever be used as a medical device again it's labeled for non-human use because I'm not certified to open

them up if I was certified to open them up and I could do it in a process that retains the integrity of the device that's different but that certification will have to be done for every manufacturer every device which as you can see for digital forensics won't be really achievable I mean I'd love to but some manufacturers are nicer than others and will play ball and some just generally don't but the final one is probably how do we visualize the data right because it's medical data it's not data as we know it so unfortunately we can't use our nice commercial forensic tools they don't know what to do with the data because it is in unique formats

they come in different shapes and forms no medical device is exactly the same there is a pattern but they are different so it's coming into creative ways of either building your own visualizations or using clinical open source software to visualize data to reconstruct what their data is I think this is one of those it's a team player thing because not everybody will understand how to get the data from the device how to connect to it how to get it on a low level so that we can perform the forensics and then in particular understanding the clinical side of things I mean some of that data I think you need to study a long time as a

doctor to understand parts of it and to make sense of it so sometimes we'll have to pull in other people to assist but I think the important thing is forensics people know how to work with data right there's lots of instances of data that we don't understand what the data means but we can take it visualize to someone like a medical examiner they can then look at the data in a format that they used to so I get asked this question regularly why are you so weird that you want to do your research well I have a device I'm curious about it because if someone's going to get killed it's going to be me right now I'm joking but it is the

connection from the cyber world to the real world so it matters because people's lives matter and it's only a matter of time before we see this exploited in the wild so let's look at a device let's look at our first device which is an implanted cardiac device the purpose of this device is it'll kick you in the chest and wake you up when your heart decides to misbehave in short okay that is what's in my body well an upgraded version I'm obviously not going to show you guys what my device looks like like I said a room full of hackers no no no I know better

well sure I'm happy to be the ethical um what do you call it research bunny that you can try and attack my device I might even tell you how no I won't so what does forensics look like on an implant right we know dd we know raw we know E01 we know all those kinds of forensic images hate to disappoint you that's not what you're gonna get so depending on the manufacturer so this is not applicable to every single implant because every manufacturer has their own way of doing it a lot of them have what we call a PDB file the programmable database file that contains all the information most of this information currently is

clinical so how many times was treatment given the firmware is on there so it is not in the traditional format that you would expect it to be it's in a proprietary binary format currently the easiest forensic way to get a full capture of that without having to destroy the device or take it outside a human body is to use a cardiac programmer they have the ability to actually grab that full binary image but then you're kind of stuck because it's a proprietary binary format now you have to engage with a manufacturer and that goes right about 50 percent of the time so it's either a maybe or a not um European manufacturers tend to play

ball nicely the US lot is a little bit of a different kettle of fish but this PDB file is super important because it tells us whether the patient received specific treatment it tells us what the settings are and when last it's been changed but that's basically the extent of an implant I think with medical devices if you've ever done phone forensics and we're talking uh Nokia 1100 Samsung D500s that kind of age of device we're probably at that same stage now where there's lots of different variety of formats each and every one of them essentially needs decoding in some way so if you love hex then this kind of stuff is perfect for you because you can

pull it apart and find out all the evidence that's inside of it and not many people can currently yeah but before we move on so you've seen the device that's a cardiac implant you get similar devices like that that goes for deep brain stimulation they are not quite the same as a cardiac implant there's one key difference the brain stimulators actually are rechargeable which means they aren't restrained to the amount of storage they have so some of them might actually have a file system or an OS on them where the cardiac ones are reliant on x amount of CPU Cycles so they can only do certain amount of data storage for a certain amount of time with some

security sadly because we don't want to have a new device every three years I like going in once in 17 years but that's how long this device lasts this is an interesting story so this device is my dad's device uh he asks himself the question what he did to deserve me as a daughter but we've built a whole scenario around a patient dying using his data now he kindly gave me the device after I was done with it I just think he didn't trust what was done on the device so he didn't think that it would work anymore but the CPAP machine you would think is less foreign than an implant right lots of people have CPAP machines

so basic little tools that we needed was a specific Torx screwdriver which is one of the things you find that you need and then depending on some hardware to be able to do some hardware acquisitions now the disappointing or exciting depending how you look at it the data is on an SD card it's on an SD card using the standard file system that we know how to analyze we know how to extract it's nothing new it's disappointingly nothing sexy however it is easy to get hold of we can use traditional forensic tools and techniques to extract the data to analyze it to interpret it and do some parsing of the information yeah I think the shocking

thing is that the claim is made within the FDA documentation as well as the manual that they offer secure delete now if you're living in the EU you know the right to be deleted the right to be forgotten GDPR and data limitation so I decided well as in all things in science let's see if we can get some more SD cards on eBay and see if we can do good old-fashioned data recovery now the findings on that study or well I say study a little project that we had we found that it was not secure delete but just a basic formatting that was being done so if you know the FAT file system you know data recovery can be

fairly simple so again we were able to actually recover patient data from previous patients that sold the cards now the machine itself depending on the version you have may contain a Linux file system good thing about this it is encrypted both at rest and in transit but for the sake of not totally destroying the device we didn't want to go take the board apart but then Emlyn lost them because he took his eye off his luggage in Germany so the device is somewhere on eBay for sale but it wasn't just my device it was all my clothes and everything well that's a different story um as soon as you touch this device as soon as you start opening it up you void the

warranty, which means it's no longer usable and fit for human, I was going to say consumption. Yes, and that's a challenge in itself. I mean, the SD card holds information that's really of interest: that patient data, the recoverable information, that's perfect. But to get to it we're now voiding the warranty, which kind of brings us back again to the early days of phones and extracting that kind of data. So in terms of going to open them up: in a real-life digital forensic situation, are we in the business of taking devices that are potentially super expensive, like MRIs, off the market because we've now done forensics on them? No, I don't think anyone's going to be

happy with that; at least the hospitals won't be, because those are machines that they replace only once in a while. So the key thing in this research is: can we get in without compromising the integrity of the device itself? In terms of this one, we had a spare one that we wanted to destroy and open, and we found that it had the ability to connect to the board itself, which a lot of these devices have. Now security professionals will go, why did they do something like that? Well, because diagnostics has to happen, debugging has to happen, and technicians need to be able to debug and figure out if a device is working as it should. Now that

means once you've got physical access to devices, it becomes super easy to get into them, because we know this is potentially a component of the design. So a lot of the time, and this is just to give you an idea, we have to face many decisions: how are we going to go in, which way is going to be least destructive, which way is going to yield the evidence or the information we want to have? I wish I had the answer to say to you it's X, but the forensic answer is it depends on the device. But if we can be lazy, we'll be lazy, and so we'll just look at the SD card, and from that we'll pull out a variety of

different information that gives us everything about the machine: its settings, its patient data, structures of treatments that have taken place. So when we opened the device we found an EDF format. Now EDF is a format that's been around for as long as I have, and I have just given away my birth year, but it has been around for ages, and you find it well documented in sleep studies, because what is a CPAP machine doing? It is monitoring your sleeping pattern as well as your breathing. So in applied data science, because there are a lot of studies done, we could find a documented structure, and it is in ASCII, so it's totally human readable and in

plain text. So you can imagine, on those SD cards that we recovered the data from, we could actually do a trend analysis on who the patient is, how they slept, what therapies they got, what device it came from, and all these very nice pieces of information. Now forensically these things are important because we can determine whether anyone has tampered with the data, but we can also determine whether or not a patient received the clinical therapy that the settings say they should receive. So if someone's changed the settings, there are indications that it's happened, and we'll be able to say that a patient did not receive appropriate care. Again, I'm lazy, Emlyn's lazy, so if we can

find code to do what we need to do, we will. Again we went to the applied data science space and found some nice Python libraries that parse this information out and give it to us, because we like to do things the easy way. Why reinvent the wheel? So using MATLAB, if you really want to, you can actually plot the signals, you know, the medical signals, and determine the sleep patterns, or you find open source software that does it for you. Now in the CPAP community, when you do a little bit of OSINT research, you actually find that patients got so fed up with the care and the denial of access to their data that

someone built a tool that parses the information so that you can make informed decisions, right? And this has been maintained and is freely available and can be used to actually visualize the data. Now our students at Noroff University College actually do medical device forensics, and this is from a scenario that I built around tricking my dad's machine into not sensing his breathing, so I could simulate him dying and put him under a tarp and take photos. But we are able to say what the settings are and the fact that the machine didn't respond when he had an event. Insulin pumps are the same: there's a huge underground community of patients that have built software that allows you

to read data off an insulin pump. So essentially it uses the FAT file system, it's on an SD card, and even if things are supposedly securely deleted it's all perfectly recoverable, and we can find information about past patients and all the information that was stored on that about their treatments, the kind of system that was used, and also the PII, so information that is very specific to them. You need specialist knowledge, I think, to be able to parse that information. This stuff, it means nothing to me. Veronica spent a lot of time looking at this, but to me it's just about sleep patterns; the medical experts will understand this and have more perspective on it. Yeah, I might be the weird exception to

the rule. I want to understand how these things work, but I won't necessarily go and testify in court; what Emlyn's referring to is knowing where your line is. I did consult on a case involving a pacemaker, a murder case. The short story, because I can't give you the long one, is that I was able to verify date and time on the device, and I was able to verify the integrity of the report that was taken, but I was not allowed, even though I've been a patient since 19 and I understand these things by heart, to testify to what it meant, whether the patient received the appropriate therapy. A medical specialist was brought in, but because we could confirm the

integrity of the data, it made his job a lot easier. This was probably the most fun I had, because I went totally old school. It's a device I've never come into contact with in South Africa. It's a patient monitor network bridge, so it bridges into the hospital network and connects to other devices, because what can go wrong? One would assume that because it's a network bridge it is fairly secure. Now if you look here, we can see that it's running Toradex, which kind of gives it away, so we know one avenue to go research, but we also know that they've got some proprietary software on there. I think this fits nicely back with the

CPAP machine, in that there's a hell of a lot of information out there that you can find just by Googling: everything from people that have decided to pull apart the system before, to look at the file data, to understand and interpret that information and build tools that will parse it, to people like Veronica who just want to share information out there about what the device does, what it's capable of and what it's made up of. So Googling should really be the starting point for a lot of this kind of research. So I bought this device shopping; I tend to do that half asleep,

and then Emlyn's not impressed because it's just another one. I'm like, don't ask questions. The first thing I did was a good old-fashioned Google; I go to the manual. Now one might expect these manuals aren't readily available. MedWrench is your friend: every technician and nurse practitioner puts these manuals on there that they've scanned from their hospitals, because what happens to manuals? They get lost, right? So sharing is caring in the medical world, and those things contain a heck of a load of information; it's actually overwhelming. The FDA publishes some of the documentation, and the EU MDR does as well, so if you read around the redacted black areas of things that are proprietary and confidential, you can get a gist of what

is going on on this device. Now once we did the research and I identified which board it was, which gives me access to a lot, I found some known vulnerabilities. Now everyone should be, or perhaps isn't, familiar with the Spectre and Meltdown attacks. This is on a hardware board, so this means that it's not something that can easily be mitigated. This board was impacted by two out of the three CVEs. Why do we care about this in forensics? Like I said this morning, it is our hole into a device; it's the hole that we need to climb through. Now this is just one example. With most of these devices, when you start looking into the boards, you will find a whole

host of vulnerabilities on them, and I think this kind of research is where the fields of forensics and security really cross over. Looking for vulnerabilities and exploiting vulnerabilities was not necessarily something that was ever in forensics, but it's really important for getting into these new systems and new devices, which is why the community needs to grow and expand and share information about how to break into these systems, how to then attack and steal the information, not only from the perspective of someone exploiting it for nefarious purposes but also for legitimate legal purposes. So, just to be clear, I belong to the Biohacking Village in Vegas and to I Am The Cavalry,

which does a lot of this research ethically. By the code of conduct that we have, when we find exploitable vulnerabilities that haven't been reported, we will report them responsibly to the manufacturers. Now that might not sound as fun, but because we are dealing with human beings and things that can potentially cause harm, we do tend to report these as we find them and work with the manufacturers on fixing them. So we are very much for responsible disclosure. None of the issues or methods that we are showing today are undocumented; these are well documented things that others have found in these devices. So in terms of this device, it soon became apparent that perhaps we weren't

going to be able to use our forensic tools. I was very shocked to find a bunch of USB ports at the back, and decided, well, what's going to happen if I put a USB mouse and keyboard in? Am I going to be able to break out of the kiosk? If you've ever tried to break out of kiosks or get them to trigger Sticky Keys, that's normally your friend. In this instance it was really literally pressing the Windows button, and there's Windows saying hi, I'm here, what can I do for you? But then, doing more research, I realized that this was a version of Windows I hadn't seen before, because it is embedded Windows. Who here has ever done PDA forensics?

That's how old school it is. So it is, I think, Windows CE 5 on this specific one, but the developer guides gave us a lot of information on how we could potentially connect to this device. As you can see, we've got Ethernet, USB, which we already saw without opening the device, GPIO, UART, I2C, and it had an SD card. They tried to hide it under the board, but it was quite apparent that there was something there, so kudos for trying. This thing was very perplexing, because even when I did the forensic acquisition of the SD card it soon became apparent that it doesn't have a C drive, it doesn't have a root drive, it's not the

traditional embedded Windows that I knew. So I did some more research on the developer tools and realized there's a whole host of tools that offer me access into the device. Why? Because developers need access to program these boards and work with them. So what did we do? We used the remote viewer. It's not very sexy: it takes a connection from your PC to the device and allows you to use your computer as a screen, but it gives you access to the operating system, which allows you to do a whole lot more. And I didn't need any password at all; it was literally just identifying the IP address of the device and then

connecting these two pieces of software up. Now we did the acquisition traditionally; I'm not going to tell you how I did the acquisition for the SD card, that's boring. But we also used the update tool. This specific board has a tool that allows you to do a cold boot or a warm boot, or actually save binary images of the data contained, so the operating system, the file system, the bootloader and all of those. Now this is probably the closest we're going to get without having to rip the board apart, so this is what we did. It's a binary image, right? It should be readable. Sad was my realization that it wasn't, because it was also a file

system that I'd never encountered. Who here has encountered TFAT? Yeah, I hadn't either until that time. So everything was in a proprietary binary format that was not by any means viewable, but reading a little bit more, I was able to build the same Visual Basic environment and use open source code to translate the binary images into file system files. This gives you logical files, and we can then use our standard forensic tools to do the analysis. The same things were found on the SD card; it may be slightly simpler to get hold of the SD card data, but the information is there and we can do standard analysis and pull information out. We can find things

inside of the registry. Because it uses embedded Windows, the registry is there, giving us things like typed URLs and information about systems or applications that have been run. Also, once we had both images, we could compare them with fuzzy hashing and hash values to confirm that what we're seeing on the SD card is in fact a backup and not all the data, so that we can verify it. And this acquisition was done without ever opening up the device. Now one might think Windows registries all feel and look the same. Embedded Windows is slightly different: it's a compact version, so the registry editor that you're using now won't work.
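The image comparison described above can be sketched roughly as follows. This is a minimal illustration, not the speakers' tooling: real fuzzy hashing would normally use something like ssdeep or TLSH, whereas this stand-in only uses fixed-size block hashes from Python's standard library. The function names and the 4096-byte block size are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this example


def sha256_hex(data: bytes) -> str:
    """Whole-image hash: identical images give identical digests."""
    return hashlib.sha256(data).hexdigest()


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> set:
    """Hash every fixed-size block and return the set of distinct digests."""
    return {
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }


def containment(candidate: bytes, reference: bytes,
                block_size: int = BLOCK_SIZE) -> float:
    """Fraction of the candidate's distinct blocks that also occur in the reference."""
    candidate_blocks = block_digests(candidate, block_size)
    if not candidate_blocks:
        return 0.0
    reference_blocks = block_digests(reference, block_size)
    return len(candidate_blocks & reference_blocks) / len(candidate_blocks)
```

If the SD card image really is a partial backup, most of its blocks should also appear in the full dump while the two whole-image SHA-256 values differ. Fixed-size blocks only match when the data sits at the same alignment in both images, which is one reason proper fuzzy hashes exist.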

We needed to make use of an older version of Registry Editor to actually look at the data. So these are the core findings from the Windows registry that we were able to recover. While some of them are useful, there's not a lot, because it's a resource-constrained device. We have the machine GUID or identifier, so we can identify each machine uniquely. We had limited USB information, I mean three or four entries, not to the extent that you can get today. Typed URLs were also first in, first out, so not a lot was retained; again, resource-constrained devices. Again, traditional forensic techniques can be used; it's nothing new, it's just used slightly differently. This

information was easily extractable. I think it's fair to say it was well documented; there was information all over the web that you could search for, and that seems to be the case for a lot of the medical devices. Making use of that open source intelligence in the first place gives you a lot of access to the information that's in the back end. Again, as we would know in forensics, deletion is sometimes just a factory reset or a clearing out of, you know, the root directory; it's not really anything comprehensive, so general data recovery would work. How are we looking on time? You can leave some time for questions if you like. Well, we're going to finish up

and then we will take some time. Okay, so future research. We've got a heck of a lot of equipment that's just come in that we need to look at. I love the fact that I'm paid to actually look at medical equipment; it's amazing. It's probably an unhealthy obsession, but we have more devices that need to be examined. Up until the point that I get banned from eBay, that is where I will source my devices, and then I'll find alternate routes. But the acquisition process is still fairly hardware based, right? It does leave some destructive elements, and it is going to be that way because of how the devices are manufactured and the fact

that these things last 10-plus years. The newer devices, according to the FDA cybersecurity guidance, now have to come with the ability to be forensically acquired. It's probably going to be another five years before we see those things on the market, so the future is bright and we will see more capabilities to do that. But what we see is a range of different operating systems and a larger ecosystem, so whilst we're talking about physical devices now, what we also need to think about is the applications that connect to those devices. Are they on Android? Are they on iOS? We can then again use more traditional forensic techniques to extract that data, and it is so rich.

It's terrifying, I think, how easy it is to gather that information and make use of it. Once you have access to the physical device, the operating system is your friend; it will give you all the information. A lot of the time the manufacturers don't understand the underlying operating system or even the file system. The rules of how data is stored and deleted are handled by the file system, and the operating system plays a role in that, so whilst they build applications on top of these layers, the rules of the file system and the operating system still apply. But it might be shocking to know that my cardiac patient monitor is running on

Android and it's got an ext file system. These are things we know in forensics, so it's nothing new, nothing strange. But I think the key finding is that we need more people to do research into this, right? We need the task teams to take this on board and actually consider medical devices as a problem. Now we have to acknowledge that medical devices do not have a functional requirement for forensics and security. By virtue of this, they're never going to be massively pushed to be secure. I got asked the question: V, would you ever want a username and password on your implant? And the answer is no. Why? Because when I'm having a cardiac issue I don't have

time to give a username and password to the nurses and doctors. I want them to have access immediately, so availability of devices cannot be overtaken by security. But these devices will never be the linchpin within an investigation. They will be something that adds information, that maybe gives a timeline or some suggestive intelligence that we can then use and correlate with other information. So we're not suggesting that these are going to be the golden bullet, but they will assist, and they are a different source of information that is generally overlooked at the moment. It's more circumstantial evidence; it adds to the case. And it's the same with the implants that have been used in court cases: they're

not the determining factor, they're simply in support of other evidence. Now how do you get started in this? A lot of OSINT research, right? If you can Google, you can find a manual. You might even find it on the manufacturer's website, where it asks you, are you a doctor? And if you lie well enough and just say yes, and you lie about your location, a lot of these manufacturers give you the technical manual, the operator's manual and the clinical manual. What more do you need to understand what the device is doing without ever opening it? The caveat there is: do not lie. Veronica is just suggesting one way of gathering information, and she is not

encouraging people to lie on applications. I think the biggest thing that I've learned from these things, and this was kind of the inspiration for my PhD, is that the logs suck on these devices. They absolutely don't contain robust logging of any means or form. The logs are riddled with vulnerabilities; we saw lots of things where sensitive data has been disclosed. Let me tell you, every medical device, as I stand here before you today, contains logs with JWT tokens printed in plain text, you know, exposed URLs, app keys and secrets, patient data, because they are built with clinicians in mind and they're built with debugging in mind, not for the purposes of security or incident response.
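To make that concrete, here is a small sketch of how an examiner might flag such leaks in an exported log. It is only an illustration under assumptions: the regular expressions are rough heuristics written for this example, the key names (`api_key`, `app_key`, `secret`) are guesses at what a device might log, and nothing here reflects a specific vendor's log format.

```python
import re

# Rough heuristics, written for this example only.
PATTERNS = {
    # A JWT is three base64url segments joined by dots. Header and payload are
    # JSON objects, which typically encode to text starting with "eyJ".
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*"),
    "url": re.compile(r"https?://[^\s\"'<>]+"),
    # Hypothetical key names; adjust to whatever the device actually logs.
    "secret": re.compile(r"(?i)\b(?:api[_-]?key|app[_-]?key|secret)\s*[=:]\s*\S+"),
}


def redact(value, keep=6):
    """Keep a short prefix so the finding is recognisable but not reusable."""
    return value[:keep] + "..." if len(value) > keep else value


def scan_log(lines):
    """Return (line_number, kind, redacted_match) for every suspicious hit."""
    findings = []
    for line_no, line in enumerate(lines, start=1):
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((line_no, kind, redact(match.group(0))))
    return findings
```

Redacting the match before reporting it keeps the examiner's own notes from becoming another place where tokens and keys leak.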

And with that, you are rid of us, unless you have any more questions. [Applause] Thank you, Emlyn and V. I see a hand at the back, yes. Thank you very much for the talk, very interesting. I was wondering if there's any personal health medical device with a wireless module, say Bluetooth, that you managed to find a fully remote exploit for, because that's the sexiest part, right? So I have a unique take on vulnerabilities. I'm going to stand like this because I can't see you. One of the things is, yeah, we found Bluetooth attacks, we found wireless attacks. Heaven knows my device has got what they refer to as telemetry C, which

is wireless, which communicates in plain text. I have RF as a backup and I have Bluetooth, right? All of these modules will become vulnerable at some point. That is the sexy part of it, but I think where I get excited is: can I tell what I've done to compromise the device? You know, I find a vulnerability, I exploit the vulnerability, but then I want to do the forensics on it to determine whether or not there are logs, whether there are artifacts. I only have to get in once, but to fix things and make things more secure and do forensics is a bit of a continuous challenge, and that's kind of where I enjoy doing what I do. But if we find vulnerabilities

we do responsibly disclose them, and we do disclose what we find for forensics. Like, if I find there are no access logs showing that I've connected, that is something that we will tell the manufacturer and suggest that they add in future releases if possible. Any other

So for some of the devices I do have, I legally cannot yet publicly disclose some of the vulnerabilities, because I'm working with the manufacturers to fix them and I'm under contract. So whilst I'd love to be able to do these demos, legally I can't, and I am, you know, a visitor in this country. So keep watching this space; when that time runs out and we can publish, we will publish. Hi, so what I was left wondering is, speaking of the cardiac programmers, you said you can basically dump the data from a device. Can you do that covertly, or do you have to be in such close proximity that basically you can't just

sniff everyone's data around the room? So I'm going to tell you why I'm not allowed into a hospital room alone anymore in South Africa. We managed to do a man-in-the-middle attack with exactly that, by, you know, just sniffing the authentication between programmer and patient. The doctor didn't assume anything else of me, because they don't think that it's something to be worried about. The old telemetry that most of these devices have as a backup is in plain text, right? So if you are able to sniff the traffic at the appropriate time, you get the golden authentication token, which means that on every device from that manufacturer you could do a replay

attack. There is actually well documented research on that from other manufacturers; I can't say their name because I'm going to get into trouble again, but there has been a mass recall over things like that. I remember attending Black Hat where a friend of mine, Billy Rios, was doing an attack with a programmer that they were able to put malware on, because a lot of these cardiac programmers run a Windows variant. But to answer your question, close proximity is still required, generally speaking. Actually, not that close anymore; the range, I think, is 50 meters now, and increasing every so often as technology improves. But we do call them "the stars have to align" attacks, right?

It's not just that that has to align, it's a bunch of other elements, which makes these attacks still unlikely. It's more likely that my device accidentally goes into what is referred to as VVI mode, in which it thinks it's being debugged and it's constantly going to go. We have a lab on campus that interferes with my device and actually puts it into VVI mode because of the electrical interference, which means it could potentially drain the battery. There are other attacks where magnets are used to put it into debug mode, but like I said, they find something and they improve upon it in the next generation. But if you have a medical device, it's going to be vulnerable, if not now

in two years, when someone has got nothing better to do and they've found a vulnerability. It's a thing we need to become comfortable with. Time for one more question, anybody? I have one question, more on forensics in general. Tool validation has been a big thing in some jurisdictions for the admissibility of forensic evidence, so of course for decades, you know, EnCase and FTK were the big two, in North America anyway. Is tool validation a problem when you're talking about gathering evidence with various open source tools and developer kits and so on? If you can show that it is repeatable and that you've tested it numerous times, then it becomes easier to be admissible in court. It is not your

traditional forensics, right? Whilst the analysis is, the acquisition can and will be disputed in court. In this case that I consulted on, in fact, the manufacturer picked up the device, dumped the data and destroyed it, because it's a biohazard, right? In any investigation we would never destroy a device that we're looking at, but that was their process, and it was approved and the courts accepted it as such. So it's still developing, we're still making lots of mistakes, and the more it gets tested in court, the more we'll be able to refine the processes. But a lot of these things probably won't pass the test for going to court yet, if I'm honest with you.
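One simple way to support the repeatability argument is to acquire the same source more than once and show that the resulting images hash identically. The sketch below assumes the acquisitions have already been saved as files; the function names are illustrative, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from collections import Counter


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeatability_report(image_paths):
    """Compare repeated acquisitions of the same source by their digests."""
    digests = {path: sha256_of_file(path) for path in image_paths}
    counts = Counter(digests.values())
    return {
        "digests": digests,
        "distinct": len(counts),
        # Repeatable only if at least two runs exist and they all agree.
        "repeatable": len(digests) > 1 and len(counts) == 1,
    }
```

On a live device, timestamps or logs may legitimately change between runs, so differing hashes are something to document and explain rather than automatic proof that the tool is unreliable.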

The device is just so we can get the validation done appropriately. Thank you very much, V and Emlyn. [Applause] And now we have a break. Remember to pick up your T-shirt if you pre-ordered one and haven't got it already, and you can get blue T-shirts, pay what you want. See you in 15 minutes.

Thanks to those of you who have joined us for the next round of talks after the break. People are slowing down as the day wears on; you've been flooded with information and cinnamon buns. So Ian Fox is a fellow Canadian from Waterloo, Ontario, and he also did his first talk at DEF CON last month in the OT village, ICS, OT, SCADA, yep. So, supposed to... oh hey, there we go. Take it away, Ian. Excellent, all right, thank you very much. Yeah, as Ryan said, I'm going to be talking about OT, ICS, SCADA, whatever you want to call it. This is some work that my team at work has been doing, and

I thought it was interesting, so even if it's not directly applicable to what you do, I think there will probably be some ideas that might inspire you. As he said, I'm from Canada and moved to Norway, because apparently I'm fond of cold places, working for an OT security startup here called Omny. So you've heard me throw this acronym around a lot at this point: what is OT? If you're not familiar with it, OT, or operational technology, is basically any time that you have computers and digital systems interacting with the physical world. So maybe that's in a factory setting like

this, maybe that's with some medical devices, maybe you've got a big robot trying to catch your enemies. It has a lot of similarities to IT security but also a few things that are different. Similar to the medical stuff that we just saw, a lot of it is very old, hard to patch, and not necessarily designed with security in mind, and that can lead to some interesting challenges. In particular, what we were looking at was context and visibility for risk assessment and for incident response. A lot of these networks were built 20, 30, 40 years ago, so maybe you have some network sensors that

can see, you know, your standard IT things in the network, but there are going to be gaps there. And if you're an incident responder and you see an alert come in on an IP address, maybe you see that it's critical but you don't really know why; you see that, but it's fairly opaque. So we were trying to figure out whether there is a good way to provide some more insight into this and show someone who's responding: is this something that I really need to care about? Is this controlling, you know, the main drill on an oil rig or something like that, or is this

controlling, like, the AC unit in the break room? If you're an incident responder, that can be kind of handy to know. The other thing that we were trying to do here was help with risk assessment, because if patching is really hard and patching is expensive, you want to really focus your efforts where it's going to have a meaningful impact. What we found was that these network sensors are really good for the IT stuff: they can tell you, we see these services running on this IP address, it's got this vulnerability. They're not so good at showing you what that thing was actually doing, and that's

where we started looking to some alternative sources. So in the example here, maybe we can see that there's this controller thing, and we might have some guesses as to what those wires are going to, but we're not quite sure. So where else can we look to try to fill in these gaps? As you might have guessed from the title, the answer turns out to be a lot of docs. There are some other tools that we have as well: you can go to the site and actually walk around and see, or maybe you can do some Nmap scans or something like that, though in OT you have to be

kind of careful with that because, again, a lot of these systems are very old and very brittle. Something as simple as running a scan too aggressively could knock the whole thing over, so you want to be careful with that. You also might interview some site experts, but again, these guys are very busy, and it takes a lot of money and a lot of time, so we wanted to try to build up as good a model of the site as we could before going to these more expensive methods. And for that we turned to docs. So our plan is to convert these docs into a machine-readable format, build

a graph from them, which I'll talk a bit about what we use for, and then use that to answer some questions, either about risk assessments and trying to pinpoint where we want to spend effort, or on helping give incident responders more of an idea of what they're dealing with. There are a few types of docs that we'll go over specifically. Some of them you're probably familiar with, things like network diagrams or risk assessments or functional specifications, which are basically just a Word doc describing how the system is supposed to function. Others are a little bit more specific to the physical realm. Things like P&IDs are piping and

instrumentation diagrams, and they look like this. Each one of those icons there represents an actual physical thing, like a valve or a tank or something like that, and the lines are frequently going to be pipes between them. So this is laying out how this whole process is going to work for that particular part of the plant, and there's a lot of good information that we can pull out of these. In particular, as mentioned, those symbols are generally kind of standardized, so that might be a valve, the little thing in the middle there. A lot of them have tags, which are

labels associated with them, and as we'll see in a second, there's some semantic data that we can pull out of those. The relations between which valve is attached to which tank, and which tank is controlled by which computer, are of interest as well. Tagging is the first thing that we decided to look at, because pulling text out of these documents is much easier than trying to understand the actual picture information, at least as a starting point. A lot of these were actually just scans of PDFs, but even with those you can at least do OCR, and there are some pretty good tools for that. And these tags, once you

have them in text format you can actually pull some information out of them it'll probably look different from site to site but for each site there is generally a pattern to it so for the particular standard that they use for this site the tag is laid out into which system which subsystem which part of that subsystem and then it even has some information about whether there's any parallelism and which instance of that item it is so when we're looking back at something like this we can see from the tag that it's a valve and maybe it's valve one versus two and you can see sort of where it is on the thing there
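
A minimal sketch of what pulling structure out of such tags could look like, assuming OCR has already produced plain text. The tag layout used here (two-digit system, letter code, three-digit instance, optional parallel-train letter, as in the made-up tag 84-ELJ-001A) is a hypothetical stand-in, since the talk does not spell out the site's real standard; the document names and text are invented too.

```python
import re
from collections import defaultdict
from typing import Optional

# Hypothetical tag layout, NOT the real site standard:
#   <system: 2 digits>-<equipment code: 2-3 letters>-<instance: 3 digits><parallel train: optional letter>
TAG_RE = re.compile(
    r"\b(?P<system>\d{2})-(?P<code>[A-Z]{2,3})-(?P<instance>\d{3})(?P<parallel>[A-Z])?\b"
)


def parse_tag(tag: str) -> Optional[dict]:
    """Split one tag into its named fields, or return None if it does not fit the pattern."""
    match = TAG_RE.fullmatch(tag.strip())
    return match.groupdict() if match else None


def index_documents(docs: dict) -> tuple:
    """Build both lookups: tag -> documents mentioning it, and document -> tags it mentions."""
    tag_to_docs = defaultdict(set)
    doc_to_tags = defaultdict(set)
    for name, text in docs.items():
        for match in TAG_RE.finditer(text):
            tag_to_docs[match.group(0)].add(name)
            doc_to_tags[name].add(match.group(0))
    return tag_to_docs, doc_to_tags


# Invented OCR output for two documents.
DOCS = {
    "pid-012.txt": "Junction box 84-ELJ-001A feeds valve 20-PV-002.",
    "func-spec.txt": "Valve 20-PV-002 closes on loss of power.",
}
```

Calling `index_documents(DOCS)` gives a quick way to go from a piece of equipment to every document that mentions it, and back again.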

in this case 84 is maybe the electrical system and ELJ is the code for a junction box which has something going on so the first step that we did was we took these docs and put them through some standard off-the-shelf OCR tools and then you can start doing text search on them as you might have noted the tags are very standardized which lends itself well to regular expressions and so you can write a pretty easy script to say given a tag look through all the documents that we have and find which documents are related to this piece of equipment because the tag identifies

like a piece of equipment or given uh given a document give me a list of all the equipment that it talks about and we're dealing with a lot of docs usually so this is um something that already can start to narrow down your your search when you're trying to figure out what's going on with that particular system uh and of course we've got to use llms because those are the new buzzword uh and we can use those with especially the more sort of text-based uh functional documents uh where it might describe what uh what operating system a computer controlling a valve is using in just like plain text and so we can start using some again

off-the-shelf tools like LangChain and LlamaIndex just to start building up the graph which we'll see in a second template matching with just some standard OpenCV stuff can also be used to start matching the symbols in those diagrams so here's an example of the obligatory LLM can the system that is associated with this tag run with one of the meters offline which is maybe important for your risk assessment and it says yeah we found this document and the glowing green chemical thing can run even if one of them is off because it's got one built-in spare and this is really what we were trying

to build and one of the things that I think is most interesting and broader than just OT is that graphs can do a lot for you there's a lot of really good tooling available around doing queries and doing reasoning and algorithms on graphs and so we wanted to try to take advantage of that so the graph that we built had nodes for more traditional IT things like network devices that are running instances of software which is affected by CVEs and are connected to networks and then from some of the diagrams that we had like those P&IDs and other similar ones we can see

which physical things they're connected to and which functional assets flow to which if there's a tank that's flowing through a pipe to a different tank then if you can affect the upstream one that'll have an effect on the downstream one which is good to know and from some risk assessments that we found we also found that we could tie these functional assets to consequences if the tank holding the fuel is ruptured then people from the company have probably done a risk assessment and put a severity on that like you know this is a high environmental consequence or something like that there was still a lot of manual

work here but using some python we were able to reduce some of that boilerplate and then we basically just put it in a commercial off-the-shelf graph database so a lot of this I think is not too difficult to reproduce and I'll have some links at the end to some of the software that's handy if you want to try doing something like this yourself so a very small segment of the graph ends up looking like this you can see that the sort of turquoise nodes are our computers PLC in this case stands for programmable logic controller which is controlling some robotic arms a

conveyor belt and a self-destruct thing because we're looking at the Acme TNT factory and it's got to have one of those by law and HMI is a human machine interface but again basically just a computer that's connected to that network there and so we can start asking these graphs some simple queries like if I'm on the HMI which physical devices can I control and the graph database does that pretty quickly it translates that from the query down at the bottom there and so we can see all right if I'm on this HMI I can control the robotic arms the conveyor belt and the self-destruct thing or if I'm trying

to do a risk assessment on a particular piece of equipment maybe I'm looking at the self-destruct thing I can ask the database which IT devices have a path to be able to control this and you can see how that would be useful for some of these more destructive physical things we can go even further and tie this to things like consequences from those risk assessments so if I'm an attacker and I'm on that HMI I can say show me the fastest way to cause some kind of consequence that would be bad
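
The three queries just described can be sketched in plain Python as a rough stand-in for the graph database. This is not the query language or product used in the talk; the edge list, node names and the consequence node are illustrative assumptions loosely based on the Acme example.

```python
from collections import deque

# Directed "can reach / can control" edges. Topology and names are illustrative assumptions.
EDGES = {
    "HMI": ["plant-network"],
    "plant-network": ["HMI", "PLC"],
    "PLC": ["robotic-arms", "conveyor-belt", "self-destruct"],
    "self-destruct": ["high-severity-consequence"],  # hypothetical risk-assessment entry
}
COMPUTERS = {"HMI", "PLC"}
PHYSICAL = {"robotic-arms", "conveyor-belt", "self-destruct"}


def reachable(edges, start):
    """Breadth-first search: every node reachable from `start`, excluding `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


def controllers_of(edges, target, computers):
    """Reverse question: which computers have some path to `target`?"""
    return {c for c in computers if target in reachable(edges, c)}


def shortest_path(edges, start, goal):
    """Fewest-hop path from `start` to `goal`, or None if there is none."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in edges.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None
```

Here `reachable(EDGES, "HMI") & PHYSICAL` answers "what can I control from the HMI", `controllers_of` answers the risk-assessment question for one piece of equipment, and `shortest_path` gives the fewest-hop route to a consequence. A real graph database would express the same things as queries and scale far better.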

but more exciting than that would be to actually set some agents loose on this graph and see what they can find so with again just another simple python script you can have it start at one of your computer nodes and have it look for any networks that are connected to that device other devices on that network and then find CVEs on those devices which are remotely exploitable through the network and sort of build up your graph of things that you have control over and if they have a CVE which gives you code execution you can keep pivoting from there if it's a CVE which can cause a denial of service then you can

potentially disrupt whatever physical assets are being controlled and so you can build up this blast radius from each of the devices that you can start from and if you do this for enough of these you can start to build up a notion of which nodes come up again and again and maybe try to pinpoint which specific CVEs are going to be the most impactful to patch if you can only patch like three things maybe we can find some hot spots in the graph and try to get the most bang for our buck when patching because if you run the whole

thing and you can see what the most valuable assets are take a few of the CVEs out and run the whole thing again you can compare the CVEs in context and actually see how much more secure your network is afterwards rather than just looking at the CVSS score and there are tools that do this as well so we built one but there's also a couple of open source ones from MITRE which again will be linked at the end of this and they do basically the same thing so here's an example of that I told it that we're looking for

routes from the engineering station in this chemical process plant to anything that will have a high environmental cost and you can see that it finds some devices uh it looks for the networks on those devices goes and discovers a bunch of stuff and eventually it finds the the self-destruct flamethrower that's reachable through some steps gives you the attack path there exploit the cve pivot to this thing and then you can control it and it has the environmental consequence of high this was the I think I already sort of went through this verbally but um again basically if you run this differentially if you run the whole thing with different uh different aspects in the graph maybe if you put in like a

firewall or if you take out some of the cves then you can see how specifically that that'll affect how difficult it is to find these paths and how easy it is to get from a likely entry point like the internet to some sort of unwanted event like a high environmental consequence from a risk assessment
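
The agent and the differential run can be sketched roughly as below. This is a toy model under stated assumptions, not the tool built for the talk or either MITRE tool: the networks, devices, placeholder CVE names and the two impact classes (`rce` for code execution, `dos` for denial of service) are all invented for illustration.

```python
# Toy data: every name and CVE identifier here is a placeholder, not a real finding.
NETWORKS = {
    "office": {"eng-station", "historian"},
    "control": {"historian", "plc-1"},
}
CVES = {  # device -> [(cve_id, impact)], all assumed remotely exploitable
    "historian": [("CVE-A", "rce")],
    "plc-1": [("CVE-B", "rce"), ("CVE-C", "dos")],
}
CONTROLS = {"plc-1": {"flamethrower"}}  # device -> physical assets it controls


def blast_radius(start, networks, cves, controls, patched=frozenset()):
    """Walk outward from `start`: pivot through RCE bugs, note assets disrupted by DoS bugs."""
    owned = {start}
    disrupted = set()
    frontier = [start]
    while frontier:
        device = frontier.pop()
        for members in networks.values():
            if device not in members:
                continue
            for neighbor in members - owned:
                for cve_id, impact in cves.get(neighbor, []):
                    if cve_id in patched:
                        continue
                    if impact == "rce":  # code execution: take the device and keep pivoting
                        owned.add(neighbor)
                        frontier.append(neighbor)
                        break
                    if impact == "dos":  # denial of service: its physical assets are disrupted
                        disrupted |= controls.get(neighbor, set())
    controlled = set()
    for device in owned:
        controlled |= controls.get(device, set())
    return owned, controlled | disrupted


def rank_patches(start, networks, cves, controls):
    """Differential run: patch one CVE at a time and count how many assets stop being affected."""
    _, baseline = blast_radius(start, networks, cves, controls)
    all_cves = {cve_id for entries in cves.values() for cve_id, _ in entries}
    scores = {}
    for cve_id in all_cves:
        _, remaining = blast_radius(start, networks, cves, controls, patched={cve_id})
        scores[cve_id] = len(baseline) - len(remaining)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

In this toy network, patching the code-execution bug on the PLC alone changes nothing, because the denial-of-service bug still reaches the same asset, while patching the historian cuts the whole path. That is the "CVEs in context" point: the ranking comes from the graph rather than from each CVE's individual score.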

but this is still pretty early days there was a lot of manual work here which we're trying to get down the diagram parsing in particular is something that still requires a lot of manual labor and so we're looking into some more computer-vision things to try to deal with that of course I'm sure there's more we could be doing with LLMs going into the future and a really big one is connecting to more data sources to deal with drift because these documents are generally from when the equipment gets updated but of course when you're dealing with systems this big and having contractors come in and

whatnot things are always going to change and that's not necessarily going to be reflected in the docs so something that's still to be tested is how close these models actually get to real life and that's another place where I think it's a good starting point but you still do want to use those other tools like the site visits and interviews to refine it afterwards these are basically all the tools that we used in the process and I'm happy to if you're interested go into more specifically how we chained them all together but I'll just leave that up for a second cycat the one at the bottom there was

the MITRE one that was doing the attack paths and that's kind of fun to play around with but the two things that I really want to stress that I thought were most interesting about this were one there's a lot of untapped information in docs especially when it comes to some of this information that you're not going to find just by doing a network scan or something like that and I think that's probably true in some other areas as well where if you think of all the information that you have available to you when you're solving a problem frequently there will be some that's a little bit harder to get to but

might give you some good insight and two graphs are great for modeling complex systems there's been a whole bunch of really good open source software made for these and theory done on them with algorithms and things and I think it's a tool that we should not be forgetting in this world of you know LLMs and things we've got some other tools that are I think better suited to design problems so I wanted to say thanks to the people that were doing this work for me or with me and that's all I have for now so I don't know if I have time for

questions but I'll be uh sticking around for the rest of the day as well let's move on but come see Ian thank you Ian next up is Andre Lima so for those of you thank you very much for those of you who are on the red team side of things hacker code cracker weed whacker uh you're wondering where is all the red team stuff well Here Comes Save The Best For Last maybe if that's your jam uh Andre Lima is a red team operator and researcher from PWC uh we're very happy to have him back again he uh he was here last year and uh his talk was uh one of the most uh popular

according to to the community feedback and we got to have that red team we got to scratch that red team itch so uh check out his uh his bio on on our website as well and you can find a link to a number of his other talks and research he's very involved in the in sharing and giving back to the infosec community in various ways so

we will in any moment dive into automation for red teams how to improve our workflows because we are simulating attacks as red teamers in a Time boxed engagement and the bad guys don't necessarily have those same time constraints as we do uh in a in a Consulting setting or as an internal red teamer inside a business wireless password is skog stim 20 exclamation would you like me to type it in for you introduce yourself Andre um so my hi everyone my name is Andre Lima I um I've been a pen tester red teamer for more than 10 years um and yeah really enjoyed doing the research and everything and thank you let me get this out of your way give it

up for Andre [Applause] just bear with me for a second

there is a weird delay between the models and anyway oh okay so this is red team automation infrastructure and payload development my name as I was saying is Andre Lima I work at PwC Norway I have been doing this for 12 years as a red team operator and researcher I have worked around the world in Portugal Australia and now finally settling down in Norway for sure and yeah you can reach out to me on LinkedIn if you want to check some of my previous presentations that's the link and if you're interested in what I do and want to follow I'm on X as well so still weird to say that

uh also I'll be sharing the slides in the end so don't worry about taking pictures and all that stuff you'll have those in the end so yeah today we're going to talk about because I understand there's a mix in terms of in the audience in terms of experience so I'm going to talk do a little intro on red team infrastructure um then I'm going to talk about my implemented solution uh big FYI work in progress um but I'm going to talk about the infrastructure payload development and future development that I want to do on it and some conclusions

so a little intro into red team infrastructure just some design considerations that you should have and the reason why you will often see this type of architecture whenever red team infrastructure is mentioned the things you're looking for here are resilience and concealment and the whole point is if you have a payload and it's reaching back out to a C2 server you want to have the resilience of having more than one redirector those redirectors do nothing more than redirect traffic to the backend server that could be a C2 server that could be a payload server and what you want is if the blue team is able to detect that type of traffic and

it flags it as suspicious and tries to block it you want to have a list of options in this case the list of redirectors so if they block redirector one it just automatically moves on to communicating through redirector two and then if something happens to that one it moves to redirector three and so on what you also want is to have some sort of separation of duties which again gives you more resilience so you'll often have this logical architecture for example for your payload and then another one for the C2 server and then maybe another one for the phishing infrastructure for example you want to have these separated because you don't

want to have the blue team just block one IP and all of a sudden it blocks your entire infrastructure in terms of functionality but there's a problem with this which is as you might imagine if you are tasked with setting this whole infrastructure up then this becomes a huge load of work and you do not want to do that manually I love sysadmins but this is not the job so I do prefer to spend my time hacking so yeah the solution is obvious given the problem is too much work instead of spending too much time on sysadmin work you probably want to take some of that time and actually put it into what is more

valuable to the client and actually do more hacking in that sense so automation is definitely the way to go now just a quick note on why this type of project is interesting multiple tech options different design options very customizable new tech integration one of the things that I've always loved to do even though I'm more connected to the security industry is to go randomly to conferences IT conferences usually for developers infrastructure whatever and just learn about new technologies and often wonder how can I integrate that into what I know and how could that help me and in that sense it is incredibly interesting to just go deep into some of

the Technologies because obviously most of us understand for example the concept of devops but have you actually ever gone deep into one solution and tried to actually have it work so this is why this type of project tends to be like super interesting you just learn something new and you immediately you might have new ideas I don't know for example Mojo is a new language but on the surface if you look at it you might just quickly put it aside as oh it's just an AI language kind of thing but actually when you dig deep into it uh it doesn't relate to this but it's just an analogy here but when you dig into it you start realizing

that actually it's a full general-purpose language and if you understand how much quicker it is and how it does it you can quickly have the idea of maybe redesigning hashcat I don't know hash cheetah or something and down the line maybe that's an interesting project that might come up just because you had the idea to go to a conference and hear about Mojo something seemingly unrelated so if you want to read more into this there's a lot of technicality in terms of what software to use for example in the redirectors there's what's considered dumb redirection where you usually work at OSI layer three or four where you're thinking about IPs or maybe TCP UDP that

kind of redirecting but then there's also what's considered smart redirection where it's usually something that has the ability to read into layer seven so the HTTP protocol look into the bodies of the requests and decide what to do sometimes EDRs and specific tools that sandbox some payloads will do requests and will identify themselves in the user agent or whatever HTTP header and you might want that flexibility so in that sense you might want to use something like mod_rewrite in my case I'm a minimalist at heart so at this point I'm still using iptables but later down the line as I see the need for it I will definitely

improve it so yeah definitely some really good references now I'm going to talk about the implemented solution again big FYI ongoing work so basically in terms of infrastructure it comes down to a python wrapper that runs terraform to set up the infrastructure and the terraform in itself has ansible as well for provisioning so if I need to install an Apache server that kind of stuff will be done with ansible but setting up the whole VM and the redirectors and all that stuff will be done with terraform the reason why you want to wrap this up in python is that you will often require for

example when I do devops I'm using GitHub actions and what that means is I will eventually down the line have to trigger a workflow and you can do that using the GitHub API to do that kind of stuff I need not only the information that is given to me as output from terraform namely the public IP addresses of the VMs but also obviously the API keys and everything to communicate and set up the whole devops part which I'll talk more about ahead I also want to integrate Uptime Kuma that's k-u-m-a just because it looks interesting so I got into it it's not necessarily the best just FYI but the whole point is you want to set

that up usually with terraform and ansible and install it but then you need python to again reach out to it through its API to feed it the servers that you want to monitor and how you want to monitor them that's a bunch of python requests that you do so these are some of the options that it has I'm using the click module in python it just makes things a lot easier and that's an example of the thing running it's not that interesting seeing it live because it takes a lot of time 10 to 15 minutes maybe depending on the number of redirectors mostly on the terraform apply part
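
A minimal sketch of the wrapper idea, assuming terraform is installed and has already been applied in the working directory. It is not the speaker's wrapper: the output name `redirector_public_ips` and the shape of the monitor entries are hypothetical, the click command-line layer is left out, and no monitoring API is actually called.

```python
import json
import subprocess


def terraform_outputs(workdir):
    """Run `terraform output -json` in `workdir` and return the parsed JSON document."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir,
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def extract_values(raw_outputs):
    """Terraform wraps each output as {"value": ..., "type": ..., "sensitive": ...}; keep the value."""
    return {name: entry["value"] for name, entry in raw_outputs.items()}


def monitor_targets(values, key="redirector_public_ips"):
    """Turn the redirector IPs into entries a monitoring tool could be fed (hypothetical shape)."""
    return [
        {"name": f"redirector-{index}", "url": f"https://{ip}"}
        for index, ip in enumerate(values.get(key, []), start=1)
    ]
```

The same extracted values can then be handed to whatever comes next in the wrapper, for example as inputs when triggering a build workflow.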

but yeah those are some of the tools that I'm using GitHub actions I'll talk a little bit more about ahead and yeah just to talk a little bit about the choice to go with terraform infrastructure as code this is one of the things I really like about terraform it's just infrastructure as code so when you need to back up your infrastructure theoretically all you have to do is git push the code itself there are very good people that use Jenkins for example which is a completely plausible and very good option but to be honest with you it's mostly a personal preference because first of all I look at Jenkins and it looks like

something coming out of the 90s and I'm not a huge fan but also on the more practical side I'm a minimalist and I want to decrease the amount of sysadmin work that I have to do and the thing I don't want to happen to me is for me to come into the office thinking that I'm going to do some really cool recon on a client and then all of a sudden there's a critical CVE on Jenkins and I have to patch it and I patch it and I figure out it's broken now and I have to fix everything up go figure out where the

last backup was I'm not doing that so yeah really huge fan of terraform and then this is just the integration between terraform and ansible just to clarify it's not like I'm an expert in these technologies but the feeling I get is that terraform tends to be better at deployment of infrastructure and ansible tends to be better at provisioning installing software and setting the server up so there's a lot of discussion some people think one is better than the other but some other people think the combination of the two is the best and quite honestly I've used it and haven't had an issue yet where I could think oh maybe I

should have used more of one or the other and then there are some notes on infrastructure security the most important thing when you have such infrastructure especially deployed automatically is you want it to be secure bear in mind this is software that will be running the C2s and so on that give you access to actual infrastructure so you want to be very careful with that and make sure you know what you're doing one of the cool things is it's very easy to manage SSH keys throughout this whole thing in this version I'm using Azure on the back end as my cloud provider and the firewall is the network security group the NSG and that's just

a little part as you can tell by the line numbers in the left column there are a lot of lines there but just as an example the first one is for the resource C2 redirector so for the redirectors I will only allow SSH and to be honest I think for pretty much all the VMs I will only allow SSH from the operator public IP address for example and then the redirectors will accept both ports 80 and 443 from anywhere but there's a little trick that I will explain actually now in the misdirection part what does that mean so this is one of the cool things about these types of projects is that

you can become very creative and because there's no right way to do things you can just do whatever you want that you find interesting and one of the interesting stuff that I thought about was uh if you notice uh when I run it there's some debug information there you can tell there's a client Network range that I Supply now the reason I do that is for the the infrastructure to be able to tell the difference between what's coming from the network the client Network range and what's coming from the rest of the internet now if you're coming from the network range the client Network range what you will see and you try to access one of the IPS in that

case the 112 redirector what you will see is the actual backend I think in this case it's the payload redirector so it's the payload back-end server that's the Apache that is installed there now if you try to access it from elsewhere I used my phone because I'm on 5G so it's just easier you don't have to set up anything or move places so from my phone you see a cannoli recipe which is good if you want to play around with the blue team and just misguide them a little so it's useful so yeah it's one of the interesting things that you can pull off by having these solutions so in terms of devops building the

tools compiling uh the tools that we use very often I'm using GitHub actions and again it's not like I sat down and I uh played with every single one of them out there and decided the GitHub actions is the best not the case at all but I like using uh if I'm using GitHub already for my repos for whatever I use it for project management as well it's kind of Handy in that sense when you pull push stuff it automatically creates stuff in there which is super convenient but if I can have all my tools in one spot unless there is an amazing feature on some other option I will tend to have all of

them on one spot and just make my life easier so that's why GitHub actions was the options the option now that being said I've noticed some difficulties in developing in writing workflows um just some things that don't sit perfectly with me that might have changed since then a few months that I checked that but for example the latest version I want to say of Visual Studio that you can use to build Tools in GitHub actions is I think it's 2019 whereas if you're using the Azure devops I'm not sure if it's Azure devops it has a specific name don't remember but then it allows you for the more more modern uh 2022 version and overall it kind of makes

sense that Azure would be the better option for building at least C# tools .NET stuff so it's definitely something that I want to try and give a go but that's later down the line so you can see here just an example of a workflow running now if you look at the green sign over here you see it says manually run by Andre I was confused when I read that because I never ran that I mean I ran it manually I think the first couple of times when I was testing the thing but ever since it's just the infrastructure the python wrapper that

runs it but I then realized from the GitHub documentation that you can manually trigger a workflow run using the GitHub API and that's basically what I'm doing that's a part of the python wrapper where I'm just obfuscating my API key of course but overall it's the more than obvious stuff it's just some python code that triggers the workflow now after you build a thing you create what you'd call the payloads in this case they're called artifacts now GitHub actions and I'm sure others do too provides not only inputs which are useful here in terms of public IP addresses

but also allows you to Define secrets so if you want to put in your SSH key to then connect to the payload server and upload the artifact that you just built then you do something along those lines um and yeah
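
Triggering a workflow run from Python can be sketched as below, using GitHub's REST endpoint for workflow dispatch events. This is not the speaker's code: the owner, repository, workflow file name, input name and token shown in the usage note are placeholders, and the target workflow has to declare the `workflow_dispatch` trigger (with any inputs it expects) for the call to be accepted.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, workflow_file, ref, inputs, token):
    """Build (but do not send) the POST request that asks GitHub to start a workflow run."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # `ref` is the branch or tag to run on; `inputs` are passed to the workflow as strings.
    body = json.dumps({"ref": ref, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def trigger_workflow(request):
    """Send the request; a 2xx status means GitHub accepted the dispatch."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call such as `build_dispatch_request("example-org", "example-repo", "build.yml", "main", {"redirector_ip": "203.0.113.10"}, token)` followed by `trigger_workflow(...)` would start the run. Runs started this way appear as manually triggered in the Actions tab, which matches what the slide showed. Secrets such as an SSH key stay in the repository's secret store rather than being sent as inputs.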

now that is all about building your code but before building your code you want to worry a little bit about the code itself and in the simplest example you probably want to obfuscate stuff in it so as not to be easily detectable by signature-based kinds of detection so the way you usually try to do that is by using some sort of scripting capabilities either PowerShell or python and you'll find a lot of very hacky ways where people will run scripts to find for example literal strings anything between two double quotes but if you think about it that's a very hacky and bad way to do it it can lead

you depending on the complexity of the software to a lot of false positives interpolated strings are another type of string that you cannot detect just by simply doing that kind of stuff but there are also all sorts of details like if you're expecting a keyword followed by for example if you want to obfuscate and in C# you should because the executable is actually IL code Intermediate Language and it keeps a lot of the names of the classes methods namespaces and all that stuff if you are looking to replace a class name and you're thinking oh let me look for the name ABC but ABC might be in the middle of a string so let me put space

ABC space but then you can find examples in the program where there are two spaces where there's a line break where there's lots of stuff bottom line is you're trying to do it the hacky way which might work but it's not the best way and that's where Roslyn comes in before doing this type of stuff I thought Roslyn was just a compiler it turns out that it's actually defined as the .NET Compiler Platform and that sounds very fancy but basically what it means is it represents C# it parses the code which if you think about it makes it the best tool to parse the code and interpret it and most importantly it parses C# code and generates

a syntax tree which then you can navigate and you will have all these different types that will define specific stuff that you will then easily be able to find class names to find the the instances where they refer to that class name method names whatever you want literal strings interpolated strings and all that stuff so the syntax tree will contain syntax tokens such as keywords literals and so on syntax trivia all the different types of there's different types for different types of comments for example and and syntax nodes as well language constructs classes name spaces and all that so that is the power of Roslyn now just a quick warning it sounds like oh I

I'm about to show you like a few slides on this stuff it sounds very straightforward and oh yeah it must be like easy to use uh there's um first of all like uh but there's a lot of stuff that I'm not talking about here because this is not a Roslyn tutorial um but I will give you really good references of places where I learned a lot about it one of the things is you can write very simple code and look at the syntax visualizer it's it's not hard to get to it from the menu I don't remember right now because I've done it a few months ago and now it's always there so uh but ping me

later I'll I'll find on my notes or actually look at it but um yeah so you can click on stuff and I think there I was clicking on hello world and it immediately jumps the visualizer jumps into the type that corresponds to that object which is the string literal type this is just an example for you to start getting like comfortable with the types and understanding how it all nests in together but in order the com the oversimplification here is that I just showed this and everyone is probably expecting to open Visual Studio and find it you actually have to install Visual Studio SDK to do that um that was the only thing that would be

cool but so the NuGet packages to be installed there are quite a few let's say four or five I'd have to check my notes again but they basically contain the types classes and libraries required to analyze parse and understand the C# code then you have multiple objects literal string interpolated string syntax trivia all that stuff that you kind of have to get used to so it's a good idea to actually take a look at the visualizer there is also I didn't put it here I forgot but there's also a website that does that online so if you don't have Visual Studio for whatever reason or just want to quickly try it I would have

to check it again ping me later but yeah multiple workspace types AdhocWorkspace MSBuildWorkspace VisualStudioWorkspace depending on what you want to do you'll have to use different ones and also understand what a workspace is a solution a project a document how they correlate with each other and then how all the objects come into the document object then you have learning how to navigate the syntax which can be done in three ways LINQ is the simplest way but not great performance which is relevant if you're writing an analyzer which Visual Studio is running every time you press a key but then again for

our purposes it's not really that bad there's also SyntaxWalker and SyntaxVisitor and LINQ is actually worth keeping around because I find the SyntaxWalker a little bit more elegant but I've been forced to go back to LINQ a few times just because of that last sentence insufficient stack to continue executing the program safely now if you've ever programmed using any sort of recursion you know that if it gets too deep then you start having trouble with the stack and that's something that SyntaxWalker will sometimes trouble you with and you'll have to resort to some sort of solution with LINQ another issue is all those objects are

immutable objects and that's something that you hear and you tend to forget really quickly but what that means is every time you change the code programmatically it actually creates a new instance of the object so if you're keeping the old one which is very common you then struggle with wait a minute I just changed this what happened why is this not changing and you just have to figure out that you have to use that new object so just some really really good resources and again conferences that I would usually not attend more developer like C# I want to say programmers developers but they seem very

focused on C# and .NET like NDC Oslo where Mark Randall and Eric scherboom I hope I'm pronouncing that right talk very deeply about it and some other implementations that don't relate to security but you understand the whole point of having these also XPN has a really good blog where he actually uses Azure DevOps XPN is Adam Chester an amazing researcher and actually I do have the link for the syntax visualizer which is that one RoslynQuoter again I'll give you the QR code for the slides at the end so you can access all this
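The tree navigation described above can be tried without setting up Roslyn. The sketch below uses Python's standard `ast` module purely as a stand-in (an assumption of this example, not the tooling from the talk). It parses a small hypothetical snippet and collects class names, function names and string literals with a flat `ast.walk` query, which is roughly the role LINQ plays over a Roslyn tree.

```python
import ast

# Hypothetical snippet to analyze; any Python source string works here.
SOURCE = '''
class Greeter:
    def greet(self, name):
        return f"hello {name}"

print("hello world")
'''


def summarize(source):
    """Collect class names, function names and string literals from source."""
    tree = ast.parse(source)
    found = {"classes": [], "functions": [], "strings": []}
    # ast.walk yields every node in the tree, so no recursion is needed here.
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            found["classes"].append(node.name)
        elif isinstance(node, ast.FunctionDef):
            found["functions"].append(node.name)
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            found["strings"].append(node.value)
    return found
```

Calling `summarize(SOURCE)` returns the names and literals found in the snippet. The literal part of the f-string shows up as its own string node, which is similar in spirit to how the talk treats literal and interpolated strings as separate node types.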

so let's just show a demo of Roslyn in this demo it will just grab Seatbelt and it will convert all literal strings and interpolated strings to uppercase letters I'll just show you the code and everything so this one's like

for some reason it's taking ages to start video

just one second go with me

let me just try to kill Windows Media

Okay so let me just try to Okay so I think it's right

same problem with the mouse sorry for that okay

okay so this is basically the main code and as you can see there I'm loading the solution it's going to just call let us just run through it

so it loads the solution that's the Seatbelt that I'm about to download and update then it creates an instance of the uppercase rewriter which is an implementation of CSharpSyntaxRewriter and you have those two visitors the interpolated string text one and the literal string one and all of those are executed while navigating through the whole syntax tree and calling the visit method there so then I just have to update everything build the whole project output the assembly to an executable locally because again this is all in memory and then run it and here you can see that I'm getting a fresh copy of Seatbelt and then I just run it and you can see everything comes back in

uh upper case
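The idea behind the demo, rewriting every string literal while walking the tree, can be sketched outside Roslyn. The block below is a stand-in built on Python's `ast.NodeTransformer` and is not the C# rewriter shown in the talk. The function name `uppercase_literals` is made up for illustration, and `ast.unparse` assumes Python 3.9 or newer.

```python
import ast


class UppercaseStrings(ast.NodeTransformer):
    """Uppercase plain string literals and the literal parts of f-strings."""

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            new_node = ast.Constant(value=node.value.upper())
            return ast.copy_location(new_node, node)
        return node

    def visit_FormattedValue(self, node):
        # Rewrite the embedded expression but leave any format spec alone,
        # since uppercasing a spec such as "d" would change its meaning.
        node.value = self.visit(node.value)
        return node


def uppercase_literals(source):
    """Return source code with every string literal uppercased."""
    tree = UppercaseStrings().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

One difference worth keeping in mind: Python's `ast` nodes are mutable, while Roslyn returns a new tree on every change, so the C# version has to keep using the object returned by the rewriter.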

that is very weird because I'm not moving it okay okay

now there are some complications there things get trickier I mean that is a very simple example but I wanted to show it because it's the easiest to visualize because of the uppercase of course but what you actually want to do is to grab that literal string or interpolated string and not only obfuscate it somehow or encrypt it but also put it inside a decode method now what that means is that you have to write the code for the decode method somewhere now depending on the software that could mean inside a

namespace inside a class and just write another method but it has to be code that is reachable from anywhere that has text that has been obfuscated so it makes you think it's not that straightforward so
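The reachability problem described above can be shown in a toy form. The sketch below again uses Python's `ast` module as a stand-in for Roslyn. It replaces each plain string literal with a call to a decoder and injects that decoder at the top of the module so every call site can reach it. The helper name `_d`, the one-byte key and the function `obfuscate` are all hypothetical, and f-strings are deliberately left untouched to keep the sketch short.

```python
import ast

KEY = 0x5A  # hypothetical one-byte XOR key

# Source of the decoder that is injected into the rewritten module.
DECODER_SRC = (
    "def _d(blob):\n"
    f"    return bytes(b ^ {KEY} for b in blob).decode('utf-8')\n"
)


class WrapStrings(ast.NodeTransformer):
    """Replace plain string literals with _d(<xor-ed bytes>) calls."""

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            blob = bytes(b ^ KEY for b in node.value.encode("utf-8"))
            call = ast.Call(
                func=ast.Name(id="_d", ctx=ast.Load()),
                args=[ast.Constant(value=blob)],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node

    def visit_JoinedStr(self, node):
        # f-strings need their own handling; this sketch skips them.
        return node


def obfuscate(source):
    """Return source with wrapped literals and the decoder defined first."""
    tree = WrapStrings().visit(ast.parse(source))
    # Module-level placement makes _d reachable from every class and method.
    tree.body[0:0] = ast.parse(DECODER_SRC).body
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

In Python a single module-level function is enough. In C# the equivalent decision (which namespace and class host the decode method, and with what visibility) is the part the talk flags as not that straightforward.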

now just to clarify some goals what you want to have is fresh infrastructure ready to go fresh payloads ready to go simple to deploy easy to manage and for future development what I want to do is have an infrastructure timeout for auto cleanup because ideally you don't want to stress too much about bringing down the infrastructure you know the engagement is ending in two months for example you just set a date and it does it automatically of course more C2s and the redirection and everything right now is done through IPs not domain names so that's a part but I've seen that you can actually buy

domain names probably generate a few random ones to test it out but there are tools that will generate names if you give them some sort of context in terms of company naming they will generate a few random names that are kind of related to it so it's a bit of disguise but again you want to automate all of that support for HTTPS is still not done
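The auto cleanup goal mentioned above (set a date and let the infrastructure come down on its own) boils down to an expiry check that a scheduled job could run. This is only a minimal sketch under assumptions: the `Deployment` record and `due_for_teardown` are hypothetical names, and the actual teardown call depends on whatever provisioning tool is in use.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record of one piece of deployed infrastructure."""
    name: str
    expires_at: datetime


def due_for_teardown(deployments, now):
    """Return the names of deployments whose expiry time has passed."""
    return [d.name for d in deployments if d.expires_at <= now]


# Example: one redirector expired at the end of June, one runs until September.
FLEET = [
    Deployment("redirector-1", datetime(2023, 6, 30, tzinfo=timezone.utc)),
    Deployment("c2-server", datetime(2023, 9, 30, tzinfo=timezone.utc)),
]
```

Passing the current time in as an argument, instead of reading the clock inside the function, keeps the check easy to test and makes a dry run for a future date trivial.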

also phishing infrastructure support I definitely want to add that how long do I have okay now a lot of people like solutions like Gophish but personally I haven't had great experiences because the obvious thing about these types of tools is that you lose a little flexibility and control over what is sent so a lot of spam filters will tend to become really good at detecting the minor things about how an email is built so you want to be very careful with that and to that extent I'm much more of a fan of developing the whole thing myself very quickly Python code that connects to an SMTP server I've

done it in AWS before but pretty sure you can do it anywhere else to send an email and then you control the whole body that could be HTML that could be just text but you control exactly everything that goes into that and then just create the whole phishing site where you collect the credentials and so on and make it look like something legit quick conclusion good red team infrastructure design principles resilience security easy to deploy good payload dev principles obfuscation encryption is usually not always but usually overkill from experience bypassing EDRs most of the time when you are actually trying it's really important for you to

understand the goal here what you're trying to bypass is automated static analysis as in signature-based kinds of detections you can never stop a really good manual reverse engineer so it might take them longer but that's the point you want them to take a little bit longer just for you to be able to then move laterally and get other machines so it's very important to understand what the purpose is here and more often than not XOR obfuscation or encryption technically with a one-byte key is more than enough code insertion to stop automated dynamic analysis I don't have time or I would elaborate on that so just ping me after but IOC

stripping and per-compilation per-op GUID insertion you want to have that because if something gets uploaded to VirusTotal you want to be able to understand exactly which engagement you were on when you compiled that payload and it's really good feedback as well for the client and for yourself if you're in the middle of an engagement and one of your payloads gets uploaded you know it's from this engagement and not the previous one or something like that so you probably need some fresh payloads you only get that kind of feedback if you have this GUID in your payload and then of course the kill switch there
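The bookkeeping side of per-compilation GUIDs can be sketched as a small registry: every build gets a fresh identifier, and the identifier maps back to the engagement it was built for. The class and method names below are hypothetical. How the identifier ends up embedded in a build is outside this sketch.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class BuildRegistry:
    """Hypothetical in-memory map from build GUID to engagement name."""
    builds: dict = field(default_factory=dict)

    def new_build(self, engagement):
        """Create a fresh GUID for one compilation and remember its engagement."""
        build_id = str(uuid.uuid4())
        self.builds[build_id] = engagement
        return build_id

    def lookup(self, build_id):
        """Return the engagement a GUID belongs to, or None if unknown."""
        return self.builds.get(build_id)
```

If a sample later surfaces on a public scanning service, `lookup` answers the question from the talk: which engagement produced it. In practice the registry would need to be persisted, for example in a file or a database, which this sketch leaves out.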

the job is ethical hacking and there's a very important part in the word ethical and that also means respecting the client's time now a lot of the time you assume oh but my infrastructure is gone what do I care you should care because when a payload is triggered is executed and it starts running you need to have a kill switch otherwise what that does is start a whole bunch of red flags and tickets throughout the whole company and you're just wasting people's time unnecessarily as opposed to it just checking if it's the time of the engagement and for example if it is running on a computer in the

domain that belongs to the client so you definitely want to check those things and when you want to check for example for domains you don't want to put the actual domain of the client in the payload because that's a hint that client X might have been compromised so what you want to do if you understand password storage is pretty much something like that it's a bit extreme to salt it but what you want to have is a hash of it so you grab the domain where that computer belongs you hash it you compare it with your hash and only if it matches then you run but if you take anything from this talk

let it be the following never send a human to do a machine's job Agent Smith so I hope you've learned that today you can grab the slides on that QR code again I'm reachable if you want to check my previous presentations so thank you very much for coming
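The kill switch check described just before the closing (only run inside the engagement window, and only on a machine whose domain hashes to the expected value) can be sketched as below. SHA-256 and the function names are assumptions of this example, since the talk only says to hash the domain and compare. `corp.example` is a placeholder domain.

```python
import hashlib
import hmac
from datetime import date


def domain_digest(domain):
    """Hash a normalized domain name so the name itself is never stored."""
    normalized = domain.strip().lower().encode("utf-8")
    return hashlib.sha256(normalized).hexdigest()


def may_run(current_domain, today, expected_digest, start, end):
    """Allow execution only inside the date window and on the expected domain."""
    if not (start <= today <= end):
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(domain_digest(current_domain), expected_digest)


# Precomputed once at build time; only this digest would ship.
EXPECTED = domain_digest("corp.example")
```

Anything outside the window or on an unexpected domain simply returns `False`, which is the behavior the talk asks for so that stray executions do not generate tickets for the client.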

thank you very much really appreciate it yeah we all appreciate it too are you sticking around if people have some questions fantastic yeah I'm here for dinner as well great so that brings us to our last but not least talk of the day it's going to be another hard-hitting red team talk from Martin Ingesen founder and CEO of Kovert AS a security consulting firm specialized in offensive security pen testing red teaming he is also a two-time winner of the master of cyber security Norway competition from HackCon so lots of fun it's a fun event and he is going to talk to us today about AI and how we can use presumably LLMs

that have been all the rage in red teaming and offensive security operations and after that we will have uh we'll wrap it up with some closing remarks prize for the lucky and skilled CTF winner and then hopefully many of you will be joining us downstairs for drinks for uh one hour approximately and then we'll come back up here for uh for dinner uh with tables and we'll get rid of these chairs some set up for dinner which will be at six um we had to move out of Smet de Vaca which was the original plan due to damage from the recent flooding which is still not repaired so we will we'll hang out in Pokemon for those of you who are joining

for the evening have some uh some drinks and then we'll come come back up once the furniture is changed for dinner good to go um soon yeah okay that's a good answer uh we still have some retro t-shirts this is the Retro from last year if anybody wants to pay what they want uh any any number zero and above is recommended and uh and if you pre-ordered a shirt then please go pick it up and then we will sell the the remaining shirts as well from five to six if anybody wants this year's shirt in a luxurian brown it's called mocha but you can be the judge of that yourself

oh there we go yes all right give it up for Martin

I'll try my luck at kind of hitting the button with lag it's like playing Counter-Strike at home all right so thank you so much for having me and I know that I'm standing between you and beer and I realize that's bad for me but I'll try to get through this I also tried to go for the most buzzwordy title I think I won if you can agree I also tried to go for the longest one and I felt fairly in the lead there unfortunately the GraphQL guys beat me so I have to do better next year if I'm accepted obviously but yeah that's going to be one of the parameters

I'm sure so briefly about me this is me before I started learning about AI now I have a beard and deep depression no I'm just kidding I grew up always wanting to be a programmer so I love programming I've done that my entire life as far as I can remember at least so I figured I wanted to do some education I wanted to go down that path and suddenly realized you know information security is way way cooler I started playing some CTFs as Ryan has mentioned and I won some of them I also played with people here which is also very very cool and I started working as an information

security consultant I've been doing that professionally for seven eight years I've been playing CTFs for 10 or something and decided two years and nine months ago to start my own pen test company as Ryan also mentioned now if you're coming here then what's important to mention is I don't have a PhD in AI or something so if you're here and you know a ton of stuff about AI I'm sorry I'm gonna try to look at this from kind of a practical approach but also kind of forward-looking Andrea touched on a lot of cool topics that I'm also getting into but I won't spoil it

um the point is I Joe here so I didn't have a chance to bring it with me but I have a picture of it because this is going to be fairly high level I'm a dad so yeah sorry now the title contains the words offensive security and so I want to try to define that quickly before going ahead now back in the day we kind of separated between the red team and the blue team the red team being the people you know attacking stuff and the blue team being the people defending it and I know we have a wide variety in the audience here today so that was kind of the

tradition and now suddenly red teaming has become much more than just being the guy who attacks or the people who attack and I think this is kind of a typical buzzwordy thing in the information security space but suddenly it doesn't matter what you're doing it's always red teaming you're running a vulnerability scan it's red teaming you're port scanning something it's red teaming you're I don't know writing a report it's red teaming so we're kind of moving away from the word red teaming because when I talk to clients they say red teaming but we mean widely different things and if I choose

like five people here right now and ask them to define red teaming well we'll get five different answers I'm certain of it so we've shifted to kind of more adversary simulation emulation stuff but also namely the name offensive security so offensive security is basically just attacking stuff it's penetration testing I guess it's yeah and again ignore the company named Offensive Security they just messed it all up but yeah luckily they renamed so that's good as offensive security practitioners then we gain access to a ton of data I'm not sure if everyone realizes that when we do penetration testing or are doing a security assessment or something

like that but the amount of data we get access to is pretty large and the amount of data that we're able to process manually is very limited I don't know if there are any penetration testers here like how many file shares have you read like for hours just reading files trying to figure out documentation trying to figure out how stuff hooks together in this environment yeah multiple hours just looking trying to find that one thing that's gonna take you further that's one part of the data you get access to but also stuff like I don't know BloodHound information from AD

information from yeah computers all kinds of information some examples again there's information everywhere right and we're not really taking advantage of it there's data from BloodHound port scanning your C2 whatever and also your network web app scanning or stuff like that so now from the offensive security part okay we have all this data how can we take advantage of that well I'm gonna try to briefly describe AI and again this is really from a practitioner's point of view so if you know this stuff I'm sorry again I'm gonna botch it but the point being that

you don't really have to be an expert to utilize some of these tools that have appeared over the last year or so it's important because people are very scared not you guys but people in general are fairly scared of AI every time we have a new thing like when ChatGPT arrived suddenly it's like it's gonna destroy the world or something like that but it's important for us to establish that we are still at the infancy stage of what AI can do for us we have this concept called ANI AGI and ASI which is

basically just artificial narrow intelligence and then general intelligence and then super intelligence and narrow intelligence is basically a toddler or a kid just learning to write or draw it has a very narrow range of abilities with general intelligence we have an AI that's kind of on par with us humans or grown-ups they have all the capabilities that we have and then again we have ASI which would be Skynet and I, Robot and I don't know we're very very far away from that we're still here and there's a ton of time until we're over there but unfortunately people believe that it's all magic and it's

going to take over the world but we're still here it's still fairly understandable even for me I took a master's degree in information security and I was forced to do some AI stuff there and data mining and stuff like that and we primarily learned about supervised and unsupervised learning we're not going to talk about that today but it's still kind of relevant supervised is well machine learning in general takes some input you give it a data set you give it some data and usually a ton of it and kind of like the stuff that we already have hint hint for the talk but you give it

some data and in supervised learning you pre-label the training sets right so take emails for example you have 10 000 good emails that you manually labeled as good and then you have 10 000 bad emails that you manually label as bad you train your model on that and then you give it some random email and it'll classify it that's supervised you're helping it decide on what it should do and not do that's simplifying it but yeah then you have unsupervised learning which would be I'm not going to help you figure it out on your own very common for clustering trying to find connections between things that

we're maybe not able to see as humans so it's very interesting in social graphs or marketing analysis and also recommendation engines looking at Amazon for instance being able to say okay with this huge corpus of data if you buy this item you're probably interested in that item because there's not an Amazon employee who's like yeah if you buy this you'll probably want to buy that so there's no one sitting manually labeling thousands and thousands of items it's largely unsupervised then we have deep learning and that's kind of the part we're going to go into briefly in

this talk and then reinforcement learning that's again above my level but it goes more into robotics more into training and rewarding a system and then it learns from that for the deep learning part there's this new well it's not really new but it's fairly new like mainstream new thing called generative AI because traditionally these machine learning models haven't really produced anything it's more like you ask it a question or you ask it to do something and it does that thing it's not generating stuff out of thin air but that's what's so cool about generative AI we've already talked

about LLMs you probably all know about ChatGPT some of you might have used Midjourney or Stable Diffusion for generating cool images and we saw a talk earlier where I believe the images were produced using some type of generative AI and also audio and we're seeing audio being used or should I say misused already for faking people's voices and that's real that's actually something that's happening right now and it's being misused by threat actors and many more there's probably a ton more things you can actually generate but yeah just going into LLMs because that's kind of what we're gonna focus most on I'm going to simplify this

very very much but it's just enough to kind of it's like the Dunning-Kruger curve you think you know about it and then yeah anyways so they are a neural network with trillions of weights a neural network again this is a simplification but it has some input parameters and then it has tons of hidden layers that are weighted or that have some impact or redirects or I don't have a good way to describe it but the data you put in is transformed by going through all these nodes and then you get an output node and it's difficult to know

exactly how it arrives at the conclusion it does because this isn't built this is trained so you give it a ton of data it trains itself and obviously it's not like what's that 12 nodes it's trillions right and you're training on a lot of documents and if you look at the news today they talk about plagiarism so people say you know it steals the information from the internet and then it uses that in the model and that's kind of true kind of what it does at least it kind of indexes or ingests all of that

information and then it builds these types of vector spaces and vector spaces very very simplified now I'm using words because you know these models don't use words they use numbers that represent words and those words have some sort of vector space that places them close to other words that relate to them so if I read a thousand documents about an apple for instance well it could be the Apple computer or the iPhone it could be an apple tree or apple the fruit or it could be an orange and then it tries to put those in relation to each other depending on what's most likely
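The notion of words sitting close to related words in a vector space can be made concrete with a toy example. The two-dimensional vectors below are made up for illustration (one axis loosely meaning "fruit", the other "tech"). Real models learn far larger vectors from data. Closeness is measured here with cosine similarity, which is a common choice but an assumption of this sketch.

```python
import math

# Hypothetical 2-D embeddings: (fruit-ness, tech-ness). The numbers are invented.
VECTORS = {
    "apple": (0.7, 0.6),    # ambiguous: both the fruit and the company
    "orange": (0.9, 0.05),
    "iphone": (0.05, 0.95),
}


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def neighbours(word):
    """Other words ordered from most to least similar to `word`."""
    others = (w for w in VECTORS if w != word)
    return sorted(others, key=lambda w: cosine(VECTORS[word], VECTORS[w]), reverse=True)
```

With these invented numbers, "apple" ends up as the nearest neighbour of both "orange" and "iphone", while "orange" and "iphone" are far from each other. That mirrors the ambiguity in the example from the talk.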

now in modern complex LLMs there are multiple vector spaces and that's what gives it context so if for instance you ask ChatGPT I want to know more about the Apple computer well it's going to look and try to find the vector space where it's been trained on Apple and computers more in general than apples and fruit trees for instance so it has multiple different types of vector spaces that you can use to create the context or the feeling that it understands what you're talking about so again this is very simplified but it's important because it kind of answers this common

misconception that LLMs are just copy pasting stuff from the internet and like you know copying snippets from here and there and gluing them together and brushing it over making sure it's syntactically correct but that's not how it works at all it ingests every word and then it calculates the chance between each and every word how likely they are to actually appear together and that's also why ChatGPT is unable to give you the citation because it doesn't have that it's just trained its model to kind of I'm not going to say feel because I'm

not gonna humanize AI but it's just a score like what's the most likely word to follow now all right so now the whole thing about the talk in general let's put offensive security and AI together and just before I say anything about that I wanted to touch on what Andre said it's so funny because again he had a chance to say it first but in my experience we're generally late to the party when new technology arrives we're very slow at adopting automation DevOps AI SOAR or whatever any type of new technology offensive security is fairly slow and

I believe that's because the stuff we do still works there's really no need and one could look at it and say it's adding complexity or it doesn't really matter but for us to push the field forward I think it's important that we take that step back and look at this and ask is there any other way we can look at it it was a wonderful suggestion going to other developer conferences and seeing what they are talking about because we kind of live in our own bubble so we need to break out of that and try to look outside and try yeah try to see

if there's anything else we can we can do um

I assume that most of you already use ChatGPT so this is going to be very very obvious for you but still if there's someone here who's like huh I didn't think that was possible well that's why we have the low-hanging fruits so just an example then what can we as offensive operators use GPT for well for instance we can make it write command-line commands for us this will be very simple I've botched the IP address on purpose and it's funny because it also tells me that I'm an idiot and that's not a valid IPv4 address I don't like its attitude but I made sure to comment that

later but it still gives me the stuff I wanted and in retrospect it's a very very simple command but say you're writing I don't know how many use hashcat and know every single parameter in hashcat for masks and for wordlists and for all of that hmm I don't but yeah that might just be me but then I can use this obviously to just kick start whatever I'm doing and it's not always perfect and we'll get back to that as well but ChatGPT has read all the documentation it has already ingested the documentation I haven't I haven't read

the man page who does that but yeah so it has all the information already another thing you can do is ask it for tool suggestions like hey I want to scan a bunch of IP addresses how do I do that and obviously it tells you that you should make sure you're authorized and stuff like that but it still gives you the information you just have to wait for it to type out the first part in that case it gives me nmap and masscan which I guess is what I would use but it also suggests

zmap and for some reason Angry IP Scanner for those who are gonna take the Certified Ethical Hacker exam but another cool thing that I didn't prompt it to do I didn't ask it to give me that information but it suggests Shodan and Censys as an alternative way to acquire that information well it's obviously not really what I wanted but it could be a good way to learn about something else or to reconsider maybe I don't need to scan this IP range maybe I can just look it up in Shodan and obviously you're not going to get all

the ports because Shodan doesn't scan all the ports but maybe that's not the point so that's very interesting again I didn't ask it to give me any options I wanted to scan some IPs and this is what it gave me so as a kind of learning tool as well I think it's a good idea to use it also for help getting an overview but also making stuff more systematic or parsing stuff GPT is very good so here I'm going to give it some IP addresses with some ports I wanted it to tell me

what those ports are and classify which of those assets are more likely to be vulnerable than others so anyone who wants to take a bet on which of those IPs would be the worst no no fans of telnet here no that's fine so again it's going to give you and it's pretty as well an overview and it even has port 1337 often used for elite or hacker services sounds cool it highlights the telnet part and I guess the coolest part is okay it's going to give you a vulnerability analysis and tell you about all the different types of ports and again I didn't really ask for this but that's what it gave me and

it's a good indicator that okay maybe I should look into those services and to finish off there is the result it gave me well it considers the most vulnerable to be telnet and FTP and then potentially vulnerable and then probably not as vulnerable and again obviously it isn't perfect but it might give you an idea of where to look first for instance it can also you know help you write reports obviously you've got to be careful copy pasting data into ChatGPT in my professional opinion if what ChatGPT returns is something that I

can stand for do I have to write all of that prose just to say it's mine I'm not sure people copy paste anyways from the internet and again it's going to give you a ton of information and what's also cool is that it's going to give you information like obviously explaining what SQL injection is which is what I initially wanted I wanted to have something to copy paste into the report but it's also saying oh well some of the malicious actions you could do could be reading sensitive data modifying data possibly executing commands which I think you could use as an indicator where you could say oh maybe I

should go back and revisit that SQL injection that I discovered and see if I can actually hit those malicious actions and see if those are possible so it's kind of a way to give yourself a checklist to okay do more and really prove that the SQL injection you found is good and obviously I wouldn't copy paste all of this into a report that would be too much irrelevant information but it's nice it's kind of a tool for guiding you you can also create some horrible PowerPoints I tried just for fun to generate a PowerPoint based on this

presentation and it gave me absolutely nothing or it gave me this but yeah barely usable but what's cool about this is it shows you the Python code that it used to generate the slides that's an add-on you have to turn on in ChatGPT but what's so cool then is you could take the PowerPoint code and in theory you could create your own report-to-presentation Python script and just you know slap on your own theme and in theory you could go yeah sure I can create that presentation and then you hit a button and you have the presentation so

that's also cool automation again uh it's it's very relevant when it comes to reporting because that's a very sad process um so again kind of back to do all of that data we talked about now obviously all that data that we receive from Bloodhound or or from from whatever tool we're running we can't really paste that into chat GPT that that wouldn't be that wouldn't be like good so what I'm going to show now is outdated or probably already outdated because this is a field that's moving very very very fast it's it's it's running ahead but as an example you have uh you have projects like private GPT which you can run on your own machine it

requires some decent GPU power but but in general you can kind of create your own chat GPT and then base it on the documents or the data that you do have available and it's built with Lang chain GPT4All chroma and llama CPP and sentence Transformers and all of those building blocks are always getting better so again this might not be the best option for running stuff on your own right now but it's again it's moving so so fast what's so cool about this in particular is because it uses Lang chain it's able to ingest all these types of files so every everything from you know plain

text txt files to Evernote to EPUB to markdown or even something as cool as Outlook messages uh just like yeah Outlook uh [Music] I was gonna say there's one more email so so in theory you can ingest a wide variety of stuff that you find during a pen test into this and just to kind of give an idea um you're probably all familiar with MITRE ATT&CK which is a what can I say a a library of uh of adversarial tactics techniques and common knowledge for for what you do during each stage of of your I guess offensive operation it contains a thousand different sub techniques that all generate more data which is which is uh interesting to

would be interesting to look at and this is my attempt at putting all of MITRE into one slide which always fails but make sure you can read that because there's a kahoot afterwards so um one tool that generates data would be FOCA fingerprinting organizations with collected archives often used in the reconnaissance phase it uses Google Bing DuckDuckGo for searching for files related to the company you're looking up so say you're looking at at covert it would try to find the PDFs or or any type of file that that contains the word covert or or are related to covert and that's something you could ingest easily um
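To make the ingestion step a bit more concrete, here is a minimal sketch that sorts a folder of collected files by whether a document loader is likely to exist for them. The extension set is an assumption based on the file types mentioned in the talk, not private GPT's actual list, and the `triage` function is hypothetical.

```python
from pathlib import Path

# Assumed set of ingestible extensions, based on the types mentioned in the
# talk; check the tool you actually use for its real list.
INGESTIBLE = {".txt", ".md", ".pdf", ".epub", ".enex", ".msg", ".eml"}


def triage(root):
    """Split every file under root into (ingestible, skipped) lists of paths."""
    ingestible, skipped = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # Compare lowercased suffixes so REPORT.PDF is treated like report.pdf.
        target = ingestible if path.suffix.lower() in INGESTIBLE else skipped
        target.append(path)
    return ingestible, skipped
```

The skipped list is worth keeping rather than discarding, since those files may still matter for the engagement even if they cannot be ingested.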

file shares we already talked about file shares MANSPIDER is a tool which doesn't necessarily fit into this category or like the private GPT part as well but what it does is it uses regular expressions and it searches through all the files in a file share but what you could do instead is just take all the files from the file share and ingest them into private GPT and then you can have it searchable there instead and it wouldn't be as straightforward as this you can't just like run a regular expression inside of an llm that's not how it works unfortunately and we'll get kind of back to that but um that might be one way to to to

um to do that type of uh ingestion another very cool Tool uh is from from a guy named uh Flangvik um he created a tool where if you're able to compromise a Microsoft online account it's able to exfiltrate data like emails Skype not Skype sorry teams messages uh calendar and stuff like that and that again is something you could ingest like you could ingest the chats you could ingest all the emails and then you can start prompting into that so is it as easy as just ingesting it well you have to remember that it's not like the llm knows what you're doing it's not like it knows that the stuff I'm ingesting now is

um is it it doesn't know what you're gonna do with it so when you prompt it you can't just say give me uh give me a password that you found in some documentation because it doesn't know what the password is probably uh it's not kind of capable of that but you can probably and this is again depending on kind of the development of these free or available llms start asking it like where's your backup server located where's your what's the like you can ask it more configuration types of questions and that was also mentioned in a talk earlier about you know taking documentation and putting it into kind of llm type of systems and I

believe that's something that's very in its infancy but it's going to be more and more relevant think about ingesting someone's Confluence for instance into into a llm and being able to actually read it and not just waiting for the thing to load um and then again um prompt into it um just want to mention common pitfalls there there are some things you need to remember when working with these types of tools or projects first of all obviously the stuff you put into it is also basically the stuff you get out of it so if you have any type of bias in in your data set that's obviously going to reflect in the results you're getting

so so some say you know garbage in garbage out and that kind of goes for for this as well and that's I guess that's AI in general um also misinformation again back to back to the stuff that you're ingesting and this is also a thing that people are discussing regarding llms is it just ingests stuff from all over all over the internet right so if if you're able to like if it suddenly it parses some flat earther website or something it could in theory make it believe or like you can end up prompting and it saying that even the the world is flat it has no you know it has no context it

doesn't know that that's necessarily wrong or right only depending on the weight so if someone publishes a ton of Flat Earth theories well in theory chat GPT is going to start saying that the world is flat because it's just calculations right what's most likely another very important part is hallucination and and this is something that people really haven't been able to wrap their head around when when trying to prompt stuff especially in chat GPT the model will try its very very very very very best to give you what you what it thinks you want so if if you ask chat GPT what is Martin's phone number it's going to give you a number it's not going to be mine
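The "only depending on the weight" point can be shown with a toy that is nothing like a real llm: it just counts which answer appears most often in what it was given. The corpus and the `most_likely_answer` function are invented for illustration.

```python
from collections import Counter


def most_likely_answer(corpus, question):
    """Return the most frequent answer seen for a question, true or not."""
    answers = Counter(answer for q, answer in corpus if q == question)
    if not answers:
        return None  # a real llm would still produce something here
    return answers.most_common(1)[0][0]


corpus = [("shape of the world", "round")] * 3
print(most_likely_answer(corpus, "shape of the world"))  # round

# Someone publishes a ton of flat earth pages and the weight shifts.
corpus += [("shape of the world", "flat")] * 5
print(most_likely_answer(corpus, "shape of the world"))  # flat
```

Nothing in the function checks whether an answer is correct, only how often it appears. The toy does differ from a real model in one respect: it returns nothing when it has no data, where a real model would still produce a plausible looking answer.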

that's I'm fairly sure yeah I'm fairly sure and but it's still it's still like it's like ah this is the most likely thing I could come up with and that's what you get and that goes for kind of all of it so so you have to take everything it kind of gives you with a large grain of salt and yeah it it can hallucinate and that's that's a problem and that's kind of the the pitfall with people who don't understand that when they use chat GPT it's like it's like asking it to figure out if a text has been written with chat GPT it's like it's it's very very bingo and also back to kind of the the

ownership of generated content we can kind of avoid that in general by ingesting stuff that we you know privately ingest stuff that we only use for for that engagement but I mean it's it's it would be interesting to take all the reports you've ever written uh all the information you ever gathered and like smash them into to one model and see kind of what it gives you this is is it ethical who I'm not I'm not a lawyer so I don't know um but also again the generated content the thing is again back to it it's not copy pasting stuff it doesn't have a database of like this came from there and and that's it

doesn't have it it's nothing like that so so it's uh it's something that needs to be considered and hopefully avoided by by going you know offline uh instead of instead of using for instance chat GPT for for stuff that it shouldn't have access to um I'm getting thirsty so thank you for your time um yeah we do some cyber stuff if you want to contact us but yeah thank you so much [Applause] Thanks Martin we do a couple questions does anybody have questions yeah oh he's thirsty watch out um have you used this on engagements with customers um in general yes what are your experiences regarding as you said the the models tend to stick to

the information that they know from before what's your experience with how often the model hallucinates what are your experiences with trying to ingest all of this random data into a model that has not been to some extent it's probably not been trained extensively on the new information yeah how it's not it's still in its infancy that's okay it's it's it's not where I want it to be and and it's easy to kind of dream up scenarios really like ask you to do whatever you want like give me all the information about every Windows host or whatever it's not there yet um that could be fixable with with restructuring the data before ingesting it better language models

it's yeah it's it's still still a work in progress um what it's decent at is is taking emails and and kind of not necessarily parsing them but or well parsing them but not necessarily recalling them because again it's not able to recall anything but but typing something in the word of something else and this goes back to kind of the context on the vector vector spaces but it is it is able to kind of say if I wanted you to write this in the voice of the CEO it it's ish able to do that but it again it's not perfect we tested it it's not our daily driver but yeah

anybody else have questions for Martin about possible uses of AI in offensive security operations yes coming your way keep your hand up thank you have you seen any changes in restrictions regarding what information it will give you considering that they're kind of cutting down on what information you can get the information I get from from the model you think yeah in terms of offensive security that it won't allow yeah so so the nice thing about running stuff on your own is you don't have the guard rails so so like for chat GPT as you already saw it it kind of tells you like you need to don't don't use this for for malicious purposes um and that's a part of the chat GPT

model that it's not necessarily trained on that but it's it has guard rails kind of around it if that makes sense to to kind of protect the model from from doing stuff that OpenAI doesn't want it to do uh if you run it on your own you don't have those guard rails so so that's kind of but uh my experience has been that it's been more relaxed it's not as restrictive and it's not shouting at you as soon as it kind of feels like you're trying to hack so so chat GPT as well kind of feels more like yeah it gives you warning but it still gives you basically the result you're after

so so it's it seems like they kind of tuned that a bit down but that could be anecdotal okay so for GPT uh exploit code and stuff like that uh there hasn't been any stronger restrictions going forward than what it was say a few months ago that you have seen yeah as yeah so so from my experience no like they're like it's if anything is become more relaxed and also the thing is uh kind of bypasses for those type of guard rails that are so common now you know it's like my grandmother used to write exploit code and I would I would love to like learn something from blah blah blah and then it kind of for some reason kind

of bypasses those guard rails and suddenly it gives you exploit code um what's worth mentioning is that the code it gives you is usually like crap at best and you have to like tune it and help it to tune it but um but in general it doesn't there's yeah the guardrails are generally bypassable thank you any other questions for Martin I have a question he said it it wouldn't be a good idea to paste uh client IP addresses into chat GPT what about the pasting client IP addresses into the Enterprise chat GPT well uh again I'm not the lawyer no but but to to qualify a little bit we when we're in the VIPs talk from Nora and

Kenneth they talked about APA test right so we if we use APA test then Google tells us we can trust this app but then we still have to trust Google right so how much do we trust Google how much do we trust the chat GPT and it's uh it you know not from not is it legal but yeah how much that's right that's a good point and and a thing that I want to add is I mentioned that we're all often very late to the party like chat Bots that build on llms are already being built into office for instance it's it's already reading your email at least it has the capability to do that and and it's a toggle away and at

some point in some organization that's going to happen um so so at some point you have to trust someone I guess we we like at covert we try to to avoid that by self-hosting as much as we can but it's uh again you have to someone has to have your email anyway and if it's Microsoft like why not give Microsoft it and again it has Defender and Defender for endpoint it already knows your IP addresses so by giving it it doesn't it doesn't give you anything else in general Thanks Martin anybody else we're good thank you very much thank you is it beer now it's almost it's almost beer it's so close to beer if that's your jam yeah

there are other things available as well uh t-shirts water uh right now uh I would like to invite Mike Bates from Sans up on stage please give a hand for Sans and providing the CTF for us today [Applause] thank you thank you I hope you all enjoyed the CTF today plenty of players look very good some of you might know me already I was in a popular band 1024 megabytes we only had the one gig but then I joined the dark web then and I was always on tour um but yeah very popular very good scores today um but there was only one winner the top four constantly changed but the winner today of this prestigious luxurious very

expensive trophy is Bob the thief you're there Bob [Applause] uh we're also doing our SANS training in April at the Radisson Blu Waterfront there'll be six courses and there'll be a discount code for anyone who's attended today b-sides 2023 or something like that it'll be so hope to see some of you there wherever you want to come Bob

congratulations there you go

thank you

Mike all right just a couple last things and then we will chill downstairs uh until six and come back up here for dinner I want to thank all of our sponsors again especially the gold sponsors mnemonic spotabank and the remarkable and defendable for not just funding this so that we can charge a ticket price far below the costs per person to hold this event but also for participating uh we don't look at this event uh as attendees staff sponsors who are all participants here and everybody that came here today and gave us feedback from the previous events uh made this event what it was so uh thank you to all of our participants our speakers our

our volunteers and all of you as well and uh to to continue to hold this event in the future and have it grow in a way that serves this community and we would very much appreciate that all of you who registered with your email address will respond to the feedback survey that will go out in the next 24 hours it shouldn't take more than five minutes and that feedback will help us to determine how to serve the community in the future uh we could make this event uh bigger potentially we could have more tracks uh more days but the feedback that we received is that uh folks uh liked the venue and the found the one day one

track event kind of uh accessible and uh intimate nobody used that word but it's a shared experience right we're not hopping between tracks and you know you come with your colleagues and then everybody sort of goes their separate ways and that's great too because you can get really honed in on your area and see all the talks on just your thing but this is sort of the opposite where there's a little something for everybody but there's also you're gonna be exposed to new things that you might not pick if you had 15 other tracks to choose from so these are the kind of things we're wondering did we just do this again or do what what

should we do and uh what did you like what did you didn't like so please respond to the survey and if you have comments uh good and bad uh put those in the in the text field as well um if if there are any uh here's this last thing I'm going to say about T-shirts if you want one there's probably probably one kicking around somewhere downstairs and uh grab some fortune cookies stick them in your bag bring them to the office they do have cyber security jokes inside they're not ordinary astrology based fortune cookies um uh I think that's it um I want to invite the volunteers up on stage now to to thank them myself and so that you can

see the the entire group basically that puts on this event we're a non-profit and uh 100 percent uh volunteer work that goes into the event do we have folks anybody anybody coming up [Applause] no one for us anybody want to say anything no this is this is the team uh the b-sides team and our media uh volunteer from El vibachin is there anybody else still from El vibachin that wants to come up you want to come up on stage no well thanks for sitting in almost the back row and giving us a wave all right thanks everybody [Applause] yeah uh it looks like we uh we are lacking a bartender downstairs so uh we will sort that now

that's top priority

hello all right
