
Oops, I Leaked It Again — How We Found PII in Exposed RDS Snapshots

BSides Las Vegas · 34:00 · 53 views · Published 2023-10
About this talk
Mitiga researchers uncovered hundreds of publicly exposed Amazon RDS database snapshots monthly, leaking extensive personally identifiable information. The talk covers how misconfigured snapshot sharing exposes sensitive data despite secure network settings, demonstrates real-world examples of exposed databases, and presents detection and remediation strategies including AWS Config, CloudTrail monitoring, and encryption-based prevention.
Original YouTube description
Breaking Ground, 18:00 Wednesday. The Amazon Relational Database Service (Amazon RDS) is a Platform-as-a-Service (PaaS) that provides a database platform based on a few optional engines (e.g., MySQL, PostgreSQL). A public RDS snapshot is a useful feature that allows a user to share public data or a template database with an application, but when wrongly used, it may accidentally leak sensitive data to the world, even when using a highly secure network configuration. We at Mitiga discovered hundreds of databases being exposed monthly, with extensive Personally Identifiable Information (PII) leakage. In this talk we cover the main aspects of RDS snapshots and how easy it is to accidentally expose sensitive data to the world. Our research process is based on extensive investigation of the RDS service, its configurations, and its limitations. In the session the participants will get relevant knowledge about RDS snapshots, including real-life examples of the risk of using this service, and recommendations on how to prevent, detect, and remediate the risk of accidentally sharing RDS snapshots publicly. We will share an in-depth description of our automated process, which includes procedures to constantly monitor for public snapshots and remove any that are found. Speakers: Ariel Szarf, Doron Karmi
Transcript [en]

Hello everyone, welcome to Breaking Ground. Today we have Doron and Ariel, who will be speaking about how they found PII in public RDS snapshots in their talk titled Oops, I Leaked It Again. Before we get started, we want to thank our sponsors; without the help of the sponsors, donors, and volunteers, this would not be possible. They are located out in the vendor area, but just to list a few, we've got ConductorOne, Toyota, and PlexTrac. Be sure to stop by their booths. A quick note on cell phones: please make sure that they are silent and not interrupting. And with that I will hand it over. Thank you to Doron and

Ariel. Can you hear me? Right, cool. So thank you for coming, survivors of this conference. Today we're going to talk about how we found PII in public RDS snapshots, or as we called it, Oops, I Leaked It

Again. So this was Britney, and this is us. My name is Doron and this is my colleague Ariel. We are both cloud security researchers at Mitiga, and today we are going to talk, as the name of the presentation suggests, about how we found PII in RDS snapshots. We are going to talk a little bit about what RDS is, what the service is, what the concept of an RDS snapshot is, and specifically a public snapshot. We are going to talk about the problem with having public snapshots out there, and also show you how attackers can exploit it. We're going to talk about what we did in our research, how we mimicked an attacker

technique and did it at scale, and also share with you some cool examples of real cases, of real DBs that we found out there with real information. At the end we are going to share some ways to detect and mitigate this risk. But before we start, we want to show you some of these cool cases that I was talking about. These are some databases that we found out there, and they shouldn't be public. For example, here we can see a table from one of those databases that includes a username, user password, email, the gender, the phone number of the user, the marital status, a password token, ID number, and more

and more and more and more. Yeah, so let's start. What is RDS? Amazon Relational Database Service, also known as RDS, is a platform as a service that simplifies database management in the cloud. There are many great features. For example, easy database management: this service automates time-consuming administrative tasks such as software patching, hardware provisioning, and more, all of which allows companies to focus on application development. Another example is high availability: by replicating databases across multiple Availability Zones, it ensures that your applications can deal with infrastructure failures without downtime. This service was launched in 2009, and nine years later, in 2018, Stack Overflow published an article about the incredible growth of Amazon RDS. These days this service is widespread

in the cloud. What is an RDS snapshot? An Amazon RDS snapshot is a point-in-time copy of an Amazon RDS database, and the snapshots are stored in Amazon S3. A snapshot can be taken automatically, like every hour, every day, and so on. A snapshot can also be taken manually with a click. An RDS snapshot actually backs up the entire DB instance; it's not like select queries on all the tables. An RDS snapshot also contains the metadata, and you can restore the DB using an RDS snapshot. Also, an RDS snapshot is a resource that you can share: you can share it inside your account, outside your account, and even publicly. Why share a snapshot publicly? Maybe because you want to share

public data, maybe because you want to share a template DB with an application, or maybe you just want to share a snapshot with someone without dealing with roles and policies. It's so hard sometimes. So a public RDS snapshot is a great feature. What can go wrong? As all of us already know, databases can of course contain sensitive data. Sensitive data can be personally identifiable information, known as PII. If a threat actor gets their hands on this type of data, it can be a disaster for your organization: they can publish it, they can blackmail you using this data, and so on. Sensitive data can also be secrets. A secret can be a password, a token, an access key, and so on. With this type of data

a threat actor can exploit your environment. Based on this, exposing a snapshot with sensitive data to the public, even for just a few minutes, is a really bad practice. Now we understand it, but we don't feel it. Think about how easily you can imagine someone in your workplace who publishes a snapshot publicly for a few minutes. They might say, "What's the worst that can happen?" and may even think it's not an issue they need to report. And even after you publish a snapshot to the public, if you want to investigate what exactly happened while the snapshot was public, there is a major lack of visibility in AWS CloudTrail, which Doron will describe later in this

presentation. Over to you. Thanks. So now we know what RDS is and what an RDS snapshot is, but let's see how it can be exploited by attackers. Here you can see an illustration of an attacker looking for sensitive data, but really, adversaries can easily clone a public RDS snapshot. The only thing they need to do is use two API calls that leave no forensic traces: you won't be able to see it in the logs, but they actually clone the snapshot. Also, think about it: traditional scans, like scans for open ports or vulnerabilities, allow the attacker to learn some information about your organization, but do not give actual access to the data. With a public RDS snapshot, if

for example you expose it by mistake and the threat actor was able to scan your premises and understand that there is a public RDS snapshot, at that point they have actual access to the data itself. Let's see a demonstration of how easy it is to do so. The only thing you need to do is use the DescribeDBSnapshots API call with the include-public flag, and you will get an output of all the public RDS snapshots in the specific region. This is the information about all of the snapshots that are there. Now the only thing that they would need to do is copy one of the DB snapshot identifiers, as they

wish (this is a unique identifier for each DB snapshot), and pass it as the input of the next API call, which is CopyDBSnapshot. Here you need to specify the target DB snapshot identifier, here "free Britney", and the region, and a few moments later you will have a clone of this DB snapshot in your environment. Here you can see some information about the newly created DB snapshot in your environment. You can see the engine, which is MySQL. You can see the master username, which is root in this case, but sometimes it could be indicative of the organization that this snapshot belongs to. Another thing that is important to know is that it doesn't matter what the

owner of the snapshot, or the owner of the instance, the database itself, does with the resource now. For example, if they delete it, it doesn't affect your snapshot. You now own the data; you have the data in your organization. So what we tried to do was mimic what an attacker does, but do it at scale. Our hypothesis for this research was that attackers can scan the AWS premises and clone those snapshots that were exposed only for a few minutes. So that's what we did. We built an AWS-native technique: we used AWS Lambda functions, Step Functions, and boto3 for the automation of the API

calls, and we created this bot that runs every hour and looks for newly created RDS snapshots. The high-level goal of this bot was to scan and clone those newly created snapshots and extract the data automatically. So let's focus on those processes. This is the overall process. On the left-hand side you can see the process that runs every hour: it's an hourly scan, and it's responsible for scanning and cloning new snapshots. On the right side you can see the process that runs every six hours. This is like the offline process, and it's responsible for going through all the snapshots that we

have copied, to prepare them, to create the instance out of the snapshot, and to extract the data for manual analysis later. Let's talk a little bit about the first one. This is an hourly scan. As we said, we run an hourly scan for all the snapshots that were created in all the available regions, which is most of the regions except for four, and for this we use the DescribeDBSnapshots API call. Then we iterate through all those regions and clone the snapshots newly created since the last run, in the last hour, to our premises, to our AWS account. At this stage we also maintain a

state file that includes all the snapshots that we have cloned, so we make sure we don't clone the same snapshot in the next run. For this we use the CopyDBSnapshot API call. This is an example of the function that we use; this is actually the function that looks for the newly created DB snapshots. Now the second process. At this point we have the newly created DB snapshots in our AWS account, so we don't need to run this process every hour. We run it less frequently, every six hours. What we do in this process is, first of all, make a list of all the newly

created snapshots that we don't have databases for. Then we again iterate through all the available regions and get the unique ARN for each snapshot, which we later pass on to create the instance. In order to create the DB instance we use RestoreDBInstanceFromDBSnapshot, which is another API call. Once we have the DB instance ready to work with, we reset the master password, because otherwise we cannot access the data inside the DB itself. Then we move on to deal with the data itself. This step is the analyze and extract. We automatically extract the DB schema, which is the tables and some information about the tables themselves in

the database, and also the DB content. We take this content and the information about the DB schema and we store it in S3. And of course, once we have the data itself, we delete the database to cut costs. Let's talk a little bit about what we do with the data itself. We created an automated process that helps us highlight tables that contain PII with high probability. What we do in this stage, as we said, is extract the table name, which could be indicative of what there is inside the table; the table schema, which is the names of the

columns of this table, which could also be indicative; and the first 10,000 rows of each table. We save everything in S3 as CSV, and then we use PySpark to slice and dice the data before we move on and select the candidates that we would like to analyze manually. We do another step to reduce the number of candidates: we filter for only the tables that are non-empty, which have at least some rows, and we search the column names of those tables against a list of indicative keys. This is an example of the keys we were looking for. These are PII-related

keys. It includes pass, password, phone, account, IP address, document, secret, and so on. Now we are going to show you some cool examples that we found in the wild. Cool, now let's talk about our findings. Before we start, I want to define our research time frame. Our research was conducted over 30 days, from the middle of September 2022 to the middle of October 2022. From now on I'm going to call this time frame our research month. To begin, I want to share with you three nice examples of public RDS snapshots we found. The first example is a snapshot that was exposed for the whole research month. The DB was created in March 2022, the snapshot was taken in

August 2022, and this DB looks like a car agency DB. This table, for example, looks like a car rental orders table. Each row is an order, and as you can see, each row contains full name, phone, email, car model, date, sales consultant name, and the occasion, for example father's birthday, marriage, festival, and so on. The second example is a snapshot that was exposed for less than four hours. Just for a few minutes or hours, what's the worst that can happen? This DB looks like a dating app DB. The DB was created in April 2016 and the snapshot was taken in October 2022, more than six years later. This table, for example, looks like the users table. Each row is a user that

contains the name, password, email, gender, birthday, ethnicity, link to an image, user description, and more and more. Another table in this DB, for example, contains the private messages. Now just take a few seconds to imagine what could happen if this snapshot got into the wrong hands. The third example I want to show you is an example with technical data. This snapshot was exposed for the whole research month. The DB was created in July 2015 and the snapshot was taken in September 2022, more than seven years later. This DB looks like a mobile phone apps company DB. This table, for example, is the devices table. Each row is a device, containing the device ID, which is actually a MAC address; the user ID, which is actually an email;

the device model; the app ID that was installed on this device; and the access token. With this data a threat actor can of course impersonate a user. So now you can say, "OK, you searched in all of the regions for an entire month and you found three nice examples. It's pretty bad, but it's not a phenomenon." So let's talk about how prevalent this issue is. In our research month we saw approximately 2,800 public RDS snapshots. In this graph you can see how many public RDS snapshots we saw per region. The most common region of course is us-east-1, because this is the default region, but you can also see here that this phenomenon appears in all of

the regions. In this graph you can see how many public RDS snapshots we saw per DB engine. We saw PostgreSQL, Oracle, SQL Server, and of course the most common DB engine we saw is MySQL, which is not surprising because of the popularity of this engine. But now you can say, maybe all of these public snapshots are supposed to be public. So let's talk about that. In our research we tried to deal with this issue and think about how we can clean the data. To do that we applied two filters. In the first filter, we filtered out all the public RDS snapshots that were published by accounts that

publish a lot of RDS snapshots. Think about it: if an account publishes a snapshot, for example, every week, it might be part of their product or part of their workflow, so it might not be interesting. With this filter we actually filtered out approximately 2,000 public snapshots. A lot. The second filter we applied was to filter out all the public snapshots with a boring keyword in their name. A boring keyword can be test, template, public, and so on. With this filter we filtered out just 70 public RDS snapshots. Not much. Now we had 650 public RDS snapshots, and these snapshots have the potential to contain sensitive data. From now on I'm going to call these snapshots interesting

snapshots. Now I want to share with you some insights based on the metadata of these interesting snapshots. This graph shows how many interesting snapshots we saw every day. We can see some variation of course, but we can also see here that this is a stable phenomenon; we didn't catch a unique peak or something like that. This graph shows how many snapshots were public for each number of exposed days, from 1 to 30. As you can see, public RDS snapshots that were exposed for more than two days and less than 30 days are anomalies. On the right side you can see approximately half of the interesting snapshots, which were exposed for the whole research month. It means

maybe they're supposed to be public, or maybe someone published them and then forgot about them. On the left side of the graph you can see the other half of the interesting snapshots, which were exposed just for one or two days, or just for a few hours. It means maybe someone published them by mistake, or maybe someone just wanted to share them for a few hours with someone. Across this whole graph there is another case, in which the publisher of the snapshot is a threat actor. If this threat actor tried to be as discreet as possible, they published for a few hours, and if not, they published for a long

time. This is my favorite graph. Every snapshot of course was taken from an RDS DB. In this graph we can see how many DBs were created each month. Most of the DBs were created in September 2022 or October 2022, and it makes sense, since this is our research month. But we can also see that the number of DBs that were created before that is more than a few. Why is it interesting? Let's think about that together. Let's take for example a DB that was created in 2015. If this DB was created seven or eight years ago and is still in use, seven years later it's still relevant, and an

admin took a snapshot from this DB and published it, the probability that this snapshot contains sensitive data is higher than for a snapshot based on a DB that was created a few months before. Now let's talk about our insights based on the content of the interesting snapshots. As Doron described earlier, from the snapshots we extracted the data to CSV files and stored it in an S3 bucket. After we did that, as Doron described, we built a list of interesting keywords to search for in column names. Here you can see a sample of that: secret, billing, IP, phone, token, and so on. What we actually did is search for these keywords in the column names, just in

non-empty tables, just in tables with data. Here you can see the matches for these keywords. Actually, we found a lot of matches. All in all, we found approximately 5,800 columns with an interesting keyword in their name and with data. When we reduced it to distinct RDS snapshots, we found 171 public RDS snapshots that with high probability contain sensitive data, just in one month. Now we can agree that this issue is a prevalent issue. Thank you. So, as Ariel said, now we can agree that there is a risk there, even though you didn't know that this could be a risk: publishing an RDS snapshot even for five or 10 minutes could expose the data out

there. Now we understand the risk, and you might ask yourself what you can do as an organization to detect this issue or to mitigate this risk. As I said, you might ask yourself, "How can I know if someone, for example, copied my public snapshot?" Sounds pretty straightforward, right? Well, you can't. During our research we were surprised to learn that there are no logs about access to RDS snapshots when they are public. For example, if someone tries to touch your public snapshot, for example copy it or create an instance out of it, you will not be able to detect it. There are no log records about it in CloudTrail, and this

was even more surprising because we know other services in AWS behave differently. For example, in EC2, if you create an EBS volume, which is the disk you attach to an EC2 instance, and you publish a snapshot of this EBS volume publicly, and someone, for example, copies this snapshot or creates a disk out of it, you will get a log entry about it. You will know that someone from a third-party account is trying to do something with your EBS snapshot, and you will be able to tell whether this account is related to your organization or not, and if not, it's probably an attacker. But in this case there are no log records, which means you are completely blind once you

publish it, either mistakenly or not. But there are some things that you can still do to detect actions around a snapshot that went public. We divided it into two sections. The first one is the current state, to understand whether you have a public snapshot right now, and the second one is a historical check. Let's talk about what you can do for the current state. First of all, you can use the AWS API, like the attacker did. What you can do is use two API calls: the first one is DescribeDBSnapshots and the second one is DescribeDBSnapshotAttributes. Here you can see an example. What you need to do is describe all

the DB snapshots in your organization of type manual, because you want to see only the ones that are not automated snapshots, and query for the DB snapshot identifier, which is a unique identifier for each snapshot. Later on you take this list of DB snapshot identifiers and input it to the next call, which is DescribeDBSnapshotAttributes, and you will get all the attributes of all the DB snapshots in your environment. There you would be able to see whether a DB snapshot is publicly available. Another thing that you can do is use AWS Config. AWS Config is a tool provided by AWS which gives you

visibility into the current state of your organization at any given moment. You can know what resources you have and what the configuration of those resources is. This tool comes with pre-built rules by AWS, but you can also build your own rules. One of those pre-built rules is rds-snapshots-public-prohibited. Once you have enabled this rule for your organization, you will see whether you have any non-compliant resources, like in this screenshot. Another thing that you can do is use AWS Trusted Advisor. This is another tool provided by AWS, and it overlaps with some of the rules in AWS Config. What differentiates this tool from

AWS Config, among other things, is that you cannot customize the rules. You have a certain set of rules that you choose either to use or not, but you cannot build your own rules. Under Security there is a rule that looks for public RDS snapshots, and this is the information about this rule. Now let's talk about a historical check. It's important to know whether you have a public RDS snapshot right now, but it's also important to know whether you had a public RDS snapshot in the past. What you can do in this case is look for the ModifyDBSnapshotAttribute event in your CloudTrail and see whether, in

the time span for which you store those logs, any RDS snapshot went public. To do so you need to search for this event. As I said, the attribute name would be restore, and under values to add you will see all. In the AWS documentation they specifically say not to use the all value under restore, because it exposes your RDS snapshot to everyone, and this is a mistake that actually happens a lot, as we saw. What about mitigation? So that was detection, to understand whether you have or had snapshots that are public. What can you do to mitigate the risk? First of all, and this is quite straightforward and true for other

risks, not just this one, is to employ the least-privilege permission practice, which means giving only the permissions that are needed to perform a certain job. Whether it's a role or a user, give it the permissions that it needs to do its job and no more. The second mitigation step is more specific to our case: encrypt your snapshot with a KMS key. What we found is that if you encrypt your snapshot with a KMS key, you won't be able to share the snapshot with the public. You will be able to share this snapshot with other AWS accounts, whether in your organization or outside of it, but not with

the public, and this should help you reduce the risk and prevent mistakes. So to summarize, let's talk about what we covered today. We discussed RDS, the concept of RDS and RDS snapshots. We emphasized the risk and the problem; now we know that there is a risk that we didn't know about before. It's very important to understand that if you expose a snapshot even for a few minutes, it's probable that someone took the snapshot and can use the information. We also know that there is a major visibility gap: AWS doesn't provide you any information about public RDS snapshots if anyone from a third-party

account ever copies them or creates an instance out of them. We also showed you some cool examples. But you might ask yourself, what now? What should you do now? We recommend the following. The first thing, which is the easiest: by tomorrow you can check, "Do I have public snapshots?" You can use any of the steps that we recommended: you can use AWS Config, AWS Trusted Advisor, or the API to understand whether you have public snapshots right now. Within one week, because this is a little bit more complicated, check your logs. Check your CloudTrail logs; you can do it through the console or

through anywhere else you have your logs, to check for this ModifyDBSnapshotAttribute event and to search whether you had a snapshot that went public. If the snapshot was public and you didn't know about it, and it includes information that shouldn't be out there, you should treat it as an incident, because someone might have touched it. The last thing is to apply the mitigation steps that we recommended, and specifically to encrypt your snapshots. As we said, if you encrypt the snapshot you prevent mistakes: no one will be able to share it with the public. Thank you. This QR code leads to our blog, which includes more information, and of course we share

a lot of details, so you can see everything, all the recommendations, the mitigation and detection methods. So it's a very useful resource for you. Thank [Applause] you. Thank you. Do we have any

questions?
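The current-state check described in the talk (list manual snapshots with DescribeDBSnapshots, then inspect each with DescribeDBSnapshotAttributes) could be sketched roughly as follows. This is a minimal sketch rather than the speakers' code: it assumes boto3 is installed and credentials are configured, and the function names and the region argument are illustrative.

```python
def is_publicly_restorable(attributes):
    """Return True if a snapshot's 'restore' attribute contains 'all'.

    `attributes` is the DBSnapshotAttributes list returned by
    DescribeDBSnapshotAttributes.
    """
    for attr in attributes:
        if attr.get("AttributeName") == "restore" and "all" in attr.get("AttributeValues", []):
            return True
    return False


def find_public_manual_snapshots(region_name):
    """List identifiers of manual snapshots in one region that are shared publicly."""
    import boto3  # imported lazily so the helper above can be used without AWS access

    rds = boto3.client("rds", region_name=region_name)
    public_ids = []
    paginator = rds.get_paginator("describe_db_snapshots")
    # Only manual snapshots are queried, as in the talk.
    for page in paginator.paginate(SnapshotType="manual"):
        for snapshot in page["DBSnapshots"]:
            snapshot_id = snapshot["DBSnapshotIdentifier"]
            result = rds.describe_db_snapshot_attributes(DBSnapshotIdentifier=snapshot_id)
            attributes = result["DBSnapshotAttributesResult"]["DBSnapshotAttributes"]
            if is_publicly_restorable(attributes):
                public_ids.append(snapshot_id)
    return public_ids
```

Calling `find_public_manual_snapshots("us-east-1")` checks a single region; since the talk notes that public snapshots appear in every region, a real check would presumably loop over all regions in use.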

I'll get the fun question first. What was the most interesting thing that you all found in there? In terms of the relationships and the naming conventions, what was the most concerning, the most interesting thing you found while scavenging through this data and everything else? The naming convention of a table or a DB? Yeah, just in terms of the combination and the keys and all the data points, where it's like, "Oh, that's really concerning, why is that there?" So I think the most interesting examples were the ones that we showed you. For example this dating app, it was very interesting, and we actually could, we didn't share it of

course, but we found the real conversations between users in this dating app, which is quite shocking, and of course this shouldn't be public. So this was one of the most amazing results. But we found other examples which are not PII, for example a production server of a really big retail company operating around the world, where we could see all their stock, all their models, a lot of information that shouldn't be exposed. Yeah, so is the lack of copy DB snapshot logs in CloudTrail a feature, according to AWS? So, I'm not sure if I said it, but we did talk with AWS and they acknowledged

that this is a lack of visibility that they should bridge at some point, but not yet. That was last year and there are no logs yet, but hopefully sometime soon. Do we

have any of the companies... I'm not... Did we try? No, no, we didn't. The answer is no. We thought about it, but we didn't.
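On the CloudTrail point discussed in the talk and in the questions above: copies of a public snapshot are not logged, but the act of making a snapshot public is, as a ModifyDBSnapshotAttribute event. A rough sketch of the historical check follows. It assumes boto3 and configured credentials; the function names are illustrative, and the request-parameter field names (`attributeName`, `valuesToAdd`) are an assumption based on the talk's description and should be verified against your own events. The CloudTrail LookupEvents API only covers roughly the last 90 days of management events, so older history would have to come from wherever the logs are archived.

```python
import json


def made_snapshot_public(cloudtrail_event_json):
    """Return True if a raw CloudTrail event added 'all' to a snapshot's restore attribute."""
    event = json.loads(cloudtrail_event_json)
    if event.get("eventName") != "ModifyDBSnapshotAttribute":
        return False
    params = event.get("requestParameters") or {}
    values_added = params.get("valuesToAdd") or []
    return params.get("attributeName") == "restore" and "all" in values_added


def find_public_share_events(region_name):
    """Collect ModifyDBSnapshotAttribute events in one region that made a snapshot public."""
    import boto3  # imported lazily so the parser above can be used without AWS access

    cloudtrail = boto3.client("cloudtrail", region_name=region_name)
    paginator = cloudtrail.get_paginator("lookup_events")
    pages = paginator.paginate(
        LookupAttributes=[
            {"AttributeKey": "EventName", "AttributeValue": "ModifyDBSnapshotAttribute"}
        ]
    )
    hits = []
    for page in pages:
        for record in page["Events"]:
            # Each record carries the full event as a JSON string.
            if made_snapshot_public(record["CloudTrailEvent"]):
                hits.append(record)
    return hits
```

Any hit for a snapshot that was not meant to be public would, following the speakers' advice, be treated as an incident.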

Anything else? To copy the snapshots, did they need any sort of access? Like, do you need to be able to read and write it, or can you copy without that? To copy the snapshot, you mean by mistake, if you are a user within an organization? Ah, to copy a snapshot that is already public. Yes. No, no permission. You don't need anything in that organization? No, because it's public, and that's the whole thing: people don't know that they even shared it publicly, and from this moment onwards it is for everyone's eyes. Everyone can see it. The thing that is the hardest is, as we said, we found like 3,000 new

public RDS snapshots over this month, and what we tried to do was build a machine that helps us find the ones that are interesting. This was the hardest part of this research. And attackers need either to build something similar or to work out from the name and naming convention of the DB, or from the master user, whether this is a target that they would like to

target. Anything else? Hope you enjoyed it. Thank you. Thank you.
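The triage the speakers describe (dropping snapshots from prolific publishers, dropping snapshots with a boring keyword in the name, then flagging keyword-matching columns in non-empty tables) can be sketched in plain Python. This is an illustration under stated assumptions, not the speakers' PySpark pipeline: the dictionary shapes, the function names, and the `max_per_account` threshold are hypothetical, and the keyword lists contain only terms mentioned in the talk.

```python
from collections import Counter

# Keywords named in the talk; the speakers' real lists were longer.
BORING_KEYWORDS = ("test", "template", "public")
PII_KEYWORDS = ("pass", "phone", "account", "document", "secret", "token", "billing")


def interesting_snapshots(snapshots, max_per_account=3):
    """Apply the two metadata filters from the talk.

    Each snapshot is a dict with hypothetical keys 'account' and 'name'.
    `max_per_account` is an assumed cut-off for "publishes a lot".
    """
    per_account = Counter(s["account"] for s in snapshots)
    return [
        s for s in snapshots
        if per_account[s["account"]] <= max_per_account
        and not any(word in s["name"].lower() for word in BORING_KEYWORDS)
    ]


def flagged_columns(table):
    """Return column names that match a PII keyword, for non-empty tables only.

    `table` is a dict with hypothetical keys 'columns' and 'rows'.
    Plain substring matching is crude and will produce false positives.
    """
    if not table["rows"]:
        return []
    return [
        column for column in table["columns"]
        if any(word in column.lower() for word in PII_KEYWORDS)
    ]
```

The same two functions are just as useful defensively, for example to decide which of your own snapshots deserve a manual review before anything is shared outside the account.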