Hello — I'm a little bit nervous, this is my first time presenting here, and I'm really excited to be here. This whole talk is about a side project I've been running for the last six months or so. I have a more traditional background in security operations and incident response as an analyst, and last year I moved into a security engineering role at a company that's very cloud focused. Cloud security has been an area I've been working to improve internally, and it's something that's interested me quite a bit — and that's what led me to S3 and Amazon.

The problem: I don't think anyone in this theater has any illusions about the fact that 2017 was a terrible year for people who host S3 buckets — particularly Deep Root Analytics, the company that lost the entire set of California voter rolls, and most recently a FedEx subsidiary and a partner of Walmart. It felt like 2017 was the year of the S3 incident, though I think the general public will probably remember it as the year Equifax happened. And there's a real problem here. When you look at the Deep Root Analytics breach, where they exposed all those California voter records and the profiles they had built on California voters, the thing about it is that it wasn't technically illegal for them to store that data in a public S3 bucket — it's only illegal to use that data for certain purposes. That company is still up and running. I'm sure they're being sued, but technically they didn't do anything illegal; all the real damage was reputational.

AWS, and Amazon in particular, are very aware that this looks bad, and I have to commend them: they've done an amazing job of providing resources and making changes to get in front of this issue. The striking thing is that AWS has by far the highest market share — almost 40% of the cloud market — and according to some security researchers, up to 7% of all buckets are public. Let that sink in for a second. They've gone to great lengths here: they've published blog posts, articles, white papers, and podcasts; they've sent automated emails to account owners letting them know they might want to take a look at their buckets; and they've rolled out a new tool called Macie, which helps prevent you from accidentally exposing data like PCI data or credit card numbers. They've made a number of UI and UX changes, which you can see up on the screen — there are now big orange tags in the web console, so you can be pretty sure when you're doing something you're not supposed to be doing. And let's not forget that buckets have always defaulted to private, so all of these public buckets were set that way deliberately. Most recently, on February 20th, they made the S3 permission checker tool available to all users, where previously it was only available to enterprise customers. They really are trying to make the tools available to help their users.

They've also done a number of things in the background that you may not know about. For example, they communicate directly with people who own S3 buckets that maybe shouldn't be exposed. If you remember the previous slide, there was a screenshot showing that I had 92 public buckets at the time. About four months into the project I actually got this email. My current working theory is that someone reported me, and AWS couldn't terminate the buckets because I hadn't violated their Terms of Service, but they emailed me about two buckets in particular, letting me know that I "may have exposed my S3 buckets to a larger audience than intended." Thank you — that was kind of the idea — but we'll go with it.

So let's talk quickly, because I don't have a ton of time, about how you actually enumerate S3 buckets, as well as other types of cloud storage. It's basically trial and error: you come up with likely names for buckets, make an HTTP request, and see what comes back. Fairly standard web enumeration tools work — I've seen all kinds of off-the-shelf tools thrown against the honeypot S3 buckets I was running. Generically, you take a word list and start with a word root — if I wanted to host buckets for myself, I might start with "cameron" — then you go through the word list appending suffixes like the ones you see in front of you and check whether each candidate exists. Standard word lists available on GitHub work just fine. Specifically, you look at the response codes: a 200 means the bucket exists, a 404 means it doesn't, and a 403 means it exists but you can't access it. There's not a lot of ambiguity — you know right away whether you've found one.

Anyone want to take a guess, or want to contribute? Amazon isn't the only player in the file storage market: Azure from Microsoft and Google Cloud are also major players, looking to gain market share as quickly as they can. Anybody want to put a hand up and say Google is maybe better at protecting their buckets than Amazon? Anybody? How about Microsoft — Microsoft's known for security, right?
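Before getting to the other providers: the wordlist-and-status-code loop just described can be sketched in a few lines. This is a hypothetical illustration, not a tool from the talk — the word root and suffix list are made up — and the classification follows the 200/403/404 rule above.

```python
# Hypothetical sketch of wordlist-based S3 bucket enumeration.
# The suffix list and word root are illustrative only.
import urllib.error
import urllib.request

SUFFIXES = ["backup", "dev", "prod", "logs", "files", "assets"]

def classify(status_code: int) -> str:
    """Map an HTTP status code to a bucket state, per the 200/403/404 rule."""
    if status_code == 200:
        return "exists, listable"
    if status_code == 403:
        return "exists, access denied"
    if status_code == 404:
        return "does not exist"
    return "unknown"

def check_bucket(name: str) -> str:
    """Unauthenticated probe of the virtual-hosted-style S3 endpoint."""
    req = urllib.request.Request(f"https://{name}.s3.amazonaws.com/", method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as e:
        return classify(e.code)

def enumerate_root(root: str) -> dict:
    """Try every suffix against a root, e.g. cameron-backup, cameron-dev, ..."""
    return {f"{root}-{s}": check_bucket(f"{root}-{s}") for s in SUFFIXES}
```

Real tools add word-list files, threading, and rate handling, but the core decision is just that status-code switch.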
Well, funny thing about that. These are the API endpoints you can enumerate to determine whether an S3 bucket — or the equivalent file storage on the other providers — exists. Microsoft actually took a somewhat different approach, and Azure is significantly more difficult to enumerate, because the endpoint incorporates the account name: in Azure, you don't just have an account, you also have an associated storage account, from what I understand. So you actually have to enumerate two different things that you may or may not know. I'm not even entirely sure it's possible — a good research project, if anyone's interested.

So let's get into the good stuff. As someone who was learning a lot about AWS last year, and S3 specifically, this was an area of interest because of all the public breaches and articles being posted. I wanted to know: who is scanning? When are they doing it? How are they doing it? What kind of volume would I see? And I had the pretty simple idea of just throwing some buckets up there and seeing who tried to access them. The plan was to create a bunch of buckets using public word lists, populate them with lures, set up access logging, and wait to see what happened. I ended up not doing much with that second bullet — I didn't populate them with lures as I'd originally intended, because I wanted to start with a more passive approach and maybe get more invasive later. But that's the general idea, and — not a great picture, but anyway — the basic architecture was: set up a bunch of public buckets, have them feed their access logs into a private bucket, and then have some system post-process and use those logs in some way — pushing to a Kinesis stream, in my case.

I did this as an internal hackathon project. I'd had the idea, the hackathon came along, and I figured I'd sign up and do it in a day. I created a pretty simple Python boto tool that creates the buckets, generates the names, sets up the logging, and sets up the ACLs as required. In September of 2017 I created about a hundred buckets, themed on sixteen different entities, which is important: as someone working for Okta at the time, I decided I wanted the buckets to appear as if they belonged to peers or companies similar to us, so they were mostly themed on Bay Area tech companies that shall not be named.
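A minimal sketch of that kind of setup tool might look like the following. The names and the private log bucket are hypothetical, and this assumes boto3 with credentials already configured (the real log target also needs the S3 log-delivery group granted write access first).

```python
# Hypothetical sketch of a honeypot bucket setup tool; all names invented.

def generate_names(root, suffixes):
    """Build candidate honeypot bucket names from a word root."""
    return [f"{root}-{s}" for s in suffixes]

def create_honeypot(name, log_bucket):
    """Create a public bucket whose access logs land in a private bucket."""
    import boto3  # deferred so the pure helper above has no AWS dependency
    s3 = boto3.client("s3")
    s3.create_bucket(Bucket=name)
    s3.put_bucket_acl(Bucket=name, ACL="public-read")  # deliberately public
    s3.put_bucket_logging(
        Bucket=name,
        BucketLoggingStatus={
            "LoggingEnabled": {"TargetBucket": log_bucket,
                               "TargetPrefix": f"{name}/"}
        },
    )

# Example (hypothetical names):
# for n in generate_names("acme-corp", ["backup", "dev", "logs"]):
#     create_honeypot(n, log_bucket="my-private-access-logs")
```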
I also decided to use a word list from the most popular Google search result for S3 bucket enumeration — at the very least, I figured I'd catch the people who knew how to Google. That project was called Sandcastle; it's since been removed or made private, but I used its word list.

So what happened? Well, within ten hours, someone had hit one of the buckets I'd just created. The buckets weren't referenced anywhere, they weren't used anywhere, and there was essentially nothing in them, but I saw access requests almost immediately. Let me fix my slide — I think this one is a little better. The requests came primarily from the United States: over the six-month period I saw about 550 unique IP addresses make requests to the honeypot S3 buckets, fifty percent from the United States, with the EU and India as the other main sources. That surprised me, because from my day job I associated enumeration and scanning of web assets more with Southeast Asia — Russia, China, South America, that kind of thing — but this was mainly from the United States, which I thought was interesting.

Who was doing this? That was a big question for me, and I was surprised by the answer: almost 40% of all the requests sent to my honeypot S3 buckets had associated IAM logging — including an IAM username, in plaintext, as well as an account ID. That really threw me at first, because if you're scanning buckets that probably don't belong to you — they weren't in scope for any bug bounties, they weren't in use anywhere — why log in to do it? My theory is that AWS heavily rate-limits you if you don't include IAM credentials, and if you're trying to do any enumeration at scale, that rate limit is a serious issue; I think that's why the tooling was set up that way. I have another theory too, which I'll come back to later in the presentation: most of the enumeration was done using legitimate AWS tooling, like the SDKs and the command-line interface, which automatically load profiles from the system. If you're on a system configured with IAM credentials or a profile, the tool will load them and send them along with your requests, so there may have been some unintentional exposure there.

I saw a total of 34 unique IAM users, and I did look to see whether any of them were distributing their scanning across multiple IPs. I didn't see that at all — pretty much every IAM user was associated with a single IP address over the six-month period. But that's an interesting point in itself: because so many of these requests were authenticated, you could easily track a particular actor across multiple hosts if they did try to distribute. The users usually fell into one of two groups: either some sort of Twitter/GitHub/HackerOne alias, or clearly a tool — and I saw a lot of automated tools. I wonder what an IAM user named "s3checker" is for? People are programmatically going after buckets, and I'll talk more about that later.

So what kind of tempo did we see — how often were my buckets being hit? I'm going to immediately discount the top chart; I filtered it out to get the bottom one, because one guy decided to slam one of my buckets with an automated tool, totally skewing my results — something I had to account for pretty much the entire time.
On average, I saw 200 requests a day to my honeypot buckets, with a median of 40 to account for spikes, and on average 13 of the honeypot buckets were touched every day. That's a pretty high number — 13 out of a hundred, roughly 13%, found on an average day — so the people going after these buckets were aggressive, and they were also pretty good at it. I'll come back to the two spikes on the bottom chart as a case study later.

What were they using? From a scripting perspective, I mostly saw Golang, Ruby, and Python — though, as I said, I have to account for spikes and outliers in the traffic, and some of this may have been a single person who was just very prolific. As I mentioned earlier, I also saw a lot of the AWS command-line tools and SDKs — boto especially — as well as curl, S3 Browser, and so on. Another thing that surprised me was that people were hitting these buckets with pretty standard tools like Burp Collaborator and OWASP DirBuster. I don't know why — mainly because they could, I think, since you can point those tools at the generic web URL that S3 offers. But the things I was most interested in were the custom S3 bucket enumeration tools: the more sophisticated actors deploying their own tooling that aggressively recons these buckets and goes beyond what off-the-shelf vulnerability scanners go for.

I'll talk briefly about targeting, and I think this is a pretty interesting and sobering slide. Of the IPs I saw, almost a third targeted buckets belonging to more than one entity — entity meaning Company A, Company B — so these actors were targeting multiple organizations pretty regularly, and over half were able to locate or access more than one honeypot bucket. That could come down to the word-list approach: they probably found a lot of mine because I was using similar word lists. And how many of them actually downloaded content from a bucket that didn't belong to them and wasn't in scope for anything? About a third downloaded content once they found a bucket. So the actors going after S3 buckets are targeting multiple entities, they're very good at hunting down buckets that exist and are easily enumerable by word list, and a third of them are actually accessing the data — which actually seems a bit low to me, because
if you've gone to the trouble of hunting a bucket down, don't you at least want to know what's in it?

As I mentioned before, the thing that interested me most was the custom tooling people were rolling to do more intensive reconnaissance — to figure out what the buckets could actually be used for. These actors were interesting because they went beyond "the bucket exists and it has files in it." They were asking: can I change the logging configuration? Can I change the bucket policy? Can I modify the ACLs? Those are actions I'd associate more with takeover than with recon.

Following on from that — this was actually a question from a previous presentation of this topic. Someone asked me afterwards whether all the attackers were basically the same, or whether some were more effective or dangerous than others: could I rank them by severity? I thought that was a really cool idea, so I tried. I took hitting a larger number of S3 API endpoints to indicate a higher degree of sophistication: standard scanners mostly just look at the objects and the buckets themselves, while the more sophisticated actors start looking at things like ACLs, logging settings, tags, and bucket policies. I found that 13% of the IPs that interacted with these honeypot buckets fell into that category — so more than one in ten did a much more intensive interrogation of the buckets once they found them. The next question I asked myself was: these actors are clearly more sophisticated, but are they also better at hunting down buckets? That turned out to be true as well — on average, that 13% touched double the number of honeypot buckets compared to everyone else. So yes: they were more sophisticated, they were using custom tools, and they were much better at finding buckets — and they found a lot more of them. I thought that was pretty interesting.

Here's the example I mentioned earlier. On the volume chart, after filtering out the guy at the top, there were still two clearly visible spikes, and looking at the tooling these attackers — or researchers — were using, you can see a bit of what was going on. This was one of the more sophisticated attackers: he managed to hunt down three buckets belonging to two different organizations over the roughly two-week period he was active. One of the things that set him apart is that he didn't just hit the buckets with standard recon tools — you can see him looking for an exposed /etc/passwd and throwing all kinds of odd payloads — he was also running the boto3 library. Also of interest is the second row: anybody know what PhantomJS is? PhantomJS is a headless browser that lets you take screenshots of web pages. So not only was he hunting down the bucket, he was looking at its contents, taking screenshots of everything he was doing, interrogating the bucket to see whether he could push policies or change the logging status, and, for good measure, scanning it with what I believe was the Arachni scanner — though I'm not entirely sure about that, just a guess.

You can also see how consistent the volume counts are. Actually, let me skip one slide ahead to highlight the consistency of his methodology. On February 7th he found two of these buckets and issued almost exactly four hundred requests to each — completely identical. Then he came back two weeks later having slightly modified his methodology: maybe he'd moved to v2 of whatever tool he was using, but his footprint had changed — he was only sending 200 requests to each bucket — and he may have changed his word list or discovery method, because he found an additional bucket on the second pass. And to show what I mean by a more intensive interrogation of the honeypot buckets — he went way beyond "there are objects in it and I can read them." Look at some of the things he tried: putting lifecycle configurations, changing the logging status, adding notifications, seeing if he could upload, seeing if he could add tags, even seeing if he could host a website in the S3 bucket. This is what I associate with the more sophisticated attackers, and again you can see the consistency in the volume — clearly an automated tool.

By the way, this guy made no attempt to hide himself. He's a British researcher — a bug bounty hunter, actually — with a very interesting GitHub, and he has essentially no regard for hiding what he's doing. He even uploads his scans of all the bug bounty programs he participates in to a public GitHub repo, so his history is right there if you've ever wanted to know whether someone was vulnerable to something in the past.
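The kind of per-API interrogation this attacker was doing — ACLs, policy, logging, tags, lifecycle, website hosting — can be sketched as a simple probe loop. This is a hypothetical illustration, not his tool: the probe list mirrors the actions above, and the bucket name would come from the enumeration phase.

```python
# Hypothetical sketch of deep bucket interrogation: probe several
# bucket-level APIs and record which ones the caller can reach.

PROBES = [
    "get_bucket_acl",
    "get_bucket_policy",
    "get_bucket_logging",
    "get_bucket_tagging",
    "get_bucket_website",
    "get_bucket_lifecycle_configuration",
]

def verdict(error_code):
    """Turn a boto3 ClientError code (None means success) into a verdict."""
    if error_code is None:
        return "allowed"
    if error_code in ("AccessDenied", "AllAccessDisabled"):
        return "denied"
    return f"other ({error_code})"

def interrogate(bucket):
    import boto3  # deferred so verdict() stays importable without AWS deps
    from botocore.exceptions import ClientError
    s3 = boto3.client("s3")
    results = {}
    for probe in PROBES:
        try:
            getattr(s3, probe)(Bucket=bucket)
            results[probe] = verdict(None)
        except ClientError as e:
            results[probe] = verdict(e.response["Error"]["Code"])
    return results
```

A real tool would also try the corresponding write calls (put policy, put lifecycle, put website), which is exactly what distinguished the 13% from everyone else.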
Maybe they've patched it since, but you can look back at his historical scans, which is kind of interesting.

Some of the key takeaways here: it's very easy to hunt down these buckets. There are no real protections in place — a bucket is either public, or it's not, or it doesn't exist. Again, something that surprised me was the heavy use of legitimate AWS tools, which make it pretty trivial to figure out what you can do with a bucket. And I'm going to put forward a theory that most of these people were researchers. I say this because of the locations, primarily the United States; because a lot of them — almost half — were passing IAM usernames and account IDs with their requests, making no real attempt to mask themselves; and because of the volume, which was very targeted — as I said, on a median day there were 40 requests across about 13 buckets. And again, large numbers of the actors — almost half — were able to find more than one bucket, and I think a third found buckets belonging to more than one organization, so they're doing this in a pretty organized fashion.

So — I'm a blue team member, right? I'm a security engineer. Is this something I found valuable, and would I recommend that other security engineers do it? I think yes, because it's so easy: you literally just turn on a bucket, set up access logging, and wait to see what happens, and all you have to do is write a Lambda that processes the data. I'm actually going to call out Canarytokens here — anybody heard of those guys? They're really good and you should check them out; within the last six months they added S3 canaries as an option for their free virtual canary / canary token service. They didn't have that when I started my project, but if you're looking to deploy one it's pretty trivial: you give them some creds and click go. And if you don't want to give them creds, you can just build it yourself. As for the value to a blue team member: companies throw a lot of money at threat intel — you could drop six figures on a threat feed, right? — but this stuff gives you real-time threat intel on actors that are actively targeting your web and cloud assets. I think that's a pretty slam dunk, especially given the cost: basically free.
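To make "basically free" concrete: the whole processing side can be one small Lambda triggered by new log objects. This is a sketch under stated assumptions — the Kinesis stream name is hypothetical, and the regex parses only the leading fields of the S3 server access log format.

```python
# Sketch of a Lambda that parses S3 access-log objects and forwards the
# attacker-relevant fields to a Kinesis stream (stream name hypothetical).
import re

LOG_PREFIX = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] '
    r'(?P<ip>\S+) (?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+)'
)

def parse_line(line):
    """Pull the leading fields (IP, requester, operation) from one log line."""
    m = LOG_PREFIX.match(line)
    return m.groupdict() if m else None

def handler(event, context):
    """Triggered on new log objects in the private logging bucket."""
    import json
    import boto3
    s3 = boto3.client("s3")
    kinesis = boto3.client("kinesis")
    for rec in event["Records"]:
        obj = s3.get_object(Bucket=rec["s3"]["bucket"]["name"],
                            Key=rec["s3"]["object"]["key"])
        for line in obj["Body"].read().decode().splitlines():
            parsed = parse_line(line)
            if parsed:
                kinesis.put_record(StreamName="honeypot-hits",  # hypothetical
                                   Data=json.dumps(parsed),
                                   PartitionKey=parsed["ip"])
```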
Another thing that's great: access logs, if you've ever looked at them, are basically like Apache web server logs — IP addresses, user agents, what was attempted — enriched with the IAM information and account IDs, so you can very easily pivot into things like CloudTrail, application logs, and your IDS systems. I'm also going to plug a talk that happened next door just before mine, about data-driven bug bounties, because one of the other interesting things here is that we operate a public bug bounty where we give our researchers their own Okta instance to play with. That's an interesting source of information: if actors are hitting your honeypot S3 buckets themed on your organization and also interacting with the dev tenants where your bug bounty researchers operate, you can get some real insight into how they work — what they go after first, what tools they use beyond the S3 bucket tooling, and what other targets they pursue. Like I said, it's cheap, the IOCs are readily applicable to your organization, and they're very easy to operationalize and feed into other systems. At that point, I'll open it up — any questions?

[Audience question] No, I didn't, but to add on to that: this was, like I said, a hackathon project that I put together in about six hours on a Tuesday, and I didn't put a lot of effort in at the beginning. It was only a month or two later, when I started looking at the data, that I realized it was interesting and might make a good research project. As I move toward the next version of these honeypots, I want to make them more interactive — I didn't go to great lengths to disguise them or populate them with interesting-looking lures. If I do this again, and I do plan on tearing these down, I'd like to give the researchers the ability to upload data. Going back to our friend the British researcher, the bug bounty guy — let's go back a couple of slides — this guy tried to put a website onto my S3 bucket, and I'm really curious what that website would have been. So in the next generation I'd like to add the ability for them to upload objects, modify the bucket policy, put up websites, and touch the lifecycle configuration, because I'm curious what they would have done if the bucket had been open. My initial rough idea is to keep a baseline, known-good bucket configuration, and when I see someone put up a website, an ACL, or a bucket policy, save it off somewhere and then restore the bucket to that known-good state after 30 seconds or so. That would let me collect those really valuable indicators and get a better idea of what they would have done next. This initial version was pretty passive and only allowed reads.

Anybody else? [Audience question] So, the researcher we mentioned has a number of GitHub projects powered by the Certificate Transparency stream in particular. I looked into a lot of what he was doing, just because I happened to find him — he was really noisy on those two occasions. He basically ingests the Certificate Transparency stream — the open-source transparency data on SSL certs — and hunts through it for anything associated with organizations known to have open bug bounties; if he finds a match, he scans it. I wasn't able to find the exact tool he was using against my buckets — I think he still keeps that private — but several of his other tools feed off the Certificate Transparency data. For example, if he found a cert for a domain associated with an organization that runs a bug bounty, he would immediately run an Aquatone scan. Aquatone is a really cool tool — I hadn't heard of it until I saw this guy use it — that's very good at scanning down subdomains and identifying domain takeovers. Part of that takeover check is also looking for S3 and CloudFront instances that are open to being taken over: for example, if you host your website on Amazon S3 and you delete the bucket but keep the DNS record, someone else can come along, register that bucket name, and then they basically have control over your domain.

I think I'm out of time — thank you.

[Applause]