Lessons Learned Implementing Meaningful Access Controls to Customer Data

Name: Lessons Learned Implementing Meaningful Access Controls to Customer Data
Uploaded: 2018-04-25
Duration: 26 min 45 s
Description: Patrick O'Doherty describes Intercom's nine-month initiative to implement meaningful access controls for sensitive customer data, achieving a 70% reduction in privileged access. The talk covers the organizational and technical barriers to access control adoption in startups, practical strategies for

BSidesSF · 2018 · 26:45 · 98 views · Published 2018-04

Speakers

Patrick O'Doherty

Tags

Category: Technical

Topic: Cloud IAM

Research: Case Studies and Incidents Analysis

Style: Talk

About this talk

Patrick O'Doherty describes Intercom's nine-month initiative to implement meaningful access controls for sensitive customer data, achieving a 70% reduction in privileged access. The talk covers the organizational and technical barriers to access control adoption in startups, practical strategies for scoping and auditing data access, and how to build sustainable tooling and processes that don't bottleneck the security team.

Original YouTube description

Patrick O'Doherty - Lessons Learned Implementing Meaningful Access Controls to Customer Data. There exists an unfortunate open secret in our industry: that companies are often quite old and advanced in nature before they implement meaningful internal access controls to sensitive customer data. The reasons for this are numerous, ranging from lack of tools to lack of prioritization in the face of other engineering needs in startups. At Intercom we decided to undertake a significant body of work over a 9-month period to holistically address this issue internally, resulting in an over 70% reduction in the number of people with such access and dramatically improved tooling, processes, and automation. This presentation will describe Intercom's journey with this work, the methods used, and the lessons learned which we think would be helpful for other companies.

Transcript [en]

[Music]

Righty, thank you everybody. I'm Patrick, and today I'm going to be talking to you about this mouthful of a title. I know there's a lot there, but hopefully there'll be something interesting and useful for you to take back. So, just a little bit about me, a bit of background: I'm a security engineer at Intercom. We have a small security function at Intercom, around 10 people supporting a company of 500. It's a multidisciplinary team: we run all of our various internal security programs, and we also provide tooling and consulting resources to all the various engineering and non-engineering teams at Intercom. We have offices in multiple places, but we're primarily based out of the

Dublin and San Francisco offices, which contain the largest engineering teams. For people who might not be familiar with Intercom: Intercom is a communications platform that businesses use to communicate with their customers. You might already be familiar with Intercom, having used it before. The most common way that people integrate is the Messenger; you can see it here in the bottom right of the screen, and then open on the right-hand side. This is the primary means by which our customers, who are themselves businesses, take our Messenger, put it in their website, and then use it as a communications medium to go back and forth with their customers, either live inbound queries or kind of direct targeted

outbound messaging. They say at the beginning of every talk you should have some sort of agenda, and I thought that was a little bit stodgy, so I decided I would do it in the form of a clickbaity headline: "You'll never guess how this startup reduced access to customer data by over 70%." Whew, I'm glad I'm in security and not a copywriter. The premise for this talk is that we began this work about a year ago, and it became apparent to us, asking around other security teams, that there was a kind of open secret in startups, which is that the industry waits many, many years, until companies

are quite old, before they create any sort of meaningful access controls over how employees deal with customer data. This is kind of terrifying, especially because more and more we're dealing with an industry that has a lot of third-party SaaS vendors for critical components. Think of the dependencies here: your CI provider relies on, you know, a hosted open-source database platform, and they don't have internal access controls on their data, and so then your CI provider gets popped. There's a long, complicated chain of dependencies here that makes this risky. So I started asking around friends why they thought this was the case, and many of

them reported back the same categories of issues, social and technical, as with every security problem. The first was that access controls are something that's very, very hard to advocate for internally, especially when you're really young and the business needs to create this external show of progress. You know, ACLs don't move the needle for customers, especially not small ones, and so it's not prioritized in this get-stuff-out-the-door, ship-a-new-feature-every-two-weeks race against competitors. Then there are also just inherent problems with doing this at small companies. If anybody's ever worked at, like, five-, ten-, twenty-, forty-person companies: everybody's a generalist, everybody's

doing everything. So even in an environment where you have roles and you're, you know, divvying them out to people, over any substantial period of time, six or nine months, you're just going to end up back in a situation where everybody has everything because they needed it to do their job. So roles are effectively meaningless in an organization where everybody does everything. Then on the technical side there were also some pretty strong reasons why people found it hard to do this. There's just not a great set of default tooling that people can look to when they want to tackle this problem. In the common development frameworks that people are

using, there's just no standard pattern. I'm a big fan of convention over configuration, to just make it really easy to make a solid decision, and this is something where there isn't great tooling. Django has some of this with the admin interface, but that's really just a very coarse, you know, god mode: you get read/write access to all of your global data sets. What we're really talking about here is the ability to give scoped access to some subsections of the data. Then for instances where you're dealing with tools that you purchased, third-party tools, oftentimes they reserve their single sign-on and provisioning functionality for their more enterprisey customers. The association of single

sign-on and "enterprisey" I think is an unfortunate thing in our industry, because it basically just sets up this financial paywall for security features. So companies can't avail of these, because there's a financial incentive for them not to: it might cost up to, like, twice as much per head just to get this feature. Not that I think anybody in the audience needs to have it drilled into them why this is important, but I just want to give a bit of context as to what we were thinking when we started this. You know, the reality of technology is that

no matter how much work we do to create great mechanisms and policies and auditing, there's always going to be a user, and users can err, so we should make resilient systems. Then also we'd been burned by this in the past. These are three incidents that I can call to mind where the lack of internal access controls significantly affected the scope of the breach, so it was something that we were very cognizant of, having been on the receiving end of this. So, you know, you can't really do any large, substantial body of work without a plan, or at least if you do, you're flying by the seat of

your pants. So one of the early things that you'll do if you want to tackle this is create a plan. Usually at the start of most security teams somebody sits down and writes out a set of policies, a pretty minimal set of policies, but they're important just to set the context of your work and communicate it to the rest of the company. One of these will probably be a data classification standard. It's a very, very simple document: it's just a list of all of the types of information that you have and process, and then kind of aspirational goals and levels of protection and standards about how you

want that data to be handled. It's important that the labels you choose here, the internal taxonomy that you use, are very clear and well communicated to all the people in your organization, for two reasons. First, you want people who are dealing with data to be able to properly classify it and understand what they need to be doing with it. Second, as a result of that, if they properly classify it, then your security team can direct their attention to the resources with the greatest need. If you do this, you'll probably end up with something like this. Here on the left-hand side we have our set of classifications, from

most to least sensitive, and on the right-hand side the usual culprits of where you might find this data. It's around these that we're going to try and create some concept of roles and figure out how to govern access to all of this. In our case we have two that we really, really care about. You might not have two, you might only have one here, but critical and sensitive are the two most sensitive classes of data that we have. Critical is, and this is confusing because Intercom is a business that businesses purchase software and services from, all of the data that our customers' businesses bring into Intercom or

generate through their use of the product. So it's all of their user profiles, all the conversations that they have, all of the data that they view and manipulate within Intercom. Then sensitive is all of that, except it's ours: all the conversations that we have with our customers, all the information that we know about them. We split these out very carefully because we are very careful about what we do with the critical data. One of the reasons that we undertook this work was to have a really, really good trust story that we could give to our customers, and so we don't just lump ourselves in with customers, we

treat them even better than we treat ourselves. Then private is pretty much everything else in our daily work, so Slack, GitHub, Google Docs, all that sort of stuff. And public is obviously everything that we publish: all of our podcasts, our books, our blog posts. On the right-hand side we've got sources. Application and analytics data stores were the big pools of this critical data, and these are the usual things, you know: SQL databases, caches, message queues, S3 buckets, all of these places that are specifically designed to store and process a huge amount of data. From this we examined all of the roles and people working

across Intercom and basically distilled it down into these two major groups. There are other people in Intercom who need access to some amount of data, but these are the two that we wanted to focus on because they were by far the biggest. The first is the whole engineering team. At the time that we started this work, everybody who joined the engineering team at Intercom was provisioned a base set of accesses, and one of them was production infrastructure access, for a bunch of reasons I'll get into later. The second is our customer support and sales teams. The customer support and sales teams use an internal administrative interface that we build, and they use it to, you

know, do a myriad of things, like debug customer issues, provide support, and resolve any problems that people might be writing in about. The tools are developed alongside our customer-facing product, but they're only accessible to us through a specific separate interface with a separate authentication mechanism. The usual things, you know: it's typical for startups to have, like, admin-dot or slash-admin somewhere in their application. Given the combined size of these two teams, this is the largest body of people within Intercom that was using data regularly. One of the first things I want to call out is that at the start of this work we decided it was very important to be able to quantify it and

measure it, so we set about creating a nice Datadog dashboard, which unfortunately I can't show because it has a bunch of literal numbers that I don't want to disclose. But basically, that which gets measured gets done, and it was really, really useful for us to be able to quantify daily and weekly work. We work in six-week cycles, with weekly commits on how that work progresses over time within Intercom. The idea was that if we had this number that we were trying to reduce, we would get a little bit more creative. Taking weekly commits, we would say, like, OK, I'm just going to

reduce this number by 20%. I don't care how, I don't have a plan, I'm just gonna do it. So it was easier to think of this problem as, you know, 50 small steps, as opposed to trying to consider it as one large, sweeping change. That was really, really important, to iterate on this over time, because it's otherwise really daunting to take on a very large, seemingly intractable task. Based on this we decided to look at the engineering teams. We had a suspicion that the access numbers here were pretty soft and that there was room for improvement, and sure enough there was. We have great data

on this because we have a certificate authority for SSH certificates that enumerated everybody who had access, so we were very quickly able to baseline everybody that needed it, and we also stopped automatic provisioning. Just to go into a little bit of detail as to why this access existed: some of it is due to tech debt. This broad access was needed to trigger things like migrations or to start long-running data processing tasks in the background. By baselining we immediately got a 20% reduction. It was great, you know, 20% of people that we don't have to worry about anymore. The big takeaway, for us at least, was that we were

misreading what somebody needed to get their job done. It turned out, over about a year and a half, that this wasn't actually a critical requirement of what you would need to get your work done. It also turns out that people are pretty happy to ask their colleagues to do something like this for them, because it de-risks the situation for them. If you're onboarding and you've just been there two weeks and you need to run a migration or set up some long-running task, it's much easier for you to ask the buddy who's onboarding you, like, hey, could you do that for me, rather than having to go through this big

process. So, like I said, the next largest group of users was customer support and sales. They're using this internal administrative dashboard, very similar to what I'm sure all of you have in your various companies. There's a whole bunch of functionality in here: some very, very commonly used tools, and also some tools that are specifically used by product teams to debug their service or their feature, and nobody else really uses them. Going into this we had a couple of rules of engagement, because this was a very, very high-traffic tool internally. This is something that people need to use, like, every hour of every day to get their job done, and so we wanted to

make sure that we weren't just swooping in, sprinkling some access controls on it, and completely destroying everybody's productivity, because that's not how you make friends. We also were cognizant of the fact that we're only ten people and we've got a company of 500. We're not in all the time zones we operate in, so we really, really didn't want the security team to be in this kind of central bottleneck position. Finally, we wanted to make sure that grants were pretty granular and that they would expire by default, so that we wouldn't end up in the position where, over six or nine months, people would just get everything back. So the way we went about this was

and we have a huge advantage here, because we're deploying into our own software: we have the liberty to change whatever we want and build in whatever arbitrary restrictions. So we were able to introduce some pretty robust access controls and auditing and other bells and whistles into the base controller that's used for all these various pages. We also have an internal pattern of very heavy use of static analysis checks in our code base to enforce our coding practices. One of them would be: if you make a new admin tool extension, there'll be a static analysis check to make sure that it inherits all this behavior. So it allows us to develop pretty confidently

going forward that even if people join the team and make new tools, they won't accidentally introduce some sort of data leakage. I just want to take a detour into a little bit of a side quest, which is messy data. Before you can grant or deny access to things, you have to first of all separate out what it is you want to protect and what it is you don't. Like I said, there were over a hundred, I think 120 or 130, different resources within this tool. Some of them displayed critical information, some of them didn't; some of them are used exclusively by one team or

even one person, some of them are used by everybody every day. Then on top of that there's the fact that a lot of the data that we're processing, a lot of the data models that we build, have both critical and non-critical data co-located together. If you think about it, in Intercom we have a conversation model. It holds all of the, you know, channel and content information about how a conversation was conducted, who was in it, all of that. There are clearly some metadata components there that are very important to us, like when an email notification was sent about a particular comment that was made in a conversation, and then

there's the critical stuff, which is, you know, the literal user identity and content of the message that they sent. We wanted to be very strict that we were separating these things out, so we undertook a kind of boil-the-ocean classification, where we decided we would really go through all these tools, and if there was any unnecessary critical data being printed out, we would just redact it. We made a bunch of changes here. Some of them were minor, like changing away from using PII-based identifiers to just using foreign keys everywhere. Others were larger: for some of the more commonly used tools that dealt with the more complex models

like the conversations model, we decided to just completely split them into critical and non-critical pages. Everybody would be able to get access to the non-critical stuff that they would need to debug, like, why isn't an email sending, whereas the content would be in a completely separate page that barely anybody had access to. This was a non-trivial amount of work, but it was something that we only had to do once, and it had the result of massively limiting the things that we have to protect, so it allowed us to focus a lot more. Then again we have another static analysis pattern that makes sure that any new additions get properly

classified, so we can build upon these understandings and build kind of generous access to all the non-critical stuff, because there's nothing to protect, without accidentally shooting ourselves in the foot. From this three-week period we built up an understanding of how people were using the tools, and it confirmed to us again that, like the production infrastructure access, this number was also very soft, and that there were roughly three types of users. There was the customer support and sales team, who were just using everything all the time. Then there were the product teams, and they were using their specific, you know, check-this-feature tool. And then there was everybody else, who was sporadically

using one or two pages. So we decided to distill this all into creating base grants for people. We settled on three different classes. There was "all", which is just blanket access to everything, kind of a superuser. Then all non-critical, which is all the pages that we don't care about, so you can view any amount of non-critical data in Intercom as an employee, because there's nothing that you're going to glean from it. And then the ability to individually grant access to pages, which is useful for people that literally only need to use one thing and don't even need access

to everything else. Based on that, we decided to launch in a kind of fail-open mode, so we evaluated it for two weeks. We excluded the customer support org by giving them blanket access to start, because we wanted to come back and work with them specifically later, and because they were by far the heaviest consumers of these tools. Even without the customer support team there was still another 20% reduction. This was a huge baseline, just cutting away access that people didn't need. Then in this two-week period, based on the profiles that we had built of people and the grants we had given them, we observed I

think only two or three failures. So by the time we went live, we were extremely confident that we weren't going to massively tank somebody's productivity, because that really was a nightmare scenario for us. We really don't want to get in the way of the teams that we're working with, and this measured progress allowed us to be very confident in shipping these changes. But we also made sure, you know, we're not perfect, we knew that we were going to miss things, so we decided to build in a nice little break-glass mechanism that would allow these teams to self-regulate and allow people within the teams to gain temporary access

grants. It's pretty simple: any time you view a page that you're not allowed to view, that you don't already have permission for, you're presented with a form. You fill it out, you give a reason, like what are you doing, and it shows up in Slack. It turns out that between the security team and all the customer support managers around the world, we basically have follow-the-sun coverage of this room. You see the inline description; this URL brings you to a kind of profile view of all the permissions that somebody already has active, so it's easy for the people who are making the grants to understand whether this is appropriate or not and make a judgment call.
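
The three grant classes and the break-glass flow described above could be modeled roughly as follows. This is a minimal sketch in Python, not Intercom's implementation: the `Grant` type, the `is_allowed` and `break_glass` helpers, the class names, and the one-day default are all assumptions made for illustration, and the review step (the Slack message and the approver's judgment call) is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical grant classes, loosely named after the three categories in the talk.
ALL = "all"
NON_CRITICAL = "non_critical"
PAGE = "page"


@dataclass(frozen=True)
class Grant:
    user: str
    kind: str                    # ALL, NON_CRITICAL, or PAGE
    expires_at: datetime         # every grant carries an expiry; none is permanent
    reason: str                  # intent collected when the grant is requested
    page: Optional[str] = None   # only meaningful when kind == PAGE


def is_allowed(grants: List[Grant], user: str, page: str,
               page_is_critical: bool, now: datetime) -> bool:
    """Return True if any unexpired grant for this user covers the page."""
    for g in grants:
        if g.user != user or g.expires_at <= now:
            continue  # someone else's grant, or already expired
        if g.kind == ALL:
            return True
        if g.kind == NON_CRITICAL and not page_is_critical:
            return True
        if g.kind == PAGE and g.page == page:
            return True
    return False


def break_glass(user: str, page: str, reason: str, now: datetime,
                days: int = 1) -> Grant:
    """Build a temporary single-page grant; a reason is mandatory."""
    if not reason.strip():
        raise ValueError("a reason is required")
    return Grant(user, PAGE, now + timedelta(days=days), reason.strip(), page)
```

Because expiry is checked on every lookup, nothing has to remember to revoke a temporary grant, which is one way to get the "expire by default" property mentioned earlier in the talk.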

So this was really, really useful. From this we had already gotten to a position where people didn't have access to things that they didn't need, and if they did need it, they could get access for a day or a week. Going back to how we plan work: if there was somebody who was working on a feature that required them to get access to sensitive or critical data for a week, we could do that, safe in the knowledge that they weren't going to keep it after the period that they explicitly required it for. Like I said, we initially excluded customer support because we wanted to come back to work

with them separately, because again they were the highest-volume users. As we started out, we had this tool, used very, very sparingly internally, called impersonation. I'm sure different companies have different things like this, but basically it's a tool that allows an employee to view somebody's account through their eyes. This is useful for debugging things where our internal tools fail and we just can't figure out what the discrepancy is between what the customer is seeing and what we expect them to see. It's really, really rarely used. The number of people who have access to it internally is very

very low; it's very, very tightly scoped and kept closed. The way it works, again an advantage of working in our own product, is that you basically ask a customer for permission in the context of the conversation and the issue you're dealing with. Once you have that permission, you can add a tag, this specific permission-to-impersonate tag, to the user that just gave you consent. You can individually address every conversation in Intercom with a URL, so you can use this conversation URL as collateral to make another access request internally, to say, like, hey, this user Patrick is having an issue, I need to log into their account to view

why they're hitting this particular exception in an area of the product that I can't reproduce. Then somebody can grant you that access, and you get a temporary ability to log into their account. Again, like with all the other access grants, we collect a reason here. We summarize these on a quarterly, if not six-monthly, basis, and then we use it as a roadmap input to make sure that we're building better and better tooling to deprecate the need for these wide accesses. There's a neat trick in here, which is that, at least in Intercom, we have two kinds of god objects that can scope the data that is associated with them, and so we decided

that we would piggyback on this authentication mechanism and create a variant of the internal tools, the sensitive pages, that instead of showing you all of the data within Intercom would only show you the data that belonged to the user that just gave you permission. So once you escalated this access, you got access to the internal tools and you could see just the data that belongs to Patrick, just the data that belongs to his companies within Intercom. The way that this is done is basically that there's a default method in each of these controllers that can swap out the scope, the god-level object scope, and this makes sure that even if

you enter an object ID for something that doesn't belong to the user, you can't actually directly access it, because it'll only find it through the view of what this user or this app is able to view. This is really, really nice for us, because people get to use more powerful administration and debugging tools but only get access to the very limited amount of data they need to do their job. How it works internally is: if you're on customer support, you get access to all of the non-sensitive pages, because there's nothing in there that we're worried about. And if you're a customer support manager and you've gone through

an onboarding period of at least four weeks, you get all of the above plus the ability to use this feature and selectively give it to other people. This allowed us to create tools that were tightly scoped, with per-user and per-resource grants that expired, but also created this self-serve mechanism for the CS teams to be able to work on this themselves. This is all automatically provisioned based on HR system data that we take a snapshot of every day, so when people either move off the team or move into some other role, this access just gets cleared out, and people don't keep access across

the various jobs that they've had. We have some future work planned for this to make it even better. At the moment the act of giving consent is inline within the conversation, and we review these: the customer support team runs their own kind of quality assurance audits of conversations, on a weekly basis I think. But it's not as tight as you'd want it to be, so what we want to do is push the literal act of granting permission onto the customer's side, so that we can't do anything without their action. This is a work in progress of what it might look like at the moment, but

basically a customer will get the ability to selectively turn this on and off, and we won't be able to do anything without the customer undertaking this action. They're fully informed, they get to see it, and they get to revoke it if they think the need is done. We want to be able to put customers in the position of controlling their own data. Again, one of the big reasons that we wanted to do this was to have a really good trust story, for our customers to be able to understand what was going to happen when they started using Intercom and where their data would be, and now they are in control. So

this is something that we hope to launch in the next couple of weeks. So, the end results: over 70% of employees no longer have access to critical data internally, which is a huge relief and weight off our shoulders. We have self-service tools that allow the teams that really, really need to, to operate their own systems. They change the way that they work based on the volume of work that they need to do, so there was no way that we were going to be able to come in from afar and be like, this is how you

need to work, the security team has decided in all of its infinite wisdom that you can only do these things. That generally doesn't get received well. And then we have this customer-based consent mechanism that allows customers to be in control of this, so it's no longer something that they have to trust us on; they can just literally use the tools themselves. The lessons that I got out of this, and that I'd love for you guys to bring home or think about a little bit more: your access numbers today are totally soft. They're big, they're daunting, but there's just so much low-hanging fruit in there that you'd be amazed at what

you could do by just giving it a week or two's worth of consideration. Collecting intention at the point of authorization is super useful; I cannot stress enough the benefit of having this as a roadmap input for your tools. The impersonation features that we have are very, very broad and actually a little bit cumbersome, and people don't want to use them when the debugging tools are better. So if we can push people into the debugging tools by making them even better, we get to deprecate this means of access entirely. And then also, pairing with your high-value or high-volume internal customers pays off wonders. We were really, really fortunate in that the customer support team leaned into

this process with us, and the results that we got were far, far better than anything that we would have been able to do on our own. So that's all I have today. Thanks for listening. If people have any questions I'll be here, and you can also reach out to me on Twitter or you can spam my email. That's it. [Applause]
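
As a note on the scoped internal tools described in the talk: the idea of replacing a global lookup with one scoped to the consenting customer can be sketched as below. This is an illustrative Python sketch under stated assumptions, not Intercom's code. `Conversation`, `ConversationStore`, and the `scope_app_id` parameter are hypothetical names, and an in-memory dictionary stands in for the real data store.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Conversation:
    id: int
    app_id: int   # hypothetical: the customer workspace that owns this record
    body: str


class ConversationStore:
    """In-memory stand-in for a data store; every name here is hypothetical."""

    def __init__(self, rows: Iterable[Conversation]) -> None:
        self._rows: Dict[int, Conversation] = {r.id: r for r in rows}

    def find(self, conversation_id: int,
             scope_app_id: Optional[int] = None) -> Optional[Conversation]:
        """Look up a record by ID.

        With scope_app_id set, records owned by any other app are treated
        as if they did not exist, even when the caller knows a valid ID.
        """
        row = self._rows.get(conversation_id)
        if row is None:
            return None
        if scope_app_id is not None and row.app_id != scope_app_id:
            return None  # reads as "not found" rather than "forbidden"
        return row
```

Passing the scope on every lookup, rather than checking ownership after the fetch in each page, keeps a guessed object ID from reaching data outside the customer who gave consent.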
