
talk of the day. We've got Ben Herzberg, who's going to be joining us and giving a nice little discussion on securing your Snowflake cloud warehouse. So by all means, thank you, Ben, for joining us and being our last talk of the day. I'll pass it over to you.

Okay, doing a BSides talk from home, it's my first time. Not my first BSides, but my first BSides from home. This is the photo from my last trip, I think, or one of my last trips before COVID, and we're going to talk about a nicer thing, about Snowflake security, today. A little bit about myself: my name is Ben Herzberg, I'm a security researcher, and I'm working as
a chief scientist for Satori. We provide protection, security and compliance for data warehouses and data lakes. Before that I was leading research for Imperva, and I have three kids; hopefully you will not hear them throughout this talk. I'm from Israel, where it's not 4:20, it's 11:20, though some say it's always 4:20. Here are my contact details in case anyone wants to ask anything, and of course feel free to ask anything in the Discord channel. We have quite a packed agenda, so I have to say that some of this will be quick, and I may fast-forward through some of the commands that I'm running, etc., but I can
of course send all the code, the [unclear], the queries, all the samples, so I'm hoping that if you miss anything we can fix that.

So what are we going to talk about today? We'll have a short introduction; I think that this is something that we should do, as I'm sure that not all of you are well familiar with Snowflake. Then we'll talk about three topics in Snowflake security. Obviously we're not going to cover everything there is to know. We're going to cover network access control, which is a small subject in Snowflake, pretty nice and simple, data access control, which is a huge topic where we'll have to be very focused,
and logging and monitoring in Snowflake. I'm hoping you will get value out of this, and we'll wrap it up with some takeaways. The good thing about not having people in front of me is that I can say "are you ready?" and imagine that everybody is saying they are ready.

So let's start with some introduction. What are the prerequisites for this talk? Some of you have heard about Snowflake, maybe some of you haven't, some of you have only heard that it's a cool technology, and some of you probably use it. So we'll have an intro: what's Snowflake, what's a cloud data warehouse. And I assume
that most of you have at least some basic SQL knowledge. If you don't, then just ignore the SQL bits and you will be fine; focus on what we're trying to achieve. A small disclaimer: I'm not trying to say that by the end of this talk you will know everything you need to know about Snowflake security, but I do hope to spark your curiosity and give some ideas about better protecting Snowflake or other data warehouses. And remember that we are all unique, like snowflakes: each organization is different, and not every access control setting is right for everybody. Otherwise there would be no settings and we would just have the same configuration for all. But
again, it's to spark the curiosity and get you started.

Okay, so let's start. Snowflake is a cloud data warehouse. The terms database, data warehouse and data lake are sometimes blurry: there are products that are in more than one category, and there are also other categories. But a database is usually OLTP: it's aimed at transactions and it stands behind applications, for example e-commerce applications. It holds mostly structured data, and it can also include semi-structured data like JSON, but mostly structured data. A data lake, on the other end (the rightmost column), like S3 or Azure Data Lake, is the idea that
anything goes in there. You flush all your data, the logs, everything; you're sending it to a data lake. Some of it is somewhat structured, for example in Parquet files, and some of it is in other formats, and then mostly data scientists (forgive me, whatever the job title is) take a look at this data and try to make value of it. I would say that a data warehouse is something in the middle; good examples are Redshift, BigQuery and Snowflake. It has structured or semi-structured data, and its main usage is for analytics purposes. And in our days, where all companies are data-driven, or mostly all of the companies are
data-driven, many teams are making use of the data warehouse. Because it's very simply structured, there's a schema, and you can get very good value out of it. You also have great BI tools to connect to the data warehouse and do your stuff in there, so you put a lot of things in there and you start getting value from your data. Snowflake is a cloud data warehouse. Perhaps you've heard about the IPO: it's the biggest software IPO of this year, or maybe of the last few years, and it's the fastest-growing cloud data warehouse. AWS Redshift is the biggest, but it's the slowest growing. BigQuery
and Azure Synapse have good growth and are bigger. Snowflake is the smallest out of this pack, but it's the fastest growing. It's a unicorn of a unicorn company, and there's a reason: it's a SaaS, simple, powerful, scalable solution. It also has some great features, like the data marketplace and secure data sharing, and it has dynamic masking ability and tokenization. It's very easy to get from zero to Snowflake, and once it's in the organization the usage grows, because lots of different teams see that it's very easy and things are a SQL query away from you.

So, some of the security challenges
around Snowflake, which are also true for other cloud data warehouses. First of all, security in data warehouses is usually (and again, each organization is different) not managed by security teams but by DataOps or data engineering teams; whatever title it is that manages them, it's usually not the security team. This sometimes creates a gap, because the security teams don't have the knowledge of which tables hold the sensitive data that we don't want a given team to have access to, etc. There's a blurry responsibility between Snowflake, the data teams and the security team, and this is a challenge that we see around Snowflake security management. There are also compliance issues, which are,
you know, very, I would say (it's not correct grammatically) very [unclear] in the older data warehouses, with ready reports and everything, and in Snowflake they are a bit blurry. Also, there is role hierarchy in Snowflake. It's RBAC software, so it's role-based access control, but it has hierarchy: a role can have its own privileges, but it can also inherit privileges from other roles, and in some organizations that causes complexities. We'll talk about that later.

For the samples I set up a fictitious company called papercutinator.io. This is a tribute to Phineas and Ferb, to Dr. Heinz
Doofenshmirtz, who had a lot of band-aids to sell, so he set up a gadget to give people paper cuts. So I thought, let's create papercutinator.io. We'll create a database, a schema and some tables: customers for the customers, and purchases for the band-aids they are buying. This is the Snowflake UI, and you can do mostly anything in Snowflake from it. We're creating the database and the schema in the UI, and we're creating the tables and inserting some bogus information into the tables. Let's run it. [Music] Okay, my camera... okay, let's hope that it will be okay. And now
we have the data, so we can query it. We're selecting customers, and as you can see we get a nice view of our customers. Of course they are fake customers, don't worry: their phone, email, date of birth, password. Okay, so now that we have our test data ready, let's talk about network access control. This is the first and quickest topic of today.

Network access control basically means limiting access to the resource, in this case the data warehouse, based on the network you're on. So you can, for example, limit access to only certain IP addresses, like VPN, VPC and office IPs, etc. Why do we even want to do that?
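For orientation: in Snowflake this kind of IP restriction is expressed as a network policy that is attached either to the whole account or to a single user. The following is only a minimal sketch, not the exact statements run in the talk; the policy names, the user name and the IP ranges are placeholders (203.0.113.0/24 is a documentation-only range).

```sql
-- Sketch only: names and IP ranges are placeholders, not the demo's values.

-- A policy that allows connections only from the office range.
CREATE NETWORK POLICY office_only
  ALLOWED_IP_LIST = ('203.0.113.0/24');

-- Apply it to the entire account.
ALTER ACCOUNT SET NETWORK_POLICY = office_only;

-- A separate, much wider policy for one specific user (hypothetical user HEINZ).
-- A user-level policy takes precedence over the account-level one for that user.
CREATE NETWORK POLICY allow_anywhere
  ALLOWED_IP_LIST = ('0.0.0.0/0');

ALTER USER heinz SET NETWORK_POLICY = allow_anywhere;
```

Setting these normally requires an administrative role, and the account-level statement affects everyone, so it is worth checking the allowed list against your own current IP before running it.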
We want it because by default it's off, so you can access your account from anywhere, from any IP. So first of all it's risk reduction: if you don't allow people to access your data warehouse from whatever IP they are on, then you reduce the risk of them, for example, connecting from a computer that's not authorized and has malware on it. And there's also compliance, meaning that some of the compliance control frameworks require you to show that you place network access controls on your data. Setting up network access control in Snowflake is very easy: you set up policies, and we will
do that in a while, and you set the scope for the policies, or assign the policies: you can set a policy either for the entire account or for a specific user. So it's very simple; there's no hierarchy of policies or anything like that. And of course Papercutinator, with Dr. Heinz Doofenshmirtz, wants it, so let's set it up. They want everybody to connect only from the office, except of course for the CEO, president and co-founder Doofenshmirtz, who wants to connect from anywhere. So let's set it up in Snowflake. It's very simple. We can also do it from the UI, but that's not fun, so we'll just run the query. We're creating the network policy
office; we're defining it with an allowed IP list (it can have subnets in it, of course), then we're setting the account to use office, and then we're setting a different policy for Heinz, on different IPs. Of course, use your imagination: you can leave only the setting that allows all access, and we're setting it on Heinz, the user. So now we have the situation that we wanted, where everybody can access only from the office IP. Of course, if I remove this, Heinz can connect from anywhere. Please don't do that. Even if your president, CEO and co-founder asks you to bypass all security, please don't do that, as you will be the one
staying up all night afterwards when there's a data breach. So don't do that.

Some of the limitations around network access controls in Snowflake: it only supports IPv4, that is, it doesn't support IPv6. And there's the identity granularity: sometimes you have to script your way around it, because the account level is very blunt and the user level is sometimes too granular when you have thousands of employees or hundreds of people connecting to your Snowflake. There's no in-between, so that is a limitation, and sometimes it's very disturbing. And you cannot set the access at securable-object granularity, meaning it's zero or one:
you either are coming from the right IP addresses and you can go in, or you are not and you can't go into your data warehouse. You can't say, for example, that you can go into your data warehouse from a list of IP addresses, but to the HR database you can only go from specific restricted IP addresses. And also you can't do network access control based on data types: for example, that you can access information from any network, or from a specific network, but to access PII, personally identifiable information, you can only do that from certain IP addresses. So those are some of the limitations.

Let's talk about data access control, which is a big topic. Data access control
means how do you limit people to using the data (reading, writing, updating the data) in the right way. It's not a trivial question, not in Snowflake and not in other data stores, because on one hand you want people to actually use the data, and on the other hand you don't want to be over-permissive. You don't want people to have more access than they need, because that's a risk, and you don't want to become a rubber stamp that simply gets tickets and opens permissions. So it's not a trivial subject, and we can discuss it later, after the talk. Some of the terms you need to know about data access in Snowflake: there's the
securable object. That's the object we're trying to secure: for example, a table is a securable object, a database is a securable object, and a view is a securable object. By the way, unlike in, for example, Redshift, a column inside a table is not a securable object. A table is the best example of a securable object: you want to allow access to a table. A privilege, or a permission (it's called a privilege in Snowflake), is what you can do with the securable object. It's like a tuple of the object and what you can do with it. So you can only read, that is, use only SELECT on this resource,
or you can do UPDATE, TRUNCATE, DELETE, INSERT, etc. So that's the privilege. A role gets these tuples, these privileges to do stuff on different securable objects, and that defines what the role can do. That's classic role-based access control. And the users are granted the roles. A specific user like Ben can have multiple roles, but for each query or command I'm sending I'm using a specific role, and the privileges I have are based on that role. I hope that makes sense; if it doesn't, we can continue this after the talk. Why do we want to have data access control?
Because some organizations are very permissive and they're simply giving access to the entire data warehouse. So it's risk reduction again. If you're using data, then you're creating value; if you have access to data you're not using, you're getting the risk but you're not providing value. So ideally you don't want people to have access to what they don't need. Then there's compliance, of course: meeting regulations and security frameworks, and being able to say which users can access which data and so on. There's also privacy, setting access to PII: if you know that you have PII in specific tables, you want to set specific access to that and restrict access to PII. Sometimes, of course, PII is
found in unexpected places. Okay, so setting up data access controls in Snowflake is pretty easy. You assign users to roles. You can do it manually, like granting a role to a user, but most organizations of course do that in an automated way, where the groups are imported to Snowflake, or interfaced with Snowflake, and they become roles, and the users are the users of those roles; for example with an Okta integration. So the first part is pretty easy: it's mapping the users to the roles. The second part is a bit trickier: it's granting the objects to the roles. That part is tricky because it's not always very simple to know exactly what
data everybody needs, etc., but that's a topic for a different talk. So that's what you need to do. And of course Papercutinator wants data access control. They have two roles, packing samurais and packing ninjas, and they want these roles to inherit (you remember there's a hierarchy) from the packing role. The packing role allows only selecting data, that is, retrieving data from customers, so we know where to ship, and from purchases, so we know what to ship. Let's see that in Snowflake. We'll create a role, packing, assign it to our role, and grant usage on the database, on logistics, to the packing role, and access to all the tables in logistics.
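The role setup described here, including the two child roles, could be sketched roughly as follows. This is an illustration under assumptions rather than the demo's actual script: it assumes `papercutinator` is the database and `logistics` is a schema inside it, and the warehouse name `compute_wh` and user name `ben` are hypothetical.

```sql
-- Sketch only: object, warehouse and user names are assumptions.

-- Parent role with read-only access to the logistics tables.
CREATE ROLE packing;
GRANT USAGE ON DATABASE papercutinator TO ROLE packing;
GRANT USAGE ON SCHEMA papercutinator.logistics TO ROLE packing;
GRANT SELECT ON ALL TABLES IN SCHEMA papercutinator.logistics TO ROLE packing;

-- A role also needs a warehouse to actually run queries (hypothetical name).
GRANT USAGE ON WAREHOUSE compute_wh TO ROLE packing;

-- Child roles: granting the parent role to them makes them inherit its privileges.
CREATE ROLE packing_samurais;
CREATE ROLE packing_ninjas;
GRANT ROLE packing TO ROLE packing_samurais;
GRANT ROLE packing TO ROLE packing_ninjas;

-- Give a user one of the child roles and try it out.
GRANT ROLE packing_ninjas TO USER ben;
USE ROLE packing_ninjas;
SELECT * FROM papercutinator.logistics.customers LIMIT 10;
```

Note that `GRANT SELECT ON ALL TABLES` only covers tables that exist at the time it runs; tables created later need their own grant.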
And now we're creating the two child roles. We're doing this just to show that hierarchy works and that it's fun. I don't know if it's fun, but it works. And then let's try it out. Let's use the packing ninjas role; you can see up here that we're now using the packing ninjas role, and now we can select data from the customers table. Pretty sweet. Now let's say there was an audit at papercutinator.io, and they're saying that the packing teams are risky. Why? Because they're accessing PII. Some of it they need, but some of it, like the date of birth and the
passwords, they don't need, and we want to restrict them from that. So let's limit the access. What we're going to do is use views. A view basically gives partial access or full access to underlying objects; you can think of it as a virtual table that you can filter on top of existing tables. You can filter, you can join; sometimes you use it for operational reasons, sometimes you use it for security. In this case we'll return to the account admin role. This is a demo account, but in your account do not use account admin for this; you should not do that. This is just a demo account.
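The view-based restriction in this part of the demo might look roughly like the sketch below. The column names are guesses based on the fields mentioned in the talk (phone, email, date of birth, password), the view name `v_customers` is transcribed from speech, and the sketch assumes the packing roles received `SELECT` on `customers` through a parent role called `packing`.

```sql
-- Sketch only: column names and object paths are assumptions.
USE ROLE ACCOUNTADMIN;  -- acceptable in a demo account only

-- Expose just the columns the packing teams need; date of birth and password are left out.
CREATE SECURE VIEW papercutinator.logistics.v_customers AS
SELECT id, name, address, phone, email
FROM papercutinator.logistics.customers;

-- Remove direct access to the underlying table and grant the view instead.
REVOKE SELECT ON TABLE papercutinator.logistics.customers FROM ROLE packing;
GRANT SELECT ON VIEW papercutinator.logistics.v_customers TO ROLE packing;

-- From a packing role: the view works, the table is denied.
USE ROLE packing_ninjas;
SELECT * FROM papercutinator.logistics.v_customers LIMIT 10;
-- SELECT * FROM papercutinator.logistics.customers;  -- expected to fail with an authorization error
```

Because the child roles inherit from the parent, revoking and granting on the parent role is enough in this layout.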
What we are doing is creating a view that selects only the columns we need from customers. Then we are revoking the access to customers from all of the packing roles, and we're giving them access only to v_customers, which is the view we created. Hope you're with me. Let's go back to the packing ninjas role, and now you can see that when you're retrieving data from the view you're not getting the date of birth or the password, and when you're retrieving data from customers you're getting access denied. So you can't access the data that you're not supposed to access. This was done using secure views. I will
not go into details around secure views in Snowflake, but they do two things. First, they are not transparent to the user about what they do: the user can't see exactly what the query of the view is. Second, they eliminate some optimizations, which is not good operationally, but it means you cannot get hints about data that you're not supposed to have access to. So now we implemented a very simple column-based access control: we restricted access to sensitive data for our packing teams, and we created the interface, or the abstraction layer, between the employees in the packing team,
the users, and the data. So that's pretty sweet. Data access control in Snowflake is a wider topic than that, I'm sorry, but I had to pick my battles; there's a lot more, like dynamic masking and row-based access control. But it has some limitations. First of all, it's challenging to manage, especially at scale, and some organizations also use multiple data stores, not only Snowflake, which makes it even more challenging. Imagine that you have thousands of tables and hundreds of users, and some of them need different restrictions than others; then it can become a bit complex to manage, to give permissions, to take away permissions, etc. It's not dynamic,
as in, when I'm saying dynamic I mean, for example, based on the data types: I want to mask or restrict access to anything that contains a social security number, email, name, etc. So it's only very specific. And it's very easy, not only in Snowflake, to become over-permissive. I call over-permission the silent killer of security. When you're in an organization long enough, you have more access than you need, because when you're in the security team or data team people always come to you when they need access, but I don't know a lot of people who come back when they want that access to be revoked.
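One possible starting point for spotting this kind of leftover access is to compare the roles granted to users with the roles those users actually ran queries under. The sketch below is not something shown in the talk and is not how any particular reporting tool works; it assumes access to the `SNOWFLAKE.ACCOUNT_USAGE` share, and the 90-day window is an arbitrary choice.

```sql
-- Sketch only: lists directly granted roles that a user has not used
-- for any query in the last 90 days (window chosen arbitrarily).
SELECT g.grantee_name AS user_name,
       g.role         AS granted_role
FROM snowflake.account_usage.grants_to_users AS g
LEFT JOIN (
    SELECT DISTINCT user_name, role_name
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD(day, -90, CURRENT_TIMESTAMP())
) AS used
  ON  used.user_name = g.grantee_name
  AND used.role_name = g.role
WHERE g.deleted_on IS NULL      -- grant is still active
  AND used.role_name IS NULL    -- no query ran under this role in the window
ORDER BY user_name, granted_role;
```

This only looks at direct role grants and the role recorded on each query, so inherited privileges and rarely run but legitimate jobs still need a human review before anything is revoked.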
So this is a big problem, over-privilege. If you want, I wrote a blog post about it and you're welcome to read it. We also have a reporting tool for over-privileges at Satori, if anyone is interested, feel free.

Let's discuss the third topic, being very conscious of our time here as the last talk: logging and monitoring in Snowflake. First of all, I feel that with security folks, "why log and monitor" is something we don't even need to ask. We all love logs; we're very passionate about logs. First of all, it's a requirement to have logs, to have audits on data access. It's a requirement in many frameworks, in
the NIST Cybersecurity Framework and other frameworks. So first of all we need to have access logs. Also, of course, when doing incident response or when trying to understand certain events that happened, it's very useful to have logs. And for visibility: to know what's going on inside that black box of a data warehouse, which teams are using what resources, how much, etc. Not only for security reasons but also for operational efficiency; keep in mind that someone is paying the bills for Snowflake, and sometimes it can be good to analyze what's happening. And, like I said, to reduce over-privileges and to proactively find threats, find the users who are
pulling more data than others, etc. We all know that analytics over logs is a killer. So, logging in Snowflake. The good news is that logging in Snowflake is awesome. It gives awesome out-of-the-box capabilities without setting up S3 buckets for the logs to be sent to, choosing which logs you want to send, setting up ETLs over the logs, etc. So you're getting pretty good out-of-the-box capabilities. The logs are very detailed, as we'll see in a while, and they have a long retention period of one year, which is very cool. Some organizations need more than that, and for that you need to
work a little bit and move the logs, at least to your normal Snowflake account, or you can move them outside. But it has a long retention period, and this is very important also for organizations who find out, you know, the "oh" moment, where you say, okay, I know that I needed to have this, but do I have it? In some platforms it's not there out of the box with a long retention period, and then you just say "I'm sorry, I don't have logs for that time", and that's an awkward sentence to say. You don't have to say it with Snowflake. And it's easy to access; it's just a SQL
query away. The logs are in SQL tables of metadata that you can access, and that's very cool. So Papercutinator wants to leverage these logs. If anyone is interested in the specific locations of such logs, they're in the Snowflake documentation, and I can also answer that to the best of my ability. They want to know two things. One, if you've watched Phineas and Ferb, and I'm hoping that some of you did: where is Perry? Perry the Platypus is always up to no good, so we want to know what he was doing. And two, keeping up to date, we'll have a sample table for reporting. So let's do that.
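For anyone following along afterwards, the two lookups in this demo can be approximated with queries like the ones below. This is a sketch under assumptions: it reads the `QUERY_HISTORY` and `LOGIN_HISTORY` views in the `SNOWFLAKE.ACCOUNT_USAGE` schema, the user name `PERRY_PLATYPUS` is hypothetical, and the exact queries run in the talk may have differed.

```sql
-- Sketch only: the user name is a placeholder.
USE ROLE ACCOUNTADMIN;  -- demo account; a role with access to ACCOUNT_USAGE is what matters

-- 1) What has a given user been querying?
SELECT start_time,
       role_name,
       query_text,
       execution_status,
       error_message
FROM snowflake.account_usage.query_history
WHERE user_name = 'PERRY_PLATYPUS'
ORDER BY start_time DESC
LIMIT 100;

-- 2) Which users have the most failed logins, and why?
SELECT user_name,
       error_message,
       COUNT(*) AS failed_logins
FROM snowflake.account_usage.login_history
WHERE is_success = 'NO'
GROUP BY user_name, error_message
ORDER BY failed_logins DESC;
```

These views are populated with some delay, so very recent activity may not appear immediately.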
So let's return to the account admin role that we're using, and let's select from the query history all the queries by Perry Platypus. As you can see, he was selecting data from papercutinator.secret.ultra_secret_stuff, which is very typical of him. So that's one thing. Of course this is just an example, and you can see in here that you have tons of data: about the cost, about the credits, about the error messages that you got. You can make aggregations, of course, and you can correlate that with other information, so you have a lot of things to play with. That was from the query log. The second thing is from the login history,
or the access log. In here we'll do a short aggregation of which users got the most login failures, and the reasons for these login failures. This is just a very simplistic example, of course, but you can figure out lots more useful things to do in there.

Some of the limitations around the logging: first of all, it's the retention time. One year is a lot, but sometimes you may need more. And it is raw data; you need to do some work in order to get value out of it. That's the only
downside of the Snowflake logs that I could think of.

Some general takeaways, and then we'll open it up for questions if we have time; I think we have time for at least a few. First of all, the security controls in Snowflake rock. They are great. It's SaaS, so you don't need to worry about a lot of things you're getting out of the box. You're also getting good out-of-the-box logs and retention, and you're getting the ability to set up fine-grained access controls, column-based and row-based. And of course there are more features that we didn't discuss; maybe next year.
I will say that implementing a layer of security on top of Snowflake and other data warehouses, like we do at Satori, provides additional visibility, reduces the over-privilege risk, and detects threats. It also makes things easier when you have more than one data store, because it's decoupled from the data infrastructure itself, and it gives better access management. So that's also a point for consideration, of course. Now what? First of all, I know that I was speaking fast, so if you want you can also read more in our blog. I wrote a guide that's a bit longer than this talk; you can look for the frozen
heart-shaped padlock. It's a Snowflake hardening guide (not a Redshift hardening guide); there are also BigQuery and Redshift hardening guides that I wrote. And of course feel free to keep in touch using email, LinkedIn, Twitter or, of course, Discord. So I hope you enjoyed it. I must say that it was very strange doing this talk without seeing anyone, so I'm hoping you truly enjoyed it. And I have to say that the organization really kicks ass for a virtual event; that was really great, everything is really super, and the talks I saw were great too.

Amazing, thank you very much. [unclear] mute for the entire
time. No, you're great. Thanks; it's something that's definitely foreign to me, but people are responding well, and thank you for it. No questions as yet, but thanks for the talk. I have to say that I called the Agent P thing. Actually, I called him Perry, and then somebody corrected me, saying that the attacker would be Agent P. But yes, good job. [Laughter] It's been a while since I saw Phineas and Ferb. No, it was a fantastic talk, we really enjoyed it. And that's it, that closes out the first day of BSides Toronto. So we still
got another