Welcome to TechStation, SDG's space for exploring cutting-edge trends in data and analytics! In this article, we delve into Snowflake's Dynamic Data Masking feature and how the dbt_snow_mask package simplifies its implementation. Learn how to secure sensitive data dynamically, ensuring robust compliance and streamlined management across your data environments.
Dynamic Data Masking (DDM) in Snowflake is a security feature that lets organizations protect sensitive data by masking it at query time, based on the user's role and access permissions.
This feature ensures that sensitive content, such as personal information or financial data, is hidden from users who lack the required authorization, while authorized users still see the full data.
This article presents a powerful tool that helps data teams manage DDM policies in their Snowflake environments more efficiently from within a dbt project.
Let us start by analyzing the two main components involved: dbt and Snowflake.
dbt (Data Build Tool) is an open-source tool designed for data transformation and modeling within modern data warehouses.
Snowflake is a cloud-based data warehouse platform that provides flexible, scalable and fully managed services for storing and analyzing large volumes of data.
The data transformation and modeling capabilities of dbt combined with the instant scalability and the cloud-based data warehousing offered by Snowflake make it easier for data teams to build, manage and analyze data pipelines.
dbt_snow_mask is a package designed to work with dbt to automate the application of masking policies in Snowflake by exploiting the meta property in dbt models.
In this way, it is easier to manage and apply data masking consistently across Snowflake data warehouses.
dbt_snow_mask automates the application of arbitrarily complex Snowflake masking policies using definitions within dbt's meta configs, reducing manual effort. This ensures consistency, because policies are managed centrally in dbt, and scalability, because masking can be applied to all relevant data.
This section shows how to apply a masking policy to a model; the same approach applies to a source or a snapshot.
1. First of all, the package must be installed:
1.a) Add the dbt_snow_mask package to the packages.yml file in the dbt project:
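A minimal sketch of the entry, assuming the package is published on the dbt Hub under the entechlog namespace (check the Hub listing to confirm the exact name):

```yaml
# packages.yml, in the root of the dbt project
packages:
  - package: entechlog/dbt_snow_mask   # assumed Hub namespace/name
    version: [latest_version]          # placeholder, not a real version number
```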
Replace “[latest_version]” with the actual latest version number, which can be found on the dbt Hub.
1.b) Because this package in turn depends on the dbt_utils package, dbt_utils must be installed as well. This can be done by adding the following two lines to the packages.yml file:
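Assuming dbt_utils is pulled from the dbt Hub under the dbt-labs namespace, the two extra lines would sit under the same packages key, roughly as follows:

```yaml
# packages.yml
packages:
  # ... existing dbt_snow_mask entry ...
  - package: dbt-labs/dbt_utils   # assumed Hub namespace/name
    version: [latest_version]     # placeholder, not a real version number
```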
Here too, replace “[latest_version]” with the actual latest version number, which can be found on the dbt Hub.
1.c) After adding the packages, run the following command to install them:
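dbt's standard dependency command installs everything declared in packages.yml. Run from the project root, it would look like this:

```shell
# Download and install the packages declared in packages.yml
dbt deps
```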
2. By default, the masking policies are created in the database-schema pair associated with the target specified in the profiles.yml file.
This behaviour can be changed by setting variables in the dbt_project.yml file so that a common database or a common schema is used:
2.a) Use a common database.
Setting the following optional parameters changes the database and schema in which the masking policies are created:
use_common_masking_policy_db: flag that enables or disables the use of a common database-schema pair for all masking policies. Valid values are ‘True’ or ‘False’.
common_masking_policy_db: the name of the database in which the masking policies are created.
common_masking_policy_schema: the name of the schema in which the masking policies are created.
create_masking_policy_schema: flag whose valid values are ‘True’ or ‘False’. Setting it to ‘False’ skips schema creation, which helps when the dbt role is not allowed to create schemas. The default value is ‘True’.
Example: vars config in the dbt_project.yml file that enables a common masking policy database, with the database name set to “DB_NAME” and the schema name set to “SCHEMA_NAME”, and that skips schema creation for a dbt role that cannot create schemas:
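A sketch of that configuration, using the parameter names listed above. Passing the flags as quoted strings is an assumption about how the package reads them:

```yaml
# dbt_project.yml
vars:
  use_common_masking_policy_db: "True"          # use one database-schema pair for all policies
  common_masking_policy_db: "DB_NAME"           # database that holds the masking policies
  common_masking_policy_schema: "SCHEMA_NAME"   # schema that holds the masking policies
  create_masking_policy_schema: "False"         # do not attempt to create the schema
```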
2.b) Use a common schema (in the current database).
Setting the following optional parameters changes only the schema in which the masking policies are created:
use_common_masking_policy_schema_only: flag that enables or disables the use of a common schema in the current database for all masking policies. Valid values are ‘True’ or ‘False’.
common_masking_policy_schema: the name of the schema in which the masking policies are created.
create_masking_policy_schema: flag whose valid values are ‘True’ or ‘False’. Setting it to ‘False’ skips schema creation, which helps when the dbt role is not allowed to create schemas. The default value is ‘True’.
Example: vars config in the dbt_project.yml file that enables a common masking policy schema, with the schema name set to “SCHEMA_NAME”, and that skips schema creation for a dbt role that cannot create schemas:
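A sketch of the schema-only variant, again with the parameter names listed above and with the flags passed as quoted strings (an assumption):

```yaml
# dbt_project.yml
vars:
  use_common_masking_policy_schema_only: "True"   # keep the current database, share one schema
  common_masking_policy_schema: "SCHEMA_NAME"     # schema that holds the masking policies
  create_masking_policy_schema: "False"           # do not attempt to create the schema
```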
3. Use the meta property in the model.yml file to specify the masking policy to adopt. Choose a name for the masking policy and add the key masking_policy to the column that has to be masked.
Example: configuration of the model.yml file that applies the masking policy “MP_NAME” to the column “COLUMN_NAME” of the model “MODEL_NAME”:
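A minimal sketch of such a file, assuming the standard dbt properties layout in which meta sits directly under the column entry:

```yaml
# model.yml
version: 2

models:
  - name: MODEL_NAME
    columns:
      - name: COLUMN_NAME
        meta:
          masking_policy: MP_NAME   # name of the policy to attach to this column
```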
4. Before a masking policy can be used on models, it must be defined and created in Snowflake. To do that, create a macro named create_masking_policy_<masking-policy-name> that contains the SQL for the masking policy definition.
Then create the masking policy in Snowflake by running the following command:
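A sketch of that command for models, assuming the package exposes a create_masking_policy operation that takes a resource_type argument, as its documentation describes:

```shell
# Create in Snowflake the masking policies referenced in the meta of the project's models
dbt run-operation create_masking_policy --args '{"resource_type": "models"}'
```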
Example: definition of a masking policy named “MP_NAME” that shows the content of column “COLUMN_NAME” only to roles “ROLE_1” and “ROLE_2”; all other roles see ‘**********’ instead of the actual value:
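As a plain Snowflake statement, such a policy could be sketched as follows. The STRING type of the masked column and the argument name val are assumptions made for illustration:

```sql
-- Roles ROLE_1 and ROLE_2 see the real value; every other role sees the mask
CREATE MASKING POLICY IF NOT EXISTS MP_NAME AS (val STRING)
  RETURNS STRING ->
    CASE
      WHEN CURRENT_ROLE() IN ('ROLE_1', 'ROLE_2') THEN val
      ELSE '**********'
    END;
```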
To create the masking policy “MP_NAME” with the customizable database and schema configuration explained in step 2, the command is the following:
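A sketch of the macro form, assuming the package calls the macro with the resolved database and schema as the arguments node_database and node_schema, and that the macro name suffix must match the masking_policy value from the meta property exactly. The STRING column type is also an assumption:

```sql
{# macros/create_masking_policy_MP_NAME.sql (hypothetical file name) #}
{# node_database and node_schema are assumed to be supplied by the package #}
{% macro create_masking_policy_MP_NAME(node_database, node_schema) %}

CREATE MASKING POLICY IF NOT EXISTS {{ node_database }}.{{ node_schema }}.MP_NAME AS (val STRING)
  RETURNS STRING ->
    CASE
      WHEN CURRENT_ROLE() IN ('ROLE_1', 'ROLE_2') THEN val
      ELSE '**********'
    END

{% endmacro %}
```

Templating the database and schema, rather than hard-coding them, is what lets the settings from step 2 decide where the policy is created.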
5. Apply the masking policy by running the command below:
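A sketch of the apply step, assuming the package's apply_masking_policy macro has been registered as a post-hook in dbt_project.yml, so that a regular model run attaches the policy to the columns tagged in step 3. The hook configuration in the comment is an assumption drawn from the package's documented setup:

```shell
# Assumes dbt_project.yml contains a post-hook along these lines:
#   models:
#     +post-hook:
#       - "{{ dbt_snow_mask.apply_masking_policy('models') }}"

# Build the model; the post-hook then applies MP_NAME to COLUMN_NAME
dbt run --select MODEL_NAME
```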
6. To remove the applied masking policy, simply run the following command:
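A sketch of the removal command for models, assuming the package exposes an unapply_masking_policy operation with the same resource_type argument as the create operation:

```shell
# Detach the masking policies that the package applied to the project's models
dbt run-operation unapply_masking_policy --args '{"resource_type": "models"}'
```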
In summary, the dbt_snow_mask package is a powerful tool for teams using dbt and Snowflake, allowing for straightforward, consistent, and automated application of Dynamic Data Masking policies across Snowflake data warehouses using the dbt meta property.
Want to find out how Dynamic Data Masking can improve your company's data security and compliance? Book a call to explore tailored solutions and learn how to easily integrate the dbt_snow_mask package into your Snowflake environment!