Exploring dbt with Snowflake
Snowpipe is Snowflake’s continuous data ingestion service. Snowpipe loads data within minutes after files are added to a stage and submitted for ingestion. Currently there is no consolidated dashboard in Snowflake which shows a summary of Snowpipe activity. Copy history in Snowsight gives a dashboard for table-level copy history. The table functions INFORMATION_SCHEMA.PIPE_USAGE_HISTORY and INFORMATION_SCHEMA.COPY_HISTORY have the copy and usage history, but it is available only for the last 14 days.
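For reference, both table functions can be queried directly within that 14-day window. A minimal sketch, using placeholder table and pipe names:

-- Copy history for one table over the last 24 hours (placeholder table name)
select *
from table(information_schema.copy_history(
    table_name => 'MY_DB.MY_SCHEMA.MY_TABLE',
    start_time => dateadd(hours, -24, current_timestamp())));

-- Credits, bytes and files ingested by one pipe over the last 7 days (placeholder pipe name)
select *
from table(information_schema.pipe_usage_history(
    date_range_start => dateadd(day, -7, current_date()),
    pipe_name => 'MY_DB.MY_SCHEMA.MY_PIPE'));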
In this article we will materialize data from PIPE_USAGE_HISTORY and COPY_HISTORY into Snowflake tables and then visualize the Snowpipe copy history and usage history with the help of a dbt macro and dbt models.
We will create dbt models using data from the table functions PIPE_USAGE_HISTORY and COPY_HISTORY. Since these components are reusable, I have included them in the dbt package dbt-snow-utils.
The package consists of the dbt macro get_snowpipe_details and dbt models tagged snowpipe.
The macro get_snowpipe_details gets all Snowpipe and related target table information at the account level.
The models tagged snowpipe load data into target incremental tables. The models have three layers: staging (views for data transformation), marts (materialized into incremental tables) and presentation (views for end users with the necessary access controls).
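To illustrate the layering, here is a minimal sketch of what a marts-layer model following this pattern could look like. This is not the package's actual source; the model name, staging reference, unique key and timestamp column below are hypothetical.

-- models/marts/snowpipe__copy_history.sql (illustrative sketch only)
{{ config(
    materialized = 'incremental',
    unique_key   = 'copy_history_id',
    tags         = ['snowpipe']
) }}

select *
from {{ ref('stg_snowpipe__copy_history') }}

{% if is_incremental() %}
-- only pick up rows newer than what is already loaded in the target table
where last_load_time > (select max(last_load_time) from {{ this }})
{% endif %}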
Add the package to the packages.yml of your dbt project:
- git: "https://github.com/entechlog/dbt-snow-utils.git"
revision: 0.1.3
Please refer to the releases of this repo / dbt Hub for the latest version; the version number mentioned above may not be the latest.
Install the package by running the below command:
dbt deps
Add the following variables under the vars section of dbt_project.yml. These allow you to customize the data retrieval filters; edit them as needed. If you don’t need all pipes in the account, you can limit the databases using pipe_databases. Valid values are the string "ALL" or a list of databases.
vars:
  dbt_snow_utils:
    pipe_databases: "ALL"
    filter_by_date:
    pipe_copy_history_filter_key: "hours"
    pipe_copy_history_filter_value: -168
    pipe_usage_history_filter_key: "day"
    pipe_usage_history_filter_value: -7
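The filter key/value pairs behave like the unit and offset arguments of Snowflake's DATEADD, so the defaults above pull the last 168 hours (7 days) of copy history and the last 7 days of usage history; filter_by_date can be left empty and supplied at run time instead (see the dbt run example further down). A minimal, purely illustrative sketch of how such vars could feed a time filter (the package's internals may differ):

-- Hypothetical fragment: turning the filter vars into a COPY_HISTORY start_time
start_time => dateadd(
    '{{ var("pipe_copy_history_filter_key", "hours") }}',
    {{ var("pipe_copy_history_filter_value", -168) }},
    current_timestamp())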
Add the following model configuration under the models section of dbt_project.yml. This allows you to customize the target database and schema; edit them as needed.
models:
  dbt_snow_utils:
    staging:
      database: "DEMO_DB"
      schema: staging
    marts:
      database: "DEMO_DB"
      schema: marts
    presentation:
      database: "DEMO_DB"
      schema: presentation
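Note that dbt's default schema name generation prefixes custom schemas with the target schema (so the marts models may land in a schema like <target_schema>_marts). If you want schemas named exactly staging, marts and presentation, a common approach is to override generate_schema_name in your own project; a minimal sketch of the standard override:

-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}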
Make sure the ROLE used by your dbt project has the below access in Snowflake:
-- Required to track the pipe usage
GRANT MONITOR USAGE ON ACCOUNT TO ROLE <dbt-role-name>;
GRANT USAGE ON DATABASE <db-name> TO ROLE <dbt-role-name>;
GRANT CREATE SCHEMA ON DATABASE <db-name> TO ROLE <dbt-role-name>;
-- Required for superset to access presentation layer
GRANT SELECT ON ALL VIEWS IN SCHEMA <db-name>.PRESENTATION TO ROLE <bi-role-name>;
Run the models using the command
dbt run --select +tag:snowpipe
OR
dbt run --select +tag:snowpipe --vars '{"filter_by_date": "2022-03-22"}'
This should create two tables, presentation.snowpipe__usage_history and presentation.snowpipe__copy_history, which can be integrated with BI tools to build Snowpipe monitoring dashboards.
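As a quick sanity check before wiring up a BI tool, you can query the presentation layer directly. A minimal sketch, assuming snowpipe__usage_history exposes columns similar to the underlying PIPE_USAGE_HISTORY table function (pipe_name, credits_used, files_inserted):

-- Credits and files ingested per pipe (column names assumed from the underlying table function)
select pipe_name,
       sum(credits_used)   as total_credits_used,
       sum(files_inserted) as total_files_inserted
from presentation.snowpipe__usage_history
group by pipe_name
order by total_credits_used desc;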
We will use Superset running in Docker for this demo. You can use other BI tools as well to create similar dashboards. The dashboards shown here might not have all the information you are looking for, but they give a good idea of how to use this data to get more insights and operational information about the Snowpipes running in your account.
Clone the repo superset-examples, cd into the superset-snowflake directory and start the Superset docker container. This will bring up Superset and initialize it:
cd superset-snowflake
docker-compose up -d --build
Validate the status of docker containers by running docker-compose ps
Validate the docker logs by running docker logs -f openssh-client. It will take a couple of minutes for Superset to come online.
Run the below command if you see issues with initialize-superset.sh having Windows (CRLF) line endings:
sudo find . -type f -exec dos2unix {} \;
The docker startup script will also import the database connection and dashboard, which were exported earlier using the below commands.
# Export commands
superset export_datasources -f /tmp/databases.yml
superset export-dashboards -f /tmp/dashboards.yml
Navigate to the Superset UI and log in using admin/admin.
Navigate to Databases and click on database snowflake-db
Update the connection string to point to your Snowflake instance. The connection string can have the following variations based on your Snowflake setup:
snowflake://{user}:{password}@{account}.{region}/{database}
snowflake://{user}:{password}@{account}.{region}/{database}?role={role}&warehouse={warehouse}
snowflake://{user}:{password}@{account}.{region}/?authenticator={authenticator}&database={database}&role={role}&warehouse={warehouse}
Navigate to Dashboards and click on Snowpipe monitoring dashboard
Here are some common Snowpipe statistics tracked by this dashboard; you can customize the dashboard based on your needs.
Open a new terminal and cd into the superset-snowflake directory. Run the below command to delete the docker containers and related volumes:
docker-compose down -v --remove-orphans
Hope this was helpful. Did I miss something? Let me know in the comments or in the forum section.