Exploring dbt with Postgres
In this article, we will see how to initialize and configure a new dbt project for a Postgres database. The code used in this article can be found here.
We will use dbt seed to load raw and reference tables, purely for demo purposes. Note that dbt seed is not meant to load your raw layer, though it can be used to load simple reference tables; in a real project, all other data ingestion into your RAW layer will happen outside dbt.
Download and install Docker for your platform. Click here for instructions.
Let’s create the infrastructure required for this demo using Docker containers.
Open a new terminal and cd into the dbt-docker directory. This directory contains the infra components for this demo.
Create a copy of .env.template as .env
Start the dbt container by running
docker-compose up -d --build
Start the Postgres container by running
docker-compose -f docker-compose-postgres.yml up -d --build
Validate that both containers are up by running
docker ps
Navigate to pgAdmin in your browser and log in with admin@admin.com / postgres
Click on Add New Server and enter the following details
✅ Name : dbt demo
✅ Host Name : postgres
✅ User : postgres
✅ Password : postgres
If you changed the credentials in .env, make sure to use those values instead.
Open a shell inside the dbt container by running
docker exec -it dbt /bin/bash
Validate dbt is installed correctly by running
dbt --version
cd into your preferred directory
cd /C/
Create a new dbt project by running
dbt init dbt-postgres
Navigate into the dbt-postgres directory, create a new dbt profile file profiles.yml, and update it with the Postgres database connection details:
dbt-postgres:
  target: dev
  outputs:
    dev:
      type: postgres
      host: postgres
      user: postgres
      password: postgres
      port: 5432
      dbname: postgres
      schema: demo
      threads: 3
      keepalives_idle: 0 # default 0, indicating the system default
Edit dbt_project.yml to connect to the profile we just created. The value for profile should exactly match the name in profiles.yml.
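For example, the relevant line in dbt_project.yml would look like this (the rest of the file stays as generated by dbt init):

profile: 'dbt-postgres'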
From the project directory, run dbt-set-profile to update DBT_PROFILES_DIR.
dbt-set-profile is an alias for unset DBT_PROFILES_DIR && export DBT_PROFILES_DIR=$PWD
Validate the dbt profile and connection by running
dbt debug
Update name in dbt_project.yml to an appropriate project name (say dbt_postgres_demo).
Delete the contents of the data, macros, models, tests, and snapshots directories to follow along with this demo.
Seeds are CSV files in your dbt project that dbt can load into your data warehouse.
Copy the sample data from dbt-postgres\demo-artifacts\data to dbt-postgres\data
Review the seed configuration in dbt_project.yml. The seed configuration should have the project name followed by entries that match the structure of the data directory:
seeds:
  dbt_postgres_demo:
    schema: raw # all seeds in this project will use the raw schema by default
    imdb:
      schema: imdb # seeds in the `data/imdb/` subdirectory will use the imdb schema
    sakila:
      schema: sakila # seeds in the `data/sakila/` subdirectory will use the sakila schema
      address:
        +column_types:
          phone: varchar(50)
    lookups:
      schema: lookups # seeds in the `data/lookups/` subdirectory will use the lookups schema
Load the seed files by running the below command
dbt seed
Data will be loaded into schemas prefixed with the target schema (e.g. demo_raw instead of raw, since our target schema is demo). To fix this, we will create a small macro.
Macros are pieces of code that can be reused multiple times.
Copy the macros from dbt-postgres\demo-artifacts\macros\utils to dbt-postgres\macros\utils
The generate_schema_name macro overrides dbt's default schema naming and uses the custom schema as-is when one is provided.
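For reference, a minimal version of this override (following the pattern in dbt's documentation) looks like:

{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {# No custom schema configured: fall back to the target schema #}
        {{ target.schema }}
    {%- else -%}
        {# Use the custom schema exactly as provided, without a prefix #}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}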
Seed all files again by running the below command
dbt seed
This time the data will be loaded into the correct schemas, without the prefix.
Seed selected files only by running
dbt seed --select address
Sources make it possible to name and describe the data loaded into your warehouse by your Extract and Load tools.
We could use either the ref or the source function to reference the data we seeded, but to stay close to a real use case, we will use the source function.
Copy the source definitions from dbt-postgres\demo-artifacts\models\sources\ to dbt-postgres\models\sources\
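A source definition is a yaml file under the models directory. As an illustrative sketch (the table list here is assumed from the seeds we loaded):

version: 2

sources:
  - name: sakila
    schema: sakila
    tables:
      - name: customer
      - name: address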
Test the sources by running the below command
dbt test --models source:*
dbt packages are in fact standalone dbt projects, with models and macros that tackle a specific problem area.
Create a new file named packages.yml and add the package configuration, specifying the package(s) you wish to add:
packages:
  - package: dbt-labs/dbt_utils
    version: 0.7.0
  - package: dbt-labs/codegen
    version: 0.4.0
Install the packages by running
dbt deps
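Once installed, package macros can be called from any model. As a hypothetical sketch (the column names are assumed; surrogate_key is the hashing helper shipped with dbt_utils):

-- Hypothetical: build a hashed surrogate key with a dbt_utils macro
SELECT
    {{ dbt_utils.surrogate_key(['customer_id', 'address_id']) }} AS customer_key,
    customer_id,
    address_id
FROM {{ source('sakila', 'customer') }}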
A model is a select statement. Models are defined in .sql files.
Copy the model definitions from dbt-postgres\demo-artifacts\models\ to dbt-postgres\models\
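A staging model, for instance, is just a thin select over a source. A minimal sketch (column names assumed from the sakila customer table):

-- models/staging/stg_sakila__customer.sql (illustrative sketch)
WITH source AS (
    SELECT * FROM {{ source('sakila', 'customer') }}
)
SELECT
    customer_id,
    first_name,
    last_name
FROM source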
Review the model configuration in dbt_project.yml. The model configuration should have the project name followed by entries that match the structure of the models directory:
models:
  dbt_postgres_demo:
    materialized: table
    staging:
      schema: staging
      materialized: view
    marts:
      dim:
        schema: dim
        materialized: table
      fact:
        schema: fact
        materialized: table
    presentation:
      schema: analytics
      materialized: view
A default materialized configuration can be specified at the project level, and the materialized configuration can be overridden at each directory level.
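Materialization can also be overridden for an individual model with a config block at the top of its .sql file; a minimal sketch:

-- Overrides dbt_project.yml for this model only
{{ config(materialized='view') }}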
To define variables in a dbt project, add a vars config to the dbt_project.yml file:
vars:
  start_date: '2005-01-01'
  high_date: '9999-12-31'
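These variables can then be referenced in any model through the var function. A hypothetical filter (the payment source table is assumed for illustration):

SELECT *
FROM {{ source('sakila', 'payment') }}
WHERE payment_date >= '{{ var("start_date") }}'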
Build the models by running the below commands. The --models flag selects which models to run (a + prefix includes upstream dependencies), and --var overrides project variables:
dbt run --models stg_sakila__customer
dbt run --models staging.*
dbt run --models +fct_sales --var '{"start_date": "2005-05-24"}'
dbt run --models +tag:presentation-dim
dbt run --models +tag:presentation-fact --var '{"start_date": "2005-05-24"}'
dbt run
Tests are assertions you make about your models and other resources in your dbt project (e.g. sources, seeds and snapshots).
Types of tests:
schema tests (more common): assertions defined in yaml, such as unique and not_null
data tests: specific queries that should return 0 records (see the sketch after this list)
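A data test is simply a .sql file in the tests directory; the test fails if the query returns any rows. A hypothetical example against the fct_sales model built earlier:

-- tests/assert_no_negative_sales.sql (hypothetical data test)
-- Fails if any sales row has a negative amount
SELECT *
FROM {{ ref('fct_sales') }}
WHERE amount < 0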
Generate the yaml for existing models by running
dbt run-operation generate_model_yaml --args '{"model_name": "dim_customer"}'
Schema tests can be added at the table level or the column level. Here is an example of tests under a column definition in the model yaml:
version: 2

models:
  - name: dim_customer
    description: ""
    columns:
      - name: customer_id
        description: ""
        tests:
          - unique
          - not_null
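Tests can also be declared at the table level. For example, a uniqueness check across multiple columns using the dbt_utils package we installed earlier (the column combination here is assumed):

models:
  - name: fct_sales
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - customer_id
            - film_id
            - date_id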
Execute the tests by running the below commands
dbt test --models +tag:presentation-dim
dbt test --models +tag:presentation-fact
Validate the results by running the below queries in pgAdmin:
-- Incremental demo: record counts by day
SELECT CAST(date_id AS VARCHAR) AS date_id
    ,COUNT(1) AS rec_count
FROM analytics.sales
GROUP BY date_id
ORDER BY date_id;
-- Revenue by day
SELECT ad.day_name
    ,af.film_rating
    ,ac.customer_city
    ,SUM(amount) AS revenue
FROM analytics.sales asa
JOIN analytics.film af ON (af.film_id = asa.film_id)
JOIN analytics.date ad ON (ad.date_dim_id = asa.date_id)
JOIN analytics.customer ac ON (ac.customer_id = asa.customer_id)
GROUP BY ad.day_name
    ,af.film_rating
    ,ac.customer_city
ORDER BY revenue DESC
LIMIT 100;
dbt docs provides a way to generate documentation for your dbt project and render it as a website.
You can add descriptions to models, columns, and sources in the related yml files.
dbt also supports docs blocks, using the jinja docs tag.
Copy the sample jinja docs from dbt-postgres\demo-artifacts\docs\ to dbt-postgres\docs\
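A docs block lives in a .md file and is referenced from the model yaml via the doc function. A hypothetical example:

{% docs customer_city %}
The city associated with the customer's primary address.
{% enddocs %}

And the corresponding reference in the model yaml:

- name: customer_city
  description: '{{ doc("customer_city") }}'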
Generate documents by running
dbt docs generate
Publish the docs by running
dbt docs serve --port 8085
Stop serving the docs by pressing Ctrl + C
Open a new terminal and cd into the dbt-docker directory. Run the below commands to delete the docker containers and related volumes
docker-compose down --volumes --remove-orphans
docker-compose -f docker-compose-postgres.yml down --volumes --remove-orphans
Hope this was helpful. Did I miss something? Let me know in the comments or in the forum section.