Fivetran Platform Connector Sample Queries
In the provided sample queries, the <destination_db> and <destination_schema> placeholders refer to different entities depending on your destination: they correspond to your destination database name/project ID and to your Fivetran Platform connection name, respectively.
The following table shows the mapping between the placeholder and the destination entities along with usage examples:
Destination | <destination_db> | <destination_schema> | Example |
---|---|---|---|
Snowflake | Snowflake database name | Schema name | analytics.platform.log |
BigQuery | GCP project ID | Dataset name | `my-project.my_dataset.log` |
Redshift | (optional) | Schema name | public.log |
Use this mapping to replace the placeholders in your queries.
NOTE: In BigQuery, the full table reference must be enclosed in backticks.
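For example, using the sample names from the mapping table above (analytics/platform for Snowflake, my-project/my_dataset for BigQuery; substitute your own values), a minimal reference to the log table looks like this:
-- Snowflake: <destination_db> = analytics, <destination_schema> = platform
SELECT COUNT(*) FROM analytics.platform.log;

-- BigQuery: <destination_db> = my-project, <destination_schema> = my_dataset (backticks required)
SELECT COUNT(*) FROM `my-project.my_dataset.log`;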
NOTE: For an automatically created Fivetran Platform connection (when you create a destination), the default name is fivetran_metadata. For a manually created Fivetran Platform connection, the default name is fivetran_log. For S3 Data Lake and Managed Data Lake Services, the default name is fivetran_metadata_<group_id>.
Calculate monthly active rows (MAR) per connection
The Monthly Active Rows (MAR) count sent through the Fivetran Platform Connector includes the active rows recorded up to the current date. The count resets at the beginning of each month. For example, if your MAR count for schema_A.table_A is 5500 on January 1st but 5800 on January 31st, your January MAR for schema_A.table_A is 5800. Your MAR count then drops to 0 on February 1st. To learn more about MAR, see our pricing documentation.
NOTE: If you use these queries during your free trial, we recommend that you add every data source that you plan to use with Fivetran and let each connection run for 7 days under a typical load. This will give you a more accurate idea of how well Fivetran meets your business needs.
Calculate MAR grouped by schema (connection), destination, and month
Redshift and Snowflake MAR
Expand for Redshift and Snowflake query
We have tested the following query for Redshift and Snowflake destinations:
SELECT
schema_name,
destination_id,
date_trunc('month', measured_date) AS measured_month,
SUM(incremental_rows) AS MAR
FROM <destination_db>.<destination_schema>.incremental_mar
WHERE free_type = 'PAID'
GROUP BY schema_name, destination_id, measured_month
ORDER BY measured_month, schema_name
This query depends on the INCREMENTAL_MAR table.
BigQuery MAR
Expand for BigQuery query
We have tested the following query for BigQuery destinations:
SELECT
schema_name,
destination_id,
date_trunc(measured_date, month) AS measured_month,
SUM(incremental_rows) AS MAR
FROM <destination_db>.<destination_schema>.incremental_mar
WHERE free_type = 'PAID'
GROUP BY schema_name, destination_id, measured_month
ORDER BY measured_month, schema_name
This query depends on the INCREMENTAL_MAR table.
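If you also want a single account-wide MAR figure per month, the grouped queries above can be rolled up. The following is a minimal sketch in Redshift/Snowflake syntax (use date_trunc(measured_date, month) in BigQuery), assuming the same INCREMENTAL_MAR table and paid-rows filter:
SELECT
  date_trunc('month', measured_date) AS measured_month,
  SUM(incremental_rows) AS total_mar
FROM <destination_db>.<destination_schema>.incremental_mar
WHERE free_type = 'PAID'
GROUP BY measured_month
ORDER BY measured_month;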
Calculate MAR by table
Redshift and Snowflake MAR by table
Expand for Redshift and Snowflake query
Run the following query with a Redshift or Snowflake destination:
SELECT
schema_name,
destination_id,
table_name,
connection_name,
date_trunc('month', measured_date) AS measured_month,
SUM(incremental_rows) AS incremental_rows
FROM <destination_db>.<destination_schema>.incremental_mar
WHERE free_type = 'PAID'
GROUP BY schema_name, destination_id, measured_month, table_name, connection_name
ORDER BY measured_month, schema_name, table_name;
This query depends on the INCREMENTAL_MAR table.
BigQuery MAR by table
Expand for BigQuery query
Run the following query with a BigQuery destination:
SELECT
schema_name,
destination_id,
table_name,
connection_name,
date_trunc(measured_date, month) AS measured_month,
SUM(incremental_rows) AS incremental_rows
FROM <destination_db>.<destination_schema>.incremental_mar
WHERE free_type = 'PAID'
GROUP BY schema_name, destination_id, measured_month, table_name, connection_name
ORDER BY measured_month, schema_name, table_name;
This query depends on the INCREMENTAL_MAR table.
Calculate monthly transformation model runs
The Fivetran Platform Connector logs the number of times transformation models have run each month. This count includes all successful model runs up to the current date and resets to zero at the beginning of each month. For example, if your count for the Unified RAG job is 5500 on January 21 and 5800 on January 31, then the total number of successful model runs for January is 5800. On February 1st, the count resets to 0.
Calculate model runs grouped by month, destination, and job name
Redshift and Snowflake
Expand for Redshift and Snowflake query
Run the following query with a Redshift or Snowflake destination:
SELECT
date_trunc('month', measured_date) AS measured_month,
destination_id,
job_name,
SUM(model_runs) AS model_runs
FROM <destination_db>.<destination_schema>.transformation_runs
WHERE free_type = 'PAID'
GROUP BY measured_month, destination_id, job_name
ORDER BY measured_month, destination_id, job_name;
This query depends on the TRANSFORMATION_RUNS table.
BigQuery
Expand for BigQuery query
Run the following query with a BigQuery destination:
SELECT
date_trunc(measured_date, month) AS measured_month,
destination_id,
job_name,
SUM(model_runs) AS model_runs
FROM <destination_db>.<destination_schema>.transformation_runs
WHERE free_type = 'PAID'
GROUP BY measured_month, destination_id, job_name
ORDER BY measured_month, destination_id, job_name;
This query depends on the TRANSFORMATION_RUNS table.
Calculate model runs grouped by destination, project type, and month
BigQuery
Expand for BigQuery query
Run the following query with a BigQuery destination:
SELECT
destination_id,
project_type,
date_trunc(measured_date, month) AS measured_month,
SUM(model_runs) AS model_runs
FROM <destination_db>.<destination_schema>.transformation_runs
WHERE free_type = 'PAID'
GROUP BY destination_id, project_type, measured_month
ORDER BY destination_id, project_type, measured_month;
This query depends on the TRANSFORMATION_RUNS table.
Redshift and Snowflake
Expand for Redshift and Snowflake query
Run the following query with a Redshift or Snowflake destination:
SELECT
destination_id,
project_type,
date_trunc('month', measured_date) AS measured_month,
SUM(model_runs) AS model_runs
FROM <destination_db>.<destination_schema>.transformation_runs
WHERE free_type = 'PAID'
GROUP BY destination_id, project_type, measured_month
ORDER BY destination_id, project_type, measured_month;
This query depends on the TRANSFORMATION_RUNS table.
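Similarly, if you only need one monthly total across all jobs and destinations, the queries above can be collapsed further. A minimal Redshift/Snowflake-syntax sketch (use date_trunc(measured_date, month) in BigQuery), assuming the same TRANSFORMATION_RUNS table:
SELECT
  date_trunc('month', measured_date) AS measured_month,
  SUM(model_runs) AS total_model_runs
FROM <destination_db>.<destination_schema>.transformation_runs
WHERE free_type = 'PAID'
GROUP BY measured_month
ORDER BY measured_month;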
Check connection status
Check sync start and end times
Expand for universal query
Run the following query with a BigQuery, Redshift, Databricks, or Snowflake destination:
SELECT
connection_id,
message_event,
time_stamp AS process_start_time
FROM <destination_db>.<destination_schema>.log
WHERE message_event = 'sync_start' OR message_event = 'sync_end'
ORDER BY time_stamp DESC;
This query depends on the LOG table.
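To estimate how long each sync took, you can pair the sync_start and sync_end events by sync_id. The following is a minimal Snowflake-syntax sketch, assuming the LOG table's sync_id column (used elsewhere in this guide) is populated for these events:
SELECT
  connection_id,
  sync_id,
  MIN(CASE WHEN message_event = 'sync_start' THEN time_stamp END) AS sync_started_at,
  MAX(CASE WHEN message_event = 'sync_end' THEN time_stamp END) AS sync_ended_at,
  DATEDIFF(
    'second',
    MIN(CASE WHEN message_event = 'sync_start' THEN time_stamp END),
    MAX(CASE WHEN message_event = 'sync_end' THEN time_stamp END)
  ) AS sync_duration_seconds
FROM <destination_db>.<destination_schema>.log
WHERE message_event IN ('sync_start', 'sync_end')
GROUP BY connection_id, sync_id
ORDER BY sync_started_at DESC;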
Troubleshoot errors and warnings
Expand for universal query
Run the following query with a BigQuery, Redshift, Databricks, or Snowflake destination:
SELECT connection_id, time_stamp, event, message_data
FROM <destination_db>.<destination_schema>.log
WHERE event = 'WARNING' OR event = 'SEVERE'
ORDER BY time_stamp DESC;
This query depends on the LOG table.
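If you prefer a daily summary instead of individual log lines, the same LOG table can be aggregated. A minimal sketch in Redshift/Snowflake syntax that counts warnings and errors per connection per day:
SELECT
  DATE_TRUNC('day', time_stamp) AS date_day,
  connection_id,
  event,
  COUNT(*) AS event_count
FROM <destination_db>.<destination_schema>.log
WHERE event = 'WARNING' OR event = 'SEVERE'
GROUP BY date_day, connection_id, event
ORDER BY date_day DESC, event_count DESC;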
Check records modified since last sync
The sample queries below return the volume of data that has been inserted, updated, or deleted since your last successful sync. They also return the timestamp of your connection's last record modification. Query results are at the connection level.
Use the sample query for your destination:
NOTE: If you want to filter your results based on data modification type (for example, view inserts only), use the operationType field in the message_data JSON object.
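For example, in a Snowflake destination you could add an operationType filter like the following sketch; <operation_type> is a placeholder, so inspect your own records_modified events to see which operationType values your connections emit:
SELECT *
FROM <destination_db>.<destination_schema>.log
WHERE message_event = 'records_modified'
  AND PARSE_JSON(message_data):operationType::string = '<operation_type>' -- placeholder; e.g. an insert-type value from your own log data
ORDER BY time_stamp DESC;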
BigQuery modified records since last sync
Expand for BigQuery query
WITH parse_json AS (
SELECT
time_stamp,
JSON_EXTRACT(message_data, '$.schema') AS connection_schema,
CAST(JSON_EXTRACT(message_data, '$.count') AS int64) AS row_volume,
message_event,
MAX(CASE WHEN message_event = 'sync_end' THEN time_stamp ELSE NULL END) OVER(PARTITION BY connection_id) AS last_sync_completed_at
FROM <destination_db>.<destination_schema>.log
WHERE message_event = 'records_modified'
OR message_event = 'sync_end'
)
SELECT
connection_schema,
MAX(time_stamp) AS last_records_modified_at,
SUM(CASE WHEN time_stamp > last_sync_completed_at OR last_sync_completed_at IS NULL THEN row_volume ELSE 0 END) AS row_volume_since_last_sync
FROM parse_json
WHERE message_event = 'records_modified'
GROUP BY connection_schema
ORDER BY row_volume_since_last_sync DESC
;
This query depends on the LOG table.
Redshift modified records since last sync
Expand for Redshift query
WITH parse_json AS (
SELECT
time_stamp,
json_extract_path_text(message_data, 'schema') AS connection_schema,
json_extract_path_text(message_data, 'count')::integer AS row_volume,
message_event,
MAX(CASE WHEN message_event = 'sync_end' THEN time_stamp ELSE NULL END) OVER(PARTITION BY connection_id) AS last_sync_completed_at
FROM <destination_db>.<destination_schema>.log
WHERE message_event = 'records_modified'
OR message_event = 'sync_end'
)
SELECT
connection_schema,
MAX(time_stamp) AS last_records_modified_at,
SUM(CASE WHEN time_stamp > last_sync_completed_at OR last_sync_completed_at IS NULL THEN row_volume ELSE 0 END) AS row_volume_since_last_sync
FROM parse_json
WHERE message_event = 'records_modified'
GROUP BY connection_schema
ORDER BY row_volume_since_last_sync DESC
;
This query depends on the LOG table.
Snowflake modified records since last sync
Expand for Snowflake query
WITH parse_json AS (
SELECT
time_stamp,
PARSE_JSON(message_data) AS message_data,
message_event,
MAX(CASE WHEN message_event = 'sync_end' THEN time_stamp ELSE NULL END) OVER(PARTITION BY connection_id) AS last_sync_completed_at
FROM <destination_db>.<destination_schema>.log
WHERE message_event = 'records_modified'
OR message_event = 'sync_end'
)
SELECT
message_data:schema AS connection_schema,
MAX(time_stamp) AS last_records_modified_at,
SUM(CASE WHEN time_stamp > last_sync_completed_at OR last_sync_completed_at IS NULL THEN message_data:count::integer ELSE 0 END) AS row_volume_since_last_sync
FROM parse_json
WHERE message_event = 'records_modified'
GROUP BY connection_schema
ORDER BY row_volume_since_last_sync DESC
;
This query depends on the LOG table.
Check daily modified records
The sample queries below return the volume of data that has been inserted, updated, or deleted each day. Query results are at the table level.
Use the sample query for your destination:
NOTE: If you want to filter your results based on data modification type (for example, view inserts only), use the operationType field in the message_data JSON object.
BigQuery daily records
Expand for BigQuery query
SELECT
DATE_TRUNC(CAST(time_stamp AS date), day) AS date_day,
JSON_EXTRACT(message_data, '$.schema') AS schema,
JSON_EXTRACT(message_data, '$.table') AS table,
SUM(CAST(JSON_EXTRACT(message_data, '$.count') AS int64)) AS row_volume
FROM <destination_db>.<destination_schema>.log
WHERE DATE_DIFF(CAST(CURRENT_DATE() AS date), CAST(time_stamp AS date), DAY) < 30
AND message_event = 'records_modified'
GROUP BY date_day, schema, table
ORDER BY date_day DESC;
This query depends on the LOG table.
Redshift daily records
Expand for Redshift query
SELECT
DATE_TRUNC('DAY', time_stamp) AS date_day,
JSON_EXTRACT_PATH_TEXT(message_data, 'schema') AS schema,
JSON_EXTRACT_PATH_TEXT(message_data, 'table') AS "table",
SUM(JSON_EXTRACT_PATH_TEXT(message_data, 'count')::integer) AS row_volume
FROM <destination_db>.<destination_schema>.log
WHERE
DATEDIFF(day, time_stamp, current_date) < 30
AND message_event = 'records_modified'
GROUP BY date_day, schema, "table"
ORDER BY date_day DESC;
This query depends on the LOG table.
Snowflake daily records
Expand for Snowflake query
WITH parse_json AS (
SELECT
DATE_TRUNC('DAY', time_stamp) AS date_day,
PARSE_JSON(message_data) AS message_data
FROM <destination_db>.<destination_schema>.log
WHERE DATEDIFF(DAY, time_stamp, current_date) < 30
AND message_event = 'records_modified'
)
SELECT
date_day,
message_data:schema AS "schema",
message_data:table AS "table",
SUM(message_data:count::integer) AS row_volume
FROM parse_json
GROUP BY date_day, "schema", "table"
ORDER BY date_day DESC;
This query depends on the LOG table.
Audit user actions within connection
The following sample queries return user actions made within a connection for audit-trail purposes. This is helpful when tracing a user action to a log event such as a schema change, sync frequency update, manual update, or broken connection.
Use the sample query for your destination:
BigQuery user actions within connection
Expand for BigQuery query
WITH parsed AS (
SELECT
DATE_TRUNC(DATE(time_stamp), DAY) AS date_day,
FORMAT_DATE('%A', DATE(time_stamp)) AS dn, -- DAYNAME equivalent
EXTRACT(DAYOFWEEK FROM DATE(time_stamp)) AS dow, -- DAYOFWEEK
id,
time_stamp AS event_time,
message_event,
connection_id,
JSON_EXTRACT_SCALAR(message_data, '$.actor') AS acting_user
FROM
`<project>.<dataset>.log`
)
SELECT *
FROM parsed
WHERE acting_user IS NOT NULL;
This query depends on the LOG table.
Redshift user actions within connection
Expand for Redshift query
WITH parsed AS (
SELECT
DATE_TRUNC('day', time_stamp) AS date_day,
TO_CHAR(time_stamp, 'Day') AS dn, -- DAYNAME equivalent
EXTRACT(DOW FROM time_stamp) AS dow, -- 0=Sunday, 6=Saturday
id,
time_stamp AS event_time,
message_event,
connection_id,
JSON_EXTRACT_PATH_TEXT(message_data, 'actor') AS acting_user
FROM
<schema>.log
)
SELECT *
FROM parsed
WHERE acting_user IS NOT NULL;
This query depends on the LOG table.
Snowflake user actions within connection
Expand for Snowflake query
with parse_json AS (
SELECT
DATE_TRUNC('DAY', time_stamp) AS date_day,
DAYNAME(time_stamp) AS dn,
DAYOFWEEK(time_stamp) AS dow,
id,
time_stamp AS event_time,
message_event,
connection_id,
PARSE_JSON(message_data) AS message_data
FROM
<destination_db>.<destination_schema>.log
),
t AS (
SELECT
id,
event_time,
dn AS weekday,
dow,
message_event,
connection_id,
message_data:actor AS acting_user
FROM parse_json
)
SELECT
*
FROM t
WHERE acting_user IS NOT NULL
This query depends on the LOG table.
Overview of event averages by day
Use the sample query for your destination:
BigQuery event averages by day
Expand for BigQuery query
WITH parsed AS (
SELECT
DATE_TRUNC(DATE(time_stamp), DAY) AS date_day,
FORMAT_DATE('%A', DATE(time_stamp)) AS dn, -- DAYNAME equivalent
EXTRACT(DAYOFWEEK FROM DATE(time_stamp)) AS dow, -- Sunday = 1
id,
time_stamp AS event_time,
message_event,
connection_id,
JSON_EXTRACT_SCALAR(message_data, '$.event') AS extracted_event
FROM `<project>.<dataset>.log`
),
t AS (
SELECT
date_day,
dn,
dow,
message_event,
connection_id,
COUNT(id) AS event_count
FROM parsed
GROUP BY date_day, dn, dow, connection_id, message_event
),
ev AS (
SELECT
connection_id,
message_event,
dn AS weekday,
ROUND(AVG(event_count)) AS av_event_count,
ROUND(ROUND(AVG(event_count)) + ROUND(AVG(event_count)) * 0.2) AS high_event_variance_value,
ROUND(ROUND(AVG(event_count)) - ROUND(AVG(event_count)) * 0.2) AS low_event_variance_value,
ROUND(AVG(event_count) * 0.2) AS event_var_increment,
ROUND(STDDEV_POP(event_count)) AS standard_deviation
FROM t
GROUP BY connection_id, message_event, dow, dn
ORDER BY connection_id, message_event, dow
)
SELECT * FROM ev;
This query depends on the LOG table.
Redshift event averages by day
Expand for Redshift query
WITH parsed AS (
SELECT
DATE_TRUNC('day', time_stamp) AS date_day,
TO_CHAR(time_stamp, 'Day') AS dn, -- DAYNAME equivalent
EXTRACT(DOW FROM time_stamp) AS dow, -- 0 = Sunday
id,
time_stamp AS event_time,
message_event,
connection_id,
JSON_EXTRACT_PATH_TEXT(message_data, 'event') AS extracted_event
FROM <schema>.log
),
t AS (
SELECT
date_day,
dn,
dow,
message_event,
connection_id,
COUNT(id) AS event_count
FROM parsed
GROUP BY date_day, dn, dow, connection_id, message_event
),
ev AS (
SELECT
connection_id,
message_event,
dn AS weekday,
ROUND(AVG(event_count)) AS av_event_count,
ROUND(ROUND(AVG(event_count)) + ROUND(AVG(event_count)) * 0.2) AS high_event_variance_value,
ROUND(ROUND(AVG(event_count)) - ROUND(AVG(event_count)) * 0.2) AS low_event_variance_value,
ROUND(AVG(event_count) * 0.2) AS event_var_increment,
ROUND(STDDEV_SAMP(event_count)) AS standard_deviation
FROM t
GROUP BY connection_id, message_event, dow, dn
ORDER BY connection_id, message_event, dow
)
SELECT * FROM ev;
This query depends on the LOG table.
Snowflake event averages by day
Expand for Snowflake query
WITH parse_json AS (
SELECT
DATE_TRUNC('DAY', time_stamp) AS date_day,
DAYNAME(time_stamp) AS dn,
DAYOFWEEK(time_stamp) AS dow,
id,
time_stamp AS event_time,
message_event,
connection_id,
PARSE_JSON(message_data) AS message_data
FROM <destination_db>.<destination_schema>.log
),
t AS (
SELECT
date_day,
dn,
dow,
message_event,
connection_id,
COUNT(id) AS event_count
FROM parse_json
GROUP BY date_day, dn, dow, connection_id, message_event
),
ev AS (
SELECT
t.connection_id,
t.message_event,
t.dn AS weekday,
ROUND(AVG(t.event_count)) AS av_event_count,
ROUND(ROUND(AVG(t.event_count)) + ROUND(AVG(t.event_count)) * .2) AS high_event_variance_value,
ROUND(ROUND(AVG(t.event_count)) - ROUND(AVG(t.event_count)) * .2) AS low_event_variance_value,
ROUND(ROUND(AVG(t.event_count)) * .2) AS event_var_increment,
ROUND(STDDEV(t.event_count)) AS standard_deviation
FROM t
GROUP BY t.connection_id, t.message_event,t.dow, t.dn
ORDER BY t.connection_id, t.message_event,t.dow
)
SELECT * FROM ev
This query depends on the LOG table.
Assign your own variance logic and monitor your environment at event level
BigQuery monitor environment at event level
Expand for instructions in BigQuery
NOTE: This query is run against the fivetran_log_event_averages table. A few example implementations to create the fivetran_log_event_averages table are provided in the following sections:
WITH parsed AS (
SELECT
DATE_TRUNC(DATE(time_stamp), DAY) AS date_day,
FORMAT_DATE('%A', DATE(time_stamp)) AS dn,
EXTRACT(DAYOFWEEK FROM DATE(time_stamp)) AS dow,
id,
time_stamp AS event_time,
message_event,
connection_id
FROM `<project>.<dataset>.log`
),
ec AS (
SELECT
COUNT(id) AS event_count,
date_day,
dn,
dow,
message_event,
connection_id
FROM parsed
GROUP BY date_day, dn, dow, connection_id, message_event
),
av AS (
SELECT *
FROM `<project>.<dataset>.fivetran_log_event_averages`
)
SELECT
ec.date_day,
ec.dn AS weekday,
ec.connection_id,
ec.message_event,
ec.event_count AS total_events,
av.av_event_count,
av.high_event_variance_value,
av.low_event_variance_value,
av.standard_deviation,
av.event_var_increment,
CASE
WHEN ec.event_count > av.high_event_variance_value THEN 'Event_Variance'
WHEN ec.event_count < av.low_event_variance_value THEN 'Event_Variance'
ELSE 'Standard'
END AS event_variance_flag
FROM ec
JOIN av
ON ec.connection_id = av.connection_id
AND ec.message_event = av.message_event
AND ec.dn = av.weekday
ORDER BY ec.date_day, ec.dow, ec.connection_id, ec.message_event;
This query depends on the LOG table.
BigQuery fivetran_log_event_averages implementation options
Option 1: dbt implementation
Create the model file models/<destination_schema>/fivetran_log_event_averages.sql:
{{ config(
materialized = 'table',
schema = '<destination_schema>' -- Replace <destination_schema> with your actual schema name
) }}
WITH parsed AS (
SELECT
DATE_TRUNC(DATE(time_stamp), DAY) AS date_day,
FORMAT_DATE('%A', DATE(time_stamp)) AS dn,
EXTRACT(DAYOFWEEK FROM DATE(time_stamp)) AS dow,
id,
time_stamp AS event_time,
message_event,
connection_id
FROM `<project>.<dataset>.log`
),
t AS (
SELECT
date_day,
dn,
dow,
message_event,
connection_id,
COUNT(id) AS event_count
FROM parsed
GROUP BY date_day, dn, dow, connection_id, message_event
),
ev AS (
SELECT
connection_id,
message_event,
dn AS weekday,
ROUND(AVG(event_count)) AS av_event_count,
ROUND(ROUND(AVG(event_count)) + ROUND(AVG(event_count)) * 0.2) AS high_event_variance_value,
ROUND(ROUND(AVG(event_count)) - ROUND(AVG(event_count)) * 0.2) AS low_event_variance_value,
ROUND(AVG(event_count) * 0.2) AS event_var_increment,
ROUND(STDDEV_POP(event_count)) AS standard_deviation
FROM t
GROUP BY connection_id, message_event, dow, dn
)
SELECT * FROM ev;
Update models/sources.yml:
version: 2
sources:
- name: fivetran
database: "{{ var('fivetran_database') }}"
schema: "{{ var('fivetran_schema') }}"
tables:
- name: log
Option 2: Direct BigQuery create table as...
CREATE OR REPLACE TABLE <destination_db>.<destination_schema>.fivetran_log_event_averages AS
WITH parsed AS (
SELECT
DATE_TRUNC(DATE(time_stamp), DAY) AS date_day,
FORMAT_DATE('%A', DATE(time_stamp)) AS dn,
EXTRACT(DAYOFWEEK FROM DATE(time_stamp)) AS dow,
id,
time_stamp AS event_time,
message_event,
connection_id
FROM `<project>.<dataset>.log`
),
t AS (
SELECT
date_day,
dn,
dow,
message_event,
connection_id,
COUNT(id) AS event_count
FROM parsed
GROUP BY date_day, dn, dow, connection_id, message_event
),
ev AS (
SELECT
connection_id,
message_event,
dn AS weekday,
ROUND(AVG(event_count)) AS av_event_count,
ROUND(ROUND(AVG(event_count)) + ROUND(AVG(event_count)) * 0.2) AS high_event_variance_value,
ROUND(ROUND(AVG(event_count)) - ROUND(AVG(event_count)) * 0.2) AS low_event_variance_value,
ROUND(AVG(event_count) * 0.2) AS event_var_increment,
ROUND(STDDEV_POP(event_count)) AS standard_deviation
FROM t
GROUP BY connection_id, message_event, dow, dn
)
SELECT * FROM ev;
Snowflake monitor environment at event level
Expand for instructions in Snowflake
NOTE: This query is run against the fivetran_log_event_averages table. A few example implementations to create the fivetran_log_event_averages table are provided in the following sections:
WITH parse_json AS (
SELECT
DATE_TRUNC('DAY', time_stamp) AS date_day,
DAYNAME(time_stamp) AS dn,
DAYOFWEEK(time_stamp) AS dow,
id,
time_stamp AS event_time,
message_event,
connection_id,
PARSE_JSON(message_data) AS message_data
FROM <destination_db>.<destination_schema>.log
),
ec AS (
SELECT
COUNT(id) AS event_count,
date_day,
dn,
dow,
message_event,
connection_id
FROM parse_json
GROUP BY date_day, dn, dow, connection_id, message_event
ORDER BY connection_id, message_event ASC
)
, av AS (
SELECT
ev.connection_id,
ev.weekday,
ev.message_event,
ev.av_event_count,
ev.high_event_variance_value,
ev.low_event_variance_value,
ev.event_var_increment,
ev.standard_deviation
FROM <destination_db>.<destination_schema>.fivetran_log_event_averages ev
)
SELECT
ec.date_day,
ec.dn AS weekday,
ec.connection_id,
ec.message_event,
ec.event_count AS total_events,
av.av_event_count,
av.high_event_variance_value,
av.low_event_variance_value,
av.standard_deviation,
av.event_var_increment,
CASE WHEN ec.event_count > av.high_event_variance_value THEN 'Event_Variance'
WHEN ec.event_count < av.low_event_variance_value THEN 'Event_Variance'
else 'Standard'
END AS event_variance_flag
FROM ec
INNER JOIN av ON av.connection_id = ec.connection_id AND av.message_event = ec.message_event AND av.weekday = ec.dn
ORDER BY
ec.date_day,
ec.dow,
ec.connection_id,
ec.message_event
This query depends on the LOG table.
Snowflake fivetran_log_event_averages implementation options
Option 1: dbt implementation
Create the model file models/<destination_schema>/fivetran_log_event_averages.sql:
{{ config(
materialized = 'table',
schema = '<destination_schema>' -- Replace <destination_schema> with your actual schema name
) }}
WITH parse_json AS (
SELECT
DATE_TRUNC('DAY', time_stamp) AS date_day,
DAYNAME(time_stamp) AS dn,
DAYOFWEEK(time_stamp) AS dow,
id,
time_stamp AS event_time,
message_event,
connection_id,
PARSE_JSON(message_data) AS message_data
FROM {{ source('fivetran', 'log') }}
),
t AS (
SELECT
date_day,
dn,
dow,
message_event,
connection_id,
COUNT(id) AS event_count
FROM parse_json
GROUP BY date_day, dn, dow, connection_id, message_event
),
ev AS (
SELECT
t.connection_id,
t.message_event,
t.dn AS weekday,
ROUND(AVG(t.event_count)) AS av_event_count,
ROUND(ROUND(AVG(t.event_count)) + ROUND(AVG(t.event_count)) * .2) AS high_event_variance_value,
ROUND(ROUND(AVG(t.event_count)) - ROUND(AVG(t.event_count)) * .2) AS low_event_variance_value,
ROUND(ROUND(AVG(t.event_count)) * .2) AS event_var_increment,
ROUND(STDDEV(t.event_count)) AS standard_deviation
FROM t
GROUP BY t.connection_id, t.message_event, t.dow, t.dn
ORDER BY t.connection_id, t.message_event, t.dow
)
SELECT * FROM ev
Update models/sources.yml:
version: 2
sources:
- name: fivetran
database: "{{ var('fivetran_database') }}"
schema: "{{ var('fivetran_schema') }}"
tables:
- name: log
Option 2: Direct Snowflake create table as...
CREATE OR REPLACE TABLE <destination_db>.<destination_schema>.fivetran_log_event_averages AS
WITH parse_json AS (
SELECT
DATE_TRUNC('DAY', time_stamp) AS date_day,
DAYNAME(time_stamp) AS dn,
DAYOFWEEK(time_stamp) AS dow,
id,
time_stamp AS event_time,
message_event,
connection_id,
PARSE_JSON(message_data) AS message_data
FROM <destination_db>.<destination_schema>.log
),
t AS (
SELECT
date_day,
dn,
dow,
message_event,
connection_id,
COUNT(id) AS event_count
FROM parse_json
GROUP BY date_day, dn, dow, connection_id, message_event
),
ev AS (
SELECT
t.connection_id,
t.message_event,
t.dn AS weekday,
ROUND(AVG(t.event_count)) AS av_event_count,
ROUND(ROUND(AVG(t.event_count)) + ROUND(AVG(t.event_count)) * .2) AS high_event_variance_value,
ROUND(ROUND(AVG(t.event_count)) - ROUND(AVG(t.event_count)) * .2) AS low_event_variance_value,
ROUND(ROUND(AVG(t.event_count)) * .2) AS event_var_increment,
ROUND(STDDEV(t.event_count)) AS standard_deviation
FROM t
GROUP BY t.connection_id, t.message_event, t.dow, t.dn
ORDER BY t.connection_id, t.message_event, t.dow
)
SELECT * FROM ev
Redshift monitor environment at event level
Expand for instructions in Redshift
NOTE: This query is run against the fivetran_log_event_averages table. A few example implementations to create the fivetran_log_event_averages table are provided in the following sections:
WITH parsed AS (
SELECT
DATE_TRUNC('day', time_stamp) AS date_day,
TO_CHAR(time_stamp, 'Day') AS dn,
EXTRACT(DOW FROM time_stamp) AS dow,
id,
time_stamp AS event_time,
message_event,
connection_id
FROM <schema>.log
),
ec AS (
SELECT
COUNT(id) AS event_count,
date_day,
dn,
dow,
message_event,
connection_id
FROM parsed
GROUP BY date_day, dn, dow, connection_id, message_event
),
av AS (
SELECT *
FROM <schema>.fivetran_log_event_averages
)
SELECT
ec.date_day,
ec.dn AS weekday,
ec.connection_id,
ec.message_event,
ec.event_count AS total_events,
av.av_event_count,
av.high_event_variance_value,
av.low_event_variance_value,
av.standard_deviation,
av.event_var_increment,
CASE
WHEN ec.event_count > av.high_event_variance_value THEN 'Event_Variance'
WHEN ec.event_count < av.low_event_variance_value THEN 'Event_Variance'
ELSE 'Standard'
END AS event_variance_flag
FROM ec
JOIN av
ON ec.connection_id = av.connection_id
AND ec.message_event = av.message_event
AND ec.dn = av.weekday
ORDER BY ec.date_day, ec.dow, ec.connection_id, ec.message_event;
This query depends on the LOG table.
Redshift fivetran_log_event_averages implementation options
Option 1: dbt implementation
Create the model file models/<destination_schema>/fivetran_log_event_averages.sql:
{{ config(
materialized = 'table',
schema = '<destination_schema>' -- Replace <destination_schema> with your actual schema name
) }}
WITH parsed AS (
SELECT
DATE_TRUNC('day', time_stamp) AS date_day,
TO_CHAR(time_stamp, 'Day') AS dn,
EXTRACT(DOW FROM time_stamp) AS dow,
id,
time_stamp AS event_time,
message_event,
connection_id
FROM <schema>.log
),
t AS (
SELECT
date_day,
dn,
dow,
message_event,
connection_id,
COUNT(id) AS event_count
FROM parsed
GROUP BY date_day, dn, dow, connection_id, message_event
),
ev AS (
SELECT
connection_id,
message_event,
dn AS weekday,
ROUND(AVG(event_count)) AS av_event_count,
ROUND(ROUND(AVG(event_count)) + ROUND(AVG(event_count)) * 0.2) AS high_event_variance_value,
ROUND(ROUND(AVG(event_count)) - ROUND(AVG(event_count)) * 0.2) AS low_event_variance_value,
ROUND(AVG(event_count) * 0.2) AS event_var_increment,
ROUND(STDDEV_SAMP(event_count)) AS standard_deviation
FROM t
GROUP BY connection_id, message_event, dow, dn
)
SELECT * FROM ev;
Update models/sources.yml:
version: 2
sources:
- name: fivetran
database: "{{ var('fivetran_database') }}"
schema: "{{ var('fivetran_schema') }}"
tables:
- name: log
Option 2: Direct Redshift create table as...
DROP TABLE IF EXISTS <destination_db>.<destination_schema>.fivetran_log_event_averages;
CREATE TABLE <destination_db>.<destination_schema>.fivetran_log_event_averages AS
WITH parsed AS (
SELECT
DATE_TRUNC('day', time_stamp) AS date_day,
TO_CHAR(time_stamp, 'Day') AS dn,
EXTRACT(DOW FROM time_stamp) AS dow,
id,
time_stamp AS event_time,
message_event,
connection_id
FROM <schema>.log
),
t AS (
SELECT
date_day,
dn,
dow,
message_event,
connection_id,
COUNT(id) AS event_count
FROM parsed
GROUP BY date_day, dn, dow, connection_id, message_event
),
ev AS (
SELECT
connection_id,
message_event,
dn AS weekday,
ROUND(AVG(event_count)) AS av_event_count,
ROUND(ROUND(AVG(event_count)) + ROUND(AVG(event_count)) * 0.2) AS high_event_variance_value,
ROUND(ROUND(AVG(event_count)) - ROUND(AVG(event_count)) * 0.2) AS low_event_variance_value,
ROUND(AVG(event_count) * 0.2) AS event_var_increment,
ROUND(STDDEV_SAMP(event_count)) AS standard_deviation
FROM t
GROUP BY connection_id, message_event, dow, dn
)
SELECT * FROM ev;
Usage
To use the fivetran_log_event_averages table, you can either run the dbt model or create the table directly via SQL.
dbt
dbt run --select fivetran_log_event_averages
Direct SQL
Run the appropriate CREATE TABLE AS query for your destination and replace the placeholder values in <destination_db>.<destination_schema>.fivetran_log_event_averages:
Destination | <destination_db> | <destination_schema> | Example |
---|---|---|---|
Snowflake | Snowflake database name | Schema name | analytics.platform.fivetran_log_event_averages |
BigQuery | GCP project ID | Dataset name | `my-project.my_dataset.fivetran_log_event_averages` |
Redshift | (optional) | Schema name | public.fivetran_log_event_averages |
NOTE: In BigQuery, the full table reference must be enclosed in backticks.
Output Schema
Column | Description |
---|---|
connection_id | Fivetran connection identifier |
message_event | Type of event logged |
weekday | Day of the week |
av_event_count | Average event count |
high_event_variance_value | Upper threshold (120% of average) |
low_event_variance_value | Lower threshold (80% of average) |
event_var_increment | 20% of average event count |
standard_deviation | Standard deviation of event counts |
Review difference in seconds between write_to_table_start and write_to_table_end events
Use the sample query for your destination:
BigQuery review difference in seconds
Expand for BigQuery query
WITH parsed AS (
SELECT
DATE_TRUNC(DATE(time_stamp), DAY) AS date_day,
id,
time_stamp AS event_time,
message_event,
connection_id,
JSON_EXTRACT_SCALAR(message_data, '$.table') AS table_name
FROM `<project>.<dataset>.log`
WHERE connection_id = ''
),
t AS (
SELECT
id,
event_time,
message_event,
connection_id,
table_name,
RANK() OVER (PARTITION BY connection_id, table_name ORDER BY event_time ASC) AS rn,
TIMESTAMP_DIFF(event_time, LAG(event_time) OVER (PARTITION BY connection_id, table_name ORDER BY event_time), SECOND) AS seconds_diff
FROM parsed
WHERE message_event IN ('write_to_table_start', 'write_to_table_end')
)
SELECT
id,
event_time,
message_event,
connection_id,
table_name AS `table`,
CASE
WHEN message_event = 'write_to_table_start' AND seconds_diff > 0 THEN 0
ELSE seconds_diff
END AS diff
FROM t
ORDER BY connection_id, table_name, event_time;
This query depends on the LOG table.
Redshift review difference in seconds
Expand for Redshift query
WITH parsed AS (
SELECT
DATE_TRUNC('day', time_stamp) AS date_day,
id,
time_stamp AS event_time,
message_event,
connection_id,
JSON_EXTRACT_PATH_TEXT(message_data, 'table') AS table_name
FROM <schema>.log
WHERE connection_id = ''
),
t AS (
SELECT
id,
event_time,
message_event,
connection_id,
table_name,
RANK() OVER (PARTITION BY connection_id, table_name ORDER BY event_time ASC) AS rn,
DATEDIFF(second, LAG(event_time) OVER (PARTITION BY connection_id, table_name ORDER BY event_time), event_time) AS seconds_diff
FROM parsed
WHERE message_event IN ('write_to_table_start', 'write_to_table_end')
)
SELECT
id,
event_time,
message_event,
connection_id,
table_name AS "table",
CASE
WHEN message_event = 'write_to_table_start' AND seconds_diff > 0 THEN 0
ELSE seconds_diff
END AS diff
FROM t
ORDER BY connection_id, table_name, event_time;
This query depends on the LOG table.
Snowflake review difference in seconds
Expand for Snowflake query
WITH parse_json AS (
SELECT
DATE_TRUNC('DAY', time_stamp) AS date_day,
id,
time_stamp AS event_time,
message_event,
connection_id,
PARSE_JSON(message_data) AS message_data
FROM <destination_db>.<destination_schema>.log
WHERE connection_id = ''
), t AS (
SELECT
id,
event_time,
message_event,
connection_id,
message_data:table AS "table",
RANK() OVER ( ORDER BY connection_id,"table",event_time ASC) AS rn ,
DATEDIFF(second,lag(event_time,1) OVER (ORDER BY connection_id,"table",event_time ASC),event_time) AS seconds_diff
FROM parse_json
WHERE message_event IN ('write_to_table_start','write_to_table_end')
GROUP BY id,connection_id,event_time,message_event,"table"
ORDER BY connection_id,"table",event_time ASC
)
SELECT
t.id,
t.event_time,
t.message_event,
t.connection_id,
t."table",
CASE WHEN t.message_event = 'write_to_table_start'
AND t.seconds_diff > 0
THEN 0 ELSE t.seconds_diff
END AS diff
FROM t
;
This query depends on the LOG table.
Review modified record count data by table
Expand for Snowflake query
WITH parse_json AS (
SELECT
DATE_TRUNC('DAY', time_stamp) AS date_day,
DAYNAME(time_stamp) AS dn,
DAYOFWEEK(time_stamp) AS dow,
id,
time_stamp AS event_time,
message_event,
connection_id,
PARSE_JSON(message_data) AS message_data
FROM <destination_db>.<destination_schema>.log
), t AS (
SELECT
id,
event_time,
dn AS weekday,
dow,
message_event,
connection_id,
message_data:operationType AS optype,
message_data:table AS logtable,
message_data:count AS rowsimpacted
FROM parse_json
WHERE message_event = 'records_modified'
AND logtable <> 'fivetran_audit'
GROUP BY id,connection_id,event_time,dow,weekday,message_event,logtable,optype,rowsimpacted
ORDER BY connection_id,logtable ASC
)
SELECT
connection_id,
message_event,
weekday,
optype,
logtable AS avtable,
CAST(ROUND(AVG(t.rowsimpacted)) AS int) AS avgrow,
ROUND(ROUND(AVG(t.rowsimpacted)) + ROUND(AVG(t.rowsimpacted)) * .2) AS high_variance_value,
ROUND(ROUND(AVG(t.rowsimpacted)) - ROUND(AVG(t.rowsimpacted)) * .2) AS low_variance_value,
ROUND(ROUND(AVG(t.rowsimpacted)) * .2 ) AS var_increment,
IFNULL(ROUND(stddev(t.rowsimpacted)),0) AS standard_deviation
FROM t
GROUP BY connection_id,message_event,dow,weekday,avtable,optype
ORDER BY connection_id,avtable,dow,optype
This query depends on the LOG table.
Assign your own variance logic and monitor your environment at table level
BigQuery monitor environment at table level
Expand for instructions in BigQuery
NOTE: This query is run against the fivetran_records_modified_averages table. A few example implementations to create the fivetran_records_modified_averages table are provided in the following sections.
WITH parsed AS (
SELECT
DATE_TRUNC(DATE(time_stamp), DAY) AS date_day,
FORMAT_DATE('%A', DATE(time_stamp)) AS dn,
EXTRACT(DAYOFWEEK FROM DATE(time_stamp)) AS dow,
id,
time_stamp AS event_time,
message_event,
connection_id,
JSON_EXTRACT_SCALAR(message_data, '$.operationType') AS optype,
JSON_EXTRACT_SCALAR(message_data, '$.table') AS logtable,
CAST(JSON_EXTRACT_SCALAR(message_data, '$.count') AS INT64) AS rowsimpacted
FROM `<destination_db>.<destination_schema>.log`
WHERE message_event = 'records_modified'
AND JSON_EXTRACT_SCALAR(message_data, '$.table') <> 'fivetran_audit'
),
av AS (
SELECT *
FROM `<destination_db>.<destination_schema>.fivetran_records_modified_averages`
)
SELECT
parsed.id,
parsed.dn AS weekday,
parsed.date_day,
parsed.event_time,
parsed.message_event,
parsed.optype,
parsed.connection_id,
parsed.logtable,
parsed.rowsimpacted,
av.avgrow,
av.high_variance_value,
av.low_variance_value,
av.var_increment,
av.standard_deviation,
CASE
WHEN parsed.rowsimpacted > av.high_variance_value THEN 'Variance'
WHEN parsed.rowsimpacted < av.low_variance_value THEN 'Variance'
ELSE 'Standard'
END AS varianceflag
FROM parsed
JOIN av ON
parsed.connection_id = av.connection_id AND
parsed.dn = av.weekday AND
parsed.logtable = av.avtable AND
parsed.optype = av.optype
ORDER BY parsed.connection_id, parsed.dow, parsed.event_time, parsed.logtable;
This query depends on the LOG table.
BigQuery fivetran_records_modified_averages implementation options
Option 1: dbt implementation
Create the model file models/<destination_schema>/fivetran_records_modified_averages.sql:
{{ config(
materialized = 'table',
schema = '<destination_schema>' -- Replace <destination_schema> with your actual schema name
) }}
WITH parsed AS (
SELECT
DATE_TRUNC(DATE(time_stamp), DAY) AS date_day,
FORMAT_DATE('%A', DATE(time_stamp)) AS dn,
EXTRACT(DAYOFWEEK FROM DATE(time_stamp)) AS dow,
id,
time_stamp,
message_event,
connection_id,
JSON_EXTRACT_SCALAR(message_data, '$.operationType') AS optype,
JSON_EXTRACT_SCALAR(message_data, '$.table') AS logtable,
CAST(JSON_EXTRACT_SCALAR(message_data, '$.count') AS INT64) AS rowsimpacted
FROM {{ source('fivetran', 'log') }}
WHERE message_event = 'records_modified'
AND JSON_EXTRACT_SCALAR(message_data, '$.table') <> 'fivetran_audit'
),
t AS (
SELECT
connection_id,
message_event,
dn AS weekday,
dow,
logtable AS avtable,
optype,
rowsimpacted
FROM parsed
),
ev AS (
SELECT
connection_id,
message_event,
weekday,
optype,
avtable,
CAST(ROUND(AVG(rowsimpacted)) AS INT64) AS avgrow,
ROUND(AVG(rowsimpacted) + AVG(rowsimpacted) * 0.2) AS high_variance_value,
ROUND(AVG(rowsimpacted) - AVG(rowsimpacted) * 0.2) AS low_variance_value,
ROUND(AVG(rowsimpacted) * 0.2) AS var_increment,
IFNULL(ROUND(STDDEV_POP(rowsimpacted)), 0) AS standard_deviation
FROM t
GROUP BY connection_id, message_event, weekday, dow, avtable, optype
)
SELECT * FROM ev;
Update models/sources.yml:
version: 2
sources:
- name: fivetran
database: "{{ var('fivetran_database') }}"
schema: "{{ var('fivetran_schema') }}"
tables:
- name: log
Option 2: Direct BigQuery create table as...
CREATE OR REPLACE TABLE <destination_db>.<destination_schema>.fivetran_records_modified_averages AS
WITH parsed AS (
SELECT
DATE_TRUNC(DATE(time_stamp), DAY) AS date_day,
FORMAT_DATE('%A', DATE(time_stamp)) AS dn,
EXTRACT(DAYOFWEEK FROM DATE(time_stamp)) AS dow,
id,
time_stamp,
message_event,
connection_id,
JSON_EXTRACT_SCALAR(message_data, '$.operationType') AS optype,
JSON_EXTRACT_SCALAR(message_data, '$.table') AS logtable,
CAST(JSON_EXTRACT_SCALAR(message_data, '$.count') AS INT64) AS rowsimpacted
FROM `<destination_db>.<destination_schema>.log`
WHERE message_event = 'records_modified'
AND JSON_EXTRACT_SCALAR(message_data, '$.table') <> 'fivetran_audit'
),
t AS (
SELECT
connection_id,
message_event,
dn AS weekday,
dow,
logtable AS avtable,
optype,
rowsimpacted
FROM parsed
),
ev AS (
SELECT
connection_id,
message_event,
weekday,
optype,
avtable,
CAST(ROUND(AVG(rowsimpacted)) AS INT64) AS avgrow,
ROUND(AVG(rowsimpacted) + AVG(rowsimpacted) * 0.2) AS high_variance_value,
ROUND(AVG(rowsimpacted) - AVG(rowsimpacted) * 0.2) AS low_variance_value,
ROUND(AVG(rowsimpacted) * 0.2) AS var_increment,
IFNULL(ROUND(STDDEV_POP(rowsimpacted)), 0) AS standard_deviation
FROM t
GROUP BY connection_id, message_event, weekday, dow, avtable, optype
)
SELECT * FROM ev;
Snowflake monitor environment at table level
Expand for instructions in Snowflake
NOTE: This query is run against the fivetran_records_modified_averages table. A few example implementations to create the fivetran_records_modified_averages table are provided in the following sections.
WITH parse_json AS (
SELECT
DATE_TRUNC('DAY', time_stamp) AS date_day,
DAYNAME(time_stamp) AS dn,
DAYOFWEEK(time_stamp) AS dow,
id,
time_stamp AS event_time,
message_event,
connection_id,
PARSE_JSON(message_data) AS message_data
FROM <destination_db>.<destination_schema>.log
WHERE message_event = 'records_modified'
),
t AS (
SELECT
id,
date_day,
dn,
dow,
event_time,
message_event,
connection_id,
message_data:operationType AS optype,
message_data:table AS logtable,
CAST(message_data:count AS INT) AS rowsimpacted
FROM parse_json
WHERE message_data:table <> 'fivetran_audit'
),
av AS (
SELECT *
FROM <destination_db>.<destination_schema>.fivetran_records_modified_averages
)
SELECT
t.id,
t.dn AS weekday,
t.date_day,
t.event_time,
t.message_event,
t.optype,
t.connection_id,
t.logtable,
t.rowsimpacted,
av.avgrow,
av.high_variance_value,
av.low_variance_value,
av.var_increment,
av.standard_deviation,
CASE
WHEN t.rowsimpacted > av.high_variance_value THEN 'Variance'
WHEN t.rowsimpacted < av.low_variance_value THEN 'Variance'
ELSE 'Standard'
END AS varianceflag
FROM t
JOIN av ON
t.connection_id = av.connection_id AND
t.dn = av.weekday AND
t.logtable = av.avtable AND
t.optype = av.optype
ORDER BY t.connection_id, t.dow, t.event_time, t.logtable;
This query depends on the LOG table.
Snowflake fivetran_records_modified_averages implementation options
Option 1: dbt implementation
Create the model file models/<destination_schema>/fivetran_records_modified_averages.sql:
{{ config(
materialized = 'table',
schema = '<destination_schema>' -- Replace <destination_schema> with your actual schema name
) }}
WITH parse_json AS (
SELECT
DATE_TRUNC('DAY', time_stamp) AS date_day,
DAYNAME(time_stamp) AS dn,
DAYOFWEEK(time_stamp) AS dow,
id,
time_stamp,
message_event,
connection_id,
PARSE_JSON(message_data) AS message_data
FROM {{ source('fivetran', 'log') }}
WHERE message_event = 'records_modified'
),
t AS (
SELECT
connection_id,
message_event,
dn AS weekday,
dow,
message_data:table AS avtable,
message_data:operationType AS optype,
CAST(message_data:count AS INT) AS rowsimpacted
FROM parse_json
WHERE message_data:table <> 'fivetran_audit'
),
ev AS (
SELECT
connection_id,
message_event,
weekday,
optype,
avtable,
CAST(ROUND(AVG(rowsimpacted)) AS INT) AS avgrow,
ROUND(ROUND(AVG(rowsimpacted)) + ROUND(AVG(rowsimpacted)) * 0.2) AS high_variance_value,
ROUND(ROUND(AVG(rowsimpacted)) - ROUND(AVG(rowsimpacted)) * 0.2) AS low_variance_value,
ROUND(ROUND(AVG(rowsimpacted)) * 0.2) AS var_increment,
IFNULL(ROUND(STDDEV(rowsimpacted)), 0) AS standard_deviation
FROM t
GROUP BY connection_id, message_event, dow, weekday, avtable, optype
)
SELECT * FROM ev;
Update models/sources.yml:
version: 2
sources:
- name: fivetran
database: "{{ var('fivetran_database') }}"
schema: "{{ var('fivetran_schema') }}"
tables:
- name: log
Option 2: Direct Snowflake create table as...
CREATE OR REPLACE TABLE <destination_db>.<destination_schema>.fivetran_records_modified_averages AS
WITH parse_json AS (
SELECT
DATE_TRUNC('DAY', time_stamp) AS date_day,
DAYNAME(time_stamp) AS dn,
DAYOFWEEK(time_stamp) AS dow,
id,
time_stamp,
message_event,
connection_id,
PARSE_JSON(message_data) AS message_data
FROM <destination_db>.<destination_schema>.log
WHERE message_event = 'records_modified'
),
t AS (
SELECT
connection_id,
message_event,
dn AS weekday,
dow,
message_data:table AS avtable,
message_data:operationType AS optype,
CAST(message_data:count AS INT) AS rowsimpacted
FROM parse_json
WHERE message_data:table <> 'fivetran_audit'
),
ev AS (
SELECT
connection_id,
message_event,
weekday,
optype,
avtable,
CAST(ROUND(AVG(rowsimpacted)) AS INT) AS avgrow,
ROUND(ROUND(AVG(rowsimpacted)) + ROUND(AVG(rowsimpacted)) * 0.2) AS high_variance_value,
ROUND(ROUND(AVG(rowsimpacted)) - ROUND(AVG(rowsimpacted)) * 0.2) AS low_variance_value,
ROUND(ROUND(AVG(rowsimpacted)) * 0.2) AS var_increment,
IFNULL(ROUND(STDDEV(rowsimpacted)), 0) AS standard_deviation
FROM t
GROUP BY connection_id, message_event, dow, weekday, avtable, optype
)
SELECT * FROM ev;
Redshift monitor environment at table level
Expand for instructions in Redshift
NOTE: This query is run against the fivetran_records_modified_averages table. A few example implementations to create the fivetran_records_modified_averages table are provided in the following sections.
WITH parsed AS (
SELECT
DATE_TRUNC('day', time_stamp) AS date_day,
TO_CHAR(time_stamp, 'Day') AS dn,
EXTRACT(DOW FROM time_stamp) AS dow,
id,
time_stamp AS event_time,
message_event,
connection_id,
JSON_EXTRACT_PATH_TEXT(message_data, 'operationType') AS optype,
JSON_EXTRACT_PATH_TEXT(message_data, 'table') AS logtable,
CAST(JSON_EXTRACT_PATH_TEXT(message_data, 'count') AS INT) AS rowsimpacted
FROM <schema>.log
WHERE message_event = 'records_modified'
AND JSON_EXTRACT_PATH_TEXT(message_data, 'table') <> 'fivetran_audit'
),
av AS (
SELECT *
FROM <schema>.fivetran_records_modified_averages
)
SELECT
parsed.id,
parsed.dn AS weekday,
parsed.date_day,
parsed.event_time,
parsed.message_event,
parsed.optype,
parsed.connection_id,
parsed.logtable,
parsed.rowsimpacted,
av.avgrow,
av.high_variance_value,
av.low_variance_value,
av.var_increment,
av.standard_deviation,
CASE
WHEN parsed.rowsimpacted > av.high_variance_value THEN 'Variance'
WHEN parsed.rowsimpacted < av.low_variance_value THEN 'Variance'
ELSE 'Standard'
END AS varianceflag
FROM parsed
JOIN av ON
parsed.connection_id = av.connection_id AND
parsed.dn = av.weekday AND
parsed.logtable = av.avtable AND
parsed.optype = av.optype
ORDER BY parsed.connection_id, parsed.dow, parsed.event_time, parsed.logtable;
This query depends on the LOG table.
Redshift fivetran_records_modified_averages implementation options
Option 1: dbt implementation
Create the model file models/<destination_schema>/fivetran_records_modified_averages.sql:
{{ config(
materialized = 'table',
schema = '<destination_schema>' -- Replace <destination_schema> with your actual schema name
) }}
WITH parsed AS (
SELECT
DATE_TRUNC('day', time_stamp) AS date_day,
TO_CHAR(time_stamp, 'Day') AS dn,
EXTRACT(DOW FROM time_stamp) AS dow,
id,
time_stamp,
message_event,
connection_id,
JSON_EXTRACT_PATH_TEXT(message_data, 'operationType') AS optype,
JSON_EXTRACT_PATH_TEXT(message_data, 'table') AS logtable,
CAST(JSON_EXTRACT_PATH_TEXT(message_data, 'count') AS INT) AS rowsimpacted
FROM {{ source('fivetran', 'log') }}
WHERE message_event = 'records_modified'
AND JSON_EXTRACT_PATH_TEXT(message_data, 'table') <> 'fivetran_audit'
),
t AS (
SELECT
connection_id,
message_event,
dn AS weekday,
dow,
logtable AS avtable,
optype,
rowsimpacted
FROM parsed
),
ev AS (
SELECT
connection_id,
message_event,
weekday,
optype,
avtable,
ROUND(AVG(rowsimpacted)) AS avgrow,
ROUND(AVG(rowsimpacted) + AVG(rowsimpacted) * 0.2) AS high_variance_value,
ROUND(AVG(rowsimpacted) - AVG(rowsimpacted) * 0.2) AS low_variance_value,
ROUND(AVG(rowsimpacted) * 0.2) AS var_increment,
COALESCE(ROUND(STDDEV_SAMP(rowsimpacted)), 0) AS standard_deviation
FROM t
GROUP BY connection_id, message_event, dow, weekday, avtable, optype
)
SELECT * FROM ev;
Update models/sources.yml:
version: 2
sources:
- name: fivetran
database: "{{ var('fivetran_database') }}"
schema: "{{ var('fivetran_schema') }}"
tables:
- name: log
Option 2: Direct Redshift create table as...
DROP TABLE IF EXISTS <destination_db>.<destination_schema>.fivetran_records_modified_averages;
CREATE TABLE <destination_db>.<destination_schema>.fivetran_records_modified_averages AS
WITH parsed AS (
SELECT
DATE_TRUNC('day', time_stamp) AS date_day,
TO_CHAR(time_stamp, 'Day') AS dn,
EXTRACT(DOW FROM time_stamp) AS dow,
id,
time_stamp,
message_event,
connection_id,
JSON_EXTRACT_PATH_TEXT(message_data, 'operationType') AS optype,
JSON_EXTRACT_PATH_TEXT(message_data, 'table') AS logtable,
CAST(JSON_EXTRACT_PATH_TEXT(message_data, 'count') AS INT) AS rowsimpacted
FROM <schema>.log
WHERE message_event = 'records_modified'
AND JSON_EXTRACT_PATH_TEXT(message_data, 'table') <> 'fivetran_audit'
),
t AS (
SELECT
connection_id,
message_event,
dn AS weekday,
dow,
logtable AS avtable,
optype,
rowsimpacted
FROM parsed
),
ev AS (
SELECT
connection_id,
message_event,
weekday,
optype,
avtable,
ROUND(AVG(rowsimpacted)) AS avgrow,
ROUND(AVG(rowsimpacted) + AVG(rowsimpacted) * 0.2) AS high_variance_value,
ROUND(AVG(rowsimpacted) - AVG(rowsimpacted) * 0.2) AS low_variance_value,
ROUND(AVG(rowsimpacted) * 0.2) AS var_increment,
COALESCE(ROUND(STDDEV_SAMP(rowsimpacted)), 0) AS standard_deviation
FROM t
GROUP BY connection_id, message_event, dow, weekday, avtable, optype
)
SELECT * FROM ev;
Usage
To use the fivetran_records_modified_averages table, you can either run the dbt model or create the table directly via SQL.
dbt
dbt run --select fivetran_records_modified_averages
Direct SQL
Run the appropriate CREATE TABLE AS query for your destination and replace the placeholder values in <destination_db>.<destination_schema>.fivetran_records_modified_averages:
Destination | <destination_db> | <destination_schema> | Example |
---|---|---|---|
Snowflake | Snowflake database name | Schema name | analytics.platform.fivetran_records_modified_averages |
BigQuery | GCP project ID | Dataset name | `my-project.my_dataset.fivetran_records_modified_averages` |
Redshift | (optional) | Schema name | public.fivetran_records_modified_averages |
NOTE: In BigQuery, the full table reference must be enclosed in backticks.
Output Schema
Column | Description |
---|---|
connection_id | Fivetran connection identifier |
message_event | Type of event (always records_modified in this model) |
weekday | Day of the week |
optype | Operation type (insert, update, delete) |
avtable | Table name affected by the operation |
avgrow | Average number of rows impacted |
high_variance_value | Upper threshold (120% of average row count) |
low_variance_value | Lower threshold (80% of average row count) |
var_increment | 20% of average row count |
standard_deviation | Standard deviation of row counts |
dbt transformation data
Expand for Snowflake query
The following sample query returns dbt transformation data for a given event.
WITH a AS (
SELECT
DATE_TRUNC('DAY', time_stamp) AS date_day,
id,
sync_id,
time_stamp AS event_time,
message_event,
connection_id,
PARSE_JSON(message_data) AS message_data
FROM <destination_db>.<destination_schema>.log
WHERE message_event IN ('transformation_start','transformation_succeeded','transformation_failed')
)
SELECT
message_data:dbtJobId::string AS dbtJobId,
message_data:dbtJobName::string AS dbtJobName,
message_data:dbtJobType::string AS dbtJobType,
message_data:startTime::timestamp AS startTime,
message_data:endTime::timestamp AS endTime,
message_data:result:stepResults[0]:success::boolean AS success,
message_data:models AS models,
message_data:result:stepResults AS stepResults,
message_data:startupDetails AS startupDetails,
message_data:result:stepResults[0]:knownFailedModels AS knownFailedModels,
message_data:result:stepResults[0]:knownSuccessfulModels AS knownSuccessfulModels,
message_data
FROM a
This query depends on the LOG table.
Transformation data
Expand for Snowflake query
The following sample query returns transformation data for a given event.
WITH a AS (
SELECT
DATE_TRUNC('DAY', time_stamp) AS date_day,
id,
sync_id,
time_stamp AS event_time,
message_event,
connection_id,
PARSE_JSON(message_data) AS message_data
FROM <destination_db>.<destination_schema>.log
WHERE message_event IN ('transformation_start','transformation_succeeded','transformation_failed')
)
SELECT
message_data:id::string AS jobId,
message_data:name::string AS jobName,
message_data:transformationType::string AS transformationType,
message_data:startTime::timestamp AS startTime,
message_data:endTime::timestamp AS endTime,
message_data:schedule AS schedule,
message_data:result:stepResults AS stepResults,
message_data:description AS resultsSummary,
message_data
FROM a
This query depends on the LOG table.
Sync events
Expand for Snowflake query
The following sample query returns all events from a given sync.
WITH parse_json AS (
SELECT
DATE_TRUNC('DAY', time_stamp) AS date_day,
id,
sync_id,
time_stamp AS event_time,
message_event,
connection_id,
PARSE_JSON(message_data) AS message_data
FROM <destination_db>.<destination_schema>.log
WHERE sync_id = ''
), t AS (
SELECT
id,
sync_id,
event_time,
message_event,
connection_id,
message_data,
message_data:table AS "table",
message_data:query AS query,
RANK() OVER ( ORDER BY sync_id, connection_id,event_time ASC) AS rn ,
DATEDIFF(second,lag(event_time,1) over (ORDER BY sync_id, connection_id,event_time ASC),event_time) AS seconds_diff
FROM parse_json
GROUP BY id,sync_id,connection_id,event_time,message_event,message_data,"table"
ORDER BY connection_id,"table",event_time ASC
)
SELECT
t.id,
t.sync_id,
t.event_time,
t.message_event,
t.message_data,
t.connection_id,
t.query,
t."table",
CASE WHEN t.message_event = 'write_to_table_start'
AND t.seconds_diff > 0
THEN 0 else t.seconds_diff
END AS diff,
t.rn
FROM t
ORDER BY t.sync_id, t.event_time ASC
This query depends on the LOG table.
Sync statistics
Expand for Snowflake query
The following sample query returns sync statistics for PostgreSQL, Oracle, MySQL, and SQL Server connections.
WITH parse_json AS (
SELECT
DATE_TRUNC('DAY', time_stamp) AS date_day,
DAYNAME(time_stamp) AS dn,
DAYOFWEEK(time_stamp) AS dow,
id,
time_stamp AS event_time,
message_event,
connection_id,
PARSE_JSON(message_data) AS message_data
FROM <destination_db>.<destination_schema>.log
WHERE message_event = 'sync_stats'
),t AS (
SELECT
id,
event_time,
dn AS weekday,
dow,
message_event,
connection_id,
message_data:extract_time_s AS extract_time_s,
message_data:extract_volume_mb AS extract_volume_mb,
message_data:load_time_s AS load_time_s,
message_data:load_volume_mb AS load_volume_mb,
message_data:process_time_s AS process_time_s,
message_data:process_volume_mb AS process_volume_mb,
message_data:total_time_s AS total_time_s
FROM parse_json
)
SELECT * FROM t ORDER BY extract_time_s DESC
This query depends on the LOG table.
API extract_summary data
Expand for Snowflake query
The following sample query returns extract summary log data.
NOTE: If the query fails when using INNER JOIN <destination_schema>.connection c on l.connection_id = c.connection_id, re-sync your Fivetran Platform connection.
WITH es as(
SELECT
c.connection_name,
l.time_stamp,
PARSE_JSON(message_data) as md
FROM <destination_db>.<destination_schema>.log l
INNER JOIN <destination_schema>.connection c on l.connection_id = c.connection_id
WHERE message_event = 'extract_summary'
ORDER BY l._fivetran_synced DESC
)
SELECT
connection_name,
time_stamp,
md:status,
md:total_queries,
md:total_rows,
md:total_size,
md:rounded_total_size,
md:objects
FROM es
This query depends on the LOG table.
API extract_summary object data
Expand for Snowflake query
The following sample query returns extract summary log data for API objects.
NOTE: If the query fails when using INNER JOIN <destination_schema>.connection c on l.connection_id = c.connection_id, re-sync your Fivetran Platform connection.
WITH es as(
SELECT
c.connection_name,
l.time_stamp,
PARSE_JSON(message_data) as md
FROM <destination_db>.<destination_schema>.log l
INNER JOIN <destination_schema>.connection c on l.connection_id = c.connection_id
WHERE message_event = 'extract_summary'
ORDER BY l._fivetran_synced DESC
)
,eso
AS(
SELECT
connection_name,
time_stamp,
md:status,
md:total_queries,
md:total_rows,
md:total_size,
md:rounded_total_size,
PARSE_JSON(md:objects) as o
FROM es
)
SELECT
eso.connection_name,
value:name AS name,
value:queries AS queries
FROM eso,
LATERAL FLATTEN(input => PARSE_JSON(o))
This query depends on the LOG table.
Check metadata
IMPORTANT: You must have an Enterprise plan or higher to query metadata.
The Fivetran Platform Connector provides access to metadata for data synced by Fivetran, which helps you understand the mapping between the source and destination. The data retrieved can be easily consumed in BI tools, data catalogs, or through direct SQL queries.
The data retrieved helps organizations:
- Understand data synced by Fivetran
- Audit and enforce access control
- Retrieve metadata changes
The following queries return normalized tables with information on source and destination connections, schemas, tables, and columns.
Check which data moved through Fivetran
Expand for universal query
The query includes source/destination mapping so that it can be filtered by source.connectionId.
SELECT * FROM <destination_db>.<destination_schema>.connection c
JOIN <destination_db>.<destination_schema>.source_schema_metadata ssm
ON c.connection_id = ssm.connection_id
JOIN <destination_db>.<destination_schema>.source_table_metadata stm
ON stm.schema_id = ssm.id
JOIN <destination_db>.<destination_schema>.source_column_metadata scm
ON scm.table_id = stm.id
JOIN <destination_db>.<destination_schema>.schema_lineage sl
ON ssm.id = sl.source_schema_id
JOIN <destination_db>.<destination_schema>.table_lineage tl
ON stm.id = tl.source_table_id
JOIN <destination_db>.<destination_schema>.column_lineage cl
ON scm.id = cl.source_column_id
JOIN <destination_db>.<destination_schema>.destination_schema_metadata dsm
ON sl.destination_schema_id = dsm.id
JOIN <destination_db>.<destination_schema>.destination_table_metadata dtm
ON tl.destination_table_id = dtm.id
JOIN <destination_db>.<destination_schema>.destination_column_metadata dcm
ON cl.destination_column_id = dcm.id
LEFT JOIN <destination_db>.<destination_schema>.source_foreign_key_metadata sfkm
ON sfkm.column_id = scm.id
This query depends on the following tables: CONNECTION, SOURCE_SCHEMA_METADATA, SOURCE_TABLE_METADATA, SOURCE_COLUMN_METADATA, SCHEMA_LINEAGE, TABLE_LINEAGE, COLUMN_LINEAGE, DESTINATION_SCHEMA_METADATA, DESTINATION_TABLE_METADATA, DESTINATION_COLUMN_METADATA, and SOURCE_FOREIGN_KEY_METADATA.
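To scope the result to a single connection, you can filter on the connection table's connection_id column; a minimal addition, with <connection_id> as a placeholder:
-- Append to the end of the query above, replacing <connection_id> with the connection you want to inspect
WHERE c.connection_id = '<connection_id>'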
What is this data a reference to?
Expand for universal query
SELECT * FROM <destination_db>.<destination_schema>.source_column_metadata scm
JOIN <destination_db>.<destination_schema>.column_lineage cl
ON scm.id = cl.source_column_id
WHERE cl.destination_column_id = %column_id%
This query depends on the SOURCE_COLUMN_METADATA and COLUMN_LINEAGE tables.
What downstream assets are impacted by this data?
Expand for universal query
SELECT * FROM <destination_db>.<destination_schema>.destination_column_metadata dcm
JOIN <destination_db>.<destination_schema>.column_lineage cl
ON dcm.id = cl.destination_column_id
WHERE cl.source_column_id = %column_id%
This query depends on the DESTINATION_COLUMN_METADATA and COLUMN_LINEAGE tables.
Check severe messages of SDK connections
Expand for universal query
SELECT message, event_time, connection_id, sync_id
FROM <destination_db>.<destination_schema>.connector_sdk_log
WHERE level = 'SEVERE' and message_origin = 'connector_sdk'
ORDER BY event_time DESC;
This query depends on the CONNECTOR_SDK_LOG table.