Query Fivetran-Managed Iceberg Tables from Snowflake
This tutorial explains how to query Fivetran-managed Iceberg tables directly from Snowflake.
Every Managed Data Lake Service destination includes a dedicated Fivetran Iceberg REST Catalog by default. Snowflake uses this catalog to discover and query the Fivetran-managed Iceberg tables.
Snowflake supports the following methods to access these tables. Use the table below to compare each approach based on setup complexity, networking requirements, and level of control.
| Approach | External Volume Required | Private Networking Supported | Recommended Use Case |
|---|---|---|---|
| Catalog-linked database with vended credentials | No | No | Recommended for most use cases |
| Catalog-linked database with external volume | Yes | Yes | Recommended when private networking is required |
| Manually created Iceberg tables | Yes | Yes | Use only when you need custom naming or object organization |
Before you begin
Before you configure Snowflake to query the Iceberg tables, review the following considerations.
Align your Snowflake deployment with your data lake
Snowflake reads Fivetran Managed Iceberg table data directly from your cloud storage. If your Snowflake account and your data lake are in different cloud providers or regions, queries may incur data egress charges from your storage provider.
To reduce or avoid egress costs, deploy your Snowflake account in the same cloud provider and region as your data lake. For example, if your data lake is hosted on Amazon Web Services (AWS) in us-east-1, deploy your Snowflake account on AWS in us-east-1.
Understand refresh behavior and cost
Snowflake periodically polls the Fivetran Iceberg REST Catalog to detect new table snapshots written by Fivetran syncs. The default polling interval is 30 seconds.
You can control this behavior by setting REFRESH_INTERVAL_SECONDS on the catalog integration. Valid values range from 30 to 86400.
More frequent refreshes reduce the delay before new snapshots become queryable, but they may also increase cost. For more information, see Snowflake documentation.
Decide whether you need private networking
If your organization requires private networking between Snowflake and your cloud storage, do not use vended credentials. Instead, configure an external volume so Snowflake can access the cloud storage through your own networking configuration.
Choose your catalog strategy
This tutorial uses the Fivetran Iceberg REST Catalog, which is included with every Managed Data Lake Service destination. However, if your organization already uses Glue as a central metadata store for tools such as Amazon Athena, Amazon Redshift, or Amazon SageMaker, you can also configure Fivetran to update the AWS Glue Data Catalog instead.
For more information, see the AWS Glue as an alternative catalog section of this tutorial.
Save your OAuth client secret
The OAUTH_CLIENT_SECRET value for your catalog integration is displayed only once in the Fivetran dashboard. Make sure you save it before leaving the page. If you lose it, you must regenerate it. Regenerating the secret immediately invalidates the previous value.
Option 1: Use catalog-linked database with vended credentials
This is the simplest setup and the recommended option for most use cases.
With vended credentials, Snowflake uses credentials provided by the Fivetran Iceberg REST Catalog to access your cloud storage directly. You do not need to create an external volume.
You then create a catalog-linked database, which automatically imports and refreshes all schemas and Fivetran-managed Iceberg tables available in the catalog.
Vended credentials always access your data over the public internet. If your organization requires private networking between Snowflake and your cloud storage, use a catalog-linked database with external volume instead.
Find your catalog credentials from Fivetran
Log in to your Fivetran account.
Go to the Destinations page and select your Managed Data Lake Service destination.
Go to Catalog integration > Snowflake.
Copy the values of the following properties from the SQL query:
CATALOG_URIWAREHOUSEOAUTH_TOKEN_URIOAUTH_CLIENT_IDOAUTH_CLIENT_SECRET
Fivetran displays the OAUTH_CLIENT_SECRET only once. If you lose it, click Regenerate client secret, which invalidates the previous secret and generated a new one.
Create catalog integration in Snowflake
In a Snowflake SQL worksheet, run the following command and replace the placeholder values with the values you copied from the Fivetran dashboard:
CREATE OR REPLACE CATALOG INTEGRATION <catalog_integration_name>
CATALOG_SOURCE = POLARIS
TABLE_FORMAT = ICEBERG
ENABLED = TRUE
REFRESH_INTERVAL_SECONDS = <30 to 86400>
REST_CONFIG = (
CATALOG_URI = '<catalog_uri_from_fivetran>'
WAREHOUSE = '<group_id_from_fivetran>'
ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
)
REST_AUTHENTICATION = (
TYPE = OAUTH
OAUTH_TOKEN_URI = '<oauth_token_uri_from_fivetran>'
OAUTH_CLIENT_ID = '<oauth_client_id_from_fivetran>'
OAUTH_CLIENT_SECRET = '<oauth_client_secret_from_fivetran>'
OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
);
This catalog integration gives Snowflake access to the entire Fivetran Iceberg REST Catalog for the destination. In the next step, you can choose whether to expose all connection schemas or only a subset.
Create catalog-linked database
Run the following command to create a catalog-linked database:
CREATE DATABASE <database_name>
LINKED_CATALOG = (
CATALOG = '<catalog_integration_name>'
);
Snowflake automatically discovers all schemas and Fivetran-managed Iceberg tables in the catalog and keeps them up to date as new snapshots become available. When Fivetran adds new tables to the destination, Snowflake reflects those tables automatically. No additional Snowflake configuration is required.
If you want to expose only specific Fivetran schemas to Snowflake, use ALLOWED_NAMESPACES:
-- Only expose specific schemas
CREATE DATABASE <database_name>
LINKED_CATALOG = (
CATALOG = '<catalog_integration_name>'
ALLOWED_NAMESPACES = ('<schema_one>', '<schema_two>')
);
If you want to expose all schemas except specific ones, use DISALLOWED_NAMESPACES:
CREATE DATABASE <database_name>
LINKED_CATALOG = (
CATALOG = '<catalog_integration_name>'
DISALLOWED_NAMESPACES = ('<schema_to_exclude>')
);
For more information about creating catalog-linked databases, see Snowflake documentation.
Catalog-linked databases are read-only. You cannot create, modify, or delete objects inside them from Snowflake.
Option 2: Use catalog-linked database with external volume
Use this option when your organization requires private networking between Snowflake and your cloud storage.
In this model, Snowflake accesses your storage location through an external volume that you define. The catalog integration still provides metadata, but storage access is handled through the external volume rather than through credentials vended by the catalog.
This approach requires more setup than vended credentials, but it gives you control over how Snowflake accesses the underlying storage.
Create external volume in Snowflake
In a Snowflake SQL worksheet, run the following command and replace the placeholder values with your cloud storage details:
CREATE OR REPLACE EXTERNAL VOLUME <external_volume_name>
STORAGE_LOCATIONS = (
(
NAME = '<storage_location_name>'
STORAGE_PROVIDER = 'S3' -- or 'AZURE', 'GCS'
STORAGE_BASE_URL = 's3://<your_bucket>/' -- update per provider
STORAGE_AWS_ROLE_ARN = '<arn:aws:iam::...>' -- AWS only
STORAGE_AWS_EXTERNAL_ID = '<external_id>' -- AWS only; reuse Fivetran's or let Snowflake generate one
)
);
Grant Snowflake access to your storage
After you create the external volume, grant Snowflake permission to access the storage location.
The required steps depend on your cloud provider:
Amazon Web Services (AWS): Run
DESC EXTERNAL VOLUME <external_volume_name>; and make a note of theSTORAGE_AWS_IAM_USER_ARNandSTORAGE_AWS_EXTERNAL_IDvalues. Then update the IAM trust policy on your S3 access role to allow Snowflake to assume that role using the external ID. For more information, see Snowflake documentation.Azure Data Lake Storage (ADLS): Grant the Snowflake service principal the
Storage Blob Data Readerrole on the ADLS container. For more information, see Snowflake documentation.Google Cloud Storage (GCS): Grant the Snowflake service account the
Storage Object Viewerrole on the GCS bucket. For more information, see Snowflake documentation.
Create catalog integration in Snowflake
Run the following command:
CREATE OR REPLACE CATALOG INTEGRATION <catalog_integration_name>
CATALOG_SOURCE = POLARIS
TABLE_FORMAT = ICEBERG
ENABLED = TRUE
REFRESH_INTERVAL_SECONDS = <30 to 86400>
REST_CONFIG = (
CATALOG_URI = '<catalog_uri_from_fivetran>'
WAREHOUSE = '<group_id_from_fivetran>'
)
REST_AUTHENTICATION = (
TYPE = OAUTH
OAUTH_TOKEN_URI = '<oauth_token_uri_from_fivetran>'
OAUTH_CLIENT_ID = '<oauth_client_id_from_fivetran>'
OAUTH_CLIENT_SECRET = '<oauth_client_secret_from_fivetran>'
OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
);
In this configuration, you do not need to set ACCESS_DELEGATION_MODE. When you use an external volume, Snowflake uses EXTERNAL_VOLUME_CREDENTIALS.
Create catalog-linked database
Run the following command to create a catalog-linked database:
CREATE DATABASE <database_name>
LINKED_CATALOG = (
CATALOG = '<catalog_integration_name>'
);
Snowflake automatically discovers all schemas and Fivetran-managed Iceberg tables in the catalog and keeps them up to date as new snapshots become available. When Fivetran adds new tables to the destination, Snowflake reflects those tables automatically. No additional Snowflake configuration is required.
If you want to expose only specific Fivetran schemas to Snowflake, use ALLOWED_NAMESPACES:
-- Only expose specific schemas
CREATE DATABASE <database_name>
LINKED_CATALOG = (
CATALOG = '<catalog_integration_name>'
ALLOWED_NAMESPACES = ('<schema_one>', '<schema_two>')
);
If you want to expose all schemas except specific ones, use DISALLOWED_NAMESPACES:
CREATE DATABASE <database_name>
LINKED_CATALOG = (
CATALOG = '<catalog_integration_name>'
DISALLOWED_NAMESPACES = ('<schema_to_exclude>')
);
For more information about creating catalog-linked databases, see Snowflake documentation.
Catalog-linked databases are read-only. You cannot create, modify, or delete objects inside them from Snowflake.
Option 3: Create Iceberg tables manually
Use this option when you need full control over how tables are named and organized in Snowflake.
A catalog-linked database mirrors Fivetran schemas and table names directly from the catalog. By contrast, manually created Iceberg tables let you choose the Snowflake database, schema, and table name for each object. This can be useful when you need to fit Fivetran-managed data into an existing Snowflake naming standard or database layout.
The limitation of this approach is manual maintenance. Each time Fivetran adds a new table, you must create the corresponding Iceberg table in Snowflake yourself.
Before choosing this option, consider whether a catalog-linked database combined with a view layer for renaming or reorganizing objects before exposing them downstream can meet your requirements.
To create an Iceberg table, run the following command:
CREATE ICEBERG TABLE IF NOT EXISTS <database>.<schema>.<table_name>
EXTERNAL_VOLUME = '<external_volume_name>'
CATALOG = '<catalog_integration_name>'
CATALOG_NAMESPACE = '<your_connection_schema_name>'
CATALOG_TABLE_NAME = '<table_name_in_catalog>'
AUTO_REFRESH = TRUE;
Set AUTO_REFRESH = TRUE so Snowflake automatically detects new snapshots. The refresh frequency is controlled by REFRESH_INTERVAL_SECONDS (default value is 30 seconds) on the catalog integration.
AWS Glue as an alternative catalog
The Fivetran Iceberg REST Catalog is the default catalog for all Managed Data Lake Service destinations. However, you can also configure Fivetran to update the AWS Glue Data Catalog. This option is useful if your organization already uses Glue as a shared metadata layer and wants to access the same Iceberg tables from other AWS services, such as Amazon Athena, Amazon Redshift, or Amazon SageMaker.
When using Glue, configure Snowflake with CATALOG_SOURCE = ICEBERG_REST and CATALOG_API_TYPE = AWS_GLUE. In this setup, Snowflake authenticates with SigV4 instead of OAuth.
CREATE CATALOG INTEGRATION <catalog_integration_name>
CATALOG_SOURCE = ICEBERG_REST
TABLE_FORMAT = ICEBERG
CATALOG_NAMESPACE = '<glue_database_name>'
REST_CONFIG = (
CATALOG_URI = 'https://glue.<region>.amazonaws.com/iceberg'
CATALOG_API_TYPE = AWS_GLUE
CATALOG_NAME = '<aws_account_id>'
ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
)
REST_AUTHENTICATION = (
TYPE = SIGV4
SIGV4_IAM_ROLE = '<arn:aws:iam::...>'
SIGV4_SIGNING_REGION = '<region>'
)
ENABLED = TRUE;
CREATE DATABASE <database_name>
LINKED_CATALOG = (
CATALOG = '<catalog_integration_name>'
);
If you want Fivetran to update AWS Glue, decide that during destination setup. Enabling Glue support later can be more complex.
Additional considerations
OAuth client secret rotation
If you regenerate the OAuth client secret in Fivetran, your existing Snowflake catalog integration stops working immediately. Queries for your Fivetran Managed Iceberg tables will fail until you update or recreate the catalog integration in Snowflake with the new OAUTH_CLIENT_SECRET.
Catalog-linked databases are read-only
You cannot create, modify, or delete schemas or Fivetran-managed tables inside a catalog-linked database. All updates must go through Fivetran.
Reserved Iceberg column names
Fivetran prefixes reserved Iceberg column names with a hash symbol (#) to avoid conflicts.
The affected column names are:
_deleted_file_partition_pos_spec_idfile_pathposrow
If your source data includes any of these names, Snowflake displays them with a # prefix.
Timestamp type behavior
Fivetran maps:
- UTC timestamps (INSTANT) to Iceberg TIMESTAMPTZ
- timezone-naive timestamps (LOCALDATETIME) to Iceberg TIMESTAMP
Snowflake renders these types differently. TIMESTAMPTZ includes timezone information, while TIMESTAMP does not. Keep this in mind when filtering, comparing, or transforming timestamp columns.
Summary
You can query Fivetran Managed Iceberg tables from Snowflake in three ways:
- Use a catalog-linked database with vended credentials for the simplest setup and automatic table discovery.
- Use a catalog-linked database with an external volume when you need private networking between Snowflake and cloud storage.
- Use manually created Iceberg tables only when you need custom naming or more control over how objects are organized in Snowflake.
To simplify setup and avoid unnecessary cost, also keep the following in mind:
- Deploy Snowflake in the same cloud provider and region as your data lake to minimize egress costs.
- Use AWS Glue only if your organization already uses it as a shared metadata catalog.
For more information, see the Fivetran Iceberg REST Catalog Integration Guide or contact Fivetran Support.