Staging for BigQuery
Fivetran HVR stores/stages data in a temporary directory (staging file) before loading it into the target BigQuery location during Burst Integrate and Bulk Refresh. For more information about staging files on BigQuery, refer to Google BigQuery documentation.
The following outlines how HVR implements staging for BigQuery:
- HVR first writes data into the staging directory on Google Cloud Storage. Since 6.2.5/1, the staging files can be stored in CSV or Parquet format.
- HVR then uses the BigQuery SQL command
INSERT INTO
to ingest data into the BigQuery target tables from the staging directory.- To ingest data into BigQuery target, HVR pulls the compressed data from the staging directory (located on Google Cloud Storage) into a target table. A special 'external table' needs to exist for each target table that HVR loads data into. HVR will create these tables in BigQuery with names having the following patterns _ _x or _ _bx.
Configuring Staging
Setting up staging on Google Cloud Storage (GCS) requires configuring both the GCS environment and HVR.
Prerequisites
A GCS location (bucket) to store staging files. For more information about creating and configuring a GCS bucket, refer to the GCS documentation.
A Google Cloud user (storage account) to access the GCS bucket.
Configure permissions required on GCS and BigQuery. For more information, refer to the Google BigQuery documentation.
HVR Location Configuration
To connect HVR to GCS, define the following location properties when creating a BigQuery location.
For an existing BigQuery location, you can define these properties by editing the location's source and target properties.
The Location Property equivalent to the UI field is shown in parentheses below.
In the STORAGE drop-down menu (File_Scheme), select the protocol for connecting HVR to Google Cloud Storage:
- Google Cloud Storage (HTTPS)
- Google Cloud Storage (HTTP)
In the BUCKET field (GS_Bucket), enter the name of the GCS bucket.
In the STAGING DIRECTORY field (Staging_Directory), specify the path to the directory where HVR will create temporary staging files within the GCS bucket (e.g. /my_staging_dir).
Since v6.2.5/1
In the STAGING FILE FORMAT drop-down menu (Staging_File_Format), select the format for storing the staging files:- CSV
- Parquet