Firebase Beta
Firebase is an application development platform. Firebase Cloud Firestore is a NoSQL document database characterized by a lack of a fixed schema. Data is stored in key-value pairs in documents that form a collection.
Supported services
Fivetran supports the Firebase Cloud Firestore database.
We support only the Cloud Firestore databases in Native mode.
Supported configurations
Fivetran supports the following Firebase configurations:
Supportability Category | Supported Values |
---|---|
Connection limit per database | No limit |
Features
Feature Name | Supported | Notes |
---|---|---|
Capture deletes | check | |
History mode | ||
Custom data | check | |
Data blocking | check | |
Column hashing | check | |
Re-sync | check | |
API configurable | check | API configuration |
Priority-first sync | ||
Fivetran data models | ||
Private networking | ||
Authorization via API | check | |
Setup guide
For specific instructions on how to set up your Firebase connection, see the Cloud Firestore setup guide.
Sync overview
Once Fivetran is connected to your Firestore database, we pull a complete dump of all selected data from your database. The initial sync finishes when all collections that existed when the sync started have finished importing. In each sync, we pull updated data from the source and push to the destination. If deletes are detected, the next sync will re-import the respective collection.
Pack mode options
Pack mode determines the form in which Fivetran delivers your data. There are two pack modes: packed and unpacked.
Subcollections are always delivered in packed mode.
In the tables below, the text in parentheses next to the column name indicates the data type of that column. For example, "bar (INTEGER)" means the column name is bar and it stores INTEGER data.
Unpacked mode
Fivetran unpacks one layer of nested fields and infers types.
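As a rough illustration of the difference between the two pack modes, the following Python sketch unpacks one layer of a document and, for contrast, packs the same document into a single data column. The function names and the small set of inferred types are assumptions made for this example; they do not describe Fivetran's internal implementation.

```python
import json


def infer_type(value):
    """Return an illustrative column type name for a top-level value."""
    # bool must be checked before int because bool is a subclass of int
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    # Anything else (for example maps and arrays) is kept nested as JSON
    return "JSON"


def unpack_document(doc):
    """Unpacked mode sketch: each top-level field becomes a (type, value) column."""
    row = {}
    for name, value in doc.items():
        col_type = infer_type(value)
        if col_type == "JSON":
            value = json.dumps(value, separators=(",", ":"))
        row[name] = (col_type, value)
    return row


def pack_document(doc):
    """Packed mode sketch: the whole document is stored in one JSON data column."""
    return {"_id": doc["_id"], "data": json.dumps(doc, separators=(",", ":"))}


doc = {"_id": "foo", "bar": 2, "nested": {"baz": 3}}
unpacked = unpack_document(doc)
packed = pack_document(doc)
```

In this sketch, only the top-level fields become columns in unpacked mode; the nested map is serialized rather than flattened further.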
In unpacked mode, the following source table
{
"_id": "foo", <== document_id
"bar": 2,
"nested": {
"baz": 3
}
}
is delivered to your destination as
_id (STRING) | bar (INTEGER) | nested (JSON) |
---|---|---|
"foo" | 2 | {"baz":3} |
Packed mode
In packed mode, the following source table
{
"_id": "foo", <== document_id
"bar": 2,
"nested": {
"baz": 3
}
}
is delivered to your destination as
_id (STRING) | data (JSON) |
---|---|
"foo" | {"_id":"foo", "bar":2, "nested":{"baz":3}} |
Switching pack modes
You can switch the pack mode for your connection at any time in your Fivetran dashboard.
We automatically perform a full connection re-sync during the next scheduled sync when you change pack modes.
To change the pack mode for your connection, do the following:
- Go to the Setup tab in the connection dashboard.
- Click Edit connection details.
- In the connection setup form, change the Pack Mode.
- Click Save & Test.
Replication speeds
If there are no deletes, replication should be relatively fast. However, if the change volume is too high or deletes are detected, the connector must re-import the affected collections on the next sync, which can take longer. For the best possible performance:
- Increase the sync frequency.
- If delete handling is not required, reach out to Support and ask about disabling delete tracking.
Two major factors can cause disparities between our estimates and the exact replication speed for your Fivetran-connected databases: network latency and discrepancies in the format of the data we receive versus how the data is stored at rest in the destination. The ability to sync changes quickly also depends on your configured sync frequency. For data sources with a high rate of data changes, we recommend a higher sync frequency, or a frequency close to your average sync speed.
Schema information
Fivetran tries to replicate the exact schema and tables from your Firestore database to your destination.
Fivetran-generated columns
Fivetran adds the following columns to every table in your destination:
- _fivetran_deleted (BOOLEAN) marks deleted rows in the source database.
- _fivetran_synced (UTC TIMESTAMP) indicates when Fivetran successfully synced the row.
We add these columns to give you insight into the state of your data and the progress of your data syncs. For more information about these columns, see our System Columns and Tables documentation.
Type transformations and mapping
As we extract your data, we match Firestore data types to types that Fivetran supports. If we don't support a data type, we automatically change that type to the closest supported type or, in some cases, don't load that data at all. Our system fails when we encounter columns with data types that we don't accept or transform.
The following table illustrates how we transform your Firestore data types into Fivetran supported types:
Firestore Data Type | Fivetran Data Type | Fivetran Supported |
---|---|---|
Array | JSON | True |
Boolean | BOOLEAN | True |
Bytes | STRING | True |
Date and time | INSTANT | True |
Floating-point number | DOUBLE | True |
Geographical point | STRING | True |
Integer | LONG | True |
Map | JSON | True |
Null | NULL | True |
NaN | STRING | True |
Reference | STRING | True |
Text string | STRING | True |
Vector | JSON | True |
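For reference, the mapping table above can be written as a simple lookup. This is only a sketch: the dictionary and the function name are hypothetical, and raising an error for an unknown type is an assumption that mirrors the failure behavior described above.

```python
# Firestore data type -> Fivetran data type, copied from the table above
FIRESTORE_TO_FIVETRAN = {
    "Array": "JSON",
    "Boolean": "BOOLEAN",
    "Bytes": "STRING",
    "Date and time": "INSTANT",
    "Floating-point number": "DOUBLE",
    "Geographical point": "STRING",
    "Integer": "LONG",
    "Map": "JSON",
    "Null": "NULL",
    "NaN": "STRING",
    "Reference": "STRING",
    "Text string": "STRING",
    "Vector": "JSON",
}


def fivetran_type(firestore_type):
    """Look up the Fivetran type for a Firestore type name (hypothetical helper)."""
    try:
        return FIRESTORE_TO_FIVETRAN[firestore_type]
    except KeyError:
        # Assumption: an unmapped type is treated as an error in this sketch
        raise ValueError(f"Unsupported Firestore type: {firestore_type}") from None
```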
In some cases, when loading data into your destination, we may need to convert Fivetran data types into data types that are supported by the destination. For more information, see the individual destination pages.
Excluding source data
If you don’t want to sync all the data from your primary database, you can exclude schemas or tables from your syncs on your Fivetran dashboard. To do so, go to your connection details page and uncheck the objects you would like to omit from syncing. For more information, see our Data Blocking documentation.
Alternatively, you can change the permissions to restrict access to particular collections or sub-collections using Firebase Security Rules.
Initial sync
When Fivetran connects to a new Firestore database, we first copy all the data from every collection (except for those you have excluded in your Fivetran dashboard) and add Fivetran-generated columns. We perform the db.collection(collection).get() and db.collectionGroup(subcollection) operations to fetch the collection and subcollection data from the source, respectively. We do not pull all the data at once; instead, we paginate through the results to make the sync failure-tolerant.
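The idea behind paginating for failure tolerance can be sketched as follows. This is a generic cursor-based pagination example in Python, not Fivetran's implementation: fetch_page is a hypothetical in-memory stand-in for a source query ordered by document ID, and the page size and checkpoint handling are assumptions.

```python
def fetch_page(source, start_after, page_size):
    """Hypothetical stand-in for a paginated query ordered by document ID."""
    ids = sorted(source)
    if start_after is not None:
        ids = [doc_id for doc_id in ids if doc_id > start_after]
    return [(doc_id, source[doc_id]) for doc_id in ids[:page_size]]


def import_collection(source, page_size=2, checkpoint=None):
    """Yield documents page by page, resuming after `checkpoint` if given."""
    cursor = checkpoint
    while True:
        page = fetch_page(source, cursor, page_size)
        if not page:
            break
        for doc_id, data in page:
            yield doc_id, data
        # A real sync would persist this cursor so a failed run can resume here
        cursor = page[-1][0]


source = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
full_import = list(import_collection(source))
resumed_import = list(import_collection(source, checkpoint="c"))
```

Because each page starts after the last document already read, a run that fails partway through can resume from the saved cursor instead of starting over.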
Updating data
Fivetran performs incremental updates of any new or modified data for each selected collection and subcollection from the source Firestore database. During incremental updates, we request only the data that has changed since our last sync.
If deletes were detected during the last sync or the Firestore streaming limits for data volume were exceeded, Fivetran will re-import all data for the affected collection or subcollection. For more information on performance recommendations, see Replication speeds.
Fivetran maintains collections and subcollections separately.
For collections, we map Firestore's built-in document_id (custom or auto-generated) column as the _id column and use it as the primary key for each table. For subcollections, we use the _path column, which contains a unique path for every subcollection document, as the primary key. For example, collection/document_id/subcollection/subcollection_document_id.
The primary key field is used to identify rows to merge the changes in your documents into the corresponding tables in the destination as follows:
- Every inserted row in the source generates a new row in the destination with _fivetran_deleted = FALSE.
- Every updated row in the source updates the data in the corresponding row in the destination, with _fivetran_deleted = FALSE.
- For every deleted row, the _fivetran_deleted column value is set to TRUE for the corresponding row in the destination.
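The merge rules above can be sketched in a few lines of Python. The destination is modeled as a dictionary keyed by primary key, and apply_changes is a hypothetical helper written for this example only.

```python
def apply_changes(destination, upserts, deleted_keys):
    """Merge source changes into destination rows keyed by primary key."""
    # Inserted and updated rows are written with _fivetran_deleted = FALSE
    for key, row in upserts.items():
        destination[key] = {**row, "_fivetran_deleted": False}
    # Deleted rows are kept in the destination and only flagged
    for key in deleted_keys:
        if key in destination:
            destination[key]["_fivetran_deleted"] = True
    return destination


destination = {"foo": {"bar": 1, "_fivetran_deleted": False}}
apply_changes(
    destination,
    upserts={"foo": {"bar": 2}, "new": {"bar": 9}},
    deleted_keys=[],
)
apply_changes(destination, upserts={}, deleted_keys=["new"])
```

After both calls in this sketch, the row "foo" holds the updated value and the row "new" is still present but flagged as deleted.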
Deleted rows
We do not delete rows from your destination. When a row is deleted from the source table, we set the _fivetran_deleted column value of the corresponding row in the destination to TRUE. Additionally, when a row deletion is detected, the connector must re-import the respective collection. This is handled automatically, but it may slow down connector performance. If you prefer better performance over handling deletes, contact our support team and ask about disabling delete tracking.
Subcollections
Subcollections are always delivered in packed mode.
collection:(level 0)
  document:
    Id:1
    name:foo
    nested_collection:(level 1)
      nested_document:
        Id:2
        name:nested_foo
        nested_collection_2:(level 2)
          nested_document_2:
            Id:nested_2
            name:nested_level_2_foo
To sync subcollections, we follow a parent-child table approach. We support all levels of depth/nesting.
In the destination, we maintain a separate table for each uniquely named subcollection, ensuring a one-to-one relationship between the source and destination. If two or more subcollections have the same name, they are stored in a single table even if they belong to different parent collections. Subcollection names are prefixed with a forward slash (/) in the destination. If this character isn’t supported by the destination, it will be replaced with an appropriate alternative, such as an underscore (_).
Example:
The following source data
Collection | Document Id | Document Fields | Subcollection | Subcollection Document Id | Subcollection Document Fields |
---|---|---|---|---|---|
Rooms | Room A | Name: "chat room" | Messages | M1 | From: "alex" Msg: "Hello world" |
Rooms | Room B | Name: "Study room" | Messages | M2 | From: "bob" Msg: "How are you?" |
Rooms | Room C | Name: "Living room" | Furniture | F1 | brand: "eco_fun" size: "king" |
is stored as follows in the destination:
/Messages
_path | data |
---|---|
Rooms/Room A/Messages/M1 | {From: “alex” Msg: “Hello world”} |
Rooms/Room B/Messages/M2 | {From: “bob” Msg: “How are you?”} |
/Furniture
_path | data |
---|---|
Rooms/Room C/Furniture/F1 | {brand: "eco_fun" size: "king" } |
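The grouping shown above can be approximated with the following Python sketch. It assumes each subcollection document arrives as a (path, fields) pair and derives the table name from the path; subcollection_tables is a hypothetical helper, not part of any Fivetran API.

```python
import json
from collections import defaultdict


def subcollection_tables(documents):
    """Group subcollection documents into tables named after the subcollection."""
    tables = defaultdict(dict)
    for path, fields in documents:
        parts = path.split("/")
        # Paths alternate collection names and document IDs, so the
        # subcollection name is the second-to-last segment
        table_name = "/" + parts[-2]
        # _path is the primary key; the fields are stored packed as JSON
        tables[table_name][path] = json.dumps(fields)
    return dict(tables)


documents = [
    ("Rooms/Room A/Messages/M1", {"From": "alex", "Msg": "Hello world"}),
    ("Rooms/Room B/Messages/M2", {"From": "bob", "Msg": "How are you?"}),
    ("Rooms/Room C/Furniture/F1", {"brand": "eco_fun", "size": "king"}),
]
tables = subcollection_tables(documents)
```

Both Messages subcollections end up in the same /Messages table because they share a name, even though they belong to different parent documents.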
Subcollection discovery
We do not automatically discover subcollections. To collect data from subcollections, you must create a .csv file containing the names of the subcollections (for example, "subcollection1,subcollection2") you wish to include and upload it using the SubCollections field in the connector setup form.
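One way to produce such a file is sketched below in Python. It assumes the expected layout is a single comma-separated row of names, as in the example above; the helper name and the output file name are hypothetical.

```python
import csv
import io


def write_subcollections_csv(names, stream):
    """Write the subcollection names as one comma-separated row."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(names)


# Write to an in-memory buffer to preview the file content
buffer = io.StringIO()
write_subcollections_csv(["subcollection1", "subcollection2"], buffer)
content = buffer.getvalue()

# To create a file on disk instead (hypothetical file name):
# with open("subcollections.csv", "w", newline="") as f:
#     write_subcollections_csv(["subcollection1", "subcollection2"], f)
```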