Firebase Beta

Firebase is an application development platform. Firebase Cloud Firestore is a NoSQL document database with no fixed schema. Data is stored as key-value pairs in documents, and documents are grouped into collections.

Supported services

Fivetran supports the Firebase Cloud Firestore database.

We support only Cloud Firestore databases in Native mode.

Supported configurations

Fivetran supports the following Firebase configurations:

Supportability Category	Supported Values
Connection limit per database	No limit

Features

Feature Name	Supported	Notes
Capture deletes		All tables and fields
History mode
Custom data		All tables and fields
Data blocking		Column level, table level, and schema level
Column hashing
Re-sync		Table level
API configurable		API configuration
Priority-first sync
Fivetran data models
Private networking
Authorization via API

Setup guide

For specific instructions on how to set up your Firebase connection, see the Cloud Firestore setup guide.

Sync overview

Once Fivetran is connected to your Firestore database, we pull a complete dump of all selected data from your database. The initial sync finishes when all collections that existed when the sync started have finished importing. In each sync, we pull updated data from the source and push to the destination. If deletes are detected, the next sync will re-import the respective collection.

Pack mode options

Pack mode determines the form in which Fivetran delivers your data. There are two pack modes: packed and unpacked.

Subcollections are always delivered in packed mode.

In the tables below, the text in parentheses next to the column name indicates the data type of that column. For example, "bar (INTEGER)" means the column name is bar and it stores INTEGER data.

Unpacked mode

Fivetran unpacks one layer of nested fields and infers types.

In unpacked mode, the following source table

{
 "_id": "foo", <== document_id
 "bar": 2,
 "nested": {
   "baz": 3
 }
}

is delivered to your destination as

_id (STRING)	bar (INTEGER)	nested (JSON)
"foo"	2	`{"baz":3}`
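The unpacking step above can be sketched in a few lines of Python. This is only an illustration of the idea, not Fivetran's implementation; the function name unpack_document is hypothetical, and the sketch assumes nested maps and arrays are serialized as compact JSON strings.

```python
import json


def unpack_document(doc):
    """Unpack one layer: top-level scalars become columns, nested values become JSON."""
    row = {}
    for name, value in doc.items():
        if isinstance(value, (dict, list)):
            # Assumption: nested maps and arrays are stored as compact JSON text.
            row[name] = json.dumps(value, separators=(",", ":"))
        else:
            row[name] = value
    return row


source = {"_id": "foo", "bar": 2, "nested": {"baz": 3}}
print(unpack_document(source))
```

Only the first layer is unpacked, so anything inside nested stays in a single JSON column.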

Packed mode

In packed mode, the following source table

{
 "_id": "foo", <== document_id
 "bar": 2,
 "nested": {
   "baz": 3
 }
}

is delivered to your destination as

_id (STRING)	data (JSON)
"foo"	`{"_id":"foo", "bar":2, "nested":{"baz":3}}`
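As a rough sketch of packed mode, the whole document can be serialized into one data column next to its _id. The helper name pack_document is hypothetical, and the compact JSON formatting is an assumption made for the example.

```python
import json


def pack_document(doc):
    """Keep _id as its own column and store the full document as JSON in data."""
    return {
        "_id": doc["_id"],
        # Assumption: the complete document, including _id, is serialized as JSON.
        "data": json.dumps(doc, separators=(",", ":")),
    }


source = {"_id": "foo", "bar": 2, "nested": {"baz": 3}}
print(pack_document(source))
```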

Switching pack modes

You can switch the pack mode for your connection at any time in your Fivetran dashboard.

We automatically perform a full connection re-sync during the next scheduled sync when you change pack modes.

To change the pack mode for your connection, do the following:

Go to the Setup tab in the connection dashboard.
Click Edit connection details.
In the connection setup form, change the Pack Mode.
Click Save & Test.

Replication speeds

If there are no deletes, replication should be relatively fast. However, if the change volume is too high or deletes are detected, the connector must re-import the respective collections on the next sync, which takes longer. For the best possible performance:

Increase the sync frequency.
If delete handling is not required, reach out to Support and ask about disabling delete tracking.

Two major factors can cause disparities between our estimates and the exact replication speed for your Fivetran-connected databases: network latency, and differences between the format of the data we receive and how the data is stored at rest in the destination. The ability to sync changes quickly also depends on your configured sync frequency. For data sources with a high rate of data changes, we recommend setting a higher sync frequency, or a frequency close to your average sync speed.

Schema information

Fivetran tries to replicate the exact schema and tables from your Firestore database to your destination.

Fivetran-generated columns

Fivetran adds the following columns to every table in your destination:

_fivetran_deleted (BOOLEAN) marks deleted rows in the source database.
_fivetran_synced (UTC TIMESTAMP) indicates when Fivetran successfully synced the row.

We add these columns to give you insight into the state of your data and the progress of your data syncs. For more information about these columns, see our System Columns and Tables documentation.

Type transformations and mapping

As we extract your data, we match Firestore data types to types that Fivetran supports. If we don't support a data type, we automatically change that type to the closest supported type or, in some cases, don't load that data at all. Our system fails when we encounter columns with data types that we don't accept or transform.

The following table shows how we transform your Firestore data types into Fivetran-supported types:

Firestore Data Type	Fivetran Data Type	Fivetran Supported
Array	JSON	True
Boolean	BOOLEAN	True
Bytes	STRING	True
Date and time	INSTANT	True
Floating-point number	DOUBLE	True
Geographical point	STRING	True
Integer	LONG	True
Map	JSON	True
Null	NULL	True
NaN	STRING	True
Reference	STRING	True
Text string	STRING	True
Vector	JSON	True

In some cases, when loading data into your destination, we may need to convert Fivetran data types into data types that are supported by the destination. For more information, see the individual destination pages.
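If you need the Firestore-to-Fivetran mapping from the table above in code (for example, to validate an expected destination schema), it can be written as a plain lookup. The dictionary below only restates the table; the function name fivetran_type and its error behavior are hypothetical.

```python
# Restates the type mapping table: Firestore data type -> Fivetran data type.
FIRESTORE_TO_FIVETRAN = {
    "Array": "JSON",
    "Boolean": "BOOLEAN",
    "Bytes": "STRING",
    "Date and time": "INSTANT",
    "Floating-point number": "DOUBLE",
    "Geographical point": "STRING",
    "Integer": "LONG",
    "Map": "JSON",
    "Null": "NULL",
    "NaN": "STRING",
    "Reference": "STRING",
    "Text string": "STRING",
    "Vector": "JSON",
}


def fivetran_type(firestore_type):
    """Look up the Fivetran type for a Firestore type name from the table."""
    if firestore_type not in FIRESTORE_TO_FIVETRAN:
        raise ValueError(f"type not listed in the mapping table: {firestore_type}")
    return FIRESTORE_TO_FIVETRAN[firestore_type]


print(fivetran_type("Integer"))
```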

Excluding source data

If you don’t want to sync all the data from your primary database, you can exclude schemas or tables from your syncs on your Fivetran dashboard. To do so, go to your connection details page and uncheck the objects you would like to omit from syncing. For more information, see our Data Blocking documentation.

Alternatively, you can change the permissions to restrict access to particular collections or sub-collections using Firebase Security Rules.

Initial sync

When Fivetran connects to a new Firestore database, we first copy all the data from every collection (except for those you have excluded in your Fivetran dashboard) and add Fivetran-generated columns. We perform the db.collection(collection).get() and db.collectionGroup(subcollection) operations to fetch the collection and subcollection data from the source, respectively. We do not pull the entire data set in one request; instead, we paginate through the results to make the sync failure-tolerant.
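To show why pagination makes a sync failure-tolerant, here is a minimal cursor-based sketch over an in-memory list. It does not call Firestore; fetch_page is a hypothetical stand-in for a paginated query ordered by document ID, and the page size and resume logic are assumptions made for the example.

```python
def fetch_page(documents, start_after, page_size):
    """Hypothetical stand-in for one paginated query, ordered by document ID."""
    ordered = sorted(documents, key=lambda d: d["_id"])
    if start_after is not None:
        ordered = [d for d in ordered if d["_id"] > start_after]
    return ordered[:page_size]


def paginate(documents, page_size, resume_from=None):
    """Yield pages in order; resume_from is the last document ID already imported."""
    cursor = resume_from
    while True:
        page = fetch_page(documents, cursor, page_size)
        if not page:
            return
        yield page
        # Persisting this cursor is what lets a failed sync resume mid-collection.
        cursor = page[-1]["_id"]


docs = [{"_id": key} for key in ["c", "a", "e", "b", "d"]]
for page in paginate(docs, page_size=2):
    print([d["_id"] for d in page])
```

If an import stops after the page ending at "b", calling paginate(docs, 2, resume_from="b") continues from "c" instead of starting over.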

Updating data

Fivetran performs incremental updates of any new or modified data for each selected collection and subcollection from the source Firestore database. During incremental updates, we request only the data that has changed since our last sync.

If deletes were detected during the last sync or the Firestore streaming limits for data volume were exceeded, Fivetran will re-import all data for the affected collection or subcollection. For more information on performance recommendations, see Replication speeds.

Fivetran maintains collections and subcollections separately.

For collections, we map Firestore's built-in document_id (custom or auto-generated) column to the _id column and use it as the primary key for each table. For subcollections, we use the _path column as the primary key. It contains a unique path for every subcollection document, for example, collection/document_id/subcollection/subcollection_document_id.

We use the primary key to identify rows and merge the changes in your documents into the corresponding tables in the destination as follows:

Every inserted row in the source generates a new row in the destination with _fivetran_deleted = FALSE.
Every updated row in the source updates the data in the corresponding row in the destination, with _fivetran_deleted = FALSE.
For every deleted row, the _fivetran_deleted column value is set to TRUE for the corresponding row in the destination.
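The three merge rules above can be sketched as an upsert keyed on the primary key. This is an illustration only; the function name apply_changes and the in-memory dictionary standing in for a destination table are hypothetical.

```python
def apply_changes(destination, upserts, deleted_ids):
    """Merge source changes into a destination table keyed by primary key."""
    for doc in upserts:
        # Inserts create a row and updates overwrite it; both are marked not deleted.
        destination[doc["_id"]] = {**doc, "_fivetran_deleted": False}
    for doc_id in deleted_ids:
        if doc_id in destination:
            # Deletes are soft: the row stays and only the flag changes.
            destination[doc_id]["_fivetran_deleted"] = True
    return destination


table = {}
apply_changes(table, [{"_id": "foo", "bar": 2}], [])
apply_changes(table, [{"_id": "foo", "bar": 3}], [])
apply_changes(table, [], ["foo"])
print(table)
```

After the three calls, the row for foo is still present, holds the updated value, and has _fivetran_deleted set to True.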

Deleted rows

We do not delete rows from your destination. When a row is deleted from the source table, we set the _fivetran_deleted column value of the corresponding row in the destination to TRUE. Additionally, when a row deletion is detected, the connector must re-import the respective collection. This is handled automatically, but it may slow down connector performance. If you prefer better performance over handling deletes, contact our support team and ask about disabling delete tracking.

Subcollections

Subcollections are always delivered in packed mode.

collection:(level 0)
    document:
        Id:1
        name:foo
        nested_collection:(level 1)
            nested_document:
                Id:2
                name:nested_foo
                nested_collection_2:(level 2)
                    nested_document_2:
                        Id:nested_2
                        name:nested_level_2_foo

To sync subcollections, we follow a parent-child table approach. We support all levels of depth/nesting.

In the destination, we maintain a separate table for each uniquely named subcollection, ensuring a one-to-one relationship between the source and destination. If two or more subcollections have the same name, they are stored in a single table even if they belong to different parent collections. Subcollection names are prefixed with a forward slash (/) in the destination. If this character isn’t supported by the destination, it will be replaced with an appropriate alternative, such as an underscore (_).

Example:

The following source data

Collection	Document Id	Document Fields	Subcollection	Subcollection Document Id	Subcollection Document Fields
Rooms	Room A	Name: "chat room"	Messages	M1	From: "alex" Msg: "Hello world"
	Room B	Name: "Study room"	Messages	M2	From: "bob" Msg: "How are you?"
	Room C	Name: "Living room"	Furniture	F1	brand: "eco_fun" size: "king"

is stored as follows in the destination:

/Messages

_path	data
Rooms/Room A/Messages/M1	{From: "alex" Msg: "Hello world"}
Rooms/Room B/Messages/M2	{From: "bob" Msg: "How are you?"}

/Furniture

_path	data
Rooms/Room C/Furniture/F1	{brand: "eco_fun" size: "king"}
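The grouping in this example can be sketched as follows: each subcollection document is routed to a table named after its subcollection, regardless of its parent. The function name destination_tables and the slash_supported flag are hypothetical, and the underscore fallback is just the alternative character mentioned above.

```python
import json
from collections import defaultdict


def destination_tables(documents, slash_supported=True):
    """Group subcollection documents into one table per subcollection name."""
    prefix = "/" if slash_supported else "_"
    tables = defaultdict(list)
    for path, fields in documents.items():
        # In collection/document_id/subcollection/subcollection_document_id,
        # the subcollection name is the second-to-last path segment.
        subcollection = path.split("/")[-2]
        tables[prefix + subcollection].append(
            {"_path": path, "data": json.dumps(fields)}
        )
    return dict(tables)


source = {
    "Rooms/Room A/Messages/M1": {"From": "alex", "Msg": "Hello world"},
    "Rooms/Room B/Messages/M2": {"From": "bob", "Msg": "How are you?"},
    "Rooms/Room C/Furniture/F1": {"brand": "eco_fun", "size": "king"},
}
for name, rows in destination_tables(source).items():
    print(name, [row["_path"] for row in rows])
```

Because the table name depends only on the subcollection name, M1 and M2 land in the same /Messages table even though they belong to different rooms.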

Subcollection discovery

We do not automatically discover subcollections. To collect data from subcollections, you must create a .csv file containing the names of the subcollections (for example, "subcollection1,subcollection2") you wish to include and upload it using the SubCollections field in the connector setup form.
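A small script can generate the .csv content. The sketch below assumes the file is a single comma-separated row of subcollection names, matching the "subcollection1,subcollection2" example; the function name subcollections_csv and the output file name are hypothetical.

```python
import csv
import io


def subcollections_csv(names):
    """Render subcollection names as one comma-separated CSV row."""
    buffer = io.StringIO()
    # Assumption: all names go on a single row, as in the example above.
    csv.writer(buffer, lineterminator="\n").writerow(names)
    return buffer.getvalue()


content = subcollections_csv(["Messages", "Furniture"])
print(content, end="")

# To create the file for upload (hypothetical file name):
# with open("subcollections.csv", "w", newline="") as handle:
#     handle.write(content)
```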