PostgreSQL MAR Management
NOTE: We wrote this article before we changed our pricing model on February 1, 2022. If you signed up for a new Fivetran account or renewed your existing account after January 31, 2022, you have an unlimited number of free customer-triggered re-syncs per connector each month. For more information about MAR usage, see our pricing documentation.
Use Case
You want to understand how the functioning of PostgreSQL database connectors affects Monthly Active Rows (MAR).
Environment
PostgreSQL database connectors
Connector Overview
The initial sync (first-ever historical sync) replicates the entire PostgreSQL data into your destination. Depending on the size of your database, this can take anywhere from a few days to a couple of weeks. This historical sync doesn't contribute toward MAR.
If you introduce additional database tables later, you won't incur MAR.
You can configure your connector(s) in multiple ways. Creating one single connector is easier to manage, whereas creating separate connectors lets you track and isolate changes coming to the database and keep them independent of each other:
- You can set up one connector for static tables and another connector for dynamic tables to track MAR.
- You can group your connectors by any arbitrary group. For example, business unit, function, or relationship.
- You can include small and large tables in a connector.
Contributing factors
There are four primary actions that contribute to MAR:
- A new row is added to an existing table
- A value is updated for an existing row in a table
- A value is deleted within a table
- A new column (with values) is added to a table
Managing MAR
If you synced a table initially, later excluded it from your syncs, left the table for some time and didn't refresh it, and then later re-sync the table, every primary key row counts towards MAR.
If separate connectors sync the same data from the same source (with the same primary keys), they contribute separately towards your MAR. This means if you have two or more of the same connector type that sync from one source to multiple destinations, we count each connector’s MAR separately.
If a table does not have a primary key, we create a synthetic (hash) primary key. The composition of this primary key differs by source. We determine the MAR for these tables using their synthetic primary keys.
Post-load transformations do not count towards MAR.
Replication frequency doesn't impact your monthly active rows because we calculate MAR by determining the unique rows that are updated.
If you delete a row that has a primary key, and then, after a while, create another row with the same primary key, this counts toward only one MAR.
In a table without a primary key, if you remove or add columns from which we generate the synthetic primary key, you incur MAR for every change that you made.
Make sure the destination is always connected. If Fivetran can't write data to your destination, we cache this data for up to twenty-four hours before discarding it.
If the destination is disconnected for more than twenty-four hours, we do the following (depending on how long the WAL retains data for):
- If the WAL retains data for less than twenty-four hours, we perform a historical re-sync, which incurs MAR.
- If the WAL retains data long enough, we may not need to perform a re-sync because we can read through the logs.
Perform regular vacuuming of PostgreSQL so that the XMIN is always in an unfrozen state; otherwise, we have to perform a full historical re-sync, which incurs MAR.
One of the best practices is to modify only new records in PostgreSQL. Modifying very old records every sync causes a high percentage of these tables to incur MAR.
Block schemas or tables from syncing if they don't contain valuable information.
Column blocking to reduce MAR is only applicable to tables that do not have primary keys in PostgreSQL. This is because we create a synthetic primary key based on a commonly changing column and other unique keys. Blocking the commonly changing column reduces MAR.
We recommend that you not pause the connector for a long period of time because we then need to perform a historical re-sync. We cannot perform a log read because the logs have expired.