Why Syncs May Not Run as Often as Scheduled
Question
Why aren't my connections used in transformation pipelines running on schedule?
Environment
Transformations
Answer
You may observe significant gaps between connection syncs, regardless of the set sync frequency (e.g., every 15 minutes). This is because our orchestrated data pipelines prioritize data consistency and completeness over strictly adhering to individual connection sync schedules.
Dependency management in transformation pipelines
Our pipelines operate as a single, interconnected system. This means data processing within the pipeline follows a dependency chain. Connections and data models are linked – downstream elements rely on the successful completion of upstream tasks before they can begin their own work.
Example: Connection A (15-minute frequency) and Connection B (1-hour frequency)
- Base sync start time for Connection A and Connection B is 00:00.
- Connection A: Scheduled to sync every 15 minutes.
- Connection B: Scheduled to sync every hour.
Case 1: Sync gap caused by a connection sync
- Connection B: The sync that started at 00:00 is still running after 00:45.
- Connection A: Completes its sync at 00:12.
Before the data model transformation process starts, we verify that all upstream connections have delivered their data. Since Connection B’s sync is still ongoing at 00:15, 00:30, and 00:45, Connection A’s scheduled syncs at these times won’t run. Once Connection B finishes its sync, the dependency is resolved, and the data model transformation process can begin.
Dependency management results: Connection A's next sync occurs at 01:00. While Connection A is designed to run every 15 minutes, delays may occur due to dependencies on either Connection B's sync completion or an ongoing data model run.
Case 2: Sync gap caused by a data model run
- Connection A and Connection B both complete their syncs at 00:12.
- The downstream data model transformation process starts immediately at 00:12, takes 44 minutes, and completes at 00:56.
Since the data model is still running at 00:15, 00:30, and 00:45, Connection A’s scheduled syncs at these times don’t run.
Dependency management results: Connection A's next sync occurs at 01:00. This ensures that the downstream data model receives fresh data from both connections, preventing incomplete or out-of-sync data that could lead to inaccurate analytics.
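The two cases above can be sketched as a small simulation. This is only an illustration of the blocking rule described in this article, not our actual scheduler: the function names are hypothetical, times are minutes after 00:00, and the 00:50 end time for Connection B's sync in Case 1 is an assumption (the example only states that the sync is still running at 00:45). For Case 1 it also assumes that the data model run that follows finishes before 01:00.

```python
def fmt(minutes):
    """Format minutes after 00:00 as HH:MM."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def unblocked_slots(interval_min, horizon_min, busy_windows):
    """Return scheduled sync times (minutes after 00:00) that are not blocked.

    busy_windows is a list of (start, end) minute pairs during which an
    upstream sync or a data model run is in progress (end is exclusive).
    """
    slots = range(interval_min, horizon_min + 1, interval_min)
    return [
        t for t in slots
        if not any(start <= t < end for start, end in busy_windows)
    ]


# Case 1: Connection B's sync runs from 00:00 to 00:50 (assumed end time).
case1 = unblocked_slots(15, 60, [(0, 50)])

# Case 2: the data model runs from 00:12 to 00:56.
case2 = unblocked_slots(15, 60, [(12, 56)])

print([fmt(t) for t in case1])  # ['01:00']
print([fmt(t) for t in case2])  # ['01:00']
```

In both cases the 00:15, 00:30, and 00:45 slots fall inside a busy window, so the first slot at which Connection A can sync again is 01:00.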
Sync frequency recommendations
To avoid unnecessary sync gaps, we recommend setting a reasonable sync frequency based on the observed typical sync duration. If a connection's sync takes longer than its set sync frequency interval, it will result in sync gaps.
For example, if a connection’s sync typically takes 20 minutes, setting the sync frequency to every 15 minutes leads to recurring sync gaps: every second scheduled sync is skipped because the previous sync is still running. Increasing the sync interval to 30 minutes instead allows for smoother and more predictable pipeline execution.
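To check a candidate sync frequency against an observed sync duration, a rough sketch like the following may help. It is a simplified model, not our scheduler: it considers a single connection in isolation, assumes every sync takes exactly the same time, and assumes a scheduled sync is skipped whenever the previous one is still running. The function name `simulate` is hypothetical.

```python
def simulate(interval_min, duration_min, horizon_min):
    """Return (started, skipped) scheduled times in minutes after 00:00."""
    started, skipped = [], []
    busy_until = 0  # time at which the in-flight sync finishes
    for t in range(0, horizon_min, interval_min):
        if t < busy_until:
            skipped.append(t)  # previous sync is still running
        else:
            started.append(t)
            busy_until = t + duration_min
    return started, skipped


# A 20-minute sync scheduled every 15 minutes, over two hours.
print(simulate(15, 20, 120))  # ([0, 30, 60, 90], [15, 45, 75, 105])

# The same sync scheduled every 30 minutes.
print(simulate(30, 20, 120))  # ([0, 30, 60, 90], [])
```

Under these assumptions, both schedules produce a sync every 30 minutes, but only the 30-minute schedule does so without skipped slots.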
Notes
Our priority is ensuring data completeness and consistency within the data model. The overall pipeline execution prioritizes accuracy over strictly adhering to isolated connection sync frequencies. Accordingly, dependency management might cause individual connections to deviate from their set sync schedules.