Why Does the Connection Write Duplicates of Unmodified Rows to My Data Lake?
Question
Why does the connection write duplicates of unmodified rows to my data lake?
Environment
Destination: Azure Data Lake Storage
Answer
Fivetran uses the Copy-on-Write (COW) strategy to handle all changes in source data. When a record is deleted, Fivetran creates a new data file in the data lake that contains every row from the original file except the deleted record. For upserts, Fivetran likewise generates a new data file that excludes the old version of the record. In both cases, the rows that did not change are written again in the new file, and the original file is left in place.
After writing the new file to your data lake, Fivetran updates the latest table snapshot to reference it. Older snapshots continue to reference the original file, so both files remain in storage and both contain the unchanged rows. As a result, you may see duplicate copies of unchanged records in the underlying data files until Fivetran deletes the outdated snapshots during routine table maintenance. These copies do not affect query results: when you query your data using one of the supported query engines, you always receive accurate, up-to-date results without duplicates.
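The behavior described above can be illustrated with a small toy model. This is only a sketch of the general copy-on-write idea, not Fivetran's implementation: the file names (`data-001`, `data-002`), the in-memory `files` and `snapshots` structures, and the helper functions `copy_on_write_delete` and `expire_old_snapshots` are all hypothetical and exist only for this example.

```python
# Toy model of copy-on-write with snapshots. All names here are hypothetical;
# this does not reflect Fivetran's actual file layout or maintenance logic.

def copy_on_write_delete(files, snapshots, old_file, new_file, deleted_id):
    """Write a new file without the deleted row and add a snapshot that references it."""
    # The unchanged rows are copied into the new file; the old file is not touched.
    files[new_file] = [row for row in files[old_file] if row["id"] != deleted_id]
    latest = snapshots[-1]
    snapshots.append([new_file if name == old_file else name for name in latest])


def expire_old_snapshots(files, snapshots):
    """Stand-in for table maintenance: keep the latest snapshot, drop unreferenced files."""
    latest = snapshots[-1]
    del snapshots[:-1]
    for name in list(files):
        if name not in latest:
            del files[name]


# One data file with three rows, referenced by the first snapshot.
files = {"data-001": [{"id": 1}, {"id": 2}, {"id": 3}]}
snapshots = [["data-001"]]

# Delete the row with id 2.
copy_on_write_delete(files, snapshots, "data-001", "data-002", deleted_id=2)

# Reading every file in storage shows rows 1 and 3 twice.
raw_ids = [row["id"] for rows in files.values() for row in rows]
print(raw_ids)  # [1, 2, 3, 1, 3]

# Reading only the files in the latest snapshot shows each row once.
current_ids = [row["id"] for name in snapshots[-1] for row in files[name]]
print(current_ids)  # [1, 3]

# After maintenance, the outdated snapshot and its file are gone.
expire_old_snapshots(files, snapshots)
ids_after_maintenance = [row["id"] for rows in files.values() for row in rows]
print(ids_after_maintenance)  # [1, 3]
```

In this model, the duplicates appear only when every file in storage is scanned directly. Resolving the table through its latest snapshot, as the supported query engines do, returns each row once.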