Incremental Refresh¶
During the Incremental refresh, only the ‘Changed’ and ‘New’ partitions are loaded into the data lake. If a Partition has changed, then it will be loaded/overwritten in the data lake. If a Partition is new, then the parquet file will be created.
In a scenario where there is a partition in which no data is available anymore (which was Loaded and Refreshed previously), data of the related partition will be removed from the lake and the data set.
How it works:¶
- During round 1 of Loading /Refreshing, partition A (YYYYMM) data is Loaded and Refreshed. And a .parquet file for partition A (YYYYMM) is available in the Power BI data set for the related YYYYMM.
- In round 2 when partition A (YYYYMM) has no data in Oracle anymore, during Loading/Refreshing, this partition will be removed in the Power BI dataset.
- This is done by passing the partitions to be deleted, as detected by the Detect Changes Logic, which removes the .parquet file for the partition A (YYYYMM).
Configure Power BI Table when using Partitioning
During the creation of a Parquet Data Source, incremental Load details are specified.
When incremental refresh is turned on (Partition Column has a value) this column contains _{0}. It is a placeholder where the value of the partition column is placed.
For the Incremental Refresh, the below formats are supported, and the example shows how the data must be formatted in the partition column.
Granularity | Format in table | Parquet file name | Notation | Power BI Partition Name |
---|---|---|---|---|
Year | 2023 | *YYYY | YYYY | 2023 |
Quarter | 2023Q4 | *YYYY-QQ | yyyy-QQ | 2023Q4 |
Month | 2023-11 | *YYYY-MM | yyyy-MM | 2023Q411 |
Day | 2023-11-25 | *YYYY-MM-DD | YYYY-MM-dd | 2023Q41125 |
The Last Value for a partition is stored in the database. This is determined from the Detect Changes Column. A query like below is executed against the database:
SELECT period_id_key AS partition_key, MAX(last_updated_date) AS last_value, count(*) as row_count
FROM <table_name>
GROUP BY period_id_key
Based on the outcome of the previous query, it is determined whether the last_value has changed, or the row_count has changed (Row Count is taken into account when the ‘Detect Changes Row Count’ is set to true).
Below is a sample query of an Incremental refreshing model's partition format [datasourcename]_YYYYQQMM.parquet.
Examples of Incremental Load¶
Example 1 - Scenario with Row Count¶
Incremental Loading scenario for the fact= FACT_ABSENCE_PERIOD (note that the column names are hypothetical)
- Partition column = month_key
Since the group by function is added automatically to this query, this fact table is grouped by the month_key.
-
Detect Changes column = entry_date
-
Detect Changes function = max
-
Row count included
SELECT month_key as partition_key, max(entry_date) as last_value , count(*) as row_count
FROM FACT_ABSENCE_PERIOD
GROUP BY month_key
Example 2¶
Incremental Loading scenario for the fact= FACT_ABSENCE_PERIOD (note that the column names are hypothetical)
-
Partition column = month_key
-
Detect Changes column = amount
-
Detect Changes function = sum
-
Row count excluded
SELECT month_key as partition_key, sum(amount) as last_value
FROM FACT_ABSENCE_PERIOD
GROUP BY month_key
Power Bi M-expression examples for Partitioning¶
Loading Quarterly based partition data¶
let
Source = AzureStorage.DataLake(IfsDataLakeContainerPath & "/Finance/content/AM_FACT_QUARTERS_TEST"),
FileName = "AM_FACT_QUARTERS_TEST_" & DateTime.ToText(RangeStart, "yyyy") & "Q" & Number.ToText(Date.QuarterOfYear(RangeStart)) & ".parquet",
IncRefreshFilter = Table.SelectRows(Source, each [Name] = FileName),
RemoveIncRefresh = Table.SelectColumns(IncRefreshFilter, {"Content", "Name"}),
#"Filtered Hidden Files1" = Table.SelectRows(RemoveIncRefresh, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
//#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File")))
#"Expanded Table Column1" = Table.Combine(#"Removed Other Columns1"[#"Transform File"])
in
#"Expanded Table Column1"
Sample .pbit file with M-expression for Partitioning (Partitioned Quarters) can be downloaded from here for further reference.
Loading Monthly based partition data¶
let
Source = AzureStorage.DataLake(IfsDataLakeContainerPath & "/HCM/content/AM_FACT_TEST"),
FileName = "AM_FACT_TEST_" & DateTime.ToText(RangeStart, "yyyy-MM") & ".parquet",
IncRefreshFilter = Table.SelectRows(Source, each [Name] = FileName ),
RemoveIncRefresh = Table.SelectColumns(IncRefreshFilter, {"Content", "Name"}),
#"Filtered Hidden Files1" = Table.SelectRows(RemoveIncRefresh, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
#"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
#"Expanded Table Column1" = Table.Combine(#"Removed Other Columns1"[#"Transform File"])
in
#"Expanded Table Column1"
Sample .pbit file with M-expression for Partitioning (Partitioned Months) can be downloaded from here for further reference.