Skip to content

Incremental Refresh

During the Incremental refresh, only the ‘Changed’ and ‘New’ partitions are loaded into the data lake. If a Partition has changed, then it will be loaded/overwritten in the data lake. If a Partition is new, then the parquet file will be created.

In a scenario where there is a partition in which no data is available anymore (which was Loaded and Refreshed previously), data of the related partition will be removed from the lake and the data set.

How it works:

  • During round 1 of Loading /Refreshing, partition A (YYYYMM) data is Loaded and Refreshed. And a .parquet file for partition A (YYYYMM) is available in the Power BI data set for the related YYYYMM.
  • In round 2 when partition A (YYYYMM) has no data in Oracle anymore, during Loading/Refreshing, this partition will be removed in the Power BI dataset.
  • This is done by passing the partitions to be deleted, as detected by the Detect Changes Logic, which removes the .parquet file for the partition A (YYYYMM).

Configure Power BI Table when using Partitioning

During the creation of a Parquet Data Source, incremental Load details are specified.

When incremental refresh is turned on (Partition Column has a value) this column contains _{0}. It is a placeholder where the value of the partition column is placed.

For the Incremental Refresh, the below formats are supported, and the example shows how the data must be formatted in the partition column.

Granularity Format in table Parquet file name Notation Power BI Partition Name
Year 2023 *YYYY YYYY 2023
Quarter 2023Q4 *YYYY-QQ yyyy-QQ 2023Q4
Month 2023-11 *YYYY-MM yyyy-MM 2023Q411
Day 2023-11-25 *YYYY-MM-DD YYYY-MM-dd 2023Q41125

The Last Value for a partition is stored in the database. This is determined from the Detect Changes Column. A query like below is executed against the database:

SELECT period_id_key AS partition_key, MAX(last_updated_date) AS last_value, count(*) as row_count
FROM <table_name>
GROUP BY period_id_key

Based on the outcome of the previous query, it is determined whether the last_value has changed, or the row_count has changed (Row Count is taken into account when the ‘Detect Changes Row Count’ is set to true).

Below is a sample query of an Incremental refreshing model's partition format [datasourcename]_YYYYQQMM.parquet.

Examples of Incremental Load

Example 1 - Scenario with Row Count

Incremental Loading scenario for the fact= FACT_ABSENCE_PERIOD (note that the column names are hypothetical)

  1. Partition column = month_key

Since the group by function is added automatically to this query, this fact table is grouped by the month_key.

  1. Detect Changes column = entry_date

  2. Detect Changes function = max

  3. Row count included

SELECT month_key as partition_key, max(entry_date) as last_value , count(*) as row_count 
FROM FACT_ABSENCE_PERIOD
GROUP BY month_key

Example 2

Incremental Loading scenario for the fact= FACT_ABSENCE_PERIOD (note that the column names are hypothetical)

  1. Partition column = month_key

  2. Detect Changes column = amount

  3. Detect Changes function = sum

  4. Row count excluded

SELECT month_key as partition_key, sum(amount) as last_value
FROM FACT_ABSENCE_PERIOD
GROUP BY month_key

Power Bi M-expression examples for Partitioning

Loading Quarterly based partition data

let    
    Source = AzureStorage.DataLake(IfsDataLakeContainerPath & "/Finance/content/AM_FACT_QUARTERS_TEST"),
    FileName = "AM_FACT_QUARTERS_TEST_" & DateTime.ToText(RangeStart, "yyyy") & "Q" & Number.ToText(Date.QuarterOfYear(RangeStart)) & ".parquet",
    IncRefreshFilter = Table.SelectRows(Source, each [Name] = FileName),
    RemoveIncRefresh = Table.SelectColumns(IncRefreshFilter, {"Content", "Name"}),

    #"Filtered Hidden Files1" = Table.SelectRows(RemoveIncRefresh, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
    //#"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File")))    
    #"Expanded Table Column1" = Table.Combine(#"Removed Other Columns1"[#"Transform File"])
in
    #"Expanded Table Column1"

Sample .pbit file with M-expression for Partitioning (Partitioned Quarters) can be downloaded from here for further reference.

Loading Monthly based partition data

let
    Source = AzureStorage.DataLake(IfsDataLakeContainerPath & "/HCM/content/AM_FACT_TEST"),

    FileName = "AM_FACT_TEST_" & DateTime.ToText(RangeStart, "yyyy-MM") & ".parquet",
    IncRefreshFilter = Table.SelectRows(Source, each [Name] = FileName ),
    RemoveIncRefresh = Table.SelectColumns(IncRefreshFilter, {"Content", "Name"}),

    #"Filtered Hidden Files1" = Table.SelectRows(RemoveIncRefresh, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),    
    #"Expanded Table Column1" = Table.Combine(#"Removed Other Columns1"[#"Transform File"])
in
    #"Expanded Table Column1"

Sample .pbit file with M-expression for Partitioning (Partitioned Months) can be downloaded from here for further reference.