Parquet Data Sources¶
This page covers everything related to Parquet Data Sources.
- About Azure Data Lake Storage Gen 2 Container
- Adding New Parquet Data Source to an Analysis Model
- Connection between an Analysis Model and a Parquet Data Source
- Max Age of a Parquet Data Source
- How does the Refresh schedule work
- Load types
- Create and Load Parquet Data Sources
- Explicit Load
- Parquet Data Source Load History
- Edit Parquet Data Sources
- Analysis Models Basic Data Configuration
- Analysis Models Data Configuration
About Azure Data Lake Storage Gen 2 Container¶
Azure Data Lake Storage Gen 2 (ADLS Gen 2) is the storage used for Parquet files.
A SAS Token is required to access ADLS Gen 2. The SAS Token can be used in Power BI Desktop to connect temporarily to the folders and files in ADLS Gen 2.
Read more about creating a SAS Token.
Adding New Parquet Data Source to an Analysis Model¶
After Parquet Data Sources have been created and loaded successfully, they can be used by connecting to the ADLS Gen 2 folder via Power BI Desktop when:
- Customizing any model
- Creating your own model
Read more details about External Tools.
Connection between an Analysis Model and Parquet Data Sources¶
The connection between an Analysis Model and a Parquet Data Source is determined automatically during refresh by analyzing the semantic model and the Parquet Data Source definition.
A table in a model has an M-expression, which should contain the Parquet Data Source path in a specific format. Upon a successful match of the M-expression against a Parquet Data Source, the connection between the data source and the Analysis Model is made.
- For CDM (Common Data Model) data sources, the strings to match are, e.g.:
  "DIM_COMPANY.manifest.cdm.json"
  where 'DIM_COMPANY' is the name of the Data Source,
  AND any of the following strings, where 'Shared' is the Area of the Data Source:
  - Id="Shared"
  - "/Shared"
  - "/Shared/"
- For Non-CDM data sources, the string to match is, e.g.:
  "/Finance/content/FACT_ABSENCE_PERIOD"
  where 'Finance' is the Area and 'FACT_ABSENCE_PERIOD' is the name of the Data Source.
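The matching rules above can be sketched in Python. This is only an illustration under stated assumptions, not the product's actual matcher: it assumes a plain, case-sensitive substring search that includes the quotation marks exactly as shown in the examples, and the function names `matches_cdm` and `matches_non_cdm` are hypothetical.

```python
def matches_cdm(m_expression: str, area: str, name: str) -> bool:
    """Check an M-expression against a CDM data source (assumed substring rules)."""
    # The manifest file name must appear, e.g. "DIM_COMPANY.manifest.cdm.json".
    manifest = f'"{name}.manifest.cdm.json"'
    # AND at least one of the Area markers must appear.
    area_markers = (f'Id="{area}"', f'"/{area}"', f'"/{area}/"')
    return manifest in m_expression and any(
        marker in m_expression for marker in area_markers
    )


def matches_non_cdm(m_expression: str, area: str, name: str) -> bool:
    """Check an M-expression against a Non-CDM data source (assumed substring rule)."""
    # The path must appear, e.g. "/Finance/content/FACT_ABSENCE_PERIOD".
    return f'"/{area}/content/{name}"' in m_expression


# Made-up expression fragments, used only to exercise the functions.
cdm_expr = '{[Id="Shared"]}[Data]{[Id="DIM_COMPANY.manifest.cdm.json"]}[Data]'
non_cdm_expr = 'Source("/Finance/content/FACT_ABSENCE_PERIOD")'

print(matches_cdm(cdm_expr, "Shared", "DIM_COMPANY"))
print(matches_non_cdm(non_cdm_expr, "Finance", "FACT_ABSENCE_PERIOD"))
```

Both calls print `True` for these fragments. A CDM expression that names the manifest but none of the Area strings would not match under this sketch.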
Max Age of a Parquet Data Source¶
Max Age is the maximum number of minutes, set on a Parquet Data Source, that may pass before its next refresh is required.
When the Max Age of a Parquet Data Source is exceeded, the Parquet Data Source is considered outdated and needs to be loaded.
If a Parquet Data Source is outdated, it is refreshed the next time an Analysis Model that uses the Parquet Data Source is refreshed. The refresh follows the schedule of the Analysis Model.
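The outdated check can be illustrated with a short Python sketch. It assumes that age is measured from the time of the last load, which this page does not state explicitly; the names `is_outdated` and `sources_to_load` are hypothetical and do not refer to any product API.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple


def is_outdated(
    last_loaded: datetime, max_age_minutes: int, now: Optional[datetime] = None
) -> bool:
    """Return True when the Max Age has been exceeded (assumed: age counts from last load)."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded > timedelta(minutes=max_age_minutes)


def sources_to_load(
    sources: Dict[str, Tuple[datetime, int]], now: datetime
) -> List[str]:
    """List the data sources a model refresh would need to load first.

    `sources` maps a data source name to (last_loaded, max_age_minutes).
    """
    return [
        name
        for name, (last_loaded, max_age) in sources.items()
        if is_outdated(last_loaded, max_age, now)
    ]


# Example: a model refresh runs at 09:30; both sources were last loaded at 08:00.
loaded_at = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
refresh_at = datetime(2024, 1, 1, 9, 30, tzinfo=timezone.utc)
model_sources = {
    "DIM_COMPANY": (loaded_at, 60),           # 90 minutes old, Max Age 60
    "FACT_ABSENCE_PERIOD": (loaded_at, 120),  # 90 minutes old, Max Age 120
}
print(sources_to_load(model_sources, refresh_at))  # ['DIM_COMPANY']
```

In this example only `DIM_COMPANY` has exceeded its Max Age, so it is the only source loaded as part of the model refresh.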