Load Data Source
Loading a Data Source into the Data Lake
A Parquet Data Source is loaded into the relevant Data Lake (a Self-Hosted Data Lake). Loading copies the data from the Oracle database to ADLS Gen 2 by creating .parquet files in the defined folders. A load can be triggered via an Analysis Model refresh or by an explicit trigger to load a Parquet Data Source.
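Conceptually, each load job reads the rows from Oracle, converts them to Parquet, and uploads the file to the data lake. The Python sketch below is illustrative only, assuming the python-oracledb, pandas, pyarrow, and azure-storage-file-datalake packages; the connection details, table, and folder names are placeholders, not IFS Cloud internals:

```python
import io

import oracledb  # python-oracledb driver
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from azure.storage.filedatalake import DataLakeServiceClient

# 1. Read the source rows from the Oracle database (placeholder credentials).
conn = oracledb.connect(user="appowner", password="secret",
                        dsn="dbhost:1521/ORCLPDB1")
df = pd.read_sql("SELECT * FROM customer_dim", conn)

# 2. Transform the rows into an in-memory .parquet file.
buf = io.BytesIO()
pq.write_table(pa.Table.from_pandas(df), buf)

# 3. Upload the file into the defined folder in ADLS Gen 2.
service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",
    credential="account-key-or-token")
file_client = (service.get_file_system_client("analysis-models")
                      .get_file_client("customer_dim/part-00000.parquet"))
file_client.upload_data(buf.getvalue(), overwrite=True)
```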
Explicit Load
A Parquet Data Source can also be loaded explicitly. When loading is initiated via IFS Cloud Web using the Load option, the Explicit Load flag is set to true. A load job for the Parquet Data Source is then triggered the next time the scheduler runs, regardless of the Max-Age setting or whether the Parquet Data Source is used in any Analysis Model.
An explicit load performs a full load for Dimensions and, where applicable, an incremental load for Facts.
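As a rough illustration of that rule, the hypothetical sketch below resolves the load mode for a data source. The type names and the helper function are assumptions made for illustration, not the product's internal API:

```python
from dataclasses import dataclass

@dataclass
class ParquetDataSource:
    name: str
    kind: str                  # "DIMENSION" or "FACT"
    supports_incremental: bool

def resolve_explicit_load_mode(ds: ParquetDataSource) -> str:
    """Load mode used by an explicit load, per the rule above."""
    if ds.kind == "DIMENSION":
        return "FULL"          # Dimensions are always fully reloaded.
    if ds.supports_incremental:
        return "INCREMENTAL"   # Facts load only changed partitions.
    return "FULL"              # Otherwise fall back to a full load.
```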
Loading a Parquet Data Source during creation:
Once all the columns are selected, loading can be initiated in one of two ways:
- Select Yes at the Load Data Source prompt and exit the assistant.
- Exit the New Data Source assistant and load later. On the Parquet Data Source page, select the data source(s) to be loaded and click the Load button.
An Explicit Load can also be performed on any existing Parquet Data Source that already has a Refresh History from Analysis Model refreshes.
The Explicit Load sets the Explicit Load flag to Yes. While loading is in progress, the Parquet Data Source status transitions as shown below.
Read more about the Refreshing schedule.
Parquet Data Source Status Transitions

Starting Status | Transitioned Status | Context
---|---|---
Detecting Changes | Finished Detecting Changes | Change detection starts for incremental data sources, and the Scheduler pod adds detect-changes jobs to the queue. The Data Pump picks up the job, runs the detect-changes query, and marks the applicable partitions to be loaded.
Detecting Changes | Error | An error occurred while detecting changes for incremental data sources.
Finished Detecting Changes | Job Queued | The Scheduler pod adds jobs to the queue to load data into the data lake.
Job Queued | Loading | The Data Pump pods pick up the jobs from the queue and start processing them. They read the data from the Fact and Dimension tables and transform it into Parquet files.
Loading | Success | The load job completed. The Parquet file is uploaded to ADLS Gen 2, and the Data Pump pod marks the job as completed by setting the status to Success and updating the refresh info in the Parquet Data Source load history.
Loading | Error | An error occurred while loading, and the Parquet file is not uploaded to ADLS Gen 2. The Data Pump pod sets the job status to Error in the Parquet Data Source Load History.
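The allowed transitions in the table can be condensed into a small state map. The sketch below is illustrative only; the status names come from the table above, while the map and the validation helper are assumptions:

```python
# Allowed status transitions for a Parquet Data Source load.
ALLOWED_TRANSITIONS = {
    "Detecting Changes":          {"Finished Detecting Changes", "Error"},
    "Finished Detecting Changes": {"Job Queued"},
    "Job Queued":                 {"Loading"},
    "Loading":                    {"Success", "Error"},
}

def transition(current: str, new: str) -> str:
    """Return the new status if the transition is allowed, else raise."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"Illegal transition: {current} -> {new}")
    return new
```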