Load Data Source
Loading a Data Source into the Data Lake
A Parquet Data Source is loaded into the relevant Data Lake (a Self-Hosted Data Lake). Loading copies the data from the Oracle database to ADLS Gen 2 by creating .parquet files in the defined folders. A load can be triggered via an Analysis Model refresh or by an explicit trigger to load a Parquet Data Source.
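Conceptually, each load job reads the rows from Oracle, converts them to Parquet, and uploads the file to the data lake. The Python sketch below is illustrative only, assuming the python-oracledb, pandas, pyarrow, and azure-storage-file-datalake packages; the connection details, table, and folder names are placeholders, not IFS Cloud internals:

```python
import io

import oracledb  # python-oracledb driver
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from azure.storage.filedatalake import DataLakeServiceClient

# 1. Read the source rows from the Oracle database (placeholder credentials).
conn = oracledb.connect(user="appowner", password="secret",
                        dsn="dbhost:1521/ORCLPDB1")
df = pd.read_sql("SELECT * FROM customer_dim", conn)

# 2. Transform the rows into an in-memory .parquet file.
buf = io.BytesIO()
pq.write_table(pa.Table.from_pandas(df), buf)

# 3. Upload the file into the defined folder in ADLS Gen 2.
service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",
    credential="account-key-or-token")
file_client = (service.get_file_system_client("analysis-models")
                      .get_file_client("customer_dim/part-00000.parquet"))
file_client.upload_data(buf.getvalue(), overwrite=True)
```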
Explicit Load
A Parquet Data Source can also be loaded explicitly. When loading is initiated via IFS Cloud Web using the Load option, the Explicit Load flag is set to true. A load job for the Parquet Data Source is then triggered the next time the scheduler runs, regardless of the Max-Age setting or whether the Parquet Data Source is used in any Analysis Model.
An explicit load performs a full load for Dimensions and, where applicable, an incremental load for Facts.
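As a rough illustration of that rule, the hypothetical sketch below resolves the load mode for a data source. The type names and the helper function are assumptions made for illustration, not the product's internal API:

```python
from dataclasses import dataclass

@dataclass
class ParquetDataSource:
    name: str
    kind: str                  # "DIMENSION" or "FACT"
    supports_incremental: bool

def resolve_explicit_load_mode(ds: ParquetDataSource) -> str:
    """Load mode used by an explicit load, per the rule above."""
    if ds.kind == "DIMENSION":
        return "FULL"          # Dimensions are always fully reloaded.
    if ds.supports_incremental:
        return "INCREMENTAL"   # Facts load only changed partitions.
    return "FULL"              # Otherwise fall back to a full load.
```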
Loading a Parquet Data Source during creation:
Once all the columns are selected, loading can be initiated in one of two ways:
- Select Yes at the Load Data Source prompt and exit the assistant.
- Exit the New Data Source assistant and load later. On the Parquet Data Source page, select the data source(s) to be loaded and click the Load button.
An Explicit Load can also be performed on any existing Parquet Data Source that already has a Refresh History from Analysis Model refreshes.
The Explicit Load sets the Explicit Load flag to Yes. While loading is in progress, the Parquet Data Source status transitions as shown below.
Read more about the Refreshing schedule.
Parquet Data Source Status Transitions

Starting Status | Transitioned Status | Context
---|---|---
Detecting Changes | Finished Detecting Changes | Change detection starts for incremental data sources, and the Scheduler pod adds detect-changes jobs to the queue. The Data Pump picks up the job, runs the detect-changes query, and marks the applicable partitions to be loaded.
Detecting Changes | Error | An error occurred while detecting changes for incremental data sources.
Finished Detecting Changes | Job Queued | The Scheduler pod adds jobs to the queue to load data into the data lake.
Job Queued | Loading | The Data Pump pods pick up the jobs from the queue and start processing them. They read the data from the Fact and Dimension tables and transform it into Parquet files.
Loading | Success | The load job completed. The Parquet file is uploaded to ADLS Gen 2, and the Data Pump pod marks the job as completed by setting the status to Success and updating the refresh info in the Parquet Data Source load history.
Loading | Error | An error occurred while loading, and the Parquet file is not uploaded to ADLS Gen 2. The Data Pump pod sets the job status to Error in the Parquet Data Source Load History.
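The allowed transitions in the table can be condensed into a small state map. The sketch below is illustrative only; the status names come from the table above, while the map and the validation helper are assumptions:

```python
# Allowed status transitions for a Parquet Data Source load.
ALLOWED_TRANSITIONS = {
    "Detecting Changes":          {"Finished Detecting Changes", "Error"},
    "Finished Detecting Changes": {"Job Queued"},
    "Job Queued":                 {"Loading"},
    "Loading":                    {"Success", "Error"},
}

def transition(current: str, new: str) -> str:
    """Return the new status if the transition is allowed, else raise."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"Illegal transition: {current} -> {new}")
    return new
```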