Data Extraction and Migration from Multiple Data Sources Example

The purpose of this document is to show how data can be retrieved from different data sources in order to transform and merge it later. In the following example, data records are retrieved from an RDBMS and from a CSV file.

The first data source is a database table that contains basic customer data such as name, address, phone number, and investable assets.

The second data source is a CSV file representing a dump of an Excel spreadsheet which contains additional data about these customers. The file could be provided by a sales representative dealing with these customers.

The goal is to merge the records from both data sources, filtered by area code and investable assets, into a single output structure.

Data Extraction Setup

To run this example, you have to install the lifecycle management and integration runtime components.

The SQL script below shows how to create the customer table.

drop table if exists customer;
create table customer 
( 
customerId bigint(20) NOT NULL AUTO_INCREMENT,
entryDate datetime DEFAULT NULL,
firstName varchar(255) NOT NULL,
lastName varchar(255) NOT NULL,
ssn varchar(255) NOT NULL,
street varchar(255) NOT NULL,
city varchar(255) NOT NULL,
postalCode varchar(255) NOT NULL,
state varchar(255) NOT NULL,
phoneNumber varchar(255) NOT NULL,
areaCode varchar(255) NOT NULL,
investableAssets bigint(20) NOT NULL,
PRIMARY KEY (customerId)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=latin1;

Data Extraction Process

A file trigger is used to start the data extraction process. The process retrieves the data records from the CSV file, stores the result in structured data (fileCustomers), and sends it to a database extraction and merge process as shown below.
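The CSV-to-structured-data step can be sketched in JavaScript, the language used for script invocations in this example. This is a minimal sketch only: parseCsv is a hypothetical helper, not part of the platform API, and the field names Customerid, Salary, and Role are taken from the sample file shown later in this document.

```javascript
// Minimal sketch: turn raw CSV text into an array of record objects,
// one object per data row, keyed by the header names.
// (Hypothetical helper, not part of the platform API.)
function parseCsv(text) {
  var lines = text.trim().split("\n");
  var headers = lines[0].split(",");
  var records = [];
  for (var i = 1; i < lines.length; i++) {
    var values = lines[i].split(",");
    var record = {};
    for (var j = 0; j < headers.length; j++) {
      record[headers[j]] = values[j];
    }
    records.push(record);
  }
  return records;
}

// Example: the first two rows of the sample file used later in this document
var fileCustomers = parseCsv("Customerid,Salary,Role\n1,2600,HR\n2,2920,Software engineer");
```

In the actual process, the equivalent of fileCustomers is populated by the Camel route described in section Connectivity to CSV.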

File Extraction Process:

File Extraction Process
Figure: File Extraction Process

Data Base Extraction and Merge Process:

Data Base Extraction and Merge Process
Figure: Data Base Extraction and Merge Process

Initially, the process waits for the data from the file extraction process. Once started, the user has to enter two parameters for filtering: areaCode and investableAssets.

The database is invoked with an SQL invocation, and the SQL query is built based on the above parameters, as described in section Connectivity to JDBC. The output is saved in structured data (dbCustomers) and then merged with fileCustomers using script invocation.
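Conceptually, building the query from the two filter parameters looks like the following sketch. This is for illustration only; the actual query text is entered in the SQL application's Configuration tab, and buildCustomerQuery is a hypothetical helper. The column names follow the customer table defined above.

```javascript
// Sketch: compose the filter query from the structured data "params".
// params.areaCode and params.investableAssets are the two user-entered values.
function buildCustomerQuery(params) {
  return "SELECT * FROM customer" +
         " WHERE areaCode = '" + params.areaCode + "'" +
         " AND investableAssets >= " + params.investableAssets;
}

// Example call with hypothetical filter values
var query = buildCustomerQuery({ areaCode: "030", investableAssets: 100000 });
```

In a production setup, prefer the SQL application's bound parameters over string concatenation to avoid SQL injection.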

Definition of the Business Service

The Business Service is created as a process definition in a process model of your Infinity Process Platform Lifecycle Portal.

Definition of the Input Data Structure

The input data structure to define the query predicates is created as a structured data type in a process model of your Infinity Process Platform Lifecycle Portal.

Parameter settings
Figure: Parameter settings

For details on structured data types, please refer to chapter Working with Structured Types in the Business Process Modeling Handbook.

Definition of the RDBMS Record

The structure of the RDBMS record for which a set has to be retrieved from the RDBMS is created as a structured data type in a process model of your Infinity Process Platform Lifecycle Portal.

Customer DataBase Structure
Figure: Customer DataBase Structure

Definition of the CSV Record Structure

The structure of the CSV record is created as a structured data type in a process model of your Infinity Process Platform Lifecycle Portal.

Customer File Structure
Figure: Customer File Structure

Definition of the Output Record Structure (Data Merge)

The output record structure, which holds the merged result set, is created as a structured data type in a process model of your Infinity Process Platform Lifecycle Portal.

Customer Merge Structure
Figure: Customer Merge Structure

Connectivity to JDBC

Connectivity to the RDBMS is achieved with an SQL invocation. The following screenshots describe how the application should be configured to extract records from the database, using the structured data params as input and dbCustomers as the query result.

  1. Create an SQL application. For details see chapter SQL Application Overlay in the Business Process Modeling Handbook.
  2. Open the property page of the SQL application and set the according parameters in the Parameters tab:

    Parameters
    Figure: Parameters

  3. Switch to the Configuration tab and enter the SQL query:

    SQL Query
    Figure: SQL Query

  4. In the Data Source tab, enter the data source connection details:

    Data Source settings
    Figure: Data Source settings

Connectivity to CSV

The connectivity to the CSV file is achieved with a Camel trigger endpoint and route:

  1. Create a Start event with Event class type Message.

    Select Event type Message
    Figure: Select Event type Message

  2. Switch to the Implementation pane and select the event implementation Generic Camel Route Event in the drop-down list.

    Select Event Implementation
    Figure: Select Event Implementation

  3. In the Configuration tab enter the route definition and additional bean specification:

    Camel Trigger configuration
    Figure: Camel Trigger configuration
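The route definition entered in step 3 could look roughly like the following sketch. The directory name and the file name are assumptions for illustration; the actual route used in the example is shown in the screenshot above.

```xml
<route>
  <!-- Poll a directory for the CSV file (directory and file name are assumptions) -->
  <from uri="file://customer-data?fileName=customers.csv&amp;noop=true"/>
  <!-- Parse the CSV content into a list of records -->
  <unmarshal>
    <csv/>
  </unmarshal>
</route>
```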

Data Transformation and Merge via Script Invocation

Script invocation is used to perform the required transformations between the input data (records from the database and the CSV file) and the output data (the merged records). To merge the records:

  1. Create a Script invocation application.
  2. In the Parameters section of the Script invocation application, set the parameters for input and output.

    Parameter definitions for Script invocation
    Figure: Parameter definitions for Script invocation

  3. In the Configuration tab of the Script application enter JavaScript code to merge the records.

    Script code
    Figure: Script code
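The merge logic entered in the script application can be sketched as follows. This is a minimal sketch assuming dbCustomers and fileCustomers are lists of records keyed by customerId and Customerid respectively; the field selection and the sample records in the usage call are hypothetical, and the actual code is shown in the screenshot above.

```javascript
// Sketch of the merge step: for each database record, attach Salary and
// Role from the matching CSV record (matched by customer ID).
function mergeCustomers(dbCustomers, fileCustomers) {
  var merged = [];
  for (var i = 0; i < dbCustomers.length; i++) {
    var dbRec = dbCustomers[i];
    for (var j = 0; j < fileCustomers.length; j++) {
      if (String(fileCustomers[j].Customerid) === String(dbRec.customerId)) {
        merged.push({
          customerId: dbRec.customerId,
          firstName: dbRec.firstName,
          lastName: dbRec.lastName,
          salary: fileCustomers[j].Salary,
          role: fileCustomers[j].Role
        });
        break; // match found, continue with the next database record
      }
    }
  }
  return merged;
}

// Example call with hypothetical sample records
var merged = mergeCustomers(
  [{ customerId: 1, firstName: "Jane", lastName: "Doe" }],
  [{ Customerid: "1", Salary: "2600", Role: "HR" }]
);
```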

Example

The following screenshots are taken from the data extraction example. As you can see, we have two data sources:

Records from database
Figure: Records from database

The following is an example content of a CSV file:

Customerid,Salary,Role
1,2600,HR
2,2920,Software engineer
3,3100,QA
4,1960,IT administrator
5,2380,Data Manager

The process is started by the file trigger (reading the file). Then the user enters values for areaCode and investableAssets for filtering.

Enter values
Figure: Enter values

Once the above activity has been completed, the merged data is shown as below.

Merged data
Figure: Merged data

As expected, the Salary and Role fields come from the CSV file for customers with the same ID.

Exposing the Service via REST

You can use the process interface facility of IPP to expose services based on processes via REST or as Web services.

Please refer to chapter Working with Process Definitions in the Business Process Modeling Handbook for details on working with process interfaces.

Invoking the Service periodically

In addition to being invokable via a REST call, the data extraction service should be invoked once a day, and the result should be sent to the sales manager via e-mail.

This can be easily achieved by defining another process definition, e.g. DailyCustomerReport.

A Quartz timer start event would be defined for this process definition with the URI

<from uri="quartz://myTimer?trigger.repeatInterval=2000&trigger.repeatCount=-1"/>
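Note that repeatInterval=2000 fires every two seconds, which is convenient for testing. For the actual once-a-day schedule, the Camel Quartz component also accepts a cron expression, for example (firing at 8:00 every day; the chosen time is an assumption):

```xml
<from uri="quartz://myTimer?cron=0+0+8+*+*+?"/>
```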

The process Daily Customer Report would invoke a process Customer Query Service and a Mail Application, which would send the retrieved data to the sales manager.

Processing of Record Sets

If you intend to process sets of records in your data processing, e.g. to merge entries from several data sources as in the example above, you first need to model a structured data type with a list element to carry the results of the data retrieval.

To map all CSV entries to their corresponding RDBMS entries, add

for (var n in csvCustomerRecords)
{
   for (var m in rdbmsCustomerRecords)
   {
      if (rdbmsCustomerRecords[m].CUSTOMERID == csvCustomerRecords[n].ID)
      {
         outputRecord.push({
            customerId: rdbmsCustomerRecords[m].CUSTOMERID,
            role: csvCustomerRecords[n].Role
            // ... further fields
         });
         break; // match found, continue with the next CSV entry
      }
   }
}

to the Advanced Mapping field of the corresponding Message Transformation application.