Tuning for Performance via Data Clusters

Data Clusters provide a way to tune performance by speeding up queries.

Data Clusters in Infinity Process Platform

By default, the Infinity Process Engine stores all process data - their values or references to these - in a single table - the table DATA_VALUE - with tuples containing

Binary data are stored in a separate table; the DATA_VALUE table contains a corresponding reference.

Access to data values - for example in data-driven worklist queries - is performed against this table. Depending on

accessing data values can be an expensive operation.

Data Clusters provide a way to speed up queries by reducing the necessary amount of database join operations on the DATA_VALUES table. To accomplish this, data-specific cluster tables may be created during an administrative tuning task consisting of several data slots, each containing a replica of the referenced data value. The drawback of this approach is considerably increased persistent storage need as of data replication.

Using n data slots from m data clusters for query evaluation reduces the number of join operations by (n-m). As some databases have tight limits on the number of efficiently performable parallel joins, doing so might considerably improve query execution plans, thus reducing query evaluation time. The implementation is straightforward, as for every data joined for predicate evaluation, the query builder just has to decide if the data is available inline in a cluster table and to replace SQL alias and attribute names accordingly.

Usage of Data Clusters will cause a worst case performance on par with an unclustered environment.

Data Cluster Configuration

Since Data Clusters only contain copies of referenced data values the Data Cluster configuration can be created and changed easily. This (re-)configuration is done by deploying an XML cluster configuration file to the audit trail database which satisfies the following XML schema:

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://www.carnot.ag/carnot-runtime-setup" xmlns:crs="http://www.carnot.ag/carnot-runtime-setup">
   <xs:element name="runtime-setup">
      <xs:complexType>
         <xs:sequence>
            <xs:element ref="crs:audit-trail" minOccurs="0" maxOccurs="1"></xs:element>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
   <xs:element name="audit-trail">
      <xs:complexType>
         <xs:sequence>
            <xs:element ref="crs:data-clusters" minOccurs="0" maxOccurs="1"></xs:element>
         </xs:sequence>
      </xs:complexType>
   </xs:element>

   <xs:element name="data-clusters">
      <xs:complexType>
         <xs:sequence>
            <xs:element ref="crs:data-cluster" maxOccurs="unbounded" minOccurs="0"></xs:element>
         </xs:sequence>
      </xs:complexType>
   </xs:element>

   <xs:element name="data-cluster">
      <xs:complexType>
         <xs:sequence>
            <xs:element ref="crs:data-slots" minOccurs="0" maxOccurs="1"></xs:element>
            <xs:element ref="crs:data-cluster-indexes" minOccurs="0" maxOccurs="1"></xs:element>
         </xs:sequence>
         <xs:attribute name="tableName" type="xs:string" use="required"></xs:attribute>
         <xs:attribute name="processInstanceColumn" type="xs:string" use="required"></xs:attribute>
      </xs:complexType>
   </xs:element>

   <xs:element name="data-slots">
      <xs:complexType>
         <xs:sequence>
            <xs:element ref="crs:data-slot" minOccurs="0" maxOccurs="unbounded"></xs:element>
         </xs:sequence>
      </xs:complexType>
   </xs:element>

   <xs:element name="data-slot">
      <xs:complexType>
         <xs:attribute name="modelId" type="xs:string" use="required"></xs:attribute>
         <xs:attribute name="dataId" type="xs:string" use="required"></xs:attribute>
         <xs:attribute name="attributeName" type="xs:string" use="optional"></xs:attribute>
         <xs:attribute name="oidColumn" type="xs:string" use="required"></xs:attribute>
         <xs:attribute name="typeColumn" type="xs:string" use="required"></xs:attribute>
         <xs:attribute name="nValueColumn" type="xs:string" use="optional"></xs:attribute>
         <xs:attribute name="sValueColumn" type="xs:string" use="optional"></xs:attribute>
         <xs:attribute name="ignorePreparedStatements" type="xs:string" use="optional"></xs:attribute>
      </xs:complexType>
   </xs:element>

   <xs:element name="data-cluster-indexes">
      <xs:complexType>
         <xs:sequence>
            <xs:element ref="crs:data-cluster-index" minOccurs="0" maxOccurs="unbounded"></xs:element>
         </xs:sequence>
      </xs:complexType>
   </xs:element>

   <xs:element name="data-cluster-index">
      <xs:complexType>
         <xs:sequence>
            <xs:element ref="crs:column" minOccurs="1" maxOccurs="unbounded" />
         </xs:sequence>
         <xs:attribute name="indexName" type="xs:string" use="required" />
         <xs:attribute name="unique" type="xs:boolean" use="optional" default="false" />
      </xs:complexType>
   </xs:element>

   <xs:element name="column">
      <xs:complexType>
         <xs:attribute name="name" type="xs:string" use="required" />
      </xs:complexType>
   </xs:element>
</xs:schema>

with the following specifications:

Elements / Attributes Description
runtime-setupThe configuration file consists of one single <runtime-setup> element. If more than one exists, the first one will be evaluated. A <runtime-setup> element consists of one <audit-trail> element. If more than one exists, the first one will be evaluated.
audit-trailAn <audit-trail> element consists of one <data-clusters> element. If more than one exists, the first one will be evaluated.
data-clustersA <data-clusters> element consists of one or more <data-cluster> elements and one <data-cluster-indexes> element.
data-clusterA <data-cluster> element consists of the attribute tableName, the attribute processInstanceColumn and one <data-slots> element. If more than one <data-slots> element exists, the first one will be evaluated.
tableNameThe attribute tableName defines the name of the cluster table to be created.
processInstanceColumnThe attribute processInstanceColumn defines the name of the column which contains the process instance OID the Data Cluster entry is created for.
data-slotsA <data-slots> element consists of one or more <data-slot> elements. This element defines all data slots for the current data cluster.
data-slotA <data-slot> element consists of the attributes modelID, dataID, attributeName, oidColumn, typeColumn, sValueColumn and nValueColumn.
modelIDThe attribute modelID defines the ID of the model containing the data.
dataIDThe attribute dataID defines the ID of the data whose data value shall be copied to the Data Cluster for the process instance.
oidColumnThe attribute oidColumn defines the column name in the Data Cluster table which will take a copy of the OID column of table DATA_VALUE.
typeColumnThe attribute typeColumn defines the column name of the Data Cluster table which will take a copy of the type_key column of table DATA_VALUE.
sValueColumnThe attribute sValueColumn defines the column name of the Data Cluster table which will take a copy of the string_value column of the table DATA_VALUE.
nValueColumnThe attribute nValueColumn defines the column name of the Data Cluster table which will take a copy of the number_value column of the table DATA_VALUE.
sValueColumnThe attributes sValueColumn and nValueColumn are mutually exclusive. When the data_value is a number_value but sValueColumn is configured then this data slot will be ignored and vice versa.
data-cluster-indexesA <data-cluster-indexes> elements consists of one or more <data-cluster-index> elements. This element defines all indexes for the current Data Cluster.
data-cluster-indexA <data-cluster-index> element consist of the attribute indexName, the attribute unique and one or more <column> elements.
indexNameThe attribute indexName defines the name of one index on the Data Cluster table.
uniqueThe attribute unique defines the uniqueness of the index. Valid values are true and false . Default is false .
columnA <column> element consists of the attribute name. The order of the <column> elements in the <data-cluster-index> element matters because the same order will be used for the database index.
nameThe attribute name defines the column name in the Data Cluster table which shall be part of the current index.
ignorePreparedStatements If set to true, no prepared statement will be used for these columns when they are used in a query. The default value is false. If this attribute is missing or any value other than true is specified, it defaults to false.
enabledForProcessInstanceState The enabledForProcessInstanceState is an attribute of the data-cluster element. Using this attribute you can influence the storage of datavalues based on the process instance state. It accepts either a single entry or a comma separated list of entries from the Attribute Value column. Duplicates or intersections will be ignored.

Attribute Value Matching ProcessInstance States
ALL CREATED,ACTIVE,ABORTED,ABORTING,INTERRUPTED,COMPLETED
ALIVE CREATED,ACTIVE,ABORTING,INTERRUPTED
CREATED CREATED
ACTIVE ACTIVE
ABORTED ABORTED
ABORTING ABORTING
INTERRUPTED INTERRUPTED
COMPLETED COMPLETED

The usage of Data Clusters is activated with the property:

   Carnot.Engine.Tuning.Query.EvaluationProfile = dataClusters

Data Clusters are administered with the SysConsole Command line tool.

Data Slots

Data slots define a set of columns for a specific data. In data cluster definitions they are defined by the following attributes:

   <xs:element name="data-slot">
      <xs:complexType>
         <xs:attribute name="modelId" type="xs:string" use="required" />
         <xs:attribute name="dataId" type="xs:string" use="required" />
         <xs:attribute name="attributeName" type="xs:string" use="optional" />
         <xs:attribute name="oidColumn" type="xs:string" use="required" />
         <xs:attribute name="typeColumn" type="xs:string" use="required" />
         <xs:attribute name="nValueColumn" type="xs:string" use="optional" />
         <xs:attribute name="sValueColumn" type="xs:string" use="optional" />
         <xs:attribute name="ignorePreparedStatements" type="xs:string" use="optional"></xs:attribute>
      </xs:complexType>
   </xs:element>

The attributes dataId and attributeName are unique within a model.

The attribute modelId allows to define:

In case a data cluster definition does not contain the modelId attribute, a warning appears that the specific slot will be ignored. This would result in:

Ignoring prepared Statements (deprecated)

The following example demonstrates the usage of the ignorePreparedStatements attribute. It is assumed that prepared statement is turned on globally. Note that property ignorePreparedStatements in a DataCluster configuration will only be evaluated if AuditTrail.UsePreparedStatements is set to mixed mode. For details on this property, refer to section Using Prepared Statements of chapter Infinity Process Platform Database Access Tuning. For example set the attribute in a data-slot element of a data cluster definition as follows:

<data-slot modelId="DCModel" dataId="SimpleData" oidColumn="SimpleData_oid" 
typeColumn="SimpleData_t" sValueColumn="SimpleData_s" ignorePreparedStatements="true">
</data-slot>

No bind values will be used for these columns in case they are used in a query in mixed mode, e.g.:

ProcessInstanceQuery.findInStateHavingData(String processID,String dataID, Serializable dataValue, ProcessInstanceState state);

Without setting ignorePreparedStatements=true this would result in the following SQL statement:

SELECT pi.oid,
       pi.starttime,
       pi.terminationtime,
       pi.state,
       pi.model,
       pi.processdefinition,
       pi.priority,
       pi.tokencount,
       pi.propertiesavailable,
       pi.rootprocessinstance,
       pi.scopeprocessinstance,
       pi.startinguser,
       pi.startingactivityinstance
FROM   carnot.process_instance pi
       INNER JOIN carnot.model m
               ON ( pi.model = m.oid )
       INNER JOIN carnot.procinst_hierarchy PDF_PIH1
               ON ( pi.oid = PDF_PIH1.subprocessinstance )
       INNER JOIN carnot.process_instance PDF_PI1
               ON ( PDF_PIH1.processinstance = PDF_PI1.oid )
       INNER JOIN carnot.dc_test_data PR_DVCL1
               ON ( pi.scopeprocessinstance = PR_DVCL1.process_instance )
WHERE  ( PDF_PI1.processdefinition = ?
         AND pi.state = 2
         AND ( PR_DVCL1.simpledata_t = ?
               AND PR_DVCL1.simpledata_s = ? )
         AND m.partition = ? )  

Setting ignorePreparedStatements=true however would result in:

...
WHERE  ( PDF_PI1.processdefinition = ?
         AND pi.state = 2
         AND ( PR_DVCL1.simpledata_t = 8
               AND PR_DVCL1.simpledata_s = 'SimpleDataValue' )
         AND m.partition = ? )  

Sample Data Cluster Configuration File

The following example configuration file creates one Data Cluster table with name dv_mqt01 consisting of a column processInstance referencing the process instance primary key and three data slots referencing data values with data IDs anInt, aString and aTimestamp. Each data slot maps the DATA_VALUE table columns oid, type_key and number_value and string_value respectively to data slot columns. Additionally two non-unique indexes with the names DV_MQT01_IDX1 and DV_MQT01_IDX2 are created.

Evaluation of a query referencing three data values with the IDs anInt, aString and aLong would result in a query joining the DATA_VALUE table once for ID aLong - because this is not a slot in the configured cluster - and one join of the cluster table dv_mqt01. Defining clusters with more slots could result in bigger reduction of joins.

   <?xml version="1.0" encoding="UTF-8"?>
   <runtime-setup xmlns="http://www.carnot.ag/carnot-runtime-setup">
      <audit-trail>
         <data-clusters>
            <data-cluster
               tableName="dv_mqt01"
               processInstanceColumn="processInstance">
               <data-slots>
                  <data-slot 
                     dataId="anInt"
                     oidColumn="oid_anInt"
                     typeColumn="tk_anInt"
                     nValueColumn="nval_anInt" />
                  <data-slot
                     dataId="aString"
                     oidColumn="oid_aString"
                     typeColumn="tk_aString"
                     sValueColumn="sval_aString" />
                  <data-slot
                     dataId="aTimestamp"
                     oidColumn="oid_aTime"
                     typeColumn="tk_aTime"
                     nValueColumn="nval_aTime" />
               </data-slots>
               <data-cluster-indexes>
                 <data-cluster-index indexName="DV_MQT01_IDX1" unique="true">
                    <column name="processInstance"/>
                 </data-cluster-index>
                 <data-cluster-index indexName="DV_MQT01_IDX2" unique="false">
                    <column name="nval_anInt"/>
                    <column name="nval_aLong"/>
                    <column name="nval_aTime"/>
                    <column name="tk_anInt"/>
                    <column name="tk_aLong"/>
                    <column name="tk_aTime"/>
                    <column name="processInstance"/>
                 </data-cluster-index>
               </data-cluster-indexes>
            </data-cluster>
         </data-clusters>
      </audit-trail>
   </runtime-setup>

Data Slots for Structured Data

To create data slots for structured data, the additional attribute attributeName must be specified on the element data-slot. This attribute must contain the XPath to the primitive data field, that must be put to this data slot.

For example add the following in the configuration file:

...
<data-slot
dataId="OrderBook1"
attributeName="order/qty"
oidColumn="oid_qty_1"
typeColumn="tk_qty_1"
nValueColumn="nval_qty_1" />
...

Hereby the value resulting from evaluating XPath expression order/qty on data OrderBook1 is written to this data slot (table column nval_qty_1).

For information on working with structured data please refer to the chapter Structured Data Types.

Note
Structured data list values are not supported for Data Clusters as they would require more than one entry in the data slot.

Creating Data Slots for a Non-Empty Audit Trail

Note, that if a data slot is created when the Audit Trail database already contains data values, the following is required:

Data XPaths put to the attributeName of cluster slots for structured data and DMS data types must have both annotations indexed and transient set to true from the beginning of the model execution (these are the defaults). Otherwise, for these data types, no values of already existing process instances can be transferred into the new data cluster.

Required column types

When defining a data cluster slot for a date field from a structured data, please take care on the according column type required. For example clusters with xsd types date or boolean require the column type Number. Please refer to the table in the section Mapping XSD Types of the chapter Defining Structured Types for detailed information on the mapping performed for XSD types.

Designing Data Clusters

The design and use of Data Clusters has to be considered along the following guidelines:

Usually, the design of clusters along the previous principles should not be too difficult, as values of process data intensively used for querying and sorting typically are quite stable during the lifetime of process instances.

Designing Data Cluster Indexes

Three categories of indexes have to be defined for Data Clusters:

Update indexes have to exists and consist of only one column. This column is the same as the column defined by the attribute processInstanceColumn (see index DV_MQT01_IDX1 in the sample configuration file above).

Query Indexes are used for fast lookup in the first place. These indexes consist of columns defined by the attributes sValueColumn or nValueColumn of the <data-slot> elements as well as the columns defined by the according typeColumn attribute. Additionally, it should contain the same column as the update index.

Sorting indexes are used for sorting purposes and therefore only need to consist of columns which are used in ORDER BY clauses. There is no use of columns defined in the typeColumn attribute.

It depends on the ability of the used DBMS to combine several indexes in one query whether it is necessary to provide only a few small indexes which can be combined or many bigger indexes.