One of the most used features of the Pentaho Business Analytics platform is its OLAP engine (aka Mondrian) and the accompanying web-based interactive end-user tool, Analyzer. As long-time Pentaho experts, we have seen many and varied implementations of Mondrian/Analyzer.
At the heart of these implementations is the creation of a cube schema file -- an XML file that maps logical cube constructs like measures and dimensions to underlying database structures. The secret sauce of a successful OLAP implementation requires a blend of database and cube schema design, one that provides performant, yet flexible analytic investigations.
The need for dynamism
In most environments, the schema files are well-defined. The underlying database structure is well-known and all end-user access is consistent. You can define a static schema definition that suits all users, leveraging Mondrian role definitions to manage role-specific data authorization. However, there are a growing number of situations when the schema definition must be dynamically altered (even completely generated!) at run time.
We have encountered the following scenarios requiring dynamic schema definition:
- Internationalization - Replacing user-visible names and help text
- Multi-tenant Customizations - Enabling tenant-specific customizations including new/changed dimension and measure definitions.
- Database-driven Data Authorization - Using database structures to drive data authorization within cubes
- Dynamism in Underlying Data Content - Supporting the increasing prevalence of dynamic dataset structure -- especially for rapidly evolving big data event capture. A dynamic schema would allow for cube definitions to adapt as new fields are captured.
- Denormalizing Key-Value Dimensions - Supporting capture of additional dimensional attributes that are often stored in key-value tables. Imagine a customer dataset with varied attributing. A dynamic schema processor would allow for these key-values to be flattened into a series of dimensions without having to change the schema each time a new attribute is added.
These are difficult situations to handle with statically defined cube schema. The good news is that the open source Mondrian OLAP engine provides a comprehensive Java API. There is an interface that allows a java developer to create a custom Dynamic Schema Processor (DSP) which can, on-the-fly, generate and return schema XML to the Mondrian OLAP engine.
Inquidia has extensive experience developing custom DSPs for numerous clients. However, implementation requires deep knowledge of the Java API and strong Java development skills. We began searching for a way to make DSP development accessible to the broader audience of BI and data programmers.
Inquidia Labs does it again: a Java-free Mondrian Dynamic Schema Processor
The “dots were connected” with the creation of our PDI Token Replacement Plugin. This new step for Pentaho Data Integration (PDI) enables the dynamic customization of an arbitrary input string. You specify tokens in the string and then use the step to dynamically replace those tokens with data-driven input values to produce a new output string.
This token replacement pattern is identical to that used by most of the DSPs we have developed. The DSP typically receives a template schema XML document and dynamically replaces tokens in the document based on DSP-specific logic to produce a customized schema definition that is passed to Mondrian. If we could create a DSP using a PDI transformation, we could avoid complex custom java development!
Our Inquidia Labs team set to work on a solution -- enable the development of DSPs via the visual data programming capabilities of PDI. Once the problem was framed our Inquidia Labs director, Chris Deptula, quickly developed a solution. Since PDI is 100% Java, we just had to create a reusable, configurable DSP Java class that executed a PDI transformation to output a schema XML document for Mondrian. The PDI transformation can do all the heavy-lifting of schema generation.
How it works
The core of the solution is a new DSP class, KettleDynamicSchemaProcessor. Once the proper jar file is deployed to the Pentaho BA Server, you can use this class to define a new Mondrian connection via the Pentaho BA Server “Manage Data Sources” interface. The following table enumerates the most commonly used parameters (additional performance-focused parameters also exist):
|kettleTransFile||mandatory||Path to the PDI transformation file|
|kettleStepName||optional||(Default: OUTPUT) The name of the step in the transformation from which to read the output schema XML|
|schemaFieldName||optional||(Default: schema) The name of the field in the step that contains the schema XML|
There are also numerous Pentaho provided session and security parameters which can be passed to enable the schema generator to create session, user and role specific content.
Finally, we created a couple of helpful CTools CDE applications which can be used to aid development and testing. The Session Attributes Dashboard displays all of the current session’s attributes and values. This is helpful when debugging DSP logic that uses session variables. The DSP Cache Clearer does just that -- it ensures that the the Mondrian DSP cache is cleared so that the next attempt to access a DSP-based cube forces execution of DSP logic.
So far, the KettleDynamicSchemaProcessor works like a champ. We have put it through the paces with several scenarios in our lab. But, we’d like your help...
How you can get the software
The KettleDynamicSchemaProcessor is available in beta preview. We are looking for organizations that are interested to help test and extend the software before we release it to the general public.
If you’d like to give it a try, just send us an email at email@example.com, or fill out the request form below. We’ll get you the code and documentation needed to get started.