Derived Data
Description of the problem
Derived data are properties that are automatically calculated and set on a document during a session save. An example of derived data is the size of some (e.g. binary) property of a node. Such derived data might have to be stored on the node itself.
Since you don't control all the places where such properties might be set, the derived data needs to be calculated and set implicitly to prevent inconsistent data.
Another reason for this functionality is that the available query languages cannot express every realistic query. For example, XPath does not allow you to query for documents that have two properties that are equal to each other. Naively this could be written as //*[@a=@b], but this yields no results even when matching documents exist. Certain other queries are possible but have a huge performance impact. These are deliberate limitations of the query languages XPath and JCR-SQL, not bugs.
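The idea behind the workaround can be sketched in plain Java: compare the two properties once, at save time, and store the outcome in a single property that a query can test directly. This is only an illustration and not repository code. The flag name sample:equal and the class EqualPropertiesSketch are hypothetical, and properties are modelled as plain strings rather than JCR values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class EqualPropertiesSketch {

    // Hypothetical name of the precomputed flag; not part of the original example.
    static final String EQUAL_FLAG = "sample:equal";

    /** Returns a copy of the node's properties with the equality flag added. */
    static Map<String, String> withEqualFlag(Map<String, String> properties) {
        Map<String, String> result = new HashMap<>(properties);
        // Two absent properties are not treated as "equal".
        boolean equal = properties.containsKey("a")
                && Objects.equals(properties.get("a"), properties.get("b"));
        result.put(EQUAL_FLAG, Boolean.toString(equal));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> node = new HashMap<>();
        node.put("a", "42");
        node.put("b", "42");
        System.out.println(withEqualFlag(node).get(EQUAL_FLAG)); // prints true
    }
}
```

A query then only needs to test the single stored flag instead of comparing two properties against each other.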
Facility offered
As a solution for expressing efficient queries and for accessing information about the content without having to know and execute the procedure to obtain the data, Hippo Repository has the capability of triggering derived data functions. A derived data function computes properties that are derived from other properties of the document. Derived properties may be put on and computed from the JCR node that represents a document, or on any descendant node in a document.
When editing a document that should contain such a derived property, you should not set the value of the derived property yourself. Instead, the repository automatically computes the value of the property during save(). Because the repository guarantees to recompute the property upon a save, the data will always be up to date.
In order for the repository to do this, it must be informed when and how to compute the properties.
- "when" is determined by the JCR nodetype of the data. The repository can be configured to compute a property of a certain node type.
- "how" to compute a property is to be implemented by a class implementing the derived data function interface.
Usage
We will outline how to define, configure and use derived data functions based on a simple example that applies the Pythagorean theorem.
Defining the data for which to compute properties
We define a document type that is a core shape definition:
[sample:shape] > hippo:document
  - sample:a (double)
  - sample:b (double)
And subsequently a definition that can be added as mixin type to the shape definition to indicate the shape is a triangle:
[sample:triangle] > hippo:derived mixin
  - sample:c (double)
To indicate that certain properties of the type sample:triangle are to be computed by the derived data mechanism, the type must extend the hippo:derived mixin node type.
Configuring the repository to compute derived properties for this data
Now we need to configure in the repository how to compute the derived property of sample:triangle. These procedures are defined in the JCR repository under /hippo:configuration/hippo:derivatives. To compute the c property we can enter the following JCR definition:
[repository root]
  + hippo:configuration
    + hippo:derivatives [hipposys:derivativesfolder]
      + pythagorean [hipposys:deriveddefinition]
        - hipposys:nodetype = sample:triangle
        - hipposys:classname = sample.PythagoreanTheorem
        - hipposys:serialver = 1
        + hippo:accessed [hipposys:propertyreferences]
          + a [hipposys:relativepropertyreference]
            - hipposys:relPath = sample:a
          + b [hipposys:relativepropertyreference]
            - hipposys:relPath = sample:b
        + hippo:derived [hipposys:propertyreferences]
          + c [hipposys:relativepropertyreference]
            - hipposys:relPath = sample:c
First, the hipposys:nodetype property defines the nodetype which contains the properties that should be derived. For any change to nodes of this type, this derived data definition indicates the function to be executed.
The hipposys:classname property contains the name of the class, which must extend the base class org.hippoecm.repository.ext.DerivedDataFunction. The class PythagoreanTheorem must have a public no-argument constructor. The number stated in the hipposys:serialver property should match the serialVersionUID field in the implementing class sample.PythagoreanTheorem.
The definitions in the hippo:accessed and hippo:derived node structures indicate the input and output parameters of the derived data function. Here we indicate that, relative to the node of type sample:triangle, there are two input properties: sample:a and sample:b. The hipposys:relPath properties give the path of each property relative to the subject node for which the computation takes place. The values of these two properties are entered under the keys "a" and "b" (the names of the hipposys:relativepropertyreference nodes) in a Map that the compute method implemented by PythagoreanTheorem takes as input:
public Map<String,Value[]> compute(Map<String,Value[]> parameters);
As a result, the compute method should return a map in which the value for the derived property sample:c can be found under the key "c". The definition states the (possibly multiple) results computed by the function as nodes under hippo:derived. The hipposys:relPath again indicates the relative path to the property. It may indicate any property below the document for which properties are computed, but it may not contain references to other documents.
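The relationship between reference node names and property paths can be pictured with a small, self-contained model. This is a sketch in plain Java under simplifying assumptions, not the repository's implementation: node properties are modelled as a Map<String, double[]> instead of JCR Value[] arrays, the compute function is a stand-in lambda, and the names DerivedDefinitionSketch, accesses, derives and apply are made up for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

final class DerivedDefinitionSketch {

    // Reference name -> relative property path, mirroring the accessed and derived nodes.
    private final Map<String, String> accessed = new LinkedHashMap<>();
    private final Map<String, String> derived = new LinkedHashMap<>();
    private final UnaryOperator<Map<String, double[]>> function;

    DerivedDefinitionSketch(UnaryOperator<Map<String, double[]>> function) {
        this.function = function;
    }

    DerivedDefinitionSketch accesses(String name, String relPath) {
        accessed.put(name, relPath);
        return this;
    }

    DerivedDefinitionSketch derives(String name, String relPath) {
        derived.put(name, relPath);
        return this;
    }

    /** Reads the inputs from the node, runs the function, and writes the declared outputs back. */
    void apply(Map<String, double[]> nodeProperties) {
        Map<String, double[]> parameters = new HashMap<>();
        for (Map.Entry<String, String> ref : accessed.entrySet()) {
            double[] values = nodeProperties.get(ref.getValue());
            if (values != null) {
                parameters.put(ref.getKey(), values); // keyed by reference name, e.g. "a"
            }
        }
        Map<String, double[]> result = function.apply(parameters);
        for (Map.Entry<String, String> ref : derived.entrySet()) {
            double[] values = result.get(ref.getKey());
            if (values != null) {
                nodeProperties.put(ref.getValue(), values); // stored at relPath, e.g. sample:c
            }
        }
    }

    /** The pythagorean definition from the configuration above, wired to a stand-in function. */
    static DerivedDefinitionSketch pythagorean() {
        return new DerivedDefinitionSketch(parameters -> {
            double a = parameters.get("a")[0];
            double b = parameters.get("b")[0];
            parameters.put("c", new double[] { Math.sqrt(a * a + b * b) });
            return parameters;
        }).accesses("a", "sample:a").accesses("b", "sample:b").derives("c", "sample:c");
    }

    public static void main(String[] args) {
        Map<String, double[]> triangle = new HashMap<>();
        triangle.put("sample:a", new double[] { 3.0 });
        triangle.put("sample:b", new double[] { 4.0 });
        pythagorean().apply(triangle);
        System.out.println(triangle.get("sample:c")[0]); // prints 5.0
    }
}
```

Note that the function only ever sees the short keys "a", "b" and "c". The property paths live in the definition, so the same function could in principle be reused for differently named properties.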
Supplying the method that computes the derived property
The configuration indicates which class should be used to compute the data. This class must extend the org.hippoecm.repository.ext.DerivedDataFunction base class and implement the compute method. Since derived data is a Repository function, add this class to the cms module of your project and not the site module.
package sample;

import java.util.Map;

import javax.jcr.RepositoryException;
import javax.jcr.Value;

import org.hippoecm.repository.ext.DerivedDataFunction;

public class PythagoreanTheorem extends DerivedDataFunction {

    // Should match hipposys:serialver in the derived data definition.
    static final long serialVersionUID = 1;

    public Map<String,Value[]> compute(Map<String,Value[]> parameters) {
        try {
            // Keys are the names of the reference nodes under hippo:accessed.
            double a = parameters.get("a")[0].getDouble();
            double b = parameters.get("b")[0].getDouble();
            double c = Math.sqrt(a * a + b * b);
            parameters.put("c", new Value[] { getValueFactory().createValue(c) });
        } catch (RepositoryException e) {
            // A value could not be read as a double; return without setting c.
        }
        return parameters;
    }
}
This class can be packaged in a normal plug-in. The properties are recomputed upon any change, with one exception due to a current limitation: imported data is not recomputed and must already be correct.
Deriving Data From Another Node
As stated above, derived properties may be put on and computed from the JCR node that represents a document, or on any descendant node in a document. In some use cases this is not sufficient. Take for example the following typical node structure representing a document:
+ document [hippo:handle]
  - hippo:name = "Pretty Name"
  + document [myhippoproject:newsdocument]
    - myhippoproject:title = "Pretty Name"
    - hippostd:state = draft
(nodes and properties not relevant to the example are left out)
There is a hippo:handle node named document with one myhippoproject:newsdocument child node of the same name, representing the draft variant of the document. In addition, the hippo:handle node has a property hippo:name (from the hippo:named mixin) holding the "pretty name" of the document.
The document's pretty name is entered by the user in the new document dialog when creating a document. Suppose you want to store the same pretty name in the myhippoproject:title property of the new document draft so that the user does not have to enter it again. A Derived Data Function would be a convenient way to implement this. However, the pretty name is not stored on the document node or one of its descendants, but on its parent node (the handle), so a regular hipposys:relativepropertyreference node can't be used. In such a use case you can use a hipposys:resolvepropertyreference node and reference the parent node's property as ../hippo:name.
[repository root]
  + hippo:configuration
    + hippo:derivatives [hipposys:derivativesfolder]
      + title [hipposys:deriveddefinition]
        - hipposys:nodetype = myhippoproject:newsdocument
        - hipposys:classname = org.example.NewsDocumentTitle
        - hipposys:serialver = 1
        + hippo:accessed [hipposys:propertyreferences]
          + message [hipposys:resolvepropertyreference]
            - hipposys:relPath = ../hippo:name
        + hippo:derived [hipposys:propertyreferences]
          + title [hipposys:relativepropertyreference]
            - hipposys:relPath = myhippoproject:title
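The configuration names the class org.example.NewsDocumentTitle but does not show it. Its logic might look roughly like the following sketch, which copies the value received under the key "message" to the key "title". To keep the example runnable without the repository on the classpath, it uses String[] in place of javax.jcr.Value[] and a plain static method rather than a DerivedDataFunction subclass. Both are simplifications, and the class name NewsDocumentTitleSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class NewsDocumentTitleSketch {

    /**
     * Copies the first value found under "message" (the handle's hippo:name)
     * to the "title" key. Returns the input unchanged when no name is available.
     */
    static Map<String, String[]> compute(Map<String, String[]> parameters) {
        String[] name = parameters.get("message");
        if (name != null && name.length > 0) {
            parameters.put("title", new String[] { name[0] });
        }
        return parameters;
    }

    public static void main(String[] args) {
        Map<String, String[]> parameters = new HashMap<>();
        parameters.put("message", new String[] { "Pretty Name" });
        System.out.println(compute(parameters).get("title")[0]); // prints Pretty Name
    }
}
```

As written, the sketch overwrites the title on every computation. Since the source states that properties are recomputed upon any change, a real implementation would need to decide whether a title edited later by the user should be preserved.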