Run an Updater Script
Introduction
Goal
Run a Groovy Updater Script to perform bulk changes on repository content.
Background
The Updater Editor allows developers to create, manage and run updater scripts against a running repository from within the CMS UI. Updater scripts can perform bulk changes to existing content.
See Write an Updater Script for more information. This page explains the execution options available in the Updater Editor.
With Great Power Comes Great Responsibility
Updater scripts can modify large parts of your repository. Use them with care.
Security
The scripts are executed via a custom Groovy ClassLoader, which protects against obvious and trivial mistakes and misuse (for example, invoking System.exit()). However, this is not intended to provide a fully protected Groovy sandbox. This means that, technically, Groovy updater scripts can be used to execute external programs, possibly compromising the server environment. Therefore, protection against incorrect usage of Groovy updater scripts must be enforced by limiting access and usage to trusted developers and administrators only.
Manage Updater Scripts
The left side of the Updater Editor consists of three parts:
- Registry
Contains all created updater scripts. Select a script from the registry to execute it.
- Queue
Contains all scripts that are being executed or waiting to be executed. The scripts are executed in the order in which they were added to the queue. Only one script is executed at a time, even in a clustered environment. You can stop the currently executing script and delete queued scripts from the queue. Stopping a script will finish the current NodeUpdaterVisitor#doUpdate call before actually stopping. The output of the script is available in the bottom part of the screen and is refreshed every few seconds.
- History
Contains all scripts that have been (fully or partially) executed. Executed scripts can be reverted from here, provided they support this by implementing undoUpdate.
Execution Options
Node Selection
The updater engine uses the visitor pattern. Which nodes are visited is specified by Select node using (Repository path, XPath query, or Updater):
- Repository path is an absolute path in the repository, for example: /content/documents or /hst:hst/hst:configurations. All nodes below the path will be visited, including the node specified by the path itself.
- XPath query is an XPath query that selects the nodes to visit. Example queries are:
  //element(*, hippo:document)
    all nodes of type 'hippo:document'
  /jcr:root/hst:hst/hst:configurations//element(*, hst:sitemapitem)
    all nodes of type 'hst:sitemapitem' below /hst:hst/hst:configurations
  //*[@example:title='foo']
    all nodes that have the property 'example:title' set to the value 'foo'
  Try out XPath queries in the repository servlet at http://localhost:8080/cms/repository
- Updater indicates that the script itself provides the logic for navigating to one or more nodes to visit. The script must implement (override) the firstNode and nextNode methods provided by the BaseNodeUpdateVisitor base class.
This feature is available since Bloomreach Experience Manager v12.1.1 (also backported to v12.0.4, v11.2.5 and v10.2.9)
Performance
Changes to visited nodes are saved in batches. Each executed script can specify a Batch Size and a Throttle value:
- Batch size is the number of nodes that have to be modified before changes are written to the repository. The engine counts updated nodes by checking the outcome of the #doUpdate(Node) method: true means updated, false means skipped, and an exception or error means failed. Keep the batch size reasonably low, say fifty or a hundred, to avoid large changesets that consume a lot of memory. See the Reporting of Execution section below for more detail.
- Throttle is the number of milliseconds to wait after each batch. This prevents a running repository from being swamped with changes and becoming unresponsive to other users.
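The interplay between batch size, throttle, and the doUpdate return value can be sketched as a small model. This is not the engine's implementation: the class BatchingSketch, its Result counters, and the use of a Predicate in place of doUpdate are all hypothetical, and details such as what happens to pending changes after a failure are deliberately left out.

```java
import java.util.List;
import java.util.function.Predicate;

/** Simplified, hypothetical model of batching and throttling; not the engine's code. */
class BatchingSketch {

    /** Counters describing what happened during a run. */
    static final class Result {
        int updated; // doUpdate returned true
        int skipped; // doUpdate returned false
        int failed;  // doUpdate threw an exception
        int saves;   // number of times a batch was written
    }

    static <T> Result run(List<T> nodes, Predicate<T> doUpdate,
                          int batchSize, long throttleMillis) {
        Result result = new Result();
        int pending = 0; // updated nodes not yet written
        for (T node : nodes) {
            try {
                if (doUpdate.test(node)) {
                    result.updated++;
                    pending++;
                } else {
                    result.skipped++;
                }
            } catch (RuntimeException e) {
                result.failed++;
            }
            if (pending >= batchSize) {
                result.saves++; // stands in for writing the batch to the repository
                pending = 0;
                if (throttleMillis > 0) {
                    try {
                        Thread.sleep(throttleMillis); // throttle: pause after each batch
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return result; // stop early if the run is interrupted
                    }
                }
            }
        }
        if (pending > 0) {
            result.saves++; // write the final, partially filled batch
        }
        return result;
    }
}
```

In this model only nodes for which doUpdate returns true count toward the batch, so a script that skips most nodes writes far less often than the number of visited nodes would suggest.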
Logging
- Log Level
You can select the log level of an updater script: TRACE, DEBUG, or INFO. The default is DEBUG. For example, if you set the log level to INFO in the Updater Editor, log messages at TRACE or DEBUG level in the script are not printed.
As of Bloomreach Experience Manager 12.6.26, 13.4.17, 14.7.6, and 15.0.0:
- Only when using the cargo.run profile (local development), log messages by Groovy Updater Scripts are written to JCR nodes and displayed in the UI when running the script.
- In all other scenarios, log messages are written to regular log files using the logger for org.onehippo.repository.update.UpdaterExecutionReport and not displayed in the UI.
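The effect of the selected log level can be illustrated with a minimal threshold filter. The class LogLevelSketch and its Level enum are hypothetical and only model the three selectable levels named above. This is not the product's logging code.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of log-level filtering. */
class LogLevelSketch {

    // Declared from most to least verbose, so enum order gives the comparison.
    enum Level { TRACE, DEBUG, INFO }

    private final Level threshold;
    private final List<String> output = new ArrayList<>();

    LogLevelSketch(Level threshold) {
        this.threshold = threshold;
    }

    /** Keeps the message only if its level is at or above the selected level. */
    void log(Level level, String message) {
        if (level.compareTo(threshold) >= 0) {
            output.add(level + " " + message);
        }
    }

    List<String> output() {
        return output;
    }
}
```

With the threshold set to INFO, calls at TRACE and DEBUG are dropped and only the INFO message is kept.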
Parameters
Scripts can accept Parameters:
- Parameters can be specified with a valid JSON string which defines a map of parameter name (String) and parameter value (Object) pairs.
Example: { "basePath": "/content/documents/myhippoproject/news", "tag" : "gogreen" }
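Because the parameters form a map of names to arbitrary objects, a script usually has to check the type of a value before using it. The following is a minimal sketch of such a lookup in plain Java. The helper stringParam and the class name are hypothetical, and how the parsed map is handed to a script is not shown here.

```java
import java.util.Map;

/** Hypothetical helper for reading a String parameter with a fallback. */
class ParametersSketch {

    /** Returns the named parameter if it is a String, otherwise the fallback. */
    static String stringParam(Map<String, Object> parameters, String name, String fallback) {
        Object value = parameters.get(name);
        return (value instanceof String) ? (String) value : fallback;
    }
}
```

For the example above, looking up "tag" yields "gogreen", while a name that is absent from the JSON yields the fallback.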
Execution Mode
There are two ways to execute a script:
- Execute will visit all specified nodes and save the changes to the repository after each batch. The UUIDs of all modified nodes are logged in case the script has to be undone later.
- Dry run will also visit all specified nodes, but never write any changes to the repository (i.e. the engine calls Session.refresh(false) after each batch).
Use dry run to try out new scripts without risk.
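The difference between the two modes comes down to what happens to pending changes at the end of a batch. The sketch below models this with two in-memory maps. The class DryRunSketch and its methods are hypothetical stand-ins: copying pending changes plays the role of saving, and clearing them plays the role of Session.refresh(false).

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified, hypothetical model of Execute versus Dry run. */
class DryRunSketch {

    private final Map<String, String> persisted = new HashMap<>(); // "repository" state
    private final Map<String, String> pending = new HashMap<>();   // unsaved changes

    DryRunSketch(Map<String, String> initial) {
        persisted.putAll(initial);
    }

    /** Records a change that has not been written yet. */
    void set(String path, String value) {
        pending.put(path, value);
    }

    /** Ends a batch: Execute writes pending changes, Dry run discards them. */
    void endBatch(boolean dryRun) {
        if (!dryRun) {
            persisted.putAll(pending); // analogous to saving the batch
        }
        pending.clear(); // in a dry run, analogous to Session.refresh(false)
    }

    String persistedValue(String path) {
        return persisted.get(path);
    }
}
```

In both modes the script's update logic runs in full, which is why a dry run still produces meaningful log output.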
Automatically Execute Updater Scripts on Startup
It's possible to automatically execute scripts on startup by using the repository-data-application module to add the scripts as content definitions to /hippo:configuration/hippo:update/hippo:queue. Once the application has started, it will execute any scripts in the queue.
In a delivery-tier-only environment, only the functionality provided by Hippo Repository might be available on the classpath.
Undo Updates
An updater script can support undo of its modifications by implementing the undoUpdate method.
Executed scripts in the History can be undone by clicking the Undo button. The updater engine then revisits only the nodes that were previously modified by the doUpdate method, and calls undoUpdate for each of them.
Items in the History that were dry run or were the result of an undo run cannot be undone.
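The undo behaviour described above can be modelled as follows: the run records which nodes doUpdate actually modified, and the undo run visits only those. Everything in this sketch is hypothetical (the class UndoSketch, the node ids, and the "-migrated" suffix), and it uses an in-memory map instead of repository nodes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of an execute run followed by an undo run. */
class UndoSketch {

    private static final String SUFFIX = "-migrated";

    final Map<String, String> titles = new LinkedHashMap<>(); // node id -> title
    final List<String> modifiedIds = new ArrayList<>();       // recorded during execute

    /** Appends the suffix; returns false (skipped) if it is already present. */
    boolean doUpdate(String id) {
        String title = titles.get(id);
        if (title.endsWith(SUFFIX)) {
            return false;
        }
        titles.put(id, title + SUFFIX);
        return true;
    }

    /** Reverses doUpdate by removing the suffix. */
    boolean undoUpdate(String id) {
        String title = titles.get(id);
        if (!title.endsWith(SUFFIX)) {
            return false;
        }
        titles.put(id, title.substring(0, title.length() - SUFFIX.length()));
        return true;
    }

    /** Visits every node and records the ids that doUpdate modified. */
    void execute() {
        for (String id : new ArrayList<>(titles.keySet())) {
            if (doUpdate(id)) {
                modifiedIds.add(id);
            }
        }
    }

    /** Visits only the recorded ids, so nodes skipped by execute stay untouched. */
    void undo() {
        for (String id : modifiedIds) {
            undoUpdate(id);
        }
        modifiedIds.clear();
    }
}
```

A node whose title already ended in "-migrated" before the run is skipped by doUpdate, is therefore not recorded, and keeps its suffix after the undo. This is why restricting the undo run to previously modified nodes matters.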