Run an Updater Script
Introduction
Goal
Run a Groovy Updater Script to perform bulk changes on repository content.
Background
The Updater Editor allows developers to create, manage and run updater scripts against a running repository from within the CMS UI. Updater scripts can perform bulk changes to existing content.
See Write an Updater Script for more information. This page explains the execution options available in the Updater Editor.
With Great Power Comes Great Responsibility
Updater scripts can modify large parts of your repository. Use them with care.
Security
The scripts are executed via a custom Groovy ClassLoader which protects against obvious and trivial mistakes and misuse (for example, invoking System.exit()). However, this is not intended to provide a fully protected Groovy sandbox. Technically, Groovy updater scripts can still be used to execute external programs, possibly compromising the server environment. Therefore, protection against incorrect usage of Groovy updater scripts must be enforced by limiting access and usage to trusted developers and administrators only.
Manage Updater Scripts
The left side of the Updater Editor consists of three parts:
- Registry
Contains all created updater scripts. Select a script from the registry to execute it.
- Queue
Contains all scripts that are being executed or are waiting to be executed. The scripts are executed in the order in which they were added to the queue. Only one script is executed at a time, even in a clustered environment. You can stop the currently executing script and delete queued scripts from the queue. Stopping a script will finish the current NodeUpdateVisitor#doUpdate call before actually stopping. The output of the script is shown in the bottom part of the screen and is refreshed every few seconds.
- History
Contains all scripts that have been (fully or partially) executed. Executed scripts can be reverted from here, provided they support this by implementing undoUpdate.
Execution Options
Node Selection
The updater engine uses the visitor pattern. Which nodes are visited is specified by Select node using (Repository path, XPath query, or Updater):
- Repository path is an absolute path in the repository, for example: /content/documents or /hst:hst/hst:configurations. All nodes below the path will be visited, including the node specified by the path itself.
- XPath query is an XPath query that selects the nodes to visit. Example queries are:
//element(*, hippo:document)
all nodes of type 'hippo:document'
/jcr:root/hst:hst/hst:configurations//element(*, hst:sitemapitem)
all nodes of type 'hst:sitemapitem' below /hst:hst/hst:configurations
//*[@example:title='foo']
all nodes that have the property 'example:title' set to the value 'foo'
Try out XPath queries in the repository servlet at http://localhost:8080/cms/repository
- Updater indicates that the script itself provides the logic for navigating to the nodes to visit. The script must implement (override) the firstNode and nextNode methods provided by the BaseNodeUpdateVisitor base class.
Performance
Changes to visited nodes are saved in batches. Each executed script can specify a Batch Size and a Throttle value:
- batch size is the number of nodes that have to be modified before changes are written to the repository. The engine counts updated nodes by checking the return value of the #doUpdate(Node) method: true means updated, false means skipped, and an exception or error means failed. Keep the batch size reasonably low, say fifty or a hundred, to avoid large changesets that consume a lot of memory.
See the Reporting of Execution section below for more detail.
- throttle is the number of milliseconds to wait after each batch. This prevents a running repository from being swamped with changes and becoming unresponsive to other users.
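The interplay of batch size, throttle, and the doUpdate return value can be sketched in plain Java. This is a toy simulation, not the updater engine's code: the class BatchSimulation, its Report counters, and the use of strings as stand-ins for nodes are all hypothetical, and saving a final partial batch at the end of the run is an assumption. The sketch also does not model what happens to pending changes when a node fails.

```java
import java.util.List;
import java.util.function.Predicate;

/** Toy model of batching and throttling; not the updater engine's actual code. */
final class BatchSimulation {

    /** Counters collected during one simulated run. */
    static final class Report {
        int updated;
        int skipped;
        int failed;
        int saves; // number of times a batch of changes was "written"
    }

    /**
     * Visits each node once. doUpdate returning true counts as updated,
     * false as skipped, and a thrown RuntimeException as failed.
     */
    static Report run(List<String> nodes, Predicate<String> doUpdate,
                      int batchSize, long throttleMillis) {
        Report report = new Report();
        int pending = 0; // updated nodes not yet saved
        for (String node : nodes) {
            try {
                if (doUpdate.test(node)) {
                    report.updated++;
                    pending++;
                } else {
                    report.skipped++;
                }
            } catch (RuntimeException e) {
                report.failed++;
            }
            if (pending >= batchSize) {
                report.saves++;
                pending = 0;
                pause(throttleMillis); // leave room for other users after each batch
            }
        }
        if (pending > 0) {
            report.saves++; // assumption: a final partial batch is saved too
        }
        return report;
    }

    /** Sleeps for the throttle interval; restores the interrupt flag if interrupted. */
    private static void pause(long millis) {
        if (millis <= 0) {
            return;
        }
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this model, only updated nodes count toward the batch size. Skipped and failed nodes do not trigger a save, so a run that skips most nodes writes far less often than the number of visited nodes suggests.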
Logging
- Log Level
You can select the log level of an Updater script from one of the following: TRACE, DEBUG and INFO. DEBUG is the default. For example, if you set the log level to INFO in the Updater Editor, log messages at TRACE or DEBUG level in the script will not be printed.
- Log Target (available since Bloomreach Experience Manager version 13.4.22)
You can select the target to which the Updater script writes log messages from one of the following: LOG FILES or REPOSITORY. When LOG FILES is selected, log messages are written to regular log files using the logger for org.onehippo.repository.update.UpdaterExecutionReport and not displayed in the UI. When REPOSITORY is selected, log messages are written to JCR nodes and displayed in the UI when running the script.
The Log Target option is only available when running in local development mode (i.e. using the cargo.run profile) or when the system property groovy.persist.logs.supported is set to true. In all other scenarios, the Log Target option is not available and log messages are written to log files only.
Log Target REPOSITORY should be avoided where frequently run scripts produce large log output, as this would fill up the datastore quickly and lead to performance issues.
In Bloomreach Experience Manager versions 13.4.17 up to and including 13.4.21:
- Only when using the cargo.run profile (local development), log messages by Groovy Updater Scripts are written to JCR nodes and displayed in the UI when running the script.
- In all other scenarios, log messages are written to regular log files using the logger for org.onehippo.repository.update.UpdaterExecutionReport and not displayed in the UI.
In Bloomreach Experience Manager versions 13.4.16 and earlier, log messages by Groovy Updater Scripts are always written to JCR nodes and displayed in the UI when running the script.
Parameters
Scripts can accept Parameters:
- Parameters can be specified with a valid JSON string which defines a map of parameter name (String) and parameter value (Object) pairs.
Example: { "basePath": "/content/documents/myproject/news", "tag" : "gogreen" }
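Because parameter values arrive as Object, a script usually has to check the type before using a value. The following plain-Java sketch shows one way to do that. The class ParameterSketch and its stringParam helper are hypothetical, and the map passed in stands in for whatever the engine produces after parsing the JSON string.

```java
import java.util.Map;

/** Hypothetical helper for reading typed values from a parsed parameter map. */
final class ParameterSketch {

    /** Returns the named parameter if it is a String, otherwise the fallback. */
    static String stringParam(Map<String, Object> params, String name, String fallback) {
        Object value = params.get(name);
        return (value instanceof String) ? (String) value : fallback;
    }
}
```

With the example parameters above, stringParam(params, "tag", "none") yields "gogreen", while a missing or non-string parameter falls back to the supplied default.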
Execution Mode
There are two ways to execute a script:
- Execute will visit all specified nodes and save the changes to the repository after each batch. The UUIDs of all modified nodes are logged in case the script has to be undone later.
- Dry run will also visit all specified nodes, but never write any changes to the repository (i.e. the engine calls Session.refresh(false) after each batch).
Use dry run to try out new scripts without risk.
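The difference between the two modes can be sketched as a small plain-Java simulation. It does not use the JCR API: the class DryRunSimulation is hypothetical, a map of node id to title stands in for the repository, and discarding the pending map plays the role of Session.refresh(false).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of Execute versus Dry run; a stand-in for a JCR session, not repository code. */
final class DryRunSimulation {

    /**
     * Sets the title of every listed node to newTitle and returns the
     * stored state afterwards. The input map is left untouched.
     */
    static Map<String, String> run(Map<String, String> persisted, List<String> nodeIds,
                                   String newTitle, int batchSize, boolean dryRun) {
        Map<String, String> stored = new HashMap<>(persisted);
        Map<String, String> pending = new HashMap<>(); // unsaved changes of the current batch
        for (String id : nodeIds) {
            pending.put(id, newTitle);
            if (pending.size() >= batchSize) {
                finishBatch(stored, pending, dryRun);
            }
        }
        finishBatch(stored, pending, dryRun); // handle a final partial batch
        return stored;
    }

    private static void finishBatch(Map<String, String> stored,
                                    Map<String, String> pending, boolean dryRun) {
        if (!dryRun) {
            stored.putAll(pending); // Execute: write the batch
        }
        pending.clear(); // in a dry run the pending changes are simply dropped
    }
}
```

Both modes walk the same nodes and run the same update logic. Only the final step of each batch differs, which is why a dry run is a reasonable rehearsal for the real execution.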
Automatically Execute Updater Scripts on Startup
It's possible to automatically execute scripts on startup by using the repository-data-application module to add the scripts as content definitions to /hippo:configuration/hippo:update/hippo:queue. Once the application has started, it will execute any scripts in the queue.
As of version 13, the updater execution module is configured by default to run scripts on full CMS nodes only.
Technically, it is possible to automatically execute a script in a delivery-tier-only environment as long as both of the following are true:
- At the node /hippo:configuration/hippo:modules/updater-execution, the property hipposys:cmsonly is set to false.
- The updater script only depends on libraries available on the classpath in that environment (typically this does not include CMS libraries!).
Undo Updates
An updater script can support undo of its modifications by implementing the undoUpdate method.
Scripts in the History that have been executed can be undone by clicking the Undo button. The updater engine will then revisit only the nodes that were previously modified by the doUpdate method and call undoUpdate on each of them.
Items in the History that were dry run or were the result of an undo run cannot be undone.
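The undo flow can be sketched as a plain-Java simulation. Everything here is hypothetical: the class UndoSimulation, the title suffix it adds, and the map of node id to title that stands in for the repository. It illustrates why only the nodes reported as modified are revisited. A node that already matched the target state was skipped by doUpdate and must not be touched by the undo run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy model of an undoable update; not the updater engine's actual code. */
final class UndoSimulation {

    static final String SUFFIX = " (archived)";

    /** Marks a title as archived; returns false (skipped) if it is already marked. */
    static boolean doUpdate(Map<String, String> titles, String id) {
        String title = titles.get(id);
        if (title.endsWith(SUFFIX)) {
            return false;
        }
        titles.put(id, title + SUFFIX);
        return true;
    }

    /** Reverses doUpdate for a single node. */
    static boolean undoUpdate(Map<String, String> titles, String id) {
        String title = titles.get(id);
        if (!title.endsWith(SUFFIX)) {
            return false;
        }
        titles.put(id, title.substring(0, title.length() - SUFFIX.length()));
        return true;
    }

    /** Execute run: visits all ids and returns those that were actually modified. */
    static List<String> execute(Map<String, String> titles, List<String> ids) {
        List<String> modified = new ArrayList<>();
        for (String id : ids) {
            if (doUpdate(titles, id)) {
                modified.add(id); // plays the role of the logged UUIDs
            }
        }
        return modified;
    }

    /** Undo run: visits only the ids recorded by the execute run. */
    static void undo(Map<String, String> titles, List<String> modified) {
        for (String id : modified) {
            undoUpdate(titles, id);
        }
    }
}
```

If the undo run visited every node instead of the recorded ones, a title that was already archived before the script ran would lose its suffix, leaving the repository in a state it was never in.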