How Replication Works
Replication in Hippo is implemented using a push approach. Changes on the source are packaged and sent to the target where they are applied. The monitoring of the changes and the sending of the packages are separate, asynchronous processes.
The Change Monitor
Changes to the repository are monitored by periodically checking the repository journal for updates. This means that even if you don't run the CMS in a cluster, you still need to configure a repository journal for replication to work.
Reading the repository journal means processing JCR events. Each event occurs either inside or outside the scope of a unit of replication, where a unit of replication is a set of nodes that is sent to the target as a single serialised XML file. An example of a unit of replication is a single document variant with all its subnodes.
Events that occur outside the scope of a unit of replication are ignored. These are events on nodes that should not be replicated.
Every unit of replication in which a change is detected is recorded in a change log. A single change log contains the units of replication that were modified by a single session save operation. Because a change log represents all the nodes that were changed within that save operation, consistency is preserved: as long as the application built on top of the repository saves changes that are internally consistent with its own model, replication of those changes is likewise consistent.
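The mapping from events to a change log described above can be sketched as follows. This is a conceptual model only, not Hippo's implementation: the function names, the idea of passing unit roots as a list of paths, and the example paths are all assumptions made for illustration.

```python
def unit_for(path, unit_roots):
    """Return the unit root that contains `path`, or None if the path is out of scope."""
    for root in unit_roots:
        # A path belongs to a unit if it is the unit root itself or lies below it.
        if path == root or path.startswith(root + "/"):
            return root
    return None


def build_change_log(event_paths, unit_roots):
    """Collect the units of replication touched by the events of one session save."""
    units = set()
    for path in event_paths:
        unit = unit_for(path, unit_roots)
        if unit is not None:  # events outside every unit are ignored
            units.add(unit)
    return frozenset(units)


# Hypothetical example: one save that touches a document variant and a
# node that is not replicated.
roots = ["/content/documents/news/item"]
events = ["/content/documents/news/item/body", "/hippo:configuration/x"]
change_log = build_change_log(events, roots)
```

In this sketch the change log ends up holding only the document variant, because the second event falls outside every unit of replication.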
However, as the queue of change logs to be processed builds up, newly added change logs may intersect with change logs already in the queue: a node that was changed as part of an earlier operation is changed again as part of a later operation. To ensure consistency, both changes must be replicated at the same time.
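One way to picture the handling of intersecting change logs is to take the oldest change log from the queue together with every other queued change log that shares a unit of replication with it. The sketch below does this transitively, which is an assumption; the source only states that intersecting changes must be replicated together. The function name and the representation of a change log as a set of unit paths are hypothetical.

```python
def next_bundle(queue):
    """Remove and return the oldest change log plus all queued logs that overlap it.

    `queue` is a list of sets of unit paths, oldest first. Overlap is followed
    transitively (an assumption), and the bundle keeps queue order.
    """
    if not queue:
        return []
    picked = {0}
    units = set(queue[0])
    grew = True
    while grew:  # repeat until no further change log overlaps the bundle
        grew = False
        for i, log in enumerate(queue):
            if i not in picked and units & log:
                picked.add(i)
                units |= log
                grew = True
    bundle = [queue[i] for i in sorted(picked)]
    queue[:] = [log for i, log in enumerate(queue) if i not in picked]
    return bundle


queue = [frozenset({"a"}), frozenset({"b", "c"}), frozenset({"a", "b"}), frozenset({"d"})]
bundle = next_bundle(queue)
```

Here the first three change logs end up in one bundle, because the third overlaps both the first and the second, while the change log for "d" stays in the queue.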
Sending Replication Packages
While the change monitor detects changes, creates change logs, and queues them, a parallel process consumes the queued change logs. This is the process that creates packages and sends them to the target. A package is a zip file that contains XML representations of the units of replication to be imported on the target. This XML representation is similar to JCR system view XML, with a few enhancements.
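To make the package format concrete, the following sketch builds a zip archive with one XML entry per unit of replication. The entry names and the XML content are placeholders; the actual layout of a Hippo replication package is not specified here.

```python
import io
import zipfile


def build_package(units):
    """Return zip bytes with one entry per unit of replication.

    `units` maps a (hypothetical) entry name to the unit's serialised XML.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, xml in units.items():
            archive.writestr(name, xml)
    return buffer.getvalue()


# Placeholder content; real packages hold system-view-like XML.
package = build_package({"unit-1.xml": "<node/>"})
```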
Because packages are created from bundles of change logs, concurrent changes to the included replication scopes may disturb the consistency of the package. When such an overlapping change is detected, the package is abandoned and a new attempt is made to create the next pending package.
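The abandon-on-overlap rule can be sketched as a check that runs after serialisation. How Hippo actually detects overlapping changes is not described here, so the two callbacks below (`serialise` and `changed_units_since_start`) are hypothetical stand-ins.

```python
def try_create_package(units, serialise, changed_units_since_start):
    """Serialise every unit; return None if a unit changed in the meantime.

    `serialise(unit)` returns the unit's XML, and `changed_units_since_start()`
    returns the units modified while the package was being built. Both are
    assumptions made for this sketch.
    """
    entries = {unit: serialise(unit) for unit in sorted(units)}
    if set(units) & set(changed_units_since_start()):
        return None  # overlapping change detected: abandon this package
    return entries
```

A caller would treat a `None` result as the signal to drop the attempt and build the next pending package from the current queue.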
Importing Replication Packages
The source repository sends the package to the target repository over REST. The target unpacks the package and imports the XML-serialised units of replication. This happens synchronously, so that errors can be reported back to the source as part of the REST response. Some errors may trigger action on the source to fix the problem automatically. For instance, if a unit of replication could not be imported because of a missing ancestor, the source will initiate a partial sync of the subtree rooted at the missing ancestor.
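The error handling on the source can be pictured as follows. The shape of the response (an `errors` list whose items carry a `type` and a `path`) and the error type name are invented for this sketch; the real REST response format may differ.

```python
def handle_import_response(response, start_partial_sync):
    """React to import errors reported by the target.

    Starts a partial sync for every missing ancestor and returns the errors
    that were not handled automatically. The response layout is hypothetical.
    """
    unhandled = []
    for error in response.get("errors", []):
        if error.get("type") == "missing-ancestor":
            # Re-send the subtree rooted at the ancestor the target lacks.
            start_partial_sync(error["path"])
        else:
            unhandled.append(error)
    return unhandled
```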