Repository Data Modules Structure

Understand on a conceptual level how the repository data modules are structured.

Three-Dimensional Layout

Bloomreach Experience Manager defines repository data in a project along three different dimensions:

  1. Application vs. Development
  2. Config vs. Content
  3. Platform vs. Site

Carefully consider which slot in that three-dimensional layout each piece of your project's repository data belongs in.

Application vs. Development

On the first dimension, we separate definitions meant to affect all environments from definitions meant to be used in development environments only. Definitions for all environments go into the repository-data/application and repository-data/site modules, while definitions for development environments go into the repository-data/development and repository-data/site-development modules. Typically, the development repository data module depends on the application repository data module: you can use the application module without the development module, but not the development module without the application module.
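The one-way dependency between the two modules can be pictured with a minimal sketch. The module labels and the helper name are hypothetical shorthand chosen for this illustration; they are not identifiers from the product.

```python
# Hypothetical model of the dependency described above. The keys are
# shorthand labels for this sketch, not product identifiers.
DEPENDS_ON = {
    "application": set(),
    "development": {"application"},
}


def missing_dependencies(selected):
    """Return the modules that the selected modules need but that are absent."""
    selected = set(selected)
    missing = set()
    for module in selected:
        missing |= DEPENDS_ON[module] - selected
    return missing


print(missing_dependencies({"application"}))   # set()
print(missing_dependencies({"development"}))   # {'application'}
```

An empty result means the selection is usable as is; a non-empty result names what would have to be added.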

What is a development environment?

By development environment, we refer to an environment you use during the development of your project. This may be a local environment (on a laptop, for example), or it may be a remote server used for continuous integration, demos or testing. In both cases, what makes the environment a development environment is that you regularly reset the repository to a known, initial state, which typically includes a set of test configuration and content. Content created on a development environment may be used as seed content for a production deployment, but apart from that, new or changed content on a development environment is disposable. In contrast, content on a production system (essentially the opposite of a development environment) is very valuable and must be handled with great care.

What is application repository data?

Any definition that should show up in a production system’s repository upon deployment of the project should be considered application repository data. This almost certainly includes important system configuration data. It may also include seed content that gives CMS authors and editors a sensible starting point for creating production content.

What is development repository data?

In contrast to the application repository data, all config and content definitions that are not intended to end up on a production system should be considered development repository data. Typical examples of development repository data are:

  • Test users such as the archetype’s author and editor users, along with their group memberships.

  • Demo or test content, such as test fixtures relied upon by automated testing, or a set of content to facilitate manual smoke/regression testing or demonstration of relevant project features.

  • Demo or test configuration, such as configuration of the auto-export functionality (relevant only for local deployments with access to the project sources), or facet navigation nodes used for demo or test reports.

Deploying to a production(-like) environment

When deploying a distribution of your project to a production environment (or a production-like environment such as Acceptance), you should make sure that the development repository data module is not part of the distribution.

If you are working on validating an upgraded project against a production database copy locally, using the Maven cargo plugin, you also should not include development repository data in your setup. The Bloomreach Experience Manager archetype provides the -Pwithout-development-data Maven profile for this purpose.

Deploying to a development environment

When deploying to a local development environment (your laptop), make sure that both the application repository data module and the development repository data module are deployed. In Bloomreach Experience Manager’s archetype, the development module is included by default when deploying your project locally.

When deploying to a remote development environment, also make sure that both the application and the development module are included in your distribution. While the development module is most likely excluded by default when you build a distribution of your project, the Bloomreach Experience Manager archetype comes with the -Pdist-with-development-data Maven profile to include the development module in your distribution.
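The deployment rules from the two sections above can be summarized in a small sketch. The environment labels and the function name are hypothetical; in an actual project the selection is made through the build (for example via the Maven profiles mentioned above), not through code like this.

```python
# Environment labels are hypothetical names chosen for this sketch.
PRODUCTION_LIKE = {"production", "acceptance", "local-production-copy"}
DEVELOPMENT = {"local", "remote-development"}


def modules_to_deploy(environment):
    """Return the repository data modules a deployment should contain."""
    if environment in PRODUCTION_LIKE:
        # Development data must not reach a production(-like) repository.
        return ["application"]
    if environment in DEVELOPMENT:
        return ["application", "development"]
    raise ValueError(f"unknown environment: {environment!r}")


print(modules_to_deploy("acceptance"))          # ['application']
print(modules_to_deploy("remote-development"))  # ['application', 'development']
```

Note that validating an upgrade locally against a production database copy is grouped with the production-like cases here, because it also excludes the development module.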

Config vs. Content

The second dimension separates your project’s config definitions from its content definitions. To that end, both the application and the development module may contain an hcm-config and an hcm-content folder in their /src/main/resources directory.

Note that in previous Bloomreach Experience Manager releases, config and content were only loosely defined, and each typically had its own "bootstrap module". In Bloomreach Experience Manager 12 and higher, config and content are much more strictly defined and are separated within each repository data module.

What is Config?

Basically, ‘config’ is any repository data which is owned (created and maintained) by a Bloomreach Experience Manager project developer, and which is driven from the project’s sources. Bloomreach Experience Manager’s configuration management mechanism subdivides this data into namespace, webfilebundle and config definitions.

The hcm-config folder should contain all YAML Sources that define configuration data, namespaces and webfile bundles*. The config data, in turn, contains not only definitions for configuration nodes and properties, but also the repository data categorization (Which subtrees are content? Which subtrees or properties are system?) and initial values for system data.

Any externalized resources, referred to by the config or namespace definitions, also go into the hcm-config folder. For example, updater scripts can be externalized, such that the Groovy code is no longer embedded in the YAML Source, but more readable and maintainable in a separate *.groovy file. The same may be beneficial for configuration strings using the JSON format.

*Bloomreach Experience Manager has the limitation that a Maven module which contributes webfiles will get all directories under /src/main/resources bootstrapped into the repository’s /webfiles. For that reason, we recommend keeping your webfiles in a Maven module separate from the application and development modules.

What is Content?

In contrast to ‘config’, ‘content’ is any repository data which is owned (created and maintained) by CMS users. This includes primarily documents, image sets and assets, but also parts of the HST configuration data which can be adjusted by webmasters in the Experience Manager (and are stored in the “hst:workspace”, per configuration), and editable data provided by plugins and add-ons, such as URL rewriter rules, Relevance characteristics or reusable parts of forms.

Since this data can be created in the CMS, it is fair to ask why content even has a place in Bloomreach Experience Manager’s configuration management mechanism. The answer is: because we need to be able to bootstrap seed, test and demo content. While test and demo content typically go into the development data module, seed content is what you’ll most likely encounter in the application data module. In cases where your seed content overlaps or conflicts with test or demo content, you could even consider introducing an extra repository data module to contain data which is not intended to be included on development environments.

The hcm-content folder must contain only YAML content Sources, i.e. a single definition (root) per Source. Also, the definition root nodes in your hcm-content folder must match the repository data categorization specified in the config data. If you root a subtree of content at a node which is not declared to be content in your configuration model, the CMS will throw an exception.
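The matching rule can be pictured as a path check. The sketch below is a simplified model, assuming the categorization boils down to a list of content root paths; the actual categorization rules in the product may be more fine-grained, and the function name and example paths are hypothetical.

```python
def is_under_content_root(definition_root, content_roots):
    """Check whether a definition root lies at or below a content-category path."""
    for root in content_roots:
        prefix = root.rstrip("/") + "/"
        if definition_root == root or definition_root.startswith(prefix):
            return True
    return False


# Hypothetical categorization: only /content is declared to be content.
content_roots = ["/content"]
print(is_under_content_root("/content/documents/myproject", content_roots))      # True
print(is_under_content_root("/hippo:configuration/hippo:users", content_roots))  # False
print(is_under_content_root("/contentious", content_roots))                      # False
```

The comparison uses whole path segments, so a sibling node whose name merely starts with the same characters does not count as content.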

Bloomreach Experience Manager’s configuration management mechanism has been designed to be very careful not to interfere with any repository data marked as content. It therefore remembers when a piece (subtree) of content is bootstrapped, in order not to bootstrap it again later. As such, once the seed content has been bootstrapped, its ownership is transferred from the Bloomreach Experience Manager project developer (who developed the seed content) to the Bloomreach Experience Manager user (who will maintain the content, going forward).
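The "bootstrap once, then hands off" behavior can be illustrated with a toy model. This is not the product's implementation; the class, its attributes and the example path are hypothetical, and a plain dictionary stands in for the repository.

```python
class ContentBootstrapper:
    """Toy model: apply each content definition root at most once."""

    def __init__(self):
        self.already_bootstrapped = set()  # roots applied in earlier runs
        self.repository = {}               # stand-in for the real repository

    def bootstrap(self, path, seed_value):
        """Apply seed content unless this root was bootstrapped before."""
        if path in self.already_bootstrapped:
            return False  # leave user-owned content untouched
        self.repository[path] = seed_value
        self.already_bootstrapped.add(path)
        return True


bootstrapper = ContentBootstrapper()
first = bootstrapper.bootstrap("/content/documents/news", "seed")
bootstrapper.repository["/content/documents/news"] = "edited by a CMS user"
second = bootstrapper.bootstrap("/content/documents/news", "seed")
print(first, second)  # True False
print(bootstrapper.repository["/content/documents/news"])  # edited by a CMS user
```

The second call is skipped, so the edit made by the CMS user survives a redeployment of the same seed content.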

On a development environment, there is, by definition, no existing content, and therefore, careful categorization into config and content is less important for the repository data in the development module. In order to get used to carefully working with config and content in the application module, we recommend applying the same approach to the development module anyway. But yes, you do have more ‘freedom’ there.

Custom Categorization

Appropriate categorization of the repository data into config and content is thus essential to the proper operation (and upgrade) of your project. Bloomreach Experience Manager attempts to provide a sensible categorization out of the box (i.e. by means of the auto export configuration), but your project may have specific reasons to adopt a modified categorization. If that’s the case, it is your task to customize the categorization in your application module’s config definitions appropriately. Make sure to validate the effect of these customizations before you deploy an upgrade to a production system, in order to avoid accidental loss (or mix-up) of your project’s precious content.

Platform vs. Site

Finally, the third dimension separates the platform web application from the site web application(s). When running in multi-site mode (default since v13.0), each site application is responsible for bootstrapping its own repository data. Therefore, the data needs to be contained in the site's own WAR. This allows for flexibility in the timing of deployments: sites can be updated separately from each other and from the core CMS project.

What is platform data?

All repository data definitions pertaining to the CMS/platform web application are considered platform data. This includes, but is not limited to, CMS configuration, namespaces and content types, and security domains.

What is site data?

All repository data definitions pertaining to a particular site web application are considered site data. This includes, but is not limited to, HST configuration and web files.
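To tie the three dimensions together, here is a minimal sketch that resolves one slot of the layout to a source folder. The module and folder names come from this page; pairing the site and site-development modules with the site dimension, and assuming they use the same /src/main/resources folder layout as the platform modules, is an inference from their names. The function and the dimension labels are hypothetical.

```python
# Module directories as named in the "Application vs. Development" section.
# Pairing them with platform/site is an assumption based on their names.
MODULES = {
    ("application", "platform"): "repository-data/application",
    ("development", "platform"): "repository-data/development",
    ("application", "site"): "repository-data/site",
    ("development", "site"): "repository-data/site-development",
}
FOLDERS = {"config": "hcm-config", "content": "hcm-content"}


def slot_path(scope, kind, webapp):
    """Return the source folder for one slot of the three-dimensional layout."""
    return f"{MODULES[(scope, webapp)]}/src/main/resources/{FOLDERS[kind]}"


print(slot_path("development", "content", "platform"))
# repository-data/development/src/main/resources/hcm-content
print(slot_path("application", "config", "site"))
# repository-data/site/src/main/resources/hcm-config
```

Two values per dimension give eight slots in total; an unknown combination raises a KeyError rather than guessing a location.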
