HTML cleaning
The contents of HTML fields can be cleaned on both the client side and the server side.
Client-side
Client-side HTML cleaning is done by CKEditor itself. This feature is called Advanced Content Filter (ACF). Each plugin and command added to or removed from CKEditor influences the allowed HTML. For example, when there is no plugin to add an image, <img> tags will be removed automatically. This filtering also applies to attributes, which can, for instance, be allowed or required.
ACF can also be controlled per editor instance via the configuration property extraAllowedContent. Note that since Bloomreach Experience Manager 12, extraAllowedContent must be specified in JSON object format. For example:
{ extraAllowedContent: {q: {}, cite: {classes: 'myclass'}} }
More information on ACF and how to configure it can be found on the CKEditor documentation website.
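As an illustration of the difference between allowed and required attributes, the sketch below extends the example above using CKEditor's object format for allowed content rules. The `abbr` element and its `title` attribute are assumptions chosen for illustration only; a leading `!` marks an attribute as required, so an element without it is filtered out.

```javascript
// Hypothetical per-editor configuration; the abbr rule is illustrative
// and not part of the default product configuration.
const editorConfig = {
  extraAllowedContent: {
    // <q> is allowed without any extra attributes or classes.
    q: {},
    // <cite> is allowed and may carry the class "myclass".
    cite: { classes: 'myclass' },
    // <abbr> is allowed only when it has a title attribute
    // (the "!" prefix marks the attribute as required).
    abbr: { attributes: '!title' }
  }
};
```

Only the object held in `editorConfig` would go into the editor configuration; the `const` wrapper is there to make the snippet valid JavaScript on its own.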
Disable client-side HTML cleaning
ACF is enabled by default. To disable ACF, set the CKEditor property allowedContent to true:
ckeditor.config.overlayed.json:
{ allowedContent: true }
Server-side
Server-side HTML cleaning is done by an HTML-processor. The HTML-processor checks, cleans, and corrects the output of rich-text fields, and also manages internal links and images. The HTML-processor is configured with an allowlist that defines which elements are allowed and which attributes they may contain. An element or attribute that is not configured as allowed is stripped from the output (the text nodes of a stripped element are preserved).
By default, server-side HTML cleaning also removes any usage of the javascript: protocol and, as of Bloomreach Experience Manager 12.0.4/12.1.1, the data: protocol within <a> href and <object> data attributes. This security feature can be disabled by setting the omitJavascriptProtocol configuration property to false (see the Configuration section below).
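The protocol check described above can be pictured with the following conceptual sketch. It is not the HTML-processor's actual code: the function names, the normalization step, and the choice to signal a dropped value with `null` are all assumptions made for illustration.

```javascript
// Conceptual sketch only; every name here is hypothetical.
// The attributes named in the text: href on <a> and data on <object>.
const CHECKED_ATTRIBUTES = { a: 'href', object: 'data' };
const BLOCKED_PROTOCOLS = ['javascript:', 'data:'];

function usesBlockedProtocol(value) {
  // Drop tabs and line breaks, trim, and lowercase before comparing,
  // so that "JavaScript:" and " javascript:" are treated alike.
  const normalized = value.replace(/[\t\n\r]/g, '').trim().toLowerCase();
  return BLOCKED_PROTOCOLS.some((protocol) => normalized.startsWith(protocol));
}

function cleanAttribute(elementName, attributeName, value, omitJavascriptProtocol = true) {
  const checked =
    CHECKED_ATTRIBUTES[elementName.toLowerCase()] === attributeName.toLowerCase();
  if (omitJavascriptProtocol && checked && usesBlockedProtocol(value)) {
    return null; // assumption of this sketch: null means "drop the value"
  }
  return value;
}
```

For example, `cleanAttribute('a', 'href', 'JavaScript:alert(1)')` returns `null`, while the same call with `omitJavascriptProtocol` set to `false` returns the value unchanged.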
Configuration
A CKEditor field is configured with an HTML-processor by setting the configuration property htmlprocessor.id. This property can be specified either in the cluster.options node of a field of a specific document type, or globally (i.e. for all formatted and/or richtext fields). The value of this property must match the name of an HTML-processor configuration node defined in the HTML-processor module, which is located at:
/hippo:configuration/hippo:modules/htmlprocessor/hippo:moduleconfig
By default, the CMS is bootstrapped with the following HTML-processor configurations:
- formatted: contains an allowlist of elements used in Formatted fields.
- richtext: contains an allowlist of elements used in Rich Text fields and manages internal links and images.
- no-filter: contains an empty allowlist but does manage internal links and images when applied to Rich Text fields.
The configuration node of an HTML-processor is of nodetype hipposys:moduleconfig and has the following properties available:
- charset: the character set of the output. Defaults to UTF-8.
- serializer: the type of serializer to use. Valid values are pretty, compact, and simple. Defaults to simple.
- convertLineEndings: whether to convert CRLF to LF when storing HTML, and vice versa when reading HTML. Defaults to true.
- omitComments: whether to strip comments from the HTML. Defaults to false.
- omitJavascriptProtocol: whether usage of the javascript: and data: protocols is removed from the HTML. Defaults to true.
- filter: whether to apply allowlist filtering. Defaults to true.
Allowed HTML elements are defined as child nodes of nodetype hipposys:moduleconfig. The name of each such node corresponds to the name of the allowed element. An element node may contain a multi-valued property called attributes that lists the HTML attributes allowed on the element.
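To make the allowlist behavior concrete, here is a simplified model of it. This is a sketch under stated assumptions, not the HTML-processor's implementation: the tree representation, the `filterNodes` function, and the sample allowlist are hypothetical, and the sketch assumes that the content of a stripped element is kept in place.

```javascript
// Hypothetical allowlist mirroring the node structure described above:
// each key is an element node name, each value its "attributes" property.
const allowlist = {
  p: [],
  a: ['href', 'title'],
};

// Nodes are either plain strings (text) or { name, attributes, children }.
function filterNodes(nodes, allowed) {
  return nodes.flatMap((node) => {
    if (typeof node === 'string') {
      return [node]; // text nodes are always preserved
    }
    const children = filterNodes(node.children || [], allowed);
    if (!Object.prototype.hasOwnProperty.call(allowed, node.name)) {
      return children; // element is stripped, its content is kept
    }
    // Keep only the attributes listed for this element.
    const attributes = {};
    for (const [key, value] of Object.entries(node.attributes || {})) {
      if (allowed[node.name].includes(key)) {
        attributes[key] = value;
      }
    }
    return [{ name: node.name, attributes, children }];
  });
}

const input = [
  {
    name: 'p',
    attributes: { style: 'color:red' },
    children: [
      'Read ',
      { name: 'a', attributes: { href: '/docs', onclick: 'track()' }, children: ['the docs'] },
      { name: 'font', attributes: { color: 'red' }, children: [' now'] },
    ],
  },
];

const output = filterNodes(input, allowlist);
```

In this model the `style` and `onclick` attributes are dropped, the `font` element is removed, and its text " now" stays inside the paragraph.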
Disable server-side HTML cleaning
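To disable server-side HTML cleaning, set the configuration property htmlprocessor.id to no-filter. Internal links and images are still managed when this configuration is applied to Rich Text fields.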