HTML cleaning
The contents of HTML fields can be cleaned both on the client side and on the server side.
Client-side
Client-side HTML cleaning is done by CKEditor itself, using a feature called Advanced Content Filter (ACF). Every plugin and command added to or removed from CKEditor changes which HTML is allowed. For example, when no plugin for inserting images is loaded, <img> tags are removed automatically. The filtering also applies to attributes, which can, for instance, be marked as allowed or as required. ACF can also be configured per editor instance with the configuration property 'extraAllowedContent', which allows content in addition to what the loaded plugins permit.
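As a minimal sketch of the compact string format, the following ckeditor.config.overlayed.json fragment uses 'extraAllowedContent' to allow two extra kinds of markup. The elements, attribute and class used here (<abbr> with a title attribute, <span> with the class highlight) are hypothetical examples, not required values. Rules are separated by semicolons, allowed attributes go in square brackets, and allowed class names go in parentheses.

```json
{
  extraAllowedContent: 'abbr[title]; span(highlight)'
}
```

With this rule in place, ACF would keep markup such as <abbr title="...">, while other attributes on <abbr> would still be filtered out.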
More information on ACF and how to configure it can be found at the CKEditor documentation website.
Note that allowed content rules may be set in two different formats: the compact string format and the more powerful object format.
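The same kind of rule can be written in the object format, which 'extraAllowedContent' also accepts. The sketch below assumes the field's ckeditor.config.overlayed.json is used and, again, the element choices are hypothetical. Each key lists one or more space-separated element names, and its value lists the allowed attributes, styles and classes. A leading ! marks an attribute as required, so an <img> without a src attribute would not be accepted by this rule.

```json
{
  extraAllowedContent: {
    abbr: { attributes: 'title' },
    'h2 h3': { styles: 'text-align' },
    img: { attributes: '!src,alt', styles: 'width,height', classes: 'left,right' }
  }
}
```

The object format is more verbose, but it keeps attributes, styles and classes in separate fields, which can be easier to maintain than a long rule string.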
Disable client-side HTML cleaning
ACF is enabled by default. To disable it, set the CKEditor configuration property 'allowedContent' to true:
ckeditor.config.overlayed.json:
{ allowedContent: true }
Server-side
Server-side HTML cleaning is done by an HTML cleaner service. A CKEditor field uses the HTML cleaner service whose ID is set in the field's configuration property 'htmlcleaner.id'.
By default, the HTML Cleaner is used. It checks, cleans and corrects the output of rich-text fields. Its configuration is based on a whitelist that defines which elements are allowed and which attributes each of them may contain. Any element or attribute that is not whitelisted is stripped from the output; the text nodes inside a stripped element are preserved.
Server-side HTML cleaning also removes any use of the javascript: protocol and, as of Bloomreach Experience Manager 11.2.5, the data: protocol in the href attribute of <a> elements and the data attribute of <object> elements.
Configuration
The configuration is located at
/hippo:configuration/hippo:frontend/cms/cms-services/filteringHtmlCleanerService
The properties of this node are:
- charset: the character set of the output. Defaults to UTF-8.
- serializer: the type of serializer to use. Valid values are pretty, compact, and simple. Defaults to simple.
- service.id: the ID of the HTML cleaner service. Defaults to org.hippoecm.frontend.plugins.richtext.DefaultHtmlCleanerService.
- omitComments: whether to strip comments from the HTML. Defaults to false.
- filter: whether to apply whitelist filtering. Defaults to true.
A child node named whitelist contains one node per allowed HTML element; the name of each node is the name of the element to allow. An element node may have a multivalued property named attributes that lists the HTML attributes allowed on that element.
Disable server-side HTML cleaning
Set the field's configuration property 'htmlcleaner.id' to an empty string, or remove the property altogether.