HtmlCleanerService
Purpose
The HtmlCleanerService ensures that the content of an HTML field is correct XHTML by cleaning the content before it is saved (as the unpublished version). This matters when the user has added content by copy/paste or has edited in HTML mode. It also helps prevent XSS attacks by removing javascript: handlers from element attributes.
Typing in normal mode cannot produce incorrect HTML, so the HtmlCleanerService normally leaves such content unchanged.
The HtmlCleanerService preserves only those XHTML tags that are configured to be preserved. Non-HTML tags cannot be preserved. Tags that are not preserved are changed into the p or the div tag. If opening and closing tags do not match, the HtmlCleanerService removes the offending content without warning.
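The whitelist behaviour described above can be illustrated with a minimal Python sketch. This is not the HtmlCleanerService implementation: the names WHITELIST, FALLBACK_TAG, WhitelistCleaner and clean are hypothetical, the sketch always falls back to div (the service may choose p or div), and it does not model the removal of mismatched tags or the handling of empty elements such as br.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> attribute names kept on that tag.
WHITELIST = {"p": set(), "div": set(), "strong": set(), "a": {"href", "title"}}
FALLBACK_TAG = "div"  # assumption: this sketch always falls back to div


class WhitelistCleaner(HTMLParser):
    """Rewrites tags and attributes according to a whitelist."""

    def __init__(self, whitelist):
        super().__init__(convert_charrefs=True)
        self.whitelist = whitelist
        self.out = []

    def _target(self, tag):
        # Tags outside the whitelist are replaced by the fallback tag.
        return tag if tag in self.whitelist else FALLBACK_TAG

    def handle_starttag(self, tag, attrs):
        allowed = self.whitelist.get(tag, set())
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in allowed:
                continue  # attribute is not listed for this tag
            if value.strip().lower().startswith("javascript:"):
                continue  # drop javascript: handlers
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{self._target(tag)}{''.join(kept)}>")

    def handle_endtag(self, tag):
        self.out.append(f"</{self._target(tag)}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def clean(html, whitelist=WHITELIST):
    parser = WhitelistCleaner(whitelist)
    parser.feed(html)
    parser.close()  # flushes any buffered text
    return "".join(parser.out)
```

In this sketch, an attribute survives only if it is listed for its tag and its value does not start with javascript:, which mirrors the two rules stated above.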
Configuration
The configuration of the HtmlCleanerService is kept in the Hippo repository (together with all content). It can be changed via the Console under the node: /hippo:configuration/hippo:frontend/cms/cms-services/htmlCleanerService.
preserved tags | Each tag to preserve should be listed in the whitelist as a frontend:pluginconfig. Valid attributes should be listed in the multi-valued 'attributes' property.
no warning | If you add a non-HTML tag to the configuration, for example due to a typo, you will not get an error message, neither in the console nor in the logs of the repository and CMS.
non-HTML tags | Non-HTML tags in the configuration have no effect on preservation. They will still not be preserved, as that would not result in correct XHTML.
preserved attributes | Attributes that are listed in the multi-valued 'attributes' property are preserved for that particular tag. Other attributes are removed.
comments | Comments can be removed by setting the 'omitComments' property to true.
formatting | The serialization format of the cleaner is set by the 'serializer' property, which accepts 'simple', 'pretty' or 'compact'. The default is 'simple'.
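Because a mistyped tag name in the configuration produces no error message, it can help to check a planned configuration before entering it in the Console. The following Python sketch shows one way to do that. The dictionary layout, the check_config function and the KNOWN_XHTML_TAGS subset are assumptions made for illustration; they do not reflect how the configuration is stored in the repository.

```python
# Assumption: a small subset of XHTML element names, for illustration only.
KNOWN_XHTML_TAGS = {
    "a", "b", "blockquote", "br", "code", "div", "em", "h1", "h2", "h3",
    "i", "img", "li", "ol", "p", "pre", "span", "strong", "table", "tbody",
    "td", "th", "tr", "ul",
}
# Serializer values named in the table above.
SERIALIZERS = {"simple", "pretty", "compact"}


def check_config(config):
    """Return a list of problems found in a (hypothetical) cleaner config."""
    problems = []
    for tag in config.get("whitelist", {}):
        if tag not in KNOWN_XHTML_TAGS:
            problems.append(f"unknown tag in whitelist: {tag}")
    # 'simple' is the documented default when the property is absent.
    serializer = config.get("serializer", "simple")
    if serializer not in SERIALIZERS:
        problems.append(f"unknown serializer: {serializer}")
    if not isinstance(config.get("omitComments", False), bool):
        problems.append("omitComments must be true or false")
    return problems


# Example: 'strnog' is a typo that the service itself would silently ignore.
example = {
    "whitelist": {"p": [], "a": ["href", "title"], "strnog": []},
    "omitComments": True,
    "serializer": "pretty",
}
problems = check_config(example)
```

Here problems contains a single entry for the mistyped strnog tag, the kind of mistake that would otherwise go unnoticed.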