Configure Standard CMS Document Search
Some behavior of the standard document search in Hippo CMS is configurable.
Limiting matching document types
By default, only documents are searched. The set of document types that are searched can be customized.
To configure the search types in the document browse tab:
- Edit the node /hippo:configuration/hippo:frontend/cms/cms-browser/documentsTreeLoader/cluster.config
- Add the multi-valued string property nodetypes
- Each string value specifies a document type to include in the search
The search boxes in the images and assets tabs can be configured similarly in the nodes
/hippo:configuration/hippo:frontend/cms/cms-browser/imagesTreeLoader/cluster.config
and
/hippo:configuration/hippo:frontend/cms/cms-browser/assetsTreeLoader/cluster.config
Lucene Analyzer
Each token entered in the search box is interpreted by the tokenizer used by the Lucene analyzer. By default, Hippo CMS uses the StandardAnalyzer, which uses the StandardTokenizer. This tokenizer has the following behavior:
- Splits words at punctuation characters, removing punctuation. However, a dot that's not followed by whitespace is considered part of a token.
- Splits words at hyphens, unless there's a number in the token, in which case the whole token is interpreted as a product number and is not split.
- Recognizes email addresses and internet hostnames as one token.
As a result, a hyphen cannot be searched for unless the word containing it also contains a number. For example, searching for "foo-bar" is not possible, but searching for "foo-bar-5" is. However, it is possible to search for the individual parts of a hyphenated word. For example, searching for "bar" will match documents containing "foo-bar".
Punctuation cannot be searched for either, except for dots within words. For example, searching for "foo,bar" is not possible, but searching for "foo.bar" is.
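The tokenizer rules described above can be illustrated with a small Python sketch. This is only a rough approximation written for this page, not Lucene's actual implementation: the function name tokenize and the email pattern are illustrative assumptions, and details such as stop-word filtering are ignored. It lowercases its input, as the StandardAnalyzer does.

```python
import re

# Rough email pattern (an assumption), used only to keep an address as one token.
EMAIL = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")


def tokenize(text):
    """Approximate the tokenizer rules described above (not Lucene itself)."""
    tokens = []
    for chunk in text.lower().split():
        # A dot followed by whitespace (or the end of the input) is punctuation.
        chunk = chunk.rstrip(".")
        if EMAIL.match(chunk):
            tokens.append(chunk)
            continue
        # Split at punctuation other than dots and hyphens inside a word.
        for piece in re.split(r"[^\w.-]+", chunk):
            piece = piece.strip(".-")
            if not piece:
                continue
            has_digit = any(c.isdigit() for c in piece)
            if "-" in piece and not has_digit:
                # No number in the token: the hyphen acts as a separator.
                tokens.extend(p for p in piece.split("-") if p)
            else:
                # Number present ("product number") or no hyphen: keep whole.
                tokens.append(piece)
    return tokens


print(tokenize("foo-bar"))    # ['foo', 'bar']
print(tokenize("foo-bar-5"))  # ['foo-bar-5']
print(tokenize("foo,bar"))    # ['foo', 'bar']
print(tokenize("foo.bar"))    # ['foo.bar']
```

In this sketch, "foo-bar" and "foo bar" produce the same tokens, which is why the hyphen itself cannot be distinguished in a search, while "bar" is among the tokens of "foo-bar" and therefore still matches.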
Note that it is possible to configure a different Lucene analyzer.