Elasticsearch 5 Data Store
Introduction
To use the Trends panel and to see the experiment servings, visits must be stored in Elasticsearch.
Hippo DX 11.2 supports Elasticsearch version 5 (default) and (for DX 11.2.7 and higher) version 6. The Elasticsearch 2 data store has been removed. Elasticsearch 2 users must upgrade to Elasticsearch 5 or 6.
Install Elasticsearch 5
Download and install Elasticsearch 5.
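Once the node is running, a quick request to the root endpoint can confirm that it is reachable and reports a 5.x version. The following is a minimal sketch, assuming the node listens on http://localhost:9200 (the local development address used further down this page). The ES_URL variable and the es_version helper are hypothetical names introduced for illustration, and the sed-based extraction is a convenience rather than a robust JSON parser.

```shell
# Assumption: Elasticsearch listens on this address (local development).
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: reads the root endpoint response on stdin and
# prints the first "number" value it finds (the version.number field).
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (requires a running node):
#   curl -s -S "$ES_URL" | es_version
```

If the printed version does not start with 5 (or 6 on DX 11.2.7 and higher), the data store described on this page is not expected to work against that node.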
Choose a Stale Data Removal Strategy
To control the data volume of the Elasticsearch index, choose one of the following two strategies:
- Use a scheduled cleanup job provided by the Relevance Module.
When choosing this strategy, entries older than a configured number of days are automatically deleted by the relevance engine. Configure the maximum age in the targeting datasource in the application context configuration (see below).
- Use the Rollover Index API provided by Elasticsearch 5.
When the application connects to Elasticsearch, it uploads an index template containing the mapping for the visit type. When rolling over to a new index, Elasticsearch automatically applies this mapping and the alias is moved to the new index. Configure the template name and alias name in the targeting datasource in the application context configuration (see below).
Configure Visits Data Store
A Relevance Elasticsearch Data Store connects to its database through a JNDI data source lookup, which needs to be defined at the container level, e.g. in Apache Tomcat.
Depending on your stale data removal strategy, add one of the following environment entries to conf/context.xml in your project.
When using the scheduled cleanup job stale data removal strategy:
<Environment name="elasticsearch/targetingDS" type="java.lang.String" value="{'indexName':'visits', 'maxAgeDays':'60', 'locations':['url-1','url-2',...]}" />
When using the rollover index stale data removal strategy:
<Environment name="elasticsearch/targetingDS" type="java.lang.String" value="{'templateName':'myproject-hippo_relevance_visit', 'aliasName':'visits', 'locations':['url-1','url-2',...]}" />
This registers a JNDI environment resource under java:comp/env/elasticsearch/targetingDS when the site web application is started. The JSON string contains the properties needed to instantiate a client that can connect to an Elasticsearch cluster.
Change ['url-1','url-2',...] to the list of URLs of your Elasticsearch cluster nodes. For local development, you can set locations to ['http://localhost:9200'].
The table below lists all available JSON fields:
Field | Type | Default | Description |
indexName (1) | String | n/a | The name of the Elasticsearch index (use with the scheduled cleanup job stale data removal strategy). |
templateName (2) | String | n/a | The name of the index template (use with the rollover index stale data removal strategy). You are free to choose any name, but a descriptive name is advised to prevent name collisions and confusion. |
aliasName (2) | String | n/a | The name of the alias. |
locations (3) | String array | n/a | URL locations of nodes in the Elasticsearch cluster to connect to. One location is enough to connect to the cluster. Specifying multiple locations makes the startup process more robust. |
username | String | n/a | Optional. Username to use if Elasticsearch requires authenticated access. |
password | String | n/a | Optional. Password to use if Elasticsearch requires authenticated access. |
maxConnections | Long | 20 | Optional. Maximum number of client threads in the connection pool that will be used to connect to Elasticsearch. |
maxAgeDays | Long | n/a | Optional. Maximum number of days request logs are stored. Not relevant when using the rollover functionality. |
(1) Required when using the scheduled cleanup job stale data removal strategy.
(2) Required when using the rollover index stale data removal strategy.
(3) Required regardless of stale data removal strategy.
Configure the visits store to use this JNDI environment resource, as in the default bootstrapped configuration below:
/targeting:targeting/targeting:datastores/targeting:visits
  - targeting:storefactoryclass = com.onehippo.cms7.targeting.storage.elastic5.ElasticStoreFactory
  - dataSource = elasticsearch/targetingDS
Configure Elasticsearch
When using the scheduled cleanup job stale data removal strategy, create the index configured above (referred to by the indexName property) in Elasticsearch, e.g. using curl:
curl -s -S -XPUT http://localhost:9200/visits
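To confirm that the index was created, a HEAD request on the index name can be used: Elasticsearch answers 200 when the index exists and 404 when it does not. The sketch below assumes the local development address and the index name visits from the example above. ES_URL, INDEX_NAME, index_status and describe_status are hypothetical names introduced for illustration.

```shell
# Assumptions: local development node and the index name used above.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX_NAME="${INDEX_NAME:-visits}"

# Hypothetical helper: prints the HTTP status code of a HEAD request
# on the given index (curl prints 000 when the node is unreachable).
index_status() {
  curl -s -o /dev/null -I -w '%{http_code}' "$ES_URL/$1"
}

# Hypothetical helper: maps a status code to a readable message.
describe_status() {
  case "$1" in
    200) echo "index exists" ;;
    404) echo "index missing" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# Usage (requires a running node):
#   describe_status "$(index_status "$INDEX_NAME")"
```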
When using the rollover stale data removal strategy, create the aliased index configured above (referred to by the aliasName property) in Elasticsearch, e.g. using curl:
curl -XPUT 'localhost:9200/%3Cvisits-%7Bnow%2Fd%7D-000001%3E' -d '{ "aliases": { "visits": {} } }'
This creates an initial index named visits-YYYY.MM.dd-000001, where YYYY.MM.dd is the current year, month and day. The URL-encoded path in the command decodes to the date math expression <visits-{now/d}-000001>. The alias for this index is visits. Make sure that the name of the index you create starts with the alias name, because the application uses ${alias}* for queries.
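In Elasticsearch 5, a rollover only happens when the Rollover Index API is called, so the call has to be scheduled by you, for example from a daily cron job. The sketch below shows what such a call could look like. The condition values (7d, 1000000), the ES_URL and ALIAS_NAME variables, and the rollover_body and rollover_visits helpers are assumptions introduced for illustration, not values prescribed by the Relevance Module. The Content-Type header is required by Elasticsearch 6 and accepted by Elasticsearch 5.

```shell
# Assumptions: local development node and the alias name used above.
ES_URL="${ES_URL:-http://localhost:9200}"
ALIAS_NAME="${ALIAS_NAME:-visits}"

# Hypothetical helper: builds a rollover request body.
# $1 = maximum index age (e.g. 7d), $2 = maximum number of documents.
rollover_body() {
  printf '{ "conditions": { "max_age": "%s", "max_docs": %s } }' "$1" "$2"
}

# Hypothetical helper: asks Elasticsearch to roll the alias over to a
# new index if at least one of the conditions is met.
rollover_visits() {
  curl -s -S -XPOST "$ES_URL/$ALIAS_NAME/_rollover" \
    -H 'Content-Type: application/json' \
    -d "$(rollover_body "$1" "$2")"
}

# Usage (requires a running node):
#   rollover_visits 7d 1000000
```

A rollover does not delete the older indices. Removing them, for example with the Delete Index API, remains a separate step that determines how long visit data is actually retained.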
The index must be readable and writable by the user configured through the username and password properties. How to arrange this is outside the scope of this document because it depends on the deployment scenario of your Elasticsearch instance. Please consult your administrator to find out how to create the index in your Elasticsearch instance.