Reaggregating Visits
Schema changes sometimes make it necessary to rebuild the visit store. The procedure in this document is a convenient way to do that, provided the requests on which the visits are based are still available in the request log store.
Preparation
- Add an INFO level logger for com.onehippo.cms7.targeting.dataflow in order to monitor progress.
- Make sure no experiments are running. Complete any running experiments in the channel manager. Once all experiments are completed and the channels are published, there should be no child nodes left below:
+ targeting:targeting
  + targeting:experiments
- Stop the dataflow jobs by setting running to false on the modelTrainer and visitsAggregator nodes:
+ targeting:targeting
  + targeting:dataflow
    + modelTrainer
      - running = false
    + visitsAggregator
      - running = false
- Wait until both jobs log that they have been disabled. This should happen in about 10 seconds.
Recreate the Index
There are two options. The first is to create a new index with a different name and leave the old one unchanged until everything works. The second is to drop the old index and then create a new one with the same name. Creating a new index with a different name is the most straightforward option and has two advantages:
- The old visits data is still available if something goes wrong.
- The name change forces a restart of the Visit Store, which causes it to upload the correct mapping ("schema").
The name of the Elasticsearch visits index can be found in the indexName property of:
+ targeting:targeting
  + targeting:datastores
    + visits
To create an index with a new name:
- Create the new index in Elasticsearch, for example:
curl -s -S -XPUT http://elastic.host:9200/newindexname
- In the console, set the indexName property to the new name and save.
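After creating the index, it can help to confirm that Elasticsearch actually has it before relying on the new name. The following is a minimal shell sketch, not part of the product. The names ES_URL, NEW_INDEX, index_url and index_exists are illustrative, and the host is the same placeholder used in the curl examples. It assumes Elasticsearch answers a HEAD request on an index with HTTP 200 when the index exists and 404 when it does not.

```shell
# Sketch only: ES_URL and NEW_INDEX are placeholders, adjust them to your setup.
ES_URL="${ES_URL:-http://elastic.host:9200}"
NEW_INDEX="${NEW_INDEX:-newindexname}"

# Build the URL of an index, tolerating a trailing slash in ES_URL.
index_url() {
  printf '%s/%s\n' "${ES_URL%/}" "$1"
}

# Succeeds (exit status 0) only if a HEAD request on the index returns HTTP 200.
index_exists() {
  status=$(curl -s -o /dev/null -I -w '%{http_code}' "$(index_url "$1")")
  [ "$status" = "200" ]
}

# Usage: index_exists "$NEW_INDEX" && echo "index is present"
```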
To drop the existing index and create a new one with the same name:
- Drop the old index in Elasticsearch, for example:
curl -s -S -XDELETE http://elastic.host:9200/indexname
- Create the new index in Elasticsearch, for example:
curl -s -S -XPUT http://elastic.host:9200/indexname
- In the console, temporarily create a property dummy on the node /targeting:targeting/targeting:datastores/visits and save:
+ targeting:targeting
  + targeting:datastores
    + visits
      - dummy (string) = bla
This change triggers a proper restart of the Visit Store.
- A few minutes later the dummy property can be removed again.
Restart the Data Flow Jobs
- Delete the processedUntil property on the visitsAggregator but not on the modelTrainer. Set their running properties to true:
+ targeting:targeting
  + targeting:dataflow
    + modelTrainer
      - running = true   // processedUntil unchanged
    + visitsAggregator
      - running = true   // processedUntil REMOVED
- Within a few seconds, both jobs will log that they have been enabled. The VisitsAggregator should also log that it is aggregating requests into visits in batches of up to 10 000 at a time.
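In addition to the log output, progress can be observed from the Elasticsearch side by checking that the number of documents in the visits index grows. The sketch below is an assumption-laden illustration rather than a documented procedure: ES_URL, INDEX, extract_count and visit_count are made-up names, it uses the Elasticsearch _count endpoint, and it assumes the index holds only visit documents.

```shell
# Sketch only: ES_URL and INDEX are placeholders, adjust them to your setup.
ES_URL="${ES_URL:-http://elastic.host:9200}"
INDEX="${INDEX:-newindexname}"

# Pull the number out of a _count response such as {"count":123,"_shards":{...}}.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Print the current number of documents in the given index.
visit_count() {
  curl -s -S "${ES_URL%/}/$1/_count" | extract_count
}

# Usage: run visit_count "$INDEX" a few times, some minutes apart,
# and compare the numbers.
```

A count that keeps increasing between calls suggests that reaggregation is still in progress.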