Initialize and configure the Vector Store and Ingestion process
Overview
This guide walks you through setting up and configuring the brXM AI Vector Store and Ingestion Process.
The Vector Store relies on an external vector database. Currently, the supported database is Redis, for on-premise projects only. To try it out locally, you can run Redis in a container; see the Redis documentation for details.
A managed vector store for Search Agent is planned for Bloomreach Cloud in Q2 2026. Please reach out to your Account Manager for more information.
The Ingestion Process is a background process that runs within a CMS pod, listens for workflow events and updates the Vector Store accordingly. It can run in:
- preview mode, where it reacts whenever a Content item is saved or deleted, correspondingly updating or removing the vectors in the store
- live mode, where it reacts whenever a Content item is published or taken offline, again updating or removing the vectors in the store
During ingestion, the configured embedding model generates embeddings for each document, and the Vector Store is then updated with the new vectors. Generating embeddings has cost and time implications, but it has no performance impact on the CMS, because the work happens externally, on the model provider's side.
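The following minimal Java sketch illustrates the behaviour described above under a simplified model. The event names (SAVED, DELETED, PUBLISHED, TAKEN_OFFLINE), the EmbeddingModel interface and the in-memory map standing in for the Vector Store are all hypothetical and are not brXM APIs. It shows that each mode reacts to a different pair of workflow events, and that only updates call the external embedding model, while removals simply delete the stored vectors.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the Ingestion Process; none of these types are brXM APIs.
enum IngestMode { PREVIEW, LIVE }

enum WorkflowEvent { SAVED, DELETED, PUBLISHED, TAKEN_OFFLINE }

enum StoreAction { UPSERT, REMOVE, IGNORE }

// Stand-in for the externally hosted embedding model.
interface EmbeddingModel {
    float[] embed(String text);
}

class IngestionSketch {
    // Preview mode reacts to save/delete, live mode to publish/take offline.
    static StoreAction actionFor(IngestMode mode, WorkflowEvent event) {
        switch (mode) {
            case PREVIEW:
                if (event == WorkflowEvent.SAVED) return StoreAction.UPSERT;
                if (event == WorkflowEvent.DELETED) return StoreAction.REMOVE;
                return StoreAction.IGNORE;
            case LIVE:
                if (event == WorkflowEvent.PUBLISHED) return StoreAction.UPSERT;
                if (event == WorkflowEvent.TAKEN_OFFLINE) return StoreAction.REMOVE;
                return StoreAction.IGNORE;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    // Updates call the embedding model first; removals only delete the stored vectors.
    static void apply(IngestMode mode, WorkflowEvent event, String docId, String text,
                      EmbeddingModel model, Map<String, float[]> store) {
        switch (actionFor(mode, event)) {
            case UPSERT:
                store.put(docId, model.embed(text));
                break;
            case REMOVE:
                store.remove(docId);
                break;
            default:
                break; // the event is not relevant in this mode
        }
    }
}
```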
Installation
The Vector Store requires a running external Redis instance. The Ingestion Process is installed automatically when the AI module is installed.
Configure via properties files
To configure the Vector Store and Ingestion Process in a production-ready way, set properties in any of the locations listed below. The order of this list matters: all of these locations are checked, but if a property is defined in more than one of them, the value from the location higher in the list takes precedence.
- System properties passed on the command line
- A properties file named xm-ai-service.properties, visible in the classpath
- The project's platform.properties file:
  - For Bloomreach Cloud implementations, see Set Environment Configuration Properties
  - For on-premise implementations, see HST-2 Container Configuration
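To illustrate the precedence rule, here is a Java sketch that resolves a property by checking the sources in list order and returning the first match. It assumes each source can be represented as a java.util.Properties object; the PropertyResolver class and its methods are hypothetical and do not reflect how brXM actually loads its configuration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Optional;
import java.util.Properties;

// Hypothetical resolver mirroring the documented lookup order; not brXM's actual loader.
class PropertyResolver {
    private final List<Properties> sources; // highest precedence first

    PropertyResolver(List<Properties> sourcesInPrecedenceOrder) {
        this.sources = List.copyOf(sourcesInPrecedenceOrder);
    }

    // Returns the value from the first source that defines the key.
    Optional<String> get(String key) {
        for (Properties source : sources) {
            String value = source.getProperty(key);
            if (value != null) {
                return Optional.of(value.trim());
            }
        }
        return Optional.empty();
    }

    // Loads a properties file from the classpath; a missing file yields empty properties.
    static Properties fromClasspath(String name) {
        Properties props = new Properties();
        try (InputStream in = PropertyResolver.class.getClassLoader().getResourceAsStream(name)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Parses properties-file text, handy for tests and examples.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

For example, new PropertyResolver(List.of(System.getProperties(), PropertyResolver.fromClasspath("xm-ai-service.properties"), platformProps)) follows the order of the list, where platformProps is a placeholder for however your project exposes platform.properties.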
Multiplicity of configurations
Configuring via properties lets you define multiple model providers and vector stores, but only one configuration is active at a time.
Global Configuration options
The brxm.ai.vectorstore property specifies the name of the active Vector Store. Possible values are:
- Redis
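As a sketch of how one active store could be picked from several configured ones, the hypothetical Java class below builds only the store named by brxm.ai.vectorstore. The VectorStoreClient interface, the factory registry and the exact, case-sensitive name matching are assumptions for illustration, not brXM internals.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical client abstraction; brXM's real vector store types are not shown here.
interface VectorStoreClient {
    String describe();
}

class ActiveStoreSelector {
    // Store name (as used in brxm.ai.vectorstore) -> factory that builds a client from the properties.
    private final Map<String, Function<Properties, VectorStoreClient>> factories = new TreeMap<>();

    void register(String name, Function<Properties, VectorStoreClient> factory) {
        factories.put(name, factory);
    }

    // Only the store named by brxm.ai.vectorstore is built; other configurations stay unused.
    VectorStoreClient select(Properties props) {
        String name = props.getProperty("brxm.ai.vectorstore");
        if (name == null || name.isBlank()) {
            throw new IllegalStateException("brxm.ai.vectorstore is not set");
        }
        Function<Properties, VectorStoreClient> factory = factories.get(name.trim());
        if (factory == null) {
            throw new IllegalStateException(
                    "Unknown vector store '" + name + "'; registered: " + factories.keySet());
        }
        return factory.apply(props);
    }
}
```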
Ingestion Process Options
The Ingestion process is governed by the following properties:
| Property | Required | Type | Description | Default | Example |
|---|---|---|---|---|---|
| brxm.ai.ingest.mode | yes | enum | Ingestion operating mode: preview or live | | |
| brxm.ai.ingest.types | no | list of document types | Comma-separated list of fully qualified document types. Only content of these types is ingested; if the property is not set, no content is ingested. Removal from the store ignores this filter. | No types allowed | myproject:bannerdocument, hippogallery:exampleAssetSet |
| brxm.ai.ingest.include-dirs | no | list of paths | Comma-separated list of absolute paths. A document that is not under one of these paths is skipped during ingestion. Removal from the store ignores this filter. | Any path is considered included | /content/documents/myproject/banners/, /content/documents/myproject/news/ |
| brxm.ai.ingest.exclude-dirs | no | list of paths | Comma-separated list of absolute paths. A document under any of these paths is skipped during ingestion. Removal from the store ignores this filter. | No path is considered excluded | /content/documents/myproject/taxonomies/, /content/documents/myproject/private/ |
| brxm.ai.ingest.initial-delay | no | integer (seconds) | Delay after system startup before the Ingestion Process begins | 300 | 1000 |
| brxm.ai.ingest.interval | no | integer (seconds) | How often the process runs | 12 | 60 |
| brxm.ai.ingest.batch-size | no | integer | Number of documents processed at once and sent together to the Vector Store. Reduce it if your Vector Store gets overloaded. | 5 | 2 |
| brxm.ai.ingest.delay | no | integer (milliseconds) | Back-off time after each batch is processed. Increase it if your Vector Store gets overloaded. | 1000 | 10000 |
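The filtering rules in the table above can be summarized in a small Java sketch. It assumes that a document is "under" a configured path when its absolute path starts with that path, and that document types are matched exactly; IngestFilter and its methods are hypothetical. The batches helper only illustrates how brxm.ai.ingest.batch-size splits pending documents, not how brXM actually schedules them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the documented ingestion filters; not a brXM API.
class IngestFilter {
    private final Set<String> types;
    private final List<String> includeDirs;
    private final List<String> excludeDirs;

    IngestFilter(String types, String includeDirs, String excludeDirs) {
        this.types = Set.copyOf(splitCsv(types));
        this.includeDirs = splitCsv(includeDirs);
        this.excludeDirs = splitCsv(excludeDirs);
    }

    // Splits a comma-separated property value; null or blank yields an empty list.
    static List<String> splitCsv(String value) {
        if (value == null || value.isBlank()) return List.of();
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    // Applies to ingestion only; per the table, removals from the store ignore these filters.
    boolean shouldIngest(String docType, String path) {
        if (!types.contains(docType)) return false; // no types configured -> nothing is ingested
        boolean included = includeDirs.isEmpty()   // no include-dirs -> every path is included
                || includeDirs.stream().anyMatch(path::startsWith);
        boolean excluded = excludeDirs.stream().anyMatch(path::startsWith);
        return included && !excluded;
    }

    // Splits pending documents into batches of at most batchSize documents.
    static <T> List<List<T>> batches(List<T> pending, int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batch size must be positive");
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < pending.size(); i += batchSize) {
            result.add(List.copyOf(pending.subList(i, Math.min(i + batchSize, pending.size()))));
        }
        return result;
    }
}
```

With the default batch size of 5, seven pending documents would be sent as one batch of five followed by one batch of two, with the configured back-off delay after each batch.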
Redis options
| Property | Required | Type | Description | Default | Example |
|---|---|---|---|---|---|
| brxm.ai.vectorstore.redis.host | yes | url | The hostname of your Redis instance | | myredis, 127.0.0.1 |
| brxm.ai.vectorstore.redis.port | yes | integer | The port your Redis instance listens on | 6379 | |
| brxm.ai.vectorstore.redis.index | yes | string | The name of the Redis index that stores all your embeddings | myindex | |
| brxm.ai.vectorstore.redis.prefix | yes | string | Prefix added to each entry in your index, for identification purposes | my_prefix | |
| brxm.ai.vectorstore.redis.user | no | string | The user to connect with; must be provided when stronger security is used | | |
| brxm.ai.vectorstore.redis.password | no | string | The password to connect with; must be provided when stronger security is used | | |
| brxm.ai.vectorstore.redis.client-name | no | string | The name of the connecting application, for identification purposes | myAppName | |
| brxm.ai.vectorstore.redis.timeout-millis | no | integer (milliseconds) | Timeout for connections to the Redis database | Latest Redis default | 5000 |
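As an illustration of the Redis table above, this hypothetical Java class reads the brxm.ai.vectorstore.redis.* properties and fails fast, listing every missing required property. The class name, the port range check and leaving the timeout unset (so the Redis client default applies) when no value is given are assumptions, not documented brXM behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical holder for the documented Redis properties; brXM's own parsing may differ.
final class RedisSettings {
    private static final String PREFIX = "brxm.ai.vectorstore.redis.";
    private static final List<String> REQUIRED = List.of("host", "port", "index", "prefix");

    final String host;
    final int port;
    final String index;
    final String keyPrefix;
    final String user;           // optional, for stronger security
    final String password;       // optional, for stronger security
    final String clientName;     // optional, for identification
    final Integer timeoutMillis; // null means "use the Redis client default"

    private RedisSettings(Properties p) {
        host = value(p, "host");
        port = parsePort(value(p, "port"));
        index = value(p, "index");
        keyPrefix = value(p, "prefix");
        user = value(p, "user");
        password = value(p, "password");
        clientName = value(p, "client-name");
        String timeout = value(p, "timeout-millis");
        timeoutMillis = timeout == null ? null : Integer.valueOf(timeout);
    }

    // Fails fast with the full names of all missing required properties.
    static RedisSettings from(Properties p) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            if (value(p, key) == null) missing.add(PREFIX + key);
        }
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Missing required Redis properties: " + missing);
        }
        return new RedisSettings(p);
    }

    // Returns the trimmed value, or null when the property is absent or blank.
    private static String value(Properties p, String key) {
        String v = p.getProperty(PREFIX + key);
        return (v == null || v.isBlank()) ? null : v.trim();
    }

    private static int parsePort(String raw) {
        int port = Integer.parseInt(raw);
        if (port < 1 || port > 65535) throw new IllegalArgumentException("Invalid Redis port: " + raw);
        return port;
    }
}
```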
Logging
To troubleshoot ingestion and the processing of the ingestion queue, set the log level of the com.bloomreach.xm.ai.service.impl.vector.ingest logger, for example with <Logger name="com.bloomreach.xm.ai.service.impl.vector.ingest" level="..." /> in your Log4j 2 configuration.
Setting the level to:
- info shows logs when ingestion happens and when the ingestion queue is used
- debug shows more detailed logs when ingestion happens, when the scheduler runs and when the ingestion queue is used
- trace additionally shows logs from the event listeners in the CMS, which are the entry points that trigger an ingestion