Initialize and configure the Vector Store and Ingestion process

Overview

This guide walks you through setting up and configuring the brXM AI Vector Store and Ingestion process.

The Vector Store relies on an external vector database. Currently, Redis is the supported database, and only for on-premise projects. To try it out locally, you can run Redis in a container; see the Redis documentation for details.

The Vector Store and Ingestion features are not yet available for Bloomreach Cloud implementations. 

A managed vector store for Search Agent is planned for Bloomreach Cloud in Q2 2026. Please reach out to your Account Manager for more information.

The Vector Store is designed to be extensible. Custom vector store backends can be wired in by implementing the VectorStoreFactory SPI, allowing you to integrate any VectorStore supported by Spring AI. See the AI Module Extensibility Guide for details.

The Ingestion process is a background process that runs within a CMS pod, listens for workflow events, and updates the Vector Store accordingly. It can run in:

  • preview mode, where it reacts whenever a Content item is saved or deleted, correspondingly updating or removing the vectors in the store
  • live mode, where it reacts whenever a Content item is published or taken offline, again updating or removing the vectors in the store
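The relationship between the two modes and the workflow events can be summarized in code. The following Java sketch is illustrative only: IngestModeSketch, WorkflowEvent, Action and actionFor are hypothetical names, not part of the AI module's API, and the sketch assumes that events not relevant to the configured mode are simply ignored.

```java
import java.util.Optional;

// Hypothetical sketch: how each ingestion mode could map CMS workflow events
// to Vector Store actions. Names are illustrative, not the AI module's API.
final class IngestModeSketch {

    enum Mode { PREVIEW, LIVE }                                  // brxm.ai.ingest.mode

    enum WorkflowEvent { SAVE, DELETE, PUBLISH, TAKE_OFFLINE }   // hypothetical event names

    enum Action { UPSERT_VECTORS, REMOVE_VECTORS }

    // Returns the action for an event, or empty if the mode ignores that event (assumption).
    static Optional<Action> actionFor(Mode mode, WorkflowEvent event) {
        switch (mode) {
            case PREVIEW:
                // Preview mode follows the editing lifecycle: save and delete.
                if (event == WorkflowEvent.SAVE) return Optional.of(Action.UPSERT_VECTORS);
                if (event == WorkflowEvent.DELETE) return Optional.of(Action.REMOVE_VECTORS);
                return Optional.empty();
            case LIVE:
                // Live mode follows the publication lifecycle: publish and take offline.
                if (event == WorkflowEvent.PUBLISH) return Optional.of(Action.UPSERT_VECTORS);
                if (event == WorkflowEvent.TAKE_OFFLINE) return Optional.of(Action.REMOVE_VECTORS);
                return Optional.empty();
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }
}
```

In this sketch, a store configured for live mode never indexes unpublished edits, which matches the intent of keeping the Vector Store in sync with what visitors can see.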

During ingestion, the configured embedding model is asked to generate embeddings for a document, and the Vector Store is then updated with the new vectors. Generating embeddings has cost and time implications, but it does not affect CMS performance, because the computation happens externally, at the model provider.

AI usage costs associated with embedding generation are currently not visible, unless the LiteLLM provider is used.
Embeddings generated by a specific model may not be usable unless they are consumed by the same or an equivalent model. When a different embedding model is configured, the embeddings may need to be regenerated.

Installation

The Vector Store requires a running external Redis instance. The Ingestion process is installed automatically when the AI module is installed.

Configure via properties files

To configure the Vector Store and Ingestion in a production-ready way, use properties files in any of the locations listed below. The order of this list is important: properties are read from all of these locations, but if a property is found in more than one of them, the value from the location higher in this list takes precedence.

  1. System properties passed on the command line

  2. A properties file named xm-ai-service.properties, visible in the classpath

  3. The project's platform.properties file
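To illustrate this lookup order, here is a minimal Java sketch, assuming each location has already been loaded into a java.util.Properties object. PropertyPrecedence, resolve and loadFromClasspath are hypothetical names; this is not the AI module's actual implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;
import java.util.Properties;

// Hypothetical sketch of the precedence rule; not the AI module's actual code.
final class PropertyPrecedence {

    // Sources are passed in precedence order: the first source that defines the key wins.
    static Optional<String> resolve(String key, Properties... sourcesInPrecedenceOrder) {
        for (Properties source : sourcesInPrecedenceOrder) {
            String value = source.getProperty(key);
            if (value != null) {
                return Optional.of(value);   // an empty string still counts as "found"
            }
        }
        return Optional.empty();
    }

    // Loads a classpath resource such as xm-ai-service.properties; returns empty Properties if absent.
    static Properties loadFromClasspath(String resourceName) throws IOException {
        Properties props = new Properties();
        try (InputStream in = PropertyPrecedence.class.getClassLoader().getResourceAsStream(resourceName)) {
            if (in != null) {
                props.load(in);
            }
        }
        return props;
    }
}
```

For example, resolve("brxm.ai.ingest.mode", System.getProperties(), loadFromClasspath("xm-ai-service.properties"), platformProperties), where platformProperties is assumed to hold the project's platform.properties, returns a value passed with -D on the command line even if the other files also define it.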

More information on managing properties files and System properties is available in the following documentation:
  - For Bloomreach Cloud implementations, Set Environment Configuration Properties
  - For On-premise implementations, HST-2 Container Configuration
If you configure via JCR, use the same property names and make sure you use String properties in JCR.
If no chat/embedding model is registered, the Vector Store and Ingestion process will not initialize.

Multiplicity of configurations

Configuring via properties lets you define multiple model providers and vector stores, but only one is active at a time.

Global Configuration options

The brxm.ai.vectorstore property is used to specify the name of the active Vector store. Possible values are:

  • Redis

Providing an empty value for this property disables the Vector Store and the Ingestion process.
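The following Java sketch shows one way to express these activation rules, together with the earlier note that nothing initializes without a registered chat/embedding model. It is a hypothetical illustration: VectorStoreActivation is not an AI module class, and treating a missing brxm.ai.vectorstore property like an empty one is an assumption this page does not confirm.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical sketch: deciding whether the Vector Store and Ingestion process start.
final class VectorStoreActivation {

    static final String ACTIVE_STORE_KEY = "brxm.ai.vectorstore";

    // Returns the active store name, or empty when the Vector Store is disabled.
    static Optional<String> activeStore(Properties config, boolean embeddingModelRegistered) {
        if (!embeddingModelRegistered) {
            return Optional.empty();   // no chat/embedding model: nothing initializes
        }
        // Assumption: a missing property behaves like an empty one.
        String name = config.getProperty(ACTIVE_STORE_KEY, "").trim();
        return name.isEmpty() ? Optional.empty() : Optional.of(name);
    }
}
```

Because only the store named by brxm.ai.vectorstore is returned, other stores may stay configured in the same properties file without being activated.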

Ingestion Process Options

The Ingestion process is governed by the following properties:

  • brxm.ai.ingest.mode (required: yes; type: enum)
    Ingestion operating mode.
    Default: none. Example: preview or live

  • brxm.ai.ingest.types (required: no; type: list of document types)
    Comma-separated list of fully qualified document types. Only content of those types is ingested. If not provided, no content is ingested. Removal from the store ignores this filter.
    Default: no types allowed. Example: myproject:bannerdocument, hippogallery:exampleAssetSet

  • brxm.ai.ingest.include-dirs (required: no; type: list of paths)
    Comma-separated list of absolute paths. Unless a document is under one of these paths, it is skipped during ingestion. Removal from the store ignores this filter.
    Default: any path is considered included. Example: /content/documents/myproject/banners/, /content/documents/myproject/news/

  • brxm.ai.ingest.exclude-dirs (required: no; type: list of paths)
    Comma-separated list of absolute paths. If a document is under any of these paths, it is skipped during ingestion. Removal from the store ignores this filter.
    Default: no path is considered excluded. Example: /content/documents/myproject/taxonomies/, /content/documents/myproject/private/

  • brxm.ai.ingest.initial-delay (required: no; type: integer, seconds)
    How long after system startup the Ingestion process begins.
    Default: 300. Example: 1000

  • brxm.ai.ingest.interval (required: no; type: integer, seconds)
    How often the process runs.
    Default: 12. Example: 60

  • brxm.ai.ingest.batch-size (required: no; type: integer)
    The ingestion process handles multiple documents at once and sends them together to the Vector Store. Reduce this value if your Vector Store gets overloaded.
    Default: 5. Example: 2

  • brxm.ai.ingest.delay (required: no; type: integer, milliseconds)
    Back-off time after processing each batch. Increase this value if your Vector Store gets overloaded.
    Default: 1000. Example: 10000
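To show how the defaults listed above combine, the following Java sketch reads the ingestion properties from a java.util.Properties object. IngestSettings is a hypothetical class, and two details are assumptions not confirmed by this page: mode values are matched in lowercase, and whitespace around comma-separated entries is trimmed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

// Hypothetical sketch: reading the ingestion properties with the documented defaults.
final class IngestSettings {
    final String mode;               // required: preview or live
    final List<String> types;        // empty list: no content is ingested
    final List<String> includeDirs;  // empty list: any path is included
    final List<String> excludeDirs;  // empty list: no path is excluded
    final int initialDelaySeconds;
    final int intervalSeconds;
    final int batchSize;
    final long delayMillis;

    IngestSettings(Properties p) {
        String m = p.getProperty("brxm.ai.ingest.mode", "").trim();
        if (!m.equals("preview") && !m.equals("live")) {
            throw new IllegalArgumentException("brxm.ai.ingest.mode must be preview or live, got: '" + m + "'");
        }
        mode = m;
        types = csv(p, "brxm.ai.ingest.types");
        includeDirs = csv(p, "brxm.ai.ingest.include-dirs");
        excludeDirs = csv(p, "brxm.ai.ingest.exclude-dirs");
        initialDelaySeconds = Integer.parseInt(p.getProperty("brxm.ai.ingest.initial-delay", "300").trim());
        intervalSeconds = Integer.parseInt(p.getProperty("brxm.ai.ingest.interval", "12").trim());
        batchSize = Integer.parseInt(p.getProperty("brxm.ai.ingest.batch-size", "5").trim());
        delayMillis = Long.parseLong(p.getProperty("brxm.ai.ingest.delay", "1000").trim());
    }

    // Splits a comma-separated value and trims each entry (assumption).
    static List<String> csv(Properties p, String key) {
        String raw = p.getProperty(key, "");
        if (raw.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }
}
```

Note that brxm.ai.ingest.types is optional in the sense that startup does not fail without it, but leaving it out means nothing is ingested.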

The ingestion process works by accumulating ingestion events in an in-memory queue, then processing them in batches. A batch is retried 5 times, with each attempt delayed by an interval that starts from brxm.ai.ingest.delay and increases exponentially with every failed attempt.
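The retry behavior can be sketched in Java as follows. The page does not state the growth factor of the back-off, so this sketch assumes the delay doubles after every failed attempt, and it interprets "retried 5 times" as one initial attempt plus up to five retries. BatchRetry and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical sketch of exponential back-off for one batch; the doubling factor is an assumption.
final class BatchRetry {
    static final int MAX_RETRIES = 5;

    // Delay before retry number `retry` (1-based): base, 2*base, 4*base, ...
    static long delayBeforeRetry(long baseDelayMillis, int retry) {
        return baseDelayMillis << (retry - 1);
    }

    // The full list of retry delays for a given base delay.
    static List<Long> schedule(long baseDelayMillis) {
        List<Long> delays = new ArrayList<>();
        for (int retry = 1; retry <= MAX_RETRIES; retry++) {
            delays.add(delayBeforeRetry(baseDelayMillis, retry));
        }
        return delays;
    }

    // Runs the batch once, then retries up to MAX_RETRIES times, sleeping before each retry.
    static <T> T runWithRetries(Callable<T> batch, long baseDelayMillis) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
            if (attempt > 0) {
                Thread.sleep(delayBeforeRetry(baseDelayMillis, attempt));
            }
            try {
                return batch.call();
            } catch (Exception e) {
                last = e;   // remember the failure and try again
            }
        }
        throw last;         // all attempts failed
    }
}
```

Under these assumptions, the default brxm.ai.ingest.delay of 1000 gives retry delays of 1000, 2000, 4000, 8000 and 16000 milliseconds, so a persistently failing batch is abandoned after roughly 31 seconds of back-off.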
In addition to the configurable filters, the ingestion filter applies built-in rules upon every ingestion: content items that are not under the absolute path /content/ are always excluded, and so are content items under /content/attic/ or /content/taxonomies/. These built-in rules are not applied upon removals.
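Putting the configurable and built-in rules together, the ingestion decision could look like the following Java sketch. It is hypothetical: IngestionFilter is not an AI module class, and it assumes directories are matched by simple path-prefix comparison. Removals are not passed through this filter at all.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the ingestion filter; prefix matching is an assumption.
final class IngestionFilter {
    private final Set<String> types;          // brxm.ai.ingest.types
    private final List<String> includeDirs;   // brxm.ai.ingest.include-dirs
    private final List<String> excludeDirs;   // brxm.ai.ingest.exclude-dirs

    IngestionFilter(Set<String> types, List<String> includeDirs, List<String> excludeDirs) {
        this.types = types;
        this.includeDirs = includeDirs;
        this.excludeDirs = excludeDirs;
    }

    // Applied only on ingestion; removals bypass every rule below.
    boolean shouldIngest(String documentType, String absolutePath) {
        // Built-in rules, always applied.
        if (!absolutePath.startsWith("/content/")) return false;
        if (absolutePath.startsWith("/content/attic/") || absolutePath.startsWith("/content/taxonomies/")) return false;
        // No configured types means nothing is ingested.
        if (!types.contains(documentType)) return false;
        // An empty include list means every path is included.
        if (!includeDirs.isEmpty() && includeDirs.stream().noneMatch(absolutePath::startsWith)) return false;
        // Any matching exclude entry skips the document.
        return excludeDirs.stream().noneMatch(absolutePath::startsWith);
    }
}
```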

Redis options

A Redis instance must be running and accessible. It is configured with the following properties:

  • brxm.ai.vectorstore.redis.host (required: yes; type: hostname)
    The hostname of your Redis instance.
    Default: none. Example: myredis or 127.0.0.1

  • brxm.ai.vectorstore.redis.port (required: yes; type: integer)
    The port your Redis instance is listening on.
    Default: none. Example: 6379

  • brxm.ai.vectorstore.redis.index (required: yes; type: string)
    The name of the Redis index used to store all your embeddings.
    Default: none. Example: myindex

  • brxm.ai.vectorstore.redis.prefix (required: yes; type: string)
    Each entry in your index is prefixed with this string. Used for identification purposes.
    Default: none. Example: my_prefix

  • brxm.ai.vectorstore.redis.user (required: no; type: string)
    When stronger security is used, a user must be provided.

  • brxm.ai.vectorstore.redis.password (required: no; type: string)
    When stronger security is used, a password must be provided.

  • brxm.ai.vectorstore.redis.client-name (required: no; type: string)
    The name of the connecting application. Used for identification purposes.
    Default: none. Example: myAppName

  • brxm.ai.vectorstore.redis.timeout-millis (required: no; type: integer, milliseconds)
    Timeout, in milliseconds, for connections to the Redis database.
    Default: Redis default. Example: 5000
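Before starting the CMS, it can help to check the Redis properties for obvious mistakes. The following Java sketch is a hypothetical helper, not part of the AI module: it reports missing required properties and non-integer or out-of-range values for port and timeout-millis. It does not test whether Redis is actually reachable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check for the Redis properties; not part of the AI module.
final class RedisSettingsCheck {
    static final String PREFIX = "brxm.ai.vectorstore.redis.";
    static final String[] REQUIRED = {"host", "port", "index", "prefix"};

    // Returns a list of problems; an empty list means the settings look usable.
    static List<String> validate(Properties p) {
        List<String> problems = new ArrayList<>();
        for (String name : REQUIRED) {
            if (p.getProperty(PREFIX + name, "").trim().isEmpty()) {
                problems.add("missing required property " + PREFIX + name);
            }
        }
        checkInt(p, "port", 1, 65535, problems);
        checkInt(p, "timeout-millis", 1, Integer.MAX_VALUE, problems);
        return problems;
    }

    // Validates an optional integer; absence is reported separately for required properties.
    private static void checkInt(Properties p, String name, int min, int max, List<String> problems) {
        String v = p.getProperty(PREFIX + name, "").trim();
        if (v.isEmpty()) {
            return;
        }
        try {
            int n = Integer.parseInt(v);
            if (n < min || n > max) {
                problems.add(PREFIX + name + " is out of range: " + n);
            }
        } catch (NumberFormatException e) {
            problems.add(PREFIX + name + " is not an integer: " + v);
        }
    }
}
```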

Logging

To help troubleshoot issues related to ingestion and to the processing of the ingestion queue, set the level of the following logger: <Logger name="com.bloomreach.xm.ai.service.impl.vector.ingest" level=".." />

Setting the level to:

  • info shows logs when ingestion happens and when the ingestion queue is used
  • debug shows more detailed logs when ingestion happens, when the scheduler runs, and when the ingestion queue is used
  • trace additionally shows logs from the event listeners in the CMS, which are the entry points that trigger an ingestion
