Initialize and configure the Vector Store and Ingestion process

Overview

This guide walks you through the process of setting up and configuring the BrXM AI Vector Store and Ingestion Process.

The Vector Store relies on an external vector database. Currently, Redis and Postgres (PgVector) are supported, for on-premise projects only. To try either one locally, run it in a container: see the Redis documentation for Redis, and the PgVector container page for Postgres.

The Vector Store and Ingestion features are not yet available for Bloomreach Cloud implementations.

A managed vector store for Search Agent is planned for Bloomreach Cloud in Q2 2026. Please reach out to your Account Manager for more information.

The Ingestion Process is a background process that runs within a CMS pod, listens for workflow events, and updates the Vector Store accordingly. It can run in:

- preview mode, where it reacts whenever a content item is saved or deleted, correspondingly updating or removing the vectors in the store. The unpublished variant of the content item is indexed, so the vector store contains preview content.
- live mode, where it reacts whenever a content item is published or taken offline, again updating or removing the vectors in the store. Only the published variant is indexed, so the vector store contains only published content.

When the Vector Store is first set up, it is empty. No content is indexed until the ingestion process is configured. See Ingestion Process Options for how to start indexing content, specifically the properties brxm.ai.ingest.mode and brxm.ai.ingest.types.

The Vector Store is designed to be extensible. Custom vector store backends can be wired in by implementing the VectorStoreFactory SPI, allowing you to integrate any VectorStore supported by Spring AI. See the AI Module Extensibility Guide for details.

During ingestion, the configured embedding model is asked to generate embeddings for a document, and the Vector Store is then updated with the new vectors. Generating embeddings has cost and time implications, but it has no performance implications for the CMS, because it happens externally, on the model provider's side.

AI usage costs associated with generating embeddings are not currently visible, unless the LiteLLM provider is used.

Embeddings are generated by a model as vectors of a specific dimension, and may not be usable unless they are consumed by the same model or an equivalent one (with the same dimensions). When a model with different dimensions is configured as the embedding model, the existing embeddings in the vector store (which have the old dimensions) need to be regenerated.

Installation

The Vector Store requires a running external Redis or Postgres instance. The Ingestion process is installed automatically when the AI module is installed.

Configure via properties files

To configure the Vector Store and Ingestion in a production-ready way, use properties files in any of the locations listed below. The order of this list matters: properties are looked up in all of these locations, but if a property is found in more than one of them, the value from the location higher in the list takes precedence.

- System properties passed on the command line
- A properties file named xm-ai-service.properties, visible in the classpath
- The project's platform.properties file
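
As a rough illustration of the precedence order, the sketch below sets the same property in all three locations. The two file excerpts are shown in one block for comparison only; in practice they are separate files. The -D flag is the standard Java way to pass a system property, and using it for this property name is an assumption based on the list above.

```properties
# --- platform.properties (lowest precedence of the three locations) ---
brxm.ai.vectorstore = Redis

# --- xm-ai-service.properties on the classpath (overrides platform.properties) ---
brxm.ai.vectorstore = PgVector

# --- System property on the JVM command line (highest precedence) ---
# Shown as a comment because it is not part of any properties file:
# -Dbrxm.ai.vectorstore=Redis
```

With only the two files present, PgVector would be the effective value. Adding the system property would make Redis the effective value again.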

More information on managing properties files and System properties is available in the following documentation:
- For Bloomreach Cloud implementations, Set Environment Configuration Properties
- For On-premise implementations, HST-2 Container Configuration

If you are configuring via JCR configuration, use the same property names and make sure you use String properties in JCR.

If no chat/embedding model is registered, the Vector Store and Ingestion process will not initialize.

Multiplicity of configurations

Configuring via properties lets you define multiple model providers and vector stores side by side, but only one is active at a time.

Global Configuration options

The brxm.ai.vectorstore property specifies the name of the active Vector Store. Possible values are:

- Redis
- PgVector

Providing an empty value for this property disables the Vector Store and the Ingestion process.
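
A minimal sketch of this selection in a properties file follows. It assumes the value is written exactly as listed above; whether the value is case-sensitive is not stated here.

```properties
# Activate the PgVector-backed Vector Store.
brxm.ai.vectorstore = PgVector

# Alternative: an empty value disables the Vector Store and the Ingestion process.
# brxm.ai.vectorstore =
```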

Ingestion Process Options

The Ingestion process is governed by the following properties:

Property	Required	Type	Description	Default	Example
brxm.ai.ingest.mode	yes	enum	Ingestion operating mode. In preview mode, unpublished content is indexed upon save/rename/copy/move operations, while in live mode only published content is indexed during publication (and scheduled publication).		preview or live
brxm.ai.ingest.types	no	list of doctypes	Comma-separated list of fully qualified document types. Only content of those types will be indexed. If not provided, no content will be ingested. Removal from the store ignores this filter.	No types allowed	myproject:bannerdocument, hippogallery:exampleAssetSet
brxm.ai.ingest.include-dirs	no	list of paths	Comma-separated list of absolute paths. A document that is not under one of those paths is skipped from ingestion. Removal from the store ignores this filter.	Any path is considered included	/content/documents/myproject/banners/, /content/documents/myproject/news/
brxm.ai.ingest.exclude-dirs	no	list of paths	Comma-separated list of absolute paths. A document that is under any of those paths is skipped from ingestion. Removal from the store ignores this filter.	No path is considered excluded	/content/documents/myproject/taxonomies/, /content/documents/myproject/private/
brxm.ai.ingest.initial-delay	no	integer (seconds)	How long after system startup, in seconds, the Ingestion process waits before it begins	300	1000
brxm.ai.ingest.interval	no	integer (seconds)	How often the process runs	12	60
brxm.ai.ingest.batch-size	no	integer	The ingestion process handles multiple documents at once and sends them all together to the Vector Store. Reduce if your Vector Store gets overloaded	5	2
brxm.ai.ingest.delay	no	integer (milliseconds)	Back-off time after each batch is processed. Increase if your Vector Store gets overloaded	1000	10000

The ingestion process works by accumulating ingestion events in an in-memory queue, then processing them in batches. A batch is retried 5 times, with each attempt delayed by an interval that starts from brxm.ai.ingest.delay and increases exponentially with every failed attempt.

The ingestion filter always excludes content items that are not under the absolute path /content/. This is an additional, built-in filter that is always applied on ingestion. It is not applied on removal.

The ingestion filter also always excludes content items that are under the path /content/attic/ or /content/taxonomies/. This is an additional, built-in filter that is always applied on ingestion. It is not applied on removal.
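
The following is a sketch of an ingestion configuration built from the properties in the table above. The document types and paths are the illustrative examples from the table, not values that exist in your project, and the tuning values are arbitrary.

```properties
# Index only published content.
brxm.ai.ingest.mode = live

# Without this list, no content is ingested.
brxm.ai.ingest.types = myproject:bannerdocument, hippogallery:exampleAssetSet

# Optional path filters (ignored when vectors are removed from the store).
brxm.ai.ingest.include-dirs = /content/documents/myproject/banners/, /content/documents/myproject/news/
brxm.ai.ingest.exclude-dirs = /content/documents/myproject/private/

# Optional tuning: smaller batches and a longer back-off than the defaults (5 and 1000 ms),
# which may help if the Vector Store gets overloaded.
brxm.ai.ingest.batch-size = 2
brxm.ai.ingest.delay = 10000
```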

Redis options

A Redis instance must be running and accessible. See also the official Redis Spring AI documentation.

Property	Required	Type	Description	Default	Example
brxm.ai.vectorstore.redis.host	yes	url	The hostname of your Redis instance		myredis or 127.0.0.1
brxm.ai.vectorstore.redis.port	yes	integer	The port your Redis instance is listening on		6379
brxm.ai.vectorstore.redis.index	yes	string	The index name that is used in Redis to store all your embeddings		myindex
brxm.ai.vectorstore.redis.prefix	yes	string	Each entry in your index is prefixed with this string. Used for identification purposes		my_prefix
brxm.ai.vectorstore.redis.user	no	string	When stronger security is used, a user must be provided
brxm.ai.vectorstore.redis.password	no	string	When stronger security is used, a password must be provided
brxm.ai.vectorstore.redis.client-name	no	string	The name of the connecting application. Used for identification purposes.		myAppName
brxm.ai.vectorstore.redis.timeout-millis	no	integer	Timeout, in milliseconds, for connections to the Redis database	latest Redis default	5000
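
Putting the required Redis properties together, a minimal configuration might look like the sketch below. The host, index, prefix, and client name are placeholder values taken from the examples in the table, and a local Redis on its usual port is assumed.

```properties
# Make Redis the active Vector Store.
brxm.ai.vectorstore = Redis

# Required connection and index settings.
brxm.ai.vectorstore.redis.host = 127.0.0.1
brxm.ai.vectorstore.redis.port = 6379
brxm.ai.vectorstore.redis.index = myindex
brxm.ai.vectorstore.redis.prefix = my_prefix

# Optional settings.
brxm.ai.vectorstore.redis.client-name = myAppName
brxm.ai.vectorstore.redis.timeout-millis = 5000
```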

PgVector options

A Postgres instance with the vector extension enabled must be running and accessible. See also the official PgVector Spring AI documentation.

Property	Required	Type	Description	Default	Example
brxm.ai.vectorstore.pgvector.url	yes	url	The connection string to PgVector		jdbc:postgresql://myhost:5432/mydbname
brxm.ai.vectorstore.pgvector.username	yes	string	The username to connect to PgVector
brxm.ai.vectorstore.pgvector.password	yes	string	The password to connect to PgVector
brxm.ai.vectorstore.pgvector.dimensions	yes	integer	Embeddings dimension. The dimension is set on the embedding column when the table is first created. If you change the dimensions, you will have to re-create the vector_store table as well.		1536
brxm.ai.vectorstore.pgvector.index-type	no	string	Nearest neighbor search index type. Options are: NONE, exact nearest neighbor search; IVFFlat, an index that divides vectors into lists and then searches the subset of those lists closest to the query vector; HNSW, an index that creates a multilayer graph.	HNSW
brxm.ai.vectorstore.pgvector.distance-type	no	string	Search distance type. Defaults to COSINE_DISTANCE. If vectors are normalized to length 1, you can use EUCLIDEAN_DISTANCE or NEGATIVE_INNER_PRODUCT for best performance.	COSINE_DISTANCE
brxm.ai.vectorstore.pgvector.remove-existing-vector-store-table	no	boolean	Deletes the existing vector_store table on start up.	false	true
brxm.ai.vectorstore.pgvector.initialize-schema	no	boolean	Whether to initialize the required schema	false	true
brxm.ai.vectorstore.pgvector.schema-name	no	string	Vector store schema name	public	myschema
brxm.ai.vectorstore.pgvector.table-name	no	string	Vector store table name	vector_store	my_vector_table
brxm.ai.vectorstore.pgvector.schema-validation	no	boolean	Enables schema and table name validation to ensure they are valid and existing objects.	false	true
brxm.ai.vectorstore.pgvector.max-document-batch-size	no	integer	Maximum number of documents to process in a single batch.	10000

The schema in your PgVector instance must be created before it can be used. Set brxm.ai.vectorstore.pgvector.initialize-schema to true the first time you connect to your PgVector instance.
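
A first-time PgVector setup could be sketched as follows. The connection string and the dimension are the examples from the table; the username and password are hypothetical placeholders, and the dimension must match the embedding model you have configured.

```properties
# Make PgVector the active Vector Store.
brxm.ai.vectorstore = PgVector

# Required connection settings (myuser and mypassword are placeholders).
brxm.ai.vectorstore.pgvector.url = jdbc:postgresql://myhost:5432/mydbname
brxm.ai.vectorstore.pgvector.username = myuser
brxm.ai.vectorstore.pgvector.password = mypassword

# Must match the output dimension of the configured embedding model.
brxm.ai.vectorstore.pgvector.dimensions = 1536

# Create the schema on the first connection (the default is false).
brxm.ai.vectorstore.pgvector.initialize-schema = true
```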

Logging

To help troubleshoot issues related to ingestion and the processing of the ingestion queue, set the log level on the following logger: <Logger name="com.bloomreach.xm.ai.service.impl.vector.ingest" level=.. />

Setting the level to:

- info shows logs when ingestion happens and when the ingestion queue is used
- debug shows more detailed logs when ingestion happens, when the scheduler runs, and when the ingestion queue is used
- trace additionally shows logs from the event listeners in the CMS, which are the entry points that trigger an ingestion
