Vector Store Maintenance Groovy Scripts
Overview
The AI module ships with a base updater class on which simple, maintenance-oriented Groovy scripts can be built. The class is com.bloomreach.xm.ai.service.impl.vector.mgmt.VectorStoreUpdater, and it exposes two main operations:
- addVectorsFrom(handle)
Asks the configured embedding model to generate embeddings for a handle and enqueues them for ingestion into the vector store. The published variant is used if ingestion runs in live mode, and the unpublished variant if it runs in preview mode. The ingestion filters (types, include-dirs and exclude-dirs) are applied.
- removeVectorsFor(handle)
Enqueues a request to remove the vectors for the given handle. No filters are applied.
The base class also exposes the filter (and queue) to subclasses for any custom processing:
protected IngestionFilter ingestionFilter;
protected IngestionQueue ingestionQueue;
Because these methods operate on handle nodes, they are convenient to call from Groovy updater scripts, which already provide node iteration and search capabilities.
Implementations
The AI module ships with two implementations of this updater. They are bootstrapped with the module and, once it is installed, are available under the Registry in the CMS Updater Editor.
VectorStoreExampleIngestor
Use this script to enqueue documents for ingestion into the vector store. You can select the handles to ingest with any of the existing Groovy script node retrieval approaches.
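import com.bloomreach.xm.ai.service.impl.vector.mgmt.VectorStoreUpdater
import javax.jcr.Node
import static org.hippoecm.repository.api.HippoNodeType.NT_HANDLE

class VectorStoreExampleIngestor extends VectorStoreUpdater {
    boolean doUpdate(Node node) {
        // Only handle nodes are processed; any other node is skipped
        if (node.isNodeType(NT_HANDLE)) {
            log.debug "Queuing up node ${node.path} for addition to the vector store"
            // In a dry run nothing is enqueued and the node is reported as skipped
            if (!visitorContext.dryRun) {
                addVectorsFrom(node)
                return true
            }
        }
        return false
    }
}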
VectorStoreExampleCleaner
Use this script to enqueue the removal of documents' vectors from the vector store. You can select the handles to remove with any of the existing Groovy script node retrieval approaches.
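import com.bloomreach.xm.ai.service.impl.vector.mgmt.VectorStoreUpdater
import javax.jcr.Node
import javax.jcr.RepositoryException
import static org.hippoecm.repository.api.HippoNodeType.NT_HANDLE

class VectorStoreExampleCleaner extends VectorStoreUpdater {
    boolean doUpdate(Node node) throws RepositoryException {
        // Only handle nodes are processed; any other node is skipped
        if (node.isNodeType(NT_HANDLE)) {
            log.debug "Queuing up node ${node.path} for removal from vector store"
            // In a dry run nothing is enqueued and the node is reported as skipped
            if (!visitorContext.dryRun) {
                removeVectorsFor(node)
                return true
            }
        }
        return false
    }
}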
Using the ingestionFilter
The base updater applies the configured ingestion filter (set via properties or JCR) when you call addVectorsFrom. The filter is also available to subclasses (your own updaters) and exposes three predicates:
ingestionFilter.isIndexableType.test(documentType)
ingestionFilter.isUnderIncludedDir.test(handlePath)
ingestionFilter.isUnderExcludedDir.test(handlePath)
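As an illustration of how the three predicates might be combined in custom processing, here is a minimal, self-contained sketch. It does not use the module's classes: StubIngestionFilter is a hypothetical stand-in that only mirrors the three predicate fields named above, and the document type and folder paths are made up. It also assumes that the predicates accept plain strings (a document type name and a handle path) and that a handle qualifies when its type is indexable, it lies under an included directory, and it does not lie under an excluded one. The module's actual rule may differ.

```groovy
import java.util.function.Predicate

// Hypothetical stand-in for the module's IngestionFilter, so the sketch runs outside the CMS.
class StubIngestionFilter {
    Predicate<String> isIndexableType
    Predicate<String> isUnderIncludedDir
    Predicate<String> isUnderExcludedDir
}

// Assumed combination rule: indexable type AND under an included dir AND NOT under an excluded dir.
boolean shouldBeIndexed(filter, String documentType, String handlePath) {
    filter.isIndexableType.test(documentType) &&
        filter.isUnderIncludedDir.test(handlePath) &&
        !filter.isUnderExcludedDir.test(handlePath)
}

// Example filter configuration (hypothetical type and folder names)
def filter = new StubIngestionFilter(
    isIndexableType:    ({ String type -> type == 'myproject:newsdocument' } as Predicate<String>),
    isUnderIncludedDir: ({ String path -> path.startsWith('/content/documents/myproject/') } as Predicate<String>),
    isUnderExcludedDir: ({ String path -> path.startsWith('/content/documents/myproject/archive/') } as Predicate<String>)
)

// Decide, per handle path, which base-class operation a reconciling updater would call
['/content/documents/myproject/news/item',
 '/content/documents/myproject/archive/old'].each { String path ->
    def action = shouldBeIndexed(filter, 'myproject:newsdocument', path) ? 'addVectorsFrom' : 'removeVectorsFor'
    println "${path} -> ${action}"
}
```

In a real updater, the same check could run inside doUpdate against the inherited ingestionFilter field, followed by a call to addVectorsFrom(node) or removeVectorsFor(node). How the document type and handle path are derived from the node is left out here because it depends on your content model.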
Use Cases
A few use cases where these scripts are useful are:
- A document type that was in the list of “enabled” types has been removed from that list
Documents of that type are still indexed in the vector store. Run the VectorStoreExampleCleaner with a query that finds all documents of the removed type. Note that the script only acts on handle nodes, so the query must return the documents' handles.
- A folder with indexed content was removed from the included-dirs list or added to the excluded-dirs list
Documents under that folder (and all subfolders) need to be removed from the store. Again, the VectorStoreExampleCleaner should be used for this.
- Conversely, a folder was added to the included-dirs list or removed from the excluded-dirs list
Documents under that folder (and all subfolders) need to be added to the store. The VectorStoreExampleIngestor should be used for this.
- New content was added via an import mechanism
Use the VectorStoreExampleIngestor to fetch that content and enqueue it for addition to the vector store.
- The vector store seems to be out of sync with your content
An ingestion may have timed out, or documents may not have been indexed for some other reason. Use the VectorStoreExampleIngestor to index that specific content.