Vector Store Maintenance Groovy Scripts

Overview

The AI module ships with a base updater class on top of which simple maintenance Groovy scripts can be built. The class is com.bloomreach.xm.ai.service.impl.vector.mgmt.VectorStoreUpdater, and it exposes two main operations:

  • addVectorsFrom(handle)
    Asks the configured embedding model to generate embeddings for a handle and enqueues them for ingestion into the vector store. The published variant is used if ingestion runs in live mode, the unpublished variant if it runs in preview mode. The ingestion filters (types, include-dirs and exclude-dirs) are applied.
     
  • removeVectorsFor(handle)
    Enqueues a request to remove the vectors for the given handle. No filters are applied.

The base class also exposes the filter (and queue) to subclasses for any custom processing:

protected IngestionFilter ingestionFilter;
protected IngestionQueue ingestionQueue;

Exposing methods that operate on handle nodes is especially convenient for Groovy scripts, which already come with node iteration and search capabilities.

See Updater Scripts for more information.

Implementations

The AI module ships with two implementations of this updater. They are bootstrapped, so once the AI module is installed they can be found under the Registry in the CMS Updater Editor interface.

These are example implementations that can be used either as is, or as a starting point for more advanced updaters.
They are also safe to delete, as the AI module itself does not depend on them.

VectorStoreExampleIngestor
Use this script to enqueue documents for ingestion into the vector store. You can select the handles to ingest with any of the existing Groovy script node retrieval approaches.

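import com.bloomreach.xm.ai.service.impl.vector.mgmt.VectorStoreUpdater
import javax.jcr.Node
// Assumption: NT_HANDLE is the standard Hippo constant ("hippo:handle")
import static org.hippoecm.repository.api.HippoNodeType.NT_HANDLE

class VectorStoreExampleIngestor extends VectorStoreUpdater {

    boolean doUpdate(Node node) {
        // Only handle nodes are processed; anything else is skipped
        if (node.isNodeType(NT_HANDLE)) {
            log.debug "Queuing up node ${node.path} for addition to the vector store"
            // In a dry run nothing is enqueued and the node is reported as unchanged
            if (!visitorContext.dryRun) {
                addVectorsFrom(node)
                return true
            }
        }
        return false
    }
}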

VectorStoreExampleCleaner
Use this script to enqueue documents for removal of their vectors from the vector store. You can select the handles whose vectors should be removed with any of the existing Groovy script node retrieval approaches.

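import com.bloomreach.xm.ai.service.impl.vector.mgmt.VectorStoreUpdater
import javax.jcr.Node
import javax.jcr.RepositoryException
// Assumption: NT_HANDLE is the standard Hippo constant ("hippo:handle")
import static org.hippoecm.repository.api.HippoNodeType.NT_HANDLE

class VectorStoreExampleCleaner extends VectorStoreUpdater {

    boolean doUpdate(Node node) throws RepositoryException {
        // Only handle nodes are processed; no ingestion filters are applied on removal
        if (node.isNodeType(NT_HANDLE)) {
            log.debug "Queuing up node ${node.path} for removal from vector store"
            // In a dry run nothing is enqueued and the node is reported as unchanged
            if (!visitorContext.dryRun) {
                removeVectorsFor(node)
                return true
            }
        }
        return false
    }
}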

Using the ingestionFilter

The base updater applies the configured ingestion filter (set via properties or JCR) whenever you call the addVectorsFrom method. The filter is also available to subclasses (your updaters), where it exposes three checks, each invoked through test(...):

ingestionFilter.isIndexableType.test(documentType)
ingestionFilter.isUnderIncludedDir.test(handlePath)
ingestionFilter.isUnderExcludedDir.test(handlePath)
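
The three checks can be combined into a single decision, for example to tell whether a handle still belongs in the vector store. The following is a minimal sketch and not the module's own implementation: StubIngestionFilter, passesFilter, actionFor, the project paths and the type names are all hypothetical, and the stub only exists so that the script runs outside the CMS. It also assumes that the checks accept plain strings and that a document qualifies when its type is enabled, it sits under an included directory, and it does not sit under an excluded one.

```groovy
import java.util.function.Predicate

// Hypothetical stand-in for the module's IngestionFilter. Only the three
// predicate names are taken from the documentation above.
class StubIngestionFilter {
    Predicate<String> isIndexableType
    Predicate<String> isUnderIncludedDir
    Predicate<String> isUnderExcludedDir
}

// Assumed combination rule: enabled type AND under an included dir
// AND NOT under an excluded dir.
boolean passesFilter(def filter, String documentType, String handlePath) {
    return filter.isIndexableType.test(documentType) &&
           filter.isUnderIncludedDir.test(handlePath) &&
           !filter.isUnderExcludedDir.test(handlePath)
}

// Maps the decision to the operation an updater would enqueue.
String actionFor(def filter, String documentType, String handlePath) {
    return passesFilter(filter, documentType, handlePath) ? 'add' : 'remove'
}

// Example configuration with made-up type and folder names.
def stubFilter = new StubIngestionFilter(
    isIndexableType:    { String type -> type == 'myproject:newsdocument' } as Predicate,
    isUnderIncludedDir: { String path -> path.startsWith('/content/documents/myproject/') } as Predicate,
    isUnderExcludedDir: { String path -> path.startsWith('/content/documents/myproject/archive/') } as Predicate
)

// Enabled type in an included folder: keep (or add) its vectors.
assert actionFor(stubFilter, 'myproject:newsdocument', '/content/documents/myproject/news/article') == 'add'
// Same type, but inside an excluded folder: its vectors should go.
assert actionFor(stubFilter, 'myproject:newsdocument', '/content/documents/myproject/archive/old') == 'remove'
// Type that is not enabled: its vectors should go.
assert actionFor(stubFilter, 'myproject:banner', '/content/documents/myproject/news/article') == 'remove'
```

Inside a real updater, the inherited ingestionFilter field would take the place of the stub, and the result would decide between calling addVectorsFrom(node) and removeVectorsFor(node). How the document type is obtained from a handle is left open here, since it depends on your content model.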

Use Cases

A few use cases where these scripts are useful are:

  • A document type that was in the list of “enabled” types has been removed from that list
    Documents of that type are still indexed in the vector store. Run the VectorStoreExampleCleaner with a query that finds all documents of the removed type.
     
  • A folder with indexed content was removed from the included-dirs list or added to the excluded-dirs
    Documents under that folder (and all subfolders) need to be removed from the store. Again, the VectorStoreExampleCleaner should be used for this.
     
  • Conversely, a folder was added to the included-dirs list or removed from the excluded-dirs
    Documents under that folder (and all subfolders) need to be added to the store. The VectorStoreExampleIngestor should be used for this.
     
  • New content was added via an import mechanism
    Use the VectorStoreExampleIngestor to fetch that content and enqueue it for addition to the vector store.
     
  • The vector store seems to be out of sync with your content
    An ingestion may have timed out, or documents may not have been indexed for some other reason when they should have been. Use the VectorStoreExampleIngestor to index that specific content.

 
