Repository Assets Performance tuning
Do not allow very large assets
If you are using MySQL, the max_allowed_packet variable of your database server can be set to a larger value to allow for larger assets (see increase packet size). However, this will reduce performance.
Storing PDF assets
When a PDF is uploaded via the CMS, the extracted text of the PDF is also stored in the binary property hippo:text on the hippo:resource node that contains the PDF binary. The reason is that extracting the text for (Lucene) indexing from a PDF with the help of Apache Tika is very CPU intensive, since the PDF needs to be parsed. Hippo Repository, however, only extracts the text from a PDF if there is not already a hippo:text binary. If the latter is available, the text in that property is used for indexing. The advantages of storing a hippo:text binary are:
- Only one repository cluster node needs to do the expensive PDF text extraction: the other cluster nodes use the extracted text in the hippo:text binary.
- Reindexing PDF assets is much cheaper and faster.
If you upload PDF files yourself without the CMS UI, for example via an importer tool or a REST endpoint, make sure to also store the extracted text in a hippo:text property for best performance.
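As a rough illustration of the behaviour described above, the following sketch shows an importer that stores the extracted text next to the PDF binary, and an indexing step that only extracts when hippo:text is missing. It is a minimal, self-contained sketch and not the actual Hippo implementation: ResourceNode, InMemoryResourceNode and PdfAssetImporter are hypothetical names, the extractor function stands in for Apache Tika, the jcr:data property name for the PDF binary is an assumption, and so is the UTF-8 encoding of the stored text.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a hippo:resource node; a real importer would work on a javax.jcr.Node. */
interface ResourceNode {
    boolean hasProperty(String name);
    byte[] getBinary(String name);
    void setBinary(String name, byte[] data);
}

/** In-memory implementation, for illustration only. */
final class InMemoryResourceNode implements ResourceNode {
    private final Map<String, byte[]> properties = new HashMap<>();

    @Override
    public boolean hasProperty(String name) {
        return properties.containsKey(name);
    }

    @Override
    public byte[] getBinary(String name) {
        return properties.get(name);
    }

    @Override
    public void setBinary(String name, byte[] data) {
        properties.put(name, data);
    }
}

/** Hypothetical importer that keeps the extracted text alongside the PDF binary. */
final class PdfAssetImporter {
    static final String DATA = "jcr:data";   // assumed property holding the PDF binary
    static final String TEXT = "hippo:text"; // property holding the extracted text

    // Stand-in for the expensive text extraction (Apache Tika in a real setup).
    private final Function<byte[], String> extractor;

    PdfAssetImporter(Function<byte[], String> extractor) {
        this.extractor = extractor;
    }

    /** Stores the PDF and its extracted text, so the text is extracted only once. */
    void store(ResourceNode node, byte[] pdf) {
        node.setBinary(DATA, pdf);
        String text = extractor.apply(pdf);
        // Assumption: the text is stored as UTF-8 bytes.
        node.setBinary(TEXT, text.getBytes(StandardCharsets.UTF_8));
    }

    /** Reuses hippo:text when present and falls back to extraction otherwise. */
    String textForIndexing(ResourceNode node) {
        if (node.hasProperty(TEXT)) {
            return new String(node.getBinary(TEXT), StandardCharsets.UTF_8);
        }
        return extractor.apply(node.getBinary(DATA));
    }
}
```

In a real importer, the stand-ins would typically map to the JCR API (a Binary created with ValueFactory.createBinary and set with Node.setProperty) and to a Tika call such as Tika.parseToString for the extraction.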