Class CmsExtractionResultCache

java.lang.Object
org.opencms.search.documents.CmsExtractionResultCache

public class CmsExtractionResultCache extends Object
Implements a disk cache that stores text extraction results in the RFS.

This cache operates on resource file names, plus a hash code calculated from CmsResource.getDateLastModified() and CmsResource.getLength(). Optionally, a locale can be appended to this name.

Since text extraction is done only on the content of a resource, all siblings must have the same content. The only possible difference is the locale setting in the case of an XML content or XML page. However, the contents that are most problematic to extract for the search are in fact MS Office and PDF documents. For these documents, all siblings must produce exactly the same text extraction result.

This cache is usable for resources from the online AND the offline project at the same time, because any change to a resource will result in a changed hash code. This means a resource changed in the offline project will have a new hash code compared to the online project. If the resource is identical in the online and the offline project, the generated hash codes will be the same.
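
The naming scheme described above can be illustrated with a minimal sketch. It is not the OpenCms implementation: the class CacheNameSketch, the method cacheName, the way the two values are combined into a hash, and the ".ext" suffix are all assumptions made for illustration. The sketch only shows why a changed modification date or length yields a different cache file name, while an unchanged resource maps to the same name in the online and offline projects.

```java
import java.util.Locale;

// Hypothetical sketch, not the OpenCms code: derives a cache file name
// from a resource path, its last modification date and its length.
final class CacheNameSketch {

    static String cacheName(String rootPath, long dateLastModified, int length, Locale locale) {
        // Combine the two attributes that change whenever the content changes.
        // The combination formula is an assumption chosen for this example.
        int hash = Long.hashCode(dateLastModified) * 31 + length;

        StringBuilder name = new StringBuilder(rootPath);
        name.append('_').append(Integer.toHexString(hash));
        if (locale != null) {
            // The locale is optional and only appended when present.
            name.append('_').append(locale);
        }
        // ".ext" is an assumed suffix for cached extraction results.
        return name.append(".ext").toString();
    }

    public static void main(String[] args) {
        // Same path, date and length: identical names, so one cache entry is shared.
        System.out.println(cacheName("/docs/a.pdf", 1000L, 10, null));
        System.out.println(cacheName("/docs/a.pdf", 1000L, 10, null));
        // A later modification date produces a different name.
        System.out.println(cacheName("/docs/a.pdf", 2000L, 10, null));
        // With a locale, the name gets an additional locale part.
        System.out.println(cacheName("/docs/a.pdf", 1000L, 10, Locale.GERMAN));
    }
}
```

Because the name depends only on the resource state, no explicit invalidation is needed in this scheme: entries for outdated resource states are simply never requested again and can be removed later by age.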

Since:
6.2.0
  • Constructor Details

    • CmsExtractionResultCache

      public CmsExtractionResultCache(String basepath, String foldername)
      Creates a new disk cache.

      Parameters:
      basepath - the base path for the cache in the RFS
      foldername - the folder name for this cache, to be used as a subfolder of the base folder
  • Method Details

    • cleanCache

      public int cleanCache(float maxAge)
      Removes all expired extraction result cache entries from the RFS cache.

      Parameters:
      maxAge - the maximum age of the extraction result cache files in hours (or fractions of hours)
      Returns:
      the total number of deleted resources
    • getCacheName

      public String getCacheName(CmsResource resource, Locale locale, String docTypeName)
      Returns the RFS name used for caching the text extraction result based on the given VFS resource and locale.

      Parameters:
      resource - the VFS resource to generate the cache name for
      locale - the locale to generate the cache name for (may be null)
      docTypeName - the name of the search document type
      Returns:
      the RFS name to use for caching the given VFS resource with parameters
    • getCacheObject

      Returns the extraction result stored in the requested file in the disk cache, or null if the file is not found in the cache, or is found but outdated.

      Parameters:
      rfsName - the file RFS name to look up in the cache
      Returns:
      the extraction result stored in the requested file in the RFS disk cache, or null
    • getRepositoryPath

      Returns the absolute path of the cache repository in the RFS.

      Returns:
      the absolute path of the cache repository in the RFS
    • saveCacheObject

      public void saveCacheObject(String rfsName, I_CmsExtractionResult content) throws IOException
      Serializes the given extraction result and saves it in the disk cache.

      Parameters:
      rfsName - the RFS name of the file to save the extraction result in
      content - the extraction result to serialize and save
      Throws:
      IOException - in case of disk access errors
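
The save, lookup and clean operations documented above can be sketched with plain Java serialization. This is an illustration under stated assumptions, not the OpenCms implementation: the class DiskCacheSketch and its methods save, load and clean are hypothetical, any Serializable object stands in for I_CmsExtractionResult, and the sketch treats unreadable files as cache misses without checking whether an entry is outdated.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a serialization-based disk cache.
final class DiskCacheSketch {

    private final Path repository;

    DiskCacheSketch(Path repository) throws IOException {
        // Create the cache folder if it does not exist yet.
        this.repository = Files.createDirectories(repository);
    }

    // Serializes the content and writes it to a file in the cache folder.
    void save(String name, Serializable content) throws IOException {
        try (ObjectOutputStream out =
                new ObjectOutputStream(Files.newOutputStream(repository.resolve(name)))) {
            out.writeObject(content);
        }
    }

    // Returns the cached object, or null if the file is missing or unreadable.
    Object load(String name) {
        Path file = repository.resolve(name);
        if (!Files.isRegularFile(file)) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            return null;
        }
    }

    // Deletes all files older than maxAgeHours and returns how many were removed.
    int clean(float maxAgeHours) throws IOException {
        long cutoff = System.currentTimeMillis() - (long) (maxAgeHours * 60 * 60 * 1000);
        int deleted = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(repository)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)
                        && Files.getLastModifiedTime(file).toMillis() < cutoff) {
                    Files.delete(file);
                    deleted++;
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        DiskCacheSketch cache = new DiskCacheSketch(Files.createTempDirectory("cache-sketch"));
        cache.save("a.pdf_7922.ext", "extracted text");
        System.out.println(cache.load("a.pdf_7922.ext"));
        // A fractional value such as 0.5f means 30 minutes.
        System.out.println(cache.clean(0.5f));
    }
}
```

Taking the maximum age as a float number of hours, as cleanCache does, allows sub-hour limits without a separate time unit parameter.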