Class CmsExtractionResultCache


  • public class CmsExtractionResultCache
    extends java.lang.Object
    Implements a disk cache that stores text extraction results in the RFS.

    This cache operates on resource file names, plus a hash code calculated from CmsResource.getDateLastModified() and CmsResource.getLength(). Optionally, a locale can be appended to this name.

    Since text extraction is done only on the content of a resource, all siblings must have the same content. The extraction result can differ only by the locale setting, in the case of an XML content or XML page. However, the most problematic contents to extract for the search are in fact the MS Office and PDF formats. For these documents, all siblings must produce the exact same text extraction result.

    This cache is usable for resources from the online AND the offline project at the same time, because any change to a resource will result in a changed hash code. This means a resource changed in the offline project will have a new hash code compared to the online project. If the resource is identical in the online and the offline project, the generated hash codes will be the same.
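    The naming scheme described above can be illustrated with a minimal sketch. This is not the OpenCms implementation: the class CacheNameSketch, the method cacheName, the hash formula, and the ".ext" suffix are all assumptions made for illustration. It only shows why identical resources in the online and offline project map to the same cache entry, while a changed resource maps to a new one.

```java
import java.util.Locale;

/** Illustrative only: a hypothetical naming scheme, not the OpenCms code. */
class CacheNameSketch {

    /**
     * Builds a cache file name from a resource path, a hash of the
     * last-modified date and the content length, and an optional locale.
     */
    static String cacheName(String rootPath, long dateLastModified, int length, Locale locale) {
        // Assumed hash formula: any change to the date or the length changes the name.
        int hash = 31 * Long.hashCode(dateLastModified) + length;
        StringBuilder name = new StringBuilder(rootPath);
        name.append('_').append(Integer.toHexString(hash));
        if (locale != null) {
            // The locale is only appended when one is given.
            name.append('_').append(locale);
        }
        name.append(".ext"); // hypothetical file suffix
        return name.toString();
    }

    public static void main(String[] args) {
        String online = cacheName("/sites/default/doc.pdf", 1700000000000L, 4096, null);
        String offlineUnchanged = cacheName("/sites/default/doc.pdf", 1700000000000L, 4096, null);
        String offlineChanged = cacheName("/sites/default/doc.pdf", 1700000500000L, 4100, null);
        System.out.println("unchanged resource shares the entry: " + online.equals(offlineUnchanged));
        System.out.println("changed resource shares the entry: " + online.equals(offlineChanged));
    }
}
```

    Because the name depends only on the path, the date, the length, and the locale, no separate bookkeeping per project is needed in this sketch.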

    Since:
    6.2.0
    • Constructor Summary

      Constructors 
      Constructor Description
      CmsExtractionResultCache​(java.lang.String basepath, java.lang.String foldername)
      Creates a new disk cache.
    • Method Summary

      Modifier and Type Method Description
      int cleanCache​(float maxAge)
      Removes all expired extraction result cache entries from the RFS cache.
      java.lang.String getCacheName​(CmsResource resource, java.util.Locale locale, java.lang.String docTypeName)
      Returns the RFS name used for caching the text extraction result based on the given VFS resource and locale.
      CmsExtractionResult getCacheObject​(java.lang.String rfsName)
      Returns the extraction result stored in the requested file in the disk cache, or null if the file is not found in the cache or is found but outdated.
      java.lang.String getRepositoryPath()
      Returns the absolute path of the cache repository in the RFS.
      void saveCacheObject​(java.lang.String rfsName, I_CmsExtractionResult content)
      Serializes the given extraction result and saves it in the disk cache.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • CmsExtractionResultCache

        public CmsExtractionResultCache​(java.lang.String basepath,
                                        java.lang.String foldername)
        Creates a new disk cache.

        Parameters:
        basepath - the base path for the cache in the RFS
        foldername - the folder name for this cache, to be used as a subfolder of the base folder
    • Method Detail

      • cleanCache

        public int cleanCache​(float maxAge)
        Removes all expired extraction result cache entries from the RFS cache.

        Parameters:
        maxAge - the maximum age of the extraction result cache files in hours (or fractions of hours)
        Returns:
        the total number of deleted resources
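        As a rough illustration of the maxAge parameter, the sketch below converts a fractional number of hours into milliseconds and deletes files older than that. It is a sketch under stated assumptions, not the OpenCms implementation: the class CleanCacheSketch and its methods are hypothetical, and it only scans a single flat folder.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

/** Illustrative only: a hypothetical age-based cleanup of a flat cache folder. */
class CleanCacheSketch {

    /** Converts a maximum age in hours (or fractions of hours) to milliseconds. */
    static long maxAgeMillis(float maxAgeHours) {
        return (long) (maxAgeHours * 3_600_000.0);
    }

    /** Deletes files in dir last modified before (now - maxAge); returns the number deleted. */
    static int cleanFolder(File dir, float maxAgeHours, long now) {
        long cutoff = now - maxAgeMillis(maxAgeHours);
        File[] files = dir.listFiles();
        if (files == null) {
            return 0; // not a directory, or not readable
        }
        int deleted = 0;
        for (File file : files) {
            if (file.isFile() && file.lastModified() < cutoff && file.delete()) {
                deleted++;
            }
        }
        return deleted;
    }

    /** Creates one old and one fresh file in a temp folder and cleans with maxAge = 0.5 hours. */
    static int demo() {
        try {
            File dir = Files.createTempDirectory("extractCache").toFile();
            File old = new File(dir, "old.ext");
            File fresh = new File(dir, "fresh.ext");
            old.createNewFile();
            fresh.createNewFile();
            long now = System.currentTimeMillis();
            // Pretend the first file was written two hours ago.
            old.setLastModified(now - maxAgeMillis(2.0f));
            return cleanFolder(dir, 0.5f, now);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("deleted entries: " + demo());
    }
}
```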
      • getCacheName

        public java.lang.String getCacheName​(CmsResource resource,
                                             java.util.Locale locale,
                                             java.lang.String docTypeName)
        Returns the RFS name used for caching the text extraction result based on the given VFS resource and locale.

        Parameters:
        resource - the VFS resource to generate the cache name for
        locale - the locale to generate the cache name for (may be null)
        docTypeName - the name of the search document type
        Returns:
        the RFS name to use for caching the given VFS resource with the given parameters
      • getCacheObject

        public CmsExtractionResult getCacheObject​(java.lang.String rfsName)
        Returns the extraction result stored in the requested file in the disk cache, or null if the file is not found in the cache or is found but outdated.

        Parameters:
        rfsName - the RFS name of the file to look up in the cache
        Returns:
        the extraction result stored in the requested file in the RFS disk cache, or null
      • getRepositoryPath

        public java.lang.String getRepositoryPath()
        Returns the absolute path of the cache repository in the RFS.

        Returns:
        the absolute path of the cache repository in the RFS
      • saveCacheObject

        public void saveCacheObject​(java.lang.String rfsName,
                                    I_CmsExtractionResult content)
                             throws java.io.IOException
        Serializes the given extraction result and saves it in the disk cache.

        Parameters:
        rfsName - the RFS name of the file to save the extraction result in
        content - the extraction result to serialize and save
        Throws:
        java.io.IOException - in case of disk access errors
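        The save and lookup behavior documented above can be sketched as a plain serialization round trip using standard Java object streams. This is a minimal sketch, not the OpenCms implementation: the class DiskCacheSketch and its methods save, load, and roundTrip are hypothetical, a String stands in for the extraction result, and the "outdated" check mentioned for getCacheObject is not modeled.

```java
import java.io.File;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;

/** Illustrative only: a hypothetical serialize-to-disk cache round trip. */
class DiskCacheSketch {

    /** Serializes content into the file rfsName, creating parent folders as needed. */
    static void save(String rfsName, Serializable content) throws IOException {
        File file = new File(rfsName);
        File parent = file.getParentFile();
        if (parent != null) {
            Files.createDirectories(parent.toPath());
        }
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file.toPath()))) {
            out.writeObject(content);
        }
    }

    /** Returns the deserialized object, or null if the file is missing or unreadable. */
    static Object load(String rfsName) {
        File file = new File(rfsName);
        if (!file.isFile()) {
            return null; // cache miss
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file.toPath()))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            return null; // treat a damaged entry like a miss
        }
    }

    /** Saves text under a temp folder and reads it back. */
    static String roundTrip(String text) {
        try {
            File dir = Files.createTempDirectory("extractCache").toFile();
            // Hypothetical entry name; real names would come from getCacheName.
            String rfsName = new File(dir, "sub/doc.pdf_1a2b.ext").getPath();
            save(rfsName, text);
            return (String) load(rfsName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("round trip: " + roundTrip("extracted text"));
        System.out.println("missing entry: " + load("/no/such/cache/entry.ext"));
    }
}
```

        Returning null on a miss mirrors the documented contract of getCacheObject, so a caller can fall back to a fresh extraction and then store the result.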