Interface I_CmsDocumentFactory

All Superinterfaces:
I_CmsSearchExtractor
All Known Implementing Classes:
A_CmsVfsDocument, CmsDocumentContainerPage, CmsDocumentGeneric, CmsDocumentHtml, CmsDocumentMsOfficeOLE2, CmsDocumentMsOfficeOOXML, CmsDocumentOpenOffice, CmsDocumentPdf, CmsDocumentPlainText, CmsDocumentRtf, CmsDocumentXmlContent, CmsDocumentXmlPage, CmsSolrDocumentContainerPage, CmsSolrDocumentXmlContent

Used to create the Lucene Documents that index OpenCms resources. The factory also controls the text extraction algorithm used for a specific OpenCms resource type / MIME type combination.

The configuration of the search index is defined in opencms-search.xml. There you can associate a combination of OpenCms resource types and MIME types with an instance of this factory. This rather complex configuration is required because only the combination of OpenCms resource type and MIME type determines which extraction to use for search indexing. For example, if the OpenCms resource type is plain, the extraction algorithm for MIME types .html and .txt must be different. On the other hand, a resource with the MIME type .html in OpenCms can have almost any resource type, like xmlpage, xmlcontent or even jsp.
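
The lookup described above can be sketched as a map keyed by the resource type / MIME type combination. This is a minimal illustration only: the class FactoryLookupSketch, the key format "type:mime", the factory names, and the standard MIME strings text/html and text/plain are all assumptions made for this example and are not taken from the OpenCms API or from opencms-search.xml.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these names are part of the OpenCms API. */
final class FactoryLookupSketch {

    /** Builds a lookup key from a resource type and a MIME type (the format is an assumption). */
    static String key(String resourceType, String mimeType) {
        return resourceType + ":" + mimeType;
    }

    /** Registers illustrative factory names under their resource type / MIME type combination. */
    static Map<String, String> exampleRegistry() {
        Map<String, String> registry = new HashMap<>();
        // Same resource type, different MIME types: different extraction.
        registry.put(key("plain", "text/html"), "html");
        registry.put(key("plain", "text/plain"), "plaintext");
        // Same MIME type, different resource type: different extraction again.
        registry.put(key("xmlpage", "text/html"), "xmlpage");
        return registry;
    }

    /** Returns the registered factory name, or null if the combination is not configured. */
    static String lookup(Map<String, String> registry, String resourceType, String mimeType) {
        return registry.get(key(resourceType, mimeType));
    }

    public static void main(String[] args) {
        Map<String, String> registry = exampleRegistry();
        System.out.println(lookup(registry, "plain", "text/html"));
        System.out.println(lookup(registry, "plain", "text/plain"));
        System.out.println(lookup(registry, "xmlpage", "text/html"));
    }
}
```

Neither the resource type nor the MIME type alone is enough to pick an entry in this sketch, which is the reason the configuration pairs them.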

Since:
6.0.0
  • Method Details

    • createDocument

      Creates the Lucene Document for the given VFS resource and the given search index.

      This triggers the indexing process for the given VFS resource according to the configuration of the provided index.

      The provided index resource contains the basic contents to index. The provided search index contains the configuration of what to index, such as the locale and possible special field mappings.

      Parameters:
      cms - the OpenCms user context used to access the OpenCms VFS
      resource - the search index resource to create the Lucene document from
      index - the search index to create the Document for
      Returns:
      the Search Document for the given index resource and the given search index
      Throws:
      CmsException - if something goes wrong
    • getCache

      Returns the disk based cache used to store the raw extraction results.

      If null is returned, result caching is not supported for this factory.

      Returns:
      the disk based cache used to store the raw extraction results
    • getDocumentKeys

      List<String> getDocumentKeys(List<String> resourceTypes, List<String> mimeTypes) throws CmsException
      Returns the list of accepted keys for the resource types that can be indexed using this document factory.

      The result List contains String objects. Each String is later matched against A_CmsVfsDocument.getDocumentKey(String, String) to find the corresponding I_CmsDocumentFactory for a resource to index.

      The list of accepted resource types may contain a catch-all entry "*". In this case, a list for all possible resource types is returned, calculated by logic that depends on the document handler class.

      Parameters:
      resourceTypes - list of accepted resource types
      mimeTypes - list of accepted mime types
      Returns:
      the list of accepted keys for the resource types that can be indexed using this document factory (String objects)
      Throws:
      CmsException - if something goes wrong
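
A minimal sketch of how such a key list could be built, including the "*" catch-all. Everything here is hypothetical: the class DocumentKeysSketch, the key format, and the allKnownTypes parameter (which stands in for the handler-specific logic that determines all possible resource types) are assumptions for illustration, not the OpenCms implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; not the OpenCms implementation of getDocumentKeys. */
final class DocumentKeysSketch {

    /** Stand-in key format (an assumption); the real format is defined by A_CmsVfsDocument.getDocumentKey. */
    static String documentKey(String resourceType, String mimeType) {
        return resourceType + ":" + mimeType;
    }

    /**
     * Returns one key per resource type / MIME type pair.
     * If resourceTypes contains "*", allKnownTypes is used instead of the configured list.
     */
    static List<String> documentKeys(
        List<String> resourceTypes,
        List<String> mimeTypes,
        List<String> allKnownTypes) {

        List<String> types = resourceTypes.contains("*") ? allKnownTypes : resourceTypes;
        List<String> keys = new ArrayList<>();
        for (String type : types) {
            for (String mimeType : mimeTypes) {
                keys.add(documentKey(type, mimeType));
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> known = Arrays.asList("plain", "xmlpage");
        // Explicit resource type list: only "plain" is combined with the MIME types.
        System.out.println(documentKeys(Arrays.asList("plain"), Arrays.asList("text/html", "text/plain"), known));
        // Catch-all entry: every known resource type is combined with the MIME types.
        System.out.println(documentKeys(Arrays.asList("*"), Arrays.asList("text/html"), known));
    }
}
```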
    • getName

      Returns the name of this document type factory.

      Returns:
      the name of this document type factory
    • isLocaleDependend

      Returns true if this document factory is locale dependent.

      Returns:
      true if this document factory is locale dependent
    • isOnlyDependentOnContent

      default boolean isOnlyDependentOnContent()
      Returns true if the extraction result depends only on the resource's content itself, i.e., it does not have to be re-extracted if the content date is unchanged.

      Returns:
      true if the extraction result depends only on the resource's content itself, i.e., it does not have to be re-extracted if the content date is unchanged
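
A caller could use this flag to decide whether a stored extraction result can be reused. The sketch below is hypothetical: the class ReextractionSketch, the needsExtraction helper, and the rule that a factory returning false always re-extracts are assumptions for illustration and are not stated by the interface contract.

```java
/** Hypothetical sketch of a re-extraction decision; not part of the OpenCms API. */
final class ReextractionSketch {

    /**
     * Decides whether the text of a resource has to be extracted again.
     *
     * @param onlyDependentOnContent the value a factory reports via isOnlyDependentOnContent()
     * @param storedContentDate the content date recorded with the stored extraction result
     * @param currentContentDate the content date of the resource as it is now
     */
    static boolean needsExtraction(
        boolean onlyDependentOnContent,
        long storedContentDate,
        long currentContentDate) {

        if (!onlyDependentOnContent) {
            // Assumption: the result may depend on more than the content, so never reuse it.
            return true;
        }
        // The result depends only on the content: reuse it while the content date is unchanged.
        return storedContentDate != currentContentDate;
    }

    public static void main(String[] args) {
        System.out.println(needsExtraction(true, 1000L, 1000L));
        System.out.println(needsExtraction(true, 1000L, 2000L));
        System.out.println(needsExtraction(false, 1000L, 1000L));
    }
}
```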
    • isUsingCache

      boolean isUsingCache()
      Returns true if result caching is supported for this factory.

      Returns:
      true if result caching is supported for this factory
    • setCache

      Sets the disk based cache used to store the raw extraction results.

      This should only be used for factories where isUsingCache() returns true.

      Parameters:
      cache - the disk based cache used to store the raw extraction results
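
The relationship between isUsingCache(), setCache() and getCache() can be sketched as follows. The types ResultCache, CachingFactory and SimpleFactory and the helper wireCache are hypothetical stand-ins written for this example; they only mirror the three cache-related method names and are not the OpenCms classes.

```java
/** Hypothetical sketch of wiring a cache into a factory; not part of the OpenCms API. */
final class CacheWiringSketch {

    /** Stand-in for the disk based cache that stores raw extraction results. */
    static final class ResultCache {
        final String directory;

        ResultCache(String directory) {
            this.directory = directory;
        }
    }

    /** Reduced view of the three cache-related methods described above. */
    interface CachingFactory {
        boolean isUsingCache();

        void setCache(ResultCache cache);

        ResultCache getCache();
    }

    /** Simple implementation whose caching support is fixed at construction time. */
    static final class SimpleFactory implements CachingFactory {
        private final boolean usingCache;
        private ResultCache cache;

        SimpleFactory(boolean usingCache) {
            this.usingCache = usingCache;
        }

        public boolean isUsingCache() {
            return usingCache;
        }

        public void setCache(ResultCache cache) {
            this.cache = cache;
        }

        public ResultCache getCache() {
            return cache;
        }
    }

    /** Assigns the cache only if the factory supports caching; returns whether it was assigned. */
    static boolean wireCache(CachingFactory factory, ResultCache cache) {
        if (!factory.isUsingCache()) {
            return false;
        }
        factory.setCache(cache);
        return true;
    }

    public static void main(String[] args) {
        ResultCache cache = new ResultCache("extraction-cache");
        CachingFactory caching = new SimpleFactory(true);
        CachingFactory nonCaching = new SimpleFactory(false);
        System.out.println(wireCache(caching, cache));
        System.out.println(wireCache(nonCaching, cache));
        // A factory without a cache reports null from getCache().
        System.out.println(nonCaching.getCache() == null);
    }
}
```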