Interface I_CmsSearchExtractor

All Known Subinterfaces:
I_CmsDocumentFactory
All Known Implementing Classes:
A_CmsVfsDocument, CmsDocumentContainerPage, CmsDocumentGeneric, CmsDocumentHtml, CmsDocumentMsOfficeOLE2, CmsDocumentMsOfficeOOXML, CmsDocumentOpenOffice, CmsDocumentPdf, CmsDocumentPlainText, CmsDocumentRtf, CmsDocumentXmlContent, CmsDocumentXmlPage, CmsSolrDocumentContainerPage, CmsSolrDocumentXmlContent

public interface I_CmsSearchExtractor
Defines a text extractor for the integrated search engine.

A search extractor extracts indexable plain text from a resource in the OpenCms VFS. The text may come from the resource content (for example, a PDF file) or from the resource properties (for example, Title, Keywords and Description).

Since:
6.0.0
  • Method Details

    • extractContent

      Extracts the content of a given index resource according to the resource file type and the configuration of the given index.

      Parameters:
      cms - the cms object
      resource - the resource to extract the content from
      index - the index to extract the content for
      Returns:
      the extracted content of the resource
      Throws:
      CmsException - if the content extraction fails
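
The extractor contract described above can be illustrated with a minimal, self-contained sketch. It deliberately does not use the OpenCms API: `DemoResource`, `DemoExtractor`, `DemoPlainTextExtractor` and `DemoExtractionException` are hypothetical stand-ins invented for this example. The `cms` and `index` parameters are omitted, and the result is a plain `String` rather than whatever type the real method returns. The sketch only shows the idea of collecting indexable text from both the resource properties and the resource content.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a VFS resource: raw content plus its properties.
// This is NOT an OpenCms class.
final class DemoResource {
    final String content;
    final Map<String, String> properties;

    DemoResource(String content, Map<String, String> properties) {
        this.content = content;
        this.properties = properties;
    }
}

// Hypothetical failure type; unchecked here only to keep the sketch short.
final class DemoExtractionException extends RuntimeException {
    DemoExtractionException(String message) {
        super(message);
    }
}

// Simplified analogue of extractContent, without the cms and index parameters.
interface DemoExtractor {
    String extractContent(DemoResource resource);
}

// Collects the Title, Keywords and Description properties (one per line),
// followed by the content with runs of whitespace collapsed.
final class DemoPlainTextExtractor implements DemoExtractor {
    private static final String[] INDEXED_PROPERTIES = {"Title", "Keywords", "Description"};

    @Override
    public String extractContent(DemoResource resource) {
        if (resource == null) {
            throw new DemoExtractionException("no resource to extract from");
        }
        StringBuilder text = new StringBuilder();
        for (String key : INDEXED_PROPERTIES) {
            String value = resource.properties.get(key);
            if (value != null && !value.trim().isEmpty()) {
                text.append(value.trim()).append('\n');
            }
        }
        if (resource.content != null) {
            text.append(resource.content.replaceAll("\\s+", " ").trim());
        }
        return text.toString();
    }
}

final class DemoMain {
    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("Title", "Annual Report");
        props.put("Keywords", "finance, 2023");
        DemoResource resource = new DemoResource("  Revenue   grew\n steadily. ", props);
        System.out.println(new DemoPlainTextExtractor().extractContent(resource));
    }
}
```

A real implementation would typically choose its extraction strategy by resource file type, as the implementing classes listed at the top of this page suggest (PDF, RTF, HTML and so on). The sketch handles plain text only.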