Package org.opencms.search.documents
Interface I_CmsSearchExtractor
- All Known Subinterfaces:
I_CmsDocumentFactory
- All Known Implementing Classes:
A_CmsVfsDocument, CmsDocumentContainerPage, CmsDocumentGeneric, CmsDocumentHtml, CmsDocumentMsOfficeOLE2, CmsDocumentMsOfficeOOXML, CmsDocumentOpenOffice, CmsDocumentPdf, CmsDocumentPlainText, CmsDocumentRtf, CmsDocumentXmlContent, CmsDocumentXmlPage, CmsSolrDocumentContainerPage, CmsSolrDocumentXmlContent
public interface I_CmsSearchExtractor
Defines a text extractor for the integrated search engine.
The job of a search extractor is to extract indexable plain text from a resource in the OpenCms VFS. This may be from the resource content, for example from a PDF file, or from the resource properties, for example the Title, Keywords and Description properties.
- Since:
- 6.0.0
Method Summary
Modifier and Type: I_CmsExtractionResult
Method: extractContent(CmsObject cms, CmsResource resource, I_CmsSearchIndex index)
Description: Extracts the content of a given index resource according to the resource file type and the configuration of the given index.
Method Details
extractContent
I_CmsExtractionResult extractContent(CmsObject cms, CmsResource resource, I_CmsSearchIndex index) throws CmsException
Extracts the content of a given index resource according to the resource file type and the configuration of the given index.
- Parameters:
cms - the cms object
resource - the resource to extract the content from
index - the index to extract the content for
- Returns:
- the extracted content of the resource
- Throws:
CmsException - if something goes wrong
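The extractor's job, as described above, is to produce indexable plain text from a resource's content and from properties such as Title, Keywords and Description. The following is a minimal, self-contained sketch of that idea only. SketchResource, SketchExtractor and SketchHtmlExtractor are hypothetical stand-ins invented for illustration; they are not OpenCms classes, and the simplified method signature is an assumption that deliberately omits the CmsObject and I_CmsSearchIndex parameters and the I_CmsExtractionResult return type of the real interface. The tag stripping is a naive regular expression, not how any OpenCms document factory is known to work.

```java
import java.util.Map;

/** Hypothetical stand-in for a VFS resource: raw content plus its properties. */
final class SketchResource {
    final String content;
    final Map<String, String> properties;

    SketchResource(String content, Map<String, String> properties) {
        this.content = content;
        this.properties = properties;
    }
}

/** Hypothetical, simplified analogue of a search extractor (not the real signature). */
interface SketchExtractor {
    String extractContent(SketchResource resource) throws Exception;
}

/** Collects text from selected properties, then from HTML-like content. */
final class SketchHtmlExtractor implements SketchExtractor {
    // Property names taken from the interface description; the selection is an assumption.
    private static final String[] INDEXED_PROPERTIES = {"Title", "Keywords", "Description"};

    @Override
    public String extractContent(SketchResource resource) {
        StringBuilder text = new StringBuilder();
        // Text from the resource properties, one property value per line.
        for (String name : INDEXED_PROPERTIES) {
            String value = resource.properties.get(name);
            if (value != null && !value.trim().isEmpty()) {
                text.append(value.trim()).append('\n');
            }
        }
        // Text from the resource content: drop tags naively, collapse whitespace.
        String body = resource.content
                .replaceAll("<[^>]*>", " ")
                .replaceAll("\\s+", " ")
                .trim();
        text.append(body);
        return text.toString();
    }
}
```

In a real implementation, the index parameter would presumably let the extractor adapt its output to the index configuration, and failures would surface as CmsException rather than a generic exception.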