Package org.opencms.search.extractors
Contains a generic, low-level framework for extracting plain text content from various popular file formats.
- Since: 6.0.0
Interface Summary
- I_CmsExtractionResult: The result of a document text extraction.
- I_CmsTextExtractor: Allows extraction of the indexable "plain" text plus (optional) meta information from a given binary input document format.
Class Summary
- A_CmsTextExtractor: Base utility class that allows extraction of the indexable "plain" text from a given document format.
- CmsExtractionResult: The result of a document text extraction.
- CmsExtractorHtml: Extracts the text from an HTML document.
- CmsExtractorMsOfficeOLE2: Extracts text data from a VFS resource that is an OLE 2 MS Office document.
- CmsExtractorMsOfficeOOXML: Extracts text data from a VFS resource that is an OOXML MS Office document.
- CmsExtractorOpenOffice: Extracts the text from OpenOffice documents (.ods, .odf).
- CmsExtractorPdf: Extracts the text from a PDF document.
- CmsExtractorRtf: Extracts the text from an RTF document.
- Messages: Convenience class to access the localized messages of this OpenCms package.
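
The split between an extractor interface and a result interface described above can be illustrated with a small, self-contained sketch. It does not use the OpenCms API: `TextExtractor`, `ExtractionResult`, `SimpleResult`, and `NaiveHtmlExtractor` are hypothetical names invented for this example, and the regex-based tag stripping is a deliberately naive stand-in for a real HTML parser.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch only; none of these types belong to OpenCms.
class ExtractorSketch {

    /** Result of an extraction: indexable plain text plus optional meta information. */
    interface ExtractionResult {
        String getContent();
        Map<String, String> getMetaInfo();
    }

    /** Extractor contract: binary document in, extraction result out. */
    interface TextExtractor {
        ExtractionResult extract(byte[] content, Charset charset);
    }

    /** Immutable result holder. */
    static final class SimpleResult implements ExtractionResult {
        private final String content;
        private final Map<String, String> metaInfo;

        SimpleResult(String content, Map<String, String> metaInfo) {
            this.content = content;
            this.metaInfo = Collections.unmodifiableMap(new LinkedHashMap<>(metaInfo));
        }

        @Override public String getContent() { return content; }
        @Override public Map<String, String> getMetaInfo() { return metaInfo; }
    }

    /** Naive HTML extractor: strips tags with regular expressions (assumption: well-formed input). */
    static final class NaiveHtmlExtractor implements TextExtractor {
        private static final Pattern TITLE =
            Pattern.compile("(?is)<title[^>]*>(.*?)</title>");

        @Override
        public ExtractionResult extract(byte[] content, Charset charset) {
            String html = new String(content, charset);

            // Collect the <title> text as optional meta information.
            Map<String, String> meta = new LinkedHashMap<>();
            Matcher m = TITLE.matcher(html);
            if (m.find()) {
                meta.put("title", m.group(1).trim());
            }

            String text = html
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ") // drop script/style bodies
                .replaceAll("(?s)<[^>]+>", " ")                         // drop remaining tags
                .replaceAll("\\s+", " ")                                // collapse whitespace
                .trim();
            return new SimpleResult(text, meta);
        }
    }

    public static void main(String[] args) {
        byte[] doc = "<html><head><title>Demo</title></head><body><p>Hello <b>world</b></p></body></html>"
            .getBytes(StandardCharsets.UTF_8);
        ExtractionResult result = new NaiveHtmlExtractor().extract(doc, StandardCharsets.UTF_8);
        System.out.println(result.getContent());
        System.out.println(result.getMetaInfo());
    }
}
```

Keeping the result behind its own interface lets each format-specific extractor return text and metadata in the same shape, so an indexer can treat all formats uniformly.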