Package org.opencms.search.extractors
Contains a generic, low-level framework for extracting plain text content from various popular file formats.
- Since: 6.0.0
Class descriptions:
- Base utility class that allows extraction of the indexable "plain" text from a given document format.
- The result of a document text extraction.
- Extracts the text from an HTML document.
- Extracts text data from a VFS resource that is an OLE 2 MS Office document.
- Extracts text data from a VFS resource that is an OOXML MS Office document.
- Extracts the text from OpenOffice documents (.ods, .odf).
- Extracts the text from a PDF document.
- Extracts the text from an RTF document.
- The result of a document text extraction.
- Allows extraction of the indexable "plain" text plus (optional) meta information from a given binary input document format.
- Convenience class to access the localized messages of this OpenCms package.
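
The descriptions above suggest a common shape: an extractor takes a binary input document and returns a result holding the indexable plain text plus optional meta information. The following is a minimal sketch of that shape only. Every name in it (`ExtractorSketch`, `TextExtractor`, `ExtractionResult`, `NaiveHtmlExtractor`, the `title` meta key) is hypothetical and chosen for illustration; it is not the OpenCms API, and the regex-based tag stripping is a stand-in for a real HTML parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractorSketch {

    /** Hypothetical result type: plain text plus optional meta information. */
    interface ExtractionResult {
        String getContent();
        Map<String, String> getMetaInfo();
    }

    /** Hypothetical extractor contract: binary document in, extraction result out. */
    interface TextExtractor {
        ExtractionResult extractText(InputStream in, String encoding) throws IOException;
    }

    /** Immutable value object implementing the hypothetical result type. */
    static final class SimpleResult implements ExtractionResult {
        private final String content;
        private final Map<String, String> meta;

        SimpleResult(String content, Map<String, String> meta) {
            this.content = content;
            this.meta = Collections.unmodifiableMap(new LinkedHashMap<>(meta));
        }

        @Override public String getContent() { return content; }
        @Override public Map<String, String> getMetaInfo() { return meta; }
    }

    /** Naive HTML extractor: strips tags with a regex (assumption: well-formed, simple markup). */
    static final class NaiveHtmlExtractor implements TextExtractor {
        private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        private static final Pattern TAGS = Pattern.compile("<[^>]+>");

        @Override
        public ExtractionResult extractText(InputStream in, String encoding) throws IOException {
            // Read the whole document into memory (requires Java 9+ for readAllBytes).
            String html = new String(in.readAllBytes(), encoding);

            // Collect optional meta information: here only the document title, if present.
            Map<String, String> meta = new LinkedHashMap<>();
            Matcher m = TITLE.matcher(html);
            if (m.find()) {
                meta.put("title", m.group(1).trim());
            }

            // Replace every tag with a space, then collapse runs of whitespace.
            String text = TAGS.matcher(html).replaceAll(" ").replaceAll("\\s+", " ").trim();
            return new SimpleResult(text, meta);
        }
    }

    /** Convenience wrapper for in-memory strings; wraps the checked IOException. */
    static ExtractionResult extractHtml(String html) {
        try (InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))) {
            return new NaiveHtmlExtractor().extractText(in, "UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ExtractionResult result = extractHtml(
            "<html><head><title>Hello</title></head><body><p>Plain <b>text</b></p></body></html>");
        System.out.println(result.getContent());
        System.out.println(result.getMetaInfo());
    }
}
```

Separating the result type from the extractor lets an indexer treat every format the same way: it only needs the plain text and whatever meta information a given extractor was able to find.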