Package org.opencms.search.extractors
Class CmsExtractorMsOfficeOOXML
java.lang.Object
org.opencms.search.extractors.A_CmsTextExtractor
org.opencms.search.extractors.CmsExtractorMsOfficeOOXML
- All Implemented Interfaces:
I_CmsTextExtractor
Extracts text data from a VFS resource that is an OOXML MS Office document.
Supported formats are MS Word (.docx), MS PowerPoint (.pptx) and MS Excel (.xlsx).
The OLE 2 format was introduced in Microsoft Office version 97 and remained the default format until Office version 2007, which introduced the new XML-based OOXML format.
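To illustrate what "XML-based" means here: an OOXML file is a ZIP archive of XML parts, and the body text of a .docx sits in the part named word/document.xml. The following is a minimal sketch using only JDK classes. It is not the implementation used by this class; the names OoxmlTextSketch, extractDocxText and sampleDocx are hypothetical and exist only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch, not part of OpenCms: reads body text from a .docx stream. */
public class OoxmlTextSketch {

    /** Namespace of WordprocessingML elements such as w:p (paragraph) and w:t (text run). */
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Collects the text of all w:t elements, one line per w:p paragraph. */
    public static String extractDocxText(InputStream in) throws IOException, XMLStreamException {
        StringBuilder text = new StringBuilder();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTDs needed here
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                // Buffer the part so the XML reader never touches the zip stream itself.
                byte[] xml = zip.readAllBytes();
                XMLStreamReader reader =
                    factory.createXMLStreamReader(new ByteArrayInputStream(xml));
                boolean inText = false;
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        inText = isWordElement(reader, "t");
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        if (isWordElement(reader, "p")) {
                            text.append('\n'); // paragraph boundary
                        }
                        inText = false;
                    } else if (event == XMLStreamConstants.CHARACTERS && inText) {
                        text.append(reader.getText());
                    }
                }
                reader.close();
            }
        }
        return text.toString().trim();
    }

    private static boolean isWordElement(XMLStreamReader reader, String localName) {
        return W_NS.equals(reader.getNamespaceURI()) && localName.equals(reader.getLocalName());
    }

    /** Builds a tiny in-memory archive holding only word/document.xml, for demonstration. */
    public static byte[] sampleDocx() throws IOException {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
            + "<w:p><w:r><w:t xml:space=\"preserve\">Hello </w:t></w:r>"
            + "<w:r><w:t>world</w:t></w:r></w:p>"
            + "<w:p><w:r><w:t>Second</w:t></w:r></w:p>"
            + "</w:body></w:document>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractDocxText(new ByteArrayInputStream(sampleDocx())));
    }
}
```

The sample archive is deliberately incomplete (a real .docx also carries content-type and relationship parts), and .pptx and .xlsx files keep their text in differently named parts, so this sketch covers the Word case only.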
- Since:
- 8.0.1
Method Summary
Modifier and Type / Method / Description
extractText(InputStream in): Extracts the text and meta information from the document on the input stream.
static I_CmsTextExtractor getExtractor(): Returns an instance of this text extractor.
Methods inherited from class org.opencms.search.extractors.A_CmsTextExtractor
combineContentItem, extractText, extractText, extractText, extractText, removeControlChars
Method Details
getExtractor
Returns an instance of this text extractor.
- Returns:
- an instance of this text extractor
extractText
Description copied from interface: I_CmsTextExtractor
Extracts the text and meta information from the document on the input stream.
The encoding of the input stream is either not required (the document type may have one common default encoding) or the extractor is able to determine the encoding from the provided input stream automatically.
Delivers the same result as calling I_CmsTextExtractor.extractText(InputStream, String) when String == null.
- Specified by:
extractText
in interface I_CmsTextExtractor
- Overrides:
extractText
in class A_CmsTextExtractor
- Parameters:
in
- the input stream for the document to extract the text from
- Returns:
- the extracted text and meta information
- Throws:
Exception
- if the text extraction fails
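The contract described for extractText (the one-argument form delivers the same result as the two-argument form with a null encoding) and the static getExtractor accessor can be sketched as follows. All types here are hypothetical stand-ins written for this example: ExtractorContractSketch, TextExtractor and PlainTextExtractor are not OpenCms classes, the result is simplified to a plain String, and the UTF-8 fallback is an assumption of the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the documented contract; none of these types are OpenCms classes. */
public class ExtractorContractSketch {

    /** Stand-in for the extractor interface; returns a plain String for brevity. */
    public interface TextExtractor {

        String extractText(InputStream in, String encoding) throws Exception;

        /** Same result as the two-argument form called with encoding == null. */
        default String extractText(InputStream in) throws Exception {
            return extractText(in, null);
        }
    }

    /** Stateless stand-in extractor, so a single shared instance is enough. */
    public static final class PlainTextExtractor implements TextExtractor {

        private static final PlainTextExtractor INSTANCE = new PlainTextExtractor();

        private PlainTextExtractor() {
            // instances are handed out through getExtractor() only
        }

        /** Returns the shared instance (one possible reading of "an instance"). */
        public static TextExtractor getExtractor() {
            return INSTANCE;
        }

        @Override
        public String extractText(InputStream in, String encoding) throws IOException {
            // Assumption of this sketch: fall back to UTF-8 when no encoding is given.
            Charset charset = (encoding == null)
                ? StandardCharsets.UTF_8
                : Charset.forName(encoding);
            return new String(in.readAllBytes(), charset);
        }
    }

    public static void main(String[] args) throws Exception {
        TextExtractor extractor = PlainTextExtractor.getExtractor();
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        String viaOneArg = extractor.extractText(new ByteArrayInputStream(data));
        String viaTwoArgs = extractor.extractText(new ByteArrayInputStream(data), null);
        System.out.println(viaOneArg.equals(viaTwoArgs));
    }
}
```

Routing the one-argument form through the two-argument form keeps the decoding logic in one place, so the two entry points cannot drift apart.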