Package org.opencms.search.documents
Class CmsDocumentMsOfficeOOXML
java.lang.Object
org.opencms.search.documents.A_CmsVfsDocument
org.opencms.search.documents.CmsDocumentMsOfficeOOXML
- All Implemented Interfaces:
I_CmsDocumentFactory, I_CmsSearchExtractor
Lucene document factory class to extract text data from a VFS resource that is an OOXML MS Office document.
Supported formats are MS Word (.docx), MS PowerPoint (.pptx) and MS Excel (.xlsx).
The OLE 2 format was introduced in Microsoft Office version 97 and remained the default format until Office version 2007 introduced the new XML-based OOXML format.
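The difference between the two container formats is visible in the first bytes of a file: OLE 2 compound files start with a fixed 8-byte signature, while OOXML packages are ZIP archives and start with the ZIP local file header signature. The following is a minimal sketch of that check. `OfficeContainerSniffer` and its `sniff` method are hypothetical names used for illustration only and are not part of OpenCms; nothing here is meant to describe how OpenCms selects a document factory.

```java
import java.util.Arrays;

/** Hypothetical helper (not an OpenCms class): classifies a file by its leading bytes. */
final class OfficeContainerSniffer {

    enum Container { OLE2, ZIP_BASED, UNKNOWN }

    // Signature of an OLE 2 compound file (legacy .doc, .ppt, .xls).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\3\4"; OOXML packages are ZIP archives.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Returns the container type suggested by the first bytes of the content. */
    static Container sniff(byte[] content) {
        if (startsWith(content, OLE2_MAGIC)) {
            return Container.OLE2;
        }
        if (startsWith(content, ZIP_MAGIC)) {
            // Assumption: a ZIP signature only suggests OOXML; other ZIP-based
            // formats start with the same bytes.
            return Container.ZIP_BASED;
        }
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data != null
            && data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A caller would pass the first bytes of the file content to `OfficeContainerSniffer.sniff(...)`. A `ZIP_BASED` result still needs a check of the package contents before the file can be treated as .docx, .pptx or .xlsx.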
- Since:
- 8.0.1
-
Field Summary
Fields inherited from class org.opencms.search.documents.A_CmsVfsDocument
DEFAULT_ALL_TYPES, DEFAULT_ALL_UNCONFIGURED_TYPES, m_name
-
Constructor Summary
CmsDocumentMsOfficeOOXML(String name)
- Creates a new instance of this Lucene document factory.
-
Method Summary
I_CmsExtractionResult extractContent(CmsObject cms, CmsResource resource, I_CmsSearchIndex index)
- Returns the raw text content of a given VFS resource containing OOXML MS Office data.
boolean isLocaleDependend()
- Returns true if this document factory is locale dependent.
boolean isUsingCache()
- Returns true if result caching is supported for this factory.
Methods inherited from class org.opencms.search.documents.A_CmsVfsDocument
createDocument, getCache, getDocumentKey, getDocumentKeys, getName, logContentExtraction, readFile, setCache
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.opencms.search.documents.I_CmsDocumentFactory
isOnlyDependentOnContent
-
Constructor Details
-
CmsDocumentMsOfficeOOXML
public CmsDocumentMsOfficeOOXML(String name)
Creates a new instance of this Lucene document factory.
- Parameters:
name
- name of the document type
-
-
Method Details
-
extractContent
public I_CmsExtractionResult extractContent(CmsObject cms, CmsResource resource, I_CmsSearchIndex index) throws CmsIndexException, CmsException
Returns the raw text content of a given VFS resource containing OOXML MS Office data.
- Parameters:
cms
- the cms object
resource
- the resource to extract the content from
index
- the index to extract the content for
- Returns:
- the extracted content of the resource
- Throws:
CmsException
- if something goes wrong
CmsIndexException
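To illustrate what "raw text content" means for an OOXML file, here is a minimal sketch that pulls the paragraph text out of a .docx package using only the JDK. It relies on two facts about the format: the package is a ZIP archive, and the main document body of a Word file is stored in the part `word/document.xml`, with text in `w:t` elements. `DocxTextSketch` is a hypothetical class, not part of OpenCms, and this is not how `extractContent` is implemented; this page does not show the extraction mechanism that OpenCms uses. The sketch covers Word documents only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch (not an OpenCms class): plain text from a .docx byte array. */
final class DocxTextSketch {

    // WordprocessingML main namespace, bound to the "w" prefix in document.xml.
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns one line per paragraph, or an empty string if the main part is missing. */
    static String extractText(byte[] docxBytes) throws Exception {
        byte[] xml = readEntry(docxBytes, "word/document.xml");
        if (xml == null) {
            return "";
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations, since the input may be untrusted.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder text = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            // Simplification: paragraphs nested inside other paragraphs
            // (for example in text boxes) would be visited twice.
            NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                text.append(runs.item(j).getTextContent());
            }
            text.append('\n');
        }
        return text.toString().trim();
    }

    /** Reads a single named entry from an in-memory ZIP archive, or returns null. */
    private static byte[] readEntry(byte[] zipBytes, String name) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (e.getName().equals(name)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[8192];
                    for (int n = zip.read(buffer); n != -1; n = zip.read(buffer)) {
                        out.write(buffer, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }
}
```

A production extractor also has to handle headers, footers, footnotes, spreadsheets and slides, which is why this kind of work is normally delegated to a dedicated parsing library rather than written by hand.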
-
isLocaleDependend
public boolean isLocaleDependend()
Description copied from interface: I_CmsDocumentFactory
Returns true if this document factory is locale dependent.
- Returns:
- true if this document factory is locale dependent
-
isUsingCache
public boolean isUsingCache()
Description copied from interface: I_CmsDocumentFactory
Returns true if result caching is supported for this factory.
- Returns:
- true if result caching is supported for this factory
-
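Result caching matters here because text extraction from Office files is comparatively expensive. The sketch below shows the general pattern of caching an extraction result under a document key and bypassing the cache when caching is not supported. `ExtractionCacheSketch`, `getOrExtract` and the key format are hypothetical and chosen for illustration. The inherited `getCache`, `setCache` and `getDocumentKey` methods suggest OpenCms has its own cache mechanism, but its behavior is not documented on this page and is not reproduced here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical sketch (not an OpenCms class): caches extracted text per document key. */
final class ExtractionCacheSketch {

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final boolean usingCache;

    ExtractionCacheSketch(boolean usingCache) {
        this.usingCache = usingCache;
    }

    /**
     * Returns the cached text for the key, running the extractor only on a cache miss.
     * When caching is disabled, the extractor runs on every call.
     * The extractor must not return null.
     */
    String getOrExtract(String documentKey, Supplier<String> extractor) {
        if (!usingCache) {
            return extractor.get();
        }
        return cache.computeIfAbsent(documentKey, key -> extractor.get());
    }
}
```

In practice a key like this has to change whenever the underlying resource changes, for example by including a modification date, or stale text would be returned for an updated file.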