Class A_CmsVfsDocument
- All Implemented Interfaces: I_CmsDocumentFactory, I_CmsSearchExtractor
- Direct Known Subclasses: CmsDocumentContainerPage, CmsDocumentGeneric, CmsDocumentHtml, CmsDocumentMsOfficeOLE2, CmsDocumentMsOfficeOOXML, CmsDocumentOpenOffice, CmsDocumentPdf, CmsDocumentPlainText, CmsDocumentRtf, CmsDocumentXmlContent, CmsDocumentXmlPage, CmsSolrDocumentXmlContent
Base document factory class for a VFS CmsResource. A subclass just requires a specialized implementation of I_CmsSearchExtractor.extractContent(CmsObject, CmsResource, I_CmsSearchIndex) for text extraction from the binary document content.
- Since: 6.0.0
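The division of labor described above (the base class assembles the document, the subclass only extracts text) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the OpenCms implementation: `BaseFactorySketch`, `PlainTextFactorySketch`, the `byte[]` input and the field map standing in for a search document are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical base factory: owns document assembly, delegates text extraction. */
abstract class BaseFactorySketch {

    private final String name;

    BaseFactorySketch(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    /** The only method a concrete factory has to supply in this sketch. */
    abstract String extractContent(byte[] rawContent);

    /** Builds a simple field map that stands in for a search document. */
    Map<String, String> createDocument(String path, byte[] rawContent) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("path", path);
        doc.put("type", name);
        doc.put("content", extractContent(rawContent));
        return doc;
    }
}

/** Hypothetical subclass for plain text: extraction is just UTF-8 decoding. */
class PlainTextFactorySketch extends BaseFactorySketch {

    PlainTextFactorySketch() {
        super("plain");
    }

    @Override
    String extractContent(byte[] rawContent) {
        return new String(rawContent, StandardCharsets.UTF_8);
    }
}
```

In this sketch a new document format only needs another subclass with its own `extractContent`; the assembly logic in the base class stays untouched.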
-
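Field Summary
- DEFAULT_ALL_UNCONFIGURED_TYPES: Generic type name used as default for all types that are globally unconfigured.
- DEFAULT_ALL_TYPES: Generic type name used as default for all types.
- m_name: Name of the document type.
Constructor Summary
- A_CmsVfsDocument(String name): Creates a new instance of this Lucene document factory.
-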
Method Summary
- I_CmsSearchDocument createDocument(CmsObject cms, CmsResource resource, I_CmsSearchIndex index): Creates the Lucene Document for the given VFS resource and the given search index.
- getCache(): Returns the disk based cache used to store the raw extraction results.
- static String getDocumentKey(String type, String mimeType): Creates a document factory lookup key for the given resource type name / MIME type configuration.
- List<String> getDocumentKeys(List<String> resourceTypes, List<String> mimeTypes): Returns the list of accepted keys for the resource types that can be indexed using this document factory.
- getName(): Returns the name of this document type factory.
- protected void logContentExtraction(CmsResource resource, I_CmsSearchIndex index): Logs content extraction for the specified resource and index.
- protected CmsFile readFile(CmsObject cms, CmsResource resource): Upgrades the given resource to a CmsFile with content.
- void setCache(CmsExtractionResultCache cache): Sets the disk based cache used to store the raw extraction results.
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.opencms.search.documents.I_CmsDocumentFactory: isLocaleDependend, isOnlyDependentOnContent, isUsingCache
Methods inherited from interface org.opencms.search.documents.I_CmsSearchExtractor: extractContent
-
Field Details
-
DEFAULT_ALL_UNCONFIGURED_TYPES
Generic type name used as default for all types that are globally unconfigured. Note that any special XML content is already configured if xmlcontent is configured.
-
DEFAULT_ALL_TYPES
Generic type name used as default for all types.
-
m_name
Name of the document type.
-
-
Constructor Details
-
A_CmsVfsDocument
Creates a new instance of this Lucene document factory.
- Parameters:
name - name of the document type
-
-
Method Details
-
getDocumentKey
Creates a document factory lookup key for the given resource type name / MIME type configuration.
If the given mimeType is null, this indicates that the key should match all VFS resources of the given resource type regardless of the MIME type.
- Parameters:
type - the resource type name to use
mimeType - the MIME type to use
- Returns: a document factory lookup key for the given resource type name / MIME type configuration
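As an illustration of the null MIME type rule, here is a minimal sketch of a key scheme and a matching lookup. The key format (`type:mimeType`), the fallback order in `findFactory` and the class name `KeySketch` are assumptions made for this example; the format actually produced by getDocumentKey is not specified on this page.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a document factory key scheme; not the OpenCms code. */
class KeySketch {

    /** Builds a lookup key; a null mimeType yields a type-only key (assumed format). */
    static String documentKey(String type, String mimeType) {
        return (mimeType == null) ? type : type + ":" + mimeType;
    }

    /** Tries the exact type/MIME key first, then falls back to the type-only key. */
    static String findFactory(Map<String, String> factories, String type, String mimeType) {
        String exact = factories.get(documentKey(type, mimeType));
        return (exact != null) ? exact : factories.get(documentKey(type, null));
    }

    /** Registers two hypothetical factories: one MIME-specific, one for a whole type. */
    static Map<String, String> sampleRegistry() {
        Map<String, String> factories = new HashMap<>();
        factories.put(documentKey("binary", "application/pdf"), "pdf");
        factories.put(documentKey("plain", null), "text");
        return factories;
    }
}
```

With the sample registry, a `plain` resource resolves to the `text` factory whatever its MIME type, while a `binary` resource only matches when its MIME type is `application/pdf`.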
-
createDocument
public I_CmsSearchDocument createDocument(CmsObject cms, CmsResource resource, I_CmsSearchIndex index) throws CmsException
Creates the Lucene Document for the given VFS resource and the given search index.
This triggers the indexing process for the given VFS resource according to the configuration of the provided index.
The provided index resource contains the basic contents to index. The provided search index contains the configuration of what to index, such as the locale and possible special field mappings.
- Specified by: createDocument in interface I_CmsDocumentFactory
- Parameters:
cms - the OpenCms user context used to access the OpenCms VFS
resource - the search index resource to create the Lucene document from
index - the search index to create the Document for
- Returns: the Search Document for the given index resource and the given search index
- Throws:
CmsException - if something goes wrong
-
getCache
Description copied from interface: I_CmsDocumentFactory
Returns the disk based cache used to store the raw extraction results.
If null is returned, result caching is not supported for this factory.
- Specified by: getCache in interface I_CmsDocumentFactory
- Returns: the disk based cache used to store the raw extraction results
-
getDocumentKeys
public List<String> getDocumentKeys(List<String> resourceTypes, List<String> mimeTypes) throws CmsException
Description copied from interface: I_CmsDocumentFactory
Returns the list of accepted keys for the resource types that can be indexed using this document factory.
The result List contains String objects. Each String is later matched against getDocumentKey(String, String) to find the corresponding I_CmsDocumentFactory for a resource to index.
The list of accepted resource types may contain a catch-all entry "*"; in this case, a list for all possible resource types is returned, computed by logic that depends on the document handler class.
- Specified by: getDocumentKeys in interface I_CmsDocumentFactory
- Parameters:
resourceTypes - list of accepted resource types
mimeTypes - list of accepted MIME types
- Returns: the list of accepted keys for the resource types that can be indexed using this document factory (String objects)
- Throws:
CmsException - if something goes wrong
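The expansion of resource types and MIME types into keys, including the "*" catch-all, might look roughly like the sketch below. Everything here is hypothetical: the `allKnownTypes` parameter stands in for whatever logic the document handler class uses to enumerate resource types, the `type:mimeType` key format is assumed, and treating an empty MIME type list as "type-only keys" is a simplification chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of key expansion; not the OpenCms implementation. */
class DocumentKeysSketch {

    /** Assumed key format: type only when mimeType is null, else type:mimeType. */
    static String documentKey(String type, String mimeType) {
        return (mimeType == null) ? type : type + ":" + mimeType;
    }

    /**
     * Expands the accepted resource types and MIME types into lookup keys.
     * A "*" entry replaces the given types with allKnownTypes (an assumption).
     */
    static List<String> documentKeys(
        List<String> resourceTypes,
        List<String> mimeTypes,
        List<String> allKnownTypes) {

        List<String> types = resourceTypes.contains("*") ? allKnownTypes : resourceTypes;
        List<String> keys = new ArrayList<>();
        for (String type : types) {
            if (mimeTypes.isEmpty()) {
                // no MIME restriction: one key that matches the whole resource type
                keys.add(documentKey(type, null));
            } else {
                for (String mimeType : mimeTypes) {
                    keys.add(documentKey(type, mimeType));
                }
            }
        }
        return keys;
    }
}
```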
-
getName
Description copied from interface: I_CmsDocumentFactory
Returns the name of this document type factory.
- Specified by: getName in interface I_CmsDocumentFactory
- Returns: the name of this document type factory
-
setCache
Description copied from interface: I_CmsDocumentFactory
Sets the disk based cache used to store the raw extraction results.
This should only be used for factories where I_CmsDocumentFactory.isUsingCache() returns true.
- Specified by: setCache in interface I_CmsDocumentFactory
- Parameters:
cache - the disk based cache used to store the raw extraction results
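The contract between isUsingCache(), setCache and getCache can be sketched as below. This is an illustration under stated assumptions, not the OpenCms code: an in-memory `Map` stands in for the disk based CmsExtractionResultCache, and `CachingFactorySketch`, `configure` and `extract` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical factory showing the cache contract; a Map stands in for the disk cache. */
class CachingFactorySketch {

    private final boolean usingCache;
    private Map<String, String> cache; // null means result caching is not available
    int extractions = 0;               // counts real extraction runs, for illustration

    CachingFactorySketch(boolean usingCache) {
        this.usingCache = usingCache;
    }

    boolean isUsingCache() {
        return usingCache;
    }

    Map<String, String> getCache() {
        return cache;
    }

    void setCache(Map<String, String> cache) {
        this.cache = cache;
    }

    /** Returns the cached text when present, otherwise extracts and stores it. */
    String extract(String path) {
        if (cache != null && cache.containsKey(path)) {
            return cache.get(path);
        }
        extractions++;
        String result = "text of " + path; // placeholder for real extraction work
        if (cache != null) {
            cache.put(path, result);
        }
        return result;
    }

    /** Caller-side guard: only hand a cache to factories that declare they use one. */
    static void configure(CachingFactorySketch factory, Map<String, String> sharedCache) {
        if (factory.isUsingCache()) {
            factory.setCache(sharedCache);
        }
    }
}
```

A factory that reports `isUsingCache() == false` never receives a cache in this sketch, so `getCache()` stays null and every call to `extract` does the work again.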
-
logContentExtraction
Logs content extraction for the specified resource and index.
- Parameters:
resource - the resource to log content extraction for
index - the search index to log content extraction for
-
readFile
protected CmsFile readFile(CmsObject cms, CmsResource resource) throws CmsException, CmsIndexNoContentException
Upgrades the given resource to a CmsFile with content.
- Parameters:
cms - the current user's OpenCms context
resource - the resource to upgrade
- Returns: the given resource upgraded to a CmsFile with content
- Throws:
CmsException - if the resource could not be read
CmsIndexNoContentException - if the resource has no content
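The two failure modes of readFile (resource cannot be read, resource has no content) can be illustrated with the following sketch. All names are hypothetical stand-ins: an in-memory map replaces the VFS, `IOException` replaces CmsException, `NoContentExceptionSketch` replaces CmsIndexNoContentException, and treating zero-length content as "no content" is an assumption.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical counterpart of a "no content to index" exception. */
class NoContentExceptionSketch extends Exception {

    NoContentExceptionSketch(String message) {
        super(message);
    }
}

/** Hypothetical in-memory store used to illustrate the two failure modes. */
class ReadFileSketch {

    private final Map<String, byte[]> storage = new HashMap<>();

    void store(String path, byte[] content) {
        storage.put(path, content);
    }

    /** Returns the content for the path, failing differently for missing and empty files. */
    byte[] readFile(String path) throws IOException, NoContentExceptionSketch {
        byte[] content = storage.get(path);
        if (content == null) {
            // the resource could not be read at all
            throw new IOException("Unable to read " + path);
        }
        if (content.length == 0) {
            // the resource exists but there is nothing to index (assumed criterion)
            throw new NoContentExceptionSketch("No content to index for " + path);
        }
        return content;
    }
}
```

Keeping the two cases apart lets an indexer skip empty resources quietly while still reporting real read failures.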
-