Class A_CmsVfsDocument
- All Implemented Interfaces:
I_CmsDocumentFactory,I_CmsSearchExtractor
- Direct Known Subclasses:
CmsDocumentContainerPage,CmsDocumentGeneric,CmsDocumentHtml,CmsDocumentMsOfficeOLE2,CmsDocumentMsOfficeOOXML,CmsDocumentOpenOffice,CmsDocumentPdf,CmsDocumentPlainText,CmsDocumentRtf,CmsDocumentXmlContent,CmsDocumentXmlPage,CmsSolrDocumentXmlContent
Base document factory class for a VFS CmsResource. A subclass just requires a specialized implementation of I_CmsSearchExtractor.extractContent(CmsObject, CmsResource, I_CmsSearchIndex) for text extraction from the binary document content.
- Since:
- 6.0.0
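The division of labour described above (shared document creation in the base class, text extraction in the subclass) can be sketched as follows. This is a minimal, self-contained illustration and not the OpenCms implementation: `FactorySketch`, `Base` and `PlainText` are hypothetical names, and the byte array and String types stand in for the real CmsResource and extraction result types.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical, simplified illustration of the base factory idea. */
final class FactorySketch {

    /** Stand-in for the abstract base factory: it owns the shared workflow. */
    abstract static class Base {
        private final String name;

        protected Base(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }

        /** The only method a concrete factory has to supply. */
        protected abstract String extractContent(byte[] rawContent);

        /** Shared workflow: extract the text, then assemble a toy "document". */
        public String createDocument(byte[] rawContent) {
            return name + ":" + extractContent(rawContent);
        }
    }

    /** Example subclass that treats the binary content as UTF-8 plain text. */
    static final class PlainText extends Base {
        PlainText() {
            super("plain");
        }

        @Override
        protected String extractContent(byte[] rawContent) {
            return new String(rawContent, StandardCharsets.UTF_8).trim();
        }
    }
}
```

A new document type would then only add another subclass with its own extraction logic, leaving the document creation step untouched.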
-
Field Summary
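Fields
- DEFAULT_ALL_UNCONFIGURED_TYPES: Generic type name used as default for all types that are globally unconfigured.
- DEFAULT_ALL_TYPES: Generic type name used as default for all types.
- m_name: Name of the document type.
-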
Constructor Summary
Constructors
- A_CmsVfsDocument(String name): Creates a new instance of this Lucene document factory.
-
Method Summary
Methods
- I_CmsSearchDocument createDocument(CmsObject cms, CmsResource resource, I_CmsSearchIndex index): Creates the Lucene Document for the given VFS resource and the given search index.
- getCache(): Returns the disk based cache used to store the raw extraction results.
- static String getDocumentKey(String type, String mimeType): Creates a document factory lookup key for the given resource type name / MIME type configuration.
- List<String> getDocumentKeys(List<String> resourceTypes, List<String> mimeTypes): Returns the list of accepted keys for the resource types that can be indexed using this document factory.
- getName(): Returns the name of this document type factory.
- protected void logContentExtraction(CmsResource resource, I_CmsSearchIndex index): Logs content extraction for the specified resource and index.
- protected CmsFile readFile(CmsObject cms, CmsResource resource): Upgrades the given resource to a CmsFile with content.
- void setCache(CmsExtractionResultCache cache): Sets the disk based cache used to store the raw extraction results.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.opencms.search.documents.I_CmsDocumentFactory
isLocaleDependend, isOnlyDependentOnContent, isUsingCache
Methods inherited from interface org.opencms.search.documents.I_CmsSearchExtractor
extractContent
-
Field Details
-
DEFAULT_ALL_UNCONFIGURED_TYPES
Generic type name used as default for all types that are globally unconfigured. Note that any special XML content is already configured if xmlcontent is configured.
-
DEFAULT_ALL_TYPES
Generic type name used as default for all types.
-
m_name
Name of the document type.
-
-
Constructor Details
-
A_CmsVfsDocument
Creates a new instance of this Lucene document factory.
- Parameters:
name - name of the document type
-
-
Method Details
-
getDocumentKey
Creates a document factory lookup key for the given resource type name / MIME type configuration.
If the given mimeType is null, this indicates that the key should match all VFS resources of the given resource type regardless of the MIME type.
- Parameters:
type - the resource type name to use
mimeType - the MIME type to use
- Returns:
- a document factory lookup key for the given resource type name / MIME type configuration
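The null handling described above can be illustrated with a small standalone sketch. The key layout used here (a "VFS" prefix, an underscore, and a colon before the MIME type) is an assumption made for illustration only; this reference does not state the actual key format, and `DocumentKeySketch` is a hypothetical class.

```java
/** Hypothetical sketch of a type / MIME type lookup key; the layout is assumed. */
final class DocumentKeySketch {

    // Assumed prefix, chosen for illustration; not taken from this reference.
    static final String PREFIX = "VFS";

    /** Builds a key; a null MIME type yields a key for the resource type alone. */
    static String documentKey(String type, String mimeType) {
        StringBuilder key = new StringBuilder(PREFIX).append('_').append(type);
        if (mimeType != null) {
            // Only a configured MIME type narrows the key further.
            key.append(':').append(mimeType);
        }
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(documentKey("binary", null));              // prints VFS_binary
        System.out.println(documentKey("binary", "application/pdf")); // prints VFS_binary:application/pdf
    }
}
```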
-
createDocument
public I_CmsSearchDocument createDocument(CmsObject cms, CmsResource resource, I_CmsSearchIndex index) throws CmsException
Creates the Lucene Document for the given VFS resource and the given search index.
This triggers the indexing process for the given VFS resource according to the configuration of the provided index.
The provided index resource contains the basic contents to index. The provided search index contains the configuration of what to index, such as the locale and possible special field mappings.
- Specified by:
createDocument in interface I_CmsDocumentFactory
- Parameters:
cms - the OpenCms user context used to access the OpenCms VFS
resource - the search index resource to create the Lucene document from
index - the search index to create the Document for
- Returns:
- the Search Document for the given index resource and the given search index
- Throws:
CmsException - if something goes wrong
-
getCache
Description copied from interface: I_CmsDocumentFactory
Returns the disk based cache used to store the raw extraction results.
If null is returned, result caching is not supported for this factory.
- Specified by:
getCache in interface I_CmsDocumentFactory
- Returns:
- the disk based cache used to store the raw extraction results
-
getDocumentKeys
public List<String> getDocumentKeys(List<String> resourceTypes, List<String> mimeTypes) throws CmsException
Description copied from interface: I_CmsDocumentFactory
Returns the list of accepted keys for the resource types that can be indexed using this document factory.
The result List contains String objects. Each String is later matched against getDocumentKey(String, String) to find the corresponding I_CmsDocumentFactory for a resource to index.
The list of accepted resource types may contain a catch-all entry "*"; in this case, a list for all possible resource types is returned, calculated by a logic depending on the document handler class.
- Specified by:
getDocumentKeys in interface I_CmsDocumentFactory
- Parameters:
resourceTypes - list of accepted resource types
mimeTypes - list of accepted MIME types
- Returns:
- the list of accepted keys for the resource types that can be indexed using this document factory (String objects)
- Throws:
CmsException - if something goes wrong
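One plausible way to derive such a key list is to combine every accepted resource type with every accepted MIME type. The sketch below assumes exactly that; it is not taken from the OpenCms source. The key layout, the `allKnownTypes` parameter used to expand the "*" entry, and the class name `DocumentKeysSketch` are all hypothetical, since the real catch-all logic depends on the document handler class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: cross resource types with MIME types to build keys. */
final class DocumentKeysSketch {

    /** Assumed key layout, for illustration only. */
    static String documentKey(String type, String mimeType) {
        return mimeType == null ? "VFS_" + type : "VFS_" + type + ":" + mimeType;
    }

    /**
     * Returns one key per (type, MIME type) pair. If no MIME types are given,
     * one key per type is returned that matches regardless of MIME type.
     * The catch-all "*" is expanded to allKnownTypes (an assumption).
     */
    static List<String> documentKeys(
        List<String> resourceTypes,
        List<String> mimeTypes,
        List<String> allKnownTypes) {

        List<String> types = resourceTypes.contains("*") ? allKnownTypes : resourceTypes;
        List<String> keys = new ArrayList<>();
        for (String type : types) {
            if (mimeTypes.isEmpty()) {
                keys.add(documentKey(type, null));
            } else {
                for (String mimeType : mimeTypes) {
                    keys.add(documentKey(type, mimeType));
                }
            }
        }
        return keys;
    }
}
```

A lookup for a resource to index would then build the key of that resource with the same key function and search for it in the returned list.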
-
getName
Description copied from interface: I_CmsDocumentFactory
Returns the name of this document type factory.
- Specified by:
getName in interface I_CmsDocumentFactory
- Returns:
- the name of this document type factory
-
setCache
Description copied from interface: I_CmsDocumentFactory
Sets the disk based cache used to store the raw extraction results.
This should only be used for factories where I_CmsDocumentFactory.isUsingCache() returns true.
- Specified by:
setCache in interface I_CmsDocumentFactory
- Parameters:
cache - the disk based cache used to store the raw extraction results
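The rule that the cache should only be set on factories that use it can be sketched with a small guard. The types below are hypothetical stand-ins that mirror only the three cache-related methods named in this reference; they are not the OpenCms interfaces themselves.

```java
/** Hypothetical sketch of wiring a shared cache into document factories. */
final class CacheWiringSketch {

    /** Stand-in for the extraction result cache. */
    static final class ResultCache {}

    /** Reduced, assumed view of a document factory. */
    interface Factory {
        boolean isUsingCache();
        void setCache(ResultCache cache);
        ResultCache getCache();
    }

    /** Minimal factory whose cache usage is fixed at construction time. */
    static final class SimpleFactory implements Factory {
        private final boolean usingCache;
        private ResultCache cache;

        SimpleFactory(boolean usingCache) {
            this.usingCache = usingCache;
        }

        public boolean isUsingCache() {
            return usingCache;
        }

        public void setCache(ResultCache cache) {
            this.cache = cache;
        }

        public ResultCache getCache() {
            return cache;
        }
    }

    /** Assigns the cache only to factories that report cache usage. */
    static void wireCache(Factory factory, ResultCache cache) {
        if (factory.isUsingCache()) {
            factory.setCache(cache);
        }
    }
}
```

With this guard, a factory that does not use caching keeps returning null from getCache(), which matches the documented meaning of a null cache.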
-
logContentExtraction
Logs content extraction for the specified resource and index.
- Parameters:
resource - the resource to log content extraction for
index - the search index to log content extraction for
-
readFile
protected CmsFile readFile(CmsObject cms, CmsResource resource) throws CmsException, CmsIndexNoContentException
Upgrades the given resource to a CmsFile with content.
- Parameters:
cms - the current user's OpenCms context
resource - the resource to upgrade
- Returns:
- the given resource upgraded to a CmsFile with content
- Throws:
CmsException - if the resource could not be read
CmsIndexNoContentException - if the resource has no content
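The contract above (return the resource together with its content, or signal that there is no content) can be sketched without any OpenCms dependency. Everything in this example is a hypothetical stand-in: `ContentStore` replaces the VFS access through the user context, and treating null or empty content as "no content" is an assumption, since this reference does not define that condition.

```java
/** Hypothetical sketch of upgrading a resource to a file with content. */
final class ReadFileSketch {

    /** Stand-in for the "no content" exception. */
    static final class NoContentException extends Exception {
        NoContentException(String message) {
            super(message);
        }
    }

    /** Stand-in for a resource that carries only its path. */
    static final class Resource {
        final String path;

        Resource(String path) {
            this.path = path;
        }
    }

    /** Stand-in for a file: the resource path plus its binary content. */
    static final class FileWithContent {
        final String path;
        final byte[] contents;

        FileWithContent(String path, byte[] contents) {
            this.path = path;
            this.contents = contents;
        }
    }

    /** Assumed content source, replacing the VFS access. */
    interface ContentStore {
        byte[] read(String path);
    }

    /** Reads the content and fails if nothing is available to index. */
    static FileWithContent readFile(ContentStore store, Resource resource) throws NoContentException {
        byte[] contents = store.read(resource.path);
        if (contents == null || contents.length == 0) {
            // Assumption: missing and empty content are both "no content".
            throw new NoContentException("No content for " + resource.path);
        }
        return new FileWithContent(resource.path, contents);
    }
}
```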
-