Package org.opencms.search.extractors
Class CmsExtractionResult
java.lang.Object
org.opencms.search.extractors.CmsExtractionResult
- All Implemented Interfaces:
Serializable, I_CmsExtractionResult
The result of a document text extraction.
This data structure contains the extracted text as well as (optional) meta information extracted from the document.
- Since:
- 6.0.0
-
Field Summary
Fields inherited from interface org.opencms.search.extractors.I_CmsExtractionResult
ITEM_AUTHOR, ITEM_CATEGORY, ITEM_COMMENTS, ITEM_COMPANY, ITEM_CONTENT, ITEM_CREATOR, ITEM_KEYWORDS, ITEM_MANAGER, ITEM_PRODUCER, ITEM_RAW, ITEM_SUBJECT, ITEM_TITLE, ITEMS_TO_MERGE
-
Constructor Summary
- CmsExtractionResult(String content): Creates a new extraction result without meta information and without additional fields.
- CmsExtractionResult(String content, LinkedHashMap<String, String> contentItems): Creates a new unilingual extraction result.
- CmsExtractionResult(String content, LinkedHashMap<String, String> contentItems, Map<String, String> fieldMappings): Creates a new unilingual extraction result.
- CmsExtractionResult(Locale defaultLocale, Map<Locale, LinkedHashMap<String, String>> multilingualContentItems, Map<String, String> fieldMappings): Creates a new multilingual extraction result.
-
Method Summary
- static final CmsExtractionResult fromBytes(byte[] bytes): Creates an extraction result from a serialized byte array.
- byte[] getBytes(): Returns this extraction result serialized as a byte array.
- getContent(): Returns the extracted content of the best fitting locale combined as a String.
- getContent(Locale locale): Returns the extracted content for the given locale combined as a String.
- getContentItems(): Returns the extracted content for the best fitting locale as individual items.
- getContentItems(Locale locale): Returns the extracted content for a given locale as individual items.
- getDefaultLocale(): Returns the best fitting locale for the content.
- getFieldMappings(): Returns a map from search fields to the values that should be stored in those fields.
- getLocales(): Returns the locales in which the content is available.
- merge(List<I_CmsExtractionResult> extractionResults): Appends, for the locales of the current collection result, the content fields from all provided extraction results to the current extraction result.
- void release(): Releases the information stored in this extraction result, to free up the memory used.
-
Constructor Details
-
CmsExtractionResult
public CmsExtractionResult(Locale defaultLocale, Map<Locale, LinkedHashMap<String, String>> multilingualContentItems, Map<String, String> fieldMappings)
Creates a new multilingual extraction result.
- Parameters:
defaultLocale - the default (best fitting) locale of the result
multilingualContentItems - the content items for the different locales
fieldMappings - special mappings to search fields with values extracted from the content
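The multilingual constructor takes one ordered item map per locale. The following minimal sketch builds an argument of that shape using only the JDK; the item names (`Title[1]`, `Text[1]`), the texts, and the field name `my_custom_field` are made-up examples, and the actual constructor call is left as a comment because it assumes OpenCms is on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class MultilingualItemsSketch {

    /** Builds the nested map shape taken by the multilingual constructor (keys and texts are invented). */
    static Map<Locale, LinkedHashMap<String, String>> buildItems() {
        // One ordered item map per locale; LinkedHashMap keeps the insertion order of the items.
        LinkedHashMap<String, String> english = new LinkedHashMap<>();
        english.put("Title[1]", "Welcome");
        english.put("Text[1]", "Hello world");

        LinkedHashMap<String, String> german = new LinkedHashMap<>();
        german.put("Title[1]", "Willkommen");
        german.put("Text[1]", "Hallo Welt");

        Map<Locale, LinkedHashMap<String, String>> items = new LinkedHashMap<>();
        items.put(Locale.ENGLISH, english);
        items.put(Locale.GERMAN, german);
        return items;
    }

    public static void main(String[] args) {
        Map<Locale, LinkedHashMap<String, String>> items = buildItems();

        // Hypothetical field mapping: search field name -> value to index directly.
        Map<String, String> fieldMappings = new LinkedHashMap<>();
        fieldMappings.put("my_custom_field", "some value");

        // With OpenCms on the classpath, the maps would be passed as:
        // new CmsExtractionResult(Locale.ENGLISH, items, fieldMappings);
        System.out.println(items.keySet()); // [en, de]
    }
}
```

Using `LinkedHashMap` for the per-locale items matches the parameter type in the signature and keeps the items in the order they were extracted.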
-
CmsExtractionResult
public CmsExtractionResult(String content)
Creates a new extraction result without meta information and without additional fields.
- Parameters:
content - the extracted content
-
CmsExtractionResult
public CmsExtractionResult(String content, LinkedHashMap<String, String> contentItems)
Creates a new unilingual extraction result.
- Parameters:
content - the extracted content
contentItems - the individual extracted content items
-
CmsExtractionResult
public CmsExtractionResult(String content, LinkedHashMap<String, String> contentItems, Map<String, String> fieldMappings)
Creates a new unilingual extraction result.
- Parameters:
content - the extracted content
contentItems - the individual extracted content items
fieldMappings - extraction results that should be indexed directly
-
-
Method Details
-
fromBytes
public static final CmsExtractionResult fromBytes(byte[] bytes)
Creates an extraction result from a serialized byte array.
- Parameters:
bytes - the serialized version of the extraction result
- Returns:
- the extraction result created from the serialized byte array
-
getBytes
Description copied from interface: I_CmsExtractionResult
Returns this extraction result serialized as a byte array.
- Specified by:
getBytes in interface I_CmsExtractionResult
- Returns:
- this extraction result serialized as a byte array
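Because the class implements `Serializable`, a `getBytes()`/`fromBytes(byte[])` pair can be pictured as a round trip through standard Java object serialization. The sketch below assumes that format; the page does not state it. `SerializableResultSketch` is a hypothetical stand-in written for illustration, not the OpenCms class, and its `"content"` item key is invented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.LinkedHashMap;

/** Hypothetical stand-in for an extraction result; not the OpenCms implementation. */
public class SerializableResultSketch implements Serializable {

    private static final long serialVersionUID = 1L;

    private final LinkedHashMap<String, String> items = new LinkedHashMap<>();

    public SerializableResultSketch(String content) {
        items.put("content", content); // invented item key
    }

    public String getContent() {
        return items.get("content");
    }

    /** Serializes this object with standard Java serialization (assumed format). */
    public byte[] getBytes() {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
            out.flush(); // push buffered object data into the byte stream before reading it
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Restores an object previously produced by getBytes(). */
    public static SerializableResultSketch fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SerializableResultSketch) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SerializableResultSketch original = new SerializableResultSketch("Hello world");
        SerializableResultSketch copy = fromBytes(original.getBytes());
        System.out.println(copy.getContent()); // Hello world
    }
}
```

A byte-array form like this is convenient when extraction results are cached and later restored instead of being extracted again.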
-
getContent
Description copied from interface: I_CmsExtractionResult
Returns the extracted content of the best fitting locale combined as a String.
- Specified by:
getContent in interface I_CmsExtractionResult
- Returns:
- the extracted content of the best fitting locale combined as a String
-
getContent
Description copied from interface: I_CmsExtractionResult
Returns the extracted content for the given locale combined as a String.
- Specified by:
getContent in interface I_CmsExtractionResult
- Parameters:
locale - the locale of the extracted content
- Returns:
- the extracted content for the given locale combined as a String
-
getContentItems
Description copied from interface: I_CmsExtractionResult
Returns the extracted content for the best fitting locale as individual items.
The result Map contains all content items extracted by the extractor. The key is always a String and contains the name of the item. The value is also a String and contains the extracted text.
The detailed form depends on the resource type indexed:
- For an xmlpage, the key is the element name, and the value is the text of the element.
- For an xmlcontent, the key is the XPath of the XML node, and the value is the text of that XML node.
- If the document contains meta information (for example PDF or MS Office documents), the meta information is stored with the name of the meta field as key and the content as value.
- For all other resource types, there is only one key, I_CmsExtractionResult.ITEM_CONTENT, which contains the complete content.
- Specified by:
getContentItems in interface I_CmsExtractionResult
- Returns:
- the extracted content as individual items
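The two map shapes described above (XPath keys for an xmlcontent, a single content key for other resource types) can be illustrated with plain maps. In this sketch `CONTENT_KEY` is a hypothetical placeholder for `I_CmsExtractionResult.ITEM_CONTENT`, whose actual value is not given on this page, and the XPaths and texts are invented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ContentItemsSketch {

    // Hypothetical placeholder for I_CmsExtractionResult.ITEM_CONTENT; the real value is not shown here.
    static final String CONTENT_KEY = "ITEM_CONTENT_PLACEHOLDER";

    /** Renders the items as "name: text" lines in the map's iteration order. */
    static String describe(Map<String, String> items) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : items.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    /** True when the map has the single-key form described for "all other resource types". */
    static boolean isPlainContent(Map<String, String> items) {
        return items.size() == 1 && items.containsKey(CONTENT_KEY);
    }

    public static void main(String[] args) {
        // xmlcontent-style items: keys are XPaths of XML nodes (example paths are invented).
        Map<String, String> xmlContent = new LinkedHashMap<>();
        xmlContent.put("Title[1]", "Welcome");
        xmlContent.put("Paragraph[1]/Text[1]", "Hello world");

        // Items for any other resource type: a single content key holding the complete text.
        Map<String, String> plain = new LinkedHashMap<>();
        plain.put(CONTENT_KEY, "Full text of the document");

        System.out.print(describe(xmlContent));
        System.out.println(isPlainContent(xmlContent)); // false
        System.out.println(isPlainContent(plain));      // true
    }
}
```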
-
getContentItems
Description copied from interface: I_CmsExtractionResult
Returns the extracted content for a given locale as individual items.
- Specified by:
getContentItems in interface I_CmsExtractionResult
- Parameters:
locale - the locale of the extracted content items
- Returns:
- the extracted content for a given locale as individual items
-
getDefaultLocale
Description copied from interface: I_CmsExtractionResult
Returns the best fitting locale for the content.
- Specified by:
getDefaultLocale in interface I_CmsExtractionResult
- Returns:
- the best fitting locale for the content
-
getFieldMappings
Description copied from interface: I_CmsExtractionResult
Returns a map from search fields to the values that should be stored in those fields.
- Specified by:
getFieldMappings in interface I_CmsExtractionResult
- Returns:
- a map from search fields to the values that should be stored in those fields
-
getLocales
Description copied from interface: I_CmsExtractionResult
Returns the locales in which the content is available.
- Specified by:
getLocales in interface I_CmsExtractionResult
- Returns:
- the locales in which the content is available
-
merge
Description copied from interface: I_CmsExtractionResult
Appends, for the locales of the current collection result, the content fields from all provided extraction results to the current extraction result.
- Specified by:
merge in interface I_CmsExtractionResult
- Parameters:
extractionResults - the extraction results to merge
- Returns:
- the merged result
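The merge rule above (only the locales already present in the current result are considered, and content from the other results is appended) can be sketched with plain locale-to-items maps. Several details here are assumptions that this page does not confirm. The key filter `keysToMerge` is a hypothetical parameter, suggested only by the existence of the `ITEMS_TO_MERGE` constant. Values are appended with a single space. A new map is returned rather than modifying the input. `MergeSketch` is not the OpenCms implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class MergeSketch {

    /**
     * Returns a copy of base in which, for each locale of base only, the values of the selected
     * keys from every other result are appended (space separated) to the base values.
     */
    static Map<Locale, LinkedHashMap<String, String>> merge(
            Map<Locale, LinkedHashMap<String, String>> base,
            List<Map<Locale, LinkedHashMap<String, String>>> others,
            Set<String> keysToMerge) {
        Map<Locale, LinkedHashMap<String, String>> result = new LinkedHashMap<>();
        for (Map.Entry<Locale, LinkedHashMap<String, String>> entry : base.entrySet()) {
            Locale locale = entry.getKey();
            LinkedHashMap<String, String> merged = new LinkedHashMap<>(entry.getValue());
            for (Map<Locale, LinkedHashMap<String, String>> other : others) {
                Map<String, String> otherItems = other.get(locale);
                if (otherItems == null) {
                    continue; // the other result has no content for this locale
                }
                for (String key : keysToMerge) {
                    String value = otherItems.get(key);
                    if (value != null) {
                        // Append to an existing value, or add the item if base lacks it.
                        merged.merge(key, value, (oldValue, newValue) -> oldValue + " " + newValue);
                    }
                }
            }
            result.put(locale, merged);
        }
        return result;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> pageEn = new LinkedHashMap<>();
        pageEn.put("content", "Page text");
        LinkedHashMap<String, String> elementEn = new LinkedHashMap<>();
        elementEn.put("content", "Element text");
        LinkedHashMap<String, String> elementFr = new LinkedHashMap<>();
        elementFr.put("content", "Texte");

        Map<Locale, LinkedHashMap<String, String>> page = Map.of(Locale.ENGLISH, pageEn);
        Map<Locale, LinkedHashMap<String, String>> element = new LinkedHashMap<>();
        element.put(Locale.ENGLISH, elementEn);
        element.put(Locale.FRENCH, elementFr); // ignored: not a locale of the base result

        System.out.println(merge(page, List.of(element), Set.of("content")));
        // {en={content=Page text Element text}}
    }
}
```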
-
release
Description copied from interface: I_CmsExtractionResult
Releases the information stored in this extraction result, to free up the memory used.
- Specified by:
release in interface I_CmsExtractionResult
-