Package org.opencms.search.extractors
Interface I_CmsExtractionResult
- All Known Implementing Classes:
CmsExtractionResult
public interface I_CmsExtractionResult
The result of a document text extraction.
This data structure contains the extracted text as well as (optional) meta information extracted from the document.
- Since:
- 6.0.0
-
Field Summary
Modifier and Type | Field | Description
static final String | ITEM_AUTHOR | Key to access the document author name in the item map.
static final String | ITEM_CATEGORY | Key to access the document category in the item map.
static final String | ITEM_COMMENTS | Key to access the document comments in the item map.
static final String | ITEM_COMPANY | Key to access the document company name in the item map.
static final String | ITEM_CONTENT | Key for accessing the default (combined) content in getContentItems().
static final String | ITEM_CREATOR | Key to access the document creator name in the item map.
static final String | ITEM_KEYWORDS | Key to access the document keywords in the item map.
static final String | ITEM_MANAGER | Key to access the document manager name in the item map.
static final String | ITEM_PRODUCER | Key to access the document producer name in the item map.
static final String | ITEM_RAW | Key for accessing the raw content in getContentItems().
static final String | ITEM_SUBJECT | Key to access the document subject in the item map.
static final String | ITEM_TITLE | Key to access the document title in the item map.
static final String[] | ITEMS_TO_MERGE | All items that should be merged.
-
Method Summary
Modifier and Type | Method | Description
byte[] | getBytes() | Returns this extraction result serialized as a byte array.
 | getContent() | Returns the extracted content of the best fitting locale combined as a String.
 | getContent(Locale locale) | Returns the extracted content for the given locale combined as a String.
 | getContentItems() | Returns the extracted content for the best fitting locale as individual items.
 | getContentItems(Locale locale) | Returns the extracted content for a given locale as individual items.
 | getDefaultLocale() | Returns the best fitting locale for the content.
 | getFieldMappings() | Returns a map from search fields to values that should be stored in those fields.
 | getLocales() | Returns the locales in which the content is available.
 | merge(List<I_CmsExtractionResult> extractionResults) | Appends, for the locales of the current collection result, the content fields from all provided extraction results to the current extraction result.
void | release() | Releases the information stored in this extraction result, to free up the memory used.
-
Field Details
-
ITEM_AUTHOR
Key to access the document author name in the item map.
-
ITEM_CATEGORY
Key to access the document category in the item map.
-
ITEM_COMMENTS
Key to access the document comments in the item map.
-
ITEM_COMPANY
Key to access the document company name in the item map.
-
ITEM_CONTENT
Key for accessing the default (combined) content in getContentItems().
-
ITEM_CREATOR
Key to access the document creator name in the item map.
-
ITEM_KEYWORDS
Key to access the document keywords in the item map.
-
ITEM_MANAGER
Key to access the document manager name in the item map.
-
ITEM_PRODUCER
Key to access the document producer name in the item map.
-
ITEM_RAW
Key for accessing the raw content in getContentItems().
-
ITEM_SUBJECT
Key to access the document subject in the item map.
-
ITEM_TITLE
Key to access the document title in the item map.
-
ITEMS_TO_MERGE
All items that should be merged.
-
-
Method Details
-
getBytes
byte[] getBytes()
Returns this extraction result serialized as a byte array.
- Returns:
- this extraction result serialized as a byte array
-
getContent
Returns the extracted content of the best fitting locale combined as a String.
- Returns:
- the extracted content of the best fitting locale combined as a String
-
getContent
Returns the extracted content for the given locale combined as a String.
- Parameters:
- locale - the locale of the extracted content
- Returns:
- the extracted content for the given locale combined as a String
-
getContentItems
Returns the extracted content for the best fitting locale as individual items.
The result Map contains all content items extracted by the extractor. The key is always a String and contains the name of the item. The value is also a String and contains the extracted text.
The detailed form depends on the resource type indexed:
- For an xmlpage, the key is the element name, and the value is the text of the element.
- For an xmlcontent, the key is the xpath of the XML node, and the value is the text of that XML node.
- If the document contains meta information (for example PDF or MS Office documents), the meta information is stored with the name of the meta field as key and the content as value.
- For all other resource types, there is only one key, ITEM_CONTENT, whose value is the complete content.
- Returns:
- the extracted content as individual items
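The item map described above can be illustrated with a small sketch. It does not use the OpenCms classes: a plain Map stands in for the value returned by getContentItems(), the string used for ITEM_CONTENT is an assumed placeholder (the real constant value is not shown in this document), and the class name, helper name, and xpath-style keys are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: a plain Map stands in for the result of getContentItems().
class ContentItemsSketch {

    // Assumed placeholder for I_CmsExtractionResult.ITEM_CONTENT;
    // the actual constant value is not given in this document.
    static final String ITEM_CONTENT = "__content";

    /** Returns every item except the combined content, preserving map order. */
    static Map<String, String> withoutCombinedContent(Map<String, String> items) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : items.entrySet()) {
            if (!ITEM_CONTENT.equals(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> items = new LinkedHashMap<>();
        items.put(ITEM_CONTENT, "Hello world");
        items.put("Title[1]", "Hello"); // xmlcontent-style xpath key (made up)
        items.put("Text[1]", "world");  // xmlcontent-style xpath key (made up)
        withoutCombinedContent(items)
            .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

For a resource type that yields only the single ITEM_CONTENT key, the helper would return an empty map.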
-
getContentItems
Returns the extracted content for a given locale as individual items.
- Parameters:
- locale - the locale of the extracted content items
- Returns:
- the extracted content for a given locale as individual items
-
getDefaultLocale
Returns the best fitting locale for the content.
- Returns:
- the best fitting locale for the content
-
getFieldMappings
Returns a map from search fields to values that should be stored in those fields.
- Returns:
- a map from search fields to values that should be stored in those fields
-
getLocales
Returns the locales in which the content is available.
- Returns:
- the locales in which the content is available
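One way the locale accessors might be combined is sketched below. The LocalizedContent interface is a hypothetical stand-in that exposes only getLocales(), getDefaultLocale(), and getContent(Locale); the Collection return type of getLocales() is an assumption, since the summary above does not show it.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in exposing only the three accessors used here.
// The Collection<Locale> return type is an assumption.
interface LocalizedContent {
    Collection<Locale> getLocales();
    Locale getDefaultLocale();
    String getContent(Locale locale);
}

class LocaleContentSketch {

    /** Collects the content per locale, with the best fitting locale first. */
    static Map<Locale, String> contentByLocale(LocalizedContent result) {
        Map<Locale, String> byLocale = new LinkedHashMap<>();
        Locale best = result.getDefaultLocale();
        if (best != null) {
            byLocale.put(best, result.getContent(best));
        }
        for (Locale locale : result.getLocales()) {
            // Skips the best fitting locale if it was already added above.
            byLocale.putIfAbsent(locale, result.getContent(locale));
        }
        return byLocale;
    }
}
```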
-
merge
Appends, for the locales of the current collection result, the content fields from all provided extraction results to the current extraction result.
- Parameters:
- extractionResults - the extraction results to merge
- Returns:
- the merged result
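The append behaviour described for merge can be pictured with the following simplified sketch. It is not the OpenCms implementation: each extraction result is modelled as a plain map from locale to item map, every item is merged (whereas the interface defines ITEMS_TO_MERGE for the items that should be merged), and the newline separator is an assumption.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch only: each "result" is modelled as locale -> (item key -> text).
class MergeSketch {

    /** Appends items from the other results, but only for locales of the current one. */
    static void mergeInto(Map<Locale, Map<String, String>> current,
                          List<Map<Locale, Map<String, String>>> others) {
        for (Map.Entry<Locale, Map<String, String>> entry : current.entrySet()) {
            Map<String, String> target = entry.getValue();
            for (Map<Locale, Map<String, String>> other : others) {
                Map<String, String> source = other.get(entry.getKey());
                if (source == null) {
                    continue; // the other result has no content for this locale
                }
                for (Map.Entry<String, String> item : source.entrySet()) {
                    // Append to an existing item, or add the item if it is missing.
                    // The "\n" separator is an assumption of this sketch.
                    target.merge(item.getKey(), item.getValue(), (a, b) -> a + "\n" + b);
                }
            }
        }
    }
}
```

Locales that occur only in the provided results are ignored here, which mirrors the "for the locales of the current collection result" wording.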
-
release
void release()
Releases the information stored in this extraction result, to free up the memory used.
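A caller might pair reading the content with release() in a try/finally block, as in this minimal sketch. ReleasableResult is a hypothetical stand-in with only the two methods used, and the String return type of getContent() is inferred from its description. Whether the content remains readable after release() is not stated here, so the sketch copies what it needs before releasing.

```java
// Hypothetical stand-in with only the two methods used below.
interface ReleasableResult {
    String getContent();
    void release();
}

class ReleaseSketch {

    /** Reads the combined content length, then always releases the result. */
    static int contentLength(ReleasableResult result) {
        try {
            String content = result.getContent();
            return content == null ? 0 : content.length();
        } finally {
            result.release(); // runs even if reading the content throws
        }
    }
}
```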
-