Package org.opencms.search.extractors
Interface I_CmsExtractionResult
-
- All Known Implementing Classes:
CmsExtractionResult
public interface I_CmsExtractionResult
The result of a document text extraction.
This data structure contains the extracted text as well as (optional) meta information extracted from the document.
- Since:
- 6.0.0
-
-
Field Summary
Fields
static java.lang.String ITEM_AUTHOR
    Key to access the document author name in the item map.
static java.lang.String ITEM_CATEGORY
    Key to access the document category in the item map.
static java.lang.String ITEM_COMMENTS
    Key to access the document comments in the item map.
static java.lang.String ITEM_COMPANY
    Key to access the document company name in the item map.
static java.lang.String ITEM_CONTENT
    Key for accessing the default (combined) content in getContentItems().
static java.lang.String ITEM_CREATOR
    Key to access the document creator name in the item map.
static java.lang.String ITEM_KEYWORDS
    Key to access the document keywords in the item map.
static java.lang.String ITEM_MANAGER
    Key to access the document manager name in the item map.
static java.lang.String ITEM_PRODUCER
    Key to access the document producer name in the item map.
static java.lang.String ITEM_RAW
    Key for accessing the raw content in getContentItems().
static java.lang.String ITEM_SUBJECT
    Key to access the document subject in the item map.
static java.lang.String ITEM_TITLE
    Key to access the document title in the item map.
static java.lang.String[] ITEMS_TO_MERGE
    All items that should be merged.
-
Method Summary
Methods
byte[] getBytes()
    Returns this extraction result serialized as a byte array.
java.lang.String getContent()
    Returns the extracted content of the best fitting locale combined as a String.
java.lang.String getContent(java.util.Locale locale)
    Returns the extracted content for the given locale combined as a String.
java.util.LinkedHashMap<java.lang.String,java.lang.String> getContentItems()
    Returns the extracted content for the best fitting locale as individual items.
java.util.LinkedHashMap<java.lang.String,java.lang.String> getContentItems(java.util.Locale locale)
    Returns the extracted content for a given locale as individual items.
java.util.Locale getDefaultLocale()
    Returns the best fitting locale for the content.
java.util.Map<java.lang.String,java.lang.String> getFieldMappings()
    Returns a map from search fields to the values that should be stored in those fields.
java.util.Collection<java.util.Locale> getLocales()
    Returns the locales in which the content is available.
I_CmsExtractionResult merge(java.util.List<I_CmsExtractionResult> extractionResults)
    Appends, for the locales of the current collection result, the content fields from all provided extraction results to the current extraction result.
void release()
    Releases the information stored in this extraction result to free up the memory used.
-
-
-
Field Detail
-
ITEM_AUTHOR
static final java.lang.String ITEM_AUTHOR
Key to access the document author name in the item map.
- See Also:
- Constant Field Values
-
ITEM_CATEGORY
static final java.lang.String ITEM_CATEGORY
Key to access the document category in the item map.
- See Also:
- Constant Field Values
-
ITEM_COMMENTS
static final java.lang.String ITEM_COMMENTS
Key to access the document comments in the item map.
- See Also:
- Constant Field Values
-
ITEM_COMPANY
static final java.lang.String ITEM_COMPANY
Key to access the document company name in the item map.
- See Also:
- Constant Field Values
-
ITEM_CONTENT
static final java.lang.String ITEM_CONTENT
Key for accessing the default (combined) content in getContentItems().
- See Also:
- Constant Field Values
-
ITEM_CREATOR
static final java.lang.String ITEM_CREATOR
Key to access the document creator name in the item map.
- See Also:
- Constant Field Values
-
ITEM_KEYWORDS
static final java.lang.String ITEM_KEYWORDS
Key to access the document keywords in the item map.
- See Also:
- Constant Field Values
-
ITEM_MANAGER
static final java.lang.String ITEM_MANAGER
Key to access the document manager name in the item map.
- See Also:
- Constant Field Values
-
ITEM_PRODUCER
static final java.lang.String ITEM_PRODUCER
Key to access the document producer name in the item map.
- See Also:
- Constant Field Values
-
ITEM_RAW
static final java.lang.String ITEM_RAW
Key for accessing the raw content in getContentItems().
- See Also:
- Constant Field Values
-
ITEM_SUBJECT
static final java.lang.String ITEM_SUBJECT
Key to access the document subject in the item map.
- See Also:
- Constant Field Values
-
ITEM_TITLE
static final java.lang.String ITEM_TITLE
Key to access the document title in the item map.
- See Also:
- Constant Field Values
-
ITEMS_TO_MERGE
static final java.lang.String[] ITEMS_TO_MERGE
All items that should be merged.
-
-
Method Detail
-
getBytes
byte[] getBytes()
Returns this extraction result serialized as a byte array.
- Returns:
- this extraction result serialized as a byte array
-
getContent
java.lang.String getContent()
Returns the extracted content of the best fitting locale combined as a String.
- Returns:
- the extracted content of the best fitting locale combined as a String
-
getContent
java.lang.String getContent(java.util.Locale locale)
Returns the extracted content for the given locale combined as a String.
- Parameters:
locale - the locale of the extracted content
- Returns:
- the extracted content for the given locale combined as a String
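The relationship between getContent(java.util.Locale), getLocales() and getDefaultLocale() can be illustrated with a small caller-side sketch. The LocalizedContent interface and the of(...) helper below are hypothetical stand-ins that only mirror three method signatures from this page; they are not OpenCms types. Falling back to the best fitting locale when the requested one is unavailable is an assumption of the example, not documented behavior.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class LocaleLookupSketch {

    /** Stand-in mirroring three methods of I_CmsExtractionResult; not the OpenCms type. */
    interface LocalizedContent {
        String getContent(Locale locale);
        Locale getDefaultLocale();
        Collection<Locale> getLocales();
    }

    /** Hypothetical in-memory implementation used only for this example. */
    static LocalizedContent of(Map<Locale, String> byLocale, Locale best) {
        return new LocalizedContent() {
            public String getContent(Locale locale) { return byLocale.get(locale); }
            public Locale getDefaultLocale() { return best; }
            public Collection<Locale> getLocales() { return byLocale.keySet(); }
        };
    }

    /** Returns content for the wanted locale if available, else for the best fitting one. */
    static String contentFor(LocalizedContent result, Locale wanted) {
        // The fallback rule is an assumption of this sketch.
        Locale use = result.getLocales().contains(wanted) ? wanted : result.getDefaultLocale();
        return result.getContent(use);
    }

    /** Sample result with English and German content, English being the best fit. */
    static LocalizedContent sample() {
        Map<Locale, String> byLocale = new LinkedHashMap<>();
        byLocale.put(Locale.ENGLISH, "Hello");
        byLocale.put(Locale.GERMAN, "Hallo");
        return of(byLocale, Locale.ENGLISH);
    }

    public static void main(String[] args) {
        System.out.println(contentFor(sample(), Locale.GERMAN));
        System.out.println(contentFor(sample(), Locale.FRENCH));
    }
}
```

Checking getLocales() first keeps the caller independent of how a concrete implementation reacts to a locale it does not hold.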
-
getContentItems
java.util.LinkedHashMap<java.lang.String,java.lang.String> getContentItems()
Returns the extracted content for the best fitting locale as individual items.
The result Map contains all content items extracted by the extractor. The key is always a String and contains the name of the item. The value is also a String and contains the extracted text.
The detailed form depends on the resource type indexed:
- For an xmlpage, the key is the element name, and the value is the text of the element.
- For an xmlcontent, the key is the xpath of the XML node, and the value is the text of that XML node.
- If the document contains meta information (for example PDF or MS Office documents), the meta information is stored with the name of the meta field as key and the content as value.
- For all other resource types, there is only one key, ITEM_CONTENT, whose value is the complete content.
- Returns:
- the extracted content as individual items
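As a rough illustration of the item map described above, the following sketch builds a LinkedHashMap of the same shape and reads it back. The key strings are placeholders: the real values of ITEM_CONTENT and ITEM_TITLE are listed under "Constant Field Values" and are not assumed here. The sample map, which holds both a combined content entry and one meta entry, is likewise hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ContentItemsSketch {

    // Placeholder key; the real value of I_CmsExtractionResult.ITEM_CONTENT is not assumed.
    static final String ITEM_CONTENT = "content-placeholder";

    // Placeholder for a meta information key such as ITEM_TITLE.
    static final String ITEM_TITLE = "title-placeholder";

    /** Builds a map shaped like the one described for a document with meta information. */
    static LinkedHashMap<String, String> sampleItems() {
        LinkedHashMap<String, String> items = new LinkedHashMap<>();
        items.put(ITEM_CONTENT, "Full text of the document.");
        items.put(ITEM_TITLE, "Annual report");
        return items;
    }

    /** Returns the combined content, or an empty string if the key is absent. */
    static String combinedContent(Map<String, String> items) {
        return items.getOrDefault(ITEM_CONTENT, "");
    }

    /** Formats every item as "name=text", one per line, in insertion order. */
    static String describe(Map<String, String> items) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : items.entrySet()) {
            sb.append(entry.getKey()).append('=').append(entry.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> items = sampleItems();
        System.out.println(combinedContent(items));
        System.out.print(describe(items));
    }
}
```

Because the return type is a LinkedHashMap, iterating over the entries yields the items in the order in which they were inserted.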
-
getContentItems
java.util.LinkedHashMap<java.lang.String,java.lang.String> getContentItems(java.util.Locale locale)
Returns the extracted content for a given locale as individual items.
- Parameters:
locale - the locale of the extracted content items
- Returns:
- the extracted content for a given locale as individual items.
- See Also:
getContentItems()
-
getDefaultLocale
java.util.Locale getDefaultLocale()
Returns the best fitting locale for the content.
- Returns:
- the best fitting locale for the content
-
getFieldMappings
java.util.Map<java.lang.String,java.lang.String> getFieldMappings()
Returns a map from search fields to the values that should be stored in those fields.
- Returns:
- a map from search fields to the values that should be stored in those fields
-
getLocales
java.util.Collection<java.util.Locale> getLocales()
Returns the locales in which the content is available.
- Returns:
- the locales in which the content is available
-
merge
I_CmsExtractionResult merge(java.util.List<I_CmsExtractionResult> extractionResults)
Appends, for the locales of the current collection result, the content fields from all provided extraction results to the current extraction result.
- Parameters:
extractionResults - the extraction results to merge
- Returns:
- the merged result
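One possible reading of the merge description is sketched below with plain maps instead of OpenCms types. Each result is modeled as a map from locale to item map. KEYS_TO_MERGE is a hypothetical stand-in for ITEMS_TO_MERGE, whose actual contents are not assumed. Joining appended values with a newline is also an assumption, since the separator is not specified here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class MergeSketch {

    /** Hypothetical list of keys to merge; the real ITEMS_TO_MERGE contents are not assumed. */
    static final String[] KEYS_TO_MERGE = {"content-placeholder", "title-placeholder"};

    /**
     * Appends values from each other result to the current one, but only for
     * locales the current result already has and only for KEYS_TO_MERGE.
     */
    static Map<Locale, Map<String, String>> merge(
            Map<Locale, Map<String, String>> current,
            List<Map<Locale, Map<String, String>>> others) {
        for (Map.Entry<Locale, Map<String, String>> entry : current.entrySet()) {
            Map<String, String> target = entry.getValue();
            for (Map<Locale, Map<String, String>> other : others) {
                Map<String, String> source = other.get(entry.getKey());
                if (source == null) {
                    continue; // the other result lacks this locale
                }
                for (String key : KEYS_TO_MERGE) {
                    String extra = source.get(key);
                    if (extra == null) {
                        continue; // nothing to append for this key
                    }
                    // Joining with a newline is an assumption of this sketch.
                    target.merge(key, extra, (a, b) -> a + "\n" + b);
                }
            }
        }
        return current;
    }

    /** Builds a result holding one locale with one item. */
    static Map<Locale, Map<String, String>> single(Locale locale, String key, String text) {
        Map<String, String> items = new LinkedHashMap<>();
        items.put(key, text);
        Map<Locale, Map<String, String>> result = new LinkedHashMap<>();
        result.put(locale, items);
        return result;
    }

    public static void main(String[] args) {
        Map<Locale, Map<String, String>> merged = merge(
                single(Locale.ENGLISH, "content-placeholder", "A"),
                Arrays.asList(
                        single(Locale.ENGLISH, "content-placeholder", "B"),
                        single(Locale.GERMAN, "content-placeholder", "C")));
        System.out.println(merged);
    }
}
```

In this sketch the German entry is ignored because the current result has no German content, which matches the phrase "for the locales of the current collection result" as read here.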
-
release
void release()
Releases the information stored in this extraction result, to free up the memory used.
-
-