Interface I_CmsExtractionResult

All Known Implementing Classes:
CmsExtractionResult

public interface I_CmsExtractionResult
The result of a document text extraction.

This data structure contains the extracted text as well as (optional) meta information extracted from the document.

Since:
6.0.0
  • Field Details

  • Method Details

    • getBytes

      byte[] getBytes()
      Returns this extraction result serialized as a byte array.

      Returns:
      this extraction result serialized as a byte array
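The interface does not say which byte format getBytes() produces. As a minimal sketch, assuming plain Java object serialization and using a hypothetical helper class (BytesRoundTripSketch is not part of OpenCms), a map of content items could be turned into a byte array and restored like this:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;

/** Hypothetical helper, not the OpenCms implementation. */
class BytesRoundTripSketch {

    /** Serializes the item map with standard Java serialization (an assumption about the format). */
    static byte[] toBytes(LinkedHashMap<String, String> items) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(items);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and therefore flushed) before the bytes are read
        return buffer.toByteArray();
    }

    /** Restores the item map from the bytes produced by toBytes. */
    @SuppressWarnings("unchecked")
    static LinkedHashMap<String, String> fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (LinkedHashMap<String, String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A serialized form like this is what makes it possible to cache an extraction result and restore it later without extracting the document again.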
    • getContent

      Returns the extracted content of the best fitting locale combined as a String.

      Returns:
      the extracted content of the best fitting locale combined as a String
    • getContent

      Returns the extracted content for the given locale combined as a String.

      Parameters:
      locale - the locale of the extracted content
      Returns:
      the extracted content for the given locale combined as a String
    • getContentItems

      Returns the extracted content for the best fitting locale as individual items.

      The result Map contains all content items extracted by the extractor. The key is always a String, and contains the name of the item. The value is also a String and contains the extracted text.

      The detailed form will depend on the resource type indexed:

      • For an xmlpage, the key will be the element name, and the value will be the text of the element.
      • For an xmlcontent, the key will be the xpath of the XML node, and the value will be the text of that XML node.
      • In case the document contains meta information (for example PDF or MS Office documents), the meta information is stored with the name of the meta field as key and the content as value.
      • For all other resource types, there will be only one key, ITEM_CONTENT, which will contain the value of the complete content.

      The map must be ordered, for example to preserve the correct indexing order for search field mappings when a sequence of values is mapped to a multi-valued search field.
      Returns:
      the extracted content as individual items
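The ordering requirement can be illustrated with a short sketch. The class, the xpath keys and the texts below are hypothetical and only serve as an example. A LinkedHashMap keeps insertion order, so values collected for a multi-valued field come out in document order:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of OpenCms: shows why the order of content items matters. */
class ContentItemsOrderSketch {

    /** Builds an item map as an extractor for an xmlcontent might (keys are illustrative xpaths). */
    static Map<String, String> buildItems() {
        // LinkedHashMap iterates in insertion order, which a plain HashMap does not guarantee
        Map<String, String> items = new LinkedHashMap<>();
        items.put("Title[1]", "Release notes");
        items.put("Paragraph[1]/Text[1]", "First paragraph");
        items.put("Paragraph[2]/Text[1]", "Second paragraph");
        items.put("Paragraph[3]/Text[1]", "Third paragraph");
        return items;
    }

    /** Collects the values of all items whose key starts with the prefix, in map iteration order. */
    static List<String> valuesFor(Map<String, String> items, String keyPrefix) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> entry : items.entrySet()) {
            if (entry.getKey().startsWith(keyPrefix)) {
                values.add(entry.getValue());
            }
        }
        return values;
    }
}
```

Calling valuesFor(buildItems(), "Paragraph") yields the three paragraph texts in the order they were added, which is the order a multi-valued search field would receive them.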
    • getContentItems

      Returns the extracted content for a given locale as individual items.

      Parameters:
      locale - the locale of the extracted content items
      Returns:
      the extracted content for a given locale as individual items
    • getDefaultLocale

      Returns the best fitting locale for the content.
      Returns:
      the best fitting locale for the content
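How the locale-aware and the no-argument variants relate can be sketched with a small in-memory stand-in. LocalizedContentSketch is hypothetical, and the delegation of getContent() to the best fitting locale is an assumption based on the descriptions above, not the OpenCms implementation:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in, not the OpenCms implementation. */
class LocalizedContentSketch {

    private final Map<Locale, String> contentByLocale = new LinkedHashMap<>();
    private final Locale defaultLocale;

    LocalizedContentSketch(Locale defaultLocale) {
        this.defaultLocale = defaultLocale;
    }

    /** Stores the combined content for one locale. */
    void put(Locale locale, String content) {
        contentByLocale.put(locale, content);
    }

    /** The best fitting locale for the content. */
    Locale getDefaultLocale() {
        return defaultLocale;
    }

    /** The locales for which content has been stored. */
    Set<Locale> getLocales() {
        return Collections.unmodifiableSet(contentByLocale.keySet());
    }

    /** Content for the given locale, or null if nothing was stored for it. */
    String getContent(Locale locale) {
        return contentByLocale.get(locale);
    }

    /** Assumption: the no-argument variant delegates to the best fitting locale. */
    String getContent() {
        return getContent(getDefaultLocale());
    }
}
```

With English as the best fitting locale and content stored for English and German, getContent() returns the English text, while getContent(Locale.GERMAN) returns the German one.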
    • getFieldMappings

      Returns a map from search fields to the values that should be stored in those fields.
      Returns:
      a map from search fields to the values that should be stored in those fields
    • getLocales

      Returns the locales in which the content is available.
      Returns:
      the locales in which the content is available
    • merge

      Appends the content fields from all provided extraction results to the current extraction result. Only the locales of the current extraction result are considered.
      Parameters:
      extractionResults - the extraction results to merge
      Returns:
      the merged result
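The merge semantics can be sketched on plain maps. This is an illustration under stated assumptions, not the OpenCms code: MergeSketch is hypothetical, values for the same key are assumed to be joined with a newline, and locales that are missing from the current result are assumed to be ignored.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not the OpenCms implementation. */
class MergeSketch {

    /** Returns a new map holding the current items plus the appended items of the other results. */
    static Map<Locale, Map<String, String>> merge(
        Map<Locale, Map<String, String>> current,
        List<Map<Locale, Map<String, String>>> others) {

        Map<Locale, Map<String, String>> merged = new LinkedHashMap<>();
        // Only locales of the current result are visited; other locales are dropped
        for (Map.Entry<Locale, Map<String, String>> localeEntry : current.entrySet()) {
            Map<String, String> items = new LinkedHashMap<>(localeEntry.getValue());
            for (Map<Locale, Map<String, String>> other : others) {
                Map<String, String> otherItems = other.get(localeEntry.getKey());
                if (otherItems == null) {
                    continue; // the other result has no content for this locale
                }
                for (Map.Entry<String, String> item : otherItems.entrySet()) {
                    // Assumption: an existing value is extended, separated by a newline
                    items.merge(item.getKey(), item.getValue(), (a, b) -> a + "\n" + b);
                }
            }
            merged.put(localeEntry.getKey(), items);
        }
        return merged;
    }
}
```

The sketch returns a new map instead of modifying its input, which keeps the example free of side effects; whether the real method changes the current result in place is not stated here.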
    • release

      void release()
      Releases the information stored in this extraction result, to free up the memory used.