Interface I_CmsExtractionResult

  • All Known Implementing Classes:
    CmsExtractionResult

    public interface I_CmsExtractionResult
    The result of a document text extraction.

    This data structure contains the extracted text as well as (optional) meta information extracted from the document.

    Since:
    6.0.0
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.lang.String ITEM_AUTHOR
      Key to access the document author name in the item map.
      static java.lang.String ITEM_CATEGORY
      Key to access the document category in the item map.
      static java.lang.String ITEM_COMMENTS
      Key to access the document comments in the item map.
      static java.lang.String ITEM_COMPANY
      Key to access the document company name in the item map.
      static java.lang.String ITEM_CONTENT
      Key for accessing the default (combined) content in getContentItems().
      static java.lang.String ITEM_CREATOR
      Key to access the document creator name in the item map.
      static java.lang.String ITEM_KEYWORDS
      Key to access the document keywords in the item map.
      static java.lang.String ITEM_MANAGER
      Key to access the document manager name in the item map.
      static java.lang.String ITEM_PRODUCER
      Key to access the document producer name in the item map.
      static java.lang.String ITEM_RAW
      Key for accessing the raw content in getContentItems().
      static java.lang.String ITEM_SUBJECT
      Key to access the document subject in the item map.
      static java.lang.String ITEM_TITLE
      Key to access the document title in the item map.
      static java.lang.String[] ITEMS_TO_MERGE
      All items that should be merged.
    • Method Summary

      Modifier and Type Method Description
      byte[] getBytes()
      Returns this extraction result serialized as a byte array.
      java.lang.String getContent()
      Returns the extracted content of the best fitting locale combined as a String.
      java.lang.String getContent(java.util.Locale locale)
      Returns the extracted content for the given locale combined as a String.
      java.util.LinkedHashMap<java.lang.String,java.lang.String> getContentItems()
      Returns the extracted content for the best fitting locale as individual items.
      java.util.LinkedHashMap<java.lang.String,java.lang.String> getContentItems(java.util.Locale locale)
      Returns the extracted content for a given locale as individual items.
      java.util.Locale getDefaultLocale()
      Returns the best fitting locale for the content.
      java.util.Map<java.lang.String,java.lang.String> getFieldMappings()
      Returns a map from search fields to the values that should be stored in those fields.
      java.util.Collection<java.util.Locale> getLocales()
      Returns the locales in which the content is available.
      I_CmsExtractionResult merge(java.util.List<I_CmsExtractionResult> extractionResults)
      For each locale of the current extraction result, appends the content fields from all provided extraction results to the current extraction result.
      void release()
      Releases the information stored in this extraction result, to free up the memory used.
    • Method Detail

      • getBytes

        byte[] getBytes()
        Returns this extraction result serialized as a byte array.

        Returns:
        this extraction result serialized as a byte array
      • getContent

        java.lang.String getContent()
        Returns the extracted content of the best fitting locale combined as a String.

        Returns:
        the extracted content of the best fitting locale combined as a String
      • getContent

        java.lang.String getContent(java.util.Locale locale)
        Returns the extracted content for the given locale combined as a String.

        Parameters:
        locale - the locale of the extracted content
        Returns:
        the extracted content for the given locale combined as a String
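
        The two getContent variants can be combined into a simple locale fallback. The following is a minimal sketch, not OpenCms code: ExtractionResultLike and toyResult are hypothetical stand-ins that mirror only the documented method signatures, so the example runs without the OpenCms library. It checks getLocales() first because this page does not say what getContent(Locale) returns for a locale that is not available.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class LocaleContentSketch {

    /** Hypothetical stand-in exposing only the methods this sketch calls. */
    interface ExtractionResultLike {
        String getContent();
        String getContent(Locale locale);
        Collection<Locale> getLocales();
    }

    /**
     * Returns the content for the requested locale when the result lists it,
     * otherwise the content of the best fitting locale.
     */
    static String contentFor(ExtractionResultLike result, Locale wanted) {
        if (result.getLocales().contains(wanted)) {
            return result.getContent(wanted);
        }
        return result.getContent();
    }

    /** Toy in-memory result (an assumption of this sketch) used to exercise contentFor. */
    static ExtractionResultLike toyResult(final Map<Locale, String> byLocale, final Locale best) {
        return new ExtractionResultLike() {
            public String getContent() { return byLocale.get(best); }
            public String getContent(Locale locale) { return byLocale.get(locale); }
            public Collection<Locale> getLocales() { return byLocale.keySet(); }
        };
    }

    public static void main(String[] args) {
        Map<Locale, String> texts = new HashMap<>();
        texts.put(Locale.ENGLISH, "Hello");
        texts.put(Locale.GERMAN, "Hallo");
        ExtractionResultLike result = toyResult(texts, Locale.ENGLISH);

        System.out.println(contentFor(result, Locale.GERMAN)); // available locale
        System.out.println(contentFor(result, Locale.FRENCH)); // falls back to best fitting locale
    }
}
```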
      • getContentItems

        java.util.LinkedHashMap<java.lang.String,java.lang.String> getContentItems()
        Returns the extracted content for the best fitting locale as individual items.

        The result Map contains all content items extracted by the extractor. The key is always a String, and contains the name of the item. The value is also a String and contains the extracted text.

        The detailed form will depend on the resource type indexed:

        • For an xmlpage, the key will be the element name, and the value will be the text of the element.
        • For an xmlcontent, the key will be the xpath of the XML node, and the value will be the text of that XML node.
        • If the document contains meta information (for example PDF or MS Office documents), each meta field is stored with the name of the meta field as key and its content as value.
        • For all other resource types, there will be only one key, ITEM_CONTENT, whose value is the complete content.
        The map must be ordered so that, for example, values are indexed in the correct order when a sequence of values is mapped to a multi-valued search field.
        Returns:
        the extracted content as individual items
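
        The ordering requirement is easiest to see with a small example. The sketch below is illustrative only: it builds a LinkedHashMap by hand in place of a real getContentItems() result, the xpath-like keys are made up, and valuesWithPrefix is a hypothetical helper rather than part of OpenCms. It collects the values for a multi-valued field in the order in which the items appear in the map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ContentItemsSketch {

    /**
     * Collects, in map iteration order, the values of all items whose key
     * starts with the given prefix (hypothetical helper, not an OpenCms API).
     */
    static List<String> valuesWithPrefix(LinkedHashMap<String, String> items, String prefix) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> entry : items.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                values.add(entry.getValue());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the map returned by getContentItems();
        // the keys below are invented for illustration.
        LinkedHashMap<String, String> items = new LinkedHashMap<>();
        items.put("Title[1]", "Release notes");
        items.put("Paragraph[1]", "First paragraph");
        items.put("Paragraph[2]", "Second paragraph");

        // The values come back in insertion order because the map is a LinkedHashMap.
        System.out.println(valuesWithPrefix(items, "Paragraph"));
    }
}
```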
      • getContentItems

        java.util.LinkedHashMap<java.lang.String,java.lang.String> getContentItems(java.util.Locale locale)
        Returns the extracted content for a given locale as individual items.

        Parameters:
        locale - the locale of the extracted content items
        Returns:
        the extracted content for a given locale as individual items
        See Also:
        getContentItems()
      • getDefaultLocale

        java.util.Locale getDefaultLocale()
        Returns the best fitting locale for the content.
        Returns:
        the best fitting locale for the content
      • getFieldMappings

        java.util.Map<java.lang.String,java.lang.String> getFieldMappings()
        Returns a map from search fields to the values that should be stored in those fields.
        Returns:
        a map from search fields to the values that should be stored in those fields
      • getLocales

        java.util.Collection<java.util.Locale> getLocales()
        Returns the locales in which the content is available.
        Returns:
        the locales in which the content is available
      • merge

        I_CmsExtractionResult merge(java.util.List<I_CmsExtractionResult> extractionResults)
        For each locale of the current extraction result, appends the content fields from all provided extraction results to the current extraction result.
        Parameters:
        extractionResults - the extraction results to merge
        Returns:
        the merged result
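
        To make the merge semantics concrete, here is a toy model of the behaviour described above. It is a sketch under stated assumptions, not the CmsExtractionResult implementation: ToyResult is a hypothetical class, every item key is merged (the real implementation may restrict merging, for example to ITEMS_TO_MERGE), and joining appended values with a newline is an assumption. What it does show is that only locales already present in the current result receive content.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class MergeSketch {

    /** Toy model of an extraction result: content items per locale. */
    static final class ToyResult {
        final Map<Locale, LinkedHashMap<String, String>> itemsByLocale = new LinkedHashMap<>();

        ToyResult put(Locale locale, String key, String value) {
            itemsByLocale.computeIfAbsent(locale, l -> new LinkedHashMap<>()).put(key, value);
            return this;
        }

        /**
         * For each locale already present in this result, appends the items of
         * the other results to this result's items. Joining with a newline is
         * an assumption of this sketch.
         */
        ToyResult merge(List<ToyResult> others) {
            for (Map.Entry<Locale, LinkedHashMap<String, String>> mine : itemsByLocale.entrySet()) {
                for (ToyResult other : others) {
                    LinkedHashMap<String, String> theirs = other.itemsByLocale.get(mine.getKey());
                    if (theirs == null) {
                        continue; // the other result has no content for this locale
                    }
                    for (Map.Entry<String, String> item : theirs.entrySet()) {
                        // Append to an existing value, or add the item if the key is new.
                        mine.getValue().merge(item.getKey(), item.getValue(), (a, b) -> a + "\n" + b);
                    }
                }
            }
            return this;
        }
    }

    public static void main(String[] args) {
        ToyResult page = new ToyResult().put(Locale.ENGLISH, "content", "Main text");
        ToyResult attachment = new ToyResult()
            .put(Locale.ENGLISH, "content", "Attachment text")
            .put(Locale.GERMAN, "content", "Anhang");

        page.merge(Arrays.asList(attachment));

        // English content is appended; German is ignored because the
        // current result has no German locale.
        System.out.println(page.itemsByLocale.keySet());
        System.out.println(page.itemsByLocale.get(Locale.ENGLISH).get("content"));
    }
}
```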
      • release

        void release()
        Releases the information stored in this extraction result, to free up the memory used.
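
        Because release() frees the stored information, a caller would typically read what it needs first and release afterwards. The sketch below shows one way to guarantee that with try/finally. ExtractionResultLike and ToyResult are hypothetical stand-ins so the example runs on its own, and the assumption that content is no longer available after release() is specific to the toy class, not a documented guarantee.

```java
public class ReleaseSketch {

    /** Hypothetical stand-in exposing only the methods this sketch calls. */
    interface ExtractionResultLike {
        String getContent();
        void release();
    }

    /** Reads the content length, then always releases the result. */
    static int contentLengthThenRelease(ExtractionResultLike result) {
        try {
            String content = result.getContent();
            return content == null ? 0 : content.length();
        } finally {
            // Runs even if reading the content throws.
            result.release();
        }
    }

    /** Toy result that drops its text on release and records that it did so. */
    static final class ToyResult implements ExtractionResultLike {
        private String content;
        boolean released;

        ToyResult(String content) {
            this.content = content;
        }

        public String getContent() {
            return content;
        }

        public void release() {
            content = null;
            released = true;
        }
    }

    public static void main(String[] args) {
        ToyResult result = new ToyResult("extracted text");
        System.out.println(contentLengthThenRelease(result));
        System.out.println(result.released);
    }
}
```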