Class CmsSearchManager

java.lang.Object
org.opencms.search.CmsSearchManager
All Implemented Interfaces:
I_CmsEventListener, I_CmsScheduledJob

Implements the general management and configuration of the search and indexing facilities in OpenCms.

Since:
6.0.0
  • Constructor Details

    • CmsSearchManager

      Default constructor, used when the class is called as a cron job.

  • Method Details

    • getAnalyzer

      public static org.apache.lucene.analysis.Analyzer getAnalyzer(String className) throws Exception
      Returns an analyzer for the given class name.

      Parameters:
      className - the class name of the analyzer
      Returns:
      the appropriate Lucene analyzer
      Throws:
      Exception - if something goes wrong
    • getIndexSolr

      public static final CmsSolrIndex getIndexSolr(CmsObject cms, Map<String,String[]> params)
      Returns the Solr index whose name is given in the parameters. The parameters must contain a key/value pair that names an existing Solr index; otherwise null is returned.

      Parameters:
      cms - the current context
      params - the parameter map
      Returns:
      the best matching Solr index
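      The lookup described above can be sketched in plain Java. This is a hypothetical model, not the OpenCms implementation: SolrIndexStub stands in for CmsSolrIndex, a plain Map stands in for the index configuration, and the parameter key "index" is an assumption, because this documentation does not name the real key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CmsSolrIndex that only carries a name.
class SolrIndexStub {
    final String name;

    SolrIndexStub(String name) {
        this.name = name;
    }
}

class SolrIndexLookupSketch {

    // Assumed parameter key; this documentation does not name the real key.
    static final String PARAM_INDEX = "index";

    /** Returns the index named by the first value of PARAM_INDEX, or null if it is missing or unknown. */
    static SolrIndexStub getIndexSolr(Map<String, SolrIndexStub> configured, Map<String, String[]> params) {
        String[] values = params.get(PARAM_INDEX);
        if ((values == null) || (values.length == 0)) {
            return null;
        }
        // Map.get yields null when no index with that name is configured.
        return configured.get(values[0]);
    }

    public static void main(String[] args) {
        Map<String, SolrIndexStub> configured = new HashMap<>();
        configured.put("myIndex", new SolrIndexStub("myIndex"));
        Map<String, String[]> params = new HashMap<>();
        params.put(PARAM_INDEX, new String[] {"myIndex"});
        System.out.println(getIndexSolr(configured, params).name); // prints "myIndex"
        params.put(PARAM_INDEX, new String[] {"unknown"});
        System.out.println(getIndexSolr(configured, params)); // prints "null"
    }
}
```

The parameter map uses String[] values, as in the method signature above, so only the first value is considered in this sketch.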
    • isLuceneIndex

      public static boolean isLuceneIndex(String indexName)
      Returns true if the index for the given name is a Lucene index, false otherwise.

      Parameters:
      indexName - the name of the index to check
      Returns:
      true if the index for the given name is a Lucene index
    • addAnalyzer

      public void addAnalyzer(CmsSearchAnalyzer analyzer)
      Adds an analyzer.

      Parameters:
      analyzer - an analyzer
    • addDocumentTypeConfig

      public void addDocumentTypeConfig(CmsSearchDocumentType documentType)
      Adds a document type.

      Parameters:
      documentType - a document type
    • addFieldConfiguration

      public void addFieldConfiguration(I_CmsSearchFieldConfiguration fieldConfiguration)
      Adds a search field configuration to the search manager.

      Parameters:
      fieldConfiguration - the search field configuration to add
    • addSearchIndex

      public void addSearchIndex(I_CmsSearchIndex searchIndex)
      Adds a search index to the configuration.

      Parameters:
      searchIndex - the search index to add
    • addSearchIndexSource

      public void addSearchIndexSource(CmsSearchIndexSource searchIndexSource)
      Adds a search index source configuration.

      Parameters:
      searchIndexSource - a search index source configuration
    • cmsEvent

      public void cmsEvent(CmsEvent event)
      Implements the event listener of this class.

      Specified by:
      cmsEvent in interface I_CmsEventListener
      Parameters:
      event - CmsEvent that has occurred
    • getAllSolrIndexes

      Returns all Solr indexes.

      Returns:
      all Solr indexes
    • getAnalyzer

      public org.apache.lucene.analysis.Analyzer getAnalyzer(Locale locale) throws CmsSearchException
      Returns an analyzer for the given language.

      The analyzer is selected according to the analyzer configuration.

      Parameters:
      locale - the locale to get the analyzer for
      Returns:
      the appropriate Lucene analyzer
      Throws:
      CmsSearchException - if something goes wrong
    • getAnalyzers

      Returns an unmodifiable view of the map that contains the CmsSearchAnalyzer list.

      The keys in the map are Locale objects, and the values are CmsSearchAnalyzer objects.

      Returns:
      an unmodifiable view of the Analyzers Map
    • getCmsSearchAnalyzer

      Returns the search analyzer for the given locale.

      Parameters:
      locale - the locale to get the analyzer for
      Returns:
      the search analyzer for the given locale
    • getDirectory

      public String getDirectory()
      Returns the name of the directory below WEB-INF/ where the search indexes are stored.

      Returns:
      the name of the directory below WEB-INF/ where the search indexes are stored
    • getDirectorySolr

      Returns the configured Solr home directory, or null if not set.

      Returns:
      the Solr home directory
    • getDocumentFactoryForName

      Returns the document factory configured under the provided name.
      Parameters:
      docTypeName - the name of the document type.
      Returns:
      the factory for the provided name.
    • getDocumentTypeConfig

      Returns a document type config.

      Parameters:
      name - the name of the document type config
      Returns:
      the document type config.
    • getDocumentTypeConfigs

      Returns an unmodifiable view (read-only) of the DocumentTypeConfigs Map.

      Returns:
      an unmodifiable view (read-only) of the DocumentTypeConfigs Map
    • getDocumentTypeKeys

      Returns the document type keys used to specify the correct document factory.
      Parameters:
      resource - the resource to generate the list of document type keys for.
      Returns:
      the document type keys.
    • getDocumentTypeKeys

      public List<String> getDocumentTypeKeys(String resourceType, String mimeType)
      Returns the document type keys used to specify the correct document factory. One resource typically has more than one key. The document factories are matched in the provided order and the first matching factory is used. The keys for type name "typename" and mimetype "mimetype" would be a subset of:
      • typename_mimetype
      • typename
      • if typename is a sub-type of containerpage
        • containerpage_mimetype
        • containerpage
      • if typename is a sub-type of xmlcontent
        • xmlcontent_mimetype
        • xmlcontent
      • __unconfigured___mimetype
      • __unconfigured__
      • __all___mimetype
      • __all__
      Note that all keys except the "__all__" keys are only added until a key is reached for which a matching factory is configured globally. In particular, this means that a factory matching "typename" will never be used if you have a factory for "typename_mimetype", even if that factory is not configured for the index source in use. In such cases, the content may not be indexed at all.
      Parameters:
      resourceType - the resource type to generate the list of document type keys for.
      mimeType - the mime type to generate the list of document type keys for.
      Returns:
      the document type keys.
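      The key order and the early stop described in the note can be modelled as follows. This is a hypothetical reconstruction from the description above, not the OpenCms code: the sub-type flags and the set of globally configured factory keys (globalFactoryKeys) are assumed inputs that the real method would derive from the OpenCms configuration, and "_" is assumed as the separator, as in the keys listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class DocumentTypeKeysSketch {

    static List<String> documentTypeKeys(String typeName, String mimeType, boolean isContainerPageSubType,
            boolean isXmlContentSubType, Set<String> globalFactoryKeys) {
        // Candidate keys from most to least specific, following the list above.
        List<String> candidates = new ArrayList<>();
        addKeys(candidates, typeName, mimeType);
        if (isContainerPageSubType) {
            addKeys(candidates, "containerpage", mimeType);
        }
        if (isXmlContentSubType) {
            addKeys(candidates, "xmlcontent", mimeType);
        }
        addKeys(candidates, "__unconfigured__", mimeType);

        List<String> result = new ArrayList<>();
        for (String key : candidates) {
            result.add(key);
            // Stop adding non-"__all__" keys once some factory is configured globally for this key.
            if (globalFactoryKeys.contains(key)) {
                break;
            }
        }
        // The "__all__" keys are always appended.
        addKeys(result, "__all__", mimeType);
        return result;
    }

    /** Adds "prefix_mimetype" (if a mime type is given) followed by "prefix". */
    private static void addKeys(List<String> keys, String prefix, String mimeType) {
        if (mimeType != null) {
            keys.add(prefix + "_" + mimeType);
        }
        keys.add(prefix);
    }

    public static void main(String[] args) {
        // A factory exists only for "myType_text/plain", so "myType" is never added.
        System.out.println(documentTypeKeys("myType", "text/plain", false, false, Set.of("myType_text/plain")));
        // prints [myType_text/plain, __all___text/plain, __all__]
    }
}
```

The main method reproduces the situation from the note: because a factory exists for the more specific key, the plain "myType" key is never produced.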
    • getDocumentTypeMapForTypeNames

      Returns the map from document type keys to document factories with all entries for the provided document type names.
      Parameters:
      documentTypeNames - list of document type names to generate the map for.
      Returns:
      the map from document type keys to document factories.
    • getExtractionCacheMaxAge

      public float getExtractionCacheMaxAge()
      Returns the maximum age a text extraction result is kept in the cache (in hours).

      Returns:
      the maximum age a text extraction result is kept in the cache (in hours)
    • getFieldConfiguration

      Returns the search field configuration with the given name.

      In case no configuration is available with the given name, null is returned.

      Parameters:
      name - the name to get the search field configuration for
      Returns:
      the search field configuration with the given name
    • getFieldConfigurations

      Returns the unmodifiable List of configured I_CmsSearchFieldConfiguration entries.

      Returns:
      the unmodifiable List of configured I_CmsSearchFieldConfiguration entries
    • getFieldConfigurationsLucene

      Returns the Lucene search field configurations only.

      Returns:
      the Lucene search field configurations
    • getFieldConfigurationsSolr

      Returns the Solr search field configurations only.

      Returns:
      the Solr search field configurations
    • getForceunlock

      Returns the force unlock mode during indexing.

      Returns:
      the force unlock mode during indexing
    • getHighlighter

      Returns the highlighter.

      Returns:
      the highlighter
    • getIndex

      public I_CmsSearchIndex getIndex(String indexName)
      Returns the Lucene search index configured with the given name.

      The index must exist, otherwise null is returned.

      Parameters:
      indexName - the name of the requested search index
      Returns:
      the Lucene search index configured with the given name
    • getIndexLockMaxWaitSeconds

      Returns the seconds to wait for an index lock during an update operation.

      Returns:
      the seconds to wait for an index lock during an update operation
    • getIndexNames

      Returns the names of all configured indexes.

      Returns:
      list of names
    • getIndexSolr

      public CmsSolrIndex getIndexSolr(String indexName)
      Returns the Solr index configured with the given name.

      The index must exist, otherwise null is returned.

      Parameters:
      indexName - the name of the requested Solr index
      Returns:
      the Solr index configured with the given name
    • getIndexSource

      Returns a search index source for a specified source name.

      Parameters:
      sourceName - the name of the index source
      Returns:
      a search index source
    • getMaxExcerptLength

      public int getMaxExcerptLength()
      Returns the max. excerpt length.

      Returns:
      the max excerpt length
    • getMaxIndexWaitTime

      public long getMaxIndexWaitTime()
      Returns the maximum time to wait for re-indexing after content is edited (in milliseconds).

      Returns:
      the maximum time to wait for re-indexing after content is edited (in milliseconds)
    • getMaxModificationsBeforeCommit

      Returns the maximum number of modifications before a commit in the search index is triggered.

      Returns:
      the maximum number of modifications before a commit in the search index is triggered
    • getOfflineUpdateFrequency

      Returns the update frequency of the offline indexer in milliseconds.

      Returns:
      the update frequency of the offline indexer in milliseconds
    • getSearchIndexes

      Returns an unmodifiable list of all configured I_CmsSearchIndex instances.

      Returns:
      an unmodifiable list of all configured I_CmsSearchIndex instances
    • getSearchIndexesAll

      Returns an unmodifiable list of all configured I_CmsSearchIndex instances.

      Returns:
      an unmodifiable list of all configured I_CmsSearchIndex instances
    • getSearchIndexesSolr

      Returns an unmodifiable list of all configured Solr indexes.

      Returns:
      an unmodifiable list of all configured Solr indexes
    • getSearchIndexSources

      Returns an unmodifiable view (read-only) of the SearchIndexSources Map.

      Returns:
      an unmodifiable view (read-only) of the SearchIndexSources Map
    • getSolrDictionary

      Returns the singleton instance of the OpenCms spellchecker.

      Returns:
      instance of CmsSolrSpellchecker.
    • getSolrServerConfiguration

      Returns the Solr configuration.

      Returns:
      the Solr configuration
    • getTimeout

      public long getTimeout()
      Returns the timeout after which threads indexing a resource are abandoned.

      Returns:
      the timeout after which threads indexing a resource are abandoned
    • initialize

      Initializes the search manager.

      Parameters:
      cms - the cms object
      Throws:
      CmsRoleViolationException - if the given CmsObject does not have the CmsRole.WORKPLACE_MANAGER role
    • initializeFieldConfigurations

      Calls I_CmsSearchFieldConfiguration.init() for all registered field configurations.
    • initializeIndexes

      public void initializeIndexes()
      Initializes all configured document types, index sources and search indexes.

      This method needs to be called after a change to the index configuration has been made.

    • initOfflineIndexes

      public void initOfflineIndexes()
      Initializes the offline index handler; required after an offline index has been added.

    • initSpellcheckIndex

      public void initSpellcheckIndex(CmsObject adminCms)
      Initializes the spell check index.

      Parameters:
      adminCms - the ROOT_ADMIN cms context
    • isOfflineIndexingPaused

      public boolean isOfflineIndexingPaused()
      Returns whether offline indexing is paused.

      Returns:
      true if the offline indexing is paused
    • launch

      public String launch(CmsObject cms, Map<String,String> parameters) throws Exception
      Updates the indexes as a scheduled job.

      Specified by:
      launch in interface I_CmsScheduledJob
      Parameters:
      cms - the OpenCms user context to use when reading resources from the VFS
      parameters - the parameters for the scheduled job
      Returns:
      the String to write in the scheduler log
      Throws:
      Exception - if something goes wrong
    • pauseOfflineIndexing

      Pauses the offline indexing and returns a pause request id that has to be used for resuming offline indexing again.

      May take some time, because the indexes are updated first.

      Returns:
      the pause request id. The id has to be given to the resumeOfflineIndexing(CmsUUID) method to resume offline indexing.
    • rebuildAllIndexes

      public void rebuildAllIndexes(I_CmsReport report) throws CmsException
      Rebuilds (if required creates) all configured indexes.

      Parameters:
      report - the report object to write messages (or null)
      Throws:
      CmsException - if something goes wrong
    • rebuildIndex

      public void rebuildIndex(String indexName, I_CmsReport report) throws CmsException
      Rebuilds (if required creates) the index with the given name.

      Parameters:
      indexName - the name of the index to rebuild
      report - the report object to write messages (or null)
      Throws:
      CmsException - if something goes wrong
    • rebuildIndexes

      public void rebuildIndexes(List<String> indexNames, I_CmsReport report) throws CmsException
      Rebuilds (if required creates) the indexes with the given names.

      Parameters:
      indexNames - the names of the indexes to rebuild
      report - the report object to write messages (or null)
      Throws:
      CmsException - if something goes wrong
    • registerSolrIndex

      Registers a new Solr core for the given index.

      Parameters:
      index - the index to register a new Solr core for
      Throws:
      CmsConfigurationException - if no Solr server is configured
    • removeSearchFieldConfiguration

      Removes this field configuration from the OpenCms configuration (if it is not used any more).

      Parameters:
      fieldConfiguration - the field configuration to remove from the configuration
      Returns:
      true if remove was successful, false if preconditions for removal are ok but the given field configuration was unknown to the manager.
      Throws:
      CmsIllegalStateException - if the given field configuration is still used by at least one I_CmsSearchIndex.
    • removeSearchFieldConfigurationField

      Removes a search field from the field configuration.

      Parameters:
      fieldConfiguration - the field configuration
      field - field to remove from the field configuration
      Returns:
      true if remove was successful, false if preconditions for removal are ok but the given field was unknown.
    • removeSearchFieldMapping

      Removes a search field mapping from the given field.

      Parameters:
      field - the field
      mapping - mapping to remove from the field
      Returns:
      true if remove was successful, false if preconditions for removal are ok but the given mapping was unknown.
      Throws:
      CmsIllegalStateException - if the given mapping is the last mapping inside the given field.
    • removeSearchIndex

      public void removeSearchIndex(I_CmsSearchIndex searchIndex)
      Removes a search index from the configuration.

      Parameters:
      searchIndex - the search index to remove
    • removeSearchIndexes

      public void removeSearchIndexes(List<String> indexNames)
      Removes all indexes whose names are included in the given list.

      Parameters:
      indexNames - the names of the indexes to remove
    • removeSearchIndexSource

      Removes this index source from the OpenCms configuration (if it is not used any more).

      Parameters:
      indexsource - the index source to remove from the configuration
      Returns:
      true if remove was successful, false if preconditions for removal are ok but the given index source was unknown to the manager.
      Throws:
      CmsIllegalStateException - if the given index source is still used by at least one I_CmsSearchIndex.
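      The three possible outcomes of removeSearchIndexSource (exception, true, false) can be sketched as follows. This is a hypothetical model: index sources are plain names, each index is represented by the set of source names it uses, and IllegalStateException stands in for CmsIllegalStateException.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IndexSourceRemovalSketch {

    // Hypothetical registry of the index source names known to the manager.
    final Set<String> sources = new HashSet<>();

    /**
     * Throws while any index still uses the source, returns false if the source is unknown,
     * and returns true after removing a known, unused source.
     */
    boolean removeSource(String sourceName, List<Set<String>> sourcesUsedByIndexes) {
        for (Set<String> usedByIndex : sourcesUsedByIndexes) {
            if (usedByIndex.contains(sourceName)) {
                // IllegalStateException stands in for CmsIllegalStateException.
                throw new IllegalStateException("Index source still in use: " + sourceName);
            }
        }
        // Set.remove returns false if the name was never registered.
        return sources.remove(sourceName);
    }
}
```

The usage check runs before the removal, so a source that is still referenced is never removed, regardless of whether it is registered.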
    • resumeOfflineIndexing

      public void resumeOfflineIndexing(CmsUUID pauseId)
      Resumes offline indexing if it was paused and no pause for another pauseId is still present.

      Parameters:
      pauseId - the id of the pause request, which now allows for resuming.
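      The interplay of pauseOfflineIndexing, resumeOfflineIndexing and isOfflineIndexingPaused can be sketched as a set of outstanding pause ids: indexing stays paused until every pause request has been resumed. This is a hypothetical model in which java.util.UUID stands in for CmsUUID and the index update performed before pausing is omitted.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

class OfflineIndexPauseSketch {

    // Outstanding pause requests; indexing is paused while this set is non-empty.
    private final Set<UUID> pauseRequests = new HashSet<>();

    synchronized UUID pauseOfflineIndexing() {
        // The documented method first brings the indexes up to date; that step is omitted here.
        UUID pauseId = UUID.randomUUID();
        pauseRequests.add(pauseId);
        return pauseId;
    }

    synchronized void resumeOfflineIndexing(UUID pauseId) {
        // Removing one id only resumes indexing if no other pause request remains.
        pauseRequests.remove(pauseId);
    }

    synchronized boolean isOfflineIndexingPaused() {
        return !pauseRequests.isEmpty();
    }
}
```

Keeping one id per caller means that two independent callers can pause indexing without one of them resuming it prematurely for the other.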
    • setDirectory

      public void setDirectory(String value)
      Sets the name of the directory below WEB-INF/ where the search indexes are stored.

      Parameters:
      value - the name of the directory below WEB-INF/ where the search indexes are stored
    • setExtractionCacheMaxAge

      public void setExtractionCacheMaxAge(float extractionCacheMaxAge)
      Sets the maximum age a text extraction result is kept in the cache (in hours).

      Parameters:
      extractionCacheMaxAge - the maximum age for a text extraction result to set
    • setExtractionCacheMaxAge

      public void setExtractionCacheMaxAge(String extractionCacheMaxAge)
      Sets the maximum age a text extraction result is kept in the cache (in hours) as a String.

      Parameters:
      extractionCacheMaxAge - the maximum age for a text extraction result to set
    • setForceunlock

      public void setForceunlock(String value)
      Sets the force unlock mode during indexing.

      Parameters:
      value - the force unlock mode to set
    • setHighlighter

      public void setHighlighter(String highlighter)
      Sets the highlighter.

      A highlighter is a class implementing org.opencms.search.documents.I_TermHighlighter.

      Parameters:
      highlighter - the fully qualified class name of the highlighter
    • setIndexLockMaxWaitSeconds

      public void setIndexLockMaxWaitSeconds(int value)
      Sets the seconds to wait for an index lock during an update operation.

      Parameters:
      value - the seconds to wait for an index lock during an update operation
    • setMaxExcerptLength

      public void setMaxExcerptLength(int maxExcerptLength)
      Sets the max. excerpt length.

      Parameters:
      maxExcerptLength - the max. excerpt length to set
    • setMaxExcerptLength

      public void setMaxExcerptLength(String maxExcerptLength)
      Sets the max. excerpt length as a String.

      Parameters:
      maxExcerptLength - the max. excerpt length to set
    • setMaxIndexWaitTime

      public void setMaxIndexWaitTime(long maxIndexWaitTime)
      Sets the maximum wait time for offline index updates after edit operations.

      Parameters:
      maxIndexWaitTime - the maximum wait time to set in milliseconds
    • setMaxIndexWaitTime

      public void setMaxIndexWaitTime(String maxIndexWaitTime)
      Sets the maximum wait time for offline index updates after edit operations as a String.

      Parameters:
      maxIndexWaitTime - the maximum wait time to set in milliseconds
    • setMaxModificationsBeforeCommit

      public void setMaxModificationsBeforeCommit(int maxModificationsBeforeCommit)
      Sets the maximum number of modifications before a commit in the search index is triggered.

      Parameters:
      maxModificationsBeforeCommit - the maximum number of modifications to set
    • setMaxModificationsBeforeCommit

      Sets the maximum number of modifications before a commit in the search index is triggered as a string.

      Parameters:
      value - the maximum number of modifications to set
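      A minimal sketch of what a "maximum number of modifications before commit" setting typically controls, assuming a simple counter. The class and method names are hypothetical and do not reflect the OpenCms indexer internals.

```java
class CommitThresholdSketch {

    private final int maxModificationsBeforeCommit;
    private int pending;
    int commits;

    CommitThresholdSketch(int maxModificationsBeforeCommit) {
        this.maxModificationsBeforeCommit = maxModificationsBeforeCommit;
    }

    /** Records one index modification and commits once the threshold is reached. */
    void modified() {
        pending++;
        if (pending >= maxModificationsBeforeCommit) {
            commit();
        }
    }

    /** Commits any remaining modifications, e.g. at the end of an update run. */
    void finish() {
        if (pending > 0) {
            commit();
        }
    }

    private void commit() {
        commits++;
        pending = 0;
    }
}
```

With a threshold of 3, seven modifications cause two intermediate commits (after the third and sixth), and finish() commits the remaining one. A higher threshold means fewer, larger commits.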
    • setOfflineUpdateFrequency

      public void setOfflineUpdateFrequency(long offlineUpdateFrequency)
      Sets the update frequency of the offline indexer in milliseconds.

      Parameters:
      offlineUpdateFrequency - the update frequency in milliseconds to set
    • setOfflineUpdateFrequency

      public void setOfflineUpdateFrequency(String offlineUpdateFrequency)
      Sets the update frequency of the offline indexer in milliseconds as a String.

      Parameters:
      offlineUpdateFrequency - the update frequency in milliseconds to set
    • setSolrServerConfiguration

      Sets the Solr configuration.

      Parameters:
      config - the Solr configuration
    • setTimeout

      public void setTimeout(long value)
      Sets the timeout after which threads indexing a resource are abandoned.

      Parameters:
      value - the timeout in milliseconds
    • setTimeout

      public void setTimeout(String value)
      Sets the timeout after which threads indexing a resource are abandoned, as a String.

      Parameters:
      value - the timeout in milliseconds
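      Several setters above have a String overload (for example setExtractionCacheMaxAge, setMaxExcerptLength, setMaxIndexWaitTime and setTimeout). A plausible shape for such an overload is to parse the value and delegate to the typed setter. The sketch below assumes that an unparsable or null value leaves the previous setting unchanged and uses a hypothetical default of 60000 ms; the actual OpenCms behaviour and defaults may differ.

```java
class StringSetterSketch {

    // Hypothetical default, for illustration only.
    private long timeout = 60000L;

    void setTimeout(long value) {
        timeout = value;
    }

    /** Parses the String and delegates to the typed setter; keeps the old value if parsing fails (assumed). */
    void setTimeout(String value) {
        if (value == null) {
            return;
        }
        try {
            setTimeout(Long.parseLong(value.trim()));
        } catch (NumberFormatException e) {
            // The sketch silently keeps the previous value; a real implementation might log here.
        }
    }

    long getTimeout() {
        return timeout;
    }
}
```

String overloads like these are convenient when values are read from an XML configuration file, where every setting arrives as text.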
    • shutDown

      public void shutDown()
      Shuts down the search manager.

      This will cause all search indices to be shut down.

    • updateOfflineIndexes

      public void updateOfflineIndexes()
      Updates all offline indexes.

      Can be used to force an index update when it is not convenient to wait until the offline update interval has elapsed.

      Since the offline indexes still need some time to update the new resources, the method waits for at most the configurable maxIndexWaitTime to ensure that updating is finished.

    • updateOfflineIndexes

      public void updateOfflineIndexes(long waitTime)
      Updates all offline indexes.

      Can be used to force an index update when it is not convenient to wait until the offline update interval has elapsed.

      Since the offline index still needs some time to update the new resources even if it runs directly, a wait time of about 2500 milliseconds should be given to make sure the index has finished updating.

      Parameters:
      waitTime - the milliseconds to wait after the offline index update was notified of the changes
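      The bounded wait described for updateOfflineIndexes() can be sketched with a CountDownLatch: the update is started asynchronously and the caller waits at most a given number of milliseconds for it to finish. This is a hypothetical model; the Runnable stands in for the offline index update, and the real method may wait differently.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class OfflineUpdateWaitSketch {

    /**
     * Starts the update on a background thread and waits at most maxWaitMillis for it.
     * Returns true if the update finished in time, false if the wait timed out.
     */
    static boolean updateAndWait(Runnable update, long maxWaitMillis) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                update.run();
            } finally {
                // Signal completion even if the update throws.
                done.countDown();
            }
        });
        worker.setDaemon(true);
        worker.start();
        return done.await(maxWaitMillis, TimeUnit.MILLISECONDS);
    }
}
```

If the wait times out, the update keeps running in the background; the caller only stops waiting for it.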
    • addAdditionallyAffectedResources

      Collects the resources whose indexed document depends on one of the updated resources.

      We take transitive dependencies into account and handle cyclic dependencies correctly as well.

      Parameters:
      adminCms - an OpenCms user context with Admin permissions
      updateResources - the resources to be re-indexed
      Returns:
      the updated list of resources to re-index
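      Collecting transitive dependents while handling cycles amounts to a breadth-first traversal with a visited set. The sketch below is a hypothetical model: resources are plain path strings, and the dependents map (resource to the resources whose indexed document depends on it) is an assumed input that OpenCms would derive from its relations.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AffectedResourcesSketch {

    /**
     * Returns the updated resources plus every resource that depends on them,
     * directly or transitively. The result set doubles as the visited set, so cycles terminate.
     */
    static Set<String> withAffected(List<String> updated, Map<String, List<String>> dependents) {
        Set<String> result = new LinkedHashSet<>(updated);
        Deque<String> toCheck = new ArrayDeque<>(updated);
        while (!toCheck.isEmpty()) {
            String current = toCheck.poll();
            for (String dependent : dependents.getOrDefault(current, List.of())) {
                // add() returns false for resources already collected, so each is checked once.
                if (result.add(dependent)) {
                    toCheck.add(dependent);
                }
            }
        }
        return result;
    }
}
```

For a cycle a → b → c → a starting from a, the traversal collects a, b and c once each and then stops.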
    • addIndexContentRelatedResources

      Collects the resources whose indexed document depends on one of the updated resources.

      Parameters:
      adminCms - an OpenCms user context with Admin permissions
      updateResources - the resources to be re-indexed
      updateResourcesToCheck - the resources to check additionally affected resources for, subset of updateResources
      Returns:
      the list of resources that additionally need to be re-indexed
    • cleanExtractionCache

      protected void cleanExtractionCache()
      Cleans up the extraction result cache.
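      Cleaning the extraction cache presumably means dropping results older than the configured maximum age, which getExtractionCacheMaxAge() returns in hours. The sketch below is hypothetical: the cache is modelled as a map from a key to the storage time in milliseconds, and the current time is passed in to keep the example deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ExtractionCacheSketch {

    // Hypothetical cache: key -> time (ms) at which the extraction result was stored.
    final Map<String, Long> storedAt = new ConcurrentHashMap<>();

    /** Removes entries older than maxAgeHours, the unit used by getExtractionCacheMaxAge(). */
    void clean(float maxAgeHours, long nowMillis) {
        long maxAgeMillis = (long) (maxAgeHours * 60 * 60 * 1000);
        storedAt.values().removeIf(stored -> (nowMillis - stored) > maxAgeMillis);
    }
}
```

Because the maximum age is a float, fractional values such as 0.5 (30 minutes) are possible.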

    • findRelatedContainerPages

      Collects the container pages related to the resources that have been published.

      Parameters:
      adminCms - an OpenCms user context with Admin permissions
      updateResources - the resources to be re-indexed
      updateResourcesToCheck - the resources to check additionally affected resources for, subset of updateResources
      Returns:
      the list of resources that additionally need to be re-indexed
    • getDocumentTypes

      Returns the set of names of all configured document types.

      Returns:
      the set of names of all configured document types
    • getOfflineIndexProject

      Returns the offline project used for offline indexing.

      Returns:
      the offline project if available
    • getThreadManager

      Returns a new thread manager for the indexing threads.

      Returns:
      a new thread manager for the indexing threads
    • initAvailableDocumentTypes

      protected void initAvailableDocumentTypes()
      Initializes the available Cms resource types to be indexed.

      A map stores document factories keyed by a string representing a colon separated list of Cms resource types and/or mimetypes.

      The keys of this map are used to trigger a document factory to convert a Cms resource into a Lucene index document.

      A document factory is a class implementing the interface I_CmsDocumentFactory.

    • initIndexSources

      protected void initIndexSources()
      Initializes the index sources.
    • initSearchIndexes

      protected void initSearchIndexes()
      Initializes the configured search indexes.

      This also initializes the list of Cms resource types to be indexed by an index source.

    • shouldUpdateAtAll

      protected boolean shouldUpdateAtAll(I_CmsSearchIndex index)
      Checks whether the index should be rebuilt/updated at all by the search manager.
      Parameters:
      index - the index to check.
      Returns:
      a flag, indicating if the index should be rebuilt/updated at all.
    • updateAllIndexes

      protected void updateAllIndexes(CmsObject adminCms, CmsUUID publishHistoryId, I_CmsReport report)
      Incrementally updates all indexes that have their rebuild mode set to "auto" after resources have been published.

      Parameters:
      adminCms - an OpenCms user context with Admin permissions
      publishHistoryId - the history ID of the published project
      report - the report to write the output to
    • updateAllIndexes

      protected void updateAllIndexes(CmsObject adminCms, List<CmsPublishedResource> updateResources, I_CmsReport report)
      Incrementally updates all indexes that have their rebuild mode set to "auto".

      Parameters:
      adminCms - an OpenCms user context with Admin permissions
      updateResources - the resources to update
      report - the report to write the output to
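      Selecting the indexes that an incremental update touches can be sketched as a filter on the rebuild mode. IndexInfo is a hypothetical stand-in for I_CmsSearchIndex that carries only a name and a rebuild mode string; further checks such as shouldUpdateAtAll are not modelled.

```java
import java.util.ArrayList;
import java.util.List;

class AutoRebuildFilterSketch {

    /** Hypothetical stand-in for an index: only a name and a rebuild mode. */
    static final class IndexInfo {
        final String name;
        final String rebuildMode;

        IndexInfo(String name, String rebuildMode) {
            this.name = name;
            this.rebuildMode = rebuildMode;
        }
    }

    /** Returns the names of the indexes whose rebuild mode is "auto". */
    static List<String> indexesToUpdate(List<IndexInfo> indexes) {
        List<String> names = new ArrayList<>();
        for (IndexInfo index : indexes) {
            // Only indexes configured with rebuild mode "auto" are updated after publishing.
            if ("auto".equals(index.rebuildMode)) {
                names.add(index.name);
            }
        }
        return names;
    }
}
```

Comparing with "auto".equals(...) also tolerates indexes whose rebuild mode is null.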
    • updateIndex

      protected void updateIndex(I_CmsSearchIndex index, I_CmsReport report, List<CmsPublishedResource> resourcesToIndex) throws CmsException
      Updates (if required creates) the index with the given name.

      If the optional List of CmsPublishedResource instances is provided, the index will be incrementally updated for these resources only. If this List is null or empty, the index will be fully rebuilt.

      Parameters:
      index - the index to update or rebuild
      report - the report to write output messages to
      resourcesToIndex - an (optional) list of CmsPublishedResource objects to update in the index
      Throws:
      CmsException - if something goes wrong
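      The choice between a full rebuild and an incremental update described above reduces to a null-or-empty check. The sketch below is hypothetical; the two modes presumably correspond to updateIndexCompletely and updateIndexIncremental, and resources are plain path strings instead of CmsPublishedResource objects.

```java
import java.util.List;

class UpdateIndexDispatchSketch {

    enum Mode { FULL_REBUILD, INCREMENTAL }

    /** A null or empty resource list means a full rebuild; otherwise only the listed resources are updated. */
    static Mode chooseMode(List<String> resourcesToIndex) {
        return ((resourcesToIndex == null) || resourcesToIndex.isEmpty()) ? Mode.FULL_REBUILD : Mode.INCREMENTAL;
    }
}
```

Treating an empty list like null means that passing "nothing to update" triggers a complete rebuild rather than a no-op, which callers should keep in mind.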
    • updateIndexCompletely

      protected void updateIndexCompletely(CmsObject cms, I_CmsSearchIndex index, I_CmsReport report) throws CmsIndexException
      Updates all OpenCms documents that are indexed.
      Parameters:
      cms - the OpenCms user context to use for accessing the VFS
      index - the index to update
      report - the report to write output messages to
      Throws:
      CmsIndexException - thrown if indexing fails for some reason
    • updateIndexIncremental

      protected void updateIndexIncremental(CmsObject cms, I_CmsSearchIndex index, I_CmsReport report, List<CmsPublishedResource> resourcesToIndex) throws CmsException
      Incrementally updates the given index.

      Parameters:
      cms - the OpenCms user context to use for accessing the VFS
      index - the index to update
      report - the report to write output messages to
      resourcesToIndex - a list of CmsPublishedResource objects to update in the index
      Throws:
      CmsException - if something goes wrong
    • updateIndexOffline

      protected void updateIndexOffline(I_CmsReport report, List<CmsPublishedResource> resourcesToIndex)
      Updates the offline search indexes for the given list of resources.

      Parameters:
      report - the report to write the index information to
      resourcesToIndex - the list of CmsPublishedResource objects to index