Class CmsSearchManager

    • Constructor Detail

      • CmsSearchManager

        public CmsSearchManager()
        Default constructor when called as cron job.

    • Method Detail

      • getAnalyzer

        public static org.apache.lucene.analysis.Analyzer getAnalyzer​(java.lang.String className)
                                                               throws java.lang.Exception
        Returns an analyzer for the given class name.

        Parameters:
        className - the class name of the analyzer
        Returns:
        the appropriate lucene analyzer
        Throws:
        java.lang.Exception - if something goes wrong
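
        A lookup by class name like this one is commonly built on reflection. The following is a minimal sketch of that technique under stated assumptions, not the OpenCms implementation: the helper name instantiate is hypothetical, and java.util.ArrayList stands in for an analyzer class so that the example runs without Lucene on the classpath.

```java
import java.util.List;

class ClassNameLoaderSketch {

    /**
     * Loads the class with the given name and creates an instance through its
     * public no-argument constructor, checking that it has the expected type.
     * (Hypothetical helper, for illustration only.)
     */
    static <T> T instantiate(String className, Class<T> expectedType) throws Exception {
        Class<?> clazz = Class.forName(className);
        // asSubclass throws ClassCastException if the class has the wrong type
        return clazz.asSubclass(expectedType).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // java.util.ArrayList stands in for an analyzer class here
        List<?> list = instantiate("java.util.ArrayList", List.class);
        System.out.println(list.getClass().getName());
    }
}
```

        Because several distinct failures are possible here (unknown class, wrong type, missing constructor), a broad throws clause such as the one declared by getAnalyzer(String) is a common choice for this kind of method.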
      • getIndexSolr

        public static final CmsSolrIndex getIndexSolr​(CmsObject cms,
                                                      java.util.Map<java.lang.String,​java.lang.String[]> params)
        Returns the Solr index named in the given parameters. The parameter map must contain a key/value pair that identifies an existing Solr index; otherwise null is returned.

        Parameters:
        cms - the current context
        params - the parameter map
        Returns:
        the best matching Solr index
      • isLuceneIndex

        public static boolean isLuceneIndex​(java.lang.String indexName)
        Returns true if the index for the given name is a Lucene index, false otherwise.

        Parameters:
        indexName - the name of the index to check
        Returns:
        true if the index for the given name is a Lucene index
      • addSearchIndex

        public void addSearchIndex​(I_CmsSearchIndex searchIndex)
        Adds a search index to the configuration.

        Parameters:
        searchIndex - the search index to add
      • addSearchIndexSource

        public void addSearchIndexSource​(CmsSearchIndexSource searchIndexSource)
        Adds a search index source configuration.

        Parameters:
        searchIndexSource - a search index source configuration
      • getAnalyzer

        public org.apache.lucene.analysis.Analyzer getAnalyzer​(java.util.Locale locale)
                                                        throws CmsSearchException
        Returns an analyzer for the given language.

        The analyzer is selected according to the analyzer configuration.

        Parameters:
        locale - the locale to get the analyzer for
        Returns:
        the appropriate lucene analyzer
        Throws:
        CmsSearchException - if something goes wrong
      • getCmsSearchAnalyzer

        public CmsSearchAnalyzer getCmsSearchAnalyzer​(java.util.Locale locale)
        Returns the search analyzer for the given locale.

        Parameters:
        locale - the locale to get the analyzer for
        Returns:
        the search analyzer for the given locale
      • getDirectory

        public java.lang.String getDirectory()
        Returns the name of the directory below WEB-INF/ where the search indexes are stored.

        Returns:
        the name of the directory below WEB-INF/ where the search indexes are stored
      • getDirectorySolr

        public java.lang.String getDirectorySolr()
        Returns the configured Solr home directory, or null if not set.

        Returns:
        the Solr home directory
      • getDocumentFactoryForName

        public I_CmsDocumentFactory getDocumentFactoryForName​(java.lang.String docTypeName)
        Returns the document factory configured under the provided name.
        Parameters:
        docTypeName - the name of the document type.
        Returns:
        the factory for the provided name.
      • getDocumentTypeConfig

        public CmsSearchDocumentType getDocumentTypeConfig​(java.lang.String name)
        Returns a document type config.

        Parameters:
        name - the name of the document type config
        Returns:
        the document type config.
      • getDocumentTypeConfigs

        public java.util.List<CmsSearchDocumentType> getDocumentTypeConfigs()
        Returns an unmodifiable view (read-only) of the DocumentTypeConfigs Map.

        Returns:
        an unmodifiable view (read-only) of the DocumentTypeConfigs Map
      • getDocumentTypeKeys

        public java.util.List<java.lang.String> getDocumentTypeKeys​(java.lang.String resourceType,
                                                                    java.lang.String mimeType)
        Returns the document type keys used to specify the correct document factory. One resource typically has more than one key. The document factories are matched in the provided order and the first matching factory is used. The keys for type name "typename" and mimetype "mimetype" would be a subset of:
        • typename_mimetype
        • typename
        • if typename is a sub-type of containerpage
          • containerpage_mimetype
          • containerpage
        • if typename is a sub-type of xmlcontent
          • xmlcontent_mimetype
          • xmlcontent
        • __unconfigured___mimetype
        • __unconfigured__
        • __all___mimetype
        • __all__
        Note that all keys except the "__all__" keys are only added as long as there is globally no matching factory for the key. In particular, a factory matching "typename" will never be used if a factory for "typename_mimetype" exists, even if that factory is not configured for the index source in use. As a consequence, the content may not be indexed at all in such cases.
        Parameters:
        resourceType - the resource type to generate the list of document type keys for.
        mimeType - the mime type to generate the list of document type keys for.
        Returns:
        the document type keys.
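
        The key order described above can be sketched as follows. This is an illustration only, not the OpenCms code: the class name, the method candidateKeys and its two boolean flags are assumptions, and the sketch always returns the full candidate list, without the rule that drops keys once a matching factory exists globally.

```java
import java.util.ArrayList;
import java.util.List;

class DocumentTypeKeysSketch {

    /**
     * Builds the full candidate key list in lookup order.
     * The two boolean flags are hypothetical inputs; how sub-types are
     * detected is not covered by this sketch.
     */
    static List<String> candidateKeys(
        String typeName,
        String mimeType,
        boolean isContainerPageSubType,
        boolean isXmlContentSubType) {

        List<String> keys = new ArrayList<>();
        addPair(keys, typeName, mimeType);
        if (isContainerPageSubType) {
            addPair(keys, "containerpage", mimeType);
        }
        if (isXmlContentSubType) {
            addPair(keys, "xmlcontent", mimeType);
        }
        addPair(keys, "__unconfigured__", mimeType);
        addPair(keys, "__all__", mimeType);
        return keys;
    }

    /** Adds the key "name_mimetype" followed by the key "name". */
    private static void addPair(List<String> keys, String name, String mimeType) {
        keys.add(name + "_" + mimeType);
        keys.add(name);
    }

    public static void main(String[] args) {
        // "article" is a made-up type name used only for this example
        for (String key : candidateKeys("article", "text/html", false, true)) {
            System.out.println(key);
        }
    }
}
```

        The more specific keys come first, so a factory registered for a type and mime type combination is found before a factory registered for the type alone.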
      • getDocumentTypeMapForTypeNames

        public java.util.Map<java.lang.String,​I_CmsDocumentFactory> getDocumentTypeMapForTypeNames​(java.util.List<java.lang.String> documentTypeNames)
        Returns the map from document type keys to document factories with all entries for the provided document type names.
        Parameters:
        documentTypeNames - list of document type names to generate the map for.
        Returns:
        the map from document type keys to document factories.
      • getExtractionCacheMaxAge

        public float getExtractionCacheMaxAge()
        Returns the maximum age a text extraction result is kept in the cache (in hours).

        Returns:
        the maximum age a text extraction result is kept in the cache (in hours)
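
        Since the maximum age is a float number of hours, fractional values such as 0.5 are possible. A minimal sketch of how such a value could be turned into an expiry check follows; the class and method names are hypothetical, and whether OpenCms compares ages exactly this way is an assumption.

```java
class ExtractionCacheAgeSketch {

    /** Converts a maximum age in (possibly fractional) hours to milliseconds. */
    static long maxAgeMillis(float maxAgeHours) {
        // multiply as double to avoid float rounding on large values
        return (long)(maxAgeHours * 3_600_000.0);
    }

    /** Returns true if an entry created at createdMillis is older than the maximum age. */
    static boolean isExpired(long createdMillis, long nowMillis, float maxAgeHours) {
        return (nowMillis - createdMillis) > maxAgeMillis(maxAgeHours);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long twoHoursAgo = now - 2L * 3_600_000L;
        System.out.println(isExpired(twoHoursAgo, now, 1.5f));
        System.out.println(isExpired(twoHoursAgo, now, 24.0f));
    }
}
```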
      • getFieldConfiguration

        public I_CmsSearchFieldConfiguration getFieldConfiguration​(java.lang.String name)
        Returns the search field configuration with the given name.

        In case no configuration is available with the given name, null is returned.

        Parameters:
        name - the name to get the search field configuration for
        Returns:
        the search field configuration with the given name
      • getIndex

        public I_CmsSearchIndex getIndex​(java.lang.String indexName)
        Returns the Lucene search index configured with the given name.

        The index must exist, otherwise null is returned.

        Parameters:
        indexName - the name of the requested search index
        Returns:
        the Lucene search index configured with the given name
      • getIndexLockMaxWaitSeconds

        public int getIndexLockMaxWaitSeconds()
        Returns the seconds to wait for an index lock during an update operation.

        Returns:
        the seconds to wait for an index lock during an update operation
      • getIndexNames

        public java.util.List<java.lang.String> getIndexNames()
        Returns the names of all configured indexes.

        Returns:
        list of names
      • getIndexSolr

        public CmsSolrIndex getIndexSolr​(java.lang.String indexName)
        Returns the Solr index configured with the given name.

        The index must exist, otherwise null is returned.

        Parameters:
        indexName - the name of the requested Solr index
        Returns:
        the Solr index configured with the given name
      • getIndexSource

        public CmsSearchIndexSource getIndexSource​(java.lang.String sourceName)
        Returns a search index source for a specified source name.

        Parameters:
        sourceName - the name of the index source
        Returns:
        a search index source
      • getMaxExcerptLength

        public int getMaxExcerptLength()
        Returns the max. excerpt length.

        Returns:
        the max excerpt length
      • getMaxIndexWaitTime

        public long getMaxIndexWaitTime()
        Returns the maximal time to wait for re-indexing after a content is edited (in milliseconds).

        Returns:
        the maximal time to wait for re-indexing after a content is edited (in milliseconds)
      • getMaxModificationsBeforeCommit

        public int getMaxModificationsBeforeCommit()
        Returns the maximum number of modifications before a commit in the search index is triggered.

        Returns:
        the maximum number of modifications before a commit in the search index is triggered
      • getOfflineUpdateFrequency

        public long getOfflineUpdateFrequency()
        Returns the update frequency of the offline indexer in milliseconds.

        Returns:
        the update frequency of the offline indexer in milliseconds
      • getSearchIndexSources

        public java.util.Map<java.lang.String,​CmsSearchIndexSource> getSearchIndexSources()
        Returns an unmodifiable view (read-only) of the SearchIndexSources Map.

        Returns:
        an unmodifiable view (read-only) of the SearchIndexSources Map
      • getTimeout

        public long getTimeout()
        Returns the timeout to abandon threads indexing a resource.

        Returns:
        the timeout to abandon threads indexing a resource
      • initializeIndexes

        public void initializeIndexes()
        Initializes all configured document types, index sources and search indexes.

        This method needs to be called after a change in the index configuration has been made.

      • initOfflineIndexes

        public void initOfflineIndexes()
        Initializes the offline index handler; required after an offline index has been added.

      • initSpellcheckIndex

        public void initSpellcheckIndex​(CmsObject adminCms)
        Initializes the spell check index.

        Parameters:
        adminCms - the ROOT_ADMIN cms context
      • isOfflineIndexingPaused

        public boolean isOfflineIndexingPaused()
        Returns whether offline indexing is paused.

        Returns:
        true if the offline indexing is paused
      • launch

        public java.lang.String launch​(CmsObject cms,
                                       java.util.Map<java.lang.String,​java.lang.String> parameters)
                                throws java.lang.Exception
        Updates the indexes as a scheduled job.

        Specified by:
        launch in interface I_CmsScheduledJob
        Parameters:
        cms - the OpenCms user context to use when reading resources from the VFS
        parameters - the parameters for the scheduled job
        Returns:
        the String to write in the scheduler log
        Throws:
        java.lang.Exception - if something goes wrong
        See Also:
        I_CmsScheduledJob.launch(CmsObject, Map)
      • pauseOfflineIndexing

        public CmsUUID pauseOfflineIndexing()
        Pauses the offline indexing and returns a pause request id that has to be used for resuming offline indexing again.

        May take some time, because the indexes are updated first.

        Returns:
        the pause request id. The id has to be given to the resumeOfflineIndexing(CmsUUID) method to resume offline indexing.
      • rebuildAllIndexes

        public void rebuildAllIndexes​(I_CmsReport report)
                               throws CmsException
        Rebuilds (or, if required, creates) all configured indexes.

        Parameters:
        report - the report object to write messages (or null)
        Throws:
        CmsException - if something goes wrong
      • rebuildIndex

        public void rebuildIndex​(java.lang.String indexName,
                                 I_CmsReport report)
                          throws CmsException
        Rebuilds (or, if required, creates) the index with the given name.

        Parameters:
        indexName - the name of the index to rebuild
        report - the report object to write messages (or null)
        Throws:
        CmsException - if something goes wrong
      • rebuildIndexes

        public void rebuildIndexes​(java.util.List<java.lang.String> indexNames,
                                   I_CmsReport report)
                            throws CmsException
        Rebuilds (or, if required, creates) the indexes with the given names.

        Parameters:
        indexNames - the names (String) of the indexes to rebuild
        report - the report object to write messages (or null)
        Throws:
        CmsException - if something goes wrong
      • removeSearchFieldConfigurationField

        public boolean removeSearchFieldConfigurationField​(I_CmsSearchFieldConfiguration fieldConfiguration,
                                                           CmsSearchField field)
        Removes a search field from the field configuration.

        Parameters:
        fieldConfiguration - the field configuration
        field - field to remove from the field configuration
        Returns:
        true if remove was successful, false if preconditions for removal are ok but the given field was unknown.
      • removeSearchIndex

        public void removeSearchIndex​(I_CmsSearchIndex searchIndex)
        Removes a search index from the configuration.

        Parameters:
        searchIndex - the search index to remove
      • removeSearchIndexes

        public void removeSearchIndexes​(java.util.List<java.lang.String> indexNames)
        Removes all indexes included in the given list (which must contain the names of the indexes to remove).

        Parameters:
        indexNames - the names of the indexes to remove
      • removeSearchIndexSource

        public boolean removeSearchIndexSource​(CmsSearchIndexSource indexsource)
                                        throws CmsIllegalStateException
        Removes this indexsource from the OpenCms configuration (if it is not used any more).

        Parameters:
        indexsource - the indexsource to remove from the configuration
        Returns:
        true if remove was successful, false if preconditions for removal are ok but the given indexsource was unknown to the manager.
        Throws:
        CmsIllegalStateException - if the given indexsource is still used by at least one I_CmsSearchIndex.
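
        The three possible outcomes (removed, unknown, still in use) can be illustrated with a small stand-alone registry. This is a sketch of the contract only: the class and its methods are hypothetical, index sources are reduced to plain names, and IllegalStateException stands in for CmsIllegalStateException.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class IndexSourceRegistrySketch {

    /** Names of all registered index sources. */
    private final Set<String> sources = new HashSet<>();

    /** For each index name, the names of the sources it uses. */
    private final Map<String, Set<String>> sourcesByIndex = new HashMap<>();

    void addSource(String sourceName) {
        sources.add(sourceName);
    }

    void useSource(String indexName, String sourceName) {
        sourcesByIndex.computeIfAbsent(indexName, k -> new HashSet<>()).add(sourceName);
    }

    /**
     * Returns true if the source was removed and false if it was unknown;
     * throws IllegalStateException if an index still uses the source.
     */
    boolean removeSource(String sourceName) {
        for (Map.Entry<String, Set<String>> entry : sourcesByIndex.entrySet()) {
            if (entry.getValue().contains(sourceName)) {
                throw new IllegalStateException(
                    "Source '" + sourceName + "' is still used by index '" + entry.getKey() + "'");
            }
        }
        return sources.remove(sourceName);
    }

    public static void main(String[] args) {
        IndexSourceRegistrySketch registry = new IndexSourceRegistrySketch();
        registry.addSource("source1");
        System.out.println(registry.removeSource("source1"));
        System.out.println(registry.removeSource("source1"));
    }
}
```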
      • resumeOfflineIndexing

        public void resumeOfflineIndexing​(CmsUUID pauseId)
        Resumes offline indexing if it was paused and no pause for another pauseId is still present.

        Parameters:
        pauseId - the id of the pause request, which now allows for resuming.
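
        The pause id scheme lets several callers pause indexing independently: indexing only resumes once every outstanding pause has been released. A minimal stand-alone sketch of that idea follows. It is not the OpenCms implementation; the class is hypothetical and java.util.UUID stands in for CmsUUID.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

class PauseTokenSketch {

    /** Ids of all pause requests that have not been resumed yet. */
    private final Set<UUID> activePauses = new HashSet<>();

    /** Registers a new pause request and returns its id. */
    synchronized UUID pause() {
        UUID id = UUID.randomUUID();
        activePauses.add(id);
        return id;
    }

    /** Releases the pause request with the given id; unknown ids are ignored. */
    synchronized void resume(UUID id) {
        activePauses.remove(id);
    }

    /** Indexing counts as paused while at least one pause request is outstanding. */
    synchronized boolean isPaused() {
        return !activePauses.isEmpty();
    }

    public static void main(String[] args) {
        PauseTokenSketch tracker = new PauseTokenSketch();
        UUID first = tracker.pause();
        UUID second = tracker.pause();
        tracker.resume(first);
        System.out.println(tracker.isPaused());
        tracker.resume(second);
        System.out.println(tracker.isPaused());
    }
}
```

        When using the real methods, it seems prudent to call resumeOfflineIndexing(CmsUUID) in a finally block, so that an exception between pausing and resuming does not leave offline indexing paused.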
      • setDirectory

        public void setDirectory​(java.lang.String value)
        Sets the name of the directory below WEB-INF/ where the search indexes are stored.

        Parameters:
        value - the name of the directory below WEB-INF/ where the search indexes are stored
      • setExtractionCacheMaxAge

        public void setExtractionCacheMaxAge​(float extractionCacheMaxAge)
        Sets the maximum age a text extraction result is kept in the cache (in hours).

        Parameters:
        extractionCacheMaxAge - the maximum age for a text extraction result to set
      • setExtractionCacheMaxAge

        public void setExtractionCacheMaxAge​(java.lang.String extractionCacheMaxAge)
        Sets the maximum age a text extraction result is kept in the cache (in hours) as a String.

        Parameters:
        extractionCacheMaxAge - the maximum age for a text extraction result to set
      • setForceunlock

        public void setForceunlock​(java.lang.String value)
        Sets the unlock mode during indexing.

        Parameters:
        value - the value
      • setHighlighter

        public void setHighlighter​(java.lang.String highlighter)
        Sets the highlighter.

        A highlighter is a class implementing org.opencms.search.documents.I_TermHighlighter.

        Parameters:
        highlighter - the package/class name of the highlighter
      • setIndexLockMaxWaitSeconds

        public void setIndexLockMaxWaitSeconds​(int value)
        Sets the seconds to wait for an index lock during an update operation.

        Parameters:
        value - the seconds to wait for an index lock during an update operation
      • setMaxExcerptLength

        public void setMaxExcerptLength​(int maxExcerptLength)
        Sets the max. excerpt length.

        Parameters:
        maxExcerptLength - the max. excerpt length to set
      • setMaxExcerptLength

        public void setMaxExcerptLength​(java.lang.String maxExcerptLength)
        Sets the max. excerpt length as a String.

        Parameters:
        maxExcerptLength - the max. excerpt length to set
      • setMaxIndexWaitTime

        public void setMaxIndexWaitTime​(long maxIndexWaitTime)
        Sets the maximal wait time for offline index updates after edit operations.

        Parameters:
        maxIndexWaitTime - the maximal wait time to set in milliseconds
      • setMaxIndexWaitTime

        public void setMaxIndexWaitTime​(java.lang.String maxIndexWaitTime)
        Sets the maximal wait time for offline index updates after edit operations.

        Parameters:
        maxIndexWaitTime - the maximal wait time to set in milliseconds
      • setMaxModificationsBeforeCommit

        public void setMaxModificationsBeforeCommit​(int maxModificationsBeforeCommit)
        Sets the maximum number of modifications before a commit in the search index is triggered.

        Parameters:
        maxModificationsBeforeCommit - the maximum number of modifications to set
      • setMaxModificationsBeforeCommit

        public void setMaxModificationsBeforeCommit​(java.lang.String value)
        Sets the maximum number of modifications before a commit in the search index is triggered as a string.

        Parameters:
        value - the maximum number of modifications to set
      • setOfflineUpdateFrequency

        public void setOfflineUpdateFrequency​(long offlineUpdateFrequency)
        Sets the update frequency of the offline indexer in milliseconds.

        Parameters:
        offlineUpdateFrequency - the update frequency in milliseconds to set
      • setOfflineUpdateFrequency

        public void setOfflineUpdateFrequency​(java.lang.String offlineUpdateFrequency)
        Sets the update frequency of the offline indexer in milliseconds.

        Parameters:
        offlineUpdateFrequency - the update frequency in milliseconds to set
      • setTimeout

        public void setTimeout​(long value)
        Sets the timeout to abandon threads indexing a resource.

        Parameters:
        value - the timeout in milliseconds
      • setTimeout

        public void setTimeout​(java.lang.String value)
        Sets the timeout to abandon threads indexing a resource as a String.

        Parameters:
        value - the timeout in milliseconds
      • shutDown

        public void shutDown()
        Shuts down the search manager.

        This will cause all search indices to be shut down.

      • updateOfflineIndexes

        public void updateOfflineIndexes()
        Updates all offline indexes.

        Can be used to force an index update when it's not convenient to wait until the offline update interval has elapsed.

        Since the offline indexes still need some time to update the new resources, the method waits for at most the configurable maxIndexWaitTime to ensure that updating is finished.

        See Also:
        updateOfflineIndexes(long)
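
        Waiting "for at most" a configured time is a bounded wait. The following sketch shows one standard way to express that in Java with a CountDownLatch. It is an assumption about the general technique, not a description of how OpenCms implements the wait; the class and method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class BoundedWaitSketch {

    /**
     * Runs the update on a background thread and waits for it for at most
     * maxWaitMillis. Returns true if the update finished within that time.
     */
    static boolean runAndWait(Runnable update, long maxWaitMillis) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                update.run();
            } finally {
                // signal completion even if the update throws
                done.countDown();
            }
        });
        worker.setDaemon(true);
        worker.start();
        return done.await(maxWaitMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        boolean finished = runAndWait(() -> System.out.println("updating index"), 2500L);
        System.out.println("finished in time: " + finished);
    }
}
```

        A false result only means the time limit was reached; the update itself keeps running in the background.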
      • updateOfflineIndexes

        public void updateOfflineIndexes​(long waitTime)
        Updates all offline indexes.

        Can be used to force an index update when it's not convenient to wait until the offline update interval has elapsed.

        Since the offline index will still need some time to update the new resources even if it runs directly, a wait time of 2500 or so should be given in order to make sure the index finished updating.

        Parameters:
        waitTime - milliseconds to wait after the offline update index was notified of the changes
      • addAdditionallyAffectedResources

        protected java.util.List<CmsPublishedResource> addAdditionallyAffectedResources​(CmsObject adminCms,
                                                                                        java.util.List<CmsPublishedResource> updateResources)
        Collects the resources whose indexed document depends on one of the updated resources.

        We take transitive dependencies into account and handle cyclic dependencies correctly as well.

        Parameters:
        adminCms - an OpenCms user context with Admin permissions
        updateResources - the resources to be re-indexed
        Returns:
        the updated list of resources to re-index
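
        Following transitive dependencies while tolerating cycles is typically done with a worklist and a visited set. The sketch below illustrates that approach on plain strings. It is not the OpenCms code: the class, the method affected and the shape of the dependents map are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DependencyClosureSketch {

    /**
     * Returns the updated resources plus everything that depends on them,
     * directly or transitively.
     *
     * @param dependents maps a resource to the resources whose indexed document depends on it
     * @param updated the resources that were updated
     */
    static Set<String> affected(Map<String, List<String>> dependents, Collection<String> updated) {
        // the result set doubles as the "visited" set, which makes cycles harmless
        Set<String> result = new LinkedHashSet<>(updated);
        Deque<String> queue = new ArrayDeque<>(updated);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String dependent : dependents.getOrDefault(current, Collections.emptyList())) {
                // add returns false for resources seen before, so each one is queued once
                if (result.add(dependent)) {
                    queue.add(dependent);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependents = Map.of(
            "a", List.of("b"),
            "b", List.of("a"));
        System.out.println(affected(dependents, List.of("a")));
    }
}
```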
      • addIndexContentRelatedResources

        protected java.util.Collection<CmsPublishedResource> addIndexContentRelatedResources​(CmsObject adminCms,
                                                                                             java.util.Collection<CmsPublishedResource> updateResources,
                                                                                             java.util.Collection<CmsPublishedResource> updateResourcesToCheck)
        Collects the resources whose indexed document depends on one of the updated resources.

        Parameters:
        adminCms - an OpenCms user context with Admin permissions
        updateResources - the resources to be re-indexed
        updateResourcesToCheck - the resources to check additionally affected resources for, subset of updateResources
        Returns:
        the list of resources that need to be additionally re-indexed
      • cleanExtractionCache

        protected void cleanExtractionCache()
        Cleans up the extraction result cache.

      • findRelatedContainerPages

        protected java.util.Collection<CmsPublishedResource> findRelatedContainerPages​(CmsObject adminCms,
                                                                                       java.util.Collection<CmsPublishedResource> updateResources,
                                                                                       java.util.Collection<CmsPublishedResource> updateResourcesToCheck)
        Collects the container pages related to the resources that have been published.

        Parameters:
        adminCms - an OpenCms user context with Admin permissions
        updateResources - the resources to be re-indexed
        updateResourcesToCheck - the resources to check additionally affected resources for, subset of updateResources
        Returns:
        the list of resources that need to be additionally re-indexed
      • getDocumentTypes

        protected java.util.List<java.lang.String> getDocumentTypes()
        Returns the set of names of all configured document types.

        Returns:
        the set of names of all configured document types
      • getOfflineIndexProject

        protected CmsProject getOfflineIndexProject()
        Returns the offline project used for offline indexing.

        Returns:
        the offline project if available
      • initAvailableDocumentTypes

        protected void initAvailableDocumentTypes()
        Initializes the available Cms resource types to be indexed.

        A map stores document factories keyed by a string representing a colon separated list of Cms resource types and/or mimetypes.

        The keys of this map are used to trigger a document factory to convert a Cms resource into a Lucene index document.

        A document factory is a class implementing the interface I_CmsDocumentFactory.

      • initIndexSources

        protected void initIndexSources()
        Initializes the index sources.
      • initSearchIndexes

        protected void initSearchIndexes()
        Initializes the configured search indexes.

        This also initializes the list of Cms resource types to be indexed by an index source.

      • shouldUpdateAtAll

        protected boolean shouldUpdateAtAll​(I_CmsSearchIndex index)
        Checks whether the index should be rebuilt/updated at all by the search manager.
        Parameters:
        index - the index to check.
        Returns:
        a flag, indicating if the index should be rebuilt/updated at all.
      • updateAllIndexes

        protected void updateAllIndexes​(CmsObject adminCms,
                                        CmsUUID publishHistoryId,
                                        I_CmsReport report)
        Incrementally updates all indexes that have their rebuild mode set to "auto" after resources have been published.

        Parameters:
        adminCms - an OpenCms user context with Admin permissions
        publishHistoryId - the history ID of the published project
        report - the report to write the output to
      • updateAllIndexes

        protected void updateAllIndexes​(CmsObject adminCms,
                                        java.util.List<CmsPublishedResource> updateResources,
                                        I_CmsReport report)
        Incrementally updates all indexes that have their rebuild mode set to "auto".

        Parameters:
        adminCms - an OpenCms user context with Admin permissions
        updateResources - the resources to update
        report - the report to write the output to
      • updateIndex

        protected void updateIndex​(I_CmsSearchIndex index,
                                   I_CmsReport report,
                                   java.util.List<CmsPublishedResource> resourcesToIndex)
                            throws CmsException
        Updates (or, if required, creates) the index with the given name.

        If the optional List of CmsPublishedResource instances is provided, the index will be incrementally updated for these resources only. If this List is null or empty, the index will be fully rebuilt.

        Parameters:
        index - the index to update or rebuild
        report - the report to write output messages to
        resourcesToIndex - an (optional) list of CmsPublishedResource objects to update in the index
        Throws:
        CmsException - if something goes wrong
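
        The choice between an incremental update and a full rebuild described above comes down to one check on the resource list. A minimal sketch follows; the class, the enum and the method modeFor are hypothetical names used only for illustration.

```java
import java.util.Collections;
import java.util.List;

class UpdateModeSketch {

    enum Mode { FULL_REBUILD, INCREMENTAL }

    /** A null or empty list means a full rebuild; otherwise only the listed resources are updated. */
    static Mode modeFor(List<?> resourcesToIndex) {
        if ((resourcesToIndex == null) || resourcesToIndex.isEmpty()) {
            return Mode.FULL_REBUILD;
        }
        return Mode.INCREMENTAL;
    }

    public static void main(String[] args) {
        System.out.println(modeFor(null));
        System.out.println(modeFor(Collections.singletonList("/sites/default/index.html")));
    }
}
```

        One consequence worth keeping in mind: a caller that passes an empty list, for example after filtering, gets a full rebuild rather than a no-op.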