Class CmsSearchIndex

java.lang.Object
org.opencms.search.A_CmsSearchIndex
org.opencms.search.CmsSearchIndex
All Implemented Interfaces:
Serializable, I_CmsConfigurationParameterHandler, I_CmsSearchIndex
Direct Known Subclasses:
CmsSolrIndex

public class CmsSearchIndex extends A_CmsSearchIndex
Abstract search index implementation.

  • Field Details

  • Constructor Details

  • Method Details

    • getDateRangeSpan

      public static List<String> getDateRangeSpan(long startDate, long endDate)
      Generates a list of date terms for the optimized date range search with "daily" granularity level.

      How this works:

      • For each document, terms are added for the year, the month and the day on which the document was modified or created. For example, if a document was modified on February 02, 2009, the following terms are stored for this document: "20090202", "200902" and "2009".
      • When a date range search is done, all possible matches for the provided range are created as search terms and matched against the document terms.
      • Consider the following use case: You want to find out if a resource has been changed in the time between November 29, 2007 and March 01, 2009. One term to match is simply "2008", because if a document was modified in 2008, then it is clearly in the date range. Other terms are "200712", "200901" and "200902", because all documents modified in these months are also certain matches. Finally, we need to add terms for "20071129", "20071130" and "20090301" to match the days in the starting and final month.
      Parameters:
      startDate - start date of the range to search in
      endDate - end date of the range to search in
      Returns:
      a list of date terms for the optimized date range search
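The term generation described above can be sketched in plain Java. This is an illustration, not the OpenCms implementation: the class name DateRangeTermsSketch and the method span are hypothetical, the sketch takes java.time.LocalDate values instead of the long timestamps used by getDateRangeSpan, and the order of the returned terms is an assumption.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAdjusters;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the OpenCms implementation of getDateRangeSpan.
class DateRangeTermsSketch {

    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("uuuuMMdd");
    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("uuuuMM");
    private static final DateTimeFormatter YEAR = DateTimeFormatter.ofPattern("uuuu");

    /** Returns year, month and day terms covering every day from start to end (both inclusive). */
    static List<String> span(LocalDate start, LocalDate end) {
        List<String> terms = new ArrayList<>();
        LocalDate day = start;
        while (!day.isAfter(end)) {
            LocalDate yearEnd = day.with(TemporalAdjusters.lastDayOfYear());
            LocalDate monthEnd = day.with(TemporalAdjusters.lastDayOfMonth());
            if (day.getDayOfYear() == 1 && !yearEnd.isAfter(end)) {
                // The whole year lies inside the range: one year term is enough.
                terms.add(day.format(YEAR));
                day = yearEnd.plusDays(1);
            } else if (day.getDayOfMonth() == 1 && !monthEnd.isAfter(end)) {
                // The whole month lies inside the range: one month term is enough.
                terms.add(day.format(MONTH));
                day = monthEnd.plusDays(1);
            } else {
                // Partial month at the start or end of the range: add a term per day.
                terms.add(day.format(DAY));
                day = day.plusDays(1);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Prints [20071129, 20071130, 200712, 2008, 200901, 200902, 20090301]
        System.out.println(span(LocalDate.of(2007, 11, 29), LocalDate.of(2009, 3, 1)));
    }
}
```

For the use case above, the sketch yields seven terms instead of one term per day in the range, which is the point of the optimization.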
    • addConfigurationParameter

      public void addConfigurationParameter(String key, String value)
      Adds a parameter.

      Specified by:
      addConfigurationParameter in interface I_CmsConfigurationParameterHandler
      Overrides:
      addConfigurationParameter in class A_CmsSearchIndex
      Parameters:
      key - the key/name of the parameter
      value - the value of the parameter
    • createEmptyDocument

      Creates an empty document that can be used by this search field configuration.

      Parameters:
      resource - the resource to create the document for
      Returns:
      a new and empty document
    • getAnalyzer

      public org.apache.lucene.analysis.Analyzer getAnalyzer()
      Returns the Lucene analyzer used for this index.

      Returns:
      the Lucene analyzer used for this index
    • getConfiguration

      Description copied from class: A_CmsSearchIndex
      Returns the empty configuration. Override the method if your index is configurable.
      Specified by:
      getConfiguration in interface I_CmsConfigurationParameterHandler
      Overrides:
      getConfiguration in class A_CmsSearchIndex
      Returns:
      the parameters of this configurable class instance, or null if the class does not need any parameters
    • getContentIfUnchanged

      Description copied from class: A_CmsSearchIndex
      We always assume we have no unchanged copy of the content, since it depends on the concrete index. Override the method to enhance indexing performance if you know where to grab the content from your index. See the implementation getContentIfUnchanged(CmsResource) for an example.
      Specified by:
      getContentIfUnchanged in interface I_CmsSearchIndex
      Overrides:
      getContentIfUnchanged in class A_CmsSearchIndex
      Parameters:
      resource - the resource the content should be provided for.
      Returns:
      the up-to-date extraction result as gained from the index - if possible, or null, if no up-to-date extraction result can be obtained from the index.
    • getDocument

      public I_CmsSearchDocument getDocument(int docId)
      Returns a document by document ID.

      Parameters:
      docId - the id to get the document for
      Returns:
      the CMS specific document
    • getDocument

      @Deprecated public org.apache.lucene.document.Document getDocument(String rootPath)
      Deprecated.
      Use getDocument(String, String) instead and provide CmsSearchField.FIELD_PATH as field to search in
      Returns the Lucene document with the given root path from the index.

      Parameters:
      rootPath - the root path of the document to get
      Returns:
      the Lucene document with the given root path from the index
    • getDocument

      Returns the first document where the given term matches the selected index field.

      Use this method to search for documents which have unique field values, like a unique id.

      Parameters:
      field - the field to search in
      term - the term to search for
      Returns:
      the first document where the given term matches the selected index field
    • getLocaleForResource

      public Locale getLocaleForResource(CmsObject cms, CmsResource resource, List<Locale> availableLocales)
      Returns the language locale for the given resource in this index.

      Specified by:
      getLocaleForResource in interface I_CmsSearchIndex
      Overrides:
      getLocaleForResource in class A_CmsSearchIndex
      Parameters:
      cms - the current OpenCms user context
      resource - the resource to check
      availableLocales - a list of locales supported by the resource
      Returns:
      the language locale for the given resource in this index
    • getLocaleString

      Returns the language locale of the index as a String.

      Returns:
      the language locale of the index as a String
    • getMaxHits

      public int getMaxHits()
      Returns the maximum number of hits that are loaded.

      The maximum number of documents to load from the index must be specified. The default of this setting is MAX_HITS_DEFAULT (5000). This means that at most 5000 results are returned from the index. Please note that this number may be reduced further because of OpenCms read permissions or per-user file visibility settings not controlled in the index.

      Returns:
      the maximum number of hits that are loaded
      Since:
      7.5.1
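The following sketch illustrates why the number of results can end up below the configured maximum. It assumes that the cap is applied to the raw hits first and that unreadable entries are dropped afterwards; the class MaxHitsSketch, the method visibleHits and the readable predicate are hypothetical stand-ins, not OpenCms API.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Illustrative sketch only; names are hypothetical and not part of the OpenCms API.
class MaxHitsSketch {

    /** Loads at most maxHits raw hits, then drops the ones the user may not read. */
    static List<String> visibleHits(List<String> rawHits, int maxHits, Predicate<String> readable) {
        return rawHits.stream()
            .limit(maxHits)   // cap applied to what is loaded from the index
            .filter(readable) // permission or visibility check applied afterwards
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> raw = List.of("/a", "/secret/b", "/c", "/d");
        // Three hits are loaded, one of them is not readable: prints [/a, /c]
        System.out.println(visibleHits(raw, 3, path -> !path.startsWith("/secret/")));
    }
}
```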
    • getPath

      public String getPath()
      Returns the path where this index stores its data in the "real" file system.

      Specified by:
      getPath in interface I_CmsSearchIndex
      Overrides:
      getPath in class A_CmsSearchIndex
      Returns:
      the path where this index stores its data in the "real" file system
    • getPriority

      public int getPriority()
      Returns the Thread priority for this search index.

      Returns:
      the Thread priority for this search index
    • getSearcher

      public org.apache.lucene.search.IndexSearcher getSearcher()
      Returns the Lucene index searcher used for this search index.

      Returns:
      the Lucene index searcher used for this search index
    • initialize

      public void initialize() throws CmsSearchException
      Description copied from class: A_CmsSearchIndex
      Initializes the search index.

      Specified by:
      initialize in interface I_CmsSearchIndex
      Overrides:
      initialize in class A_CmsSearchIndex
      Throws:
      CmsSearchException - if the index source association failed or a configuration error occurred
    • isBackupReindexing

      public boolean isBackupReindexing()
      Returns true if backup re-indexing is done by this index.

      This is an optimization method by which the old extracted content is reused in order to save performance when re-indexing.

      Returns:
      true if backup re-indexing is done by this index
      Since:
      7.5.1
    • isCheckingPermissions

      public boolean isCheckingPermissions()
      Returns true if permissions are checked for search results by this index.

      If permission checks are not required, they can be turned off in the index search configuration parameters in opencms-search.xml. Not checking permissions will improve performance.

      This can be of use in scenarios where you know that all search results are always readable, which is usually true for public websites that do not have personalized accounts.

      Please note that even if a result is returned where the current user has no read permissions, the user cannot actually access this document. It will only appear in the search result list, but if the user clicks the link to open the document, an error is shown.

      Returns:
      true if permissions are checked for search results by this index
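As a rough model of this switch, the sketch below filters a result list only when permission checking is enabled. The class PermissionCheckSketch, the method results and the set of readable paths are hypothetical; the real index works on search documents and the OpenCms permission system.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative sketch only; names are hypothetical and not part of the OpenCms API.
class PermissionCheckSketch {

    /** Returns the hits to show, filtered by read permission only if checking is enabled. */
    static List<String> results(List<String> hits, Set<String> readablePaths, boolean checkPermissions) {
        if (!checkPermissions) {
            // Faster, but the list may contain entries the user cannot open.
            return hits;
        }
        return hits.stream().filter(readablePaths::contains).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> hits = List.of("/public/a", "/intranet/b");
        Set<String> readable = Set.of("/public/a");
        System.out.println(results(hits, readable, true));  // prints [/public/a]
        System.out.println(results(hits, readable, false)); // prints [/public/a, /intranet/b]
    }
}
```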
    • isCheckingTimeRange

      public boolean isCheckingTimeRange()
      Returns true if the document time range is checked with a granularity level of seconds for search results by this index.

      Since OpenCms 8.0, time range checks are always done if CmsSearchParameters.setMinDateLastModified(long) or any of the corresponding methods are used. This is done very efficiently using optimized Lucene filters. However, these checks are done only with daily granularity, which means that you can find "changes made yesterday" but not "changes made in the last hour". For normal limitation of search results, daily granularity should be enough.

      If time range checks with a granularity level of seconds are required, they can be turned on in the index search configuration parameters in opencms-search.xml. Not checking the time range with a granularity level of seconds will improve performance.

      By default, the granularity level of seconds is turned off since OpenCms 8.0.

      Returns:
      true if the document time range is checked with a granularity level of seconds for search results by this index
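The difference between the two granularity levels can be illustrated with a small comparison. This is a sketch under assumptions: the class TimeGranularitySketch and its methods are hypothetical, days are computed in UTC, and the fine-grained check compares raw millisecond timestamps.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Illustrative sketch only; names are hypothetical and not part of the OpenCms API.
class TimeGranularitySketch {

    /** Daily granularity: matches if the document's day is not before the day of the lower limit. */
    static boolean matchesDaily(long docMillis, long minMillis) {
        return !dayOf(docMillis).isBefore(dayOf(minMillis));
    }

    /** Fine granularity: matches only if the document timestamp is not before the lower limit. */
    static boolean matchesExact(long docMillis, long minMillis) {
        return docMillis >= minMillis;
    }

    private static LocalDate dayOf(long millis) {
        // Assumption: days are determined in UTC.
        return Instant.ofEpochMilli(millis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    public static void main(String[] args) {
        long changed = Instant.parse("2009-02-02T08:00:00Z").toEpochMilli();
        long oneHourAgo = Instant.parse("2009-02-02T15:00:00Z").toEpochMilli();
        System.out.println(matchesDaily(changed, oneHourAgo)); // true: same calendar day
        System.out.println(matchesExact(changed, oneHourAgo)); // false: changed before the limit
    }
}
```

With daily granularity the document changed at 08:00 still matches a "last hour" query issued at 16:00, which is why the finer check exists as an option.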
    • isCheckPermissions

      public boolean isCheckPermissions()
      Returns the checkPermissions.

      Returns:
      the checkPermissions
    • isCreatingExcerpt

      public boolean isCreatingExcerpt()
      Returns true if an excerpt is generated by this index.

      If no excerpt is required, generation can be turned off in the index search configuration parameters in opencms-search.xml. Not generating an excerpt will improve performance.

      Returns:
      true if an excerpt is generated by this index
    • isIgnoreExpiration

      public boolean isIgnoreExpiration()
      Returns the ignoreExpiration.

      Returns:
      the ignoreExpiration
    • isInitialized

      public boolean isInitialized()
      Description copied from interface: I_CmsSearchIndex
      Returns a flag, indicating if the search index is successfully initialized.
      Specified by:
      isInitialized in interface I_CmsSearchIndex
      Overrides:
      isInitialized in class A_CmsSearchIndex
      Returns:
      a flag, indicating if the search index is successfully initialized.
    • isRequireViewPermission

      public boolean isRequireViewPermission()
      Returns true if a resource requires read permission to be included in the result list.

      Returns:
      true if a resource requires read permission to be included in the result list
    • onIndexChanged

      public void onIndexChanged(boolean force)
      Description copied from interface: I_CmsSearchIndex
      Method called by the search manager if the index has changed. Typically the index searcher is reset when the method is called.
      Specified by:
      onIndexChanged in interface I_CmsSearchIndex
      Overrides:
      onIndexChanged in class A_CmsSearchIndex
      Parameters:
      force - if false the index might decide itself if it has to act on the change, if true it should act, even if itself cannot detect an index change.
    • search

      Performs a search on the index within the given fields.

      The result is returned as List with entries of type I_CmsSearchResult.

      Parameters:
      cms - the current user's Cms object
      params - the parameters to use for the search
      Returns:
      the List of results found or an empty list
      Throws:
      CmsSearchException - if something goes wrong
    • setAnalyzer

      public void setAnalyzer(org.apache.lucene.analysis.Analyzer analyzer)
      Sets the Lucene analyzer used for this index.

      Parameters:
      analyzer - the Lucene analyzer to set
    • setCheckPermissions

      public void setCheckPermissions(boolean checkPermissions)
      Sets the checkPermissions.

      Parameters:
      checkPermissions - the checkPermissions to set
    • setIgnoreExpiration

      public void setIgnoreExpiration(boolean ignoreExpiration)
      Sets the ignoreExpiration.

      Parameters:
      ignoreExpiration - the ignoreExpiration to set
    • setMaxHits

      public void setMaxHits(int maxHits)
      Sets the maximum number of hits that are loaded.

      This must be set to at least 50, or this setting is ignored.

      Parameters:
      maxHits - the maximum number of hits to load
      Since:
      7.5.1
    • setRequireViewPermission

      public void setRequireViewPermission(boolean requireViewPermission)
      Controls if a resource requires view permission to be displayed in the result list.

      By default this is false.

      Parameters:
      requireViewPermission - controls if a resource requires view permission to be displayed in the result list
    • shutDown

      public void shutDown()
      Shuts down the search index.

      This will close the local Lucene index searcher instance.

      Specified by:
      shutDown in interface I_CmsSearchIndex
      Overrides:
      shutDown in class A_CmsSearchIndex
    • toString

      public String toString()
      Returns the name (A_CmsSearchIndex.getName()) of this search index.

      Overrides:
      toString in class Object
      Returns:
      the name (A_CmsSearchIndex.getName()) of this search index
    • appendCategoryFilter

      protected org.apache.lucene.search.BooleanQuery.Builder appendCategoryFilter(CmsObject cms, org.apache.lucene.search.BooleanQuery.Builder filter, List<String> categories)
      Appends a category filter to the given filter clause that matches all given categories.

      In case the provided List is null or empty, the original filter is left unchanged.

      The original filter parameter is extended and also provided as return value.

      Parameters:
      cms - the current OpenCms search context
      filter - the filter to extend
      categories - the categories that will compose the filter
      Returns:
      the extended filter clause
    • appendDateCreatedFilter

      protected org.apache.lucene.search.BooleanQuery.Builder appendDateCreatedFilter(org.apache.lucene.search.BooleanQuery.Builder filter, long startTime, long endTime)
      Appends a date of creation filter to the given filter clause that matches the given time range.

      If the start time is equal to Long.MIN_VALUE and the end time is equal to Long.MAX_VALUE, then the original filter is left unchanged.

      The original filter parameter is extended and also provided as return value.

      Parameters:
      filter - the filter to extend
      startTime - start time of the range to search in
      endTime - end time of the range to search in
      Returns:
      the extended filter clause
    • appendDateLastModifiedFilter

      protected org.apache.lucene.search.BooleanQuery.Builder appendDateLastModifiedFilter(org.apache.lucene.search.BooleanQuery.Builder filter, long startTime, long endTime)
      Appends a date of last modification filter to the given filter clause that matches the given time range.

      If the start time is equal to Long.MIN_VALUE and the end time is equal to Long.MAX_VALUE, then the original filter is left unchanged.

      The original filter parameter is extended and also provided as return value.

      Parameters:
      filter - the filter to extend
      startTime - start time of the range to search in
      endTime - end time of the range to search in
      Returns:
      the extended filter clause
    • appendPathFilter

      protected org.apache.lucene.search.BooleanQuery.Builder appendPathFilter(CmsObject cms, org.apache.lucene.search.BooleanQuery.Builder filter, List<String> roots)
      Appends a VFS path filter to the given filter clause that matches all given root paths.

      In case the provided List is null or empty, the current request context site root is appended.

      The original filter parameter is extended and also provided as return value.

      Parameters:
      cms - the current OpenCms search context
      filter - the filter to extend
      roots - the VFS root paths that will compose the filter
      Returns:
      the extended filter clause
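The fallback to the current site root can be modeled without Lucene. In this sketch the filter is a plain predicate on resource paths, and a path is taken to match if it starts with any of the effective roots; both are assumptions for illustration, as are the names PathFilterSketch and pathFilter.

```java
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch only; the real method extends a Lucene BooleanQuery.Builder.
class PathFilterSketch {

    /** Builds a path predicate; a null or empty root list falls back to the current site root. */
    static Predicate<String> pathFilter(List<String> roots, String currentSiteRoot) {
        List<String> effective = (roots == null || roots.isEmpty()) ? List.of(currentSiteRoot) : roots;
        // Assumption: a document matches if its path lies below any of the roots.
        return path -> effective.stream().anyMatch(path::startsWith);
    }

    public static void main(String[] args) {
        String page = "/sites/default/index.html";
        System.out.println(pathFilter(null, "/sites/default/").test(page));                // true
        System.out.println(pathFilter(List.of("/system/"), "/sites/default/").test(page)); // false
    }
}
```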
    • appendResourceTypeFilter

      protected org.apache.lucene.search.BooleanQuery.Builder appendResourceTypeFilter(CmsObject cms, org.apache.lucene.search.BooleanQuery.Builder filter, List<String> resourceTypes)
      Appends a resource type filter to the given filter clause that matches all given resource types.

      In case the provided List is null or empty, the original filter is left unchanged.

      The original filter parameter is extended and also provided as return value.

      Parameters:
      cms - the current OpenCms search context
      filter - the filter to extend
      resourceTypes - the resource types that will compose the filter
      Returns:
      the extended filter clause
    • createDateRangeFilter

      protected org.apache.lucene.search.Query createDateRangeFilter(String fieldName, long startTime, long endTime)
      Creates an optimized date range filter for the date of last modification or creation.

      If the start date is equal to Long.MIN_VALUE and the end date is equal to Long.MAX_VALUE, then null is returned.

      Parameters:
      fieldName - the name of the field to search
      startTime - start time of the range to search in
      endTime - end time of the range to search in
      Returns:
      an optimized date range filter for the date of last modification or creation
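The sentinel convention used by the date filter methods can be sketched as follows. The filter is modeled as a LongPredicate on timestamps and the filter clause as a plain list, neither of which is what the real methods use; the inclusive bounds and the names DateRangeFilterSketch and appendDateFilter are assumptions as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Illustrative sketch only; the real methods work with Lucene Query objects.
class DateRangeFilterSketch {

    /** Returns null for the unbounded range, otherwise a predicate for the given range. */
    static LongPredicate createDateRangeFilter(long startTime, long endTime) {
        if (startTime == Long.MIN_VALUE && endTime == Long.MAX_VALUE) {
            return null; // nothing to restrict, so no filter is needed
        }
        return time -> time >= startTime && time <= endTime; // assumption: inclusive bounds
    }

    /** Extends the given filter list in place and returns the same instance. */
    static List<LongPredicate> appendDateFilter(List<LongPredicate> filter, long startTime, long endTime) {
        LongPredicate range = createDateRangeFilter(startTime, endTime);
        if (range != null) {
            filter.add(range);
        }
        return filter;
    }

    public static void main(String[] args) {
        List<LongPredicate> filter = new ArrayList<>();
        appendDateFilter(filter, Long.MIN_VALUE, Long.MAX_VALUE);
        System.out.println(filter.size()); // 0: unbounded range leaves the filter unchanged
        appendDateFilter(filter, 100L, 200L);
        System.out.println(filter.size()); // 1
    }
}
```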
    • createIndexBackup

      Creates a backup of this index for optimized re-indexing of the whole content.

      Returns:
      the path to the backup folder, or null in case no backup was created
    • createIndexWriter

      protected I_CmsIndexWriter createIndexWriter(boolean create, I_CmsReport report) throws CmsIndexException
      Creates a new index writer.

      Specified by:
      createIndexWriter in class A_CmsSearchIndex
      Parameters:
      create - if true a whole new index is created, if false an existing index is updated
      report - the report
      Returns:
      the created new index writer
      Throws:
      CmsIndexException - in case the writer could not be created
    • extendPathFilter

      protected void extendPathFilter(List<org.apache.lucene.index.Term> terms, String searchRoot)
      Extends the given path query with another term for the given search root element.

      Parameters:
      terms - the path filter to extend
      searchRoot - the search root to add to the path query
    • generateIndexDirectory

      Generates the directory on the RFS for this index.

      Returns:
      the directory on the RFS for this index
    • getMultiTermQueryFilter

      protected org.apache.lucene.search.Query getMultiTermQueryFilter(String field, List<String> terms)
      Returns a cached Lucene term query filter for the given field and terms.

      Parameters:
      field - the field to use
      terms - the terms to use
      Returns:
      a cached Lucene term query filter for the given field and terms
    • getMultiTermQueryFilter

      protected org.apache.lucene.search.Query getMultiTermQueryFilter(String field, String terms)
      Returns a cached Lucene term query filter for the given field and terms.

      Parameters:
      field - the field to use
      terms - the terms to use
      Returns:
      a cached Lucene term query filter for the given field and terms
    • getMultiTermQueryFilter

      protected org.apache.lucene.search.Query getMultiTermQueryFilter(String field, String termsStr, List<String> termsList)
      Returns a cached Lucene term query filter for the given field and terms.

      Parameters:
      field - the field to use
      termsStr - the terms to use as a String separated by a space ' ' char
      termsList - the list of terms to use
      Returns:
      a cached Lucene term query filter for the given field and terms
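One way to picture a cached multi-term filter is a map keyed by field and terms. The sketch below is a guess at the general idea, not the OpenCms implementation: the "filter" is modeled as an immutable list of terms, the cache key format is invented, and it is assumed that either the space-separated String or the List may be null as long as the other one is given.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; the real methods return Lucene Query objects.
class TermFilterCacheSketch {

    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    /** Counts how often a filter was actually built, to make cache hits visible. */
    int builds = 0;

    List<String> getMultiTermFilter(String field, String termsStr, List<String> termsList) {
        // Assumption: the String form is the terms joined by a single space.
        String joined = (termsStr != null) ? termsStr : String.join(" ", termsList);
        String key = field + ":" + joined;
        return cache.computeIfAbsent(key, k -> {
            builds++;
            List<String> terms = (termsList != null) ? termsList : Arrays.asList(joined.split(" "));
            return List.copyOf(terms);
        });
    }

    public static void main(String[] args) {
        TermFilterCacheSketch sketch = new TermFilterCacheSketch();
        List<String> a = sketch.getMultiTermFilter("type", "page image", null);
        List<String> b = sketch.getMultiTermFilter("type", null, List.of("page", "image"));
        System.out.println(a == b);        // true: the second call is served from the cache
        System.out.println(sketch.builds); // 1
    }
}
```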
    • getResource

      Checks if the OpenCms resource referenced by the result document can be read by the user of the given OpenCms context. Returns the referenced CmsResource or null if the user is not permitted to read the resource.

      Parameters:
      cms - the OpenCms user context to use for permission testing
      doc - the search result document to check
      Returns:
      the referenced CmsResource or null if the user is not permitted
    • getResource

      Checks if the OpenCms resource referenced by the result document can be read by the user of the given OpenCms context. Returns the referenced CmsResource or null if the user is not permitted to read the resource.

      Parameters:
      cms - the OpenCms user context to use for permission testing
      doc - the search result document to check
      filter - the resource filter to apply
      Returns:
      the referenced CmsResource or null if the user is not permitted
    • getTermQueryFilter

      protected org.apache.lucene.search.Query getTermQueryFilter(String field, String term)
      Returns a cached Lucene term query filter for the given field and term.

      Parameters:
      field - the field to use
      term - the term to use
      Returns:
      a cached Lucene term query filter for the given field and term
    • hasReadPermission

      protected boolean hasReadPermission(CmsObject cms, I_CmsSearchDocument doc)
      Checks if the OpenCms resource referenced by the result document can be read by the user of the given OpenCms context.

      Parameters:
      cms - the OpenCms user context to use for permission testing
      doc - the search result document to check
      Returns:
      true if the user has read permissions to the resource
    • indexSearcherClose

      protected void indexSearcherClose()
      Closes the index searcher for this index.

    • indexSearcherClose

      protected void indexSearcherClose(org.apache.lucene.search.IndexSearcher searcher)
      Closes the given Lucene index searcher.

      Parameters:
      searcher - the searcher to close
    • indexSearcherOpen

      protected void indexSearcherOpen(String path)
      Initializes the index searcher for this index.

      In case there is an index searcher still open, it is closed first.

      For performance reasons, one instance of the index searcher should be kept for all searches. However, if the index is updated or changed this searcher instance needs to be re-initialized.

      Parameters:
      path - the path to the index directory
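The lifecycle described here (one shared searcher, closed and replaced when the index changes) can be sketched with a stand-in type. FakeSearcher and SearcherHolderSketch are hypothetical; a real implementation would hold a Lucene IndexSearcher and close its underlying reader.

```java
import java.io.Closeable;

// Illustrative sketch only; FakeSearcher stands in for a Lucene IndexSearcher.
class SearcherHolderSketch {

    static class FakeSearcher implements Closeable {
        final String path;
        boolean closed;

        FakeSearcher(String path) {
            this.path = path;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    private FakeSearcher searcher;

    /** Opens a searcher for the given path, closing a still open one first. */
    synchronized void open(String path) {
        close();
        searcher = new FakeSearcher(path);
    }

    /** Closes the current searcher, if any. */
    synchronized void close() {
        if (searcher != null) {
            searcher.close();
            searcher = null;
        }
    }

    /** Returns the shared instance that all searches should reuse. */
    synchronized FakeSearcher get() {
        return searcher;
    }

    public static void main(String[] args) {
        SearcherHolderSketch holder = new SearcherHolderSketch();
        holder.open("/index");
        FakeSearcher first = holder.get();
        holder.open("/index"); // index changed: re-initialize
        System.out.println(first.closed);          // true
        System.out.println(holder.get() != first); // true
    }
}
```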
    • indexSearcherUpdate

      protected void indexSearcherUpdate()
      Reopens the index search reader for this index, required after the index has been changed.

    • isInTimeRange

      protected boolean isInTimeRange(org.apache.lucene.document.Document doc, CmsSearchParameters params)
      Checks if the document is in the time range specified in the search parameters.

      The creation date and/or the last modification date are checked.

      Parameters:
      doc - the document to check the dates against the given time range
      params - the search parameters where the time ranges are specified
      Returns:
      true if the document is in the time range or no time range is set, otherwise false
    • isSortScoring

      protected boolean isSortScoring(org.apache.lucene.search.IndexSearcher searcher, org.apache.lucene.search.Sort sort)
      Checks if the score for the results must be calculated based on the provided sort option.

      Since Lucene 3, apparently the score is no longer calculated by default, but only if the searcher is explicitly told so. This method checks if, based on the given sort, the score must be calculated.

      Parameters:
      searcher - the index searcher to prepare
      sort - the sort option to use
      Returns:
      true if the sort option should be used
    • needsPermissionCheck

      protected boolean needsPermissionCheck(I_CmsSearchDocument doc)
      Checks if the OpenCms resource referenced by the result document needs to be checked.

      Parameters:
      doc - the search result document to check
      Returns:
      true if the document needs to be checked, false otherwise
    • removeIndexBackup

      protected void removeIndexBackup(String path)
      Removes the given backup folder of this index.

      Parameters:
      path - the backup folder to remove