Class CmsSearchIndex

    • Field Detail

      • BACKUP_REINDEXING

        public static final java.lang.String BACKUP_REINDEXING
        Constant for additional parameter to enable optimized full index regeneration (default: false).
        See Also:
        Constant Field Values
      • DATES

        public static final java.lang.String[] DATES
        Lookup table to quickly zero-pad days / months in date Strings.
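
        A lookup table of this kind can be sketched as follows. This is only an illustration: the actual contents of DATES are not shown here, and the names ZeroPadLookupSketch, TWO_DIGITS and dayTerm are hypothetical.

```java
final class ZeroPadLookupSketch {

    // Hypothetical table: index i holds the two-digit, zero-padded form of i (0..31).
    static final String[] TWO_DIGITS = new String[32];

    static {
        for (int i = 0; i < TWO_DIGITS.length; i++) {
            TWO_DIGITS[i] = (i < 10 ? "0" : "") + i;
        }
    }

    private ZeroPadLookupSketch() {}

    // Builds a "yyyyMMdd" style term using array lookups instead of number formatting.
    static String dayTerm(int year, int month, int day) {
        return year + TWO_DIGITS[month] + TWO_DIGITS[day];
    }
}
```

        With this table, dayTerm(2009, 2, 2) produces the String "20090202" without calling a formatter for each date.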
      • DOC_META_FIELDS

        public static final java.lang.String[] DOC_META_FIELDS
        Constant for a field list that contains the "meta" field as well as the "content" field.
      • EXCERPT

        public static final java.lang.String EXCERPT
        Constant for additional parameter to enable excerpt creation (default: true).
        See Also:
        Constant Field Values
      • IGNORE_EXPIRATION

        public static final java.lang.String IGNORE_EXPIRATION
        Constant for additional parameter to control whether the expiration date of resources is ignored.
        See Also:
        Constant Field Values
      • LANGUAGEDETECTION

        public static final java.lang.String LANGUAGEDETECTION
        Constant for additional parameter to enable/disable language detection (default: false).
        See Also:
        Constant Field Values
      • MAX_HITS

        public static final java.lang.String MAX_HITS
        Constant for additional parameter for controlling how many hits are loaded at maximum (default: 5000, see getMaxHits()).
        See Also:
        Constant Field Values
      • PERMISSIONS

        public static final java.lang.String PERMISSIONS
        Constant for additional parameter to enable permission checks (default: true).
        See Also:
        Constant Field Values
      • PRIORITY

        public static final java.lang.String PRIORITY
        Constant for additional parameter to set the thread priority during search.
        See Also:
        Constant Field Values
      • TIME_RANGE

        public static final java.lang.String TIME_RANGE
        Constant for additional parameter to enable time range checks (default: true).
        See Also:
        Constant Field Values
      • VISITOR

        protected static final org.apache.lucene.index.StoredFieldVisitor VISITOR
        A stored field visitor that does not return the large fields "content" and "contentblob".

    • Constructor Detail

      • CmsSearchIndex

        public CmsSearchIndex()
        Default constructor only intended to be used by the XML configuration.

        It is recommended to use the constructor CmsSearchIndex(String) as it enforces the mandatory name argument.

    • Method Detail

      • getDateRangeSpan

        public static java.util.List<java.lang.String> getDateRangeSpan​(long startDate,
                                                                        long endDate)
        Generates a list of date terms for the optimized date range search with "daily" granularity level.

        How this works:

        • For each document, terms are added for the year, the month and the day the document was modified (or created) in. For example, if a document was modified on February 02, 2009, then the following terms are stored for this document: "20090202", "200902" and "2009".
        • When a date range search is done, all possible matches for the provided range are created as search terms and matched against the document terms.
        • Consider the following use case: You want to find out if a resource has been changed in the time between November 29, 2007 and March 01, 2009. One term to match is simply "2008", because if a document was modified in 2008, then it is clearly in the date range. Other terms are "200712", "200901" and "200902", because all documents modified in these months are also certain matches. Finally, terms for "20071129", "20071130" and "20090301" are added to match the days in the starting and final month.
        Parameters:
        startDate - start date of the range to search in
        endDate - end date of the range to search in
        Returns:
        a list of date terms for the optimized date range search
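
        The term scheme described above can be sketched in plain Java. This is a minimal illustration and not the OpenCms implementation: the class DateRangeTermsSketch and its methods are hypothetical, and it takes java.time.LocalDate values instead of the long timestamps accepted by getDateRangeSpan.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class DateRangeTermsSketch {

    private DateRangeTermsSketch() {}

    // Terms stored for one document: day, month and year of its modification date.
    static List<String> documentTerms(LocalDate modified) {
        return Arrays.asList(dayTerm(modified), monthTerm(modified), yearTerm(modified));
    }

    // Terms covering the inclusive range [start, end], using the coarsest term that fits.
    static List<String> dateRangeTerms(LocalDate start, LocalDate end) {
        List<String> terms = new ArrayList<>();
        LocalDate cursor = start;
        while (!cursor.isAfter(end)) {
            LocalDate yearEnd = LocalDate.of(cursor.getYear(), 12, 31);
            LocalDate monthEnd = cursor.withDayOfMonth(cursor.lengthOfMonth());
            if (cursor.getDayOfYear() == 1 && !yearEnd.isAfter(end)) {
                terms.add(yearTerm(cursor));   // the whole year lies inside the range
                cursor = yearEnd.plusDays(1);
            } else if (cursor.getDayOfMonth() == 1 && !monthEnd.isAfter(end)) {
                terms.add(monthTerm(cursor));  // the whole month lies inside the range
                cursor = monthEnd.plusDays(1);
            } else {
                terms.add(dayTerm(cursor));    // partial month: fall back to single days
                cursor = cursor.plusDays(1);
            }
        }
        return terms;
    }

    // A document matches if it shares at least one term with the range.
    static boolean matches(List<String> rangeTerms, LocalDate modified) {
        return !Collections.disjoint(rangeTerms, documentTerms(modified));
    }

    private static String yearTerm(LocalDate d) {
        return String.format(Locale.ROOT, "%04d", d.getYear());
    }

    private static String monthTerm(LocalDate d) {
        return String.format(Locale.ROOT, "%04d%02d", d.getYear(), d.getMonthValue());
    }

    private static String dayTerm(LocalDate d) {
        return String.format(Locale.ROOT, "%04d%02d%02d", d.getYear(), d.getMonthValue(), d.getDayOfMonth());
    }
}
```

        For the range from November 29, 2007 to March 01, 2009 the sketch yields the seven terms from the use case above, in chronological order. A document modified on February 02, 2009 matches through its month term "200902".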
      • createEmptyDocument

        public I_CmsSearchDocument createEmptyDocument​(CmsResource resource)
        Creates an empty document that can be used by this search field configuration.

        Parameters:
        resource - the resource to create the document for
        Returns:
        a new and empty document
      • getAnalyzer

        public org.apache.lucene.analysis.Analyzer getAnalyzer()
        Returns the Lucene analyzer used for this index.

        Returns:
        the Lucene analyzer used for this index
      • getDocument

        public I_CmsSearchDocument getDocument​(int docId)
        Returns a document by document ID.

        Parameters:
        docId - the id to get the document for
        Returns:
        the CMS specific document
      • getDocument

        @Deprecated
        public org.apache.lucene.document.Document getDocument​(java.lang.String rootPath)
        Deprecated.
        Use getDocument(String, String) instead and provide CmsSearchField.FIELD_PATH as field to search in
        Returns the Lucene document with the given root path from the index.

        Parameters:
        rootPath - the root path of the document to get
        Returns:
        the Lucene document with the given root path from the index
      • getDocument

        public I_CmsSearchDocument getDocument​(java.lang.String field,
                                               java.lang.String term)
        Returns the first document where the given term matches the selected index field.

        Use this method to search for documents which have unique field values, like a unique id.

        Parameters:
        field - the field to search in
        term - the term to search for
        Returns:
        the first document where the given term matches the selected index field
      • getLocaleForResource

        public java.util.Locale getLocaleForResource​(CmsObject cms,
                                                     CmsResource resource,
                                                     java.util.List<java.util.Locale> availableLocales)
        Returns the language locale for the given resource in this index.

        Specified by:
        getLocaleForResource in interface I_CmsSearchIndex
        Overrides:
        getLocaleForResource in class A_CmsSearchIndex
        Parameters:
        cms - the current OpenCms user context
        resource - the resource to check
        availableLocales - a list of locales supported by the resource
        Returns:
        the language locale for the given resource in this index
      • getMaxHits

        public int getMaxHits()
        Returns the maximum number of hits that are loaded.

        The maximum number of documents to load from the index must be specified. The default of this setting is MAX_HITS_DEFAULT (5000). This means that at most 5000 results are returned from the index. Please note that this number may be reduced further because of OpenCms read permissions or per-user file visibility settings not controlled in the index.

        Returns:
        the maximum number of hits that are loaded
        Since:
        7.5.1
      • getPriority

        public int getPriority()
        Returns the Thread priority for this search index.

        Returns:
        the Thread priority for this search index
      • getSearcher

        public org.apache.lucene.search.IndexSearcher getSearcher()
        Returns the Lucene index searcher used for this search index.

        Returns:
        the Lucene index searcher used for this search index
      • isBackupReindexing

        public boolean isBackupReindexing()
        Returns true if backup re-indexing is done by this index.

        This is an optimization by which the previously extracted content is reused in order to speed up re-indexing.

        Returns:
        true if backup re-indexing is done by this index
        Since:
        7.5.1
      • isCheckingPermissions

        public boolean isCheckingPermissions()
        Returns true if permissions are checked for search results by this index.

        If permission checks are not required, they can be turned off in the index search configuration parameters in opencms-search.xml. Not checking permissions will improve performance.

        This can be of use in scenarios where you know that all search results are always readable, which is usually true for public websites that do not have personalized accounts.

        Please note that even if a result is returned for which the current user has no read permissions, the user cannot actually access this document. It will only appear in the search result list; if the user clicks the link to open the document, an error is shown.

        Returns:
        true if permissions are checked for search results by this index
      • isCheckingTimeRange

        public boolean isCheckingTimeRange()
        Returns true if the document time range is checked with a granularity level of seconds for search results by this index.

        Since OpenCms 8.0, time range checks are always done if CmsSearchParameters.setMinDateLastModified(long) or any of the corresponding methods are used. This is done very efficiently using optimized Lucene filters. However, these checks only have a daily granularity, which means that you can find "changes made yesterday" but not "changes made last hour". For normal limitation of search results, a daily granularity should be enough.

        If time range checks with a granularity level of seconds are required, they can be turned on in the index search configuration parameters in opencms-search.xml. Not checking the time range with a granularity level of seconds will improve performance.

        By default, the granularity level of seconds is turned off since OpenCms 8.0.

        Returns:
        true if the document time range is checked with a granularity level of seconds for search results by this index
      • isCheckPermissions

        public boolean isCheckPermissions()
        Returns the checkPermissions.

        Returns:
        the checkPermissions
      • isCreatingExcerpt

        public boolean isCreatingExcerpt()
        Returns true if an excerpt is generated by this index.

        If no excerpt is required, generation can be turned off in the index search configuration parameters in opencms-search.xml. Not generating an excerpt will improve performance.

        Returns:
        true if an excerpt is generated by this index
      • isIgnoreExpiration

        public boolean isIgnoreExpiration()
        Returns the ignoreExpiration.

        Returns:
        the ignoreExpiration
      • isRequireViewPermission

        public boolean isRequireViewPermission()
        Returns true if a resource requires view permission to be included in the result list.

        Returns:
        true if a resource requires view permission to be included in the result list
      • setAnalyzer

        public void setAnalyzer​(org.apache.lucene.analysis.Analyzer analyzer)
        Sets the Lucene analyzer used for this index.

        Parameters:
        analyzer - the Lucene analyzer to set
      • setCheckPermissions

        public void setCheckPermissions​(boolean checkPermissions)
        Sets the checkPermissions.

        Parameters:
        checkPermissions - the checkPermissions to set
      • setIgnoreExpiration

        public void setIgnoreExpiration​(boolean ignoreExpiration)
        Sets the ignoreExpiration.

        Parameters:
        ignoreExpiration - the ignoreExpiration to set
      • setMaxHits

        public void setMaxHits​(int maxHits)
        Sets the maximum number of hits that are loaded.

        This must be set at least to 50, or this setting is ignored.

        Parameters:
        maxHits - the maximum number of hits to load
        Since:
        7.5.1
        See Also:
        getMaxHits()
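
        The lower bound can be sketched as below. This is an interpretation, not the documented behavior in detail: the sketch assumes that "ignored" means the value falls back to MAX_HITS_DEFAULT, and the class MaxHitsSettingSketch and the constant MIN_MAX_HITS are hypothetical.

```java
final class MaxHitsSettingSketch {

    // Default taken from the getMaxHits() description.
    static final int MAX_HITS_DEFAULT = 5000;

    // Hypothetical name for the documented lower bound of 50.
    static final int MIN_MAX_HITS = 50;

    private int maxHits = MAX_HITS_DEFAULT;

    int getMaxHits() {
        return maxHits;
    }

    // Accepts values of at least 50; smaller values are ignored
    // (assumption: by falling back to the default).
    void setMaxHits(int value) {
        maxHits = (value >= MIN_MAX_HITS) ? value : MAX_HITS_DEFAULT;
    }
}
```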
      • setRequireViewPermission

        public void setRequireViewPermission​(boolean requireViewPermission)
        Controls if a resource requires view permission to be displayed in the result list.

        By default this is false.

        Parameters:
        requireViewPermission - controls if a resource requires view permission to be displayed in the result list
      • appendCategoryFilter

        protected org.apache.lucene.search.BooleanQuery.Builder appendCategoryFilter​(CmsObject cms,
                                                                                     org.apache.lucene.search.BooleanQuery.Builder filter,
                                                                                     java.util.List<java.lang.String> categories)
        Appends a category filter to the given filter clause that matches all given categories.

        In case the provided List is null or empty, the original filter is left unchanged.

        The original filter parameter is extended and also provided as return value.

        Parameters:
        cms - the current OpenCms search context
        filter - the filter to extend
        categories - the categories that will compose the filter
        Returns:
        the extended filter clause
      • appendDateCreatedFilter

        protected org.apache.lucene.search.BooleanQuery.Builder appendDateCreatedFilter​(org.apache.lucene.search.BooleanQuery.Builder filter,
                                                                                        long startTime,
                                                                                        long endTime)
        Appends a date of creation filter to the given filter clause that matches the given time range.

        If the start time is equal to Long.MIN_VALUE and the end time is equal to Long.MAX_VALUE, then the original filter is left unchanged.

        The original filter parameter is extended and also provided as return value.

        Parameters:
        filter - the filter to extend
        startTime - start time of the range to search in
        endTime - end time of the range to search in
        Returns:
        the extended filter clause
      • appendDateLastModifiedFilter

        protected org.apache.lucene.search.BooleanQuery.Builder appendDateLastModifiedFilter​(org.apache.lucene.search.BooleanQuery.Builder filter,
                                                                                             long startTime,
                                                                                             long endTime)
        Appends a date of last modification filter to the given filter clause that matches the given time range.

        If the start time is equal to Long.MIN_VALUE and the end time is equal to Long.MAX_VALUE, then the original filter is left unchanged.

        The original filter parameter is extended and also provided as return value.

        Parameters:
        filter - the filter to extend
        startTime - start time of the range to search in
        endTime - end time of the range to search in
        Returns:
        the extended filter clause
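
        The contract shared by the two date filter methods above (extend the given filter, return the same instance, skip unbounded ranges) can be sketched without Lucene. In this illustration a List of Strings stands in for the BooleanQuery.Builder; the class DateFilterAppendSketch, the method appendDateFilter and the clause format are hypothetical.

```java
import java.util.List;

final class DateFilterAppendSketch {

    private DateFilterAppendSketch() {}

    // Extends the given filter in place and returns it, mirroring the documented contract.
    static List<String> appendDateFilter(List<String> filter, String field, long startTime, long endTime) {
        if (startTime == Long.MIN_VALUE && endTime == Long.MAX_VALUE) {
            // Unbounded range: nothing to restrict, the filter stays unchanged.
            return filter;
        }
        // Stand-in for adding a date range clause to the real filter builder.
        filter.add(field + " in [" + startTime + ", " + endTime + "]");
        return filter;
    }
}
```

        Returning the extended instance allows calls to be chained, while callers that ignore the return value still see the added clause.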
      • appendPathFilter

        protected org.apache.lucene.search.BooleanQuery.Builder appendPathFilter​(CmsObject cms,
                                                                                 org.apache.lucene.search.BooleanQuery.Builder filter,
                                                                                 java.util.List<java.lang.String> roots)
        Appends a VFS path filter to the given filter clause that matches all given root paths.

        In case the provided List is null or empty, the current request context site root is appended.

        The original filter parameter is extended and also provided as return value.

        Parameters:
        cms - the current OpenCms search context
        filter - the filter to extend
        roots - the VFS root paths that will compose the filter
        Returns:
        the extended filter clause
      • appendResourceTypeFilter

        protected org.apache.lucene.search.BooleanQuery.Builder appendResourceTypeFilter​(CmsObject cms,
                                                                                         org.apache.lucene.search.BooleanQuery.Builder filter,
                                                                                         java.util.List<java.lang.String> resourceTypes)
        Appends a resource type filter to the given filter clause that matches all given resource types.

        In case the provided List is null or empty, the original filter is left unchanged.

        The original filter parameter is extended and also provided as return value.

        Parameters:
        cms - the current OpenCms search context
        filter - the filter to extend
        resourceTypes - the resource types that will compose the filter
        Returns:
        the extended filter clause
      • createDateRangeFilter

        protected org.apache.lucene.search.Query createDateRangeFilter​(java.lang.String fieldName,
                                                                       long startTime,
                                                                       long endTime)
        Creates an optimized date range filter for the date of last modification or creation.

        If the start date is equal to Long.MIN_VALUE and the end date is equal to Long.MAX_VALUE, then null is returned.

        Parameters:
        fieldName - the name of the field to search
        startTime - start time of the range to search in
        endTime - end time of the range to search in
        Returns:
        an optimized date range filter for the date of last modification or creation
      • createIndexBackup

        protected java.lang.String createIndexBackup()
        Creates a backup of this index for optimized re-indexing of the whole content.

        Returns:
        the path to the backup folder, or null in case no backup was created
      • extendPathFilter

        protected void extendPathFilter​(java.util.List<org.apache.lucene.index.Term> terms,
                                        java.lang.String searchRoot)
        Extends the given path query with another term for the given search root element.

        Parameters:
        terms - the path filter to extend
        searchRoot - the search root to add to the path query
      • generateIndexDirectory

        protected java.lang.String generateIndexDirectory()
        Generates the directory on the RFS for this index.

        Returns:
        the directory on the RFS for this index
      • getMultiTermQueryFilter

        protected org.apache.lucene.search.Query getMultiTermQueryFilter​(java.lang.String field,
                                                                         java.util.List<java.lang.String> terms)
        Returns a cached Lucene term query filter for the given field and terms.

        Parameters:
        field - the field to use
        terms - the terms to use
        Returns:
        a cached Lucene term query filter for the given field and terms
      • getMultiTermQueryFilter

        protected org.apache.lucene.search.Query getMultiTermQueryFilter​(java.lang.String field,
                                                                         java.lang.String terms)
        Returns a cached Lucene term query filter for the given field and terms.

        Parameters:
        field - the field to use
        terms - the terms to use
        Returns:
        a cached Lucene term query filter for the given field and terms
      • getMultiTermQueryFilter

        protected org.apache.lucene.search.Query getMultiTermQueryFilter​(java.lang.String field,
                                                                         java.lang.String termsStr,
                                                                         java.util.List<java.lang.String> termsList)
        Returns a cached Lucene term query filter for the given field and terms.

        Parameters:
        field - the field to use
        termsStr - the terms to use as a String separated by a space ' ' char
        termsList - the list of terms to use
        Returns:
        a cached Lucene term query filter for the given field and terms
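
        How the three overloads could share one cache entry is sketched below. The caching strategy of the real class is not described here, so everything in this example is an assumption: the class TermFilterCacheSketch, the cache key format and the use of an immutable List as a stand-in for the Lucene Query are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class TermFilterCacheSketch {

    // Cache keyed by field and terms; the value stands in for a Lucene Query.
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    // Counts how often a filter was actually built (for demonstration only).
    final AtomicInteger builds = new AtomicInteger();

    // Accepts the terms either as a space separated String or as a List.
    List<String> multiTermFilter(String field, String termsStr, List<String> termsList) {
        List<String> terms = (termsList != null) ? termsList : Arrays.asList(termsStr.split(" "));
        String key = field + ":" + String.join(" ", terms);
        return cache.computeIfAbsent(key, k -> {
            builds.incrementAndGet();
            return List.copyOf(terms);
        });
    }
}
```

        Both input forms map to the same key, so a filter requested once as "article image" and once as a list of the same two terms is only built once.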
      • getResource

        protected CmsResource getResource​(CmsObject cms,
                                          I_CmsSearchDocument doc)
        Checks if the OpenCms resource referenced by the result document can be read by the user of the given OpenCms context. Returns the referenced CmsResource or null if the user is not permitted to read the resource.

        Parameters:
        cms - the OpenCms user context to use for permission testing
        doc - the search result document to check
        Returns:
        the referenced CmsResource or null if the user is not permitted
      • getResource

        protected CmsResource getResource​(CmsObject cms,
                                          I_CmsSearchDocument doc,
                                          CmsResourceFilter filter)
        Checks if the OpenCms resource referenced by the result document can be read by the user of the given OpenCms context. Returns the referenced CmsResource or null if the user is not permitted to read the resource.

        Parameters:
        cms - the OpenCms user context to use for permission testing
        doc - the search result document to check
        filter - the resource filter to apply
        Returns:
        the referenced CmsResource or null if the user is not permitted
      • getTermQueryFilter

        protected org.apache.lucene.search.Query getTermQueryFilter​(java.lang.String field,
                                                                    java.lang.String term)
        Returns a cached Lucene term query filter for the given field and term.

        Parameters:
        field - the field to use
        term - the term to use
        Returns:
        a cached Lucene term query filter for the given field and term
      • hasReadPermission

        protected boolean hasReadPermission​(CmsObject cms,
                                            I_CmsSearchDocument doc)
        Checks if the OpenCms resource referenced by the result document can be read by the user of the given OpenCms context.

        Parameters:
        cms - the OpenCms user context to use for permission testing
        doc - the search result document to check
        Returns:
        true if the user has read permissions to the resource
      • indexSearcherClose

        protected void indexSearcherClose​(org.apache.lucene.search.IndexSearcher searcher)
        Closes the given Lucene index searcher.

        Parameters:
        searcher - the searcher to close
      • indexSearcherOpen

        protected void indexSearcherOpen​(java.lang.String path)
        Initializes the index searcher for this index.

        In case there is an index searcher still open, it is closed first.

        For performance reasons, one instance of the index searcher should be kept for all searches. However, if the index is updated or changed this searcher instance needs to be re-initialized.

        Parameters:
        path - the path to the index directory
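
        The "close the old searcher, then keep a single new instance" pattern described above can be sketched as follows. This is not the OpenCms code: SearcherHolderSketch and FakeSearcher are hypothetical stand-ins, and no Lucene classes are used.

```java
final class SearcherHolderSketch {

    // Stand-in for an index searcher bound to an index directory.
    static final class FakeSearcher implements AutoCloseable {
        final String path;
        boolean closed;

        FakeSearcher(String path) {
            this.path = path;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // The single shared instance used for all searches.
    private FakeSearcher current;

    // Closes a searcher that is still open, then opens a fresh one for the given path.
    synchronized FakeSearcher open(String path) {
        if (current != null) {
            current.close();
        }
        current = new FakeSearcher(path);
        return current;
    }

    synchronized FakeSearcher get() {
        return current;
    }
}
```

        Calling open again after the index was updated replaces the shared instance, so later searches see the new index state.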
      • isInTimeRange

        protected boolean isInTimeRange​(org.apache.lucene.document.Document doc,
                                        CmsSearchParameters params)
        Checks if the document is in the time range specified in the search parameters.

        The creation date and/or the last modification date are checked.

        Parameters:
        doc - the document to check the dates against the given time range
        params - the search parameters where the time ranges are specified
        Returns:
        true if the document is in the time range or no time range is set, otherwise false
      • isSortScoring

        protected boolean isSortScoring​(org.apache.lucene.search.IndexSearcher searcher,
                                        org.apache.lucene.search.Sort sort)
        Checks if the score for the results must be calculated based on the provided sort option.

        Since Lucene 3, the score is apparently no longer calculated by default, but only if the searcher is explicitly told to do so. This method checks whether, based on the given sort, the score must be calculated.

        Parameters:
        searcher - the index searcher to prepare
        sort - the sort option to use
        Returns:
        true if the sort option should be used
      • needsPermissionCheck

        protected boolean needsPermissionCheck​(I_CmsSearchDocument doc)
        Checks if the OpenCms resource referenced by the result document needs to be checked.

        Parameters:
        doc - the search result document to check
        Returns:
        true if the document needs to be checked, false otherwise
      • removeIndexBackup

        protected void removeIndexBackup​(java.lang.String path)
        Removes the given backup folder of this index.

        Parameters:
        path - the backup folder to remove