Search settings
OpenCms allows you to configure how your content is indexed by the integrated Solr search engine. In particular, you can add information about indexing in the content's schema definition and thereby, on the one hand, define if the content should only be found via the containerpage it resides on or also as content itself, and, on the other hand, you can tell which content elements are mapped to which index fields.
Adjusting the search settings to fit your needs requires some background on the Solr search engine. Whenever you implement your own search functionality, adjusting the search settings is an important task; otherwise you will almost certainly get unintended search results.
Why should I adjust the search settings?
OpenCms ships with a default configuration for indexing resources. The configuration is optimized with respect to the structure of OpenCms resources; for example, properties are indexed by default. But for some choices there is no good overall default. In particular for XML contents, the answers to the following questions are specific to the actual use case:
- Which schema elements of a structured content should not be indexed at all?
- Which schema elements of a structured content should be mapped to special index fields (of special type)?
- Should an XML content be findable as the content itself, only via the container page the content is located on, or both? (If you do not want to find your content via the container page, there is a setting available in the formatter configuration.)
The answers to these questions mainly depend on your use case, so you have to choose the settings yourself. The following guidelines may help:
- If you have a Boolean value or a date stored in your content, then you may not want to index them - at least not in the general content field.
- If you have detail pages for contents, you usually want the content to be findable on its own. Otherwise this may not be the case.
- If you want to have facets, you may map schema elements to special fields that you use for faceting.
By default
- The content of all schema elements of one language version will be stored in one language-specific text field called content_{locale}, e.g., content_en. (Dates or Boolean values should usually be omitted here; how to do that is explained later.)
- The content is indexed such that it can be found stand-alone (and the container page it is located on is findable as well).
Where to adjust the search settings?
You can adjust search settings by adding a <searchsettings> node under xsd:annotation/xsd:appinfo in a content type's schema definition and, for each schema element where special indexing behavior has to be configured, a <searchsetting> sub-node. Examples follow, as we explain the settings in detail.
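As a rough orientation, the skeleton below sketches where the node sits in a schema file. The type names OpenCmsArticle and OpenCmsArticles and the element names Title and Text are hypothetical placeholders, and the surrounding schema is reduced to the bare minimum.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xsd:include schemaLocation="opencms://opencms-xmlcontent.xsd" />

    <!-- Hypothetical content type; replace with your own definitions -->
    <xsd:element name="Articles" type="OpenCmsArticles" />
    <xsd:complexType name="OpenCmsArticles">
        <xsd:sequence>
            <xsd:element name="Article" type="OpenCmsArticle" minOccurs="0" maxOccurs="unbounded" />
        </xsd:sequence>
    </xsd:complexType>
    <xsd:complexType name="OpenCmsArticle">
        <xsd:sequence>
            <xsd:element name="Title" type="OpenCmsString" />
            <xsd:element name="Text" type="OpenCmsHtml" />
        </xsd:sequence>
        <xsd:attribute name="language" type="OpenCmsLocale" use="optional" />
    </xsd:complexType>

    <xsd:annotation>
        <xsd:appinfo>
            <!-- Search settings for the whole content type -->
            <searchsettings>
                <!-- One sub-node per schema element that needs special indexing -->
                <searchsetting element="Title" />
            </searchsettings>
        </xsd:appinfo>
    </xsd:annotation>
</xsd:schema>
```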
A trimmed-down way to adjust the search settings is via the field settings, introduced in OpenCms 11. With field settings you don't have the full flexibility, but most use cases are covered.
Configure how to find the content
How to find an XML content is configured via the attributes of the <searchsettings> node. If you do not add the node at all, the effect is the same as adding it with the default values.
Attributes of <searchsettings>
- containerPageOnly (optional)
  If set to true, contents of the schema's type are not findable themselves. But if you search for the content's text (or other values), you'll still find the container pages where the content is placed, unless the formatter in use is configured not to index the content. If the attribute is set to false, the contents can also be found as the contents themselves.
  Default: false
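A minimal sketch of a configuration that restricts findability to container pages, using only the attribute described above:

```xml
<!-- Contents of this type are found only via the container pages they are placed on -->
<searchsettings containerPageOnly="true">
    <!-- optional <searchsetting> sub-nodes go here -->
</searchsettings>
```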
Configure how schema elements are indexed
By default, OpenCms creates the same index fields for each resource. These are of course not specific to the content's structure. You can look up these fields here. But if you want to allow sophisticated search queries for a specific content type, more index fields are needed. For each content element, you can specify the fields to create via a <searchsetting> node.
Short-cut syntax for the most relevant search settings
Since OpenCms 11, the field settings syntax provides a simple notation for the most relevant search settings. The basic field settings syntax is explained here. In the <FieldSetting> sub-node <Search>, the following values are allowed:
- true
  The field should be included in the content field of the indexed document. This is the same as adding searchcontent="true" to the <searchsetting> node.
- false
  The content of the field is not included in the content fields of the indexed document. This is the same as adding searchcontent="false" to the <searchsetting> node.
- listdate
  Use this setting for elements holding dates only. The element's value is indexed in the fields instancedate_{locale}_dt and instancedatecurrenttill_{locale}_dt. These are the fields the integrated list type uses for sorting by date, and they are also the recommended fields for sorting by date in almost every Solr search. The element's value is not added to the content fields.
- listorder
  Use this setting for elements holding integer values only. The element's value is indexed in the field order_{locale}_i. This field is used by the integrated list type to sort by order.
- listtitle
  Use this setting for elements holding (title) strings. The element's value is indexed in the field disptitle_{locale}_sort, which is a special field supporting alphabetical sorting. It is used for sorting by title when using the integrated list. Moreover, the element's value is added to the content fields.
- listgeocoords
  Use this setting for elements holding Geo coordinates. The element's value must either have the form {latitude},{longitude} or be the JSON value of the Location picker widget. The value is indexed in the field geocoords_loc, which is a special field for Geo queries.
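As a hedged illustration, a field setting that uses one of these values might look like the fragment below. Only the <FieldSetting> and <Search> node names are taken from the text above; how the schema element is selected is part of the basic field settings syntax and is deliberately left as a placeholder.

```xml
<FieldSetting>
    <!-- Placeholder: select the schema element (e.g. a title element) as described
         in the basic field settings syntax; that part is not shown here. -->
    <Search>listtitle</Search>
</FieldSetting>
```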
Adding or not adding a schema element to the content_{locale} fields
The index fields content_{locale} are special fields: they are intended to represent the language-specific content of a resource. For XML contents, you can decide at the schema element level whether an element should be added to the content_{locale} fields or not. This is done with the optional attribute searchcontent, available for the tag <searchsetting>. The default value is true, indicating that the element's value is added to the content_{locale} field. Setting the value to false prevents the element from being added to the content_{locale} field.
Nested elements can be referenced in the form element="root element/nested element". The searchcontent attribute can be set in both the nested and the root schema; the root schema's value overrides the value from the nested schema.
Here's an example configuration, telling that a Date schema element should not be added to the content_{locale} field, while Title and Text should be added.
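A minimal sketch of such a configuration, using only the searchcontent attribute described above, could look like this:

```xml
<searchsettings>
    <!-- Date is excluded from the content_{locale} fields -->
    <searchsetting element="Date" searchcontent="false" />
    <!-- Title and Text are included (true is also the default) -->
    <searchsetting element="Title" searchcontent="true" />
    <searchsetting element="Text" searchcontent="true" />
</searchsettings>
```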
Simplest way to map a schema element to a search field
For each schema element, you can specify a Solr search field that should hold the (possibly processed) value of that element. Here is a very simple example:
<searchsettings>
    <searchsetting element="Title">
        <solrfield targetfield="atitle" />
    </searchsetting>
    <!-- more search settings here -->
</searchsettings>
The result of the above configuration is: for each content of the type defined by the schema, an extra language-specific index field atitle_{locale} is added, e.g., atitle_en. The fields have type text_{locale}, e.g., text_en.
Specifying the search field's type
If you specify only the attribute targetfield in the node <solrfield>, the target field will have type text_{locale}. But you can also specify a different field type for the target field, using the attribute sourcefield.
<searchsetting element="Date" searchcontent="true">
    <solrfield targetfield="date" sourcefield="*_dt" />
</searchsetting>
The result of the example configuration is that locale-specific fields date_{locale}_dt, e.g., date_en_dt, are added to the index. These have type date. Having search fields of different types is important for your search. For example, a range search over dates requires fields of type date.
Sometimes it is also useful to index the same value more than once and with different types, e.g., as string and as text, to allow for phrase search and for search of single words. To index schema elements multiple times, you can use the attribute copyfields.
<searchsetting element="Title" searchcontent="true">
    <solrfield targetfield="atitle" copyfields="*_s" />
</searchsetting>
In the example, the element Title will be indexed as a field of type text_{locale} and as a field of type string. The value is copied before any processing involved in the indexing procedure takes place. Thus, in the example, the result for the English locale is:
- the field atitle_en, which contains the content of the element Title processed as text according to the English locale
- the field atitle_en_s, which contains a one-to-one copy of the value of the Title element.
In the attribute copyfields you can provide more than one field by separating the respective fields with commas.
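A sketch of a comma-separated copyfields value is shown below. It assumes that the dynamic field patterns *_s and *_sort are both defined in your Solr schema; check your schema before relying on them.

```xml
<searchsetting element="Title" searchcontent="true">
    <!-- Copies the raw Title value to atitle_{locale}_s and atitle_{locale}_sort
         (assumption: both dynamic field patterns exist in the Solr schema) -->
    <solrfield targetfield="atitle" copyfields="*_s,*_sort" />
</searchsetting>
```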
Additional attributes of <solrfield> and attribute overview
There are more options to specify the indexing behavior for a content type. In particular, the attributes locale, default and boost can be given for the node <solrfield>. Moreover, since OpenCms 9.5.1, it is possible to add a Solr field not only to the content itself, but also to the container pages the content is placed on.
Attributes of the <solrfield> element
targetfield (required) | The attribute |
locale (optional) | If not provided, a search field is created for every locale the content is defined in. If you specify a locale in this parameter, the search field is created only for that locale, even if the content exists in other locales as well. |
sourcefield (optional) | Use this option to change the type of the search field (see the section on specifying the search field's type). Provide |
copyfields (optional) | The attribute |
default (optional) | This attribute sets a default value for the field that is used where the corresponding XML content field is empty. |
boost (optional) | Sets a boost for the resulting Solr field. See also the Solr wiki. |
addto (optional) | This attribute specifies to which documents (search index entries) the field should be added. By default, it is added to the document of the XML content. But you can also add it to all container pages the content is placed on. Set
Adding search fields to container pages is particularly useful if you have a faceted search over container pages. You can build facets over fields that are filled from XML contents, as is done, for example, for this documentation. If you add a search field to container pages, make sure to use a multivalued field: think about two contents on a page mapping to the same field of the container page's document. |
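To illustrate how several of these attributes combine, here is a hedged sketch. The element name Category, the target field name, the default value and the boost value are hypothetical, and the exact value format accepted by boost is an assumption.

```xml
<searchsetting element="Category" searchcontent="false">
    <!-- Hypothetical example: index Category only for the English locale as a
         string field, fall back to "none" if the element is empty, and boost it -->
    <solrfield targetfield="category" locale="en" sourcefield="*_s"
               default="none" boost="2.0" />
</searchsetting>
```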
Indexing more than schema elements
OpenCms allows you to map more than single schema elements to a search field. The node <solrfield> takes a series of <mapping> sub-nodes. In each such sub-node, you can specify which value should be mapped to the Solr target field. Values can be the whole content or schema elements, properties and attributes. Moreover, you can specify your own mappings. All values (of all <mapping> nodes) are stored one after another in the (multi-valued) Solr target field.
Here is an example of such a mapping definition:
<searchsetting element="NewInVersion" searchcontent="false">
    <solrfield targetfield="newInVersion" sourcefield="*_l" addto="page,element">
        <mapping type="dynamic" class="com.alkacon.opencms.documentation.search.CmsVersionNumberSearchFieldMapping">NewInVersion</mapping>
    </solrfield>
</searchsetting>
In the example, a dynamic mapping is specified. The class com.alkacon.opencms.documentation.search.CmsVersionNumberSearchFieldMapping (which implements I_CmsSearchFieldMapping) specifies how to convert version numbers to long integers. It takes a parameter string that is specified in the body of the <mapping> node (or alternatively given as the param attribute). In general, this parameter is an arbitrary string that has to fit the requirements of your class that implements the dynamic mapping. In the example, the class takes the name of the schema element that contains the version number as a string.
Values for the type attribute of the <mapping> element
- item
  Map a structured content item to a Solr field.
- attribute
  Map a resource attribute to a Solr field. Possible arguments are dateReleased and dateExpired.
- content
  Map the XML content to the target field. This value expects no argument.
- property
  Map the value of a resource property to the Solr target field.
- property-search
  Search the parents of the resource for the value of the passed resource property and map this value to the Solr target field.
- dynamic
  Use an instance of the interface I_CmsSearchFieldMapping to map content if the requirements for the field mapping are more "dynamic" than just: static piece of content -> specified field defined in the Solr schema.
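The sketch below combines several of the mapping types listed above in one target field. The target field name, the property name and the element name are hypothetical, and passing the argument in the node body for the non-dynamic types is an assumption modeled on the dynamic example shown earlier.

```xml
<searchsetting element="Title" searchcontent="true">
    <solrfield targetfield="combined">
        <!-- value of the schema element Title -->
        <mapping type="item">Title</mapping>
        <!-- the whole XML content; no argument expected -->
        <mapping type="content" />
        <!-- value of the (hypothetical) resource property Description -->
        <mapping type="property">Description</mapping>
        <!-- same property, looked up on the parent folders as well -->
        <mapping type="property-search">Description</mapping>
        <!-- release date attribute of the resource -->
        <mapping type="attribute">dateReleased</mapping>
    </solrfield>
</searchsetting>
```

All mapped values end up one after another in the same target field, so the field behind the target field name should be multi-valued.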