OpenCms documentation

Search settings

OpenCms allows you to configure how your content is indexed by the integrated Solr search engine. In particular, you can add indexing information to the content's schema definition. This lets you define whether the content should be found only via the container page it resides on or also as content itself, and which content elements are mapped to which index fields.

Adjusting the search settings to fit your needs requires some background on the Solr search engine. Whenever you implement your own search functionality, adjusting the search settings is an important task. Otherwise you will almost certainly get unintended search results.

OpenCms ships with a default configuration for indexing resources. The configuration is optimized with respect to the structure of OpenCms resources. For example, properties are indexed by default. However, some choices have no good overall default. In particular for XML contents, the answers to the following questions are specific to the actual use case:

  • Which schema elements of a structured content should not be indexed at all?
  • Which schema elements of a structured content should be mapped to special index fields (of special type)?
  • Should an XML content be findable as the content itself, only via the container page the content is located on, or both? (If you do not want to find your content via the container page, there is a setting available in the formatter configuration.)

Since the answers to these questions depend on your use case, you have to choose the settings yourself. As guidelines, consider:

  • If you have a Boolean value or a date stored in your content, you may not want to index it, at least not in the general content field.
  • If you have detail pages for contents, you usually want the content to be findable on its own. Otherwise this may not be the case.
  • If you want to have facets, you may map schema elements to special fields that you use for faceting.

By default

  • The content of all schema elements of one language version will be stored in one language-specific text field called content_{locale}, e.g., content_en. (Dates and Boolean values should usually be omitted here; how to do this is explained below.)
  • The content is indexed such that it can be found stand-alone (and the container page it is located on is findable as well).

The easiest way to adjust search settings is via field settings. With field settings you don't have the full flexibility of the search settings syntax explained below, but most use cases are covered. In the <FieldSetting> sub-node <Search>, the following values are allowed:

true: The field is included in the content field of the indexed document. This is the same as adding searchcontent="true" to the <searchsetting> node.

false: The content of the field is not included in the content fields of the indexed document. This is the same as adding searchcontent="false" to the <searchsetting> node.

listdate: Use this setting for elements holding dates only. The element's value is indexed in the fields instancedate_{locale}_dt and instancedatecurrenttill_{locale}_dt. These are the fields the integrated list type uses for sorting by date, and they are also the recommended fields for sorting by date in almost every Solr search. The element's value is not added to the content fields.

listorder: Use this setting for elements holding integer values only. The element's value is indexed in the field order_{locale}_i. This field is used by the integrated list type to sort by order.

listtitle: Use this setting for elements holding (title) strings. The element's value is indexed in the field disptitle_{locale}_sort, which is a special field supporting alphabetical sorting. It is used for sorting by title when using the integrated list. Moreover, the element's value is added to the content fields.

Use this setting for elements holding Geo coordinates. The element's value must be either in the form {latitude},{longitude} or it must be the JSON value of the Location picker widget. The value is indexed in the field geocoords_loc, which is a special field for Geo queries.

The special short-cuts for listdate, listorder and listtitle are translated to normal searchsetting nodes via string template as defined here.

By default, OpenCms creates the same index fields for each resource. These are, of course, not specific to the content's structure. You can look up these fields here. If you want to allow sophisticated search queries for a specific content type, however, more index fields can be configured. Configuration is done with the <searchsettings> element and its child elements as explained below.

If the <searchsettings> element is not present at all, the internal default configurations are applied to the content.

If the attribute containerPageOnly is set to true, the contents of the schema's type are not findable themselves. But if you search for the content's text (or other values), you'll still find the container pages where the content is placed, unless the formatter in use is configured to not index the content. If the attribute is set to false, the contents can also be found as the contents themselves. Default is false.

<searchsettings containerPageOnly="false">
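As a minimal sketch of where this attribute might appear in practice: the fragment below assumes that the <searchsettings> node sits inside the xsd:appinfo annotation of the content's schema file, and Title and Text are placeholder element names. With containerPageOnly="true", contents of this type would only be found via the container pages they are placed on.

```xml
<xsd:annotation>
  <xsd:appinfo>
    <!-- Contents of this type are not findable themselves, only via their container pages -->
    <searchsettings containerPageOnly="true">
      <!-- Title and Text are placeholder element names from a hypothetical schema -->
      <searchsetting element="Title" />
      <searchsetting element="Text" />
    </searchsettings>
  </xsd:appinfo>
</xsd:annotation>
```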

The index fields content_{locale} are special fields. They collect all textual data of one language of a content into a common field and thereby represent the language-specific content of a resource. For each content element, you can decide whether its textual data should be added to the content_{locale} fields. This is done with the optional attribute searchcontent, available for the <searchsetting> tag. The default value is true, indicating that the element's value is added to the content_{locale} fields.

Here's an example configuration, where Date is not added to the content_{locale} field whilst Title and Text are added.

<searchsettings>
   <searchsetting element="Title" /> <!-- searchcontent is true by default -->
   <searchsetting element="Text" searchcontent="true" />
   <searchsetting element="Date" searchcontent="false" />
   <!-- more search settings here -->
</searchsettings>
The searchcontent attribute can be set in nested and root schema, where the root schema's value is overwritten by the value from the nested schema.

For each schema element, one can define additional Solr fields that store the (possibly processed) value of that element. The result of the example below is the following: for each locale, an extra Solr field atitle_{locale}, e.g. atitle_en, is added and filled with the locale-specific value of the content's Title field.

<searchsettings>
   <searchsetting element="Title">
     <solrfield targetfield="atitle" />
   </searchsetting>
   <!-- more search settings here -->
</searchsettings>

If you specify only the attribute targetfield in the node <solrfield>, the target field will have type text_{locale}. But, you can also specify a different type for the target field, using the attribute sourcefield.

<searchsetting element="Date" searchcontent="true">
     <solrfield targetfield="date" sourcefield="*_dt" />
</searchsetting>

The result of the example configuration is that a locale-specific field date_{locale}_dt, e.g. date_en_dt, is added to the index for each available locale. The *_dt definition causes all these fields to be of type date. The fields can be used to implement a date range search, for example.

Sometimes it is useful to have the same element indexed more than once and with different types, e.g., as string and as text to allow for phrase search and for search of single words. To index schema elements multiple times, the copyfields attribute can be used.
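The guideline to keep dates out of the general content field can be combined with such a typed field. The following is only a sketch: it reuses the Date element and the date target field from the example above and merely switches searchcontent to false, so that the value would be available in date_{locale}_dt for range queries but would not be added to content_{locale}.

```xml
<searchsettings>
  <!-- Keep the date out of content_{locale}, but index it in a typed date field -->
  <searchsetting element="Date" searchcontent="false">
    <solrfield targetfield="date" sourcefield="*_dt" />
  </searchsetting>
</searchsettings>
```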

<searchsetting element="Title" searchcontent="true">
     <solrfield targetfield="atitle" copyfields="*_s" />
</searchsetting>

In the example, the element Title will be indexed as a field of type text_{locale} (default) and additionally as a field of type string due to the *_s notation. For the English locale, for example, the result would be the following:

  • the field atitle_en, which contains the content of the Title element processed as text according to the English locale
  • the field atitle_en_s, which contains a one-to-one copy of the value of the Title element

In the copyfields attribute, you can provide more than one field by separating the fields with commas.
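A comma-separated copyfields value could look like the sketch below. The *_s entry is taken from the example above. The *_sort entry is an assumption: it presumes that a sortable field type with this suffix is defined in your Solr schema, in analogy to the disptitle_{locale}_sort field mentioned earlier.

```xml
<searchsetting element="Title" searchcontent="true">
  <!-- Index Title as text (default) and copy it to a string field and, hypothetically, a sort field -->
  <solrfield targetfield="atitle" copyfields="*_s,*_sort" />
</searchsetting>
```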

There are more options to specify the indexing behavior for a content type. In particular, the attributes locale, default and boost can be set for <solrfield>. Moreover, it is possible to add a Solr field not only to the content itself, but also to the containerpages the content is placed on.

The attribute targetfield defines the name of the Solr field where the mapped content is written to. The actual resulting Solr field name has the form {targetfield}_{locale}. The content is written into a locale-specific text field for each locale the document is defined in. E.g.: If the content is available in the locales English (en) and German (de), the values will be mapped to the fields {targetfield}_en and {targetfield}_de.

If the attribute locale is not provided, a search field is created for every locale the content is defined in. If you specify a locale in this attribute, the search field is created for that locale only, even if the content exists in other locales as well.

Use the attribute sourcefield to change the type of the search field. Provide *_{type suffix} as value, e.g., *_dt for dates. The resulting Solr field name will be {targetfield}_{locale}_{type suffix}. This makes it possible to use Solr field types other than the field types of *_de, *_en, *_fr, ... (which would otherwise be the destination fields of the mapping).

The attribute copyfields is used to duplicate the result of the mapping to other Solr fields. See also in the Solr wiki.

The attribute default sets a default value for the field, which is used when the corresponding XML content field is empty.

The attribute boost sets a boost for the resulting Solr field. See also in the Solr wiki.

The attribute addto specifies to which documents (search index entries) the field should be added. By default, it is added to the document of the XML content. But you can also add it to all container pages the content is placed on. Set addto="page,element" to add the field to the documents of both the container pages and the XML content. Choose addto="page" to add it only to container pages.

Adding search fields to container pages is particularly useful if you have a faceted search over container pages. You can build facets over fields that are filled from XML contents, as is done, for example, for this documentation.

If you add a search field to container pages, make sure to use a multivalued field: two contents on the same page may map to the same field of the container page's document.
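The attributes described above can be combined on a single <solrfield> node. The following sketch uses only attributes named in this section. The element names Title and Category, the target field names and the attribute values are placeholders chosen for illustration.

```xml
<searchsettings>
  <!-- Index Title for the English locale only, with a fallback value and a boost -->
  <searchsetting element="Title" searchcontent="true">
    <solrfield targetfield="atitle" locale="en" default="Untitled" boost="2.0" />
  </searchsetting>
  <!-- Add the Category value to the content's document and to its container pages -->
  <searchsetting element="Category" searchcontent="false">
    <!-- For page documents, additionally pick a multivalued type via sourcefield (suffix depends on your Solr schema) -->
    <solrfield targetfield="category" addto="page,element" />
  </searchsetting>
</searchsettings>
```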

OpenCms allows you to map more than single schema elements to a search field. The node <solrfield> takes a series of <mapping> sub nodes. In each such sub node, you can specify which value should be mapped to the Solr target field. Values can be the whole content or schema elements, properties and attributes. Moreover, you can specify your own mappings. All values (of all <mapping> nodes) are stored one after another in the (multi-valued) Solr target field.

Note that the mappings for elements of nested contents can only be defined in the root content's schema document, and not in the nested schema document. 

Here is an example of such a mapping definition:

<searchsetting element="NewInVersion" searchcontent="false">
	<solrfield targetfield="newInVersion" sourcefield="_l" addto="page,element">
		<mapping type="dynamic" class="com.alkacon.opencms.documentation.search.CmsVersionNumberSearchFieldMapping">NewInVersion</mapping>
	</solrfield>
</searchsetting>

In the example, a dynamic mapping is specified. The class com.alkacon.opencms.documentation.search.CmsVersionNumberSearchFieldMapping (that implements I_CmsSearchFieldMapping) specifies how to convert version numbers to long integers. It takes a parameter string that is specified in the body of the <mapping> node (or alternatively given as param attribute). In general, this parameter is an arbitrary string that has to fit the requirements of your class that implements the dynamic mapping. Concerning the example, the class takes the schema element that contains the version number as string.
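For comparison, the same mapping might be written with the parameter given in the param attribute instead of the node body. This is a sketch derived from the example above; it does not cover how the class behaves if both forms are present.

```xml
<searchsetting element="NewInVersion" searchcontent="false">
  <solrfield targetfield="newInVersion" sourcefield="_l" addto="page,element">
    <!-- Parameter passed via the param attribute rather than as the body of the mapping node -->
    <mapping type="dynamic" class="com.alkacon.opencms.documentation.search.CmsVersionNumberSearchFieldMapping" param="NewInVersion" />
  </solrfield>
</searchsetting>
```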

The following mapping types can be set in the type attribute of a <mapping> node:

item: Maps a structured content item to the Solr target field.

attribute: Maps a resource attribute to the Solr target field. Possible arguments are dateReleased and dateExpired.

content: Maps the XML content to the Solr target field. This type expects no argument.

property: Maps the value of a resource property to the Solr target field.

property-search: Searches the parents of the resource for the value of the passed resource property and maps this value to the Solr target field.

dynamic: Uses an implementation of the interface I_CmsSearchFieldMapping to map content if the requirements for the field mapping are more "dynamic" than mapping a static piece of content to a specified field defined in the Solr schema.
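Several mappings can be collected in one target field. The sketch below is illustrative only: the type values item, property and property-search are used on the assumption that they are the identifiers for the corresponding descriptions above, so check them against your OpenCms version. The element name Teaser and the property names Description and Keywords are placeholders.

```xml
<searchsetting element="Teaser">
  <solrfield targetfield="ateaser">
    <!-- Value of the schema element Teaser -->
    <mapping type="item">Teaser</mapping>
    <!-- Value of the property Description set on the resource itself -->
    <mapping type="property">Description</mapping>
    <!-- Value of the property Keywords, looked up on the parents of the resource -->
    <mapping type="property-search">Keywords</mapping>
  </solrfield>
</searchsetting>
```

All three values would be stored one after another in the (multi-valued) target field, as described for <mapping> nodes above.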