In OpenCms since: 7.5 Documented since: 9.5 Latest revision for: 9.5.2 Valid for OpenCms: 10.5.3

OpenCms allows you to configure how your content is indexed by the integrated Solr search engine. In the content's schema definition you can specify, on the one hand, whether the content should be found only via the container page it resides on or also as content itself and, on the other hand, which content elements are mapped to which index fields.

Adjusting the search settings to fit your needs requires some background on the Solr search engine. Whenever you implement your own search functionality, adjusting the search settings is an important task; otherwise you will almost certainly get unintended search results.

Why should I adjust the search settings?

OpenCms ships with a default configuration for indexing resources. The configuration is optimized with respect to the structure of OpenCms resources; for example, properties are indexed by default. However, some choices have no generally good default. For XML contents in particular, the answers to the following questions depend on the actual use case:

  • Which schema elements of a structured content should not be indexed at all?
  • Which schema elements of a structured content should be mapped to special index fields (of special type)?
  • Should an XML content be findable as the content itself, only via the container page it is located on, or both? (If you do not want to find your content via the container page, there is a setting available in the formatter configuration.)

The answers to these questions mainly depend on your use case, so you have to choose the settings yourself. Some guidelines:

  • If you have a Boolean value or a date stored in your content, then you may not want to index them - at least not in the general content field.
  • If you have detail pages for contents, you usually want to find the content on its own. Otherwise, this may not be the case.
  • If you want to have facets, you may map schema elements to special fields that you use for faceting.

By default

  • The content of all schema elements of one language version is stored in one language-specific text field called content_{locale}, e.g., content_en. (Dates and Boolean values should usually be omitted here; how to do that is explained further down.)
  • The content is indexed such that it can be found stand-alone (the container page it is located on is findable as well).

 

Where to adjust the search settings?

You can adjust search settings by adding a <searchsettings> node under xsd:annotation/xsd:appinfo in a content type's schema definition, and for each schema element where special indexing behavior has to be configured, a <searchsetting> sub-node. Examples follow, as we explain the settings in detail.
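
As a rough orientation, the placement described above might look like the following sketch. The schema is heavily abbreviated and the element name Title is only a placeholder; the one thing taken from the description is the position of <searchsettings> inside xsd:annotation/xsd:appinfo.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">

   <!-- element and type definitions of the content type go here (omitted) -->

   <xsd:annotation>
      <xsd:appinfo>
         <!-- other appinfo nodes may be present alongside searchsettings -->
         <searchsettings>
            <!-- one searchsetting per schema element that needs special indexing;
                 "Title" is a placeholder element name -->
            <searchsetting element="Title" />
         </searchsettings>
      </xsd:appinfo>
   </xsd:annotation>

</xsd:schema>
```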

Possible search settings

The settings can be grouped by

  • settings that determine how the content can be found at all
  • settings that determine which information from the content is indexed in which way

Configure how to find the content

How an XML content can be found is configured via the attributes of the <searchsettings> node. Omitting the node altogether is equivalent to adding it with the default values.

Attributes of <searchsettings>
containerPageOnly (optional)

If set to true, contents of the schema's type are not findable themselves. However, if you search for the content's text (or other values), you will still find the container pages where the content is placed, unless the formatter used is configured not to index the content. If set to false, the contents can also be found as contents themselves.
Default: false
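
A minimal sketch of a content type that should only be found via its container pages could look as follows. It uses only the attribute described above; the comment marks where element-specific settings would go.

```xml
<searchsettings containerPageOnly="true">
   <!-- searchsetting nodes for individual schema elements can still be added here -->
</searchsettings>
```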

Configure how schema elements are indexed

By default, OpenCms creates the same index fields for each resource. These fields are, of course, not specific to the content's structure. You can look up these fields here. However, if you want to allow sophisticated search queries for a specific content type, more index fields are needed. For each content element, you can specify the fields to create via a <searchsetting> node.

Adding or not adding a schema element to the content_{locale} fields

The index fields content_{locale} are special: they are intended to represent the language-specific content of a resource. For XML contents, you can decide at the schema element level whether an element should be added to the content_{locale} fields. This is done via the optional attribute searchcontent of the tag <searchsetting>. The default value is true, meaning that the element's value is added to the content_{locale} field. Setting the value to false prevents the element from being added.

Note that, at the moment, the setting cannot be applied to elements of nested contents from within the root content's schema. Such settings must be made in the schema of the nested content.

Here is an example configuration specifying that a Date schema element should not be added to the content_{locale} field, while Title and Text should be.

<searchsettings>
   <searchsetting element="Title" /> <!-- searchcontent is true by default -->
   <searchsetting element="Text" searchcontent="true" />
   <searchsetting element="Date" searchcontent="false" />
   <!-- more search settings here -->
</searchsettings>

Simplest way to map a schema element to a search field

For each schema element, you can specify a Solr search field that should hold the (possibly processed) value of that element. Here is a very simple example:

<searchsettings>
   <searchsetting element="Title">
     <solrfield targetfield="atitle" />
   </searchsetting>
   <!-- more search settings here -->
</searchsettings>

The result of the above configuration is: For each content of the type defined with the schema, an extra language specific index field atitle_{locale} is added, e.g., atitle_en. The fields have type text_{locale}, e.g., text_en.

Specifying the search field's type

If you specify only the attribute targetfield in the node <solrfield>, the target field will have the type text_{locale}. However, you can specify a different field type for the target field using the attribute sourcefield.

<searchsetting element="Date" searchcontent="true">
   <solrfield targetfield="date" sourcefield="*_dt" />
</searchsetting>

The result of the example configuration is that locale-specific fields date_{locale}_dt, e.g., date_en_dt, are added to the index. These have the type date. Having search fields of different types is important for your search. For example, a range search over dates requires fields of type date.

Sometimes it is also useful to have the same value indexed more than once and with different types, e.g., as string and as text, to allow both for phrase search and for search of single words. To index schema elements multiple times, you can use the attribute copyfields.

<searchsetting element="Title" searchcontent="true">
   <solrfield targetfield="atitle" copyfields="*_s" />
</searchsetting>

In the example, the element Title will be indexed as a field of type text_{locale} and as a field of type string. The value is copied before any processing involved in the indexing procedure takes place. Thus, in the example, the results for the English locale are

  • the field atitle_en, which contains the content of the element Title processed as text according to the English locale
  • the field atitle_en_s, which contains a one-to-one copy of the value of the Title element.

In the attribute copyfields you can provide more than one field by separating the respective fields by comma.
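
As a sketch, a comma-separated copyfields value might look like this. The pattern *_s is taken from the example above; *_txt is a hypothetical second pattern and only works if your Solr schema defines a matching dynamic field.

```xml
<searchsetting element="Title" searchcontent="true">
   <!-- *_txt is an assumed dynamic field pattern; replace it with one defined in your Solr schema -->
   <solrfield targetfield="atitle" copyfields="*_s,*_txt" />
</searchsetting>
```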

Additional attributes of <solrfield> and attribute overview

There are more options to specify the indexing behavior for a content type. In particular, the attributes locale, default and boost can be given for the node <solrfield>. Moreover, since OpenCms 9.5.1, it is possible to add a Solr field not only to the content itself, but also to the containerpages the content is placed on.

Attributes of the <solrfield> element
targetfield (required)

The attribute targetfield defines the name of the Solr field the mapped content is written to. The actual resulting Solr field name is {targetfield}_{locale}. The content is written into a locale-specific text field for each locale the document is defined in. For example, if the content is available in the locales English (en) and German (de), the values will be mapped to the fields {targetfield}_en and {targetfield}_de. See the section on the simplest way to map a schema element to a search field for an example.

locale (optional)

If not provided, a search field is created for every locale the content is defined in. If you specify a specific locale in this parameter, the search field is created only for the provided locale - even if the content exists in other locales as well.

sourcefield (optional)

Use this option to change the type of the search field (see the section on specifying the search field type). Provide *_{type suffix} as the value, e.g., *_dt for dates. The resulting Solr field name will be {targetfield}_{locale}_{type suffix}, e.g., date_en_dt. This makes it possible to use Solr field types other than those of *_de, *_en, *_fr, ... (which would otherwise be the destination fields of the mapping).

copyfields (optional)

The attribute copyfields is used to duplicate the result of the mapping to other Solr fields. See the section on specifying the search field type for an example, and the Solr wiki for background.

default (optional)

This attribute sets a default value for the field, which is used when the corresponding XML content element is empty.

boost (optional)

Sets a boost for the resulting Solr field. See also the Solr wiki.

addto (optional)

This attribute specifies to which documents (search index entries) the field should be added. By default, it is added to the document of the XML content. But you can also add it to all containerpages the content is placed on. Set addto="page,element" to add the field to the documents of containerpages and XML content. Choose addto="page" to add it only to containerpages.

 

Adding search fields to containerpages is particularly useful if you have a faceted search over containerpages. You can build facets over fields that are filled from XML contents, as is done, for example, for this documentation.

If you add a search field to containerpages, make sure to use a multivalued field: two contents on the same page may map to the same field of the containerpage's document.
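
The optional attributes described above can be combined in one <solrfield> node. The following is a sketch only: the element name Category, the target field category and all attribute values are hypothetical, and *_s merely stands in for a dynamic field pattern from your Solr schema (which should be multivalued when addto includes page).

```xml
<searchsetting element="Category" searchcontent="false">
   <!-- locale:  create the field only for English
        default: value used if the Category element is empty
        boost:   boost applied to the resulting field
        addto:   add the field to containerpages and to the content itself
        sourcefield "*_s" is a placeholder; pick a multivalued dynamic field from your schema -->
   <solrfield targetfield="category" sourcefield="*_s" locale="en" default="none" boost="2.0" addto="page,element" />
</searchsetting>
```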

Indexing more than schema elements

OpenCms allows you to map more than single schema elements to a search field. The node <solrfield> takes a series of <mapping> sub-nodes. In each such sub-node, you can specify which value should be mapped to the Solr target field. Values can be the whole content, schema elements, properties and attributes. Moreover, you can specify your own mappings. All values (of all <mapping> nodes) are stored one after another in the (multi-valued) Solr target field.

Here is an example of such a mapping definition:

<searchsetting element="NewInVersion" searchcontent="false">
   <solrfield targetfield="newInVersion" sourcefield="*_l" addto="page,element">
     <mapping type="dynamic" class="com.alkacon.opencms.documentation.search.CmsVersionNumberSearchFieldMapping">NewInVersion</mapping>
   </solrfield>
</searchsetting>

In the example, a dynamic mapping is specified. The class com.alkacon.opencms.documentation.search.CmsVersionNumberSearchFieldMapping (which implements I_CmsSearchFieldMapping) specifies how to convert version numbers to long integers. It takes a parameter string that is specified in the body of the <mapping> node (or, alternatively, given as the param attribute). In general, this parameter is an arbitrary string that has to fit the requirements of the class implementing the dynamic mapping. In the example, the parameter names the schema element that contains the version number as a string.

Values for the type attribute of the <mapping> element
item

Map a structured content item to a Solr field.

attribute

Map a resource attribute to a Solr field. Possible arguments are dateReleased and dateExpired.

content

Map the XML content to the target field. This value expects no argument.

property

Map the value of a resource property to the Solr target field.

property-search

Search the parents of the resource for the value of the passed resource property and map this value to the Solr target field.

dynamic

Use an implementation of the interface I_CmsSearchFieldMapping to map content if the requirements for the field mapping are more "dynamic" than simply mapping a static piece of content to a specified field defined in the Solr schema.
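
To illustrate how several of these mapping types might be combined in one <solrfield>, here is a hedged sketch. The element name Homepage, the target field ahtml and the property name search.special are placeholders chosen for illustration; only the type values and the attribute argument dateReleased come from the list above.

```xml
<searchsetting element="Homepage" searchcontent="true">
   <solrfield targetfield="ahtml">
     <!-- value of the schema element Homepage (placeholder name) -->
     <mapping type="item">Homepage</mapping>
     <!-- the whole XML content; takes no argument -->
     <mapping type="content" />
     <!-- value of the Title property of the resource -->
     <mapping type="property">Title</mapping>
     <!-- property value looked up on the resource and its parents (placeholder property name) -->
     <mapping type="property-search">search.special</mapping>
     <!-- release date attribute of the resource -->
     <mapping type="attribute">dateReleased</mapping>
   </solrfield>
</searchsetting>
```

All mapped values end up one after another in the same target field, so a combination like this only makes sense if the target field type can hold such mixed values.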
