org.apache.lucene.search.similarities.Similarity

org.opencms.search.CmsSearchSimilarity

public class CmsSearchSimilarity extends org.apache.lucene.search.similarities.Similarity

Reduces the importance of the computeNorm(FieldInvertState) factor for the CmsSearchField.FIELD_CONTENT field, while keeping the Lucene default for all other fields.

This implementation was added because the default length norm is apparently heavily biased toward short documents. Even if a term occurs the same number of times in two documents, the shorter document (containing fewer terms) can easily score three times as high as the longer one. This implementation reduces the influence of a document's term count on its score.
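To make the size of that bias concrete, the following sketch compares the square-root length norm of Lucene's classic TF-IDF model (ClassicSimilarity, 1/sqrt(numTerms), before lossy norm encoding) for a 100-term and a 900-term document with a logarithmically dampened norm. The dampened formula 1/log10(1000 + numTerms) and the class and method names are hypothetical, chosen only for illustration; this is not necessarily what CmsSearchSimilarity computes.

```java
// Hypothetical illustration of length-norm bias; not OpenCms or Lucene source code.
public class LengthNormDemo {

    // Square-root length norm, as in Lucene's ClassicSimilarity (before norm encoding).
    static double classicNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Hypothetical dampened norm: decays logarithmically with the number of terms.
    static double dampenedNorm(int numTerms) {
        return 1.0 / Math.log10(1000 + numTerms);
    }

    public static void main(String[] args) {
        int shortDoc = 100; // terms in the shorter document
        int longDoc = 900;  // terms in the longer document
        // With the same term frequency and idf, a classic TF-IDF score ratio reduces to the norm ratio.
        System.out.printf("classic ratio:  %.2f%n", classicNorm(shortDoc) / classicNorm(longDoc));
        System.out.printf("dampened ratio: %.2f%n", dampenedNorm(shortDoc) / dampenedNorm(longDoc));
    }
}
```

Running main prints a classic ratio of 3.00, matching the "3x" described above, and a dampened ratio of about 1.08.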

Inspired by Chuck Williams' WikipediaSimilarity.

Since: 6.0.0

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
org.apache.lucene.search.similarities.Similarity.SimScorer

Constructor Summary

Constructors

- CmsSearchSimilarity()
  Creates a new instance of the OpenCms search similarity.

Method Summary

- final long computeNorm(org.apache.lucene.index.FieldInvertState state)
  Special implementation of computeNorm that reduces the significance of the length norm for the CmsSearchField.FIELD_CONTENT field, while keeping the Lucene default for all other fields.
- boolean getDiscountOverlaps()
  Returns true iff overlap tokens are discounted from the document's length.
- org.apache.lucene.search.similarities.Similarity.SimScorer scorer(float boost, org.apache.lucene.search.CollectionStatistics collectionStats, org.apache.lucene.search.TermStatistics... termStats)
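The per-field behavior that computeNorm is described as having can be sketched as a simple dispatch on the field name. This is a minimal sketch under several assumptions: a hypothetical FieldStats class stands in for Lucene's FieldInvertState, "content" is assumed to be the value of CmsSearchField.FIELD_CONTENT, "title" is a hypothetical other field, the log-dampened formula is hypothetical, the square-root norm stands in for the Lucene default, and the encoding of the norm into a long is omitted.

```java
// Hypothetical sketch of a field-dependent length norm; not the OpenCms implementation.
public class FieldNormSketch {

    // Assumed value of CmsSearchField.FIELD_CONTENT.
    static final String FIELD_CONTENT = "content";

    // Whether overlap tokens are subtracted from the field length (compare getDiscountOverlaps()).
    static final boolean DISCOUNT_OVERLAPS = true;

    // Simplified, hypothetical stand-in for Lucene's FieldInvertState.
    static final class FieldStats {
        final String name;
        final int length;     // number of tokens indexed for the field
        final int numOverlap; // tokens with a position increment of 0

        FieldStats(String name, int length, int numOverlap) {
            this.name = name;
            this.length = length;
            this.numOverlap = numOverlap;
        }
    }

    static double computeNorm(FieldStats state) {
        int numTerms = DISCOUNT_OVERLAPS ? state.length - state.numOverlap : state.length;
        numTerms = Math.max(1, numTerms); // avoid division by zero for empty fields (a choice of this sketch)
        if (FIELD_CONTENT.equals(state.name)) {
            // Flatter, hypothetical norm for the content field.
            return 1.0 / Math.log10(1000 + numTerms);
        }
        // Square-root norm standing in for the Lucene default on all other fields.
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        for (String field : new String[] {"title", FIELD_CONTENT}) {
            double shortNorm = computeNorm(new FieldStats(field, 100, 0));
            double longNorm = computeNorm(new FieldStats(field, 900, 0));
            System.out.printf("%-8s short/long norm ratio: %.2f%n", field, shortNorm / longNorm);
        }
    }
}
```

Only the content field gets the flatter curve, so short titles or other metadata fields still benefit from the stronger length normalization.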

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- CmsSearchSimilarity
  
  public CmsSearchSimilarity()
  
  Creates a new instance of the OpenCms search similarity.
Method Details
- computeNorm
  
  public final long computeNorm(org.apache.lucene.index.FieldInvertState state)
  
  Special implementation of computeNorm that reduces the significance of the length norm for the CmsSearchField.FIELD_CONTENT field, while keeping the Lucene default for all other fields.
  
  Specified by:
  
  computeNorm in class org.apache.lucene.search.similarities.Similarity
- getDiscountOverlaps
  
  public boolean getDiscountOverlaps()
  
  Returns true iff overlap tokens are discounted from the document's length.
  Returns:
  
  true iff overlap tokens are discounted from the document's length.
  
  See Also:
  
  setDiscountOverlaps(boolean)
- scorer
  
  public org.apache.lucene.search.similarities.Similarity.SimScorer scorer(float boost, org.apache.lucene.search.CollectionStatistics collectionStats, org.apache.lucene.search.TermStatistics... termStats)
  
  Specified by:
  
  scorer in class org.apache.lucene.search.similarities.Similarity
  
  See Also:
  
  Similarity.scorer(float, org.apache.lucene.search.CollectionStatistics, org.apache.lucene.search.TermStatistics[])
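The overlap tokens mentioned under getDiscountOverlaps() are tokens that an analyzer emits at the same position as the previous token (position increment 0), for example injected synonyms. The following simplified model, with hypothetical names, counts the field length and the overlaps from a list of position increments, roughly as Lucene's indexing chain does when it fills FieldInvertState, and shows the length that would feed the norm with and without discounting.

```java
// Hypothetical, simplified model of how overlap tokens affect the length used for the norm.
public class OverlapSketch {

    // Returns {length, numOverlap} for a field, given each token's position increment.
    static int[] countTokens(int[] positionIncrements) {
        int length = 0;
        int numOverlap = 0;
        for (int increment : positionIncrements) {
            length++; // every token counts toward the raw length
            if (increment == 0) {
                numOverlap++; // stacked on the previous position, e.g. a synonym
            }
        }
        return new int[] {length, numOverlap};
    }

    // Length fed into the norm, depending on whether overlaps are discounted.
    static int normLength(int[] positionIncrements, boolean discountOverlaps) {
        int[] counts = countTokens(positionIncrements);
        return discountOverlaps ? counts[0] - counts[1] : counts[0];
    }

    public static void main(String[] args) {
        // "fast car automobile": "automobile" is a synonym stacked on "car" (increment 0).
        int[] increments = {1, 1, 0};
        System.out.println("discounted:     " + normLength(increments, true));  // 2
        System.out.println("not discounted: " + normLength(increments, false)); // 3
    }
}
```

With discounting enabled, stacked synonyms do not make a document look longer and therefore do not lower its norm.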
