Package org.opencms.search
Class CmsSearchSimilarity
- java.lang.Object
  - org.apache.lucene.search.similarities.Similarity
    - org.opencms.search.CmsSearchSimilarity
public class CmsSearchSimilarity extends org.apache.lucene.search.similarities.Similarity
Reduces the importance of the computeNorm(FieldInvertState) factor for the CmsSearchField.FIELD_CONTENT field, while keeping the Lucene default for all other fields.
This implementation was added because the default length norm appears to be heavily biased toward short documents. With the default, even if a term occurs the same number of times in two documents, the shorter document (containing fewer terms) can easily score 3x as high as the longer one. This implementation reduces the influence of the document's term count on the score.
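The bias described above can be illustrated with a small sketch. It assumes the classic TF-IDF length norm of 1/sqrt(numTerms), which this page does not state explicitly; the class name, method name, and the two document sizes are hypothetical and chosen only for illustration.

```java
// Illustrative only: plain Java, no Lucene dependency.
class LengthNormDemo {

    // Assumed classic length norm: 1 / sqrt(number of terms in the field).
    static double classicLengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        int shortDoc = 100;  // hypothetical short document (term count)
        int longDoc = 1000;  // hypothetical long document (term count)

        double shortNorm = classicLengthNorm(shortDoc);
        double longNorm = classicLengthNorm(longDoc);

        // With equal term frequency, the score ratio is driven by the norm ratio,
        // which here is sqrt(1000 / 100) = sqrt(10), a little over 3.
        System.out.println("short norm: " + shortNorm);
        System.out.println("long norm:  " + longNorm);
        System.out.println("ratio:      " + (shortNorm / longNorm));
    }
}
```

Under these assumptions a tenfold difference in length already yields a norm ratio of roughly 3, which matches the order of magnitude mentioned above.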
Inspired by Chuck Williams' WikipediaSimilarity.
- Since:
- 6.0.0
-
-
Constructor Summary
Constructor   Description
CmsSearchSimilarity()
Creates a new instance of the OpenCms search similarity.
-
Method Summary
Modifier and Type   Method   Description
long
computeNorm(org.apache.lucene.index.FieldInvertState state)
Special implementation for "compute norm" to reduce the significance of this factor for the CmsSearchField.FIELD_CONTENT field, while keeping the Lucene default for all other fields.
boolean
getDiscountOverlaps()
Returns true iff overlap tokens are discounted from the document's length.
org.apache.lucene.search.similarities.Similarity.SimScorer
scorer(float boost, org.apache.lucene.search.CollectionStatistics collectionStats, org.apache.lucene.search.TermStatistics... termStats)
void
setDiscountOverlaps(boolean v)
Sets whether overlap tokens (tokens with 0 position increment) are ignored when computing norm.
-
-
-
Constructor Detail
-
CmsSearchSimilarity
public CmsSearchSimilarity()
Creates a new instance of the OpenCms search similarity.
-
-
Method Detail
-
computeNorm
public final long computeNorm(org.apache.lucene.index.FieldInvertState state)
Special implementation for "compute norm" to reduce the significance of this factor for the CmsSearchField.FIELD_CONTENT field, while keeping the Lucene default for all other fields.
- Specified by:
computeNorm
in class org.apache.lucene.search.similarities.Similarity
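The per-field behaviour described for computeNorm can be sketched as follows. This is not the OpenCms source: the exact formula used for the content field is not given on this page, so the logarithmic norm below is an assumption chosen for illustration, as are the class name, the helper names, and the "content" string standing in for CmsSearchField.FIELD_CONTENT. The sketch also works on plain floating-point values and leaves out the encoding into the long that the real method returns.

```java
// Illustrative only: plain Java, no Lucene or OpenCms dependency.
class FlattenedNormSketch {

    // Assumed stand-in for CmsSearchField.FIELD_CONTENT.
    static final String CONTENT_FIELD = "content";

    // Assumed default for all other fields: 1 / sqrt(numTerms).
    static double classicLengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Hypothetical flattened norm: decays only logarithmically with length.
    static double flattenedLengthNorm(int numTerms) {
        return 3.0 / Math.log10(1000.0 + numTerms);
    }

    // Dispatch on the field name: special case for content, default otherwise.
    static double lengthNorm(String fieldName, int numTerms) {
        return CONTENT_FIELD.equals(fieldName)
            ? flattenedLengthNorm(numTerms)
            : classicLengthNorm(numTerms);
    }

    public static void main(String[] args) {
        for (int numTerms : new int[] {100, 1000, 10000}) {
            System.out.println(numTerms + " terms: content="
                + lengthNorm(CONTENT_FIELD, numTerms)
                + " other=" + lengthNorm("title", numTerms));
        }
    }
}
```

With a formula of this shape, a document of 100 terms and one of 1000 terms differ by less than 10% in their content norm, while the default norm for other fields is left unchanged.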
-
getDiscountOverlaps
public boolean getDiscountOverlaps()
Returns true iff overlap tokens are discounted from the document's length.
- Returns:
- true iff overlap tokens are discounted from the document's length.
- See Also:
setDiscountOverlaps(boolean)
-
scorer
public org.apache.lucene.search.similarities.Similarity.SimScorer scorer(float boost, org.apache.lucene.search.CollectionStatistics collectionStats, org.apache.lucene.search.TermStatistics... termStats)
- Specified by:
scorer
in class org.apache.lucene.search.similarities.Similarity
- See Also:
Similarity.scorer(float, org.apache.lucene.search.CollectionStatistics, org.apache.lucene.search.TermStatistics[])
-
setDiscountOverlaps
public void setDiscountOverlaps(boolean v)
Sets whether overlap tokens (tokens with 0 position increment) are ignored when computing norm. By default this is true, meaning overlap tokens do not count when computing norms.
- Parameters:
v - if true, tokens with position increment 0 are ignored when computing the norm, otherwise they are not ignored.
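The effect of this flag on the document length that feeds into the norm can be sketched in plain Java. The class and method names are hypothetical; in Lucene the two inputs would typically come from FieldInvertState.getLength() and FieldInvertState.getNumOverlap(), and whether CmsSearchSimilarity computes the length in exactly this way is an assumption.

```java
// Illustrative only: plain Java, no Lucene dependency.
class OverlapLengthSketch {

    // length: total number of tokens indexed for the field.
    // numOverlap: tokens with position increment 0 (for example injected synonyms).
    static int effectiveLength(int length, int numOverlap, boolean discountOverlaps) {
        return discountOverlaps ? length - numOverlap : length;
    }

    public static void main(String[] args) {
        int length = 10;    // hypothetical field with 10 tokens
        int numOverlap = 3; // 3 of them share a position with another token

        System.out.println("discounted:     " + effectiveLength(length, numOverlap, true));
        System.out.println("not discounted: " + effectiveLength(length, numOverlap, false));
    }
}
```

With discounting enabled, synonyms or other tokens stacked on the same position do not make a document look longer, so they do not lower its norm.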
-
-