Package org.opencms.search
Class CmsSearchSimilarity
java.lang.Object
org.apache.lucene.search.similarities.Similarity
org.opencms.search.CmsSearchSimilarity
Reduces the importance of the computeNorm(FieldInvertState) factor for the CmsSearchField.FIELD_CONTENT field, while keeping the Lucene default for all other fields.
This implementation was added because the default length norm appears to be heavily biased toward small documents. With the default, even if a term occurs the same number of times in two documents, the smaller document (the one containing fewer terms) can easily score 3x as high as the longer one. This implementation reduces the influence of a document's term count on its score.
Inspired by Chuck Williams' WikipediaSimilarity.
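The length-norm bias described above can be illustrated with a small standalone sketch. It assumes the classic TF-IDF length norm of 1 / sqrt(numTerms), as used by Lucene's ClassicSimilarity, and compares it with a logarithmically dampened norm. The class name LengthNormSketch, the method names, and the dampened formula are hypothetical and chosen for illustration only; they are not taken from the CmsSearchSimilarity source.

```java
// Illustrative sketch only: not the actual CmsSearchSimilarity implementation.
final class LengthNormSketch {

    private LengthNormSketch() {
        // static helpers only
    }

    /** Classic TF-IDF style length norm: 1 / sqrt(numTerms). */
    public static double classicNorm(int numTerms) {
        return 1.0 / Math.sqrt(Math.max(numTerms, 1));
    }

    /**
     * Hypothetical dampened norm (an assumption for this sketch):
     * the logarithm makes the factor change far more slowly with length.
     */
    public static double dampenedNorm(int numTerms) {
        return 3.0 / Math.log10(1000.0 + Math.max(numTerms, 1));
    }

    public static void main(String[] args) {
        int shortDoc = 100;   // number of terms in the short document
        int longDoc = 1000;   // number of terms in the long document

        // How much more weight the short document receives under each norm.
        double classicRatio = classicNorm(shortDoc) / classicNorm(longDoc);
        double dampenedRatio = dampenedNorm(shortDoc) / dampenedNorm(longDoc);

        System.out.println("classic ratio:  " + classicRatio);
        System.out.println("dampened ratio: " + dampenedRatio);
    }
}
```

Under these assumptions, a 100-term document is weighted about 3.16 times as heavily as a 1000-term document by the classic norm (the square root of 10), but only about 1.09 times as heavily by the dampened norm. This matches the "easily 3x" effect the class description refers to.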
- Since:
- 6.0.0
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.search.similarities.Similarity
org.apache.lucene.search.similarities.Similarity.SimScorer
-
Constructor Summary
Constructor, Description
CmsSearchSimilarity()
Creates a new instance of the OpenCms search similarity.
-
Method Summary
Modifier and Type, Method, Description
final long computeNorm(org.apache.lucene.index.FieldInvertState state)
Special implementation for "compute norm" to reduce the significance of this factor for the CmsSearchField.FIELD_CONTENT field, while keeping the Lucene default for all other fields.
boolean getDiscountOverlaps()
Returns true iff overlap tokens are discounted from the document's length.
org.apache.lucene.search.similarities.Similarity.SimScorer scorer(float boost, org.apache.lucene.search.CollectionStatistics collectionStats, org.apache.lucene.search.TermStatistics... termStats)
-
Constructor Details
-
CmsSearchSimilarity
public CmsSearchSimilarity()
Creates a new instance of the OpenCms search similarity.
-
-
Method Details
-
computeNorm
public final long computeNorm(org.apache.lucene.index.FieldInvertState state)
Special implementation for "compute norm" to reduce the significance of this factor for the CmsSearchField.FIELD_CONTENT field, while keeping the Lucene default for all other fields.
- Specified by:
computeNorm
in class org.apache.lucene.search.similarities.Similarity
-
getDiscountOverlaps
public boolean getDiscountOverlaps()
Returns true iff overlap tokens are discounted from the document's length.
- Returns:
- true iff overlap tokens are discounted from the document's length.
- See Also:
-
#setDiscountOverlaps(boolean)
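As a rough illustration of what "discounting overlap tokens" means: in Lucene, a token with a position increment of 0 (for example an injected synonym) occupies the same position as the previous token and counts as an overlap. The standalone sketch below mimics that bookkeeping on a plain array of position increments. The class DiscountOverlapsSketch and the method effectiveLength are hypothetical names for this example; the real computation works on a FieldInvertState and is not reproduced here.

```java
// Illustrative sketch only: mimics the length/overlap bookkeeping, not Lucene's code.
final class DiscountOverlapsSketch {

    private DiscountOverlapsSketch() {
        // static helpers only
    }

    /**
     * Returns the document length used for normalization.
     * Tokens with a position increment of 0 are treated as overlaps
     * and are subtracted from the length when discountOverlaps is true.
     */
    public static int effectiveLength(int[] positionIncrements, boolean discountOverlaps) {
        int length = 0;
        int overlaps = 0;
        for (int increment : positionIncrements) {
            length++;
            if (increment == 0) {
                overlaps++;
            }
        }
        return discountOverlaps ? length - overlaps : length;
    }

    public static void main(String[] args) {
        // "fast" (1), "quick" (0, synonym), "car" (1), "automobile" (0, synonym)
        int[] increments = {1, 0, 1, 0};
        System.out.println(effectiveLength(increments, true));   // overlaps discounted
        System.out.println(effectiveLength(increments, false));  // every token counted
    }
}
```

With overlaps discounted, the four tokens above count as a length of 2; without discounting they count as 4, so synonym expansion would make the field look longer than its source text.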
-
scorer
public org.apache.lucene.search.similarities.Similarity.SimScorer scorer(float boost, org.apache.lucene.search.CollectionStatistics collectionStats, org.apache.lucene.search.TermStatistics... termStats)
- Specified by:
scorer
in class org.apache.lucene.search.similarities.Similarity
- See Also:
-
Similarity.scorer(float, org.apache.lucene.search.CollectionStatistics, org.apache.lucene.search.TermStatistics[])
-