The score-based sorting isn’t working as expected.
Anything that matches on the title should get a higher score, and according to the query that is what should happen (at least as far as I understand it; I may be missing something).
However, for some reason the “Body” match is ranked above the “Title” match, and it’s not clear to me why.
First, here are the data and the query:
Database seed
/* 1 */
{
"_id" : ObjectId("62037e3d24fc15277c2bfa07"),
"SubscriptionId" : "b31a0037-5316-4220-958b-1f8c1a4c2759",
"ItemId" : "matched-by-body",
"GroupIds" : [
"default-group-id"
],
"Contents" : {
"en" : {
"Language" : "en",
"Active" : true,
"BodyPlainText" : "I have a dream"
}
}
}
/* 2 */
{
"_id" : ObjectId("62037e3d24fc15277c2bfa08"),
"SubscriptionId" : "b31a0037-5316-4220-958b-1f8c1a4c2759",
"ItemId" : "matched-by-title",
"GroupIds" : [
"default-group-id"
],
"Contents" : {
"en" : {
"Language" : "en",
"Active" : true,
"Title" : "I have a dream"
}
}
}
/* 3 */
{
"_id" : ObjectId("62037e3d24fc15277c2bfa0a"),
"SubscriptionId" : "other-sub",
"ItemId" : "matched-by-title-but-wrong-sub",
"GroupIds" : [
"default-group-id"
],
"Contents" : {
"en" : {
"Language" : "en",
"Active" : true,
"Title" : "I have a dream"
}
}
}
/* 4 */
{
"_id" : ObjectId("62037e3d24fc15277c2bfa0b"),
"SubscriptionId" : "b31a0037-5316-4220-958b-1f8c1a4c2759",
"ItemId" : "matched-by-title-but-wrong-group",
"GroupIds" : [
"nobody-has-access-group"
],
"Contents" : {
"en" : {
"Language" : "en",
"Active" : true,
"Title" : "I have a dream"
}
}
}
/* 5 */
{
"_id" : ObjectId("62037e3d24fc15277c2bfa0c"),
"SubscriptionId" : "b31a0037-5316-4220-958b-1f8c1a4c2759",
"ItemId" : "no-match",
"GroupIds" : [
"default-group-id"
],
"Contents" : {
"en" : {
"Language" : "en",
"Active" : true,
"BodyPlainText" : "So lonely"
}
}
}
The query
db.getCollection('news').aggregate([
  {
    "$search": {
      "index": "my-search",
      "compound": {
        "filter": [
          {
            "text": {
              "path": "SubscriptionId",
              "query": "b31a0037-5316-4220-958b-1f8c1a4c2759"
            }
          },
          {
            "text": {
              "path": "GroupIds",
              "query": [
                "default-group-id",
                "hr-group-id",
                "everybody-is-editor-group"
              ]
            }
          }
        ],
        "should": [
          {
            "phrase": {
              "query": "dream",
              "path": { "wildcard": "Contents.*.Title" },
              "score": { "boost": { "value": 7 } }
            }
          },
          {
            "phrase": {
              "query": "dream",
              "path": { "wildcard": "Contents.*.BodyPlainText" },
              "score": { "boost": { "value": 3 } }
            }
          },
          {
            "text": {
              "query": "dream",
              "path": { "wildcard": "Contents.*.Title" },
              "score": { "boost": { "value": 5 } }
            }
          },
          {
            "text": {
              "query": "dream",
              "path": { "wildcard": "Contents.*" },
              "fuzzy": { "maxEdits": 2 }
            }
          }
        ],
        "minimumShouldMatch": 1
      }
    }
  },
  {
    $project: {
      ItemId: 1,
      "Contents.en.BodyPlainText": 1,
      "Contents.en.Title": 1,
      "Contents.en.Body": 1,
      score: { $meta: "searchScore" }
    }
  }
])
Search index definition:
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "GroupIds": [
        {
          "dynamic": true,
          "type": "document"
        },
        {
          "analyzer": "lucene.keyword",
          "norms": "omit",
          "searchAnalyzer": "lucene.keyword",
          "type": "string"
        }
      ],
      "SubscriptionId": [
        {
          "dynamic": true,
          "type": "document"
        },
        {
          "analyzer": "lucene.keyword",
          "norms": "omit",
          "searchAnalyzer": "lucene.keyword",
          "type": "string"
        }
      ]
    }
  }
}
Execute the query
If we execute the query now, we get back the following:
/* 1 */
{
"_id" : ObjectId("62037e3d24fc15277c2bfa07"),
"ItemId" : "matched-by-body",
"Contents" : {
"en" : {
"BodyPlainText" : "I have a dream"
}
},
"score" : 1.10903561115265
}
/* 2 */
{
"_id" : ObjectId("62037e3d24fc15277c2bfa08"),
"ItemId" : "matched-by-title",
"Contents" : {
"en" : {
"Title" : "I have a dream"
}
},
"score" : 0.514158070087433
}
As you can see, for some reason the body match has a higher score.
However, and this is the really strange part, if we DELETE every document from the db except the two that match and execute the same query again, we get back the following:
/* 1 */
{
"_id" : ObjectId("62037e3d24fc15277c2bfa08"),
"ItemId" : "matched-by-title",
"Contents" : {
"en" : {
"Title" : "I have a dream"
}
},
"score" : 1.69993948936462
}
/* 2 */
{
"_id" : ObjectId("62037e3d24fc15277c2bfa07"),
"ItemId" : "matched-by-body",
"Contents" : {
"en" : {
"BodyPlainText" : "I have a dream"
}
},
"score" : 0.523058295249939
}
As you can see, suddenly it behaves as expected. But I don’t understand why or how this makes sense.
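For what it’s worth, here is a rough sketch of one possible explanation. It rests on several assumptions I have not confirmed: that the score is Lucene’s default BM25 (k1 = 1.2, b = 0.75), that term statistics are computed per field over the whole index and not over the filtered results, that “I have a dream” is indexed as 4 tokens, and that the boosts of all matching should clauses simply add up. The helper bm25 and the weight constants are hypothetical names used only for illustration.

```javascript
// Sketch of the BM25 formula as used by Lucene (assumed defaults k1 = 1.2, b = 0.75).
// bm25() is a hypothetical helper, not part of any MongoDB or Atlas Search API.
function bm25(freq, docCount, docFreq, fieldLength, avgFieldLength, k1 = 1.2, b = 0.75) {
  // Rarer terms (low docFreq relative to docCount) get a larger idf.
  const idf = Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
  // Term-frequency part, normalized by field length relative to the average.
  const tf = freq / (freq + k1 * (1 - b + b * (fieldLength / avgFieldLength)));
  return idf * tf;
}

// Assumed combined weight of the "should" clauses that hit each field:
// Title:         phrase (boost 7) + text (boost 5) + fuzzy text (no boost, 1)
// BodyPlainText: phrase (boost 3) + fuzzy text (no boost, 1)
const TITLE_WEIGHT = 7 + 5 + 1;
const BODY_WEIGHT = 3 + 1;

// Full seed as shown above: 3 documents have a Title and all 3 contain "dream";
// 2 documents have a BodyPlainText (4 and 2 tokens, average 3), 1 contains "dream".
const fullSeed = {
  title: TITLE_WEIGHT * bm25(1, 3, 3, 4, 4),
  body: BODY_WEIGHT * bm25(1, 2, 1, 4, 3),
};

// After deleting everything except the two matching documents:
// each field exists in exactly 1 document, and that document contains "dream".
const afterDelete = {
  title: TITLE_WEIGHT * bm25(1, 1, 1, 4, 4),
  body: BODY_WEIGHT * bm25(1, 1, 1, 4, 4),
};

console.log("full seed:", fullSeed);
console.log("after delete:", afterDelete);
```

Under these assumptions the sketch gives roughly 1.700 (title) and 0.523 (body) for the two-document collection and roughly 1.109 for the body match in the full collection, which agree with the scores above. The title score for the full seed comes out near 0.789 and not the reported 0.514, so the real index statistics probably differ from the seed shown here (I can’t tell why), but the body match still comes out ahead. If this is right, “dream” is simply rarer in BodyPlainText than in Title across the whole index, including documents removed by the filter, and that outweighs the boosts.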