Apply Mongo full text search but to a limited set of elements

Pedro_Silva1 · March 20, 2024, 12:11pm

Hello everyone,

I’m having trouble finding a query that solves my case. I have a message collection whose documents have the fields room_id, user_id, content and created_at. I would like to run a Mongo text search, but apply it only to the last 5 documents of the collection.

Right now I have this but I don’t think it is the right solution:

# Step 1: Fetch the last 5 messages this user sent in this room
last_five_messages = (
    self.message_collection.find({"user_id": user, "room_id": room})
    .sort([("created_at", -1)])
    .limit(5)
)

# Step 2: Extract message text from the last 5 messages
message_texts = [message["message"] for message in last_five_messages]

# Step 3: Run the text search, restricted to those 5 message texts
text_search_results = self.message_collection.find(
    {
        "$text": {"$search": escaped_message},
        "user_id": user,
        "room_id": room,
        "message": {"$in": message_texts},
    },
)

Can someone help me with this please?

I’m relatively new to Mongo, so sorry if this isn’t the right place to ask about this topic.

amyjian · March 20, 2024, 3:51pm

Hey @Pedro_Silva1, can you share more about what’s not working for you right now? A few tips I would recommend looking into:

The Aggregation Pipeline allows you to perform complex operations in a single query.

For better performance, we recommend using the $search aggregation stage instead of $text. To use $search, you will need to create a search index, see how to do that here. Once you have a search index, you can use the compound operator to:

filter your search on user_id and room_id using the equals operator
filter your search on message_texts using the in operator
use the text operator to search for escaped_message
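
The three steps above could be sketched roughly as follows. This is only an illustration, not a tested Atlas configuration: build_search_pipeline is a hypothetical helper, the index name is a placeholder, and it assumes the search index maps user_id, room_id and message with types that the equals and in operators accept (for string values that means a token mapping, in addition to the string mapping that text needs on message).

```python
def build_search_pipeline(message, user, room, message_texts, index_name="default"):
    """Build an aggregation pipeline that uses $search with the compound operator.

    Hypothetical helper: field names follow the snippet in the first post,
    and index_name is a placeholder for whatever search index you created.
    """
    return [
        {
            "$search": {
                "index": index_name,
                "compound": {
                    # filter clauses restrict the candidates without affecting the score
                    "filter": [
                        {"equals": {"path": "user_id", "value": user}},
                        {"equals": {"path": "room_id", "value": room}},
                        {"in": {"path": "message", "value": message_texts}},
                    ],
                    # must clause performs the actual full text match
                    "must": [
                        {"text": {"query": message, "path": "message"}},
                    ],
                },
            }
        },
        # expose the relevance score next to each matching message
        {"$project": {"message": 1, "score": {"$meta": "searchScore"}}},
    ]
```

The returned list can be passed to the collection's aggregate() method; $search has to stay the first stage of the pipeline.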

Pedro_Silva1 · March 21, 2024, 12:25pm

Hello @amyjian, thank you for the response! To explain in a bit more detail: I want to filter the collection down to the messages sent by user_id in room_id, order them by date, and limit the query to the last 5 messages the user sent. After that, I want to apply the full text search to those 5 messages to check whether any of them has similar content. My goal is to check for spam messages. Is that possible with the suggestions you gave me?

amyjian · March 21, 2024, 6:24pm

@Pedro_Silva1 Yes, it is possible. Please feel free to give it a try and update here if you are having trouble with the suggestions.

Pedro_Silva1 · April 4, 2024, 4:23pm

Hello again @amyjian, I was able to complete this query after subscribing to Mongo Atlas. However, I have some questions that I hope you can help me with:

I’m using the following query to detect whether a user has posted similar content in my chat channel, so that I can stop spamming users:

pipeline = [
    {
        "$search": {
            "index": "content_comparer",
            "text": {"query": message, "path": "message"},
            "sort": {"created_at": -1},
        }
    },
    {"$limit": 5},
    {
        "$project": {
            "message": 1,
            "released": 1,
            "score": {"$meta": "searchScore"},
        }
    },
]

However, I’m finding the results too “aggressive”: if I write “I love coffee” and then “My coffee is great”, the query treats the two messages as similar. What do you think I should do in this case? Only show matches with a score above a certain value? If so, which value is reasonable to choose? I can’t find an explanation of how the score is calculated or what range of values it can take.
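
One way to try the score-threshold idea is to append a $match stage after the $project stage, since the projected score field can be filtered like any other field. The sketch below uses a hypothetical helper, with_min_score, and leaves the cutoff as a parameter: the search score is a relevance value rather than a fixed-range percentage, so a useful threshold would have to be tuned against real messages.

```python
def with_min_score(pipeline, min_score):
    """Return a copy of pipeline that keeps only documents scoring at least min_score.

    Hypothetical helper. It assumes an earlier stage projects the search
    score into a field named "score", as the pipeline above does.
    """
    return list(pipeline) + [{"$match": {"score": {"$gte": min_score}}}]
```

For example, with_min_score(pipeline, 1.5) would drop weak matches, where 1.5 is an arbitrary placeholder and not a recommended value.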

Pedro_Silva1 · April 4, 2024, 4:26pm

To provide more context I just want to check the following:

Initial phrase → “Join my telegram group”
I want to block similar content that is spammed afterwards, such as:

“JOIN MY TELEGRAM GROUP”
“JoIn My GrOuP”
“Telegram group join”
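
Since only the last 5 messages need to be compared, one alternative that avoids search scores entirely is to fetch those messages and compare them in application code. The sketch below is an assumption and not something suggested in the thread: it lowercases each message, splits it into a set of words, and measures word overlap (Jaccard similarity), which ignores both case and word order. The function names and the 0.7 threshold are hypothetical.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Word-set overlap between two messages: 0.0 (disjoint) to 1.0 (same words)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def looks_like_repeat(new_message, recent_messages, threshold=0.7):
    """True if new_message overlaps heavily with any recent message.

    threshold is a hypothetical starting point and should be tuned on real data.
    """
    return any(jaccard(new_message, old) >= threshold for old in recent_messages)
```

Here recent_messages would be the texts of the user's last 5 messages in the room, fetched with find, sort and limit as in the first post. With “Join my telegram group” as the earlier message, all three spam variants above share at least three of its four words, while “I love coffee” and “My coffee is great” share only one word out of six.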