
Aperol Spritz Summer With MongoDB Geospatial Queries & Vector Search

Anaiya Raisinghani • 13 min read • Published Aug 22, 2024 • Updated Aug 22, 2024
It’s summer in New York City and you know what that means: It’s the season of the spritz! There is nothing (and I fully, truly, 110% mean nothing) better than a crisp Aperol spritz to end a day that was so hot and muggy that the subway was interchangeable with a sauna.
While I normally love adventuring through the city in search of what will perfectly fulfill my current craving, there are certain months when I refuse to spend any more time than necessary moving around outdoors (hello, heatwave?!). At night during an NYC summer, we are lounging — lounging on rooftops, terraces, and sidewalks — wherever we can fit. And with minimal movement, we want our Aperol spritzes as close as possible. So, let’s use MongoDB geospatial queries, MongoDB Atlas Vector Search, and the Google Places API to find our closest spritz locations in the West Village neighborhood of New York City while using semantic search to help us get the most out of our queries.
In this tutorial, we will use the various platforms listed above to find all the locations selling Aperol spritzes in the West Village neighborhood of New York City, ones that match our semantic query of being outdoors with quick service (we need those spritzes and need them NOW!), and the one closest to our starting location.
Before we begin the tutorial, let’s go over some of the important platforms we will be using on our journey.

What are MongoDB geospatial queries?

MongoDB geospatial queries allow you to search your database based on geographical locations! This means you are able to find different locations such as restaurants, parks, museums, etc. based just on their coordinates. In this tutorial, we will use MongoDB geospatial queries to search the locations of places that serve Aperol spritzes that we sourced from Google’s Places API. To use geospatial queries properly with MongoDB, we will need to ensure our data points are loaded in GeoJSON format. More on that below!

What is MongoDB Atlas Vector Search?

MongoDB Atlas Vector Search is a way of searching through your database semantically, or by meaning. This means instead of searching based on specific keywords or exact text phrases, you can retrieve results even if a word is spelled wrong, or retrieve results based on synonyms. This will integrate fabulously with our tutorial because we can search through the reviews we retrieve from our Google Places API and see which ones match closest to what we’re looking for. Let’s go!
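To make "searching by meaning" a little more concrete, here is a minimal sketch of the cosine similarity measure that this tutorial's vector index is configured with. The three-dimensional vectors and the review names are made up for illustration; real embeddings from "text-embedding-3-small" have 1,536 dimensions, and Atlas does this comparison for you on the server.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings", invented for this sketch
query = [0.9, 0.1, 0.0]
reviews = {
    "patio_review": [0.8, 0.2, 0.1],
    "basement_review": [0.0, 0.1, 0.9],
}

# Rank the reviews from most to least similar to the query
ranked = sorted(
    reviews,
    key=lambda name: cosine_similarity(query, reviews[name]),
    reverse=True,
)
print(ranked)
```

The review whose vector points in roughly the same direction as the query ranks first, even though no keywords were compared.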

Pre-requisites

To be successful with this tutorial, you will need:
  1. The IDE of your choosing — this tutorial uses a Google Colab notebook. Please feel free to run your commands directly from a notebook.
  2. A MongoDB Atlas account.
  3. A MongoDB Atlas cluster — the free tier will work perfectly.
  4. A Google Cloud Platform account — please create an account and a project. We will go through this together.
  5. A Google Cloud Platform API key.
  6. An OpenAI API key — this is how we will embed our location reviews so we can use MongoDB Atlas Vector Search!
Once your MongoDB Atlas cluster has been provisioned and you have everything else written down in a secure spot, you’re ready to begin. Please also ensure you have allowed "Access From Anywhere" in your MongoDB cluster, under "Network Access". This is not recommended for production, but it is used in this tutorial for ease of reference. Without this in place, you will not be able to write to your MongoDB cluster.

Set up your Google Cloud project

Our first step is to create a project inside of our Google Cloud account. This lets us use the Google Places API to find all locations that serve Aperol spritzes in the West Village.
This is what your project will look like once it’s been created. Please make sure to set up your billing account information on the left-hand side of the screen. You can set up a free trial for $300 worth of credits, so if you’re trying out this tutorial, please feel free to do that and save some money!
[Image: Google Cloud account setup]
Once your account is set up, let’s enable the Google Places API that we are going to be using. You can do this through the same link to set up your Google Cloud project.
This is the API we want to use: Google Places API
Hit the Enable button and a popup will come up with your API key. Store it somewhere safe since we will be using it in our tutorial! Make sure to not lose it or expose it anywhere.
With every Places API request made, your API key must be used. You can find out more from the documentation.
Once that’s in place, we can get started on our tutorial.

Imports and API key setup

Now, head over to your Google Colab notebook.
We want to install googlemaps and openai in our notebook since these are necessary for us when building this tutorial.
!pip install googlemaps
!pip install openai==0.28
Then, define and run your imports:
import googlemaps
import getpass
import openai
We are going to use the getpass library to keep our API keys secret.
Set it up for your Google API key and your OpenAI API key:
# Google API key
google_api_key = getpass.getpass(prompt="Put in Google API Key here")
map_client = googlemaps.Client(key=google_api_key)
# OpenAI API key
openai_api_key = getpass.getpass(prompt="Put in OpenAI API Key here")

Vector Search embedding function setup

Now, let's set ourselves up for Vector Search success. First, set your key and then establish our embedding function. For this tutorial, we are using OpenAI's "text-embedding-3-small" embedding model. We are going to be embedding the reviews of our spritz locations so we can make some judgments on where to go!
# set your key
openai.api_key = openai_api_key

# embedding model we are using
EMBEDDING_MODEL = "text-embedding-3-small"

# our embedding function (uses the openai==0.28 interface installed above)
def get_embedding(text):
    response = openai.Embedding.create(input=text, model=EMBEDDING_MODEL)
    return response['data'][0]['embedding']

Nearby search method in Google Places API

When using Nearby Search in our Google Places API, we are required to set up three parameters: location, radius, and keyword. For our location, we can find our starting coordinates (the very middle of the West Village) by right-clicking on Google Maps and copying the coordinates to our clipboard. This is how I got the coordinates shown below.
[Image: How to find our coordinates]
For our radius, we have to have it in meters. Since I’m not very savvy with meters, let’s write a small function to help us make that conversion.
# for Google Maps API we need to use a radius in meters. Let's first change our miles to meters
def miles_to_meters(miles):
    return miles * 1609.344
Our keyword will just be what we’re hoping to find from the Google Places API: Aperol spritzes!
middle_of_west_village = (40.73490473393682, -74.00521094160642)
search_radius = miles_to_meters(0.4)  # West Village is small so just do less than half a mile.
spritz_finder = 'aperol spritz'
We can then make our API call using the places_nearby method.
# making the API call using our places_nearby method and our parameters
response = map_client.places_nearby(
    location=middle_of_west_village,
    radius=search_radius,
    keyword=spritz_finder
)
Before we can go ahead and print out our locations, let’s think about our end goal. We want to achieve a couple of things before we insert our documents into our MongoDB Atlas cluster. We want to:
  1. Get detailed information about our locations, so we need to make another API call, using each result’s place_id, to get the location name, our formatted_address, the geometry for our coordinates, some reviews (only up to five), and the location rating. You can find more fields to return (if your heart desires!) from the Nearby Search documentation.
  2. Embed our reviews for each location using our embedding function. We want to make sure that we have a field for these so our vectors are stored in an array inside our cluster. We are choosing to embed here just to make things easier for ourselves in the long run. Let’s also join the five reviews together into one string to make things a bit easier on the embedding.
  3. Think about how our coordinates are set up while we’re creating a dictionary with all the important information we want to keep. MongoDB geospatial queries require GeoJSON objects. This means we need to make sure we have the proper format, or else we won’t be able to use our geospatial query operators later. We also need to keep in mind that the longitude and latitude are nested underneath geometry and location in the Google Places API response. So, unfortunately, we cannot just access them from the top level. We need to work some magic first. Here is an example of the output that I copied from their documentation showing where the latitude and longitude are nested:
{
  "html_attributions": [],
  "results":
    [
      {
        "business_status": "OPERATIONAL",
        "geometry":
          {
            "location": { "lat": -33.8587323, "lng": 151.2100055 },
            "viewport":
              {
                "northeast":
                  { "lat": -33.85739847010727, "lng": 151.2112436298927 },
                "southwest":
                  { "lat": -33.86009812989271, "lng": 151.2085439701072 },
              },
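As a small illustration of that "magic", the sketch below pulls lat and lng out of a result shaped like the sample above and reorders them into a GeoJSON Point. The helper name to_geojson_point is hypothetical, and the sample dictionary is trimmed down to only the fields the helper reads.

```python
def to_geojson_point(place_result):
    """Convert a Places-style result's nested geometry into a GeoJSON Point."""
    location = place_result.get("geometry", {}).get("location", {})
    lng, lat = location.get("lng"), location.get("lat")
    if lng is None or lat is None:
        return None  # no usable coordinates in this result
    # GeoJSON lists longitude first, then latitude
    return {"type": "Point", "coordinates": [lng, lat]}


# Trimmed copy of the documentation sample shown above
sample_result = {
    "business_status": "OPERATIONAL",
    "geometry": {"location": {"lat": -33.8587323, "lng": 151.2100055}},
}

print(to_geojson_point(sample_result))
```

Note the swap: Google returns lat before lng, while GeoJSON expects longitude first.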
With all this in mind, let’s get to it!
# find information we want: use the Nearby Places documentation to figure out which fields you want
spritz_locations = []
for place in response.get('results', []):
    location_detail = map_client.place(
        place_id=place['place_id'],
        fields=['name', 'formatted_address', 'geometry', 'reviews', 'rating']
    )

    # these are the specific details we want to be saved as fields in our documents
    details = location_detail.get('result', {})

    # we want to embed the five reviews so let's extract and join them together
    location_reviews = details.get('reviews', [])
    store_reviews = [review['text'] for review in location_reviews[:5]]
    joined_reviews = " ".join(store_reviews)

    # generate embedding on your reviews
    embedding_reviews = get_embedding(joined_reviews)

    # we know that the longitude and latitude are nested inside geometry and location,
    # so let's grab them using .get and then format them how we want.
    # (named "coordinates" so it does not shadow the loop variable)
    geometry = details.get('geometry', {})
    coordinates = geometry.get('location', {})

    # both are nested under location so open it up
    longitude = coordinates.get('lng')
    latitude = coordinates.get('lat')

    location_info = {
        'name': details.get('name'),
        'address': details.get('formatted_address'),

        # MongoDB geospatial queries require GeoJSON formatting
        'location': {
            'type': 'Point',
            'coordinates': [longitude, latitude]
        },
        'rating': details.get('rating'),
        'reviews': store_reviews,
        'embedding': embedding_reviews
    }
    spritz_locations.append(location_info)
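One edge case worth thinking about: a place with no reviews produces an empty joined string, and an embeddings endpoint may reject empty input. Here is a minimal sketch of a guard around the join-and-embed step. The names embed_reviews and fake_embed are hypothetical; fake_embed is only a stand-in so the sketch runs without an API key, and in the tutorial you would pass get_embedding instead.

```python
def embed_reviews(review_texts, embed_fn, max_reviews=5):
    """Join up to max_reviews non-empty review texts and embed the result.

    Returns None when there is nothing to embed.
    """
    cleaned = [t.strip() for t in review_texts[:max_reviews] if t and t.strip()]
    if not cleaned:
        return None
    return embed_fn(" ".join(cleaned))


def fake_embed(text):
    # Stand-in for get_embedding: returns a one-element "vector"
    return [float(len(text))]


print(embed_reviews(["Great patio!", "  ", "Fast service"], fake_embed))
print(embed_reviews([], fake_embed))
```

If you adopt something like this, you would also want to skip (or separately handle) locations whose embedding comes back as None before inserting them.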
Let’s print out our output and see what our spritz locations in the West Village neighborhood are! Let’s also check and make sure that we have a newly developed embedding field with our reviews embedded:
# print our spritz information
for location in spritz_locations:
    print(f"Name: {location['name']}, Address: {location['address']}, Coordinates: {location['location']}, Rating: {location['rating']}, Reviews: {location['reviews']}, Embedding: {location['embedding']}")
[Image: Our proper output]
So, if I scroll over in my notebook, I can see there are embeddings, but I will prove they are there once we insert our data into MongoDB Atlas since it’s a bit hard to capture in a single picture.
Let’s insert them using the pymongo library.

Insert documents into MongoDB Atlas cluster

First, let’s install pymongo.
# install pymongo
!pip install pymongo
Now, set up our MongoDB connection. To do this, please make sure you have your connection string.
Please keep in mind that you can name your database and collection anything you like, since it won’t be created until we write in our data. I am naming my database “spritz_summer” and my collection “spritz_locations_WV”. Run the code block below to insert your documents into your cluster:
from pymongo import MongoClient

# set up your MongoDB connection
connection_string = getpass.getpass(prompt="Enter connection string WITH USER + PASS here")
client = MongoClient(connection_string)

# name your database and collection anything you want since it will be created when you enter your data
database = client['spritz_summer']
collection = database['spritz_locations_WV']

# insert our spritz locations
collection.insert_many(spritz_locations)
Go ahead and double-check that everything was written in correctly in MongoDB Atlas.
[Image: Our documents in MongoDB Atlas]
Make sure to double-check that your embedding field exists and that it’s an array of 1,536 values, and please make sure your coordinates are stored as [longitude, latitude], the way mine are in the image.
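If you would rather check this from your notebook than by eye, a sketch along these lines could help. The function name check_spritz_document and the two sample documents are hypothetical; the checks only cover the embedding length and the basic GeoJSON Point shape described above.

```python
EXPECTED_DIMENSIONS = 1536  # the embedding length this tutorial expects


def check_spritz_document(doc):
    """Return a list of problems found in one location document."""
    problems = []

    embedding = doc.get("embedding")
    if not isinstance(embedding, list) or len(embedding) != EXPECTED_DIMENSIONS:
        problems.append("embedding is not a 1536-element array")

    location = doc.get("location") or {}
    coords = location.get("coordinates") or []
    is_point = location.get("type") == "Point" and len(coords) == 2
    if not is_point or not all(isinstance(c, (int, float)) for c in coords):
        problems.append("location is not a GeoJSON Point with two numbers")
    elif not (-180 <= coords[0] <= 180 and -90 <= coords[1] <= 90):
        # A range check catches some swapped pairs, but not all of them
        problems.append("coordinates are out of range for [longitude, latitude]")

    return problems


# Hypothetical documents shaped like the ones inserted above
good_doc = {
    "embedding": [0.0] * 1536,
    "location": {"type": "Point", "coordinates": [-74.0052, 40.7349]},
}
bad_doc = {
    "embedding": [0.0] * 3,
    "location": {"type": "Point", "coordinates": [None, None]},
}

print(check_spritz_document(good_doc))
print(check_spritz_document(bad_doc))
```

To run it against your cluster, you could loop over collection.find() and print any document whose problem list is not empty, assuming the collection variable from the insert step.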

Which comes first, vector search or geospatial queries?

Great question! Both $vectorSearch and $geoNear need to be the first stage in their aggregation pipelines, so instead of making one pipeline, we can do a little loophole and create two. But how will we decide which one to do first?!
When I’m using Google Maps to figure out where to go, I normally first search for what I’m craving, and then I see how far away it is from where I currently am. So let’s keep that mindset and start off with MongoDB Atlas Vector Search. But, I understand that intuitively, some of you might prefer to search via all nearby locations and then semantically search (geospatial queries first and then vector search), so let’s highlight that method as well below.
We have a couple of steps here. Our first step is to create a Vector Search Index. Please do this inside of MongoDB Atlas by following the Vector Search documentation.
Please keep in mind that your index is not run in your script. It lives in your cluster. You’ll know it’s ready to go when it turns green and is activated.
# create a Vector Search Index so we can use it
{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    }
  ]
}
Once it’s activated, let’s get to vector searching!
So. Let’s say I just finished dinner with my besties at our favorite restaurant in the West Village, Balaboosta. The food was great, it’s a summer night, we’re in the mood for post-dinner spritzes outside, and we would prefer to be seated quickly. Let’s see if we can find a spot!
Our first step in building our pipeline is to embed our query. We cannot compare text to vectors; we have to compare vectors to vectors. We can do this with only a couple of lines since we are using the same embedding model that we embedded our reviews with:
# You have to embed your queries just the same way you embedded your documents.
# my query
query_description = "outdoor seating quick service"

# we need to embed the query as well, since our documents are embedded
query_vector = get_embedding(query_description)
Now, let’s build out our aggregation pipeline. Since we are going to be using a $geoNear in our pipeline next, we want to keep the IDs found from this aggregation pipeline so we don’t search through everything — we only search through our sample size. For now, make sure your $vectorSearch stage is at the very top!
spritz_near_me_vector = [
    {
        '$vectorSearch': {
            'index': 'vector_index',
            'path': 'embedding',
            'queryVector': query_vector,
            'numCandidates': 15,
            'limit': 5
        }
    },
    {
        "$project": {
            "_id": 1,  # we want to keep this in place so we can search again using GeoNear
            "name": 1,
            "rating": 1,
            "reviews": 1
            #"address": 1,
            #"location": 1,
            #"embedding": 1
        }
    }
]
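The pipeline above hard-codes numCandidates and limit. If you plan to experiment with those values, a small builder like the sketch below may be handy. The function name build_vector_search_stage is hypothetical, and the check reflects my understanding that Atlas expects numCandidates to be at least as large as limit; treat it as an assumption and confirm against the $vectorSearch documentation.

```python
def build_vector_search_stage(query_vector, limit=5, num_candidates=15,
                              index="vector_index", path="embedding"):
    """Build a $vectorSearch stage, checking that numCandidates >= limit."""
    if num_candidates < limit:
        raise ValueError("numCandidates must be at least as large as limit")
    return {
        "$vectorSearch": {
            "index": index,
            "path": path,
            "queryVector": query_vector,
            "numCandidates": num_candidates,
            "limit": limit,
        }
    }


# A made-up 3-dimensional query vector, just to show the resulting shape
stage = build_vector_search_stage([0.1, 0.2, 0.3])
print(stage)
```

In the tutorial you would pass query_vector and place the returned stage first in the pipeline list, followed by the $project stage.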
Let’s print out our results and see what happens from our query of “outdoor seating quick service”:
spritz_near_me_vector_results = list(collection.aggregate(spritz_near_me_vector))
for result in spritz_near_me_vector_results:
    print(result)
[Image: Output from printing our $vectorSearch aggregation pipeline]
We have five fantastic options! If we go and read through the reviews, we can see they align with what we’re looking for. Here is one example:
[Image: One example review aligning with what we’re looking for]
Let’s go ahead and save the IDs from our pipeline above in a simple line so we can specify that we only want to use our $geoNear operator on these five:
# now, we want to take the _ids from our above pipeline so we can use it to geo search
spritz_near_me_ids = [result['_id'] for result in spritz_near_me_vector_results]
print(spritz_near_me_ids)
Now that they’re saved, we can build out our $geoNear pipeline and see which one of these options is closest to us from our starting point, Balaboosta, so we can walk on over.

Geospatial queries in MongoDB

To figure out the coordinates of Balaboosta, I right-clicked on Google Maps and saved the coordinates, and then made sure I had the longitude and latitude in the proper order.
First, create a 2dsphere index on the location field of our collection:
# the list-of-tuples form works across PyMongo versions
collection.create_index([("location", "2dsphere")])
Here is the pipeline, with our query specifying that we only want to use the IDs of the locations we found above:
# use the $geoNear operator to return documents that are at least 100 meters and at most 1000 meters from our specified GeoJSON point.
spritz_near_me_geo = [
    {
        "$geoNear": {
            "near": {
                "type": "Point",
                "coordinates": [-74.0059456749148, 40.73781277366724]
            },
            # here we are saying that we only want to use the sample size from above
            "query": {"_id": {"$in": spritz_near_me_ids}},
            "minDistance": 100,
            "maxDistance": 1000,
            "spherical": True,
            "distanceField": "dist.calculated"
        }
    },
    {
        "$project": {
            "_id": 0,
            "name": 1,
            "address": 1,
            "rating": 1,
            "dist.calculated": 1,
            #"location": 1,
            #"embedding": 1
        }
    },
    {
        "$limit": 3
    },
    {
        "$sort": {
            "dist.calculated": 1
        }
    }
]
Let’s print it out and see what we get!
spritz_near_me_geo_results = collection.aggregate(spritz_near_me_geo)
for result in spritz_near_me_geo_results:
    print(result)
[Image: Output after searching via distance from our five sample size]
It seems like the restaurant we are heading over to is Pastis since it’s only 182.83 meters (0.1 miles) away. Time for an Aperol spritz outdoors!
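If you are curious what a spherical distance like this means, the sketch below computes a great-circle distance with the haversine formula and converts meters back to miles. This is only an illustration: MongoDB calculates dist.calculated on the server, and the haversine formula, the mean Earth radius constant, and the helper names here are my own assumptions, so the numbers may differ slightly from what $geoNear returns.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters; an assumption for this sketch


def haversine_meters(lng1, lat1, lng2, lat2):
    """Great-circle distance in meters between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def meters_to_miles(meters):
    """Inverse of the miles_to_meters conversion used earlier."""
    return meters / 1609.344


# Balaboosta (our $geoNear starting point) and the middle of the West Village,
# both written as (longitude, latitude)
balaboosta = (-74.0059456749148, 40.73781277366724)
middle_of_west_village = (-74.00521094160642, 40.73490473393682)

distance = haversine_meters(*balaboosta, *middle_of_west_village)
print(f"{distance:.1f} meters, {meters_to_miles(distance):.2f} miles")
```

The same conversion confirms the result above: 182.83 meters divided by 1609.344 is roughly 0.11 miles.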
For those who would prefer to switch things around and run geospatial queries first and then incorporate vector search, here is the pipeline:
# create a 2dsphere index on our location field
collection.create_index([("location", "2dsphere")])

# our $geoNear pipeline
spritz_near_me_geo = [
    {
        "$geoNear": {
            "near": {
                "type": "Point",
                "coordinates": [-74.0059456749148, 40.73781277366724]
            },
            "minDistance": 100,
            "maxDistance": 1000,
            "spherical": True,
            "distanceField": "dist.calculated"
        }
    },
    {
        "$project": {
            "_id": 1,
            "dist.calculated": 1
        }
    }
]

# list of IDs and distances so we can use them as our sample size
places_ids = list(collection.aggregate(spritz_near_me_geo))
# have to create a new dictionary to keep our distances
distances = {result['_id']: result['dist']['calculated'] for result in places_ids}
spritz_near_me_ids = [result['_id'] for result in places_ids]
# print(spritz_near_me_ids)
The code above creates our $geoNear pipeline and saves both the places_ids and the distances so that we can carry them through our vector search pipeline.
We also need to rebuild our MongoDB Atlas Vector Search index with an included “_id” path:
# our vector search index that was created inside of MongoDB Atlas
vector_search_index = {
    "fields": [
        {
            "numDimensions": 1536,
            "path": "embedding",
            "similarity": "cosine",
            "type": "vector"
        },
        {
            "type": "filter",
            "path": "_id"
        }
    ]
}
Once that’s active and ready, we can build out our vector search pipeline:
# vector search pipeline
spritz_near_me_vector = [
    {
        '$vectorSearch': {
            'index': 'vector_index',
            'path': 'embedding',
            'queryVector': query_vector,
            'numCandidates': 15,
            'limit': 3,
            'filter': {"_id": {'$in': spritz_near_me_ids}}
        }
    },
    {
        "$project": {
            "_id": 1,  # we want to keep this in place
            "name": 1,
            "rating": 1,
            "dist.calculated": 1
            #"reviews": 1
            # "address": 1,
            # "location": 1,
            # "embedding": 1
        }
    }
]

spritz_near_me_vector_results = collection.aggregate(spritz_near_me_vector)
for result in spritz_near_me_vector_results:
    # add back the distance we saved from the $geoNear pipeline
    result['dist.calculated'] = distances.get(result['_id'])
    print(result)
Run it, and you should see results pretty similar to before! Leave a comment below letting me know which locations showed up for you as your output. These are mine:
[Image: Output from running geospatial first and then vector search]
As you can see, they’re the same results but in a slightly different order, as they are no longer ordered by distance.
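If you want the geospatial-first results ordered by distance again, one option is to sort them in your notebook after the vector search returns. Here is a minimal sketch of that idea. The helper name attach_and_sort_by_distance, the distance_m field, and the sample values are all hypothetical; in the tutorial you would pass the vector search results and the distances dictionary built earlier.

```python
def attach_and_sort_by_distance(results, distances):
    """Add each result's saved distance and return the results nearest first."""
    enriched = [{**r, "distance_m": distances.get(r["_id"])} for r in results]
    # Results without a saved distance are placed last
    return sorted(
        enriched,
        key=lambda r: (r["distance_m"] is None, r["distance_m"] or 0),
    )


# Made-up results and distances, for illustration only
results = [
    {"_id": "a", "name": "Place A"},
    {"_id": "b", "name": "Place B"},
    {"_id": "c", "name": "Place C"},
]
distances = {"a": 540.2, "b": 182.8, "c": 310.0}

for r in attach_and_sort_by_distance(results, distances):
    print(r["name"], r["distance_m"])
```

This keeps the semantic filtering from $vectorSearch while restoring the nearest-first ordering on the client side.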

Conclusion

In this tutorial, we covered how to use MongoDB Atlas Vector Search and the Google Places API to find our closest spritz locations in the West Village neighborhood of New York City with semantic search, and then used MongoDB geospatial queries to find which locations were closest to us from a specific starting point.
For more information on MongoDB geospatial queries please visit the documentation located above, and if you have any questions or want to share your work, please join us in the MongoDB Developer Community.