
Trader Joe's Fall Faves Party Planner With Playwright, LlamaIndex, and MongoDB Atlas Vector Search

Anaiya Raisinghani • 11 min read • Published Nov 11, 2024 • Updated Nov 12, 2024
AI • Python • MongoDB
As someone who survives solely on snacks, Trader Joe’s is to me what Costco is to my parents: a kitchen non-negotiable.
When I lived within a 0.2-mile radius of one during college, going was an everyday (sometimes multiple-times-a-day) occurrence. Want to grab a yogurt for breakfast? A salad for lunch? Pick up some frozen items and make a feast for dinner? Trader Joe’s had it all and more, especially the stuff you didn’t even realize you were missing. From “Everything But the Bagel” seasoning to “Philly Cheesesteak Bao Buns,” every Trader Joe’s trip comes with something new for your pantry and your tastebuds.
The fact that Trader Joe’s has all the fun products you’ll want to serve during Thanksgiving makes it the easy choice for fall festivities. But what about all the other factors of planning a party, like decision fatigue, or having to stand in crazy long lines trying to look at every single fall product TJ’s has to offer and then decide between them? This is where an incredible combination of Playwright, LlamaIndex, and MongoDB Atlas Vector Search come in to save the day.
Let’s use these platforms to create a Trader Joe’s fall party planner. We’ll use Playwright to scrape the Trader Joe’s website for all the fall-related products, the LlamaIndex and Atlas Vector Search Integration to build a retrieval-augmented generation (RAG) chatbot with our fall products as our data store, and LlamaIndex’s Chat Engine to get the most interactive, conversational responses based on our fall product data so that we can plan the best party!
What's covered
  • Building a Trader Joe’s AI party planner using Playwright, LlamaIndex, and MongoDB Atlas Vector Search
  • Scraping Trader Joe’s fall items with Playwright and formatting them for chatbot use
  • Setting up and embedding product data in MongoDB Atlas Vector Store for semantic search
  • Creating a retrieval-augmented generation chatbot to answer party planning questions
  • Adding interactive chat engine functionality for back-and-forth Q&A about fall party items
Before diving in, let’s go over these platforms in more detail.

Playwright

Playwright makes it super easy to scrape dynamic website elements, which is why it was chosen for this tutorial. After inspecting the Trader Joe’s website, it was clear that JavaScript is required to load the content and the various products we see, meaning the page content is rendered dynamically! Because of this, simple Python scrapers that only fetch static HTML wouldn’t work to scrape the items we are looking for.
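To make the dynamic rendering point concrete, here is a minimal, illustrative sketch of fetching a fully rendered page with Playwright’s async API. The rendered_html helper is an assumption for illustration, not part of the original tutorial:

import asyncio
from playwright.async_api import async_playwright

# Minimal sketch: Playwright drives a real browser, so JavaScript runs
# before we read the page. A plain HTTP fetch would miss dynamically
# rendered elements like Trader Joe's product cards.
async def rendered_html(url: str) -> str:
    async with async_playwright() as pw:
        browser = await pw.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url)
        html = await page.content()  # HTML after JavaScript has run
        await browser.close()
        return html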

LlamaIndex

LlamaIndex is a framework that makes it easy to use large language models (LLMs) with your data. You can create all sorts of AI-powered applications with LlamaIndex, chatbots being one of them, which will be perfect for our fall party planner.

MongoDB Atlas Vector Search

MongoDB Atlas Vector Search is a feature in MongoDB Atlas that allows you to store and query vector embeddings in your database. With it, you can build incredible applications that require semantic search: searching based on meaning and context rather than exact keywords.
In this tutorial, MongoDB Atlas Vector Search serves as our party planner’s storage and retrieval layer. It holds all our fall product embeddings and lets us search for the most relevant items for each query!

LlamaIndex and MongoDB Atlas Vector Search integration

The LlamaIndex and MongoDB Atlas Vector Search integration combines both platforms: LlamaIndex organizes and queries the data, while Atlas Vector Search provides the vector storage and semantic search capabilities. So, all our product information (the Trader Joe’s products we scraped) is vectorized and stored in our cluster.
So what does this mean? It means that when a question is asked, such as “Which three sides are best served if I’m making a turkey?”, LlamaIndex retrieves the most relevant products by comparing vectors stored in MongoDB Atlas, ensuring that answers are based on overall meaning rather than exact keywords!
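Under the hood, “comparing vectors” typically means ranking stored embeddings by their similarity to the query embedding. Here is a tiny illustrative sketch of the idea; the vectors and values are hypothetical, and the real integration uses 1,536-dimensional OpenAI embeddings queried inside Atlas, not this toy code:

import numpy as np

def cosine_similarity(a, b):
    # cosine similarity: 1.0 means same direction, near 0.0 means unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.1, 0.8, 0.3])  # hypothetical query embedding
product_vecs = {
    "Cornbread Stuffing": np.array([0.2, 0.7, 0.4]),   # hypothetical product embeddings
    "Pumpkin Spice Latte": np.array([0.9, 0.1, 0.2]),
}
best = max(product_vecs, key=lambda name: cosine_similarity(query_vec, product_vecs[name]))
print(best)  # the semantically closest product to the query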

Tutorial prerequisites

Please make sure you have the following prerequisites in order to be successful:
  1. IDE of your choice: This tutorial uses a Google Colab notebook. Please feel free to follow along.
  2. OpenAI API key: You will need to pay to access an API key.
  3. MongoDB Atlas cluster: Please make sure that you are using a free tier, that you have ensured your IP address is set to “access from anywhere” (not recommended for production, but it’s perfectly fine for this tutorial), and that you have copied your cluster’s connection string to a safe place.
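Before moving on, if you’d like a quick sanity check that your connection string works, a short snippet like this (with your own connection string substituted in; assumes pymongo is installed) confirms the cluster is reachable:

import pymongo

# replace the placeholder with your actual connection string
client = pymongo.MongoClient("<your-connection-string>")
client.admin.command("ping")  # raises an exception if the cluster is unreachable
print("Connected!")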
Once you have all your tutorial requirements, we are ready to begin!

Part 1: Scraping Trader Joe's for fall items

Inspect your website!
Our first step is to inspect all the fall favorites on Trader Joe’s website and save the URLs so we can easily scrape them. Trader Joe’s makes this easy for us since they already catalog everything under specific tags. So, let’s click on the “Food” category, then scroll down and click the “Fall Faves” tag. Both these options are on the left-hand side of the webpage.
Once we can see that these are all the food “Fall Faves,” save the URL. Now, we can do this again for all the other categories: Beverages, Flowers & Plants, and Everything Else! Ensure we are only focused on products listed under the “Fall Faves” tag.
Please keep in mind that since we are dealing with live data, these products and options may change depending on when you decide to scrape the information, so the products that show up for me may look different to you!
Once we know the URLs we will be scraping from, let’s figure out which selectors we need as well. We want to format our products as “Name” and “Price.”
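As a concrete target, each scraped product will end up as a small dictionary like this (the product name and price here are hypothetical example values):

# the shape we're aiming for per scraped product
{'name': 'Pumpkin Butternut Squash Bisque', 'price': 3.99, 'category': 'Food'}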
Finding the selectors
The easiest way to find this information is to highlight the name of an item, right-click, and press “Inspect.” Then, you can open each drop-down until you find the information you’re looking for!
Here, we can see that every product name is located in an “a” tag inside an “h2” with the “ProductCard_card__title__text__uiWLe” class, and each price is located in a “span” tag with the “ProductPrice_productPrice__price__3-50j” class. I recommend checking two or more products to ensure this pattern holds throughout.
We can also see that all products are nested within “li” tags with the “ProductList_productList__item__1EIvq” class.
This means we will have to wait for this class to show up when scraping before we can go ahead and extract the information within.
Now that we have our Fall Faves and know exactly where the information we want to retrieve lives, we are ready to build out our scraping function.

Scraping function

First, let’s install Playwright:
!pip install playwright
!playwright install
Once that’s done installing, we can import our necessary packages:
import asyncio
from playwright.async_api import async_playwright
Please keep in mind that we are using async because we are running everything inside of a Google Colab notebook.
Now, let’s start building our traderJoesScraper:
async def traderJoesScraper():
    async with async_playwright() as playwright:
        # use headless mode since we are using Colab
        browser = await playwright.chromium.launch(headless=True)
        page = await browser.new_page()

        # all the URLs for the Food, Beverages, Flowers & Plants, and Everything Else categories
        pages = [
            {'url': 'https://www.traderjoes.com/home/products/category/food-8?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%7D', 'category': 'Food'},
            {'url': 'https://www.traderjoes.com/home/products/category/food-8?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%2C%22page%22%3A2%7D', 'category': 'Food'},
            {'url': 'https://www.traderjoes.com/home/products/category/food-8?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%2C%22page%22%3A3%7D', 'category': 'Food'},
            {'url': 'https://www.traderjoes.com/home/products/category/food-8?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%2C%22page%22%3A4%7D', 'category': 'Food'},
            {'url': 'https://www.traderjoes.com/home/products/category/food-8?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%2C%22page%22%3A5%7D', 'category': 'Food'},
            {'url': 'https://www.traderjoes.com/home/products/category/beverages-182?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%7D', 'category': 'Beverage'},
            {'url': 'https://www.traderjoes.com/home/products/category/flowers-plants-203?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%7D', 'category': 'Flowers&Plants'},
            {'url': 'https://www.traderjoes.com/home/products/category/everything-else-215?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%7D', 'category': 'EverythingElse'}
        ]

        items = []

        # loop through each URL
        for info in pages:
            await page.goto(info['url'])

            # let the product list load
            await page.wait_for_selector('li.ProductList_productList__item__1EIvq', state='attached', timeout=60000)

            # li.ProductList_productList__item__1EIvq is where all our info lives
            products = await page.query_selector_all('li.ProductList_productList__item__1EIvq')

            # extract the name and price from each product card
            for product in products:
                result = {}

                name = await product.query_selector('h2.ProductCard_card__title__text__uiWLe a')
                price = await product.query_selector('span.ProductPrice_productPrice__price__3-50j')

                if name and price:
                    result['name'] = await name.inner_text()

                    # convert the price text to a number
                    price_text = await price.inner_text()
                    convert_price = float(price_text.replace('$', '').strip())
                    result['price'] = convert_price

                    # keep the category so we can save it nicely later
                    result['category'] = info['category']
                    items.append(result)

        for item in items:
            print(f"Name: {item['name']}, Price: {item['price']}, Category: {item['category']}")

        await browser.close()
        return items


scraped_products = await traderJoesScraper()
print(scraped_products)
We started off by manually listing all the URLs we want to scrape. Please keep in mind that if you’re hoping to turn this into a scalable application, it’s recommended to handle pagination programmatically, but for the sake of simplicity, we can input them manually (a sketch of the paginated approach follows).
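For illustration, here is one way the paginated Food URLs could be generated instead of hardcoded. This sketch assumes the page count (five) is known ahead of time and simply mirrors the URLs used above; note that page 1 omits the “page” key in the filter:

base = 'https://www.traderjoes.com/home/products/category/food-8'
food_pages = [{'url': f'{base}?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%7D', 'category': 'Food'}]  # page 1
for n in range(2, 6):  # pages 2 through 5
    food_pages.append({
        'url': f'{base}?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%2C%22page%22%3A{n}%7D',
        'category': 'Food',
    })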
Then, we looped through each of the URLs listed, waited for our main selector to show up with all the elements we hoped to scrape, and then extracted our “name” and “price.”
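One portability note: top-level await works in Colab, but in a plain Python script, you would need to run the coroutine yourself. For example:

import asyncio

# outside a notebook, run the coroutine with asyncio.run()
# instead of awaiting it at the top level
if __name__ == "__main__":
    scraped_products = asyncio.run(traderJoesScraper())
    print(scraped_products)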
Once we ran that, we got a list of all our products from the Fall Faves tag!
To keep track of the items, we can quickly count them:
scraped_products_count = len(scraped_products)
print(scraped_products_count)
As of the date this was scraped, we had 89 products.
Now, let’s save our products into a .txt file so we can use it later in our tutorial when we are using our LlamaIndex and Atlas Vector Search integration. Name the file whatever you like. For the sake of tracking, I’m naming mine tj_fall_faves_oct30.txt.
with open('tj_fall_faves_oct30.txt', 'w') as f:
    for item in scraped_products:
        f.write(f"Name: {item['name']}, Price: ${item['price']}, Category: {item['category']}\n")
Since we are using a notebook, please make sure you download the file locally, as the .txt file will be lost once the runtime is disconnected.
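One way to do that from Colab is its built-in files helper:

from google.colab import files

# triggers a browser download of the file from the Colab runtime
files.download('tj_fall_faves_oct30.txt')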
Now that we have all our Trader Joe’s fall products, let’s build our AI Party Planner!

Part 2: LlamaIndex and Atlas Vector Search integration

To be successful with this part of the tutorial, follow our quickstart. We will be going over how to use Atlas Vector Search with LlamaIndex to build a RAG application with chat capabilities!
This section covers in detail how to set up the environment, store the custom data we previously scraped in Atlas, create an Atlas Vector Search index on top of that data, and, to finish up, implement RAG with Atlas Vector Search to answer questions from our unique data store.
Let’s first use pip to install all our necessary libraries. We will need to include llama-index, llama-index-vector-stores-mongodb, and llama-index-embeddings-openai.
pip install --quiet --upgrade llama-index llama-index-vector-stores-mongodb llama-index-embeddings-openai pymongo
Now, import in your necessary import statements:
import getpass, os, pymongo, pprint
from pymongo.operations import SearchIndexModel
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from llama_index.core.settings import Settings
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters, ExactMatchFilter, FilterOperator
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch
Input your OpenAI API key and your MongoDB Atlas cluster connection string when prompted:
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
ATLAS_CONNECTION_STRING = getpass.getpass("MongoDB Atlas SRV Connection String:")
Once your keys are in, let’s assign our specific models for llama_index so it knows how to embed our file properly. This is just to keep everything consistent!
Settings.llm = OpenAI()
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
Now, we can read in our .txt file with our scraped products using the SimpleDirectoryReader from llama_index. Text files aren’t the only files that can be loaded into LlamaIndex; it supports a ton of other formats, and I recommend checking out some of their supported file types.
Here, we read the contents of our file and return it as a list of documents, the format LlamaIndex requires.
sample_data = SimpleDirectoryReader(input_files=["/content/tj_fall_faves_oct30.txt"]).load_data()
sample_data[0]
Now that our file has been read, let’s connect to our MongoDB Atlas cluster and set up a vector store! Feel free to name the database and collection anything you like. We are initializing a vector store using MongoDBAtlasVectorSearch from llama_index, allowing us to work with our embedded documents directly in our cluster.
# connect to your Atlas cluster
mongo_client = pymongo.MongoClient(ATLAS_CONNECTION_STRING, appname="devrel.showcase.tj_fall_faves")

# instantiate the vector store
atlas_vector_store = MongoDBAtlasVectorSearch(
    mongo_client,
    db_name="tj_products",
    collection_name="fall_faves",
    vector_index_name="vector_index"
)
vector_store_context = StorageContext.from_defaults(vector_store=atlas_vector_store)
Now that our vector store has been defined (via our vector_store_context), let’s create a vector index in MongoDB from our documents in sample_data.
vector_store_index = VectorStoreIndex.from_documents(
    sample_data, storage_context=vector_store_context, show_progress=True
)
Once this cell has run, you can view your data with the embeddings inside your Atlas cluster.
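As an optional sanity check, you can peek at one stored document directly with pymongo. This assumes the integration’s default field names, which store the chunk text under "text" and the vector under "embedding"; check a document in the Atlas UI if yours differ:

# fetch a single embedded document from the collection we just populated
doc = mongo_client["tj_products"]["fall_faves"].find_one()
print(doc["text"][:100])      # a slice of the stored chunk text
print(len(doc["embedding"]))  # should be 1536 for text-embedding-ada-002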
To allow for vector search queries on our created vector store, we need to create an Atlas Vector Search index on our tj_products.fall_faves collection. We can do this either through the Atlas UI or directly from our notebook:
# Specify the collection for which to create the index
collection = mongo_client["tj_products"]["fall_faves"]

# Create your index model, then create the search index
search_index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 1536,
                "similarity": "cosine"
            },
            {
                "type": "filter",
                "path": "metadata.page_label"
            }
        ]
    },
    name="vector_index",
    type="vectorSearch",
)

collection.create_search_index(model=search_index_model)
You’ll be able to see this index once it’s up and running under your “Atlas Search” tab in your Atlas UI. Once it’s done, we can start querying our data and do some basic RAG.
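Index builds usually finish within a minute or so. If you’d rather have the notebook wait than check the UI, a small optional helper like this polls until the index reports as queryable (list_search_indexes requires a recent PyMongo version):

import time

# poll until the Atlas Search index is ready to serve queries
while True:
    indexes = list(collection.list_search_indexes("vector_index"))
    if indexes and indexes[0].get("queryable"):
        break
    time.sleep(5)
print("Index is ready!")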

Part 3: Basic RAG

With our Atlas Vector Search index up and running, we are ready to have some fun and bring our AI Party Planner to life! We will continue with this dream team, using Atlas Vector Search to retrieve our documents and LlamaIndex’s query engine to answer questions based on them.
To do this, we need to wrap our index in a VectorIndexRetriever and initialize a RetrieverQueryEngine that handles queries by passing each question through our vector retrieval system. This combination allows us to ask any questions we want in natural language and be matched with the most relevant documents.
vector_store_retriever = VectorIndexRetriever(index=vector_store_index, similarity_top_k=5)

query_engine = RetrieverQueryEngine(retriever=vector_store_retriever)

response = query_engine.query('Which plant items are available right now? Please provide prices')

print(response)
For the question “Which plant items are available right now? Please provide prices,” we get the response:
Mum Fleurettes are available for $4.99 and Assorted Mum Plants are available for $6.99.
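If you’re curious which stored products grounded that answer, LlamaIndex exposes the retrieved chunks on the response object. A quick illustrative peek:

# inspect the retrieved chunks (and their similarity scores) behind the answer
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:80])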
But what if we want to keep asking questions and get responses with memory? Let’s quickly build a chat engine.

Part 4: Chat engine

Instead of having to ask one question at a time about our Trader Joe’s products for our party, we can incorporate a back-and-forth conversation to get the most out of our AI Party Planner.
We first need to initialize the chat engine from our vector_store_index and enable a streaming response. Condense question mode is used so that the engine rewrites each new question, together with the chat history, into a standalone question that makes sense in a back-and-forth conversation. Streaming is enabled as well so we can see the response as it is generated:
# llamaindex chat engine
chat_engine = vector_store_index.as_chat_engine(
    chat_mode="condense_question", streaming=True
)
Then, we can create our chat loop! This is just a basic while loop that will run until the user enters “exit.”
while True:
    # ask a question
    question = input("Ask away! Type 'exit' to quit >>> ")

    # exit to quit
    if question == 'exit':
        print("Exiting chat. Have a happy fall!")
        break

    print("\n")
Our last step, still inside the loop, is to send the question to our chat engine, then stream and display the response.
    # send the question to the chat engine
    response_stream = chat_engine.stream_chat(question)

    # stream and print the response
    response_stream.print_response_stream()
    print("\n")
Run the above code blocks and try it for yourself. Here are my questions and answers:
Ask away! Type 'exit' to quit >>> hi! i am planning a fall party

Consider including a variety of fall-themed food and beverages such as pumpkin pie, apple cider donuts, maple-flavored fudge, pumpkin spiced cookies, and harvest blend herbal tea to create a festive atmosphere for your fall party. Additionally, you could incorporate seasonal decorations like cinnamon brooms, scented candles, and mum plants to enhance the autumn ambiance.

Ask away! Type 'exit' to quit >>> i want to make a turkey, which three sides with prices and reasonings will be best

The best three side dishes to serve with turkey at a fall party would be Cut Butternut Squash, Brussels Sprouts, and Cornbread Stuffing. Cut Butternut Squash and Brussels Sprouts are reasonably priced at $3.99 and $4.99 respectively, offering a balance of flavors and textures that complement the turkey well. Cornbread Stuffing, priced at $5.99, adds a traditional touch to the meal and enhances the overall fall-themed dining experience.

Ask away! Type 'exit' to quit >>> which drinks should i serve? i want something caffinated

Harvest Blend Herbal Tea and Autumn Maple Coffee would be ideal caffeinated drinks to serve at a fall party to complement the autumn-themed food and create a festive atmosphere.

Ask away! Type 'exit' to quit >>> what are the prices of these drinks

$2.49 for Harvest Blend Herbal Tea and $8.99 for Autumn Maple Coffee.

Ask away! Type 'exit' to quit >>> which decor should i use? i want my home to smell nice

Cinnamon Whisk, Cinnamon Broom, Orange & Spice Scented Candle & Room Spritz

Ask away! Type 'exit' to quit >>> what are the prices?

$5.99, $1.29, $4.99

Ask away! Type 'exit' to quit >>> exit
Exiting chat. Have a happy fall!

Conclusion

In this tutorial, we have built a super helpful Trader Joe’s party planner using Playwright to scrape all the fall favorite items and the LlamaIndex and MongoDB Atlas Vector Search integration to save, embed, and query our data using natural language.
We even incorporated a chat engine to take things a step further than standalone Q&A!
I hope you enjoyed this tutorial. Please connect with us in the Developer Forums.