Trader Joe's Fall Faves Party Planner With Playwright, LlamaIndex, and MongoDB Atlas Vector Search
As someone who survives solely on snacks, Trader Joe’s is to me what Costco is to my parents: a kitchen non-negotiable.
When I lived within a 0.2-mile radius of one during college, going was an everyday (sometimes multiple-times-a-day) occurrence. Want to grab a yogurt for breakfast? A salad for lunch? Pick up some frozen items and make a feast for dinner? Trader Joe’s had it all and more, especially the stuff you didn’t even realize you were missing. From “Everything But the Bagel” seasoning to “Philly Cheesesteak Bao Buns,” every Trader Joe’s trip comes with something new for your pantry and your tastebuds.
The fact that Trader Joe’s has all the fun products you’ll want to serve during Thanksgiving makes it the easy choice for fall festivities. But what about everything else that goes into planning a party, like decision fatigue, or standing in crazy long lines while trying to look at every single fall product TJ’s has to offer and then deciding between them? This is where the combination of Playwright, LlamaIndex, and MongoDB Atlas Vector Search comes in to save the day.
Let’s use these platforms to create a Trader Joe’s fall party planner. We’ll use Playwright to scrape the Trader Joe’s website for all the fall-related products, the LlamaIndex and Atlas Vector Search Integration to build a retrieval-augmented generation (RAG) chatbot with our fall products as our data store, and LlamaIndex’s Chat Engine to get the most interactive, conversational responses based on our fall product data so that we can plan the best party!
What's covered
- Building a Trader Joe’s AI party planner using Playwright, LlamaIndex, and MongoDB Atlas Vector Search
- Scraping Trader Joe’s fall items with Playwright and formatting them for chatbot use
- Setting up and embedding product data in MongoDB Atlas Vector Store for semantic search
- Creating a retrieval-augmented generation chatbot to answer party planning questions
- Adding interactive chat engine functionality for back-and-forth Q&A about fall party items
Before diving in, let’s go over these platforms in more detail.
Playwright makes it super easy to work with dynamically rendered website elements, which is why it was chosen for this tutorial. After inspecting the Trader Joe’s website, it was clear that JavaScript is required to load the page content and the various products we can see, meaning the page content is rendered dynamically! Because of this, simpler Python scrapers that only fetch static HTML wouldn’t work for the items we are looking for.
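If you want to sanity-check this yourself, one quick and admittedly rough way is to fetch the raw HTML without a browser and look for the product-list class we’ll use later. This is only a sketch: the site may block plain HTTP clients or change its markup, so treat the result as a hint rather than proof.

```python
import requests

# Fetch the category page without running any JavaScript.
html = requests.get(
    "https://www.traderjoes.com/home/products/category/food-8",
    headers={"User-Agent": "Mozilla/5.0"},
).text

# If the product list were in the static HTML, this class name would appear.
# For a JS-rendered page, we expect this to print False.
print("ProductList_productList__item" in html)
```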
LlamaIndex is a framework that makes it easy to use large language models (LLMs) with your data. You can create all sorts of AI-powered applications with LlamaIndex, chatbots being one of them, which will be perfect for our fall party planner.
MongoDB Atlas Vector Search is a feature in MongoDB Atlas that allows you to store and query vector embeddings in your database. It allows you to build incredible applications that require semantic search, or searching based on meaning and context rather than exact keywords.
In this tutorial, MongoDB Atlas Vector Search serves as our party planner’s storage and retrieval layer. It lets us store all our fall product embeddings and search for the items most relevant to each query!
The LlamaIndex and MongoDB Atlas Vector Search integration combines both platforms: LlamaIndex organizes and queries the data, while Atlas Vector Search provides the vector storage and semantic search capabilities. All our product information (the Trader Joe’s products we scraped) is vectorized and stored in our cluster.
So what does this mean? When a question is asked, such as “Which three sides are best served if I’m making a turkey?”, LlamaIndex embeds the question and compares it against the vectors stored in MongoDB Atlas, retrieving the most relevant products and ensuring that answers are based on overall meaning rather than exact keywords!
Please make sure you have the following prerequisites in order to be successful:
- OpenAI API key: Note that using the OpenAI API is a paid feature.
- MongoDB Atlas cluster: Please make sure that you are using a free tier, that you have ensured your IP address is set to “access from anywhere” (not recommended for production, but it’s perfectly fine for this tutorial), and that you have copied your cluster’s connection string to a safe place.
Once you have all your tutorial requirements, we are ready to begin!
Inspect your website!
Our first step is to inspect all the fall favorites from Trader Joe’s website and save them so we can easily scrape the website. Trader Joe’s makes this easy for us since they already catalog everything under specific tags. So, let’s click on the “Food” category, then scroll down and click the “Fall Faves” tag. Both these options are on the left-hand side of the webpage.
Once we can see that these are all the food “Fall Faves,” save the URL. Now, we can do this again for all the other categories: Beverages, Flowers & Plants, and Everything Else! Ensure we are only focused on products listed under the “Fall Faves” tag.
Please keep in mind that since we are dealing with live data, these products and options may change depending on when you decide to scrape the information, so the products that show up for me may look different to you!
Once we know the URLs we will be scraping from, let’s figure out which selectors we need as well. We want to format our products as “Name” and “Price.”
The easiest way to find this information is to highlight the name of an item, right-click, and press “Inspect.” Then, you can open each drop-down until you find the information you’re looking for!
Here, we can see that every product name is located in an “a” tag inside an “h2” with the class “ProductCard_card__title__text__uiWLe,” and each price is located in a “span” tag with the class “ProductPrice_productPrice__price__3-50j.” I recommend checking two or more products to make sure this pattern holds throughout.
We can also see that all products are nested within “li” tags with the “ProductList_productList__item__1EIvq” class.
This means we will have to wait for this class to show up when scraping before we can go ahead and extract the information within.
Now that we have our Fall Faves and know exactly where the information we want to retrieve lives, we are ready to build out our scraping function.
First, let’s install Playwright:
```
!pip install playwright
!playwright install
```
Once that’s done installing, we can import our necessary packages:
```python
import asyncio
from playwright.async_api import async_playwright
```
Please keep in mind that we are using async because we are running everything inside of a Google Colab notebook.

Now, let’s start building our traderJoesScraper:

```python
async def traderJoesScraper():
    async with async_playwright() as playwright:
        # use headless mode since we are using Colab
        browser = await playwright.chromium.launch(headless=True)
        page = await browser.new_page()

        # all the URLs for my foods, bevs, flowers&plants, and everything else categories
        pages = [
            {'url': 'https://www.traderjoes.com/home/products/category/food-8?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%7D', 'category': 'Food'},
            {'url': 'https://www.traderjoes.com/home/products/category/food-8?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%2C%22page%22%3A2%7D', 'category': 'Food'},
            {'url': 'https://www.traderjoes.com/home/products/category/food-8?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%2C%22page%22%3A3%7D', 'category': 'Food'},
            {'url': 'https://www.traderjoes.com/home/products/category/food-8?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%2C%22page%22%3A4%7D', 'category': 'Food'},
            {'url': 'https://www.traderjoes.com/home/products/category/food-8?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%2C%22page%22%3A5%7D', 'category': 'Food'},
            {'url': 'https://www.traderjoes.com/home/products/category/beverages-182?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%7D', 'category': 'Beverage'},
            {'url': 'https://www.traderjoes.com/home/products/category/flowers-plants-203?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%7D', 'category': 'Flowers&Plants'},
            {'url': 'https://www.traderjoes.com/home/products/category/everything-else-215?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%7D', 'category': 'EverythingElse'}
        ]

        items = []

        # loop through each URL
        for info in pages:
            await page.goto(info['url'])

            # let the page load
            await page.wait_for_selector('li.ProductList_productList__item__1EIvq', state='attached', timeout=60000)

            # li.ProductList_productList__item__1EIvq is where all our info lives
            products = await page.query_selector_all('li.ProductList_productList__item__1EIvq')

            # get all our info
            for product in products:
                result = {}

                name = await product.query_selector('h2.ProductCard_card__title__text__uiWLe a')
                price = await product.query_selector('span.ProductPrice_productPrice__price__3-50j')

                if name and price:
                    result['name'] = await name.inner_text()

                    # have to make price a number
                    price_text = await price.inner_text()
                    convert_price = float(price_text.replace('$', '').strip())
                    result['price'] = convert_price

                    # category is so we can save it nicely later
                    result['category'] = info['category']
                    items.append(result)

        for item in items:
            print(f"Name: {item['name']}, Price: {item['price']}, Category: {item['category']}")

        await browser.close()
        return items


scraped_products = await traderJoesScraper()
print(scraped_products)
```
We started by manually listing all the URLs we want to scrape from. Please keep in mind that if you’re hoping to turn this into a scalable application, it’s recommended to handle pagination programmatically (there’s a small sketch of this after the recap below), but for the sake of simplicity, we can input the URLs manually.
Then, we looped through each of the URLs listed, waited for our main selector to show up with all the elements we hoped to scrape, and then extracted our “name” and “price.”
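On that note about pagination: if you’d rather not hard-code each food page, here is a minimal sketch of building those URLs programmatically. It assumes the filter format in the URLs above stays the same and that you already know how many pages of results exist (Playwright could also click through the site’s pagination controls, but that’s beyond this tutorial):

```python
from urllib.parse import quote_plus

def fall_faves_food_urls(num_pages):
    """Build the paginated 'Fall Faves' food URLs instead of listing them by hand.

    num_pages is assumed to be known up front; it is not discovered automatically.
    """
    base = "https://www.traderjoes.com/home/products/category/food-8?filters="
    urls = []
    for page in range(1, num_pages + 1):
        # Page 1 has no "page" key in its filter, matching the URLs above.
        filters = '{"tags":["Fall Faves"]}' if page == 1 else f'{{"tags":["Fall Faves"],"page":{page}}}'
        urls.append({"url": base + quote_plus(filters), "category": "Food"})
    return urls

# Produces the same five food URLs we hard-coded above.
pages = fall_faves_food_urls(5)
```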
Once we ran that, we got a list of all our products from the Fall Faves tag! Please remember that this screenshot doesn’t include all the products scraped.
To keep track of the items, we can quickly count them:
```python
scraped_products_count = len(scraped_products)
print(scraped_products_count)
```
As of the date this was scraped, we had 89 products.
Now, let’s save our products into a .txt file so we can use it later in the tutorial with our LlamaIndex and Atlas Vector Search integration. Name the file whatever you like. For the sake of tracking, I’m naming mine tj_fall_faves_oct30.txt.

```python
with open('tj_fall_faves_oct30.txt', 'w') as f:
    for item in scraped_products:
        f.write(f"Name: {item['name']}, Price: ${item['price']}, Category: {item['category']}\n")
```
Since we are using a notebook, please make sure you download the file locally; once our runtime is disconnected, the .txt file will be lost.
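If you’re following along in Google Colab specifically, one way to grab the file is with Colab’s built-in files helper (this assumes a Colab runtime; skip it if you’re working locally):

```python
# Download the scraped-products file to your machine before the Colab runtime resets.
from google.colab import files

files.download('tj_fall_faves_oct30.txt')
```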
Now that we have all our Trader Joe’s fall products, let’s build our AI Party Planner!
To be successful with this part of the tutorial, follow our quickstart. We will be going over how to use Atlas Vector Search with LlamaIndex to build a RAG application with chat capabilities!
This section covers setting up the environment, storing our previously scraped data in Atlas, creating an Atlas Vector Search index on top of that data, and, to finish up, implementing RAG so Atlas Vector Search can answer questions from our unique data store.
Let’s first use pip to install all our necessary libraries. We will need llama-index, llama-index-vector-stores-mongodb, llama-index-embeddings-openai, and pymongo.

```
pip install --quiet --upgrade llama-index llama-index-vector-stores-mongodb llama-index-embeddings-openai pymongo
```
Now, bring in your necessary imports:

```python
import getpass, os, pymongo, pprint
from pymongo.operations import SearchIndexModel
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from llama_index.core.settings import Settings
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters, ExactMatchFilter, FilterOperator
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch
```
Input your OpenAI API key and your MongoDB Atlas cluster connection string when prompted:
```python
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
ATLAS_CONNECTION_STRING = getpass.getpass("MongoDB Atlas SRV Connection String:")
```
Once your keys are in, let’s assign our specific models for llama_index so it knows how to embed our file properly. This is just to keep everything consistent!

```python
Settings.llm = OpenAI()
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
```
Now, we can read in our .txt file with our scraped products. We are doing this using the SimpleDirectoryReader from llama_index. Text files aren’t the only files that can be loaded this way; LlamaIndex supports a ton of other file types, and I recommend checking out the list of supported formats.

Here, we read the contents of our file and return them as a list of documents, the format LlamaIndex requires:
```python
sample_data = SimpleDirectoryReader(input_files=["/content/tj_fall_faves_oct30.txt"]).load_data()
sample_data[0]
```
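As a side note, since scraped_products is still in memory in this same notebook, you could skip the .txt round-trip entirely and build LlamaIndex Document objects directly. This is just an optional sketch under that assumption, not what the rest of the tutorial uses:

```python
from llama_index.core import Document

# Build one Document per scraped product, mirroring the lines we wrote to the .txt file.
docs_from_memory = [
    Document(text=f"Name: {item['name']}, Price: ${item['price']}, Category: {item['category']}")
    for item in scraped_products
]
```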
Now that our file has been read, let’s connect to our MongoDB Atlas cluster and set up a vector store! Feel free to name the database and collection anything you like. We are initializing a vector store using MongoDBAtlasVectorSearch from llama_index, allowing us to work with our embedded documents directly in our cluster.

```python
# connect to your Atlas cluster
mongo_client = pymongo.MongoClient(ATLAS_CONNECTION_STRING, appname="devrel.showcase.tj_fall_faves")

# instantiate the vector store
atlas_vector_store = MongoDBAtlasVectorSearch(
    mongo_client,
    db_name="tj_products",
    collection_name="fall_faves",
    vector_index_name="vector_index"
)
vector_store_context = StorageContext.from_defaults(vector_store=atlas_vector_store)
```
Since our vector store has been defined (by our vector_store_context), let’s create a vector store index from the documents in sample_data. This is the step that embeds our documents and writes them to Atlas.

```python
vector_store_index = VectorStoreIndex.from_documents(
    sample_data, storage_context=vector_store_context, show_progress=True
)
```
Once this cell has run, you can view your data with the embeddings inside your Atlas cluster.
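If you’d like to verify from the notebook rather than the Atlas UI, a quick peek like this works too. It assumes the default field names the integration uses (text and embedding); adjust if yours differ:

```python
# Peek at one stored document and its embedding straight from the collection.
doc = mongo_client["tj_products"]["fall_faves"].find_one()
print(doc["text"])
print(len(doc["embedding"]))  # 1536 dimensions for text-embedding-ada-002
```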
To allow for vector search queries on our created vector store, we need to create an Atlas Vector Search index on our tj_products.fall_faves collection. We can do this either through the Atlas UI or directly from our notebook:
```python
# Specify the collection for which to create the index
collection = mongo_client["tj_products"]["fall_faves"]

# Create your index model, then create the search index
search_index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 1536,
                "similarity": "cosine"
            },
            {
                "type": "filter",
                "path": "metadata.page_label"
            }
        ]
    },
    name="vector_index",
    type="vectorSearch",
)

collection.create_search_index(model=search_index_model)
```
You’ll be able to see this index once it’s up and running under your “Atlas Search” tab in your Atlas UI. Once it’s done, we can start querying our data and do some basic RAG.
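The index usually takes a short while to build. If you’d rather poll from the notebook than watch the UI, here is a rough sketch; it assumes a recent PyMongo version and relies on the queryable flag Atlas reports for search indexes:

```python
import time

# Poll until Atlas reports the vector search index as queryable.
while True:
    indexes = list(collection.list_search_indexes("vector_index"))
    if indexes and indexes[0].get("queryable"):
        print("vector_index is ready to query")
        break
    time.sleep(5)
```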
With our Atlas Vector Search index up and running, we are ready to have some fun and bring our AI Party Planner to life! We will continue with this dream team, using Atlas Vector Search to retrieve our documents and LlamaIndex’s query engine to answer our questions based on them.
To do this, we will turn Atlas Vector Search into a vector index retriever and initialize a RetrieverQueryEngine that handles queries by passing each question through our vector retrieval system. This combination allows us to ask questions in natural language and get matched with the most relevant documents.

```python
vector_store_retriever = VectorIndexRetriever(index=vector_store_index, similarity_top_k=5)

query_engine = RetrieverQueryEngine(retriever=vector_store_retriever)

response = query_engine.query('Which plant items are available right now? Please provide prices')

print(response)
```
For the question “Which plant items are available right now? Please provide prices,” we get the response:
```
Mum Fleurettes are available for $4.99 and Assorted Mum Plants are available for $6.99.
```
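If you’re curious which documents the retriever is actually handing to the LLM (and how similar they are to the question), you can also call the retriever directly. A small, optional sketch:

```python
# Retrieve the top matches for a question without generating an LLM answer.
nodes = vector_store_retriever.retrieve("Which plant items are available right now?")
for node_with_score in nodes:
    # Print the similarity score and the start of each retrieved product line.
    print(node_with_score.score, node_with_score.node.text[:80])
```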
But what if we want to keep asking questions and get responses with memory? Let’s quickly build a chat engine.
Instead of having to ask one question at a time about our Trader Joe’s products for our party, we can incorporate a back-and-forth conversation to get the most out of our AI Party Planner.
We first need to initialize the chat engine from our vector_store_index and enable a streaming response. Condense question mode is used so that the engine rephrases each follow-up question, using the conversation history, into a standalone query that makes sense on its own. Streaming is enabled as well so we can see the response as it’s generated:

```python
# llamaindex chat engine
chat_engine = vector_store_index.as_chat_engine(
    chat_mode="condense_question", streaming=True
)
```
Then, we can create our chat loop! This is just a basic while loop that will run until the user enters “exit.”

```python
while True:
    # ask question
    question = input("Ask away! Type 'exit' to quit >>> ")

    # exit to quit
    if question == 'exit':
        print("Exiting chat. Have a happy fall!")
        break

    print("\n")
```
Our last step, still inside the while loop, is to send the question to our chat engine and stream and display the response.

```python
    # llamaindex ask
    response_stream = chat_engine.stream_chat(question)

    # llamaindex print
    response_stream.print_response_stream()
    print("\n")
```
Run the above code blocks and try it for yourself. Here are my questions and answers:
```
Ask away! Type 'exit' to quit >>> hi! i am planning a fall party

Consider including a variety of fall-themed food and beverages such as pumpkin pie, apple cider donuts, maple-flavored fudge, pumpkin spiced cookies, and harvest blend herbal tea to create a festive atmosphere for your fall party. Additionally, you could incorporate seasonal decorations like cinnamon brooms, scented candles, and mum plants to enhance the autumn ambiance.

Ask away! Type 'exit' to quit >>> i want to make a turkey, which three sides with prices and reasonings will be best

The best three side dishes to serve with turkey at a fall party would be Cut Butternut Squash, Brussels Sprouts, and Cornbread Stuffing. Cut Butternut Squash and Brussels Sprouts are reasonably priced at $3.99 and $4.99 respectively, offering a balance of flavors and textures that complement the turkey well. Cornbread Stuffing, priced at $5.99, adds a traditional touch to the meal and enhances the overall fall-themed dining experience.

Ask away! Type 'exit' to quit >>> which drinks should i serve? i want something caffinated

Harvest Blend Herbal Tea and Autumn Maple Coffee would be ideal caffeinated drinks to serve at a fall party to complement the autumn-themed food and create a festive atmosphere.

Ask away! Type 'exit' to quit >>> what are the prices of these drinks

$2.49 for Harvest Blend Herbal Tea and $8.99 for Autumn Maple Coffee.

Ask away! Type 'exit' to quit >>> which decor should i use? i want my home to smell nice

Cinnamon Whisk, Cinnamon Broom, Orange & Spice Scented Candle & Room Spritz

Ask away! Type 'exit' to quit >>> what are the prices?

$5.99, $1.29, $4.99

Ask away! Type 'exit' to quit >>> exit
Exiting chat. Have a happy fall!
```
In this tutorial, we have built a super helpful Trader Joe’s party planner using Playwright to scrape all the fall favorite items and the LlamaIndex and MongoDB Atlas Vector Search integration to save, embed, and query our data using natural language.
We even took it a step further and incorporated a chat engine, turning standalone Q&A into a full back-and-forth conversation!