Trader Joe's Fall Faves Party Planner With Playwright, LlamaIndex, and MongoDB Atlas Vector Search
As someone who survives solely on snacks, Trader Joe’s is to me what Costco is to my parents: a kitchen non-negotiable.
When I lived within a 0.2-mile radius of one during college, going was an everyday (sometimes multiple-times-a-day) occurrence. Want to grab a yogurt for breakfast? A salad for lunch? Pick up some frozen items and make a feast for dinner? Trader Joe’s had it all and more, especially the stuff you didn’t even realize you were missing. From “Everything But the Bagel” seasoning to “Philly Cheesesteak Bao Buns,” every Trader Joe’s trip comes with something new for your pantry and your tastebuds.
The fact that Trader Joe’s has all the fun products you’ll want to serve during Thanksgiving makes it the easy choice for fall festivities. But what about everything else that goes into planning a party, like decision fatigue, or standing in crazy long lines while trying to look at every single fall product TJ’s has to offer and then deciding between them? This is where the combination of Playwright, LlamaIndex, and MongoDB Atlas Vector Search comes in to save the day.
Let’s use these platforms to create a Trader Joe’s fall party planner. We’ll use Playwright to scrape the Trader Joe’s website for all the fall-related products, the LlamaIndex and Atlas Vector Search Integration to build a retrieval-augmented generation (RAG) chatbot with our fall products as our data store, and LlamaIndex’s Chat Engine to get the most interactive, conversational responses based on our fall product data so that we can plan the best party!
What's covered
- Building a Trader Joe’s AI party planner using Playwright, LlamaIndex, and MongoDB Atlas Vector Search
- Scraping Trader Joe’s fall items with Playwright and formatting them for chatbot use
- Setting up and embedding product data in MongoDB Atlas Vector Store for semantic search
- Creating a retrieval-augmented generation chatbot to answer party planning questions
- Adding interactive chat engine functionality for back-and-forth Q&A about fall party items
Before diving in, let’s go over these platforms in more detail.
Playwright makes it super easy to work with dynamically rendered website elements, which is why it was chosen for this tutorial. After inspecting the Trader Joe’s website, it was clear that JavaScript is required to load the page content and the various products we can see, meaning the page content is rendered dynamically! Because of this, simpler Python scrapers that only fetch static HTML wouldn’t work for the items we are looking for.
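If you want to sanity-check this yourself, one quick and admittedly rough way is to fetch the raw HTML without a browser and look for the product-list class we’ll use later. This is only a sketch: the site may block plain HTTP clients or change its markup, so treat the result as a hint rather than proof.

```python
import requests

# Fetch the category page without running any JavaScript.
html = requests.get(
    "https://www.traderjoes.com/home/products/category/food-8",
    headers={"User-Agent": "Mozilla/5.0"},
).text

# If the product list were in the static HTML, this class name would appear.
# For a JS-rendered page, we expect this to print False.
print("ProductList_productList__item" in html)
```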
LlamaIndex is a framework that makes it easy to use large language models (LLMs) with your data. You can create all sorts of AI-powered applications with LlamaIndex, chatbots being one of them, which will be perfect for our fall party planner.
MongoDB Atlas Vector Search is a feature in MongoDB Atlas that allows you to store and query vector embeddings in your database. It allows you to build incredible applications that require semantic search, or searching based on meaning and context rather than exact keywords.
In this tutorial, MongoDB Atlas Vector Search serves as our party planner’s storage and retrieval layer. It lets us store all our fall product embeddings and search for the items most relevant to each query!
The LlamaIndex and MongoDB Atlas Vector Search integration combines both platforms: LlamaIndex organizes and queries the data, while Atlas Vector Search provides the vector storage and semantic search capabilities. All our product information (the Trader Joe’s products we scraped) is vectorized and stored in our cluster.
So what does this mean? When a question is asked, such as “Which three sides are best served if I’m making a turkey?”, LlamaIndex embeds the question and compares it against the vectors stored in MongoDB Atlas, retrieving the most relevant products and ensuring that answers are based on overall meaning rather than exact keywords!
Please make sure you have the following prerequisites in order to be successful:
- OpenAI API key: Note that using the OpenAI API is a paid feature.
- MongoDB Atlas cluster: Please make sure that you are using a free tier, that you have ensured your IP address is set to “access from anywhere” (not recommended for production, but it’s perfectly fine for this tutorial), and that you have copied your cluster’s connection string to a safe place.
Once you have all your tutorial requirements, we are ready to begin!
Inspect your website!
Our first step is to inspect all the fall favorites from Trader Joe’s website and save them so we can easily scrape the website. Trader Joe’s makes this easy for us since they already catalog everything under specific tags. So, let’s click on the “Food” category, then scroll down and click the “Fall Faves” tag. Both these options are on the left-hand side of the webpage.
Once we can see that these are all the food “Fall Faves,” save the URL. Now, we can do this again for all the other categories: Beverages, Flowers & Plants, and Everything Else! Ensure we are only focused on products listed under the “Fall Faves” tag.
Please keep in mind that since we are dealing with live data, these products and options may change depending on when you decide to scrape the information, so the products that show up for me may look different to you!
Once we know the URLs we will be scraping from, let’s figure out which selectors we need as well. We want to format our products as “Name” and “Price.”
The easiest way to find this information is to highlight the name of an item, right-click, and press “Inspect.” Then, you can open each drop-down until you find the information you’re looking for!
Here, we can see that every product name is located in an “a” tag inside an “h2” with the class “ProductCard_card__title__text__uiWLe,” and each price is located in a “span” tag with the class “ProductPrice_productPrice__price__3-50j.” I recommend checking two or more products to make sure this pattern holds throughout.
We can also see that all products are nested within “li” tags with the “ProductList_productList__item__1EIvq” class.
This means we will have to wait for this class to show up when scraping before we can go ahead and extract the information within.
Now that we have our Fall Faves and know exactly where the information we want to retrieve lives, we are ready to build out our scraping function.
First, let’s install Playwright:
```
!pip install playwright
!playwright install
```
Once that’s done installing, we can import our necessary packages:
```python
import asyncio
from playwright.async_api import async_playwright
```
Please keep in mind that we are using async because we are running everything inside of a Google Colab notebook.

Now, let’s start building our traderJoesScraper:

```python
async def traderJoesScraper():
    async with async_playwright() as playwright:
        # use headless mode since we are using Colab
        browser = await playwright.chromium.launch(headless=True)
        page = await browser.new_page()

        # all the URLs for my foods, bevs, flowers&plants, and everything else categories
        pages = [
            {'url': 'https://www.traderjoes.com/home/products/category/food-8?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%7D', 'category': 'Food'},
            {'url': 'https://www.traderjoes.com/home/products/category/food-8?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%2C%22page%22%3A2%7D', 'category': 'Food'},
            {'url': 'https://www.traderjoes.com/home/products/category/food-8?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%2C%22page%22%3A3%7D', 'category': 'Food'},
            {'url': 'https://www.traderjoes.com/home/products/category/food-8?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%2C%22page%22%3A4%7D', 'category': 'Food'},
            {'url': 'https://www.traderjoes.com/home/products/category/food-8?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%2C%22page%22%3A5%7D', 'category': 'Food'},
            {'url': 'https://www.traderjoes.com/home/products/category/beverages-182?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%7D', 'category': 'Beverage'},
            {'url': 'https://www.traderjoes.com/home/products/category/flowers-plants-203?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%7D', 'category': 'Flowers&Plants'},
            {'url': 'https://www.traderjoes.com/home/products/category/everything-else-215?filters=%7B%22tags%22%3A%5B%22Fall+Faves%22%5D%7D', 'category': 'EverythingElse'}
        ]

        items = []

        # loop through each URL
        for info in pages:
            await page.goto(info['url'])

            # let the page load
            await page.wait_for_selector('li.ProductList_productList__item__1EIvq', state='attached', timeout=60000)

            # li.ProductList_productList__item__1EIvq is where all our info lives
            products = await page.query_selector_all('li.ProductList_productList__item__1EIvq')

            # get all our info
            for product in products:
                result = {}

                name = await product.query_selector('h2.ProductCard_card__title__text__uiWLe a')
                price = await product.query_selector('span.ProductPrice_productPrice__price__3-50j')

                if name and price:
                    result['name'] = await name.inner_text()

                    # have to make price a number
                    price_text = await price.inner_text()
                    convert_price = float(price_text.replace('$', '').strip())
                    result['price'] = convert_price

                    # category is so we can save it nicely later
                    result['category'] = info['category']
                    items.append(result)

        for item in items:
            print(f"Name: {item['name']}, Price: {item['price']}, Category: {item['category']}")

        await browser.close()
        return items


scraped_products = await traderJoesScraper()
print(scraped_products)
```
We started by manually listing all the URLs we want to scrape from. Please keep in mind that if you’re hoping to turn this into a scalable application, it’s recommended to handle pagination programmatically (there’s a small sketch of this after the recap below), but for the sake of simplicity, we can input the URLs manually.
Then, we looped through each of the URLs listed, waited for our main selector to show up with all the elements we hoped to scrape, and then extracted our “name” and “price.”
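On that note about pagination: if you’d rather not hard-code each food page, here is a minimal sketch of building those URLs programmatically. It assumes the filter format in the URLs above stays the same and that you already know how many pages of results exist (Playwright could also click through the site’s pagination controls, but that’s beyond this tutorial):

```python
from urllib.parse import quote_plus

def fall_faves_food_urls(num_pages):
    """Build the paginated 'Fall Faves' food URLs instead of listing them by hand.

    num_pages is assumed to be known up front; it is not discovered automatically.
    """
    base = "https://www.traderjoes.com/home/products/category/food-8?filters="
    urls = []
    for page in range(1, num_pages + 1):
        # Page 1 has no "page" key in its filter, matching the URLs above.
        filters = '{"tags":["Fall Faves"]}' if page == 1 else f'{{"tags":["Fall Faves"],"page":{page}}}'
        urls.append({"url": base + quote_plus(filters), "category": "Food"})
    return urls

# Produces the same five food URLs we hard-coded above.
pages = fall_faves_food_urls(5)
```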
Once we ran that, we got a list of all our products from the Fall Faves tag! Please remember that this screenshot doesn’t include all the products scraped.
To keep track of the items, we can quickly count them:
```python
scraped_products_count = len(scraped_products)
print(scraped_products_count)
```
As of the date this was scraped, we had 89 products.
Now, let’s save our products into a .txt file so we can use it later in the tutorial with our LlamaIndex and Atlas Vector Search integration. Name the file whatever you like. For the sake of tracking, I’m naming mine tj_fall_faves_oct30.txt.

```python
with open('tj_fall_faves_oct30.txt', 'w') as f:
    for item in scraped_products:
        f.write(f"Name: {item['name']}, Price: ${item['price']}, Category: {item['category']}\n")
```
Since we are using a notebook, please make sure you download the file locally; once our runtime is disconnected, the .txt file will be lost.
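If you’re following along in Google Colab specifically, one way to grab the file is with Colab’s built-in files helper (this assumes a Colab runtime; skip it if you’re working locally):

```python
# Download the scraped-products file to your machine before the Colab runtime resets.
from google.colab import files

files.download('tj_fall_faves_oct30.txt')
```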
Now that we have all our Trader Joe’s fall products, let’s build our AI Party Planner!
To be successful with this part of the tutorial, follow our quickstart. We will be going over how to use Atlas Vector Search with LlamaIndex to build a RAG application with chat capabilities!
This section covers setting up the environment, storing our previously scraped data in Atlas, creating an Atlas Vector Search index on top of that data, and, to finish up, implementing RAG so Atlas Vector Search can answer questions from our unique data store.
Let’s first use pip to install all our necessary libraries. We will need llama-index, llama-index-vector-stores-mongodb, llama-index-embeddings-openai, and pymongo.

```
pip install --quiet --upgrade llama-index llama-index-vector-stores-mongodb llama-index-embeddings-openai pymongo
```
Now, bring in your necessary imports:

```python
import getpass, os, pymongo, pprint
from pymongo.operations import SearchIndexModel
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from llama_index.core.settings import Settings
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters, ExactMatchFilter, FilterOperator
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch
```
Input your OpenAI API key and your MongoDB Atlas cluster connection string when prompted:
```python
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
ATLAS_CONNECTION_STRING = getpass.getpass("MongoDB Atlas SRV Connection String:")
```
Once your keys are in, let’s assign our specific models for llama_index so it knows how to embed our file properly. This is just to keep everything consistent!

```python
Settings.llm = OpenAI()
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
```
Now, we can read in our .txt file with our scraped products. We are doing this using the SimpleDirectoryReader from llama_index. Text files aren’t the only files that can be loaded this way; LlamaIndex supports a ton of other file types, and I recommend checking out the list of supported formats.

Here, we read the contents of our file and return them as a list of documents, the format LlamaIndex requires:
```python
sample_data = SimpleDirectoryReader(input_files=["/content/tj_fall_faves_oct30.txt"]).load_data()
sample_data[0]
```
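As a side note, since scraped_products is still in memory in this same notebook, you could skip the .txt round-trip entirely and build LlamaIndex Document objects directly. This is just an optional sketch under that assumption, not what the rest of the tutorial uses:

```python
from llama_index.core import Document

# Build one Document per scraped product, mirroring the lines we wrote to the .txt file.
docs_from_memory = [
    Document(text=f"Name: {item['name']}, Price: ${item['price']}, Category: {item['category']}")
    for item in scraped_products
]
```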
Now that our file has been read, let’s connect to our MongoDB Atlas cluster and set up a vector store! Feel free to name the database and collection anything you like. We are initializing a vector store using MongoDBAtlasVectorSearch from llama_index, allowing us to work with our embedded documents directly in our cluster.

```python
# connect to your Atlas cluster
mongo_client = pymongo.MongoClient(ATLAS_CONNECTION_STRING, appname="devrel.showcase.tj_fall_faves")

# instantiate the vector store
atlas_vector_store = MongoDBAtlasVectorSearch(
    mongo_client,
    db_name="tj_products",
    collection_name="fall_faves",
    vector_index_name="vector_index"
)
vector_store_context = StorageContext.from_defaults(vector_store=atlas_vector_store)
```
Since our vector store has been defined (by our vector_store_context), let’s create a vector store index from the documents in sample_data. This is the step that embeds our documents and writes them to Atlas.

```python
vector_store_index = VectorStoreIndex.from_documents(
    sample_data, storage_context=vector_store_context, show_progress=True
)
```
Once this cell has run, you can view your data with the embeddings inside your Atlas cluster.
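If you’d like to verify from the notebook rather than the Atlas UI, a quick peek like this works too. It assumes the default field names the integration uses (text and embedding); adjust if yours differ:

```python
# Peek at one stored document and its embedding straight from the collection.
doc = mongo_client["tj_products"]["fall_faves"].find_one()
print(doc["text"])
print(len(doc["embedding"]))  # 1536 dimensions for text-embedding-ada-002
```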
To allow for vector search queries on our created vector store, we need to create an Atlas Vector Search index on our tj_products.fall_faves collection. We can do this either through the Atlas UI or directly from our notebook:
```python
# Specify the collection for which to create the index
collection = mongo_client["tj_products"]["fall_faves"]

# Create your index model, then create the search index
search_index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 1536,
                "similarity": "cosine"
            },
            {
                "type": "filter",
                "path": "metadata.page_label"
            }
        ]
    },
    name="vector_index",
    type="vectorSearch",
)

collection.create_search_index(model=search_index_model)
```
You’ll be able to see this index once it’s up and running under your “Atlas Search” tab in your Atlas UI. Once it’s done, we can start querying our data and do some basic RAG.
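The index usually takes a short while to build. If you’d rather poll from the notebook than watch the UI, here is a rough sketch; it assumes a recent PyMongo version and relies on the queryable flag Atlas reports for search indexes:

```python
import time

# Poll until Atlas reports the vector search index as queryable.
while True:
    indexes = list(collection.list_search_indexes("vector_index"))
    if indexes and indexes[0].get("queryable"):
        print("vector_index is ready to query")
        break
    time.sleep(5)
```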
With our Atlas Vector Search index up and running, we are ready to have some fun and bring our AI Party Planner to life! We will continue with this dream team, using Atlas Vector Search to retrieve our documents and LlamaIndex’s query engine to answer our questions based on them.
To do this, we will turn Atlas Vector Search into a vector index retriever and initialize a RetrieverQueryEngine that handles queries by passing each question through our vector retrieval system. This combination allows us to ask questions in natural language and get matched with the most relevant documents.

```python
vector_store_retriever = VectorIndexRetriever(index=vector_store_index, similarity_top_k=5)

query_engine = RetrieverQueryEngine(retriever=vector_store_retriever)

response = query_engine.query('Which plant items are available right now? Please provide prices')

print(response)
```
For the question “Which plant items are available right now? Please provide prices,” we get the response:
```
Mum Fleurettes are available for $4.99 and Assorted Mum Plants are available for $6.99.
```
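If you’re curious which documents the retriever is actually handing to the LLM (and how similar they are to the question), you can also call the retriever directly. A small, optional sketch:

```python
# Retrieve the top matches for a question without generating an LLM answer.
nodes = vector_store_retriever.retrieve("Which plant items are available right now?")
for node_with_score in nodes:
    # Print the similarity score and the start of each retrieved product line.
    print(node_with_score.score, node_with_score.node.text[:80])
```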
But what if we want to keep asking questions and get responses with memory? Let’s quickly build a chat engine.
Instead of having to ask one question at a time about our Trader Joe’s products for our party, we can incorporate a back-and-forth conversation to get the most out of our AI Party Planner.
We first need to initialize the chat engine from our vector_store_index and enable a streaming response. Condense question mode is used so that the engine rephrases each follow-up question, using the conversation history, into a standalone query that makes sense on its own. Streaming is enabled as well so we can see the response as it’s generated:

```python
# llamaindex chat engine
chat_engine = vector_store_index.as_chat_engine(
    chat_mode="condense_question", streaming=True
)
```
Then, we can create our chat loop! This is just a basic while loop that will run until the user enters “exit.”

```python
while True:
    # ask question
    question = input("Ask away! Type 'exit' to quit >>> ")

    # exit to quit
    if question == 'exit':
        print("Exiting chat. Have a happy fall!")
        break

    print("\n")
```
Our last step, still inside the while loop, is to send the question to our chat engine and stream and display the response.

```python
    # llamaindex ask
    response_stream = chat_engine.stream_chat(question)

    # llamaindex print
    response_stream.print_response_stream()
    print("\n")
```
Run the above code blocks and try it for yourself. Here are my questions and answers:
```
Ask away! Type 'exit' to quit >>> hi! i am planning a fall party

Consider including a variety of fall-themed food and beverages such as pumpkin pie, apple cider donuts, maple-flavored fudge, pumpkin spiced cookies, and harvest blend herbal tea to create a festive atmosphere for your fall party. Additionally, you could incorporate seasonal decorations like cinnamon brooms, scented candles, and mum plants to enhance the autumn ambiance.

Ask away! Type 'exit' to quit >>> i want to make a turkey, which three sides with prices and reasonings will be best

The best three side dishes to serve with turkey at a fall party would be Cut Butternut Squash, Brussels Sprouts, and Cornbread Stuffing. Cut Butternut Squash and Brussels Sprouts are reasonably priced at $3.99 and $4.99 respectively, offering a balance of flavors and textures that complement the turkey well. Cornbread Stuffing, priced at $5.99, adds a traditional touch to the meal and enhances the overall fall-themed dining experience.

Ask away! Type 'exit' to quit >>> which drinks should i serve? i want something caffinated

Harvest Blend Herbal Tea and Autumn Maple Coffee would be ideal caffeinated drinks to serve at a fall party to complement the autumn-themed food and create a festive atmosphere.

Ask away! Type 'exit' to quit >>> what are the prices of these drinks

$2.49 for Harvest Blend Herbal Tea and $8.99 for Autumn Maple Coffee.

Ask away! Type 'exit' to quit >>> which decor should i use? i want my home to smell nice

Cinnamon Whisk, Cinnamon Broom, Orange & Spice Scented Candle & Room Spritz

Ask away! Type 'exit' to quit >>> what are the prices?

$5.99, $1.29, $4.99

Ask away! Type 'exit' to quit >>> exit
Exiting chat. Have a happy fall!
```
In this tutorial, we have built a super helpful Trader Joe’s party planner using Playwright to scrape all the fall favorite items and the LlamaIndex and MongoDB Atlas Vector Search integration to save, embed, and query our data using natural language.
We even took it a step further and incorporated a chat engine, turning standalone Q&A into a full back-and-forth conversation!