Smart Restaurant Choices with MongoDB Atlas and Vertex AI
00:00:00 Introduction to RAG and Vector Search
The session begins with an introduction to the concepts of Retrieval Augmented Generation (RAG) and vector search, explaining their significance in the context of AI and MongoDB Atlas.
00:10:00 Building a RAG Pipeline
The discussion moves on to constructing a RAG pipeline, detailing the steps involved in retrieving data, augmenting prompts, and generating responses using AI models.
00:20:00 Vector Search in MongoDB Atlas
The video highlights the functionality of vector search within MongoDB Atlas, showcasing how it enables semantic search by comparing vector embeddings.
00:30:00 Demo: Restaurant Recommendations
A live demonstration is provided, illustrating how the RAG architecture can be applied to generate restaurant recommendations based on user queries and reviews.
00:40:00 Code Walkthrough
The presenter walks through the Python code used in the demo, explaining the key functions and how they contribute to the RAG process.
00:50:00 Q&A and Final Thoughts
The session concludes with a Q&A segment, addressing questions from the audience and offering final insights on the future of RAG and its applications.
01:00:00 Closing Remarks
The video wraps up with closing remarks, encouraging viewers to experiment with RAG and vector search in their own projects.
The primary focus of the video is on demonstrating how AI technologies, specifically Retrieval Augmented Generation (RAG) and vector search, can be integrated with MongoDB Atlas to create intelligent applications that provide expert-like responses based on personalized data.
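The retrieve, augment and generate steps named in the summary can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenter's code: the sample reviews, the function names and `fake_llm` are all hypothetical, and `embed` is a toy word-count stand-in for a real embedding model, so it matches on shared words rather than on meaning.

```python
import math
from collections import Counter

# Hypothetical sample data standing in for stored restaurant reviews.
REVIEWS = [
    {"name": "Burger Barn", "review": "juicy burger and crispy fries"},
    {"name": "Sushi Wave", "review": "fresh sushi and beautiful plating"},
    {"name": "Pasta Nest", "review": "handmade pasta and rich tomato sauce"},
]


def embed(text):
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, k=2):
    """Retrieval: rank the stored reviews by similarity to the question."""
    query_vector = embed(question)
    ranked = sorted(
        REVIEWS,
        key=lambda doc: cosine(query_vector, embed(doc["review"])),
        reverse=True,
    )
    return ranked[:k]


def augment(question, docs):
    """Augmentation: package the retrieved reviews into a prompt."""
    context = "\n".join(f"- {d['name']}: {d['review']}" for d in docs)
    return (
        "You are an expert restaurant reviewer.\n"
        f"Reviews:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(question, llm):
    """Generation: hand the engineered prompt to a model callable."""
    return llm(augment(question, retrieve(question)))


def fake_llm(prompt):
    """Placeholder for a hosted model call; a real app would call its LLM here."""
    return f"(model reply to a prompt of {len(prompt)} characters)"


if __name__ == "__main__":
    print(answer("where can I get a good burger", fake_llm))
```

Swapping `embed` for a real embedding model and `fake_llm` for a hosted model would leave the shape of the pipeline unchanged, which is the point of the architecture.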
🔑 Key Points
- RAG is an architecture that combines retrieval of private or rapidly changing data with AI models for generating responses.
- Vector search in MongoDB Atlas allows for semantic search by comparing vectors (embeddings) rather than keywords.
- The adaptability of RAG to new and evolving AI models ensures its relevance and utility in various applications.
- MongoDB Atlas facilitates the easy update and scaling of vector data alongside operational data, making it suitable for enterprise-scale applications.
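As a rough sketch of what the vector search key points look like in practice, the snippet below builds an Atlas Vector Search index definition and a `$vectorSearch` aggregation pipeline as plain Python dictionaries. The index name `vector_index`, the field names `embedding`, `name` and `review`, and the helper function names are assumptions for illustration; the 384-dimension size and cosine similarity are taken from the talk.

```python
# Vector length mentioned in the talk; it must match the embedding model used.
NUM_DIMENSIONS = 384


def vector_index_definition(path="embedding", similarity="cosine"):
    """Definition for an Atlas Vector Search index on one vector field.

    The similarity function should match the one the embedding model
    was trained with (cosine, euclidean or dotProduct).
    """
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": NUM_DIMENSIONS,
                "similarity": similarity,
            }
        ]
    }


def vector_search_pipeline(query_vector, index="vector_index",
                           path="embedding", limit=5, num_candidates=100):
    """Build an aggregation pipeline that runs a semantic search.

    query_vector is the embedding of the user's question, produced by the
    same embedding model that encoded the stored documents.
    """
    if len(query_vector) != NUM_DIMENSIONS:
        raise ValueError(f"expected {NUM_DIMENSIONS} dimensions")
    if num_candidates < limit:
        raise ValueError("num_candidates must be at least limit")
    return [
        {
            "$vectorSearch": {
                "index": index,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Return the operational fields next to the similarity score.
        {
            "$project": {
                "_id": 0,
                "name": 1,
                "review": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


if __name__ == "__main__":
    pipeline = vector_search_pipeline([0.0] * NUM_DIMENSIONS)
    print(pipeline[0]["$vectorSearch"]["limit"])
```

With PyMongo, a pipeline like this would typically be passed to `collection.aggregate(pipeline)` on the collection that stores the reviews together with their embeddings.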
Full Video Transcript
[Music]

Hello everybody, and welcome yet again to another live stream. I'm Shane McAllister, I'm on the developer relations team here at MongoDB, and I'm thrilled to have you join us for what looks to be an exciting session on smart restaurant choices with MongoDB Atlas and Vertex AI. But as ever, before we do that, a little bit of housekeeping. While we gear up for the live stream, do drop a shout-out in the chat on LinkedIn and YouTube and tell us where you're joining from and who you are. We love to hear from our fantastic viewers, wherever they might be in the world. Use the same chat on LinkedIn and YouTube once the show kicks off to pop in any burning questions, and we'll either tackle them live as we go through the show or take care of them towards the end. So please do use that opportunity to send me and our guest questions as we go through the show. As ever, this live stream is being recorded, so once it's finished it will be available again on our YouTube and LinkedIn. If life drags you away, or something comes up, or you miss the start, or you can't make it till the end, don't worry: the recording will exist and you can pop back in and have a look. And of course, while you're on our YouTube page and LinkedIn, don't forget to like and subscribe on YouTube, and to follow on LinkedIn, to make sure you keep up to date with the events that happen on MongoDB TV, and also the hottest news and the latest posts. If you're a seasoned viewer, it's really good to have you back, and if you're new, you're very, very welcome. It's great to have you. Do dive into any of the past shows to see this and other content from my colleagues here at MongoDB TV.

Anyway, today we have a very special guest with us: Nuri Halperin from Plus N Consulting, but also a longstanding MongoDB Champion, and I'm really delighted to be joined by Nuri today.
Based in California, Nuri is here to share his insights on how cutting-edge AI tools are transforming app development. We're going to dive deep into concepts like retrieval augmented generation, which is RAG, vector search techniques, and the integration of AI via MongoDB Atlas, and particularly MongoDB Atlas Vector Search, which we announced nearly a year ago now. Time flies. But earlier this month at .local New York, our largest .local event for MongoDB, we also announced that Vector Search will be coming to our Community and Enterprise on-premises versions of MongoDB. I know that's a question that comes up a lot when we do shows like this on AI. Anyway, Nuri is going to walk us through how these technologies enable smarter decision making, providing expert-like responses using personalized data, and we're going to see a demonstration of this in action, showing how we can enhance, in this example, restaurant recommendations and operations. So if you're a tech enthusiast, a business owner, a developer, a dabbler, or just curious about the future of AI in practical applications, you've come to the right place. So without further ado: Nuri, you're very welcome to MongoDB TV Cloud Connect. How are you?

I'm good. Thanks, Shane, for having me on your show.

Excellent. Listen, I'm delighted that you can join us. There are very few of these, Nuri, but I've met you a few times in person, which always makes shows like this much easier to do, and I'm delighted to have got your arm twisted to join us on Cloud Connect. By way of introduction, I always like to get our guests to tell us a little bit about their career path to date and what got them to where they are now, so I'd love to do a bit of that, and I also want to talk about what I mentioned in the intro about you being a MongoDB Champion. So how did you get here, Nuri? What's been the path to date for you?

Let's see, I think
I probably started in high school with a Radio Shack TRS-80, trying to...

You're showing your age now.

Yeah, but maybe we should zoom 40 years forward. I went to school for computer science, and I've done software for most of my professional career. I was part of some startups doing online dating for many years, so that's where I got my feet wet on both the software side and the database side, because we needed to scale and be able to handle web traffic and stuff like that. So I'm a software developer at heart, but very versed in databases, because that's where the business really is, I believe. And then with MongoDB I started around, I think, 2011. I don't use MongoDB exclusively, but I use it everywhere that it fits. I really liked Mongo's approach to storing and retrieving data for applications, for workloads. It is really phenomenal, and it made us able to go through a startup with social photo sharing and stuff like that, and really build things, iterate rapidly and deliver fast, with the knowledge that the cluster is going to be there and that availability and durability are there. Anecdotally, I left a startup, and two years later I spoke to some folks there and I said, okay, so who's the DBA, who's doing stuff? And they're like, nobody, it just runs. So those kinds of things I like to hear.

You left it in good shape. So you started your MongoDB experience in 2011. I've been in MongoDB since 2020, so you're well ahead of me in terms of my learning curve in MongoDB. But at that time, obviously, it was very different from what it is now. It was pre-Atlas, so it was on-premises MongoDB that you were using at that time.

Right. And I don't know if I should mention it on this podcast, but Mongo is
still available on-prem, off cloud, for those who need it. And there is an aspect of DevOps and DBA work that needs to be there. I believe that job is not gone even if you're on Atlas. It's just that in Atlas all of the mundane tasks are automated, but you still need to keep a watchful eye, understand where things are going, optimize workloads, things like that. I think that's where my emphasis is on education. I have some courses on Pluralsight, some of them quite old by now, but I speak about MongoDB. I'm doing a MongoDB first steps even this weekend at SQL Saturday, because I believe people should know how things work and how to exploit their strengths. This idea that you just have a URL and you throw data at it and expect it to come back is okay, it's a nice little walled garden, but as you mature into high-demand applications you want to go past that. You want to really understand what's going on behind the scenes and how things work. And I like to tinker with things, so it's right up my alley.

So SQL Saturday: they still let a NoSQL document database champion come through the door for that event?

The more you learn, right? I mean, this is a good sign. It's a sign that folks in the data community recognize that this is a polyglot world. It's not only Oracle, it's not only SQL Server or Postgres or MySQL or Maria or a plethora of others. Some stuff is still SQL language at the top, and I think some folks are happy to just do that for their whole career. Some people want to see SQL at the top but something completely different on the bottom, which for analytics and for large-scale data lakes and stuff like that is somewhat available even in Mongo, with some BI connectors and stuff like that. And some folks are like, I don't care what language, it's all about infrastructure and
learning new things and being able to store and process massive amounts of data and take on workloads, and you just choose what's the right tool for you.

Exactly, the right tool for the job. So you started with MongoDB in 2011. When did you become a MongoDB Champion, Nuri, and what does that entail from your perspective?

Well, I was graciously nominated to be a recipient of the William Zola Award by Mongo. I don't recall the exact year. I was an inaugural recipient of that award, because as I learned Mongo, as a frequent speaker at conferences and tech events, I spoke a lot about Mongo. I also was on Stack Overflow, trying to get other community members engaged, understanding and talking about it, because let's face it, we don't know everything. I always love to learn from others and share my knowledge. So, being a champion: the program was named different things over the years, but I think in the 2015 era or something like that I started being engaged in that. What that gets is a little more exposure, as in my name is out there somewhere, but from my own practice it's pretty much that I do what I do anyway. I'm a trador, I like to share the art and my discoveries, meet others who are in the field, learn from them and share ideas. That's kind of what I do anyway.

Yeah, and as I said, we've met in person a number of times, at a couple of MongoDB.local events, some of our partner events, most recently at Google Cloud Next, Nuri, which is where I first came across the content that you're going to talk about today. So, given your long history as an inquisitive developer and somebody in the data space for a long time, and given everything that we see and do these days (I know somebody commented that every event is an AI event now, every company is an AI company), before we get stuck into the meat of what you're going to show us: to the outside world it has seemed like 18 months of rapid innovation in this generative AI space in particular. What are your thoughts and viewpoints on gen AI overall?

Well, there's AI and there's gen AI. AI has been hyping and spiking for decades now. Gen AI, I think, is very interesting, because I think we are finally seeing products where the AI is really capable of delivering things that humans didn't painstakingly make on the spot. It doesn't mean they never made them, because it's based on learning what humans have done before. But for myself, if I need some graphics for some promo or something like that, and I can go to some system and say, hey, make me a picture that looks like this and this and that, and not have a human painstakingly work on it for hours or days, that's a win. Now, is the quality of it, is the innovation of it, is the product really what you need? I don't know. I don't think we've gotten to the point where you can blindly trust AI or the product. But it's a great tool. We had code generators that said, oh, automate all that cruft, do all the needful, and then I'll come in as an expert and work with it and shorten my workload, and therefore make me more productive. I think this is a very close match to that same paradigm. It's a
tool that accelerates some of the mundane work, or some of the footwork that someone would have done. But I think it falls short of creating things that were never conceived of before, in the sense that it doesn't really have understanding or knowledge. It relies only on previous knowledge. It is not actually creating so much new discovery; it is just reshaping and summarizing knowledge that was previously produced. So that's kind of my view of it. I'm excited, I think the results are really impressive, and it's fun to play with. I don't think I need AI to make me a hamburger, but I would love for AI to tell me, of all of the hamburger choices, where I should go get one. That's kind of my approach to it.

Excellent, and that's a very good segue into the topic we have today, which is smart restaurant choices. So let's dive into that subject. We've got a couple of slides and a demo, which is always good; I love to have demos on the show. So Nuri, why don't you start going through the genesis of this, why you put this demo and example together, and what inspired you to do it. And as I said at the beginning, anybody with questions on LinkedIn or YouTube, please add them into the chat, and if they're aligned with what we're talking about at the moment, we'll bring them straight on air and answer them straight away. If not, we'll take care of the rest towards the end of the stream. So Nuri, over to you. Why don't we get started and we'll make our way through this.

Sure thing, Shane. So this slide deck was developed to showcase using AI for making smart restaurant choices. I'm using Vertex AI, but really it's an open source repo, and I'll flash that URL later maybe, Shane, so people can just download the code and adapt it, and you can use ChatGPT or your own, whatever model
endpoint. I'm just demonstrating the architecture there. The thing, though, from an educational perspective, or from what I want to deliver, is this notion of using AI in a practical application. So there are a few things I want to impress on y'all. First of all, that there is no real AI. AI is a marketing term for using techniques of machine learning and, quite honestly, math to produce certain transforms on inputs that give you certain outputs. At the end of the day this is all of computer science, right? You have some inputs, you put them in the meat grinder machine, and you get out some outputs, and you just have to find the right grinding in order to get the outputs you want.

I love that: there's no AI, only math. I've heard others describe AI as neither artificial nor necessarily intelligent as well, and I love the focus on this and how it helps.

Definitely. I mean, mathematically speaking, I don't know about intelligence, because you have to reason about what intelligence is. But there is a chance for AI to produce things that we as humans would never think of because of our own constraints, either constraints of time or constraints of social conditioning or anything else. We would never think that, but the AI is like, I just think in all directions. So there are aspects of machine learning where things are either imperceptible to us in the inputs or unthinkable to us as humans, and AI just cuts through. So that's maybe a chance for innovation, I don't know.

True.

But I'm not trying to dig too deep here. I'm just trying to harness these things, which is a standing-on-the-shoulders-of-giants aspect here. There are already frameworks. Do I have the budget to compete with people who spend millions, tens of millions, or hundreds of millions to train models, to make them work? I don't have that budget, but I can use that,
right? I can rent a slice of it, and that's where I'm going, in a pragmatic approach to harness this.

And you mentioned that there were choices. Obviously you used Vertex AI in this demo, Nuri, but there are choices, and anyone can take the repo and choose their own. Was there any reason to choose Vertex AI for you in building this? Was it easier to get up and running with? Did you understand it more? What was the reasoning in this instance?

I think I built several iterations of this, and Vertex AI for Google Cloud Next was a good fit, obviously. I would say that from an API surface perspective, the various libraries in Python that interact with vendor A or vendor B could be somewhat different, and there is the chance of using LangChain or something that abstracts that API from you and, ha ha, just works. And power to you if this just works for you. But one thing I do like is digging in and exploiting the exact nuances of stuff, so I like to work as close as possible to the driver or to the API caller and put it together, because that's a way for me to understand better what is going on. So Vertex AI exposes a fairly clean API to do that: you create an object, connect it to your account, and boom. Others have similar, yet not completely identical, steps to get there. I did work with Azure Machine Learning Studio, using their API for a ChatGPT interface; similar, I think, in effort. And then again, there's always the option to host your own model and not use something that talks to somebody else hosting your model and embedding it, but then it becomes a hardware issue. You need a good GPU to make things run fast enough. So how did I choose that exactly? I don't know that I chose that exactly. It was fit for the presentation, of course, but I did
play around with several of them, and I can say that to date I found the process to be less than perfect in any of the ones that I tried, in the sense that just git pull and then compile and run, or run the interpreter and run, never worked out of the box. There are always these things to take care of, like: oh, and you have to do this lib, oh, and this is the real stable one for the version, and oh, somebody just updated... There's stuff.

There's stuff, yeah. Okay, you've got choices, essentially.

Maybe that's something for OpenAI, or for any company, to perfect: the setup process for things.

Yes, most definitely. Okay, good.

Somebody would say Docker, I'm sure.

Yeah. Please add in the comments if you have any particular choices or favorites, and others that we could swap in instead of Vertex AI.

Okay, so in order to make things practical, let's just list what I'm after here. What was my goal in this little project? I want to have a user ask a question, so this is a chat-type bot, API, application, whatever. And I want the answer to be such that it is using my own data and using my language; let's speak English here. And then I want it to respond in a way that sounds like an expert, meaning I don't want it to just go and say, oh, here's a document, read the PDF. I want it to summarize, tell me something in my language, and say, aha, you asked me for a restaurant recommendation, here it is, here's what I think, in that color, because that gives you a human-like interaction that I like. And I want it to be a little more intelligent than just, I found exactly what you want, or I found nothing. I want it to be like, I kind of understand what you're looking for, so let me suggest this one even if it's not a match.

Right, yeah, and that's the thing. You hit the search button, wait three seconds, and it's like: we found nothing, you're wrong, why did you ask us this question?
Like, no, dude, be nicer.

Yeah, that makes a ton of sense. And obviously what we're doing here, and we'll get into it, is semantic search, not keyword search, so you'd want some suggestions, you'd want a bit of intelligence.

Right. So as a software architect, I drew this diagram. Bob's hungry. There's a user, Bob, and they're like, where should I eat? And then the app has to act like an expert and give him something like, hey, this place slaps. I know this is recorded, and "slaps" will go out of colloquial use in about five minutes, so: "slaps" here means this is great.

Yes, because it could go either way, right? So "slaps" is great, as opposed to "this sucks", in this five minutes of time.

Yes, until all of the cool kids decide that it's not the word to use any more. Or maybe they have decided and I don't know it yet, who knows.

Perfect. Well, the one thing I liked about this when I saw it the first time was that we've all used similar apps for these sorts of things, using Yelp or Google Maps or all of these other things to help us find places, and they're messy. They're just big long lists of things based on whatever you may have searched for, not necessarily an expert.

Right, which is what I'm trying to impress on us here. The idea of keyword search is fine, and was fine for an era. Before we had that, we had to employ humans painstakingly predicting what you want and giving you constrained dropdowns or checkbox lists to say, I want the restaurant that has a burger, that is open, that is blah. And that's reasonable, to say, aha, you'll match exactly those things, because you know a priori that these are things that people supposedly care about. But they weren't expressing nuances. So there's trouble with keyword-based stuff, which is that
it's very synonym driven, meaning somebody has to compile the list of synonyms for certain things. So a word like "slaps" or "awesome" or, I don't know, whatever, could all kind of mean the same thing: it's a positive sentiment about something. But we couldn't do this for the whole query, for the whole language. We used keyword matching, and keyword matching relies on knowing a priori how to map certain tokens. And then there's stemming; depending on your language, and in English it's a big deal, being able to say these two words are the same. You know: octopus, octopi; horse is something, horses is something. Stemming would make it so that the word "horse" or "horses" is the same, but semantic meaning doesn't always follow. If I say "horse" and I say "seahorse", those are two different animals that don't live in the same realm at all. So you want something that is beyond just a token match. If I want to ride a horse, is a seahorse ridable? I don't know.

Maybe a few hundred of them might be.

Yeah, a few hundred seahorses, maybe you could do that. I don't know how trainable they are. Maybe they are, I don't know.

Indeed. I digress.

Yeah. So that's the reason I don't love the idea of using search engines. I think search engines still have their role, and they're very controllable machines; we made very good search engines. The user asks a question: yeah, they allow that. And the machine does answer: yes, it does. And it looks at your data, meaning the data you fed into the search engine is the data that it's going to search over. But it's not exactly using my language. In other words, all of us have been trained, in Google or in Bing or whatever your search engine is, to go in and say, aha, "horse", minus sign, "sea", like horse but not seahorse, or something like that. We've been trained to use its
language. It's not using my language. It's not like, hey man, I'm like, or like, excuse me sir, I would like to... It doesn't use my own language and my own way of thinking and saying things, which means it doesn't really understand me. And does it respond like an expert? No.

Yeah, we've been adapting to how we know search engines work over the last 20 years.

Right. We think we're training the dog; the dog trained us to give it food whenever it looks at us a certain way. So are we communicating? Yes, but we're communicating in their language, not ours, so it's a little weird. And certainly it doesn't respond like an expert, because what it gives me back, and that's my problem, is this: I'm hungry, I want food, I want you to tell me what would match the experience that I'm describing, and what you're giving me is, here's a list, go and scroll through and read each one and try to understand why it came back and whether it really matched my query. So there's an issue of lack of trust. It just gives me a laundry list of things it thinks matched most closely, but it doesn't infer anything for me. It doesn't intelligently say, aha, this is what it is, and give me, in my language, a reasoning for why it chose it. So what I'm trying to do is reduce the wrist fatigue of having to scroll through results and tap and go in. I want to just get an answer to my question. That's what I'm after. Let's go through a few more slides and see if that lays it out.

So to do this, this is where I'm going to use RAG. And what is RAG? RAG is retrieval augmented generation, so it's an acronym. Retrieval, the R in RAG, means it's using my own data. We talked about pre-trained models, and I'm going to talk about that a little more; let's leave that aside for a second, put a pin in it. Retrieval is the notion that I'm going to do a process where it's leveraging my own data.
Augmented means that I'm going to do some prompt engineering. I'm going to transform my query in a semantic way to be able to look for, or sift through, my own data. And then the generation is that the augmentation process will take some of my own data and shape it into a query to the LLM, the large language model, for it to then create the final answer. So it's really a pipeline: let's do the retrieval, the augmentation, and the generation part.

So what it looks like is a little more like this, if we're digging into the architectural diagram. Bob says to the app, oh, where should I eat? Then that prompt gets sent to an embedding model, and I get an embedding back; we'll talk more about that. Then with that embedding I go and query a vector search over my data. So rather than doing keyword search I'm doing semantic search, I'm doing vector search. Then I get back some of my data, and with that data that I get at step five here, I'm doing some prompt engineering: packaging that data that I know is a response and transforming it into a query for the LLM, saying, hey, given this, now please distill this for me and give me the answer I want to read. So I'm giving it to an LLM, and it generates a final answer, and then that gets returned to the user.

So there are several actors involved in here to be able to complete the task. Let me highlight the RAG, though. The retrieval portion in RAG is really that I'm taking the prompt, I'm creating an embedding, and I'm using this embedding, which is just a vector, to vector search.

Yes.

The augmented is the fact that, given that search result from vector search, I am generating a smart prompt for the large language model, for the ChatGPTs of the world or Vertex AI or whatever model you're using. Vertex is not a model, Vertex hosts models, so whatever model
I'm using with that. And then finally the generation is that part of taking that engineered prompt, feeding it to an LLM, and getting the LLM response. That's the generation part.

That's very clear.

So what do the models really do? The model is using that prompt and it is generating a completion. The way models are built is that they map all of what they learned into an n-dimensional space, and when you give a model a prompt, it really says: given that prefix, this initial thing, what is the most likely next token? And token here means word. So it says, you know, "I went to the..." bakery, school, work. No, not "the work", so it won't be "work". I went to the mountains, to the beach, to the something. So a model learned a whole bunch of stuff, and it never saw "I went to the work". I went to work; I never went to the work. So, frequency-wise and otherwise, it may see that the most likely answer to "I went to the" is "beach". Then it says: "beach" most likely, and then maybe "park", and then maybe "concert", and then maybe something else, who knows. So that's what models do, really: you feed them something and they try to generate. So the idea here is, if I created a query that has the initial parts of, you know, "I'm looking for a restaurant that really has Instagrammable dishes", then in all of what it learned about what "Instagrammable" is and things like that, it is looking for tokens that it can complete: aha, I found this, that and the other.

Yes, and that's when you're augmenting. As you showed in the previous diagram, you're augmenting what you're passing through into the LLM for that proper final answer by giving it more context.

Right.

That makes sense. And as we've seen, the more context the better. I think people have heard the term prompt engineering over the last 18 months or more. So the more context, and, like, we might
get to see it now, where you'll say "you are an expert restaurant reviewer" or something like that, "please return back"...

Yeah, we're kind of on track as we're speaking, but we may clarify a few things through the demo. So let me see if I can blast through these few slides, and we'll see in the demo, and maybe even in the code if we have time, how things are put together. But I wanted to just touch on how models are built, because we're really thinking of models as having knowledge of some kind. It really is just mapping what it found in whatever it was trained on. Models are trained on a bunch of text that they found. It could be an article, it could be a Q&A between systems that got recorded, it could be books, it could be all kinds of stuff. So the knowledge it intimates is just the fact that there's a likely token after the previous token. It just kind of intimates these things, so it's able to complete your sentence no matter where you are, or go on and complete more sentences. So it's also important, when we pick models, to pick models that are fit for our problem. If I have a model that's been trained on scientific papers, I wouldn't ask it for slam poetry, unless there are a lot of scientific papers on slam poetry. There is some domain specificity about these things, and some kind of interaction that we as humans expect. So if I want a completion to sound like it's the right completion, then I should have something that's been trained in a similar way. At the very least it should understand my language: it was trained on an English corpus and things like that.

Yeah, that makes sense. It's a question we get all the time, how to choose the models. It really depends on what you're trying to answer, what you're trying to produce at the end of the day.

Yeah. And the retrieval part, which is really where I'm using MongoDB Atlas, is
using what's called a vector search. And a vector search is... imagine all of these tokens. We don't really imitate knowledge; the LLM doesn't know physics or know anything. It just knows that there are parts of words and things that it put together, and this is the most likely token. Vector search relies on the fact that, in so doing, when you train a model you're actually encoding everything into a vector. A vector is just a sequence of numbers, an array if you will, and whatever text came in, it can map it to a vector. The important thing about that is that if I have a query, I could transform it into a vector using the same engine that the model did. So if I can take a piece of text and digest it into a vector, and I can take my query and digest it into a vector, I can compare these two vectors. If the meaning got crunched into a vector that's really close in a geometrical comparison, then it's likely semantically similar, and if it is very different, then it is not. This is really what vector search is about.

In vector search, what we do is encode the pieces of knowledge, the text (it could be images too, by the way), into vectors, store them together in the database, and slap a vector index on them. Then when we come with a query, we transform the query into a vector and compare it against the vectors that already exist. So it's a very mechanical operation. It's just math; it's just like, hey, do the distance.

So the next slide here is about the fact that there are different ways to compare vectors. The vector v here in green is a certain vector, let's say, and it has different similarity to the other vectors already existing in the documents, a, b, and c. We could compare them on the angle between them; we can compare them on the final point,
meaning on where the dot is at the end of the arrow, kind of thing; or we could do dot product, which is looking at normalizing the length of the vector and looking at the angle at the same time. So there are different ways to compare vectors, and which one you choose should match the embedding model you chose. That's very important: if I have an embedding model that was trained with cosine, I should use cosine similarity when I'm defining my vector index in Atlas.

Perfect, perfect. Explain, maybe quickly... it's hard to comprehend, because we live in 3D spaces and maybe we understand the fourth dimension, but n-space, Nuri?

Yeah. So, two dimensions here in a graph is kind of easy. Three dimensions I can maybe squeeze. The fourth dimension is imperceptible to me. But n dimensions just means there are more and more dimensions. This is a math construct, so the math itself doesn't limit itself to just what we perceive. The reason we encode into longer vectors is that it allows more nuance in a bigger space. Imagine I'm encoding two very separate concepts in a one-dimensional space. I really have only, well, one dimension to compare them on. It's really difficult for me to spread like and unlike things across the space in such a way that the things that are really close in meaning are close and the things that are really far are elsewhere. But in n-dimensional space I have more places to tuck and put clusters of things that are kind of similar. So people do encode in higher dimensions than two or three. There's actually a lot of math behind it, and a lot of research. It turns out that having a longer vector doesn't necessarily give you a better model, but there are certain sizes that people kind of settled on through research, and the one I'm using in this demo is 384 dimensions.

Very difficult to draw.

Yeah, good luck with that. We'd be here all the time.

Perfect. I'll
skip this, just to get to maybe a demo, because we're kind of long. Good conversations, though.

Yeah, do, please.

Yeah. So vector search: the TL;DR for it is that it's a specialized index or database that allows you to search given a vector. We're used to search engines: we give them some words, keywords with some ANDs and ORs between them, and quotes and negative signs and plus signs. SQL: you give it terms and equality, inequality, or regex matches, stuff like that. Mongo: you give it MQL, a query with some match. Well, here it's also a query with a match in Mongo, but against a vector index. The vector index is expecting a vector to be given to it, and it gives you the documents that have a vector closest to it. That's what it does.

Yeah, well explained. And as I said at the beginning, it's a year, or nearly a year, since we put Vector Search out there. There are lots of databases that handle vector search, but the beauty from a MongoDB perspective is that we store the embeddings that you've been talking about, Nuri, alongside your existing operational data, so you're not doing any extra round trips when you're building your gen AI applications.

That's right. I have a collection here where I stored my restaurant reviews. A restaurant review really has whatever fields I had in it anyway, right? I just had some data about it: who the reviewer is, when they made the review, what rating they gave, and some text of what they actually typed. And then, in order to use Atlas search, all I did was encode the document with a vector representing this review text. So I can still use this collection for the same OLTP workload I did before, if I want to show it or manipulate it. It's just that every time I update the text, I need to update the vector, which I do.

Which is a very valid point, yes.

Yeah, that's important to do. But it's fast. I didn't have to retrain a model. So if
somebody needs to update their review, updating this vector is done at almost the same time, and the vector index will be updated as well, automatically for me, by the fact that I wrote it. So I don't have to build infrastructure around a feeder and a crawler and a thingamabob and all of that. I can stitch it all together very quickly, either offline, or embedded in my application, or in a trigger fashion: whenever this write happens, a trigger goes into action that just reattaches a vector, and I'm done.

Let me even show you here, since we're talking about Atlas. The way it looks is that I have a database, let's see here, and in the demo database I have a restaurant reviews collection, and I simply created a search index on it. That search index is a vector search type of index. I can look at the index definition: I just said, hey, create me an index on a specific field; it's a vector index; I'm using cosine similarity, because the embedding model I used is cosine; and there's the number of dimensions that embedding is using. And that's that. So I said, you know, give me an index on something.

Yeah, that's very simple, very clear. For anyone familiar with the MongoDB Developer Center, where the developer relations team and others put up their content, you'll find many examples of building RAG-type applications up there as well. And as you said, Nuri, I'll get the repo URL off you towards the end of the stream, and we'll put that into the comments.

Great, awesome. So yeah, that's vector search. Atlas uses something called ANN, which is different from KNN. KNN is k-nearest neighbors, the workhorse of clustering and similarity in the AI or machine learning world. ANN is approximate nearest neighbor, and
they differ in accuracy. KNN would require me to compare the vector against all vectors in the database. If you have 100 documents, well, comparing 100 times is no biggie, but if you have a million, that becomes quite cumbersome and performance might suffer. So ANN attempts to solve it using something called a Hierarchical Navigable Small World. It's basically a tree structure that divides the space (n-space, two-dimensional, some-dimensional space) into blocks and kind of drills down. It says: I have the whole world, and then I have continents, and then I have states or something, or countries, and then I have zip codes, let's say. So you have this kind of hierarchy of blocks, where you say: this vector seems to be referring to this continent, this country, this state, and this zip. Four or five steps in, I'm already at the zip code level, and now I don't have that many to compare against.

Yes.

And it's approximate because, you can imagine, if I'm looking for somebody within my block, whatever the block definition, and I'm at the center, I can say everybody in my block is kind of close to me. But if I'm at the edge of the block, well, guess what: the next block might have something that is closer to me from a physical perspective, but it just happens to not be in this block. So that's where approximation can occur. You sacrifice some accuracy for speed and efficiency. If that's not right for you, then you'll need a different technique, but for most practical uses this is great; this works.

Yeah, well, there's always that trade-off of accuracy, speed, efficiency, etc. when we look at these types of examples, especially when, as you say, the data sets can be enormous: hundreds of thousands, if not millions, that you're trying to search across.

Right, right. And that's the thing with MongoDB: typically you can
feed any size application and back it with Mongo. Especially in the beginning days of any startup, you don't start with a lot of data, but the ability to grow and to scale out is really what we're after. That's what I like about having a vector search ability for my documents: I don't have to worry about managing this separately, having separate infrastructure, and doing a whole bunch of cruft just to get there.

Yeah, that's a fair point; thank you for pointing that out too. And the other thing about having your operational data alongside your vectorized, your embedded data is that you can pre-filter. As you say, in certain examples we could trawl through and pick countries or geographies or something like that, which allows you to run your embedding query against a much smaller group of data.

That's right. That's really the advantage of using a robust vector search that doesn't just do vector comparisons or ANN, but has the capability of pre-filtering and making the amount of I/O necessary even smaller. And, you know, speed is all about I/O.

Perfect, perfect.

So let's take a look at a quick demo of what it looks like, and maybe we'll have time for code. I hope so. Let's run my AI foodie buddy, which is this Python script, and it will prompt us to say what kind of food I want. You notice here I'm not using search engine terms. I'm saying: hey, I want the best poke place that has the best tuna on the island. This is a data set of Hawaii restaurant reviews, so that's what it's doing. It's computing an embedding for this query, and then it's fashioning an Atlas MQL query to retrieve. After it has retrieved the document that matches best, it gives it to the LLM and says: given this document that I composed, please generate a
recommendation for me. That's that. So, given this flow, maybe I can... let's see here, I need VS Code. So this whole thing is just a script: a bunch of Python imports that I bootstrap with a connection to Atlas and so forth. I've defined a few helper functions just to reduce the amount of code we have to look at. But basically we're starting with something called a sentence transformer, and it's using a specific embedding model. This one was cosine-trained, so that's the one I'm using. Then, with the transformer, I take the user question, which was "Hey, I want a restaurant with the best poke on the island", and it just encodes it into a vector and makes it into a Python list, because that's the way I want to see it. So that's the embedding that I got.

That's the "R", right?

Yes. I'm preparing a query for the "R" portion of the show. Then I'm giving that embedding to something that formats an MQL query. The MQL query is no more than just filling out a template. I'm doing an MQL, Mongo Query Language, aggregation framework thing: I'm creating a vector search query. The pipeline has to start with a vector search, and I'm giving it the embedding of the query. That's about it. I'm saying, hey, pick me up to 180 results, and this and that. But among them I'm already doing a little more processing. I'm taking the actual text of the review, and then I'm aggregating and saying: hey, I want to add up the scores of each individual review, and then the restaurant that won, with the most reviews that are relevant to my query, that's the one I want to focus on.

Okay, makes sense.

Yeah. That is to reduce scrolling. I want it to give me a definitive result, so I have to choose what makes a result definitive. I could evaluate the results it gave me in different ways; I chose this one.

Okay, fair enough. Good explanation.

Yeah. So, good. Given that, I then format
an MQL query; that's the pipeline. Then I'm using a Mongo client, because I have to talk to Mongo, given the URI for Atlas, and on this collection I just do an aggregation with this pipeline query. So that's the retrieval. I get back a document; I take the first one, sub zero.

Then I want to create an LLM prompt. The way I do that is I take the document that I got and get the reviews out of it, the first several reviews, up to 10 of them, and I just join them line by line. It's very simple prompt engineering here, but I add this element: I say, "Summarize these recommendations to tell me why I should go to the restaurant given my criteria." So really, here is where I'm saying: hey, I want you to reason, I want you to be friendly to me and speak my language, and tell me, given those reviews that I know belong to one restaurant, one that I know scored high, should I really consider going there?

Yeah, which is clear. It goes back to the slides you had earlier, where you're putting a context around the query for the LLM to come back with a proper, in-my-own-language response.

Right, exactly. And then the LLM response. The LLM here happens to be a Vertex one. I have some settings around feeding an LLM, which have to do with temperature (how accurate versus imaginative, slash hallucinating, do I allow it to be) and things like that. But that's kind of LLM-dependent; you'll have to experiment with that. And I'm using a specific model. As I mentioned, Vertex AI in itself is not a model; it's just a hosting for models, a framework for having models as a service. I'm using a pre-trained model that seems to be trained on Q&A-type corpuses, which means that this kind of asking it a question and expecting a result is up its alley. So I'm using that one versus other choices.

Okay, yeah, that makes sense.

So then, given a model, I
just say, hey, predict for me, please. That's the completion portion, right? I engineer the prompt, I give it the prompt, and it gives me an answer. That's about it.

Okay. So all told, it's 80 to 100-odd lines of Python code to do what you've just done here, right?

I guess so. I mean, I've done a few little abstractions to make it maybe easier to expose a function and things, but those are just for didactic purposes. If I just blurted it all out, it would be less than 100 lines of code, and if I used a higher abstraction, LangChain or something, it would be even less, potentially. But I like digging into things, so this allows me to fiddle with things a little more readily.

Perfect. Yeah, super clear. Excellent. I think it's always the cleanest, simplest examples that are the most illustrative, Nuri, in this instance. And we've got some comments coming in on the chat around that as well, so we can look at those. Anybody with other questions? And thank you to everyone joining from all over the place, from Kenya and the Philippines and Mexico and Brazil, etc. It's been great to see. We have a few moments left for any other questions to come in. Nuri, how are you for a couple more questions, if I can interrupt?

I'm happy to, yeah. I mean, we're done with the demo. I'm happy to answer any and all.

Good, good. Well, Sarat, I hope I'm pronouncing your name correctly, and I'll take a little stab at this, but I'd love to hear your opinion as well, Nuri: the adaptability of RAG, I suppose. In the context of that: yes, it is an ever-evolving AI landscape, but the LLMs that Nuri is speaking about are trained on a set corpus of information, possibly some of them at a set time. We've seen that some of that information can be old. Some of them are much more up to date these days, much quicker, with many more releases all of the time
as well, but still, most of the LLMs that are out there, if not all of them, are trained on publicly available information. They're not trained on anything that may be private to a company or an organization. Put that in the context of the fact that 80% of the data that's out there is in unstructured formats. What do I mean by that? I mean PDFs, documents, everything you can imagine in an unstructured format, even images and videos. So RAG is an incredibly powerful mechanism and tool to allow you to query across what might be private, what might be very unstructured, and augment the query that you do with the LLM's kind of natural voice, as Nuri has shown us in this. And so I think the future for RAG and for RAG applications is enormous. Nuri?

Yeah, I would stand behind everything you said. I would add maybe a geeky observation, which is that RAG, as Shane pointed out, is an architecture, not an LLM and not a model. RAG is just the notion of building that pipeline I showed you. That's all it is. Now, what is the future of RAG? The future is bright, as in: this method of pipelining is here to stay. It is useful because your data is private, because data is rapidly changing, because, because, because. Is it evolving in itself? Yes. That architecture is actually evolving, from people finding, aha, instead of using a completely generic model like ChatGPT or something, or whatever, I download Llama or whatever, instead of doing that, I'm going to post-train the model; I'm going to fine-tune the model. So now people are doing RAG on top of a fine-tuned model that they tune occasionally, but they keep the RAG aspect for rapidly changing or private information; that piece of machinery is still around. So I think RAG is here to stay. I think people are understanding the utility of completely generic models versus post-tuned models. Post-tuned models perform better, so
people are starting to do that. And then RAG itself is very open in terms of its evolution, because I could switch this to ChatGPT, I could switch this to Llama, I could switch this to use a different model. So I'm not stuck in time, like, "Oh my God, that's old hat." No: if they update the version of a model, or there's a model that performs better for me, boom, I change it. No biggie. So I think it's a very flexible and powerful way to deal with things.

Yeah, excellent. And a follow-up from Sarat on that as well: on an enterprise scale, is it easy to add to or scale the vector database with new data once the RAG implementation has already moved to production? Well, Nuri did touch on that a little bit, so yes, absolutely.

Yeah, a resounding yes. This is what it's made for. As I said in the demo, whatever the documents are that you're basing this on, you can put a trigger on the change to the document itself to just re-encode it and update it, and Atlas Vector Search automatically picks it up. Once you've created a vector index, it watches those documents, and it will update the index for you behind the scenes automatically, with no extra work for you. So it's perfectly suitable for this, yes.

Yeah, and it's an ideal way to keep everything current. So that's great. Any other questions, throw them in; we've got a couple of minutes to go. Nuri, I suppose one of the questions that I love to ask my guests is: how do you keep up to date? How do you keep abreast of the changes that are happening in this space? We touched on it earlier; it's pretty rapidly evolving. Where do you go to learn more? Is it podcasts, blogs? What do you do?

My preferred learning mode is trying it out. Which means, as I see blogs or podcasts or Dev Center articles or whatever, I like to go and try it myself and do some experiments. I mean, this was born out
of an experiment. So I like learning by examples, and MongoDB Dev Center has a bunch of things about this. And what I saw and didn't quite like, or didn't quite learn, I learned on my own by going and researching more.

Yeah, okay. So very much learn by doing, rolling up your sleeves and getting under the hood.

And as I speak at conferences and talk about it, I talk to people who do this, and then I learn, and I get exposed to more ideas and more nuances and more problems that people may have, and it helps settle and cement my understanding of things.

Yeah, excellent. I've got one in from DD on YouTube there: "While watching the demo, I got thinking: if we transform this into a product, the text result maybe is not the ideal format for a user that wants to find a restaurant." Right, I'm trying to figure out what you're going to get...

Yeah, I know what you're saying. What it generates at the end is a recommendation that is text. Do you want to add text-to-speech to it and just have a synthetic voice say, "Hi, I think you should go to the thing"? Sure, go for it. If you want it to draw a picture based on it, okay. If you want to just return, at the end of the day, the name of the restaurant, then I didn't need to generate any text with the LLM; the vector search alone might be enough. And that is something to understand: the building blocks of RAG are an embedding with vector search, and the LLM part. If all you need is to retrieve the results, and you have a different way of presenting them, then sure. That's what we do in software all day: what's the requirement? Oh, you want to do it this way? Okay: take the input, find the meat grinder, make the output.

Yeah. And you had it in your demo there, though we might have gone through it a bit too quickly: essentially you were returning the restaurant with the best ranking, and then you were running all of the reviews for that restaurant into
getting that summary for why it was the recommended restaurant.

I was, yeah. I was cropping the best-ranked reviews for that restaurant in order to feed them to the LLM, under the theory that if these are very fitting reviews for my query, then this is really the best basis. Because, yes, a restaurant could have some raving reviews that are really relevant, and another one that said the waiter looked at me funny. Now, it may be a real problem that the waiter scares away folks, but it wasn't about my query. I asked about the poke, about the food quality, about the best food or something. So retrieving and generating based on that would generate a weird response, like, "Oh, well, this is a top restaurant, but the waiter looks at you funny." That's not what I asked about.

Yeah, and I think that summarization that can be done across reviews like that is very helpful, because we've all been there before, trying to find, as you say, restaurants. Some reviews are glowing, some reviews are really bad, and some reviews have nothing to do with the food; it might be about the parking, it might be about the temperature of the restaurant, not the food. I'm currently trying to book a summer holiday for my family, and I'm getting sick of reading reviews for hotels and places to stay, because they're so variable. Whereas if you use something like the summarization you've done, that can delineate out the outliers, as it were, and give you more concrete information, because we're all a bit stuck for time and we want to get the overall summary.

And this is where I like to say: NoSQL doesn't mean no DBA, and AI as a service doesn't mean no machine learning expert or person minding it. What you did just now is the optimization of this. Now you're taking into consideration that not everything is relevant. How do I trust a reviewer, a reviewer I don't know? Those are questions to answer, right? Are those
relevant to my query? And if I only pick the ones relevant to my query, maybe the other things so much outweigh them that I shouldn't go there, right? So a proper data science operation would be to look at the data set at large, compare these comparisons, run real user queries that come in, see the results, and compare the star rating to the reviews. Maybe I'm getting glowing reviews really recommending it, because they're the most relevant to my query, but star ratings that are really low. I can play with that, and I could try to plot it out, chart it out, and tune my result, or my feeding, or my prompt engineering to produce something that is based on that. And again, I don't think it's just, "Oh, I wrote some Python, boom, I'm done with the project." No, this is something that we need to babysit and really understand and evolve as well. It's not a one-time "I did a POC and now it runs forever". It's a live, interactive maintenance task.

Excellent. And I know you had it up in VS Code there on the screen, but there is the address, and I'm just going to put the URL for the repo for this into the comments. So Nuri, I presume you're open to people forking that, cloning that, playing around with it, and coming back, maybe creating some PRs for you, etc.?

Yeah, OSS, yeah, absolutely.

Excellent. Let me drop that into the comments as well so people can pick it up. We're just about to wrap up. Any final words? You mentioned that you're a doer, so you played around and got going, and you like to roll your sleeves up. Any final words for those that might be inspired by what you've shown them here today, Nuri?

I encourage you all to just dig in and try it and understand it. I think it's here to stay, and I think the results that it can deliver are really exciting. I think it's really well worth the time.

Yeah, excellent. Listen, it's been a pleasure to have you on Cloud Connect with me, Nuri. Always a pleasure to
chat to you. SQL Saturday is your next event, and do you have any MongoDB events coming up in your future as well?

I might be at some of the Dev Days around California, or the West Coast; we'll see. And I have KCDC, the Kansas City Developer Conference; I'll be speaking about some Mongo-related stuff there as well, coming up at the end of the month. So that's my next two.

Okay, excellent.

Oh, and in South Florida, South Florida data... I have a meetup coming up next week. Look me up on LinkedIn; you'll see that stuff.

Yeah, excellent. And thank you to those who have asked the questions, and for the kind comments towards the end of the show; we do appreciate it. As I said, we stream this show very regularly. It was generally on Thursdays for the last nine or ten months; we're moving it to Tuesdays as of next week, so please tune in. You'll find events listed on our LinkedIn and on our YouTube page for any of this type of content and any of the other demos. As well, go to mongodb.com/developer; that will get you into our Developer Center. And as always, if you want to interact with our MongoDB community, go to mongodb.com/community. You will find our forums there, which are heavily trafficked by our wider community users, by our engineers, and by our developer relations team. And as we touched on just there, we do a lot of events, and we have our series of .local events, which are in about 23 cities around the globe. We've probably done about nine or ten, I think, so there are still quite a few cities to go. If you go to mongodb.com/events, you will find the details of those
.local events and other events that MongoDB is participating in, usually with some of our partners, our cloud partners or tech partners. Nuri, it's been a pleasure to have you on board. Thank you so much for joining me and making it so clear what you built and how it was built, and for showing us, under the hood, the code, the clusters, back in MongoDB the document that you had, everything in there. It's been super clear, and hopefully our audience and viewers liked that as well. So thank you once again.

Thank you, Shane. It's been a pleasure.

Excellent. And to all our viewers, thank you for joining. Please stay up to date on MongoDB's YouTube and LinkedIn for other events and shows such as this. But for now, from me, Shane McAllister, it's been a pleasure. Thank you all for joining us again. Take care, good luck.
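
For readers who want the retrieval and augmentation steps from the demo in code form, here is a minimal sketch in Python. It is not the script shown in the video. The index name, the field names (`embedding`, `restaurant`, `text`), the `num_candidates` value, and the way the question is appended to the prompt are all assumptions made for illustration; only the cosine comparison, the `$vectorSearch` first stage, the limit of 180, the score summing per restaurant, and the ten-review prompt come from the conversation. The `exact_knn` helper shows the brute-force comparison that ANN indexing avoids.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_knn(query, vectors, k):
    """Brute-force KNN: score every stored vector, keep the k most similar.

    Returns (similarity, position) pairs, best match first.
    """
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    return sorted(scored, reverse=True)[:k]


def build_pipeline(query_vector, limit=180, num_candidates=1000):
    """Aggregation pipeline: vector search first, then group by restaurant."""
    return [
        {"$vectorSearch": {
            "index": "review_vector_index",   # hypothetical index name
            "path": "embedding",              # hypothetical vector field
            "queryVector": query_vector,
            "numCandidates": num_candidates,  # assumed value, must be >= limit
            "limit": limit,
        }},
        # Expose the similarity score next to the (hypothetical) review fields.
        {"$project": {"restaurant": 1, "text": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
        # Add up the scores of the matching reviews for each restaurant.
        {"$group": {"_id": "$restaurant",
                    "total_score": {"$sum": "$score"},
                    "reviews": {"$push": "$text"}}},
        {"$sort": {"total_score": -1}},
        {"$limit": 1},
    ]


def build_prompt(question, reviews, max_reviews=10):
    """Join up to max_reviews reviews line by line and add the instruction."""
    joined = "\n".join(reviews[:max_reviews])
    return (f"{joined}\n\nSummarize these recommendations to tell me why "
            f"I should go to the restaurant given my criteria: {question}")
```

In a real run, the query vector would come from the embedding model (384 dimensions in the demo), the pipeline would be passed to the collection's `aggregate` call, and the prompt would be sent to the hosted model.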