MongoDB Blog

Announcements, updates, news, and more

Bringing Gen AI Into The Real World with Ramblr and MongoDB

How do you bring the benefits of gen AI, a technology typically experienced on a keyboard and screen, into the physical world? That's the problem the team at Ramblr.ai, a San Francisco-based startup, is solving with its powerful and versatile 3D annotation and recognition capabilities.

“With Ramblr you can record continuously what you are doing, and then ask the computer, in natural language, ‘Where did I go wrong?’ or ‘What should I do next?’” said Frank Angermann, Lead Pipeline & Infrastructure Engineer at Ramblr.ai.

Gen AI for the real world

One of the best examples of Ramblr’s technology, and its potential, is its work with the international chemical giant BASF. In a video demonstration on Ramblr’s website, a BASF engineer can be seen tightening bolts on a connector (or ‘flange’) joining two parts of a pipeline. Every move the engineer makes is recorded via a helmet-mounted camera. Once the worker is finished for the day, this footage, along with the footage of every other person working on the pipeline, is uploaded to a database.

Using Ramblr’s technology, quality assurance engineers from BASF then query the collected footage from every worker with a request such as, ‘Please assess footage from today’s pipeline connection work and see if any of the bolts were not tightened enough.’ Having processed the footage, Ramblr assesses whether those flanges have been assembled correctly and identifies any that require further inspection or correction.

The method behind the magic

“We started Ramblr.ai as an annotation platform, a place where customers could easily label images from a video and have machine learning models then identify that annotation throughout the video automatically,” said Frank. “In the past this work would be carried out manually by thousands of low-paid workers tagging videos by hand. We thought we could be better by automating that process,” he added.
The software allows customers to easily customize and add annotations to footage for their particular use case, and with its gen AI-powered active learning approach Ramblr then ‘fills in’ the rest of the video based on those annotations.

Why MongoDB?

MongoDB has been part of the Ramblr technology stack since the beginning. “We use MongoDB Atlas for half of our storage processes. Metadata, annotation data, etc., can all be stored in the same database. This means we don’t have to rely on separate databases to store different types of data,” said Frank.

Flexibility of data storage was also a key consideration when choosing a database. “With MongoDB Atlas, we could store information the way we wanted to,” he added. The built-in vector database capabilities of Atlas were also appealing to the Ramblr team: “The ability to store vector embeddings without having to do any more work, for instance not having to move a 3 MB array of data somewhere else to process it, was a big bonus for us.”

The future

Aside from infrastructure and construction Q&A, robotics is another area in which the Ramblr team is eager to deploy their technology. “Smaller robotics companies don’t typically have the data to train the models that inform their products. There are quite a few use cases where we could support these companies and provide a more efficient and cost-effective way to teach the robots more efficiently. We are extremely efficient in providing information for object detectors,” said Frank.

But while there are plenty of commercial uses for Ramblr’s technology, the growth in spatial computing in the consumer sector, especially following the release of Apple’s Vision Pro and Meta Quest headsets, opens up a whole new category of use cases. “Spatial computing will be a big part of the world. Being able to understand the particular processes, taxonomy, and what the person is actually seeing in front of them will be a vital part of the next wave of innovation in user interfaces and the evolution of gen AI,” Frank added.

Are you building AI apps? Join the MongoDB AI Innovators Program today! Successful participants gain access to free Atlas credits, technical enablement, and invaluable connections within the broader AI ecosystem. If your company is interested in being featured, we’d love to hear from you. Connect with us at ai_adopters@mongodb.com.

Head over to our quick-start guide to get started with Atlas Vector Search today.

September 30, 2024
Artificial Intelligence

AI-Driven Noise Analysis for Automotive Diagnostics

Aftersales service is a crucial revenue stream for the automotive industry, with leading manufacturers executing repairs through their dealer networks. One global automotive giant recently embarked on an ambitious project to revolutionize their diagnostic process. Their project, which aimed to increase efficiency, customer satisfaction, and revenue throughput, responded to a familiar problem: traditional diagnostic methods can be time-consuming, expensive, and imprecise, especially for complex engine issues. MongoDB’s client in automotive manufacturing envisioned an AI-powered solution that could quickly analyze engine sounds and compare them to a database of known problems, significantly reducing diagnostic times.

Initial setbacks, then a fresh perspective

Despite the client team's best efforts, the project faced significant challenges and setbacks during the nine-month prototype phase. Though the team struggled to produce reliable results, they were determined to make the project a success. At this point, MongoDB introduced its client to Pureinsights, a specialized gen AI implementation and MongoDB AI Application Program partner, to rethink the solution and salvage the project. As new members of the project team, and as Pureinsights’ CTO and Lead Architect, respectively, we brought a fresh perspective to the challenge.

Figure 1: Before and after the AI-powered noise diagnostic solution

A pragmatic approach: Text before sound

Upon review, we discovered that the project had initially started with a text-based approach before the team was persuaded to switch to sound analysis. The Pureinsights team recommended reverting to text analysis as a foundational step before tackling the more complex audio problem.
This strategy involved:

- Collecting text descriptions of car problems from technicians and customers.
- Comparing these descriptions against a vast database of known issues already stored in MongoDB.
- Utilizing advanced natural language processing, semantic/vector search, and retrieval-augmented generation (RAG) techniques to identify similar cases and potential solutions.

Our team tested six different models for cross-lingual semantic similarity, ultimately settling on Google's Gecko model for its superior performance across 11 languages.

Pushing boundaries: Integrating audio analysis

With the text-based foundation in place, we turned to audio analysis. Pureinsights developed an innovative approach to the project by combining our AI expertise with insights from advanced sound analysis research. We drew inspiration from groundbreaking models that had gained renown for their ability to identify cities solely from background noise in audio files. This blend of AI knowledge and specialized audio analysis techniques resulted in a robust, scalable system capable of isolating and analyzing engine sounds from various recordings.

We adapted these sophisticated audio analysis models, originally designed for urban sound identification, to the specific challenges of automotive diagnostics. These learnings and adaptations are also applicable to future use cases for AI-driven audio analysis across various industries. This expertise was crucial in developing a sophisticated audio analysis model capable of:

- Isolating engine and car noises from customer or technician recordings.
- Converting these isolated sounds into vectors.
- Using these vectors to search the manufacturer's existing database of known car problem sounds.

At the heart of this solution is MongoDB’s powerful database technology. The system leverages MongoDB’s vector and document stores to manage over 200,000 case files.
Each "document" is more akin to a folder or case file containing:

- Structured data about the vehicle and reported issue
- Sound samples of the problem
- Unstructured text describing the symptoms and context

This unified approach allows for seamless comparison of text and audio descriptions of customer engine problems using MongoDB's native vector search technology.

Encouraging progress and phased implementation

The solution's text component has already been rolled out to several dealers, and the audio similarity feature will be integrated in late 2024. This phased approach allows for real-world testing and refinement before a full-scale deployment across the entire repair network.

The client is taking a pragmatic, step-by-step approach to implementation. If the initial partial rollout with audio diagnostics proves successful, the plan is to expand the solution more broadly across the dealer network. This cautious (yet forward-thinking) strategy aligns with the automotive industry's move towards more data-driven maintenance practices.

As the solution continues to evolve, the team remains focused on enhancing its core capabilities in text and audio analysis for current diagnostic needs. The manufacturer is committed to evaluating the real-world impact of these innovations before considering potential future enhancements. This measured approach ensures that each phase of the rollout delivers tangible benefits in efficiency, accuracy, and customer satisfaction.

By prioritizing current diagnostic capabilities and adopting a phased implementation strategy, the automotive giant is paving the way for a new era of efficiency and customer service in their aftersales operations. The success of this initial rollout will inform future directions and potential expansions of the AI-powered diagnostic system.

A new era in automotive diagnostics

The automotive giant brought industry expertise and a clear vision for improving their aftersales service.
MongoDB provided the robust, flexible data platform essential for managing and analyzing diverse, multi-modal data types at scale. We, at Pureinsights, served as the AI application specialist partner, contributing critical AI and machine learning expertise, and bringing fresh perspectives and innovative approaches. We believe our role was pivotal in rethinking the solution and salvaging the project at a crucial juncture.

This synergy of strengths allowed the entire project team to overcome initial setbacks and develop a groundbreaking solution that combines cutting-edge AI technologies with MongoDB's powerful data management capabilities. The result is a diagnostic tool leveraging text and audio analysis to significantly reduce diagnostic times, increase customer satisfaction, and boost revenue through the dealer network.

The project's success underscores several key lessons:

- The value of persistence and flexibility in tackling complex challenges
- The importance of choosing the right technology partners
- The power of combining domain expertise with technological innovation
- The benefits of a phased, iterative approach to implementation

As industries continue to evolve in the age of AI and big data, this collaborative model, which brings together industry leaders, technology providers, and specialized AI partners, sets a new standard for innovation. It demonstrates how companies can leverage partnerships to turn ambitious visions into reality, creating solutions that drive business value while enhancing customer experiences.

The future of automotive diagnostics, and of AI-driven solutions across industries, looks brighter thanks to the combined efforts of forward-thinking enterprises, cutting-edge database technologies like MongoDB, and specialized AI partners like Pureinsights. As this solution continues to evolve and deploy across the global dealer network, it paves the way for a new era of efficiency, accuracy, and customer satisfaction in the automotive industry.
This solution has the potential to not only revolutionize automotive diagnostics but also set a new standard for AI-driven solutions in other industries, demonstrating the power of collaboration and innovation. To deliver more solutions like this—and to accelerate gen AI application development for organizations at every stage of their AI journey—Pureinsights has joined the MongoDB AI Application Program (MAAP). Check out the MAAP page to learn more about the program and how MAAP ecosystem members like Pureinsights can help your organization accelerate time-to-market, minimize risks, and maximize the value of your AI investments.
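The case-file model and vector lookup described in this article can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: the field names (`vehicle`, `symptom_text`, `text_embedding`), the index name `case_text_index`, and the tiny three-dimensional vectors are all hypothetical and chosen for readability. Only the `$vectorSearch` stage and its `index`, `path`, `queryVector`, `numCandidates`, and `limit` fields come from Atlas Vector Search itself.

```python
from typing import Any


def make_case_file(case_id: str, model: str, symptom_text: str,
                   text_embedding: list[float]) -> dict[str, Any]:
    """Build one hypothetical case-file document (all field names are assumptions)."""
    return {
        "_id": case_id,
        "vehicle": {"model": model},       # structured data about the vehicle
        "symptom_text": symptom_text,      # unstructured description of the issue
        "text_embedding": text_embedding,  # vector used for similarity search
    }


def build_vector_search_pipeline(query_vector: list[float],
                                 index_name: str = "case_text_index",
                                 path: str = "text_embedding",
                                 limit: int = 5) -> list[dict[str, Any]]:
    """Return an aggregation pipeline that asks for the `limit` nearest case files."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [
        {
            "$vectorSearch": {
                "index": index_name,          # hypothetical index name
                "path": path,                 # field holding the stored vectors
                "queryVector": query_vector,  # embedding of the new description
                # Consider more candidates than we return (capped at 10,000).
                "numCandidates": min(limit * 20, 10_000),
                "limit": limit,
            }
        },
        {
            "$project": {
                "symptom_text": 1,
                "vehicle": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


if __name__ == "__main__":
    doc = make_case_file("case-001", "example-model",
                         "rattling noise at idle", [0.1, 0.2, 0.3])
    pipeline = build_vector_search_pipeline([0.1, 0.2, 0.25])
    print(doc["_id"], pipeline[0]["$vectorSearch"]["limit"])
```

In a real deployment the returned list would be passed to a collection's `aggregate` method (for example through PyMongo) against a collection that has an Atlas Vector Search index on the embedding field. An audio embedding could be stored and queried the same way under a second field.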

September 27, 2024
Artificial Intelligence

Away From the Keyboard: Apoorva Joshi, MongoDB Senior AI Developer Advocate

Welcome to our article series focused on developers and what they do when they’re not building incredible things with code and data. “Away From the Keyboard” features interviews with developers at MongoDB, discussing what they do, how they establish a healthy work-life balance, and their advice for others looking to create a more holistic approach to coding.

In this article, Apoorva Joshi shares her day-to-day responsibilities as a Senior AI Developer Advocate at MongoDB; what a flexible approach to her job and life looks like; and how her work calendar helps prioritize overall balance.

Q: What do you do at MongoDB?

Apoorva: My job is to help developers successfully build AI applications using MongoDB. I do this through written technical content, hands-on workshops, and design whiteboarding sessions.

Q: What does work-life balance look like for you?

Apoorva: I love remote work. It allows me to have a flexible approach towards work and life where I can accommodate life things, like dental appointments, walks, or lunches in the park during my work day, as long as work gets done.

Q: Was that balance always a priority for you or did you develop it later in your career?

Apoorva: Making work-life balance a priority has been a fairly recent development. During my first few years on the job, I would work long hours, partly because I felt like I needed to prove myself and also because I hadn’t prioritized finding activities I enjoyed outside of school or work up until then. The first lockdown during the pandemic put a lot of things into perspective. With work and life happening in the same place, I felt the need for boundaries. Having nowhere to go encouraged me to try out new hobbies, such as solving jigsaw puzzles, as well as reconnecting with old favorites, like reading and painting.

Q: What benefits has this balance given you?

Apoorva: Doing activities away from the keyboard makes me more productive at work.
A flexible working schedule also creates a stress-free environment and allows me to bring my 100% to work. This balance helps me make time for family and friends, exercise, chores, and hobbies. Overall, having a healthy work-life balance helps me lead a fulfilling life that I am proud of.

Q: What advice would you give to a developer seeking to find a better balance?

Apoorva: The first step to finding a balance between work and life is to recognize that boundaries are healthy. I have found that putting everyday things, such as lunch breaks and walks, on my work calendar is a good way to remind myself to take that break or close my laptop, while also communicating those boundaries with my colleagues. If you are having trouble doing this on your own, ask a family member, partner, or friend to remind you!

Thank you to Apoorva Joshi for sharing her insights! And thanks to all of you for reading. Look for more in our new series.

Interested in learning more about or connecting more with MongoDB? Join our MongoDB Community to meet other community members, hear about inspiring topics, and receive the latest MongoDB news and events. And let us know if you have any questions for our future guests when it comes to building a better work-life balance as developers. Tag us on social media: @/mongodb

September 26, 2024
Culture

Pathfinder Labs Tames Data Chaos and Unleashes AI with MongoDB

Pathfinder Labs develops software that empowers law enforcement agencies and investigators to apprehend criminals and rescue victims of child abuse. The New Zealand-headquartered company is staffed by professionals with diverse backgrounds and expertise, including counter-terrorism, online child abuse investigations, industrial espionage, digital forensics and more, spanning both the government and private sectors.

Last July, I was thrilled to welcome Pathfinder Labs’ CEO Bree Atkinson, as well as co-founder and DevOps Architect Peter Pilley, to MongoDB.local Sydney, where they shared more about the company’s innovative solutions powered by MongoDB. Those solutions are deployed and utilized by prestigious organizations on a global scale, including Interpol.

Pathfinder Labs’ main product, Paradigm, has been built on MongoDB Atlas and runs on AWS. The tool, which relies on MongoDB’s developer data platform and document database model to sift through complex and continually growing numbers of data sets, helps collect, gather, and convert data into actionable decisions for law enforcement professionals.

Pilley explained that Paradigm was “made by investigators, for investigators.” Paradigm is designed to present the information it helps gather in a way that will support a successful prosecution and outcome at trial.

MongoDB Atlas enables Pathfinder Labs to tame the chaos arising from the data sets created and gathered throughout an investigation. MongoDB’s scalability and automation capabilities are particularly helpful in this regard.

Powered by MongoDB Atlas, Paradigm can also easily identify similarities between cases, and uncover unique insights by bringing together information from disparate data sources. This could mean, for example, combining geolocation data and metadata from an image, or identifying similar case patterns from law enforcement agencies operating in different states or countries.
Ultimately, Paradigm simplifies evidence gathering and analysis, integrates external data sources and vendors, future-proofs investigation methods, and helps minimize overall costs. It is unlocking a whole new generation of data-driven investigative capabilities.

During the presentation, Pilley cited the case of a missing female in the United States: it took a team of three investigators 12 months to solve the case. Using Paradigm, Pathfinder Labs was able to solve that same case in less than an hour. “With Paradigm, we were able to feed some extra information and solve the case in 40 minutes. MongoDB Atlas allowed us to make quick decisions and present information to investigators in the most efficient way.”

Pathfinder Labs also incorporates AI capabilities, including MongoDB Vector Search, which help identify which information is particularly relevant, select specific data points that can be used at a strategic point in time, connect data from one case to another, and identify what information might be missing.

MongoDB Atlas Vector Search helps Pathfinder Labs match images and details in images (e.g., people, objects), classify documents and text, and build better search experiences for users via semantic search.

“I was super excited when [Atlas Vector Search] came out. The fact that I can now have it as part of my standard workflow without having to deploy other kits all the time to support our vector searches has been an absolute game changer,” added Pilley.

Finally, the team has seen great value in MongoDB’s Performance Advisor and Schema Anti-Patterns features: “The Performance Advisor alone has solved many problems,” concluded Pilley.

To learn more and get started with MongoDB Vector Search, visit our Vector Search Quick Start page.

September 25, 2024
Artificial Intelligence

Introducing the New MongoDB Application Delivery Certification

Since we launched our System Integrators Certification Program in 2022, we have certified over 18,000 associates and architects across MongoDB’s various system integrator, advisory, and consulting services partners. This program gives system integrators a solid foundation in MongoDB and the capabilities that enable them to architect modernization projects and modern, AI-enriched applications.

Our customers continue to tell us that they are looking to innovate quicker and take advantage of new technologies, and we want to support them in these goals. They want to work with partners who have in-depth knowledge of the problems they are trying to solve and hands-on experience working with the technology they are implementing.

To meet this customer need and continue to evolve our partnership with our system integrators, we have launched the MongoDB Application Delivery Certification. This is a natural evolution of our certification program that provides comprehensive training and equips developers and application delivery leads with the knowledge and skills needed to design, develop, and deploy modern solutions at scale.

Driving innovation alongside our partners

The MongoDB Application Delivery Certification includes exclusive, partner-only, online learning and hands-on labs, as well as a proctored certification exam that tests application delivery fundamentals and implementation best practices. Partners can expect carefully curated content on everything from optimizing storage, queries, and aggregation to retrieval-augmented generation (RAG), and how to architect and deliver with Atlas Vector Search.

We piloted this new program with our partners at Accenture and Capgemini to ensure it would drive value for all participants. Twenty developers were invited from each company to participate in an initial version of the curriculum and were able to provide their input on its content.
Based on their feedback, we created a program that’s completely self-service and flexible, so learners can fit the coursework into their schedules, at their own pace. "With the growth of AI and data-powered applications, Capgemini are investing in our staff to ensure they have the skills required for this transformation,” said Steve Jones, Executive Vice President, Data Driven Business & Collaborative Data Ecosystems at Capgemini. “The MongoDB Application Delivery Certification helps ensure our people have the right skills to help MongoDB and Capgemini collaborate with our clients on delivering the maximum business value possible in the data-powered future." "Accenture, a strategic partner and part of MongoDB’s AI Application Program, leverages MongoDB’s certification program to ensure the highest quality of delivery capability as our clients race to modernize legacy systems to MongoDB,” said Ram Ramalingam, Senior Managing Director and Global Lead, Platform Engineering and Intelligent Edge at Accenture. We understand that for many businesses, speed is a necessity, and keeping pace with the technological innovation in the current market is essential. Now, customers looking to implement MongoDB solutions will be able to do so quickly and easily by working with partners who have achieved the new MongoDB Application Delivery Certification. They can have the peace of mind knowing that these validated partners are extensively equipped to create and deploy robust MongoDB solutions at scale. What’s more, this new certification will provide partners with other opportunities. Partners who have demonstrated deep technical expertise by successfully completing the MongoDB Application Delivery Certification Program may be considered for the MongoDB AI Applications Program (MAAP). This will give them access to a greater network of customers that need help building and deploying modern applications enriched with AI technology. 
To learn more about MongoDB’s partners helping boost developer productivity with a range of proven technology integrations, visit the MongoDB Partner Ecosystem. Current SI partners can register for the MongoDB Certification Program and MongoDB Application Delivery Certification Program.

September 20, 2024
News

Ahamove Rides Vietnam’s E-commerce Boom with AI on MongoDB

The energy in Vietnam’s cities is frenetic as millions of people navigate the busy streets with determination and purpose. Much of this traffic is driven by e-commerce, with food and parcel deliveries perched on the back of the country’s countless motorcycles or in cars and trucks.

In the first quarter of 2024, online spending in Vietnam grew a staggering 79% over the previous year. Explosive growth like this is expected to continue, raising the industry’s value to $32 billion by 2025, with 70% of the country’s 100 million population making e-commerce transactions.

With numbers this large, efficiency is king in logistics. The high customer expectations for rapid deliveries drive companies like Ahamove to innovate their way to seamless operations with cloud technology.

Ahamove is Vietnam’s largest on-demand delivery company, handling more than 200,000 e-commerce, food, and warehouse deliveries daily, with 100,000 drivers and riders plying the streets nationwide. The logistics leader serves a network of more than 300,000 merchants, including regional e-commerce giants like Lazada and Shopee, as well as nationwide supermarket chains and small restaurants. The stakes are high for all involved, so maximizing efficiency is of utmost importance.

Innovating to make scale count

Online shoppers’ behavior is rarely predictable, and to cope with sudden spikes in daily delivery demand, Ahamove needed to efficiently scale up its operations to enhance customer and end-user satisfaction. Moving to MongoDB Atlas on Amazon Web Services (AWS) in 2019, Ahamove fundamentally changed its ability to meet the rising demand for deliveries and new services that please e-commerce providers, online shoppers, and diners.

The scalability of MongoDB is crucial for Ahamove, especially during peak times, like Christmas or Lunar New Year, when the volume of orders surges to more than 200,000 a day.
“MongoDB's ability to scale ensures that the database can handle increased loads, including data requests, without compromising performance and leading to quicker order processing and improved user experience,” said Tien Ta, Strategic Planning Manager at Ahamove.

One of the capabilities that improves e-commerce across Vietnam is MongoDB's support for geospatial queries. Using geospatial data, which is associated with specific locations on Earth's surface, Ahamove can easily locate drivers, map drivers to restaurants to accelerate deliveries, and track orders without relying on third-party services to provide information, which slows deliveries.

Meanwhile, the versatility of MongoDB’s developer data platform empowers Ahamove to store its operational data, metadata, and vector embeddings on MongoDB Atlas and seamlessly use Atlas Vector Search to index, retrieve, and build performant generative artificial intelligence (AI) applications.

AI evolution

Powered by MongoDB Atlas, Ahamove is transforming Vietnam’s e-commerce industry with innovations like instant order matching, real-time GPS vehicle tracking, generative AI chatbots, and services like driver rating and variable delivery times, all available 24 hours a day, seven days a week.

In addition to traffic, Vietnam is also famous for its excellent street food. Recognizing the importance of the country’s rapidly growing food and beverage (F&B) industry, which is projected to be worth more than US$27.3 billion in 2024, Ahamove decided to help Vietnam’s small food vendors benefit from the e-commerce boom gripping the country.

Using the latest models, including GPT-4o mini and Llama 3.1, Ahamove’s fully automated generative AI chatbot on MongoDB integrates with restaurants’ Facebook pages. This makes it easier for hungry consumers to handle the entire order process with the restaurant in natural language, from seeking recommendations to placing orders, making payments, and tracking deliveries to their doorsteps.
How AhaFood AI chatbot automates the food order journey

“Vietnam’s e-commerce industry is growing rapidly as more people turn to their mobile devices to purchase goods and services,” added Ta. “With MongoDB, we meet this customer need for new purchase experiences with innovative services like generative AI chatbots and faster delivery times.”

AhaFood.AI, Ahamove’s latest initiative, is expected to reach 10% of food deliveries in the Da Nang market and go nationwide in the first half of 2025. It also provides personalized dish recommendations based on consumer demographics, budgets, or historical preferences, helping people find and order their favorite food faster.

Moreover, merchants receive timely notifications of incoming orders via the AhaMerchant web portal, allowing them to start preparing dishes earlier. AhaFood.AI also collects and securely stores users’ delivery addresses and phone numbers, ensuring better driver assignment and fulfilling food orders in less than 15 minutes.

“Adopting MongoDB Atlas was one of the best decisions we’ve ever made for Ahamove, allowing us to build an effective infrastructure that can scale with growing demand and deliver a better experience for our drivers and customers,” said Ngon Pham, CEO, Ahamove. “Generative AI will significantly disrupt the e-commerce and food industry, and with MongoDB Vector Search we can rapidly build new solutions using the latest database and AI technology.”

The vibrant atmosphere of Vietnam's bustling cities is part of the country's charm. Rather than seeking to bring calm to this energy, Vietnam thrives on it. Focusing on improving efficiency and supporting street food vendors in lively urban areas with cloud technology will benefit all.

Learn how to build AI applications with MongoDB Atlas.

Head over to our quick-start guide to get started with Atlas Vector Search today.
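The driver-lookup use case described in this article maps naturally onto MongoDB's `$near` geospatial operator. The sketch below is only an illustration of that idea, not Ahamove's code: the `location` field name, the notion of a `drivers` collection, and the sample coordinates are assumptions. What is standard MongoDB behavior is that `$near` with a GeoJSON `$geometry` expects coordinates in longitude, latitude order, measures `$maxDistance` in meters, and needs a `2dsphere` index on the queried field.

```python
from typing import Any


def build_nearby_drivers_filter(longitude: float, latitude: float,
                                max_distance_m: int) -> dict[str, Any]:
    """Return a query filter for documents near a point, closest first.

    The `location` field name is hypothetical; it must hold a GeoJSON Point
    and be covered by a 2dsphere index for `$near` to work.
    """
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if max_distance_m <= 0:
        raise ValueError("max_distance_m must be positive")
    return {
        "location": {
            "$near": {
                # GeoJSON order is [longitude, latitude], not the reverse.
                "$geometry": {"type": "Point",
                              "coordinates": [longitude, latitude]},
                "$maxDistance": max_distance_m,  # meters
            }
        }
    }


if __name__ == "__main__":
    # Illustrative pickup point; the coordinates are made up for the example.
    query = build_nearby_drivers_filter(108.2, 16.05, 2000)
    print(query["location"]["$near"]["$geometry"]["coordinates"])
```

With a driver such as PyMongo, a filter like this would be passed to `find` on the collection that stores driver positions. Because `$near` returns results sorted by distance, the first document is the closest match.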

September 19, 2024
Applied

MongoDB Enables AI-Powered Legal Searches with Qura

The launch of ChatGPT in November 2022 caught the world by surprise. But while the rest of us marveled at the novelty of its human-like responses, the founders of Qura immediately saw another, more focused use case.

“Legal data is a mess,” said Kevin Kastberg, CTO for Qura. “The average lawyer spends tens of hours each month on manual research. We thought to ourselves, ‘what impact would this new LLM technology have on the way lawyers search for information?’”

And with that, Qura was born.

Gaining trust

From its base in Stockholm, Sweden, Qura set about building an AI-powered legal search engine. The team trained custom models and did continual pre-training on millions of pages of publicly available legal texts, looking to bring the comprehensive power of LLMs to the complex and intricate language of the law.

“Legal searches have typically been done via keyword search,” said Kastberg. “We wanted to bring the power of LLMs to this field. ChatGPT created hype around the ability of LLMs to write. Qura is one of the first startups to showcase their far more impressive ability to read. LLMs can read and analyze, on a logical and semantic level, millions of pages of textual data in seconds. This is a game changer for legal search.”

Unlike other AI-powered applications, Qura is not interested in generating summaries or “answers” to the questions posed by lawyers or researchers. Instead, Qura aims to provide customers with the best sources and information.

“We deliberately wanted to stay away from generative AI. Our customers can be sure that with Qura there is no risk of hallucinations or bad interpretation. Put another way, we will not put an answer in your mouth; rather, we give you the best possible information to create that answer yourselves,” said Kastberg. “Our users are looking for hard-to-find sources, not a gen AI-summary of the basic sources,” he added.
With this mantra, the company claims to have reduced research times by 78% while surfacing double the number of relevant sources when compared to similar legal search products. MongoDB in the mix Qura has worked with MongoDB since the beginning. “We needed a document database for flexibility. MongoDB was really convenient as we had a lot of unstructured data with many different characteristics.” In addition to the flexibility to adapt to different data types, MongoDB also offered the Qura team lightning-fast search capabilities. “ MongoDB Atlas search is a crucial tool for our search algorithm agents to navigate our huge datasets. This is especially true of the speed at which we can do efficient text searches on huge corpuses of text, an important part for navigating documents,” said Kastberg. And when it came to AI, a vector database to store and retrieve embeddings was also a real benefit. “Having vector search built into Atlas was convenient and offered an efficient way to work with embeddings and vectorized data.” What's next? Qura's larger goal is to bring about the next generation of intelligent search. The legal space is only the start, and the company has larger ambitions to expand beyond Sweden and into other industries too. “We are live with Qura in the legal space in Sweden and currently onboarding EU customers in the coming month. What we are building towards is a new way of navigating huge text databases, and that could be applied to any type of text data, in any industry,” said Kastberg. Are you building AI apps? Join the MongoDB AI Innovators Program today! Successful participants gain access to free Atlas credits, technical enablement, and invaluable connections within the broader AI ecosystem. If your company is interested in being featured, we’d love to hear from you. Connect with us at ai_adopters@mongodb.com. Head over to our quick-start guide to get started with Atlas Vector Search today.

September 18, 2024
Artificial Intelligence

Top Use Cases for Text, Vector, and Hybrid Search

Search is how we discover new things. Whether you’re looking for a pair of new shoes, the latest medical advice, or insights into corporate data, search provides the means to unlock the truth. Search habits—and the accompanying end-user expectations—have evolved along with changes to the search experiences offered by consumer apps like Google and Amazon. The days of the standard of 10 blue links may well be behind us, as new paradigms like vector search and generative AI (gen AI) have upended long-held search norms. But are all forms of search created equal, or should we be seeking out the right “flavor” of search for specific jobs? In this blog post, we will define and dig into various flavors of search, including text, vector and AI-powered search, and hybrid search, and discuss when to use each, including sample use cases where one type of search might be superior to others. Information retrieval revolutionized with text search The concept of text search has been baked into user behavior from the early days of the web, with the rudimentary text box entry and 10 blue link results based on text relevance to the initial query. This behavior and associated business model has produced trillions in revenue and has become one of the fiercest battlegrounds across the internet . Text search allows users to quickly find specific information within a large set of data by entering keywords or phrases. When a query is entered, the text search engine scans through indexed documents to locate and retrieve the most relevant results based on the keywords. Text search is a good solution for queries requiring exact matches where the overarching meaning isn't as critical. Some of the most common uses include: Catalog and content search: Using the search bar to find specific products or content based on keywords from customer inquiries. 
For example, a customer searching for "size 10 men trainers" or "installation guide" can instantly find the exact items they’re looking for. Nextar, for instance, tapped into Atlas Search to enable physical retailers to create online catalogs during the Covid-19 pandemic.

In-application search: This suits organizations with straightforward offerings that want to make key resources easier to locate but don’t require advanced features like semantic retrieval or contextual re-ranking. For instance, a user who searches for "songs key of G" quickly receives relevant materials. This streamlines asset retrieval, lets users focus on the task they are trying to achieve, and boosts overall satisfaction. At Yousician, Atlas Search enabled 20 million monthly active users to tackle their music lessons with ease.

Customer 360: Unifying data from different sources to create a single, holistic view. Consolidated information such as user preferences, purchase history, and interaction data can be used to enhance business visibility and simplify the management, retrieval, and aggregation of user data. Consider a support agent searching for all information related to customer "John Doe." They can quickly access relevant attributes and interaction history, ensuring more accurate and efficient service. Helvetia achieved this after migrating to MongoDB and using Atlas Search to deliver a single, 360-degree real-time view across all customer touchpoints and insurance products.

AI and a new paradigm with vector search

With advances in technology, vector search has emerged to help solve the challenge of providing relevant results even when the user may not know what they’re looking for. Vector search allows you to take any type of media or content, convert it into a vector using machine learning algorithms, and then search to find results similar to the target term.
The similarity aspect is determined by converting your data into numerical high-dimensional vectors, and then calculating the distance between them to determine relevance—the closer the vector, the higher the relevance. There is a wide range of practical, powerful use cases powered by vector search—notably semantic search and retrieval-augmented generation (RAG) for gen AI. Semantic search focuses on meaning and prioritizes user intent by deciphering not just what users type but why they're searching, in order to provide more accurate and context-oriented search results. Some examples of semantic search include: Content/knowledge base search: Vast amounts of organizational data, structured and unstructured, with hidden insights, can benefit significantly from semantic search. Questions like “What’s our remote work policy?” can return accurate results even when the source materials do not contain the “remote” keyword, but rather have “return to office” or “hybrid” or other keywords. A real-world example of content search is the National Film and Sound Archive of Australia , which uses Atlas Vector Search to power semantic search across petabytes of text, audio, and visual content in its collections. Recommendation engines: Understanding users’ interests and intent is a strong competitive advantage–like how Netflix provides a personalized selection of shows and movies based on your watch history, or how Amazon recommends products based on your purchase history. This is particularly powerful in e-commerce, media & entertainment, financial services, and product/service-oriented industries where the customer experience tightly influences the bottom line. A success story is Delivery Hero , which leverages vector search-powered real-time recommendations to increase customer satisfaction and revenue. Anomaly detection: Identifying and preventing fraud, security breaches, and other system anomalies is paramount for all organizations. 
By grouping similar vectors and using vector search to identify outliers, potential threats can be detected early, enabling timely responses. Companies like VISO TRUST and Extrac are among the innovators that build their core offerings using semantic search for security and risk management. With the rise of large language models (LLMs), vector search is increasingly becoming essential in gen AI application development. It augments LLMs by providing domain-specific context outside of what the LLMs “know,” ensuring relevance and accuracy of the gen AI output. In this case, the semantic search outputs are used to enhance RAG. By providing relevant information from a vector database, vector search helps the RAG model generate responses that are more contextually relevant. By grounding the generated text in factual information, vector search helps reduce hallucinations and improve the accuracy of the response. Some common RAG applications are for chatbots and virtual assistants, which provide users with relevant responses and carry out tasks based on the user query, delivering enhanced user experience. Two real-world examples of such chatbot implementations are from our customers Okta and Kovai . Another popular application is using RAG to help generate content like articles, blog posts, scripts, code, and more, based on user prompts or data. This significantly accelerates content production, allowing organizations including Novo Nordisk and Scalestack to save time and produce content at scale, all at an accuracy level that was not possible without RAG. Beyond RAG, an emerging vector search usage is in agentic systems . Such a system is an architecture encompassing one or more AI agents with autonomous decision-making capabilities, able to access and use various system components and resources to achieve defined objectives while adapting to environmental feedback. 
Vector search enables efficient and semantically meaningful information retrieval in these systems, facilitating relevant context for LLMs, optimized tool selection, semantic understanding, and improved relevance ranking. Hybrid search: The best of both worlds Hybrid search combines the strengths of text search with the advanced capabilities of vector search to deliver more accurate and relevant search results. Hybrid search shines in scenarios where there's a need for both precision (where text search excels) and recall (where vector search excels), and where user queries can vary from simple to complex, including both keyword and natural language queries. Hybrid search delivers a more comprehensive, flexible information retrieval process, helping RAG models access a wider range of relevant information. For example, in a customer support context, hybrid search can ensure that the RAG model retrieves not only documents containing exact keywords but also semantically similar content, resulting in more informative and helpful responses. Hybrid search can also help reduce information overload by prioritizing the most relevant results. This allows RAG models to focus on processing and understanding the most critical information, leading to faster, more accurate responses, and improving the user experience. Powering your AI and search applications with MongoDB As your organization continues to innovate in the rapidly evolving technology ecosystem, building robust AI and search applications supporting customer, employee, and stakeholder experiences can deliver powerful competitive advantages. With MongoDB, you can efficiently deploy full-text search , vector search , and hybrid search capabilities. Start building today—simplify your developer experience while increasing impact in MongoDB’s fully-managed, secure vector database, integrated with a vast AI partner ecosystem , including all major cloud providers, generative AI model providers, and system integrators. 
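To make the "closer vector, higher relevance" idea from the vector search section concrete, here is a minimal JavaScript sketch that ranks documents by cosine similarity to a query vector. It is an illustration only, not how Atlas Vector Search is implemented: the function names, document IDs, and three-dimensional embeddings are hypothetical, and real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Returns a value in [-1, 1]; higher means the vectors point in more similar directions.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query vector and sort, most similar first.
function rankBySimilarity(queryVector, docs) {
  return docs
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(queryVector, doc.embedding) }))
    .sort((x, y) => y.score - x.score);
}

// Toy embeddings (hypothetical values chosen by hand for illustration).
const docs = [
  { id: "remote-work-policy", embedding: [0.9, 0.1, 0.0] },
  { id: "expense-policy", embedding: [0.1, 0.9, 0.1] },
  { id: "hybrid-office-guide", embedding: [0.8, 0.2, 0.1] },
];

const ranked = rankBySimilarity([1, 0, 0], docs);
console.log(ranked.map((r) => r.id));
```

A production system would not scan every document like this; it would rely on an approximate nearest-neighbor index. The scoring idea is the same, though: documents whose vectors sit closest to the query vector come back first.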

September 16, 2024
Applied

AI Agents, Hybrid Search, and Indexing with LangChain and MongoDB

Since we announced integration with LangChain last year, MongoDB has been building out tooling to help developers create advanced AI applications with LangChain . With recent releases, MongoDB has made it easier to develop agentic AI applications (with a LangGraph integration), perform hybrid search by combining Atlas Search and Atlas Vector Search , and ingest large-scale documents more effectively. For more on each development—plus new support for the LangChain Indexing API—please read on! The rise of AI agents Agentic applications have emerged as a compelling next step in the development of AI. Imagine an application able to act on its own, working towards complicated goals and drawing on context to create a strategy. These applications leverage large language models (LLMs) to dynamically determine their execution path, breaking free from the constraints of traditional, deterministic logic. Consider an application tasked with answering a question like "In our most profitable market, what is the current weather?" While a traditional retrieval-augmented generation (RAG) app may falter, unable to obtain information about “current weather,” an agentic application shines. The application can intelligently deduce the need for an external API call to obtain current weather information, seamlessly integrating this with data retrieved from a vector search to identify the most profitable market. These systems take action and gather additional information with limited human intervention, supplementing what they already know. Building such a system is easier than ever thanks to MongoDB’s continued work with LangGraph. Unleashing the power of AI agents with LangGraph and MongoDB Because it now offers LangGraph—a framework for performing multi-agent orchestration—LangChain is more effective than ever at simplifying the creation of applications using LLMs, including AI agents. 
These agents require memory to maintain context across multiple interactions, allowing users to engage with them repeatedly while the agent retains information from previous exchanges. While basic agentic applications can utilize in-memory structures, for more complicated use cases these structures are not sufficient. MongoDB allows developers to build stateful, multi-actor applications with LLMs, storing and retrieving the “checkpoints” needed by LangGraph.js. The new MongoDBSaver class makes integration simpler than ever before, as LangGraph.js is able to utilize historical user interactions to enhance agentic AI. By segmenting this history into checkpoints, the library allows for persistent session memory, easier error recovery, and even the ability to “time travel”—allowing users to jump back in the graph to a previous state to explore alternative execution. The MongoDBSaver class implements all of this functionality right into LangGraph.js, with sensible defaults and MongoDB-specific optimization. To learn more, please visit the source code , the documentation , and our new tutorial (which includes both a written and video version). Improve retrieval accuracy with Hybrid Search Retriever Hybrid search is particularly well-suited for queries that have both semantic and keyword-based components. Let’s look at an example, a query such as "find recent scientific papers about climate change impacts on coral reefs that specifically mention ocean acidification". This query would use a hybrid search approach, combining semantic search to identify papers discussing climate change effects on coral ecosystems, keyword matching to ensure "ocean acidification" is mentioned, and potential date-based filtering or boosting to prioritize recent publications. This combination allows for more comprehensive and relevant results than either semantic or keyword search alone could provide. 
With our recent release of Retrievers in LangChain-MongoDB, building such advanced retrieval patterns is more accessible than ever. Retrievers are how LangChain integrates external data sources into LLM applications. MongoDB has added two new custom, purpose-built Retrievers to the langchain-mongodb Python package, giving developers a unified way to perform hybrid search and full-text search with sensible defaults and extensive code annotation. These new classes make it easier than ever to use the full capabilities of MongoDB Vector Search with LangChain. The new MongoDBAtlasFullTextSearchRetriever class performs full-text searches using the Best Match 25 (BM25) analyzer. The MongoDBAtlasHybridSearchRetriever class builds on this work, combining full-text search with vector search and fusing the two result lists with the Reciprocal Rank Fusion (RRF) algorithm. Combining these two techniques is a potent way to improve the retrieval step of a retrieval-augmented generation (RAG) application and the quality of its results. To find out more, please dive into the MongoDBAtlasHybridSearchRetriever and MongoDBAtlasFullTextSearchRetriever classes.

Seamless synchronization using LangChain Indexing API

In addition to these releases, we’re also excited to announce that MongoDB now supports the LangChain Indexing API, allowing for seamless loading and synchronization of documents from any source into MongoDB, leveraging LangChain's intelligent indexing features. This new support will help users avoid duplicate content, minimize unnecessary rewrites, and optimize embedding computations. The LangChain Indexing API's record management system tracks document writes efficiently by computing a hash for each document and storing essential information like write time and source ID.
This feature is particularly valuable for large-scale document processing and retrieval applications, offering flexible cleanup modes to manage documents effectively in MongoDB vector search. To read more about how to use the Indexing API, please visit the LangChain Indexing API Documentation . We’re excited about these LangChain integrations and we hope you are too. Here are some resources to further your learning: Check out our written and video tutorial to walk you through building your own JavaScript AI agent with LangGraph.js and MongoDB. Experiment with Hybrid search retrievers to see the power of Hybrid search for yourself. Read the previous announcement with LangChain about Semantic Caching.
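For readers curious about the fusion step mentioned in the hybrid search section above, the following is a minimal JavaScript sketch of Reciprocal Rank Fusion. It is not the langchain-mongodb implementation: the function name, the document IDs, and the constant k = 60 (a commonly used default) are assumptions made for illustration. Each document's fused score is the sum of 1 / (k + rank) over every result list in which it appears, with ranks starting at 1.

```javascript
// Reciprocal Rank Fusion over several ranked lists of document IDs.
// rankedLists: array of arrays, each ordered best-first.
// k: smoothing constant (60 is a common choice; treated here as an assumption).
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
    });
  }
  // Convert to an array and sort by fused score, highest first.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical results for the coral reef query: one list from keyword (full-text)
// search, one from vector (semantic) search.
const fullTextHits = ["paper-acidification", "paper-reefs", "paper-fisheries"];
const vectorHits = ["paper-reefs", "paper-bleaching", "paper-acidification"];

const fused = reciprocalRankFusion([fullTextHits, vectorHits]);
console.log(fused.map((r) => r.id));
```

Because RRF works on ranks rather than raw scores, it can merge keyword and vector results without normalizing their very different scoring scales. Documents that appear high in both lists rise to the top, while documents found by only one method still stay in the results.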

September 12, 2024
Artificial Intelligence

Building Gen AI with MongoDB & AI Partners | August 2024

As the AI landscape continues to evolve, companies, industries, and developers seek tailored solutions to their unique challenges. Gone are the days when general-purpose AI models could be applied universally. Now, organizations are looking for industry-specific applications, verticalized AI solutions, and specialized tools to gain a competitive edge and best serve their customers. And as gen AI use cases have diversified—from healthcare diagnostics and autonomous driving, to personalized recommendations and creative content generation—so has the technology stack supporting them. The complexity of building and deploying AI models has led to the rise of specialized AI frameworks and platforms that streamline workflows and optimize performance for specific use cases. In this context, having the right AI stack is essential for driving innovation. AI development is no longer just about choosing the best model but also about selecting the right tools, libraries, and infrastructure to support that model across the board. All of which makes partnerships (and combining technical strengths) increasingly important to innovating with AI. Take, for example, our most recent integration with LangChain: the MongoDB-LangChain partnership exemplifies how having the right components in an AI stack allows teams to focus on innovating instead of managing infrastructure bottlenecks. By combining LangGraph with MongoDB’s vector search capabilities, developers can create more sophisticated, high-performing AI applications. This integration allows for the seamless development of agentic AI systems capable of generating actionable insights and delivering complex tasks. To learn more about building powerful AI agents with LangGraph.js and MongoDB, plus our recent work making vector search even more versatile with custom LangChain Retrievers, check out our tutorial . 
Welcoming new AI partners MongoDB’s partnership with LangChain highlights the importance of building adaptable solutions that can grow and change as the needs of developers and customers grow and change. Which is why MongoDB is always on the lookout for innovative partners and solutions—in August we welcomed five new AI partners that offer product integrations with MongoDB. Read on to learn more about each great new partner! BuildShip BuildShip is a low-code visual backend and workflow builder to instantly create APIs, scheduled tasks, backend cloud jobs, and automation, powered by AI. " We at BuildShip are thrilled to partner with MongoDB to introduce an innovative low-code approach for rapidly building AI workflows and backend tasks in a visual and scalable manner,” said Harini Janakiraman, CEO of BuildShip.com. “MongoDB offers a comprehensive data stack for AI developers and organizations, enabling them to efficiently build scalable databases and access vector or hybrid search options for their products. Our collaboration provides customizable low-code templates that allow for easy integration of MongoDB databases with a variety of AI models and tools. This enables teams and companies to quickly build powerful APIs, automations, vector search, and scheduled tasks, unlocking organizational efficiency and driving product innovation.” Inductor Inductor is a platform to prototype, evaluate, improve, and observe LLM apps and features, helping developers ship high-quality LLM-powered functionality rapidly and systematically. “ We’re excited to partner with MongoDB to enable companies to rapidly create production-grade LLM applications, by combining MongoDB's powerful vector search with Inductor’s developer platform enabling streamlined, systematic workflows for developing RAG-based applications,” said Ariel Kleiner, CEO of Inductor. 
“While many LLM-powered demos have been created, few have successfully evolved into production-grade applications that deliver business wins. Together, Inductor and MongoDB enable enterprises to build impactful, needle-moving LLM applications, accelerating time to market and delivering real value to customers.” Metabase Metabase is the easy-to-use, open source Business Intelligence tool that lets everyone work with data, with or without SQL, for internal and customer-facing, embedded analytics. "This partnership is an important step forward for NoSQL database analytics. By integrating Metabase with MongoDB , two popular open-source tools, we are making it easier for users to quickly get valuable insights from their MongoDB data,” explained Luiz Arakaki, Product Manager at Metabase. “Our goal is to create a better integration between the tools to offer more advanced features and stability, simplifying the use of NoSQL databases for advanced analytics.” Shakudo Shakudo is a comprehensive development platform that lets data professionals develop, run, and deploy data pipelines and applications in an all-in-one integrated environment. “ Shakudo is thrilled to be partnering with MongoDB to streamline the entire retrieval-augmented generation (RAG) development lifecycle. Together we help companies test and optimize their RAG features for faster PoC, and production deployment,” noted Yevgeniy Vahlis, CEO of Shakudo. “MongoDB has made it dead simple to launch a scalable vector database with operational data, and Shakudo brings industry leading AI tooling to that data. Our collaboration speeds up time to market and helps companies get real value to customers faster.” VLM Run VLM Run is a versatile API that enables accurate JSON extraction from any visual content such as images, videos, and documents, helping users to integrate visual AI to applications. 
“ VLM Run is excited to partner with MongoDB to help enterprises accurately extract structured insights from visual content such as images, videos and visual documents,” said Sudeep Pillai, Co-Founder and CEO of VLM Run. “Our combined solution will enable enterprises to turn their often-untapped unstructured visual content into actionable, queryable business intelligence.” But wait, there's more! To learn more about building AI-powered apps with MongoDB, check out our AI Resources Hub , and stop by our Partner Ecosystem Catalog to read about our integrations with MongoDB’s ever-evolving AI partner ecosystem. Head over to our quick-start guide to get started with Atlas Vector Search today.

September 11, 2024
Artificial Intelligence

Boosting Customer Lifetime Value with Agmeta and MongoDB

Nobody likes calling customer service. The phone trees, the wait times, the janky music, and how often your issue just isn’t resolved can make the whole process one most people would rather avoid. For business owners, the customer contact center can also be a source of frustration, simultaneously creating customer churn and unhappiness, while also acting as a black hole of information as to why that churn occurred. It doesn’t have to be this way. What if instead, customer service centers offered valuable ways to increase the Customer Lifetime Value (CLTV) of customers, pipelines of upsell opportunities, and valuable sources of information? That’s the goal of Agmeta.AI, a startup dedicated to giving businesses actionable insights to fight churn, identify key customers primed for upsell, and improve customer service overall.

Lost in translation

“We started with a very simple thesis: people call into contact centers because they have a problem. That is a real make-or-break moment. The opportunity for churn is very high… or that customer can be a great target for upselling,” said Samir Agarwal, CEO and co-founder of Agmeta. “All of this data sits in a contact center, and businesses don't ever get to see it,” he added. According to Samir, even the businesses that think they are collecting useful information on customer service interactions are instead collecting incorrect or incomplete information. Or worse, they’re analyzing the information they do record incorrectly. Every business today talks about the importance of customer experience (CX), but the challenge businesses face is how they quantify that CX. Many contact centers substitute call sentiment for CX, or use keywords to determine canned responses. For example, imagine a customer calls into a service center and has what appears to be a positive conversation with an agent.
They use words and phrases like “thank you,” and “yes, I understand,” and reply “no, I do not have anything else to ask” at the end of a call in which their complaint is not resolved. After putting the phone down, the customer goes on to cancel the service, or worse, initiate a chargeback request with their credit card provider. In some businesses the customer service agent may manually mark such a call as ‘positive.’ The agent, after all, ‘answered all the customer's concerns.’ As this example illustrates, the sentiment of a call should not be confused with the measure of customer experience. Another common way businesses try to gather feedback is by sending a post-call survey. However, a problem with this approach is that industry response rates for surveys are close to 3%. This implies that decisions get made on that small sample, and may not take into account the other 97% of the customers who didn’t respond to the survey. Survey results are also frequently skewed, as those most likely to respond are also the ones who were most unhappy with the contact center interaction and want their voices heard.

The MongoDB advantage

Using machine learning and generative AI, backed by MongoDB Atlas, Agmeta’s software understands not only the content of the call, but the context too. Taking our example above, Agmeta’s software would detect that the customer is unhappy, despite their polite and ‘positive’ sounding conversation with the agent, and flag the customer as a potential churn or chargeback candidate in need of immediate attention. “We will give you a CSAT (customer satisfaction) score and a reason for that CSAT score within seconds of the call ending, for 100% of the interactions,” said Samir. For Agmeta to work, Samir and his team had to have a database ready to accept all kinds of data, including voice recordings, unstructured text, and constantly evolving schema. “We didn’t have a fixed schema; we needed a database that was as flexible as Agmeta needed to be.
I’ve known of MongoDB forever, so when I started to look at databases it seemed an obvious choice to me,” he said. The ability to quickly and easily work with vectorized data for gen AI was also crucial. “MongoDB provides vector search capabilities in an operational database. Rather than having to bolt on a vector database and figure out the ETL, MongoDB solved this issue for me in a single product. The way I look at it, if you do a good job on vector search, then my life as an entrepreneur and software builder becomes much easier,” Samir said. After assessing database options and multiple LLMs, Samir and his team chose to pair MongoDB Atlas with Google Cloud, taking advantage of Gemini on Google’s generative AI platform. “With Atlas on Google Cloud, there are zero worries about database administration, maintenance, and availability. This frees us up to focus on creating business value,” Samir said. “Another benefit of using MongoDB is the flexibility to use the customer’s MongoDB setup, which gives the customer peace of mind about the security and privacy of their data.”

Customer service first

With the power of generative AI and MongoDB, Agmeta can deliver a CSAT score that measures the customers’ true takeaway from the call. The CSAT score is a multi-dimensional score that takes into account areas including resolution (as the customer sees it), politeness, the onus on the customer, and many other attributes. In the short term, the primary use for this technology is to detect and flag customers who are at risk of churning or of filing a charge dispute with their card provider, as well as those who are candidates for upselling, giving businesses an opportunity to “see” what they could never find out before. “When we talk to customers, the number one thing they are concerned about is customer churn. Right now they operate completely blind with no idea why people are leaving them,” said Samir.
“One large telecoms customer Agmeta is in talks with had no idea where their churn was happening. But when we described being able to assign every customer a CSAT score, they were very excited,” he added. And it’s not just about preventing churn. Businesses can identify happy customers too, targeting them for upsell opportunities. “One of the things we do is spot patterns of unanswered questions from product support interactions,” Samir added. “When we see ‘Oh look, suddenly there are a lot more calls because of a release,’ then we can flag this to product teams as a must-fix issue.”

The future of customer service

Agmeta aims to amalgamate customer information with current and past experiences to provide businesses with a more holistic and nuanced picture of their customers, and more precise next steps they can take. “What we want to do is look back in time and see what else happened with this customer,” Samir said. “The goal is to provide businesses with targeted directives to minimize churn and grow customer lifetime value.” Retrieval-augmented generation plays a key role in Agmeta’s vision. This also means an expanded role both for MongoDB’s vector database, as the source of information against which semantic searches can be run, and for Gemini, for analysis and presentation of the directives for the business. You can learn more about how innovators across the world are using MongoDB by reviewing our Building AI case studies. If your team is building AI apps, sign up for the AI Innovators Program. Successful companies get access to free Atlas credits and technical enablement, as well as connections into the broader AI ecosystem. Additionally, if your company is interested in being featured in a story like this, we'd love to hear from you! Reach out to us at ai_adopters@mongodb.com. Head over to our quick-start guide to get started with Atlas Vector Search today.

September 10, 2024
Artificial Intelligence

Atlas Stream Processing: A Cost-Effective Way to Integrate Kafka and MongoDB

Developers around the world use Apache Kafka and MongoDB together to build responsive, modern applications. There are two primary interfaces for integrating Kafka and MongoDB. In this post, we’ll introduce these interfaces and highlight how Atlas Stream Processing offers an easy developer experience, cost savings, and performance advantages when using Apache Kafka in your applications. First, we will provide some background.

The Kafka Connector

For many years, MongoDB has offered the MongoDB Connector for Kafka (Kafka Connector). The Kafka Connector enables the movement of data between Apache Kafka and MongoDB, and thousands of development teams use it. While it supports simple message transformation, developers largely handle data processing with separate downstream tools.

Atlas Stream Processing

More recently, we announced Atlas Stream Processing, a native stream processing solution in MongoDB Atlas. Atlas Stream Processing is built on the document model and extends the MongoDB Query API to give developers a powerful, familiar way to connect to streams of data and perform continuous processing. The simplest stream processors act similarly to the primary Kafka Connector use case, helping developers move data from one place to another, whether from Kafka to MongoDB or vice versa. Check out an example:

    // Connect to MongoDB Atlas database using $source.
    s = { $source: { connectionName: 'myAtlasCluster', db: 'myDB', coll: 'myCollection' } }
    // Write your data to a Kafka topic using $emit.
    e = { $emit: { connectionName: 'myKafkaConnection', topic: 'myTopic' } }
    // Create your processor and start it!
    sp.createStreamProcessor("mongoDBToKafka", [s, e])
    sp.mongoDBToKafka.start()

Beyond making data movement easy, Atlas Stream Processing enables advanced stream processing use cases not possible in the Kafka Connector. One common use case is enriching your event data by using $lookup as a stage in your stream processor.
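As a rough sketch of what that enrichment could look like, the snippet below places a $lookup stage between the $source and $emit stages from the earlier example. The customers collection, the customerId field, the customer output field, and the processor name are hypothetical and chosen only for illustration. The shape of the from document (connectionName, db, coll) is our understanding of the Atlas Stream Processing form of $lookup and should be checked against the documentation. The sp object exists only inside mongosh connected to a Stream Processing Instance, so the snippet falls back to printing the pipeline when run elsewhere.

```javascript
// Sketch only: connection, database, collection, and field names are hypothetical.
const source = {
  $source: { connectionName: 'myAtlasCluster', db: 'myDB', coll: 'myCollection' }
};

// Enrichment stage: join each incoming event against a reference collection.
// An Atlas collection source emits change events, so the joined field is
// assumed to live under fullDocument.
const lookup = {
  $lookup: {
    from: { connectionName: 'myAtlasCluster', db: 'myDB', coll: 'customers' },
    localField: 'fullDocument.customerId',
    foreignField: '_id',
    as: 'customer'
  }
};

const sink = {
  $emit: { connectionName: 'myKafkaConnection', topic: 'myTopic' }
};

// The lookup sits between source and sink.
const pipeline = [source, lookup, sink];

if (typeof sp !== 'undefined') {
  // Inside mongosh connected to a Stream Processing Instance.
  sp.createStreamProcessor('enrichedMongoDBToKafka', pipeline);
  sp.enrichedMongoDBToKafka.start();
} else {
  // Anywhere else, just show the pipeline that would be registered.
  console.log(JSON.stringify(pipeline, null, 2));
}
```

Keeping each stage in its own variable mirrors the original example and makes it easy to add or remove the enrichment step without touching the source or sink definitions.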
In the mongoDBToKafka example above, a developer can perform this enrichment by simply adding a lookup stage in the pipeline between source and sink. While the Kafka Connector can perform some single message transformations, Atlas Stream Processing both makes for an easier overall experience and gives teams the ability to perform much more complex processing.

Choosing the right solution for your needs

It’s important to note that Atlas Stream Processing was built to simplify complex, continuous processing and streaming analytics rather than as a replacement for the Kafka Connector. However, even for the more basic data movement use cases referenced above, it provides a new alternative to the Kafka Connector. The decision will depend on your data movement and processing needs. Three common considerations we see teams weigh when making this choice are ease of use, performance, and cost.

Ease of use

The Kafka Connector runs on Kafka Connect. If your team already heavily uses Kafka Connect across many systems beyond MongoDB, this may be a good reason to keep it in place. However, many teams find configuring, monitoring, and maintaining connectors costly and cumbersome. In contrast, Atlas Stream Processing is a fully managed service integrated into MongoDB Atlas. It prioritizes ease of use by leveraging the MongoDB Query API to process your event data continuously. Atlas Stream Processing balances simplicity (no servers to manage, no additional cloud platforms to use, and no new tools to learn) with processing power, which reduces development time, lowers infrastructure and maintenance costs, and helps teams build applications more quickly.

Performance

High performance is increasingly a priority with all data infrastructure, but it’s often a must-have for use cases that rely on streams of event data (commonly from Apache Kafka) to deliver an application feature. Many of our early customers have found Atlas Stream Processing more performant than similar data movement in their Kafka Connector configurations.
By connecting directly to your data in Kafka and MongoDB and acting on it as needed, Atlas Stream Processing eliminates the need for a tool in between.

Cost

Finally, managing costs is a critical consideration for all development teams. We’ve priced Atlas Stream Processing competitively when compared to typical Kafka Connector configurations. Most hosted Kafka providers charge per task. That means each additional source and sink will generate a separate data transfer and storage cost that scales linearly as you expand. Atlas Stream Processing charges per Stream Processing Instance (SPI) worker, and each worker supports up to four stream processors. This means potential cost savings when running similar configurations to the Kafka Connector. See more details in the documentation.

Atlas Stream Processing launched just a few months ago. Developers are already using it for a wide range of use cases, like managing real-time inventories, serving contextually relevant recommendations, and optimizing yields in industrial manufacturing facilities. We can’t wait to see what you build and hear about your experience!

Ready to get started? Log in to Atlas today. Already a Kafka Connector user? Dig into even more details and get started using our tutorial.
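To make the per-worker pricing model described under Cost a little more concrete, here is a small sketch of the arithmetic. It assumes only what the post states, that each SPI worker supports up to four stream processors, and it treats the number of billed workers as the smallest number that can host all processors. The function name is hypothetical and no prices are included; actual billing details are in the documentation.

```javascript
// Assumption taken from the post: one SPI worker supports up to four stream processors.
const PROCESSORS_PER_WORKER = 4;

// Hypothetical helper: smallest number of workers that can host the given processors.
function workersNeeded(streamProcessors) {
  if (!Number.isInteger(streamProcessors) || streamProcessors < 0) {
    throw new RangeError('streamProcessors must be a non-negative integer');
  }
  return Math.ceil(streamProcessors / PROCESSORS_PER_WORKER);
}

for (const n of [1, 4, 5, 10]) {
  console.log(`${n} stream processors -> ${workersNeeded(n)} worker(s)`);
}
```

Under this assumption, the fifth stream processor is the first one that adds a worker, whereas a per-task pricing model adds cost with every additional source or sink.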

September 9, 2024
Updates
