Building AI with MongoDB: Cultivating Trust with Data

Mat Keep
October 3, 2023 | Updated: February 6, 2024
#genAI #Vector Search

“Trust is like the air we breathe – when it’s present, nobody really notices; when it’s absent, everybody notices.” - Warren Buffett

Trust dominates discussions around the safe and responsible adoption of AI across business and society. It was another Warren, this time Warren Bennis, a pioneer in modern leadership principles, who is credited with saying "Trust is the lubrication that makes it possible for organizations to work." That observation is particularly relevant when we think about how organizations are starting to embed AI into the very fabric of their businesses.

Governments around the world are at varying stages of regulating their way to trustworthy AI. However, this will not be a quick process, and enterprises can’t afford to wait. Businesses need to make progress now if they are going to unlock the opportunities presented by AI.

In our latest roundup of AI innovators building with MongoDB, we’re going to focus on three companies tackling trust from different angles. We feature Nomic, which is working to make AI more explainable. Robust Intelligence is focused on securing AI models against prompt injections, data poisoning, bias, PII leakage, and more. Finally, VISO TRUST comes at this issue from a totally different perspective: it uses AI to help its customers reduce cybersecurity risks and improve trust across the supply chain.

Let's dig in.

Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

Making AI explainable and accessible

Despite the huge advances in AI and its use in almost every industry, very little is known about how the most popular models actually work. What data are they trained on? What are they learning? How can we compare accuracy between different models? These are the questions Nomic AI is seeking to help us answer through its Atlas and GPT4All products.

Nomic Atlas is a data engine that allows users to explore, label, search, share, and build on massive datasets using their web browser. With Atlas, users can begin to understand what data their chosen AI models are learning from and the associations they are making during the training phase. Atlas can be used for exploratory data analysis, data labeling and cleansing, and visualizations of vector embeddings.

To see Nomic Atlas in action, take a look at the recent blog post with Hugging Face announcing IDEFICS, an open-access reproduction of the visual language model based on Flamingo. The model takes image and text inputs and produces text outputs from them. For example, it can answer questions about images, describe visual content, and create stories grounded in multiple images. Nomic allows users to visually explore the content of the training data, as illustrated in the image below.

Screenshot from Nomic's blog, showcasing how Nomic's platform can help you visually explore content

Atlas can be used to curate high-quality training and instruction-tuned datasets for the GPT4All models. Nomic GPT4All is an ecosystem for training and deploying powerful and customized large language models that run locally on consumer-grade CPUs in Windows, Mac, and Ubuntu Linux clients. With GPT4All, users have access to a free-to-use, locally running, privacy-aware chatbot that doesn’t require expensive and scarce GPUs to train and infer on, or an internet connection. It can power question-answering systems, personal writing assistants, document summarization, and code generation. Demand for GPT4All has been explosive: the project accrued more than 20,000 GitHub stars within its first week of launch.

“Every month MongoDB is adding hundreds of organizations and thousands of developers who are building AI-enabled apps on its multi-cloud developer data platform,” said Brandon Duderstadt, CEO of Nomic. “It makes sense for us to partner with MongoDB Ventures. They are helping us accelerate our vision of making AI explainable and accessible to everyone.”

Update, February 6th 2024:

On February 1, 2024, Nomic released its Nomic Embed open-source embedding model and a fully managed inference endpoint. This allows anyone to build their own powerful RAG applications for generative AI using a text embedding model with an 8,192-token context length that outperforms proprietary alternatives on a variety of benchmarks.

To demonstrate its new endpoint and model in action, the Nomic engineers wrote the blog post Building a RAG LLM with Nomic Embed and MongoDB. By following the post, you will learn how to:

Use Nomic to generate embeddings for your data sources.
Add them to MongoDB Atlas Vector Search. (Note that this runs in the Atlas free tier, so there is no cost to you!)
Use an open-source LLM to generate text from your retrieved documents.
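
As a rough illustration of the retrieval step, the sketch below builds an aggregation pipeline around the `$vectorSearch` stage. It is not taken from the Nomic post: the index name `vector_index`, the `embedding` and `text` field names, and the candidate multiplier are all assumptions chosen for the example, and the query vector is expected to come from whichever embedding model you use.

```python
from __future__ import annotations

from typing import Any


def build_vector_search_pipeline(
    query_vector: list[float],
    index_name: str = "vector_index",  # assumed name of the Atlas Vector Search index
    path: str = "embedding",           # assumed document field that stores the embedding
    limit: int = 5,
) -> list[dict[str, Any]]:
    """Return an aggregation pipeline that retrieves the nearest documents."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                # Consider more candidates than we return; the factor of 20
                # is an arbitrary choice for this sketch.
                "numCandidates": limit * 20,
                "limit": limit,
            }
        },
        {
            # Keep only the text (assumed field name) and the similarity score.
            "$project": {
                "_id": 0,
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], limit=3)
```

With PyMongo, a pipeline like this would be passed to `collection.aggregate(pipeline)`, and the returned documents would then be placed into the LLM prompt as context.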

Because you have access to the code and data behind the Nomic Embed model, you can easily customize it for even better performance.

Securing generative AI, supercharged by your data

Robust Intelligence delivers end-to-end AI risk management to protect organizations from security, ethical, and operational risks. The company’s platform automates testing and compliance across the AI lifecycle through continuous validation and protects models in real-time with AI Firewall. This combined approach enables Robust Intelligence to proactively manage risk for any model type, including generative AI, and gives organizations the confidence to unleash the true potential of AI. Robust Intelligence is trusted by leading companies including ADP, JPMorgan Chase, Expedia, Deloitte, PwC, and the U.S. Department of Defense.

Recent advancements in generative AI have motivated companies to experiment with potential applications, but a lack of security controls has exposed companies to unmanaged risks. This challenge is exacerbated when sensitive company information is used to enrich pre-trained models, for example by connecting them to vector databases, in order to increase the relevance of responses to the end user.

Robust Intelligence’s AI Firewall protects large language models (LLMs) in production by validating inputs and outputs in real-time. It assesses and mitigates operational risks such as hallucinations; ethical risks, including model bias and toxic outputs; and security risks such as prompt injections and PII extraction. AI Firewall stops bad or malicious inputs from reaching AI models and prevents undesired AI-generated results from reaching the application.
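
To make the idea of validating both inputs and outputs concrete, here is a deliberately simplified sketch. It is not Robust Intelligence's AI Firewall or its API: the phrase list, the email regex, and the function names are all hypothetical, and a production system would rely on far more sophisticated detection than string matching.

```python
import re
from typing import Callable

# Hypothetical, minimal rules for illustration only.
INJECTION_PHRASES = ("ignore previous instructions", "disregard the system prompt")
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def validate_input(prompt: str) -> bool:
    """Return False when the prompt contains a known injection phrase."""
    lowered = prompt.lower()
    return not any(phrase in lowered for phrase in INJECTION_PHRASES)


def redact_output(text: str) -> str:
    """Mask email addresses so they do not reach the application."""
    return EMAIL_PATTERN.sub("[REDACTED]", text)


def guarded_call(prompt: str, model: Callable[[str], str]) -> str:
    """Check the prompt before the model sees it, then filter the response."""
    if not validate_input(prompt):
        return "Request blocked by input validation."
    return redact_output(model(prompt))
```

Here `model` is any callable that takes a prompt and returns text, so the same wrapper could sit in front of a commercial or an open-source LLM.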

Customers can confidently connect MongoDB Atlas Vector Search to any commercial or open-source LLM for secure retrieval-augmented generation with the AI Firewall integration. Atlas Vector Search serves as the memory and fact database for AI Firewall, ensuring the AI model provides enriched responses without hallucinating. Additionally, it serves as the memory and database to store historical data points. This is important in the context of identifying more advanced security attacks, such as data poisoning and model extraction, which often manifest across a cluster of data points as opposed to a single data point.

Yaron Singer, CEO and co-founder at Robust Intelligence, commented: “By incorporating MongoDB’s Atlas Vector Search into the AI validation process, customers can confidently use their databases to enhance LLM responses knowing that sensitive information will remain secure. The integration provides seamless protection against a comprehensive set of security, ethical, and operational risks.”

Graphic showing the flow of information into and from the Vector Search, core and metadata store.

Being part of the MongoDB Partner Program provides Robust Intelligence with access to specialist technical support to optimize product integrations and provides visibility to the MongoDB customer base.

Transforming cyber risk intelligence

VISO TRUST is an AI-powered third-party cyber risk and trust platform that enables any company to access actionable vendor security information in minutes. VISO TRUST delivers the fast and accurate intelligence needed to make informed cybersecurity risk decisions at scale. Today VISO TRUST has many great enterprise customers like Instacart, Gusto, and Upwork, and they all say the same thing: 90% less work, 80% reduction in time to assess risk, and near 100% vendor adoption.

How does VISO TRUST achieve these results? Pierce Lamb, Senior Software Engineer on the Data and Machine Learning team at VISO TRUST, provides more detail:

“VISO TRUST Platform easily engages third parties, saving everyone time and resources. In a 5-minute web-based session, third parties are prompted to upload relevant artifacts of the security program that already exists, and our supervised AI – which we call Artifact Intelligence – does the rest.

First, VISO TRUST deploys discriminator models that produce high-confidence predictions about features of the artifact.
Secondly, artifacts have text content parsed out of them which we embed and store in MongoDB Atlas to become part of our dense retrieval system. This dense retrieval system performs Retrieval-Augmented Generation (RAG) using MongoDB features like Atlas Vector Search to provide ranked context to large language model (LLM) prompts.
Thirdly, we use RAG results to seed LLM prompts and chain together their outputs to produce extremely accurate factual information about the artifact in the pipeline. This information is able to provide instant intelligence to customers that previously took weeks to produce.”
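
The second and third steps can be sketched in a few lines. This is an illustrative approximation, not VISO TRUST's pipeline: an in-memory cosine-similarity ranking stands in for Atlas Vector Search, `llm` is any hypothetical callable that maps a prompt to text, and the prompt wording is invented for the example.

```python
import math
from typing import Callable


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_chunks(query_vector: list, chunks: list, top_k: int = 2) -> list:
    """Return the text of the top_k chunks most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, chunk["embedding"]),
        reverse=True,
    )
    return [chunk["text"] for chunk in ranked[:top_k]]


def chain(question: str, query_vector: list, chunks: list,
          llm: Callable[[str], str]) -> str:
    """Seed a first prompt with ranked context, then feed its output to a second."""
    context = "\n".join(rank_chunks(query_vector, chunks))
    finding = llm(f"Context:\n{context}\n\nQuestion: {question}")
    return llm(f"State this finding as a single fact: {finding}")
```

Chaining the calls this way lets the second prompt work from the first model output rather than from the raw artifact text.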

Screenshot of the VISO Trust dashboard displaying analytical insights

VISO TRUST is the only SaaS third-party cyber risk management platform that delivers the rapid security intelligence needed for modern companies to make critical risk decisions early in the procurement process.

VISO TRUST uses state-of-the-art models from OpenAI, Hugging Face, Anthropic, Google, and AWS, augmented by vector search and retrieval from MongoDB Atlas. Read our interview blog post with VISO TRUST to learn more.

What's next?

If you are getting started with building AI-enabled apps on MongoDB, sign up for our AI Innovators Program. Successful applicants get access to expert technical advice, free MongoDB Atlas credits, co-marketing opportunities, and, for eligible startups, introductions to potential venture investors.

In the spirit of "Trust, but verify" (Ronald Reagan), if you’re not sure how the program, or indeed MongoDB, could deliver value to you, take a look at earlier blog posts in this series:

Building AI with MongoDB: first qualifiers include AI at the network edge for computer vision and augmented reality; risk modeling for public safety; and predictive maintenance paired with question-answer generation for maritime operators.
Building AI with MongoDB: compliance to copilots features AI in healthcare along with intelligent assistants that help product managers specify better products and help sales teams compose emails that convert 2x higher.
Building AI with MongoDB: unlocking value from multimodal data showcases open source libraries that transform unstructured data into a usable JSON format; entity extraction for contracts management; and making sense of “dark data” to build customer service apps.

You should look at the MongoDB for Artificial Intelligence resources page for the latest best practices that get you started in turning your idea into an AI-driven reality.
