Building AI with MongoDB: Cultivating Trust with Data

Mat Keep

#genAI#Vector Search

“Trust is like the air we breathe – when it’s present, nobody really notices; when it’s absent, everybody notices.” - Warren Buffett

The issue of trust is one that dominates discussions around the safe and responsible adoption of AI across business and society. It was another Warren - this time Warren Bennis, a pioneer in modern leadership principles – who was attributed as saying "Trust is the lubrication that makes it possible for organizations to work." Particularly relevant when we think about how organizations are starting to embed AI into the very fabric of their businesses.

On one hand, we have governments around the world that are at varying stages of regulating their way to trustworthy AI. However, this will not be a quick process, and enterprises can’t afford to wait. Businesses need to make progress now if they are going to unlock the opportunities presented by AI.

In our latest roundup of AI innovators building with MongoDB, we’re going to focus on three companies tackling trust from different angles. We feature Nomic who are working to make AI more explainable. Robust Intelligence is focused on securing AI models against prompt injections, data poisoning, bias, PII leakage, and more. Finally, VISO TRUST comes at this issue from a totally different perspective. They use AI to help their customers reduce cybersecurity risks and improve trust across the supply chain.

Let's dig in.

Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

Making AI explainable and accessible

Despite the huge advances in AI and its use in almost every industry, very little is known about how the most popular models actually work. What data are they trained on? What are they learning? How can we compare accuracy between different models? These are the questions Nomic AI is seeking to help us answer through its Atlas and GPT4All products.

Nomic Atlas is a data engine that allows users to explore, label, search, share, and build on massive datasets using their web browser. With Atlas, users can begin to understand what data their chosen AI models are learning from and the associations they are making during the training phase. Atlas can be used for exploratory data analysis, data labeling and cleansing, and visualizations of vector embeddings.

To see Nomic Atlas in action, take a look at the recent blog post with Hugging Face announcing IDEFICS, an open-access reproduction of the visual language model based on Flamingo. The model takes image and text inputs and produces text outputs from them. For example, it can answer questions about images, describe visual content, and create stories grounded in multiple images. Nomic allows users to visually explore the content of the training data, as illustrated in the image below.

Screenshot of from Nomic's Blog. Showcasing how Nomic's platform can help you visually explore content

Atlas can be used to curate high-quality training and instruction-tuned datasets for the GPT4All models. Nomic GPT4All is an ecosystem for training and deploying powerful and customized large language models that run locally on consumer-grade CPUs in Windows, Mac, and Ubuntu Linux clients. With GPT4All, users have access to a free-to-use, locally running, privacy-aware chatbot that doesn’t require expensive and scarce GPUs to train and infer on, or an internet connection. It can power question-answering systems, personal writing assistants, document summarization, and code generation. Demand for GPT4All has been explosive, accruing more than 20,000 GitHub stars within its first week of launch.

“Every month MongoDB is adding hundreds of organizations and thousands of developers who are building AI-enabled apps on its multi-cloud developer data platform,” said Brandon Duderstadt, CEO of Nomic. “It makes sense for us to partner with MongoDB Ventures. They are helping us accelerate our vision of making AI explainable and accessible to everyone.”

Update, February 6th 2024:

On February 1, 2024, Nomic released its Nomic Embed open-source embedding model and a fully managed inference endpoint. This allows anyone to build their own powerful RAG applications for generative AI using a text embedding model with a 8,192 context-length that outperforms proprietary alternatives on a variety of benchmarks.

To demonstrate its new endpoint and model in action, the Nomic engineers created the Building a RAG LLM with Nomic Embed and MongoDB. By following the blog post, you will learn:

  1. How to use Nomic to generate embeddings for your data sources.

  2. Add them to MongoDB Atlas Vector Search. (Note that this runs in the Atlas free tier, so there is no cost to you!)

  3. Use an open-source LLM to generate text from your retrieved documents.

Because you have access to the code and data behind the Nomic Embed model, you can easily customize it for even better performance.

Securing generative AI, supercharged by your data

Robust Intelligence delivers end-to-end AI risk management to protect organizations from security, ethical, and operational risks. The company’s platform automates testing and compliance across the AI lifecycle through continuous validation and protects models in real-time with AI Firewall. This combined approach enables Robust Intelligence to proactively manage risk for any model type, including generative AI and gives organizations the confidence to unleash the true potential of AI. Robust Intelligence is trusted by leading companies including ADP, JPMorgan Chase, Expedia, Deloitte, PwC, and the U.S. Department of Defense.

Recent advancements in generative AI have motivated companies to experiment with potential applications, but a lack of security controls has exposed companies to unmanaged risks. This challenge is exacerbated when sensitive company information is used to enrich pre-trained models, such as connecting vector databases, in order to increase the relevance to the end user.

Robust Intelligence’s AI Firewall protects large language models (LLMs) in production by validating inputs and outputs in real-time. It assesses and mitigates operational risks such as hallucinations; ethical risks, including model bias and toxic outputs; and security risks such as prompt injections and PII extraction. AI Firewall stops bad or malicious inputs from reaching AI models and prevents undesired AI-generated results from reaching the application.

Customers can confidently connect MongoDB Atlas Vector Search to any commercial or open-source LLM for secure retrieval-augmented generation with the AI Firewall integration. Atlas Vector Search serves as the memory and fact database for AI Firewall, ensuring the AI model provides enriched responses without hallucinating. Additionally, it serves as the memory and database to store historical data points. This is important in the context of identifying more advanced security attacks, such as data poisoning and model extraction, which often manifest across a cluster of data points as opposed to a single data point.

Yaron Singer, CEO and co-founder at Robust Intelligence commented “By incorporating MongoDB’s Atlas Vector Search into the AI validation process, customers can confidently use their databases to enhance LLM responses knowing that sensitive information will remain secure. The integration provides seamless protection against a comprehensive set of security, ethical, and operational risks.”

Graphic showing the flow of information into and from the Vector Search, core and metadata store.

Being part of the MongoDB Partner Program provides Robust Intelligence with access to specialist technical support to optimize product integrations and provides visibility to the MongoDB customer base.

Transforming cyber risk intelligence

VISO TRUST is an AI-powered third-party cyber risk and trust platform that enables any company to access actionable vendor security information in minutes. VISO TRUST delivers fast and accurate intelligence needed to make informed cybersecurity risk decisions at scale. Today VISO TRUST has many great enterprise customers like InstaCart, Gusto, and Upwork and they all say the same thing: 90% less work, 80% reduction in time to assess risk, and near 100% vendor adoption.

How does VISO TRUST achieve these results? Pierce Lamb, Senior Software Engineer on the Data and Machine Learning team at VISO TRUST provides more detail:

“VISO TRUST Platform easily engages third parties, saving everyone time and resources. In a 5-minute web-based session, third parties are prompted to upload relevant artifacts of the security program that already exists, and our supervised AI – which we call Artifact Intelligence – does the rest.

  • First, VISO TRUST deploys discriminator models that produce high-confidence predictions about features of the artifact.

  • Secondly, artifacts have text content parsed out of them which we embed and store in MongoDB Atlas to become part of our dense retrieval system. This dense retrieval system performs Retrieval-Augmented Generation (RAG) using MongoDB features like Atlas Vector Search to provide ranked context to large language model (LLM) prompts.

  • Thirdly, we use RAG results to seed LLM prompts and chain together their outputs to produce extremely accurate factual information about the artifact in the pipeline. This information is able to provide instant intelligence to customers that previously took weeks to produce.”

Screenshot of the VISO Trust dashboard displaying analytical insights

VISO TRUST is the only SaaS third-party cyber risk management platform that delivers the rapid security intelligence needed for modern companies to make critical risk decisions early in the procurement process

VISO TRUST uses state-of-the-art models from OpenAI, Hugging Face, Anthropic, Google, and AWS, augmented by vector search and retrieval from MongoDB Atlas. Read our interview blog post with VISO TRUST to learn more.

What's next?

If you are getting started with building AI-enabled apps on MongoDB, sign up for our AI Innovators Program. Successful applicants get access to expert technical advice, free MongoDB Atlas credits, co-marketing opportunities, and – for eligible startups, introductions to potential venture investors.

In the spirit of "Trust, but verify" (Ronald Reagan), if you’re not sure how the program or indeed, MongoDB, could deliver value to you, take a look at earlier blog posts in this series:

You should look at the MongoDB for Artificial Intelligence resources page for the latest best practices that get you started in turning your idea into an AI-driven reality.