Building AI with MongoDB: How Metaphor Data Uses Atlas Vector Search to Change the World Through Data

Elliott Gluck
October 10, 2023 | Updated: August 8, 2024
#genAI #Vector Search


Since announcing MongoDB Atlas Vector Search in preview back in June, we’ve already seen rapid adoption from developers building a wide range of AI-enabled apps. Today we're highlighting another customer who has increased efficiency while removing architectural complexity by adopting Atlas Vector Search.

Metaphor is a search and discovery tool built for data scientists, data engineers, and AI practitioners. The company’s mission is to empower individuals and companies of all types to change the world through data. Metaphor is the next evolution of the Data Catalog with fully automated support for Data Governance, Data Literacy, and Data Enablement using an intuitive user interface.

We recently caught up with Mars Lan, Co-founder and CTO, to learn more about the company’s journey with MongoDB and their adoption of Atlas Vector Search.

Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

Tell us a little bit about your company and what you and the team are building

We’re an early-stage startup with a mission to empower individuals and organizations to change the world through data. We refer to ourselves as the social platform for data and have a range of products that support both data teams and data consumers. Our main product is a SaaS Data Catalog that enables governance and enablement of data across the organization. We’re a small team of around 15 or so with a keen focus on product and engineering. The company was founded about 2.5 years ago.

What role does search play at the company and where did your search story begin?

Well, I will start by saying that we almost ended up with a very different story to tell you than the one that actually transpired! We started our journey using DocumentDB and Elasticsearch on AWS for our database and search needs. After some time we ran into scalability issues that caused us to evaluate (and eventually move to) MongoDB Atlas for our database needs. When we saw that MongoDB offered Atlas Search, which is based on the same underlying Lucene technology, we got very excited and began migrating our search efforts over to Atlas. This eventually laid the groundwork for adopting Atlas Vector Search later on.

So starting with those initial search needs, what got you excited about Atlas Search with MongoDB? What were your use cases?

We started to face a significant amount of maintenance and upkeep in keeping our database and Elasticsearch in sync. We previously had to build data pipelines so that if something changed in the database, it would also change in search. Once we migrated everything to MongoDB Atlas Search, we no longer had to manage those pipelines. This resulted in lower latency and less likelihood of bugs, which excited our team.

The other component to this was the scalability disconnect of having two different systems. We realized that if we ever needed more storage or compute, we could just spin up a larger MongoDB cluster and get that extra scalability right away with the Atlas platform. Of course, having one less thing to worry about is also a huge benefit. Elasticsearch is not the easiest thing to manage, so having it all in MongoDB was another big plus for us.

How did you initially learn about Atlas Vector Search and what piqued your interest?

We started experimenting with Pinecone a while back, when the AI stuff really started to explode, just to try out the tool after one of our interns had started playing around with it. It turned out not to be cost-effective to spin up a Pinecone instance for each customer, and it was quite difficult to scale up due to API throttling.

After some time, we started looking around for other vendors for vector search. However, once we learned that MongoDB had Vector Search, we got excited at the prospect of being able to use our existing tech stack for this additional functionality. It quickly became a no-brainer to us: since we knew we were going to move everything to Atlas, it became obvious we should just consolidate everything there, so we ended up migrating to Atlas Vector Search for all of our semantic search needs. This means one query API, one set of dependencies, and built-in sync, all in a single platform.
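To make the "one query API" point concrete, here is a minimal sketch of how a semantic query can be expressed as an ordinary aggregation pipeline using the `$vectorSearch` stage. The index name, field names, and the tiny query vector are hypothetical placeholders, not Metaphor's actual schema; in practice the query vector would come from an embedding model.

```python
from typing import Any


def build_vector_search_pipeline(
    query_vector: list[float],
    index_name: str = "vector_index",  # hypothetical Atlas Vector Search index name
    path: str = "embedding",           # hypothetical field that stores the vectors
    limit: int = 5,
    num_candidates: int = 100,
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline that runs an approximate nearest-neighbor search."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            # Return a (hypothetical) name field plus the similarity score.
            "$project": {
                "_id": 0,
                "name": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# A real query vector has hundreds of dimensions; three are used here for brevity.
pipeline = build_vector_search_pipeline([0.12, -0.03, 0.98], limit=3)
print(pipeline[0]["$vectorSearch"]["limit"])
```

With a driver such as PyMongo, a pipeline like this would be passed to `collection.aggregate(pipeline)` on the same collection that holds the operational data, which is what removes the need for a separate vector store and sync pipeline.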

What were the key factors that made you pull the trigger and adopt Atlas Vector Search? What were the problems you were trying to solve?

So one key unlock for us was the semantic search side of things, where someone can ask a natural language question and get a natural language answer. For us, this is a much better user experience than Google-style keyword searches.

From day one we always wanted to best serve our core customer, the engineer, but another huge constituency for us is the business or non-technical audience. These folks prefer a tool that is more intuitive to use.

To best serve them we have a first-class integration into Slack and Microsoft Teams, so they can ask a question and don't have to go to another place or switch tools to get that answer. We didn’t always have the capability to do the natural language question and response, but with Atlas Vector Search this now becomes possible. Using Vector Search we now have the ability to ask the Slack bot questions like “where can I find this type of data” or “where is this one table on revenue from last quarter and who is using it” and get a natural language response back.
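The interview does not describe how the retrieval step behind such a bot works internally, so the following is only a generic sketch of the idea behind semantic search: the question and each catalog entry are represented as embedding vectors, and entries are ranked by cosine similarity to the question. The table names and three-dimensional vectors are invented for illustration, and the step that turns the top results into a natural language answer is not shown.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical catalog: table name -> toy embedding of its description.
CATALOG = {
    "finance.quarterly_revenue": [0.9, 0.1, 0.0],
    "marketing.campaign_clicks": [0.1, 0.9, 0.1],
    "hr.employee_directory": [0.0, 0.2, 0.9],
}


def rank_tables(question_embedding, catalog, top_k=2):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(question_embedding, vector))
        for name, vector in catalog.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embedding standing in for "where is the table on revenue from last quarter?"
question = [0.8, 0.2, 0.1]
results = rank_tables(question, CATALOG)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

A vector search index performs the same kind of ranking approximately and at scale, so the application does not have to compare the question against every stored vector.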

One of the key considerations for us when looking at vendors was cost, but not just cost in terms of what shows up on an invoice. I would rather scale one system and get the benefits on both sides (search and vector search). We saw that having to scale two systems independently was just not going to be very efficient in the long run.

Can you talk about some of the initial benefits you’ve seen so far both on the Atlas Search side as well as with Vector Search specifically? How do you think about and quantify these benefits?

Well one obvious thing that stands out on the search side is increased speed and being able to move quickly. MongoDB in general has a great developer experience. Our data model tends to be highly complex documents, and all the metadata tends to be highly structured and complex, so the MongoDB model fits us very well.

In terms of productivity, it’s never an exact science. I will say that with the adoption of Atlas we were able to keep our engineering team size relatively constant while serving many more customers and scaling our development efforts faster, so we probably saw a 2X to 3X increase in productivity.

One last item of note. We adopt the most rigorous security practices because we deal with so much customer data, so we want to ensure the highest security possible. We chose to have dedicated MongoDB clusters per customer, so each customer’s data is totally isolated from every other customer’s. When we were on Pinecone, this meant spinning up a new Pinecone pod for each customer, which would be both really hard to do and not at all financially viable. Because we are centralizing this all under MongoDB, it becomes so much easier: you can dynamically scale your cluster sizes up and down depending on the needs or requirements of small vs. large customers. There’s not the sort of waste you’d get with multiple discrete systems.

Getting started

A big thank you to Mars and the entire Metaphor Data team for sharing more about their story and use of Atlas Vector Search.

Want to learn more? Head over to our quick-start guide to get started with Atlas Vector Search today. And if you’re a startup building with AI please check out our MongoDB AI Innovators program for Atlas credits, one-on-one technical advice, access to our partner network, and more!

