Building AI with MongoDB: How VISO TRUST is Transforming Cyber Risk Intelligence

Mat Keep and Elliott Gluck
September 5, 2023 | Updated: August 8, 2024
#genAI #Vector Search

NOTE: We have an updated VISO TRUST story, which goes into the latest and greatest with the VISO TRUST team. You can read the latest customer story here.

Since announcing MongoDB Atlas Vector Search preview availability back in June, we’ve seen rapid adoption from developers building a wide range of AI-enabled apps. Today we're going to talk to one of these customers.

VISO TRUST puts reliable, comprehensive, actionable vendor security information directly in the hands of decision-makers who need to make informed risk assessments. The company uses state-of-the-art models from Amazon Bedrock, augmented by vector search and retrieval from MongoDB Atlas.

We sat down with Pierce Lamb, Senior Software Engineer on the Data and Machine Learning team at VISO TRUST to learn more.

Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

Tell us a little bit about your company. What are you trying to accomplish and how that benefits your customers or society more broadly?

VISO TRUST is an AI-powered third-party cyber risk and trust platform that enables any company to access actionable vendor security information in minutes. VISO TRUST delivers the fast and accurate intelligence needed to make informed cybersecurity risk decisions at scale for companies at any maturity level.

Our commitment to innovation means that we are constantly looking for ways to optimize business value for our customers. VISO TRUST ensures that complex business-to-business (B2B) transactions adequately protect the confidentiality, integrity, and availability of trusted information. VISO TRUST’s mission is to become the largest global provider of cyber risk intelligence and become the intermediary for business transactions. Through the use of VISO TRUST, customers will reduce their threat surface in B2B transactions with vendors and thereby reduce the overall risk posture and potential security incidents like breaches, malicious injections, and more.

Today VISO TRUST has many great enterprise customers like InstaCart, Gusto, and Upwork and they all say the same thing: 90% less work, 80% reduction in time to assess risk, and near 100% vendor adoption. Because it’s the only approach that can deliver accurate results at scale, for the first time, customers are able to gain complete visibility into their entire third-party populations and take control of their third-party risk.

Describe what your application does and what role AI plays in it

The VISO TRUST Platform approach uses patented, proprietary machine learning and a team of highly qualified third-party risk professionals to automate this process at scale.

Simply put, VISO TRUST automates vendor due diligence and reduces third-party at scale. And security teams can stop chasing vendors, reading documents, or analyzing spreadsheets.

Screenshot of the VISO Trust dashboard displaying risk metrics, such as average residual risk and average control coverage. — Figure 1: VISO TRUST is the only SaaS third-party cyber risk management platform that delivers the rapid security intelligence needed for modern companies to make critical risk decisions early in the procurement process

VISO TRUST Platform easily engages third parties, saving everyone time and resources. In a 5-minute web-based session third parties are prompted to upload relevant artifacts of the security program that already exists and our supervised AI – we call Artifact Intelligence – does the rest.

Security artifacts that enter VISO’s Artifact Intelligence pipeline interact with AI/ML in three primary ways. First, VISO deploys discriminator models that produce high-confidence predictions about features of the artifact. For example, one model performs artifact classification, another detects organizations inside the artifact, another predicts which pages are likely to contain security controls, and more. Our modules reference a comprehensive set of over 25 security frameworks and use document heuristics and natural language processing to analyze any written material and extract all relevant control information.

Secondly, artifacts have text content parsed out of them in the form of sentences, paragraphs, headers, table rows, and more; these text blobs are embedded and stored in MongoDB Atlas to become part of our dense retrieval system. This dense retrieval system performs retrieval-augmented generation (RAG) using MongoDB features like Atlas Vector Search to provide ranked context to large language model (LLM) prompts.

Thirdly, we use RAG results to seed LLM prompts and chain together their outputs to produce extremely accurate factual information about the artifact in the pipeline. This information is able to provide instant intelligence to customers that previously took weeks to produce.

VISO TRUST’s risk model analyzes your risk and delivers a complete assessment that provides everything you need to know to make qualified risk decisions about the relationship. In addition, the platform continuously monitors and reassesses third-party vendors to ensure compliance.

What specific AI/ML techniques, algorithms, or models are utilized in your application?

For our discriminator models, we research the state-of-the-art pre-trained models (typically narrowed by those contained in HuggingFace’s transformers package) and perform fine-tuning of these models using our dataset.

For our dense retrieval system, we use MongoDB Atlas Vector Search which internally uses the Hierarchical Navigable Small Worlds algorithm to retrieve similar embeddings to embedded text content. We have plans to perform a re-ranking of these results as well.

Today we are using Amazon Bedrock exclusively for all GenAI needs. In the past we have used OpenAI and Anthropic and tested Vertex/Bard. We are now exclusively on Bedrock and do not use any other subservicers.

Can you describe other AI technologies used in your application stack?

Some of the other frameworks we use are HuggingFace transformers, evaluate, accelerate, and Datasets, PyTorch, WandB, and Amazon Sagemaker. We have a library for ML experiments (fine-tuning) that is custom-built, a library for workflow orchestration that is custom-built, and all of our prompt engineering is custom-built.

Why did you choose MongoDB as part of your application stack? Which MongoDB features are you using and where are you running MongoDB?

The VISO TRUST Platform relies on effective solutions and tools like MongoDB's distinctive attributes to fulfill specific objectives. MongoDB supports our platform's mechanism to engage third parties efficiently, employing both AI and human oversight to automate the assessment of security artifacts at scale.

The fundamental value proposition of MongoDB – a robust document database – is why we originally chose it. It was originally deployed as a storage/retrieval mechanism for all the factual information our artifact intelligence pipeline produces about artifacts. While it still performs this function today, it has now become our “vector/metadata database.”

MongoDB executes fast ranking of large quantities of embedded text blobs for us while Atlas provides us with all the ease-of-use of a cloud-ready database. We use both the Atlas search index visualization, and the query profiler visualization daily. Even just the basic display of a few documents in collections often saves time. Finally, when we recently backfilled embeddings across one of our MongoDB deployments, Atlas would automatically provision more disk space for large indexes without us needing to be around which was incredibly helpful.

What are the benefits you've achieved by using MongoDB?

I would say there are two primary benefits that have greatly helped us with respect to MongoDB and Atlas. First, MongoDB was already a place where we were storing metadata about artifacts in our system; with the introduction of Atlas Vector Search now we have a comprehensive vector/metadata database – that’s been battle-tested over a decade – that solves our dense retrieval needs. No need to deploy a new database we have to manage and learn. Our vectors and artifact metadata can be stored right next to each other.

Second, Atlas has been helpful in making all the painful parts of database management easy. Creating indexes, provisioning capacity, alerting slow queries, visualizing data, and much more have saved us time and allowed us to focus on more important things.

What are your future plans for new applications and how does MongoDB fit into them?

Retrieval-augmented generation is going to continue to be a first-class feature of our application. In this regard, the evolution of Atlas Vector Search and its ecosystem in MongoDB will be highly relevant to us. MongoDB has become the database our ML team uses, so as our ML footprint expands, our use of MongoDB will expand.

Getting started

Thanks so much to Pierce for sharing details on VISO TRUST’s AI-powered applications and experiences with MongoDB.

The best way to get started with Atlas Vector Search is to head over to the product page or our quick-start guide. There you will find tutorials, documentation, and whitepapers along with the ability to sign up for MongoDB Atlas. You’ll just be a few clicks away from spinning up your own vector search engine where you can experiment with the power of vector embeddings and RAG. We’d love to see what you build, and are eager for any feedback that will make the product even better in the future!

← Previous

The Challenges and Opportunities of Processing Streaming Data

Let’s consider a fictitious bank that has a credit card offering for its customers. Transactional data might land in their database from various sources such as a REST API call from a web application or from a serverless function call made by a cash machine. Regardless of how the data was written to the database, the database performed its job and made the data available for querying by the end-user or application. The mechanics are database-specific but the end goal of all databases is the same. Once data is in a database the bank can query and obtain business value from this data. In the beginning, their architecture worked well, but over time customer usage grew and the bank found it difficult to manage the volume of transactions. The company decides to do what many customers in this scenario do and adopts an event-streaming platform like Apache Kafka to queue these event data. Kafka provides a highly scalable event streaming platform capable of managing large data volumes without putting debilitating pressure on traditional databases. With this new design, the bank could now scale supporting more customers and product offerings. Life was great until some customers started complaining about unrecognized transactions occurring on their cards. Customers were refusing to pay for these and the bank was starting to spend lots of resources figuring out how to manage these fraudulent charges. After all, by the time the data gets written into the database, and the data is batch loaded into the systems that can process the data, the user's credit card was already charged perhaps a few times over. However, hope is not lost. The bank realized that if they could query the transactional event data as it's flowing into the database they might be able to compare it with historical spending data from the user, as well as geolocation information, to make a real-time determination if the transaction was suspicious and warranted further confirmation by the customer. This ability to continuously query the stream of data is what stream processing is all about. From a developer's perspective, building applications that work with streaming data is challenging. They need to consider the following: Different serialization formats: The data that arrives in the stream may contain different serialization formats such as JSON, AVRO, Protobuf or even binary. Different schemas: Data originating from a variety of sources may contain slightly different schemas. Fields like CustomerID could be customerId from one source or CustID in another and a third could not even use the field. Late arriving data: The data itself could arrive late due to network latency issues or being completely out of order. Operational complexity: Developers need to be concerned with reacting to application state changes like failed connections to data sources and how to efficiently scale the application to meet the demands of the business. Security: In larger enterprises, the developer usually doesn’t have access to production data. This makes troubleshooting and building queries from this data difficult. Stream processing can help address these challenges and enable real-time use cases, such as fraud detection, hyper-personalization, and predictive maintenance, that are otherwise difficult or extremely costly to overcome. While many stream processing solutions exist, the flexibility of the document model and the power of the aggregation framework are naturally well suited to help developers with the challenges found with complex event data. Discover MongoDB Atlas Stream Processing Read the MongoDB Atlas Stream Processing announcement and check out Atlas Stream Processing tutorials on the MongoDB Developer Center . Request private preview access to Atlas Stream processing Request access today to participate in the private preview. New to MongoDB? Get started for free today by signing up for MongoDB Atlas .

August 30, 2023

Next →

Securing Digital Transformation with MongoDB and RegData

Data security and privacy have long been paramount to the financial industry, but they are especially critical for institutions undergoing digital transformations or those implementing new technology. For example, the integration of artificial intelligence (AI) and machine learning (ML) into organizations’ infrastructure and offerings introduces security and privacy complexities, making it all the more essential for financial organizations to safeguard sensitive information while complying with regulations. The consequences of a data breach are extensive and significantly impactful. These incidents have transformed from simple cybersecurity concerns into catalysts for financial losses, reputational harm, legal challenges, regulatory penalties, and a significant decline in consumer trust. Even with an increased focus on data security, organizations must adopt modern data architecture to effectively mitigate these risks. For example, using a database solution like MongoDB with built-in encryption, role-based access control, and audit logging can help organizations safeguard sensitive data and respond proactively to potential vulnerabilities. The challenge of data security in finance Financial institutions face numerous challenges in protecting data integrity during modernization efforts. The increasing sophistication of cyberattacks, coupled with the need to comply with evolving regulations like the General Data Protection Regulation (GDPR) and the Digital Operational Resilience Act (DORA), creates a complex environment for data management. Institutions must also navigate technical sprawl, where diverse applications and data management systems complicate compliance and operational efficiency. Addressing these challenges requires a holistic approach that integrates data protection into the core design of digital transformation initiatives. Financial institutions need to adopt robust data management practices, ensure the encryption of sensitive data, and maintain vigilant cybersecurity measures. Collaboration with trusted third-party vendors, adopting a privacy-first strategy, and complying with global data protection regulations are essential steps toward safeguarding data privacy in this rapidly evolving digital landscape. Discover how the RegData Protection Suite (RPS), built on MongoDB , enables you to balance technological advancement with regulatory requirements. The solution: MongoDB and RegData MongoDB offers unparalleled reliability, scalability, and flexibility, making it an ideal choice for financial services. MongoDB enables financial institutions to combine operational and AI data in a unified interface and can be deployed on-premises with Enterprise Advanced or across any major cloud provider with MongoDB Atlas , multi-cloud, and hybrid cloud when needed. When combined with RegData's Protection Suite (RPS), organizations can effectively tackle the challenges of digital transformation. RPS is a cloud-native application security platform designed to protect sensitive data through advanced techniques such as encryption, anonymization, and tokenization. Figure 1. Simplified architecture of the RPS solution. Key Features of RegData Protection Suite: Core Configuration: Provides services and a user interface to configure the protection of data. RPS Engine: A sophisticated core engine equipped with various data protection tools. This module is the heart of the application and is responsible for all data protection. Consists of encryption, anonymization, tokenization, and pseudonymization RPS Reporting: A vital component focused on data protection oversight. It gathers and analyzes information on the business application activities protected by RPS to generate a range of valuable reports RPS Manager: Provides end-to-end monitoring capabilities for the components of the RPS platform. RPS Integration: RPS seamlessly integrates with various applications, ensuring that sensitive data is protected across diverse environments. The synergy between MongoDB and RegData shines through in practical applications. For instance, a private bank can leverage hybrid cloud deployments to modernize its operations while maintaining data security. By utilizing RPS, the bank can protect sensitive information during cloud migrations and ensure compliance with regulatory requirements. Additionally, as financial institutions explore outsourcing, RPS helps mitigate risks by anonymizing sensitive data, allowing organizations to maintain control over their data even when leveraging external service providers. Embracing a zero-trust approach for gen AI applications With the rise of AI (and particularly gen AI), banks are developing increasingly more AI- and gen AI-powered applications. While on-premise AI/gen AI model development and testing provides a high level of data security and confidentiality, it may not be within the bank’s budget to afford a production-grade GPU compute pool or one that is large enough to offer sufficient scalability and economy of scale. With this dilemma, banks have begun developing models in private clouds and then deploying on the public cloud to leverage its scalability and economy of scale. MongoDB can serve as that unified operational data layer for a variety of data sources, structured, semi-structured, or unstructured that may also come in different forms (eg. tabular, geospatial, network graph, time series, etc.) for the model development, training, fine-tuning and/or testing. When the model is tested and found to be working, it can then be deployed to the public cloud to serve the AI/gen AI applications. The figure below shows the high-level architecture of how a private bank implemented its gen AI application with MongoDB and RPS. Figure 2. Gen AI data flow architecture focused on data protection. The road to modernization As financial institutions navigate the complexities of digital transformation, the partnership between MongoDB and RegData offers a robust solution for securing data. By adopting a comprehensive data protection strategy, organizations can innovate confidently while ensuring compliance with regulatory standards. Embracing these technologies not only enhances data security but also paves the way for a more resilient and agile financial sector. Establishing a robust data architecture with a modern data platform like MongoDB Atlas enables financial institutions to effectively modernize by consolidating and analyzing data in any format in real-time, driving value-added services and features to consumers while ensuring privacy and security concerns are adequately addressed with built-in security controls across all data. Whether managed in a customer environment or through MongoDB Atlas, a fully managed cloud service, MongoDB ensures robust security with features such as authentication (single sign-on and multi-factor authentication), role-based access controls, and comprehensive data encryption. These security measures act as a safeguard for sensitive financial data, mitigating the risk of unauthorized access from external parties and providing organizations with the confidence to embrace AI and ML technologies. Are you prepared to harness these capabilities for your projects or have any questions about this? Then please reach out to us at industry.solutions@mongodb.com or nfo@regdata.ch . You can also take a look at the following resources: RegData & MongoDB: Securing Digital Transformation Streamline Data Control and Compliance with RegData & MongoDB Implementing an Operational Data Layer

January 23, 2025