Paul Claret


Better Business Loans with MongoDB and Generative AI

Business loans are a cornerstone of banking operations, providing significant benefits to both financial institutions and broader economies. For example, in 2023 the value of commercial and industrial loans in the United States reached nearly $2.8 trillion. However, these loans can present unique challenges and risks that banks must navigate. Besides credit risk, where the borrower may default, banks also face business risk, in which economic downturns or sector-specific declines can impact borrowers' ability to repay loans. In this post, we dive into the potential of generative AI to generate detailed risk assessments for business loans, and how MongoDB's multimodal features can be leveraged for comprehensive and multidimensional risk analyses.

The critical business plan

A business plan is essential for a business loan, as it serves as a comprehensive roadmap detailing the borrower's plans, strategies, and financial projections. It helps lenders understand the business's goals, viability, and profitability, demonstrating how the loan will be used for growth and repayment. A detailed business plan includes market analysis, competitive positioning, operational plans, and financial forecasts, which together build a compelling case for the lender's investment and the business's ability to manage risks effectively, increasing the likelihood of securing the loan.

Reading through borrower credit information and detailed business plans (roughly 15-20 pages long) poses significant challenges for loan officers due to time constraints, the material's complexity, and the difficulty of extracting key metrics from detailed financial projections, market analyses, and risk factors. Navigating technical details and industry-specific jargon can also be challenging and requires specialized knowledge. Identifying critical risk factors and mitigation strategies adds further complexity, along with ensuring accuracy and consistency among loan officers and approval committees.
To overcome these challenges, gen AI can assist loan officers by efficiently analyzing business plans, extracting essential information, identifying key risks, and providing consistent interpretations, thereby facilitating informed decision-making.

Assessing loans with gen AI

Interactive risk analysis with gen AI-powered chatbots

Gen AI can help analyze business plans when built on a flexible developer data platform like MongoDB Atlas. One approach is implementing a gen AI-powered chatbot that allows loan officers to "discuss" the business plan. The chatbot can analyze the input and provide insights on the various risks associated with lending to the borrower for the proposed business. MongoDB sits at the heart of many customer support applications due to its flexible data model, which makes it easy to build a single, 360-degree view of data from a myriad of siloed backend source systems.

Figure 1 below shows an example of how ChatGPT-4o responds when asked to assess the risk of a business loan. Although the input of the loan purpose and business description is simplistic, gen AI can offer a detailed analysis.

Figure 1: Example of how ChatGPT-4o could respond when asked to assess the risk of a business loan

Hallucinations or ignorance?

By applying gen AI to risk assessments, lenders can explore additional risk factors that gen AI can evaluate. One factor could be the risk of natural disasters or broader climate risks. In Figure 2 below, we added flood risk specifically as a factor to the previous question to see what ChatGPT-4o comes back with.

Figure 2: Example of how ChatGPT-4o responded to flood risk as a factor

Based on the above, there is a low risk of flooding. To validate this, we asked ChatGPT-4o the question differently, focusing on its knowledge of flood data. It suggested reviewing FEMA flood maps and local flood history, indicating it might not have the latest information.
Figure 3: Asking location-specific flood questions

In the query shown in Figure 3 above, ChatGPT gave the opposite answer, indicating there is "significant flooding" and providing references to flood evidence after performing an internet search across four sites, something it had not done previously. From this example, we can see that when ChatGPT does not have the relevant data, it starts to make false claims, which can be considered hallucinations. Initially, it indicated a low flood risk due to a lack of information. However, when specifically asked about flood risk in the second query, it suggested reviewing external sources like FEMA flood maps, recognizing its limitations and need for external validation. Gen AI-powered chatbots can recognize and intelligently seek additional data sources to fill their knowledge gaps. However, a casual web search won't provide the level of detail required.

Retrieval-augmented generation-assisted risk analysis

The promising example above demonstrates how gen AI can augment loan officers' analysis of business loans. However, interacting with a gen AI chatbot relies on loan officers repeatedly prompting and augmenting the context with relevant information. This can be time-consuming and impractical due to a lack of prompt engineering skills or of the data needed. Below is a simplified solution showing how gen AI can be used to augment the risk analysis process and fill the LLM's knowledge gap. This demo uses MongoDB as an operational data store, leveraging geospatial queries to find the floods within 5km of the proposed business location. The prompting for this risk analysis emphasizes the flood risk assessment rather than the financial projections. A similar test was performed on Llama 3, hosted by our MAAP partner Fireworks.AI. It tested the model's knowledge of flood data, revealing a similar knowledge gap to ChatGPT-4o's.
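The core of this RAG approach is simple: flood records retrieved from MongoDB are folded into the prompt so the LLM grounds its assessment in real data instead of guessing. A minimal sketch of that prompt-assembly step is shown below; the function name, record fields, and prompt wording are illustrative, not taken from the demo code.

```python
# Hypothetical prompt builder: retrieved flood records become explicit
# context the LLM must reason over, closing its knowledge gap.
def build_risk_prompt(business_plan: str, flood_records: list[dict]) -> str:
    flood_context = "\n".join(
        f"- {r['date']}: flood recorded {r['distance_km']} km from the site"
        for r in flood_records
    )
    return (
        "You are a loan risk analyst. Assess the flood risk for the business "
        "plan below, using only the historical flood data provided.\n\n"
        f"Historical flood data within 5 km of the site:\n{flood_context}\n\n"
        f"Business plan:\n{business_plan}"
    )

prompt = build_risk_prompt(
    "Open a riverside cafe in the selected district.",
    [{"date": "2021-11-02", "distance_km": 1.4}],
)
```

The resulting string would then be sent to the LLM (ChatGPT-4o or Llama 3 via Fireworks.AI) in place of the bare user question.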
Interestingly, rather than providing misleading answers, Llama 3 provided a "hallucinated list of flood data," but highlighted that "this data is fictional and for demonstration purposes only. In reality, you would need to access reliable sources such as FEMA's flood data or other government agencies' reports to obtain accurate information."

Figure 4: LLM's response with fictional flood locations

This consistent demonstration of the LLMs' knowledge gap in specialized areas reinforces the need to explore how RAG (retrieval-augmented generation) with a multimodal data platform can help. In this simplified demo, you select a business location, a business purpose, and a description of a business plan. To make input easier, an "Example" button leverages gen AI to generate a sample brief business description, avoiding the need to key in the description from scratch.

Figure 5: Choosing a location on the map and writing a brief plan description

Upon submission, the demo provides an analysis using RAG, with the appropriate prompt engineering, of the business, taking into consideration the location as well as the flood risk previously downloaded from external flood data sources.

Figure 6: Loan risk response using RAG

In the Flood Risk Assessment section, gen AI-powered geospatial analytics enable loan officers to quickly understand historical flood occurrences and identify the data sources. You can also reveal all the sample flood locations within the vicinity of the selected business location by clicking on the "Pin" icon. The geolocation pins mark the flood locations, and the blue circle indicates the 5km radius in which flood data is queried, using a simple geospatial stage, $geoNear.
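The 5km flood lookup can be expressed as a single $geoNear aggregation stage. The sketch below builds that pipeline with pymongo-style plain dictionaries, assuming a hypothetical "floods" collection with a 2dsphere index on a "location" field; the coordinates and the "floodEvent" field name are placeholders, not values from the demo.

```python
# Illustrative 5 km flood-radius query; [longitude, latitude] order is
# required by GeoJSON points. $geoNear must be the first pipeline stage.
business_location = [103.8198, 1.3521]  # hypothetical selected business site

flood_radius_pipeline = [
    {
        "$geoNear": {
            "near": {"type": "Point", "coordinates": business_location},
            "distanceField": "distanceMeters",  # computed per flood record
            "maxDistance": 5000,                # 5 km radius, in meters
            "spherical": True,
        }
    },
    {"$project": {"_id": 0, "floodEvent": 1, "distanceMeters": 1}},
]

# Against a live Atlas cluster this would run as:
# results = list(db.floods.aggregate(flood_radius_pipeline))
```

The matched documents, each annotated with its distance from the site, are what feed both the map pins and the RAG context.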
Figure 7: Flood locations displayed with pins

The following diagram provides a logical architecture overview of the RAG data process implemented in this solution, highlighting the different technologies used, including MongoDB, Meta Llama 3, and Fireworks.AI.

Figure 8: RAG data flow architecture diagram

With MongoDB's multimodal capabilities, developers can enhance the RAG process by utilizing features such as network graphs, time series, and vector search. This enriches the context for the gen AI agent, enabling it to provide more comprehensive and multidimensional risk analysis through multimodal analytics.

Building risk assessments with MongoDB

When combined with RAG and a multimodal developer data platform like MongoDB Atlas, gen AI applications can provide more accurate and context-aware insights that reduce hallucination and augment a complex business loan risk assessment process. Due to the iterative nature of the RAG process, the gen AI model will continually learn and improve from new data and feedback, leading to increasingly accurate risk assessments and minimizing hallucinations. A multimodal data platform allows you to fully maximize the capabilities of multimodal AI models.

Head over to our quick-start guide to get started with Atlas Vector Search today. If you would like to discover how MongoDB can help you on this multimodal gen AI application journey, we encourage you to apply for an exclusive innovation workshop with MongoDB's industry experts to explore bespoke modern app development and tailored solutions for your organization. Additionally, you can enjoy these resources:

Solution GitHub: Loan Risk Assessor
How Leading Industries are Transforming with AI and MongoDB Atlas
Accelerate Your AI Journey with MongoDB's AI Applications Program
The MongoDB Solutions Library is curated with tailored solutions to help developers kick-start their projects

August 22, 2024

Anti-Money Laundering and Fraud Prevention With MongoDB Vector Search and OpenAI

Fraud and anti-money laundering (AML) are major concerns for both businesses and consumers, affecting sectors like financial services and e-commerce. Traditional methods of tackling these issues, including static, rule-based systems and predictive artificial intelligence (AI) methods, work but have limitations, such as a lack of context and the feature engineering overhead of keeping models relevant, which can be time-consuming and costly. Vector search can significantly improve fraud detection and AML efforts by addressing these limitations, representing the next step in the evolution of machine learning for combating fraud. Any organization that is already benefiting from real-time analytics will find that this breakthrough in anomaly detection takes fraud and AML detection accuracy to the next level. In this post, we examine how real-time analytics powered by Atlas Vector Search enables organizations to uncover deeply hidden insights before fraud occurs.

Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

The evolution of fraud and risk technology

Over the past few decades, fraud and risk technology has evolved in stages, with each stage building upon the strengths of previous approaches while also addressing their weaknesses:

Risk 1.0: In the early stages (the late 1990s to 2010), risk management relied heavily on manual processes and human judgment, with decision-making based on intuition, past experiences, and limited data analysis. Rule-based systems emerged during this time, using predefined rules to flag suspicious activities. These rules were often static and lacked adaptability to changing fraud patterns.

Risk 2.0: With the evolution of machine learning and advanced analytics (from 2010 onwards), risk management entered a new era. Predictive modeling techniques were employed to forecast future risks and detect fraudulent behavior.
Systems were trained on historical data and became more integrated, allowing for real-time data processing and the automation of decision-making processes. However, these systems faced limitations such as:

Feature engineering overhead: Risk 2.0 systems often require manual feature engineering.

Lack of context: Risk 1.0 and Risk 2.0 systems may not incorporate a wide range of variables and contextual information.

Risk 2.0 solutions are often used in combination with rule-based approaches because rules cannot be avoided: companies have their own business- and domain-specific heuristics and other rules that must be applied. Here is an example fraud detection solution based on Risk 1.0 and Risk 2.0 with a rules-based and traditional AI/ML approach.

Risk 3.0: The latest stage (2023 and beyond) in fraud and risk technology evolution is driven by vector search. This advancement leverages real-time data feeds and continuous monitoring to detect emerging threats and adapt to changing risk landscapes, addressing the limitations of data imbalance, manual feature engineering, and the need for extensive human oversight, while incorporating a wider range of variables and contextual information.

Depending on the particular use case, organizations can combine or use these solutions to effectively manage and mitigate risks associated with fraud and AML. Now, let us look into how MongoDB Atlas Vector Search (Risk 3.0) can help enhance existing fraud detection methods.

How Atlas Vector Search can help

A vector database is an organized collection of information that makes it easier to find similarities and relationships between different pieces of data. By this definition, MongoDB is particularly effective, as it provides vector search within a general-purpose database rather than as a standalone or bolt-on vector database.
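To make the Risk 1.0 baseline concrete, a static rule set might be expressed as a handful of hard-coded predicates like the sketch below. The thresholds and country codes are invented for illustration; real systems carry hundreds of such business- and domain-specific rules, which is exactly why they adapt poorly to new fraud patterns.

```python
# Toy Risk 1.0-style rule engine: every rule is a fixed predicate over one
# transaction, with no learning and no context beyond the listed fields.
def rule_based_flags(txn: dict) -> list[str]:
    flags = []
    if txn["amount"] > 10_000:              # arbitrary large-amount cutoff
        flags.append("large_amount")
    if txn["country"] in {"XX", "YY"}:      # hypothetical high-risk codes
        flags.append("high_risk_country")
    if txn["per_hour_count"] > 20:          # velocity rule
        flags.append("high_velocity")
    return flags
```

Each new fraud pattern requires a human to write and tune another rule, in contrast to the semantic similarity approach of Risk 3.0 described next.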
The versatility of MongoDB's developer data platform empowers users to store their operational data, metadata, and vector embeddings on MongoDB Atlas and seamlessly use Atlas Vector Search to index, retrieve, and build performant gen AI applications. Watch how you can revolutionize fraud detection with MongoDB Atlas Vector Search.

The combination of real-time analytics and vector search offers a powerful synergy that enables organizations to discover insights that are otherwise elusive with traditional methods. MongoDB facilitates this through Atlas Vector Search integrated with OpenAI embeddings, as illustrated in Figure 1 below.

Figure 1: Atlas Vector Search in action for fraud detection and AML

Business perspective: Fraud detection vs. AML

Understanding the distinct business objectives and operational processes driving fraud detection and AML is crucial before diving into the use of vector embeddings.

Fraud detection is centered on identifying unauthorized activities aimed at immediate financial gain through deceptive practices. The detection models therefore look for specific patterns in transactional data that indicate such activities. For instance, they might focus on high-frequency, low-value transactions, which are common indicators of fraudulent behavior.

AML, on the other hand, targets the complex process of disguising the origins of illicitly gained funds. The models here analyze broader and more intricate transaction networks and behaviors to identify potential laundering activities. For instance, AML could look at the relationships between transactions and entities over a longer period.

Creation of vector embeddings for fraud and AML

Fraud and AML models require different approaches because they target distinct types of criminal activities. To accurately identify these activities, machine learning models use vector embeddings tailored to the features of each type of detection.
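In practice, the embedding step means turning each transaction's aggregated text into a vector and storing it next to the operational fields in the same document. The sketch below shows that shape; the field names and model choice are assumptions, and the OpenAI call (from the OpenAI Python SDK) is stubbed out so the example runs without an API key.

```python
# Encode an aggregated fraud text and store the vector alongside the
# source transaction, as in the Figure 1 workflow.
# Real version, using the OpenAI Python SDK:
#   from openai import OpenAI
#   client = OpenAI()
#   vector = client.embeddings.create(
#       model="text-embedding-3-small", input=fraud_text
#   ).data[0].embedding

def embed(text: str) -> list[float]:
    return [0.0] * 8  # stub standing in for the API call above

txn = {"amount": 9800.0, "counterparty": "ACME Ltd", "channel": "wire"}
fraud_text = f"wire transfer of {txn['amount']} to {txn['counterparty']}"

document = {
    **txn,
    "fraud_text": fraud_text,
    "embedding": embed(fraud_text),  # stored next to the operational fields
}

# db.transactions.insert_one(document)  # with a live Atlas collection
```

Keeping the vector in the same document as its source data is what lets Atlas Vector Search index it without a separate synchronization pipeline.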
In the solution highlighted in Figure 1, vector embeddings for fraud detection are created using a combination of text, transaction, and counterparty data. Conversely, the embeddings for AML are generated from data on transactions, relationships between counterparties, and their risk profiles. The selection of data sources, including the use of unstructured data and the creation of one or more vector embeddings, can be customized to meet specific needs. This particular solution utilizes OpenAI for generating vector embeddings, though other software options can also be employed.

Historical vector embeddings are representations of past transaction data and customer profiles encoded into a vector format. The demo database is prepopulated with synthetically generated test data for both fraud and AML embeddings. In real-world scenarios, you can create embeddings by encoding historical transaction data and customer profiles as vectors.

In the fraud and AML detection workflow shown in Figure 1, incoming transaction fraud and AML aggregated text is used to generate embeddings with OpenAI. These embeddings are then analyzed using Atlas Vector Search, based on the percentage of previous transactions with similar characteristics that were flagged for suspicious activity. In Figure 1, the term "classified transaction" indicates a transaction that has been processed and categorized by the detection system. This classification helps determine whether the transaction is considered normal, potentially fraudulent, or indicative of money laundering, thus guiding further actions.

If flagged for fraud: The transaction request is declined.

If not flagged: The transaction is completed successfully, and a confirmation message is shown.

For rejected transactions, users can contact case management services with the transaction reference number for details. No action is needed for successful transactions.
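The similarity lookup itself is a $vectorSearch aggregation stage, followed by a simple ratio over the flagged neighbors. The sketch below builds such a pipeline and applies an illustrative decision rule; the index name, field names, canned results, and thresholds are all assumptions, not values from the solution.

```python
# Hypothetical query embedding for the incoming transaction (a real one
# would come from the OpenAI embeddings API).
query_embedding = [0.12, -0.04, 0.33]

similarity_pipeline = [
    {
        "$vectorSearch": {
            "index": "fraud_vector_index",  # assumed Atlas Search index name
            "path": "embedding",
            "queryVector": query_embedding,
            "numCandidates": 100,           # ANN candidates to examine
            "limit": 5,                     # top-k similar past transactions
        }
    },
    {"$project": {"_id": 0, "flaggedAsFraud": 1,
                  "score": {"$meta": "vectorSearchScore"}}},
]

# With a live cluster:
#   matches = list(db.transactions.aggregate(similarity_pipeline))
# Canned results here, to show the classification step:
matches = [{"flaggedAsFraud": True}, {"flaggedAsFraud": True},
           {"flaggedAsFraud": False}, {"flaggedAsFraud": False},
           {"flaggedAsFraud": False}]

flagged_ratio = sum(m["flaggedAsFraud"] for m in matches) / len(matches)

# Illustrative thresholds only.
if flagged_ratio >= 0.8:
    decision = "declined"        # flagged for fraud: request declined
elif flagged_ratio >= 0.5:
    decision = "manual_review"   # borderline: route to case management
else:
    decision = "completed"       # looks normal: confirm the transaction
```

With two of five similar historical transactions flagged, the ratio is 0.4 and the transaction completes; a higher ratio would decline it or route it to case management.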
Combining Atlas Vector Search for fraud detection

With the use of Atlas Vector Search and OpenAI embeddings, organizations can:

Eliminate the need for the batch and manual feature engineering required by predictive (Risk 2.0) methods.

Dynamically incorporate new data sources to perform more accurate semantic searches, addressing emerging fraud trends.

Adopt this method for mobile solutions, where traditional methods are often costly and performance-intensive.

Why MongoDB for AML and fraud prevention

Fraud and AML detection require a holistic platform approach, as they involve diverse data sets that are constantly evolving. Customers choose MongoDB because it is a unified data platform (as shown in Figure 2 below) that eliminates the need for niche technologies, such as a dedicated vector database. What's more, MongoDB's document data model incorporates any kind of data, of any structure (structured, semi-structured, or unstructured), any format, and any source, no matter how often it changes, allowing you to create a holistic picture of customers to better predict transaction anomalies in real time.

By incorporating Atlas Vector Search, institutions can:

Build intelligent applications powered by semantic search and generative AI over any type of data.

Store vector embeddings right next to their source data and metadata. Vectors inserted or updated in the database are automatically synchronized to the vector index.

Optimize resource consumption, improve performance, and enhance availability with Search Nodes.

Remove operational heavy lifting with the battle-tested, fully managed MongoDB Atlas developer data platform.

Figure 2: Unified risk management and fraud detection data platform

Given the broad and evolving nature of fraud detection and AML, these areas typically require multiple methods and a multimodal approach. Therefore, a unified risk data platform offers several advantages for organizations aiming to build effective solutions.
Using MongoDB, you can develop solutions for Risk 1.0, Risk 2.0, and Risk 3.0, either separately or in combination, tailored to meet your specific business needs. The concepts are demonstrated with two examples: a card fraud solution accelerator for Risk 1.0 and Risk 2.0, and a new vector search solution for Risk 3.0, as discussed in this blog. It's important to note that the vector search-based Risk 3.0 solution can be implemented on top of Risk 1.0 and Risk 2.0 to enhance detection accuracy and reduce false positives.

If you would like to discover more about how MongoDB can help you supercharge your fraud detection systems, take a look at the following resources:

Revolutionizing Fraud Detection with Atlas Vector Search
Card Fraud solution accelerator (Risk 1.0 and Risk 2.0)
Risk 3.0 AML and Fraud detection solution GitHub repository

Add vector search to your arsenal for more accurate and cost-efficient RAG applications by enrolling in the DeepLearning.AI course "Prompt Compression and Query Optimization" for free today.

July 17, 2024

RegData & MongoDB: Streamline Data Control and Compliance

While navigating the requirements of keeping data secure in highly regulated markets, organizations can find themselves entangled in a web of costly and complex IT systems. Whether it's the GDPR safeguarding European personal data or the Monetary Authority of Singapore's guidelines on outsourcing and cloud computing, the greater the number of regulations organizations are subject to, particularly across multiple geographical locations, the more intricate their IT infrastructure becomes. Organizations today face the challenge of adapting immediately or facing the consequences.

In addition to regulations, customer expectations have become a major driver for innovation and modernization. In the financial sector, for example, customers demand a fast and convenient user experience: real-time access to transaction information, a fully digitized, mobile-first banking experience, and personalization and accessibility for their specific needs. While these expectations have become the norm, they conflict with the complex infrastructures of modern financial institutions. Many financial institutions are saddled with legacy infrastructure that holds them back from adapting quickly to changing market conditions. Established financial institutions must find a way to modernize, or they risk losing market share to nimble challenger banks with cost-effective solutions.

The banking market today is increasingly populated with nimble fintech companies powered by smaller and more straightforward IT systems, which makes it easier for them to pivot quickly. In contrast, established institutions often operate across borders, meaning they must adhere to a greater number of regulations. Modernizing these complex systems requires introducing new, disruptive technology without violating any regulatory constraints, akin to driving a car while changing a tire.
The primary focus for established banks is safeguarding existing systems to ensure compliance with regulatory constraints while prioritizing customer satisfaction and maintaining smooth operations as usual.

RegData: Compliance without risk

A multi-cloud application security platform, RegData embraces this challenge head-on. RegData has expertise across a number of highly regulated markets, from healthcare to public services, human resources, banking, and finance. The company's mission is clear: delivering a robust, auditable, and confidential data protection platform within its comprehensive RegData Protection Suite (RPS), built on MongoDB. RegData provides its customers with more than 120 protection techniques, including 60 anonymization techniques as well as custom techniques (protection of IBANs, SSNs, emails, etc.), giving them total control over how sensitive data is managed within each organization. For example, by working with RegData, financial institutions can configure their infrastructure to specific regulations by masking, encrypting, tokenizing, anonymizing, or pseudonymizing data into compliance. With RPS, company-wide reports can be automatically generated for the regulating authorities (e.g., ACPR, ECB, EU-GDPR, FINMA).

To illustrate the impact of RPS, and to debunk some common misconceptions, let's explore before and after scenarios. Figure 1 shows the decentralized management of access control. Some data sources employ features such as Field Level Encryption (FLE) to shield data, restricting access to individuals with the appropriate key. Additionally, certain applications implement Role-Based Access Control (RBAC) to regulate data access within the application. Some even come with an Active Directory (AD) interface to try to centralize the configuration.
Figure 1: Simplified architecture with no centralized access control

However, each of these addresses only part of the challenge of encrypting the actual data and managing single-system access. Neither FLE nor RBAC can protect data that isn't in their own data source or application. Even centralizing efforts like the AD interface exclude older legacy systems that might lack interfacing functionality. The result in all of these cases is a mosaic of different configurations in which silos stay silos, and modernization is risky and slow because the data may or may not be protected.

RegData, with its RPS solution, can integrate with a plethora of different data sources and provide control regardless of how data is accessed, be it via the web, APIs, files, emails, or other channels. This allows organizations to configure RPS at a company level. All applications, including silos, can and should interface with RPS to protect all of the data under a single global configuration.

Another important aspect of RPS is its tokenization functionality, which allows organizations to decide which columns or fields from a given data source should be encrypted according to specific standards, and to govern access to the corresponding tokens. Thanks to tokenization, RPS can track who accesses what data and when, at a company level, regardless of the data source or the application. This is easy enough to articulate but quite difficult to execute at a data level. To efficiently manage diverse data sources and fine-grained authorization, and to implement different protection techniques, RegData builds RPS on top of MongoDB's flexible, document-oriented database.

The road to modernization

As noted, to fully leverage RegData's RPS, all data sources should go through RPS. RPS works like a data filter: information goes in, and protected data comes out the other side, ready for modernization and innovation.
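The tokenization idea described above (not RegData's actual implementation, which is proprietary) can be sketched generically: a sensitive value is replaced by an opaque token, and only a protected token vault can map it back, which is the point at which access can be authorized and audited.

```python
# Generic tokenization sketch: the vault would live in a protected,
# access-controlled store; a plain dict stands in for it here.
import secrets

token_vault: dict[str, str] = {}  # token -> original sensitive value

def tokenize(value: str) -> str:
    token = "tok_" + secrets.token_hex(8)  # opaque, non-reversible handle
    token_vault[token] = value
    return token

def detokenize(token: str) -> str:
    # In a real system, authorization checks and audit logging happen here.
    return token_vault[token]

iban_token = tokenize("CH93 0076 2011 6238 5295 7")  # illustrative IBAN
```

Downstream applications and silos only ever see `iban_token`, so the sensitive value never leaves the vault's control, while every detokenization is a loggable event.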
Just integrating RegData means being able to make previously siloed data available by masking, encrypting, or anonymizing it before sending it out to other applications and systems. Together, RegData and MongoDB form a robust and proven solution for protecting data and modernizing operations within highly regulated industries.

The illustration below shows the architecture of a private bank utilizing RPS. Data can be seen in plain text by database admins only when the request comes from the company's headquarters. This ensures compliance with regulations while still allowing data to be queried and searched outside the headquarters. This bank goes a step further by migrating its Customer Relationship Management (CRM), core banking, Portfolio Management System (PMS), customer reporting, advisory, tax reporting, and other digital apps into the public cloud. This is achieved while remaining compliant and able to automatically generate submittable audit reports for regulating authorities.

Figure 2: Private bank business case

Another possible modernization scheme, given RegData's functionality, is a hybrid cloud Operational Data Layer (ODL) using MongoDB Atlas. This architectural pattern acts as a bridge between consuming applications and legacy solutions. It centrally integrates and organizes siloed enterprise data, rendering it easily available. Its purpose is to offload legacy systems by providing alternative access to information for consuming applications, thereby breaking down data silos, decreasing latency, and enabling scalability, flexibility, and availability, ultimately optimizing operational efficiency and facilitating modernization. RegData integrates, protects, and makes data available, while MongoDB Atlas provides its inherent scalability, flexibility, and availability to empower developers to offload legacy systems.
Figure 3: Example of ODL with both RegData and MongoDB

In a world where finding the right solutions can be difficult, RegData provides a strategic path for financial institutions to modernize securely. By combining RegData's regulatory protection with modern cloud platforms such as MongoDB Atlas, the collaboration takes on the modernization challenge of highly regulated sectors.

Are you prepared to harness these capabilities for your projects? Do you have any questions? Then please reach out to us at industry.solutions@mongodb.com or info@regdata.ch. You can also take a look at the following resources:

Hybrid Cloud: Flexible Architecture for the Future of Financial Services
Implementing an Operational Data Layer

February 29, 2024