Assessing Business Loan Risks with MongoDB and Generative AI

Learn how generative AI can generate detailed risk assessments and how MongoDB’s multimodal features enable comprehensive and multidimensional loan risk analysis.
Solution Overview

Business loans are a cornerstone of banking operations, providing significant benefits to both financial institutions and broader economies. For example, in 2023 the value of commercial and industrial loans in the United States reached nearly $2.8 trillion. However, these loans can present unique challenges and risks that banks must navigate. Besides credit risk, where the borrower may default, banks also face business risk, in which economic downturns or sector-specific declines can impact borrowers' ability to repay loans. This solution dives into the potential of generative AI (gen AI) to facilitate detailed risk assessments for business loans, and how MongoDB’s multimodal features can be leveraged for comprehensive and multidimensional risk analyses.

The code that demonstrates the MongoDB features used to build this solution is available in the solution's GitHub repository (see Related Resources).

The Critical Business Plan

A business plan is essential for securing a business loan as it serves as a comprehensive roadmap detailing the borrower's plans, strategies, and financial projections. It helps lenders evaluate the business's goals, viability, and profitability, demonstrating how the loan will be used for growth and repayment. A detailed business plan includes market analysis, competitive positioning, operational plans, and financial forecasts that build a compelling case for the lender's investment and the business’s ability to manage risks effectively, increasing the likelihood of securing the loan.

Reading through borrower credit information and detailed business plans (roughly 15-20 pages long) poses significant challenges for loan officers due to time constraints, the material’s complexity, and the difficulty of extracting key metrics from detailed financial projections, market analyses, and risk factors. Navigating technical details and industry-specific jargon can also be challenging and may require specialized knowledge. Identifying critical risk factors and mitigation strategies adds further complexity, as does ensuring accuracy and consistency among loan officers and approval committees.

To overcome these challenges, gen AI can assist loan officers by efficiently analyzing business plans, extracting essential information, identifying key risks, and providing consistent interpretations, thereby facilitating informed decision-making.

Assessing Loans with Generative AI
Interactive risk analysis with gen AI-powered chatbots

Figure 1 below shows an example of how ChatGPT-4o responds when asked to assess the risk of a business loan. Although the input of the loan purpose and business description is simplistic, gen AI can offer a detailed analysis.

Figure 1: Example of how ChatGPT-4o could respond when asked to assess the risk of a business loan
Hallucinations or ignorance?

By applying gen AI to risk assessments, lenders can explore additional risk factors that gen AI can evaluate. One factor could be the risk of natural disasters or broader climate risks. In Figure 2, we added flood risk specifically as a factor to the previous question to see how ChatGPT-4o responds.

Figure 2: Example of how ChatGPT-4o responded to flood risk as a factor
According to the response, there is a low risk of flooding. To validate this, we asked ChatGPT-4o the question differently, focusing on its knowledge of flood data (see Figure 3). It suggested reviewing FEMA flood maps and local flood history, indicating that it might not have the latest information.
Figure 3: Asking location-specific flood questions

In the query shown, ChatGPT gave the opposite answer, indicating that there is “significant flooding” and providing references to flood evidence after performing an internet search across four sites, a search it had not performed previously.

From this example, we can see that when ChatGPT does not have the relevant data, it starts to make false claims that can be considered hallucinations. Initially, it indicated a low flood risk due to a lack of information. However, when specifically asked about flood risk in the second query, it suggested reviewing external sources like FEMA flood maps, recognizing its limitations and need for external validation.

Gen AI-powered chatbots can recognize and intelligently seek additional data sources to fill their knowledge gaps. However, a casual web search won’t provide the level of detail required.

Retrieval-augmented generation (RAG) risk analysis

The example above shows how gen AI can augment the expertise of loan officers analyzing business loans. However, interacting with a gen AI chatbot relies on loan officers repeatedly prompting and augmenting the context with relevant information. This can be time-consuming, and impractical when loan officers lack prompt engineering skills or the data needed.

Below is a simplified solution showing how gen AI can augment the risk analysis process and fill the LLM's knowledge gap. This demo uses MongoDB as an operational data store and leverages geospatial queries to discover floods within five kilometers of the proposed business location. The prompt for this risk analysis emphasizes the flood risk assessment rather than the financial projections.

A similar test was performed on Llama 3, hosted by our MAAP partner Fireworks.AI. It tested the model’s knowledge of flood data and showed a knowledge gap similar to that of ChatGPT-4o. Interestingly, rather than providing misleading answers, Llama 3 provided a “hallucinated list of flood data,” but highlighted that “this data is fictional and for demonstration purposes only. In reality, you would need to access reliable sources such as FEMA's flood data or other government agencies' reports to obtain accurate information.”

Figure 4: LLM’s response with fictional flood locations

This consistent demonstration of the knowledge gap in LLMs for specialized areas reinforces the need to explore how retrieval-augmented generation (RAG) with a multimodal data platform can help.

In this simplified demo, you select a business location and a business purpose, and enter a description of a business plan. To make input easier, an “Example” button uses gen AI to generate a brief sample business description, so you do not need to type the description from scratch.

Figure 5: Choosing a location on the map and writing a brief plan description
Upon submission, the demo uses RAG with appropriate prompt engineering to produce a simplified analysis of the business that takes into account the location and the flood data previously downloaded from external flood data sources.
Figure 6: Loan risk response using RAG
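
The retrieval-then-prompt step behind this kind of response could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the function name, the record fields (year, distance_m, source), and the prompt wording are hypothetical and are not taken from the demo's repository.

```python
def build_risk_prompt(purpose, description, floods, radius_km=5):
    """Combine loan details with retrieved flood records into one prompt.

    `floods` is a list of dicts with the hypothetical keys
    "year", "distance_m", and "source".
    """
    if floods:
        # One bullet per retrieved flood record, so the model can cite it.
        flood_lines = "\n".join(
            f"- year {f['year']}, {f['distance_m'] / 1000:.1f} km away, "
            f"source: {f['source']}"
            for f in floods
        )
    else:
        flood_lines = "- no recorded floods found in the dataset"
    return (
        "You are assisting a loan officer with a business loan risk assessment.\n"
        f"Loan purpose: {purpose}\n"
        f"Business plan summary: {description}\n"
        f"Recorded floods within {radius_km} km of the business location:\n"
        f"{flood_lines}\n"
        "Base the flood risk assessment only on the records listed above "
        "and name the data source for each flood you mention."
    )
```

Grounding the prompt in retrieved records, and asking the model to cite them, is one way to keep the flood assessment tied to stored data rather than to the model's own recollection.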

In the Flood Risk Assessment section, gen AI-powered geospatial analytics enable loan officers to quickly discover historical flood occurrences and identify the data sources.

You can also reveal all the sample flood locations in the vicinity of the selected business location by clicking on the “Pin” icon. The pins mark the flood locations, and the blue circle indicates the five-kilometer radius within which flood data is queried.

Figure 7: Flood locations displayed with pins

Fetching the flood locations around a given coordinate is straightforward once the flood data, including geolocations, is loaded into MongoDB. In this example, the $geoNear aggregation stage is used, which fetches all the locations that are “near” a given point specified by longitude and latitude (e.g., the business location) and within a certain distance (e.g., five km). Because the geospatial query runs in MongoDB’s aggregation pipeline, it can be combined with other processing, such as selecting which data fields to return with $project and filtering on certain conditions with $match (e.g., data where the year is greater than 2016). This data is pulled from the United States Flood Database, which contains multiple sources, with 2020 as the latest dataset.
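
A pipeline along these lines can be sketched in Python as follows. This is a minimal illustration rather than the demo's actual code: the function name and the field names (location, year, source, distance_m) are assumptions, and the collection is assumed to have a 2dsphere index on the location field.

```python
from typing import Any


def build_flood_pipeline(
    longitude: float,
    latitude: float,
    max_distance_km: float = 5.0,
    min_year: int = 2016,
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline that finds floods near a point.

    Field names here are hypothetical; adapt them to the real dataset schema.
    """
    return [
        # $geoNear must be the first stage and requires a geospatial index.
        {
            "$geoNear": {
                # GeoJSON points list longitude first, then latitude.
                "near": {"type": "Point", "coordinates": [longitude, latitude]},
                "distanceField": "distance_m",
                # For GeoJSON points, maxDistance is expressed in meters.
                "maxDistance": max_distance_km * 1000,
                "spherical": True,
            }
        },
        # Keep only floods recorded after the given year.
        {"$match": {"year": {"$gt": min_year}}},
        # Return just the fields needed for the risk assessment.
        {
            "$project": {
                "_id": 0,
                "location": 1,
                "year": 1,
                "source": 1,
                "distance_m": 1,
            }
        },
    ]
```

With PyMongo, the pipeline could then be run as `collection.aggregate(build_flood_pipeline(-90.07, 29.95))`, where `collection` is a hypothetical handle to the flood collection and the coordinates are arbitrary sample values.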

Reference Architecture

The following diagram provides a logical architecture overview of the RAG data process implemented in this solution, highlighting the technologies used, including MongoDB, Meta Llama 3, and Fireworks.AI.

Figure 8: RAG data flow architecture diagram

With MongoDB's multimodal capabilities, developers can enhance the RAG process by utilizing features such as network graphs, time series, and vector search. This enriches the context for the gen AI agent, enabling it to provide more comprehensive and multidimensional risk analysis through multimodal analytics. It can provide more accurate and context-aware insights (e.g., using geospatial data to identify flood risk locations) to reduce hallucinations and offer deeper insights that augment a complex business loan risk assessment process.

Because the RAG process is iterative, new data and feedback added to the data platform continually improve the context supplied to the gen AI model, which can lead to increasingly accurate risk assessments and fewer hallucinations. A multimodal data platform allows you to fully maximize the capabilities of multimodal AI models.

Authors
  • Wei You Pan, Global Director, Financial Industry Solutions, MongoDB
  • Paul Claret, Senior Specialist, Industry Solutions, MongoDB
Related Resources

GitHub repository: Loan risks assessor

Create this demo by following the instructions and associated models in this solution’s repository.

MongoDB for lending and leasing

Learn how MongoDB’s developer data platform supports a wide range of use cases in the lending and leasing space.

Green lending

Learn how MongoDB can support you when offering greener loans.

Innovate with AI

Discover how leading industries are transforming with AI and MongoDB Atlas.
