Turning MongoDB into a Predictive Database

Benjamin Flast, Zoran Pandovski, and Natasha Seelam


Note: This blog post, originally published November 10, 2021, has been updated with new installation instructions, new instructions for connecting your MongoDB instance to MindsDB’s machine learning platform, and new examples and use cases.

There’s a growing interest in artificial intelligence (AI) and machine learning (ML) in the business world. The predictive capabilities of ML/AI surface insights from patterns in data far faster than manual analysis can. Additionally, recent advances in generative machine learning, such as the tools offered by OpenAI and Hugging Face, give businesses powerful ways to generate and analyze text data. Businesses realize that these capabilities can lead to increased profits, reduced costs, and accelerated innovation. Although businesses both large and small can benefit from the power of AI, implementing a machine learning project can be both complex and time-consuming.

MongoDB, Inc. (NASDAQ: MDB), the leading, modern general purpose database platform, and MindsDB, the open-source machine learning platform that brings automated machine learning to the database, established a technology partnership to advance machine learning innovation. This collaboration aims to make it easy for developers to incorporate powerful ML-driven features into their applications to solve real-world business challenges.

What is the best approach?

Once you have identified the initial ML projects you’d like to focus on, such as forecasting or text analysis, choosing the right tools and methodologies can shorten the time it takes to build, train, optimize, and deploy models. Model selection and feature engineering can be time-consuming and difficult if you aren’t aware of the specific dimensions the ML model is going to train on. Additionally, pipelines used for data extraction and transformation need to be maintained over time, and a machine learning model also needs to be deployed on the right compute framework.

Existing state-of-the-art AutoML frameworks provide methods to optimize performance, including adjusting hyperparameters (such as the learning rate or batch size). The MindsDB AutoML framework extends beyond most conventional automated hyperparameter tuning systems and enables novel upstream automation of data cleaning, data pre-processing, and feature engineering. To empower users with transparent development, the framework includes explainability tools, supports complex data types and tasks (NLP, time series, language modeling, and anomaly detection), and lets users customize their workflow by importing models of their choice.

MindsDB also generates predictions at the data layer, without consuming database resources, which is a significant additional advantage. Generating predictions directly in MongoDB Atlas with MindsDB AI Collections lets you consume predictions as regular data, query them, and accelerate development by simplifying deployment workflows.

Getting started with MindsDB

We suggest starting with either MindsDB on AWS or the demo cloud version of MindsDB at http://cloud.mindsdb.com. For anything beyond small-scale testing (2 models, a few thousand documents), we strongly suggest using MindsDB Pro, which is easy to set up and offers simple, usage-based pay-as-you-go pricing. Check out the product page on AWS Marketplace for instructions on setting up MindsDB in your existing AWS account.

For all documentation and FAQs, please visit https://docs.mindsdb.com/.

Setting up the connection to MindsDB in MongoDB

Currently, the integration works by accessing MongoDB through MindsDB’s MongoDB API as a new data source. More information about connecting to MongoDB can be found in the MindsDB documentation. MindsDB hosts a demo MongoDB database with sample data sets.

Use the MongoDB Shell or MongoDB Compass to connect to MindsDB’s MongoDB API. Please note that you must have MongoDB Shell version 3.6 or later to use the MindsDB MongoDB API.

MongoDB Compass connection

To connect to the MindsDB demo database, use the following connection string in MongoDB Compass:

mongodb+srv://admin:201287aA@cluster0.myfdu.mongodb.net/admin?authSource=admin&replicaSet=atlas-5koz1i-shard-0&readPreference=primary&appname=MongoDB%20Compass&ssl=true

If you would prefer to follow along with this tutorial using your own database, feel free to use your own connection string and upload an example dataset, such as house_sales.csv, on which you can run a number of test cases.
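If you assemble your own connection string, special characters in the username or password need to be percent-encoded. The sketch below shows one way to do that in JavaScript. The helper name buildConnectionString, the host, and the credentials are hypothetical placeholders, not values from MindsDB or MongoDB.

```javascript
// Hypothetical helper: assemble a mongodb+srv connection string from its parts.
// All argument values used below are placeholders.
function buildConnectionString({ user, password, host, database, options = {} }) {
  // Percent-encode credentials so characters such as @ : / ? do not break the URI.
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;

  // Turn the options object into key=value pairs joined by "&".
  const query = Object.entries(options)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join('&');

  const base = `mongodb+srv://${credentials}@${host}/${encodeURIComponent(database)}`;
  return query ? `${base}?${query}` : base;
}

const exampleUri = buildConnectionString({
  user: 'myUser',
  password: 'p@ss/word',
  host: 'cluster0.example.mongodb.net',
  database: 'admin',
  options: { authSource: 'admin', readPreference: 'primary' },
});

console.log(exampleUri);
```

The resulting string can be pasted into MongoDB Compass or used as the host value when registering the data source.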

If you use your own MongoDB instance, you will need to follow two additional steps:

Step 1: Once you have created a MindsDB account, connect your MongoDB instance to MindsDB (cloud or AWS) using your own connection string in the MindsDB editor. (Here is the link for the MindsDB Cloud Editor: https://cloud.mindsdb.com/editor)

Run the query below in the MindsDB editor:

db.databases.insertOne({
    name: "mongo_int",
    engine: "mongodb",
    connection_args: {
            "port": 27017,
            "host": "mongodb+srv://admin:@localhost",
            "database": "test_data"
    }
});

On execution, we get:

{
	"acknowledged" : true,
	"insertedId" : ObjectId("62dff63c6cc2fa93e1d7f12c")
}

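Where:

  • name is the name under which the MongoDB data source is registered in MindsDB (mongo_int in this example).

  • engine is the database engine to connect to, here mongodb.

  • connection_args holds the connection details: the host (your connection string), the port, and the database to use.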
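The data source document has three top-level fields: name, engine, and connection_args. As a minimal sketch, it could be assembled and sanity-checked in JavaScript before being passed to db.databases.insertOne. The helper buildDataSource is hypothetical, the host value is a placeholder, and treating host, port, and database as required is an assumption based on the example above rather than a documented MindsDB rule.

```javascript
// Hypothetical helper: build the document that registers a MongoDB data source.
// Assumption: host, port, and database are all required, as in the example above.
function buildDataSource(name, connectionArgs) {
  const required = ['host', 'port', 'database'];
  const missing = required.filter((key) => connectionArgs[key] === undefined);
  if (missing.length > 0) {
    throw new Error(`Missing connection_args: ${missing.join(', ')}`);
  }
  return {
    name,
    engine: 'mongodb',
    connection_args: { ...connectionArgs },
  };
}

// Placeholder values; substitute your own connection string and database name.
const dataSource = buildDataSource('mongo_int', {
  host: 'mongodb+srv://myUser:myPassword@cluster0.example.mongodb.net',
  port: 27017,
  database: 'test_data',
});

console.log(JSON.stringify(dataSource, null, 2));

// In the MindsDB editor or a shell connected to MindsDB, this would then be:
// db.databases.insertOne(dataSource);
```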

Step 2: Connect MongoDB Compass or the MongoDB Shell to your MongoDB instance, create a new collection, and add the .csv file, as below:

  • Create collection > Add data > Select data types

  • Data types: [Date, Number, String, Number]
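Selecting data types on import means each CSV value is stored as a typed field rather than a plain string. The sketch below illustrates that idea in JavaScript for the type list [Date, Number, String, Number]. The helper castRow is hypothetical, and the column names and values are placeholders, not the actual contents of house_sales.csv.

```javascript
// Casting functions keyed by the type names shown in the import dialog.
const casters = {
  Date: (value) => new Date(value),
  Number: (value) => Number(value),
  String: (value) => String(value),
};

// Hypothetical helper: turn one CSV row (array of strings) into a typed document.
function castRow(headers, values, types) {
  if (headers.length !== values.length || headers.length !== types.length) {
    throw new Error('headers, values, and types must have the same length');
  }
  const doc = {};
  headers.forEach((header, i) => {
    const cast = casters[types[i]];
    if (!cast) {
      throw new Error(`Unsupported type: ${types[i]}`);
    }
    doc[header] = cast(values[i]);
  });
  return doc;
}

// Placeholder column names and values, for illustration only.
const exampleDoc = castRow(
  ['date_col', 'amount_col', 'category_col', 'count_col'],
  ['2007-09-30', '441854', 'house', '2'],
  ['Date', 'Number', 'String', 'Number']
);

console.log(exampleDoc);
```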

Now we have successfully integrated MindsDB with the MongoDB database. The next step is to use a MongoDB client to connect to MindsDB’s MongoDB API and train models. MindsDB has a number of prepared demo use cases and data sets, including predicting home rental prices, forecasting quarterly house sales, and predicting customer sentiment through language analysis of product review text using our Hugging Face integration. Many examples for MongoDB, with code, can be found in the MindsDB documentation at https://docs.mindsdb.com/.

As a showcase example, we will demonstrate a feature that recently became available through MindsDB’s integration with OpenAI’s GPT-3 language model. MindsDB can generate JSON documents from unstructured text in the database. For example, as shown below, MindsDB can create JSON documents with relevant information on properties for rent (days on market, number of bathrooms, price, rating) based on natural language descriptions from real-estate listings.

Please follow the guide above, or check out our docs on how to connect MongoDB Compass and MongoDB Shell to MindsDB.

To create this model in MQL, run the command below from MongoDB Compass or the MongoDB Shell:

db.models.insertOne({
    name: 'nlp_model',
    predict: 'json',
    training_options: {
        engine: 'openai',
        input_text: 'sentence',
        json_struct: {
            'rental_price': 'rental price',
            'location': 'location',
            'nob': 'number of bathrooms'
        }
    }
})

We pass three parameters in training_options.

  1. The engine parameter ensures we use the OpenAI engine.

  2. The json_struct parameter stores a predefined JSON structure used for the output.

  3. The input_text parameter contains the name of the field that stores input text.
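To make the role of each parameter concrete, here is a minimal sketch of a helper that assembles the model document from the three parameters. The function buildJsonModel and its validation rules are hypothetical; only the document shape follows the insertOne command above.

```javascript
// Hypothetical helper: build the model document used with db.models.insertOne.
function buildJsonModel(name, inputField, jsonStruct) {
  if (!name || !inputField) {
    throw new Error('name and inputField are required');
  }
  if (Object.keys(jsonStruct).length === 0) {
    throw new Error('jsonStruct must describe at least one output field');
  }
  return {
    name,
    predict: 'json',
    training_options: {
      engine: 'openai',        // use the OpenAI engine
      input_text: inputField,  // field that stores the input text
      json_struct: jsonStruct, // output key -> description of what to extract
    },
  };
}

const modelDoc = buildJsonModel('nlp_model', 'sentence', {
  rental_price: 'rental price',
  location: 'location',
  nob: 'number of bathrooms',
});

console.log(JSON.stringify(modelDoc, null, 2));

// From MongoDB Compass or the MongoDB Shell connected to MindsDB:
// db.models.insertOne(modelDoc);
```

Keeping the output structure in one object makes it easy to add or rename fields before creating the model.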

Now we can query the model, passing the input text stored in the sentence field.

db.nlp_model.find({
    'sentence': 'Amazing 3 bedroom apartment located at the heart of Manhattan, has one full bathrooms and one toilet room for just 3000 a month.'
    })

On execution, we get:

{
  json: {
    rental_price: '3000',
    location: 'Manhattan',
    nob: '1'
  },
  sentence: 'Amazing 3 bedroom apartment located at the heart of Manhattan, has one full bathrooms and one toilet room for just 3000 a month.'
}
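Note that the extracted values in this output are strings. If your application needs numbers, a small post-processing step can convert them. The sketch below is one possible approach; the helpers toNumberOrNull and toTypedListing are hypothetical and not part of MindsDB.

```javascript
// Convert a string to a number, or null when no leading number can be parsed.
function toNumberOrNull(value) {
  const parsed = parseFloat(value);
  return Number.isFinite(parsed) ? parsed : null;
}

// Hypothetical helper: reshape a model result into a document with numeric fields.
function toTypedListing(result) {
  const { rental_price, location, nob } = result.json;
  return {
    location,
    rental_price: toNumberOrNull(rental_price),
    bathrooms: toNumberOrNull(nob),
  };
}

// The result document shown above, reproduced as a plain object.
const sampleResult = {
  json: { rental_price: '3000', location: 'Manhattan', nob: '1' },
  sentence:
    'Amazing 3 bedroom apartment located at the heart of Manhattan, has one full bathrooms and one toilet room for just 3000 a month.',
};

const typedListing = toTypedListing(sampleResult);
console.log(typedListing);
```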

This tutorial highlights the steps to create an NLP model that generates JSON output from unstructured text inside MongoDB by leveraging MindsDB’s MongoDB connector and automation capabilities. Using the existing compute configuration, the example above took less than five minutes, with no need for extensive tooling or pipelines beyond your database. With MindsDB’s machine learning capabilities inside MongoDB, developers can now build machine learning models at reduced cost, gain greater insight into model accuracy, and help users make better data-based decisions.

Modernize with MongoDB and MindsDB

MongoDB provides an intuitive process for data management and exploration by simplifying and enriching data. MindsDB helps turn data into intelligent insights by simplifying the adoption of machine learning, AI, and the broader spectrum of data science.

Try MindsDB to connect to MongoDB, train models, and run predictions in the cloud! Simply install MindsDB from AWS Marketplace. Our team is available on Slack and GitHub for feedback and support. Check it out, and feel free to ask questions and share use case examples!