Integrating MongoDB with TensorFlow and C#
Whether you are new to C# or a seasoned developer, if you are curious about how TensorFlow works with MongoDB as the database, this tutorial is for you.
This process involves fetching data from MongoDB, performing data preprocessing, and building a machine-learning model using ML.NET and TensorFlow. This guide is ideal for developers interested in leveraging the power of machine learning within a .NET environment.
MongoDB is a document-oriented NoSQL database that stores data in JSON-like documents, which helps you work with large sets of distributed data. It scales horizontally and offers a flexible schema, along with the querying and indexing you need.
TensorFlow is an open-source, end-to-end platform for machine learning, developed by the Google Brain team. It offers a comprehensive system for managing all components of a machine learning setup. This tutorial, however, concentrates on using a specific TensorFlow API to develop and train machine learning models. TensorFlow operates by constructing computational graphs — networks of nodes representing mathematical operations, with edges between nodes representing the multidimensional data arrays (tensors) that flow through these operations.
Here are some useful application examples of TensorFlow.
- Image Recognition: TensorFlow is utilized in image recognition applications to detect objects, faces, and scenes in images and videos. This functionality is essential for a variety of applications, including security systems, where it enhances surveillance by recognizing human activities and faces, and healthcare, where it assists in diagnosing diseases by analyzing medical imagery. Learn more about TensorFlow for image recognition.
- Natural language processing (NLP): TensorFlow's capacity to manage large datasets and intricate algorithms makes it well suited to NLP tasks. It supports applications like language translation, sentiment analysis, and chatbots, enabling machines to understand, interpret, and generate human language in a contextually meaningful way. Explore TensorFlow applications in NLP.
- Recommendation systems: Numerous e-commerce and streaming companies utilize TensorFlow to build recommendation systems that analyze users' past behavior to suggest products or media they might find interesting. This personalization improves the user experience and can significantly boost conversion rates for businesses. Learn about building recommendation systems with TensorFlow.
- Autonomous vehicles: TensorFlow is utilized in the automotive industry to develop and improve systems for autonomous vehicles. By processing data from various sensors and cameras, TensorFlow-based models support making decisions about vehicle steering and collision avoidance. Explore how TensorFlow is applied in autonomous driving.
- Healthcare: TensorFlow is utilized in various tasks, like disease diagnosis and drug discovery. It examines patterns from large datasets of medical records to predict disease progression and results, facilitating early diagnosis and personalized treatment plans. Discover TensorFlow applications in healthcare.
These examples illustrate the versatility of TensorFlow across different domains and its role in changing how data is interpreted and used to build intelligent applications. Each linked example offers a deeper look at how TensorFlow is used in real-world applications.
Before we dive into the details, make sure you have the following installed:
- MongoDB
- Visual Studio Code or another code editor
For this project, you need to install the following NuGet packages:
- MongoDB.Driver: This package includes everything you need to interact with MongoDB, including BSON and CRUD operations. Install with:

    dotnet add package MongoDB.Driver

- Microsoft.ML: This package is essential for building and training machine learning models in .NET. Install with:

    dotnet add package Microsoft.ML

- Microsoft.ML.TensorFlow: This package allows integration with TensorFlow models within ML.NET. Install with:

    dotnet add package Microsoft.ML.TensorFlow
Make sure MongoDB is running on your local machine. You can download and install MongoDB from the website.
Finally, set up your development environment by initializing a new C# (Console App) project. Follow Visual Studio Code’s guide if you are coding your C# console app project for the first time.
Before connecting to MongoDB, define the structure of your data by creating classes that represent your models. This approach ensures that when you interact with MongoDB, you're working with strongly-typed objects instead of generic BsonDocument objects. This improves code clarity, maintainability, and type safety.

    using MongoDB.Bson;
    using System.Collections.Generic;
    using Microsoft.ML.Data;

    public class SampleData
    {
        public ObjectId Id { get; set; } // This corresponds to the MongoDB _id field
        public List<double> X { get; set; } = new List<double>();
        public List<double> Y { get; set; } = new List<double>();
    }

    public class DataPoint
    {
        public float X { get; set; }
        public float Y { get; set; }
    }

    public class Prediction
    {
        // ML.NET regression trainers write their output to a column named "Score"
        [ColumnName("Score")]
        public float PredictedY { get; set; }
    }
With the model classes defined in the previous step, you can now establish a connection to MongoDB using these models. Instead of using a generic BsonDocument, specify the type of documents in the collection (SampleData), making your code more intuitive and type-safe.

Place the following code in your Program.cs file:

    // MongoDB connection string
    var client = new MongoClient("mongodb://localhost:27017");
    var database = client.GetDatabase("linear-data");
    var collection = database.GetCollection<SampleData>("sampleData");
Ensure you include the necessary using statements at the top of Program.cs:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.ML;
    using MongoDB.Bson;
    using MongoDB.Driver;
This setup allows you to interact with your MongoDB collection using strongly-typed models, improving code readability and maintainability.
- Sample data: First, define the sample data to be inserted into MongoDB.
- Insert data: Then, insert the defined document into the MongoDB collection.
    // Define the data to insert
    var sampleData = new SampleData
    {
        X = new List<double> { 3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167, 7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1 },
        Y = new List<double> { 1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221, 2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3 }
    };

    // Insert the data into MongoDB
    collection.InsertOne(sampleData);

    // Confirm the insert (this is the first line of the program's output)
    Console.WriteLine("Data seeded successfully!");
The following steps describe how to retrieve and prepare data from your MongoDB database to use as a dataset for training your TensorFlow model.
    // Fetch the data from MongoDB as SampleData
    var fetchedSampleData = collection.Find(new BsonDocument()).FirstOrDefault();

    // FirstOrDefault returns null when the collection is empty
    if (fetchedSampleData == null)
    {
        throw new InvalidOperationException("No document found in the sampleData collection.");
    }

    // Assuming you have only one document, use the fetchedSampleData directly
    var xArray = fetchedSampleData.X.Select(value => (float)value).ToArray();
    var yArray = fetchedSampleData.Y.Select(value => (float)value).ToArray();
Retrieve data: Begin by retrieving the data from MongoDB. This code fetches the first document from the collection and assumes that your MongoDB setup contains only one document that holds the entire dataset you need.
Convert BSON array: Once you've retrieved the document, convert its X and Y values into float arrays. Because the collection is typed as SampleData, the driver has already deserialized the BSON arrays into List<double>, so the Select method only has to cast each element of the X and Y lists from double to float. This conversion is needed because ML.NET trainers expect numeric feature and label columns of type float (Single).

These steps ensure that you're not just retrieving generic data, but specifically targeting the structured data stored in your MongoDB collection. This gives you a clear path from data retrieval to model training in a C# environment.
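One detail worth guarding against is a document whose X and Y lists have different lengths, because pairing them later with Zip silently drops the extra values. The sketch below shows one possible check using only the base class library. ToFloatPairs is a hypothetical helper name, not part of the tutorial or of any library, and the three sample values are just the first points of the dataset.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the tutorial): cast two double lists to
// float arrays and fail loudly if their lengths differ.
static (float[] X, float[] Y) ToFloatPairs(IReadOnlyList<double> xs, IReadOnlyList<double> ys)
{
    if (xs.Count != ys.Count)
    {
        throw new ArgumentException(
            $"X has {xs.Count} values but Y has {ys.Count}; the lists must be the same length.");
    }

    // The explicit cast narrows each 64-bit double to a 32-bit float.
    return (xs.Select(v => (float)v).ToArray(), ys.Select(v => (float)v).ToArray());
}

var (xFloats, yFloats) = ToFloatPairs(
    new List<double> { 3.3, 4.4, 5.5 },
    new List<double> { 1.7, 2.76, 2.09 });

Console.WriteLine($"Converted {xFloats.Length} points; first pair: ({xFloats[0]}, {yFloats[0]})");
```

In the tutorial's flow, the same check could be applied to fetchedSampleData.X and fetchedSampleData.Y before building the DataPoint list.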
    // Create a new ML context
    var mlContext = new MLContext();

    // Create the ML.NET data structures
    var data = xArray.Zip(yArray, (x, y) => new DataPoint { X = x, Y = y }).ToList();
    var dataView = mlContext.Data.LoadFromEnumerable(data);
- ML context: Create a new ML.NET context.
- Data structures: Load the data into ML.NET's data structures
    // Define the trainer
    var pipeline = mlContext.Transforms.Concatenate("Features", new[] { "X" })
        .Append(mlContext.Transforms.NormalizeMinMax("Features"))
        .Append(mlContext.Regression.Trainers.Sdca(labelColumnName: "Y", featureColumnName: "Features"));

    // Train the model
    var model = pipeline.Fit(dataView);
- Pipeline definition: Define a machine learning pipeline using ML.NET.
- Model training: Train the model on the data.
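To give a feel for what a normalization step in the pipeline does to the feature column, here is a minimal sketch of two common scaling schemes written in plain C#. It is an illustration of the idea only, not ML.NET's implementation: NormalizeMinMax has a fixZero option, and which of the two variants below it resembles depends on that setting, so check the ML.NET documentation for the exact behavior. The function names MinMaxScale and MaxAbsScale are hypothetical.

```csharp
using System;
using System.Linq;

// Classic min-max scaling: maps the smallest value to 0 and the largest to 1.
static float[] MinMaxScale(float[] values)
{
    float min = values.Min();
    float max = values.Max();
    float range = max - min;
    // A constant column has no spread; return zeros to avoid dividing by zero.
    return values.Select(v => range == 0f ? 0f : (v - min) / range).ToArray();
}

// Zero-preserving variant: divide by the largest absolute value so 0 stays 0.
static float[] MaxAbsScale(float[] values)
{
    float maxAbs = values.Max(v => Math.Abs(v));
    return values.Select(v => maxAbs == 0f ? 0f : v / maxAbs).ToArray();
}

// Smallest, a middle, and largest X value from the tutorial's dataset.
float[] sampleX = { 2.167f, 5.5f, 10.791f };
float[] minMaxScaled = MinMaxScale(sampleX);
float[] maxAbsScaled = MaxAbsScale(sampleX);

Console.WriteLine("Min-max: " + string.Join(", ", minMaxScaled));
Console.WriteLine("Max-abs: " + string.Join(", ", maxAbsScaled));
```

Either way, the scaled feature ends up in a small, bounded range, which is the reason such a step is placed before a gradient-style trainer.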
    // Use the model to make predictions
    var predictions = model.Transform(dataView);
    var metrics = mlContext.Regression.Evaluate(predictions, labelColumnName: "Y");

    Console.WriteLine($"R^2: {metrics.RSquared}");
    Console.WriteLine($"RMSE: {metrics.RootMeanSquaredError}");
- Transform data: Transform the data using the trained model.
- Evaluate model: Evaluate the model’s performance using R² and RMSE metrics.
    // Display the predictions
    var predictionFunction = mlContext.Model.CreatePredictionEngine<DataPoint, Prediction>(model);
    foreach (var point in data)
    {
        var prediction = predictionFunction.Predict(point);
        Console.WriteLine($"X: {point.X}, Y: {point.Y}, Predicted: {prediction.PredictedY}");
    }
- Prediction engine: Create a prediction engine.
- Make predictions: Use the engine to make predictions and display the results.
- Ensure MongoDB is running: Start MongoDB on your local machine.
- Run the code: Execute the program using dotnet run:

    dotnet run
The expected output of the C# program, which seeds data into MongoDB and then trains a linear regression model with ML.NET, has three parts: a confirmation message that the data was seeded successfully, the evaluation metrics of the linear regression model, and the predicted value for each data point.
Here’s a detailed breakdown of what you should expect:
Data seeding confirmation
The first output message confirms that the data has been seeded into MongoDB successfully.
    Data seeded successfully!
Model evaluation metrics
The output includes evaluation metrics for the linear regression model. These metrics help to understand the performance of the model.
R-squared (R^2)
This value measures the proportion of variance in the dependent variable that is predictable from the independent variable(s). In this case, the R^2 value of -131160.77737920854 is strongly negative, which means the model fits the data worse than simply predicting the mean of Y for every point.

    R^2: -131160.77737920854
Root mean squared error (RMSE)
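This metric measures the typical magnitude of the prediction errors, in the same units as Y. A lower RMSE indicates a better fit of the model. Here, the RMSE value is 256.3340977791911, which is very large given that the Y values in the dataset lie between 1.221 and 3.465.

    RMSE: 256.3340977791911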
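For reference, the usual textbook definitions of the two metrics are given below. They are the standard formulas rather than a description of ML.NET's internals, which this tutorial does not show. Here $y_i$ is an actual value, $\hat{y}_i$ the corresponding prediction, $\bar{y}$ the mean of the actual values, and $n$ the number of data points.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
```

R^2 drops below zero whenever the squared prediction errors in the numerator exceed the spread of the data around its mean in the denominator, which is the situation the reported value describes.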
Predictions
For each data point, the output shows the actual X and Y values, along with the predicted Y value from the linear regression model.
    X: 3.3, Y: 1.7, Predicted: 84.26584
    X: 4.4, Y: 2.76, Predicted: -8.049469
    X: 5.5, Y: 2.09, Predicted: -100.36478
    X: 6.71, Y: 3.19, Predicted: -201.91168
    X: 6.93, Y: 1.694, Predicted: -220.3747
    X: 4.168, Y: 1.573, Predicted: 11.420654
    X: 9.779, Y: 3.366, Predicted: -459.47144
    X: 6.182, Y: 2.596, Predicted: -157.60028
    X: 7.59, Y: 2.53, Predicted: -275.76392
    X: 2.167, Y: 1.221, Predicted: 179.35062
    X: 7.042, Y: 2.827, Predicted: -229.77405
    X: 10.791, Y: 3.465, Predicted: -544.4015
    X: 5.313, Y: 1.65, Predicted: -84.6712
    X: 7.997, Y: 2.904, Predicted: -309.9206
    X: 5.654, Y: 2.42, Predicted: -113.28891
    X: 9.27, Y: 2.94, Predicted: -416.75458
    X: 3.1, Y: 1.3, Predicted: 101.050446
These values show the actual X and Y values from the dataset along with the corresponding predicted Y values. The predictions illustrate how the linear regression model approximates the relationship between X and Y. Consistent with the poor fit indicated by the R^2 value, the predicted values are far from the actual Y values.
- Data seeding confirmation: Confirms that the data was successfully inserted into MongoDB.
- Model evaluation metrics: Provides insight into the model's performance, indicating poor fit with a negative R^2 and a relatively high RMSE.
- Predictions: Shows the actual and predicted values, highlighting the model's approximation of the data.
    Data seeded successfully!
    R^2: -131160.77737920854
    RMSE: 256.3340977791911
    X: 3.3, Y: 1.7, Predicted: 84.26584
    X: 4.4, Y: 2.76, Predicted: -8.049469
    X: 5.5, Y: 2.09, Predicted: -100.36478
    X: 6.71, Y: 3.19, Predicted: -201.91168
    X: 6.93, Y: 1.694, Predicted: -220.3747
    X: 4.168, Y: 1.573, Predicted: 11.420654
    X: 9.779, Y: 3.366, Predicted: -459.47144
    X: 6.182, Y: 2.596, Predicted: -157.60028
    X: 7.59, Y: 2.53, Predicted: -275.76392
    X: 2.167, Y: 1.221, Predicted: 179.35062
    X: 7.042, Y: 2.827, Predicted: -229.77405
    X: 10.791, Y: 3.465, Predicted: -544.4015
    X: 5.313, Y: 1.65, Predicted: -84.6712
    X: 7.997, Y: 2.904, Predicted: -309.9206
    X: 5.654, Y: 2.42, Predicted: -113.28891
    X: 9.27, Y: 2.94, Predicted: -416.75458
    X: 3.1, Y: 1.3, Predicted: 101.050446
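When a trained model scores this poorly, it can help to compare it against a simple baseline. The sketch below fits a straight line to the same sample values with closed-form ordinary least squares. This is a different technique from the SDCA trainer used in the tutorial's pipeline, and it is offered only as a sanity check on what a reasonable linear fit to this data looks like, not as a replacement for the ML.NET code. It uses only the base class library, and all variable names are illustrative.

```csharp
using System;
using System.Linq;

// Same sample values as the tutorial's SampleData document.
double[] xs = { 3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167, 7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1 };
double[] ys = { 1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221, 2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3 };

double meanX = xs.Average();
double meanY = ys.Average();

// Closed-form least squares for one feature:
// slope = sum((x - meanX)(y - meanY)) / sum((x - meanX)^2), intercept = meanY - slope * meanX.
double sxy = xs.Zip(ys, (x, y) => (x - meanX) * (y - meanY)).Sum();
double sxx = xs.Sum(x => (x - meanX) * (x - meanX));
double slope = sxy / sxx;
double intercept = meanY - slope * meanX;

// R^2 and RMSE of the fitted line, evaluated on the same data.
double ssRes = xs.Zip(ys, (x, y) => Math.Pow(y - (intercept + slope * x), 2)).Sum();
double ssTot = ys.Sum(y => Math.Pow(y - meanY, 2));
double rSquared = 1 - ssRes / ssTot;
double rmse = Math.Sqrt(ssRes / xs.Length);

Console.WriteLine($"slope: {slope:F4}, intercept: {intercept:F4}");
Console.WriteLine($"R^2: {rSquared:F4}, RMSE: {rmse:F4}");
```

A least-squares line with an intercept always yields an R^2 between 0 and 1 on its own training data, so a trained model that scores far below this baseline is a hint that its training setup deserves a closer look.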
By following this guide, you’ve successfully integrated MongoDB with TensorFlow and C# using ML.NET. This integration enables you to leverage MongoDB's data storage capabilities with the powerful machine learning framework TensorFlow, all within a .NET environment.
This tutorial demonstrates the ease with which different technologies can be combined to create robust data processing and machine learning solutions.
For further learning, check out the TensorFlow and MongoDB documentation, and explore more complex machine learning models and database operations. If you have questions or want to share your work, join us in the MongoDB Developer Community.
Thanks for reading...
Happy coding!