Getting Started with Microsoft's Semantic Kernel in C# and MongoDB Atlas

Luce Carter • 10 min read • Published Aug 05, 2024 • Updated Aug 05, 2024

Semantic Kernel has become hugely popular within the Microsoft ecosystem. In fact, at Microsoft Build, Semantic Kernel and AI with MongoDB was the most discussed topic at our booth.

Semantic Kernel is Microsoft’s AI SDK, available in Java, Python, and C#. It allows you to build powerful AI applications by chaining together out-of-the-box, community-created, and custom plugins. These plugins work together to create plans that allow you to achieve complex tasks. This could be anything from tidying up Scott Hanselman’s desktop to summarizing a block of text and emailing you the summary. The possibilities are endless!

Semantic Kernel is a tool for building retrieval-augmented generation (RAG) apps. The R and A parts come from retrieving information to use as context in the input to the large language model (LLM). This is where MongoDB comes in. MongoDB is an option for storing data, including embeddings representing that data, and even gives you the ability to search the data using Atlas Vector Search.

Semantic Kernel has support for MongoDB Atlas thanks to a connector. So not only can you store your data in MongoDB, including the embeddings, but it also automatically uses Vector Search under the hood to retrieve the results. You get the best of Semantic Kernel and the best of MongoDB, the most popular document database for C# developers!

In this tutorial, you will learn how to get started with Semantic Kernel and MongoDB. Using the connector and the SemanticTextMemory plugin, you will create a bot that recommends a movie to watch, using OpenAI to create embeddings and searching the movie data in our sample dataset.

Prerequisites

To follow along with this tutorial, you will need a few things in place:

A MongoDB M0 cluster
The sample data loaded into that cluster
A free OpenAI account and project API key
.NET 8 or higher
An IDE or text editor to follow along

If you would prefer to simply read the code, you can find it on GitHub. It has two branches, depending on whether you have access to Azure OpenAI or want to use OpenAI. We will be using OpenAI for this tutorial as it is free and open to all at the time of writing.

Creating the project

Now that you have the prerequisites in place, it is time to create the project and add the NuGet packages you will need to create the bot.

Create a new console project, either using your IDE or via the .NET CLI.
Add the following NuGet packages to your new project:

Microsoft.SemanticKernel
Microsoft.SemanticKernel.Connectors.MongoDB (N.B. This is in prerelease)
Microsoft.SemanticKernel.Connectors.OpenAI

Setting up our configuration

There are a few variables we are going to need throughout this tutorial so we will start by setting them up in Program.cs.

Because we want to create at least one other method in this tutorial, we will also switch to the traditional structure of our program class. Replace the contents with the following:

Code Snippet

The #pragma warning disable directive is there because many of the features we use are marked as experimental, and it suppresses the compiler errors that would otherwise be raised.
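To show how this kind of suppression works in general, here is a minimal, self-contained sketch. It assumes .NET 8 and uses the framework's ExperimentalAttribute; the diagnostic ID DEMO0001 and the PreviewFeature and Caller types are hypothetical names made up for this illustration and are not part of Semantic Kernel.

```csharp
using System.Diagnostics.CodeAnalysis;

// Hypothetical experimental API: any use of it raises diagnostic DEMO0001 as a compile error.
[Experimental("DEMO0001")]
public static class PreviewFeature
{
    public static string Greet() => "hello";
}

// Without the pragma below, the call to PreviewFeature.Greet() would not compile.
#pragma warning disable DEMO0001
public static class Caller
{
    public static string Run() => PreviewFeature.Greet();
}
#pragma warning restore DEMO0001
```

Placing the directive at the top of a file, as this tutorial does, applies the suppression to everything in that file, which is convenient for a small sample but hides the opt-in for every call site.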

Go ahead and replace the placeholders for OpenAI and Atlas with your own values.

Setting up the memory plugin and memory store

You may have noticed in the last section that you added a MemoryBuilder variable. This builder is what gives you access to the memory plugin, an out-of-the-box plugin for working with stored data.

So now we are going to use this builder to configure the plugin and connect it to MongoDB Atlas as our memory store.

Paste the following code inside your Main method:

Code Snippet

The Memory Builder comes with some helper methods. In this case, we are using WithOpenAITextEmbeddingGeneration which helps you configure the memory plugin.

Because we are working with text in this project, we need to be able to generate text embeddings for our data to be used in the search. This is where OpenAI comes in. By passing this method the name of the model we want to use and the OpenAI API key, the plugin has all it needs to automatically take care of the rest for us under the hood — excellent!

Ensure the following using statements are present in the file:

Code Snippet

Using a database that supports vectors and vector searches, such as MongoDB Atlas, is a key part of adding the retrieval and augmentation parts to your RAG applications.
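To make the idea of searching by vectors a little more concrete, here is a small sketch of cosine similarity, one common way to compare two embeddings. This is an illustration only: the EmbeddingMath class is a hypothetical helper, and in this tutorial the comparison is carried out for you by Atlas Vector Search rather than by code you write.

```csharp
using System;

// Hypothetical helper for illustration; not part of Semantic Kernel or the MongoDB driver.
public static class EmbeddingMath
{
    // Cosine similarity = dot(a, b) / (|a| * |b|).
    // The result ranges from -1 to 1; values closer to 1 mean the vectors point the same way.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("Vectors must have the same number of dimensions.");
        }

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        // Treat a zero-length vector as having no similarity to anything.
        if (normA == 0 || normB == 0)
        {
            return 0;
        }

        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

Real embeddings from the model used in this tutorial have far more dimensions than a toy example would, but the calculation has the same shape.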

Semantic Kernel’s MongoDB Connector not only lets you use MongoDB as the data store for your embeddings, but also uses MongoDB’s vector search capabilities to carry out the search.

Paste the following code after the previous, inside your Main method:

Code Snippet

Just like that, with a few lines of code, we have the memory plugin set up and it is configured to use MongoDB.

Adding documents to our memory store

MongoDB’s sample data comes with different databases and collections for a variety of use cases. One of the recent changes was to the sample_mflix database. This database has been around in the sample data for a long time but we recently added a new collection inside the database called embedded_movies. You may have noticed that already if you have browsed your new cluster. This collection contains vector embeddings on the plot field from a large number of documents from the movies collection and makes it much easier for developers to experience MongoDB’s Atlas Vector Search in a variety of programming languages.

In an ideal world, we would use this collection with Semantic Kernel. Unfortunately, Semantic Kernel has a limitation on the name of the field containing the embeddings as well as on the shape of the documents it can use. So for this tutorial, we are going to import some documents from our sample_mflix database and save them in a new collection, using Semantic Kernel. This will generate the embeddings automatically using OpenAI and save them in a format that Semantic Kernel can use later.

First, we need to create a model that represents the movie document. So create a new Movie.cs class in your project and paste in the following:

Code Snippet

public class Movie
{
    [BsonId]
    [BsonRepresentation(BsonType.ObjectId)]
    public string Id { get; set; }

    [BsonElement("plot")]
    public string Plot { get; set; }

    [BsonElement("genres")]
    public List<string> Genres { get; set; }

    [BsonElement("runtime")]
    public int Runtime { get; set; }

    [BsonElement("cast")]
    public List<string> Cast { get; set; }

    [BsonElement("num_mflix_comments")]
    public int NumMflixComments { get; set; }

    [BsonElement("poster")]
    public string Poster { get; set; }

    [BsonElement("title")]
    public string Title { get; set; }

    [BsonElement("fullplot")]
    public string Fullplot { get; set; }

    [BsonElement("languages")]
    public List<string> Languages { get; set; }

    [BsonElement("released")]
    public DateTime Released { get; set; }

    [BsonElement("directors")]
    public List<string> Directors { get; set; }

    [BsonElement("writers")]
    public List<string> Writers { get; set; }

    [BsonElement("awards")]
    public Awards Awards { get; set; }

    [BsonElement("rated")]
    public string? Rated { get; set; }

    [BsonElement("lastupdated")]
    public string Lastupdated { get; set; }

    [BsonElement("year")]
    public object Year { get; set; }

    [BsonElement("imdb")]
    public Imdb Imdb { get; set; }

    [BsonElement("countries")]
    public List<string> Countries { get; set; }

    [BsonElement("type")]
    public string Type { get; set; }

    [BsonElement("tomatoes")]
    public Tomatoes Tomatoes { get; set; }

    [BsonElement("metacritic")]
    public int? Metacritic { get; set; }

    [BsonElement("awesome")]
    public bool? Awesome { get; set; }
}

public class Awards
{
    [BsonElement("wins")]
    public int Wins { get; set; }

    [BsonElement("nominations")]
    public int Nominations { get; set; }

    [BsonElement("text")]
    public string Text { get; set; }
}

public class Imdb
{
    [BsonElement("id")]
    public object ImdbId { get; set; }

    [BsonElement("votes")]
    public object Votes { get; set; }

    [BsonElement("rating")]
    public object Rating { get; set; }
}

public class Tomatoes
{
    [BsonElement("viewer")]
    public Viewer Viewer { get; set; }

    [BsonElement("lastUpdated")]
    public DateTime LastUpdated { get; set; }

    [BsonElement("dvd")]
    public DateTime? DVD { get; set; }

    [BsonElement("website")]
    public string? Website { get; set; }

    [BsonElement("production")]
    public string? Production { get; set; }

    [BsonElement("critic")]
    public Critic? Critic { get; set; }

    [BsonElement("rotten")]
    public int? Rotten { get; set; }

    [BsonElement("fresh")]
    public int? Fresh { get; set; }

    [BsonElement("boxOffice")]
    public string? BoxOffice { get; set; }

    [BsonElement("consensus")]
    public string? Consensus { get; set; }
}

public class Viewer
{
    [BsonElement("rating")]
    public double Rating { get; set; }

    [BsonElement("numReviews")]
    public int NumReviews { get; set; }

    [BsonElement("meter")]
    public int Meter { get; set; }
}

public class Critic
{
    [BsonElement("rating")]
    public double Rating { get; set; }

    [BsonElement("numReviews")]
    public int NumReviews { get; set; }

    [BsonElement("meter")]
    public int Meter { get; set; }
}

If your IDE or text editor doesn’t auto add the required using statements, add the following at the top of the class:

Code Snippet

Now we have the model available that reflects our document, it is time to make use of it.

Paste the following code in your Program.cs class:

Code Snippet

    private static async Task FetchAndSaveMovieDocuments(ISemanticTextMemory memory, int limitSize)
    {
        // Connect to the existing sample database and collection.
        MongoClient mongoClient = new MongoClient(MongoDBAtlasConnectionString);
        var movieDB = mongoClient.GetDatabase("sample_mflix");
        var movieCollection = movieDB.GetCollection<Movie>("movies");
        List<Movie> movieDocuments;

        Console.WriteLine("Fetching documents from MongoDB...");

        // Fetch the requested number of movie documents.
        movieDocuments = movieCollection.Find(m => true).Limit(limitSize).ToList();

        // Replace null plots with a placeholder, as null values cause errors later.
        movieDocuments.ForEach(movie =>
        {
            if (movie.Plot == null)
            {
                movie.Plot = "UNKNOWN";
            }
        });

        // Save each movie to the new collection via the memory store.
        foreach (var movie in movieDocuments)
        {
            try
            {
                await memory.SaveReferenceAsync(
                    collection: CollectionName,
                    description: movie.Plot,
                    text: movie.Plot,
                    externalId: movie.Title,
                    externalSourceName: "Sample_Mflix_Movies",
                    additionalMetadata: movie.Year.ToString());
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
        }
    }

Let’s take a look at what is happening:

We take advantage of the MongoDB C# driver, which is available to us from the connector, to create a new client and point it to our existing database and collection.
Then, we create a new list of movies, fetching the requested number of documents and adding them to the list.
For each movie, we do some data hygiene for any null plots as this can cause errors later, and simply marking it as nullable won’t work, sadly.
After we have a clean list of movies, we iterate through each one and save it to our new collection via the memory store.
- The document that Semantic Kernel creates with the plugin has some fields that we want to populate so we assign those the most sensible values from the fields available in our movie document.

Now, we need to actually call this method. We can do this by simply calling await FetchAndSaveMovieDocuments(memory, 1500); from our Main method, after the existing code. This will populate our collection linked to the memory store with 1500 documents. You can choose a different number, if you wish.

Run the application to populate our new database and collection with data using Semantic Kernel. Once it displays “Fetching documents from MongoDB…”, wait a few minutes for it to populate in the background and then close the application. Generating the text embeddings for such a large number of documents using Semantic Kernel can take a little while. The wait comes from generating the embeddings, not from the wonderful MongoDB C# driver, which is not the bottleneck here.

dotnet run

This only needs to run once so we have some data available to us. So if you want to run this app again in future, it is OK to comment out the call to the method FetchAndSaveMovieDocuments, or remove it completely.

This will create a new database in your cluster called semantic-kernel with a collection called embedded_movies, containing the data as populated using Semantic Kernel.

Creating the vector search index

You may have noticed earlier that when we added our MongoDB memory store, we passed it the search index name. This search index is used to identify which field or fields we want to use in our search. But this doesn’t exist yet on our MongoDB database.

Now that you have run the application once, the data will be available in the collection to use in the search index.

We already have some great documentation on how to create a vector search index so you can refer to that on how to access the wizard in the Atlas UI to create the new index.

The following JSON can be used to define the index:

Code Snippet

This uses the embedding field that was generated by Semantic Kernel. OpenAI’s “text-embedding-ada-002” model, which we are using for the text embedding, generates vectors with 1536 dimensions. You will see this in the generated documents, as the embedding array contains 1536 elements.
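For reference, an Atlas Vector Search index definition consistent with the description above might look like the following sketch. The path and number of dimensions come from the text; the cosine similarity function is an assumption, so check it against the index definition you actually intend to use.

```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}
```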

You will need to use the index name “default” to match the hard coded variable in your code. If you name the search index something else, be sure to update the variable.

Asking questions of our data

Now that we have the data available to us and the search index created, it is time to add the ability to actually ask questions of our data.

Paste the following code inside your Main method, after the existing code:

Code Snippet

Console.WriteLine("Welcome to the Movie Recommendation System!");
Console.WriteLine("Type 'x' and press Enter to exit.");
Console.WriteLine("============================================");
Console.WriteLine();

while (true)
{
    Console.WriteLine("Tell me what sort of film you want to watch..");
    Console.WriteLine();

    Console.Write("> ");

    var userInput = Console.ReadLine();

    // ReadLine returns null at end of input, so treat that like an exit request.
    if (userInput == null || userInput.Trim().ToLower() == "x")
    {
        Console.WriteLine("Exiting application..");
        break;
    }

    Console.WriteLine();

    // Search the memory store for the three most relevant movies.
    var memories = memory.SearchAsync(CollectionName, userInput, limit: 3, minRelevanceScore: 0.6);

    Console.WriteLine(String.Format("{0,-20} {1,-50} {2,-10} {3,-15}", "Title", "Plot", "Year", "Relevance (0 - 1)"));
    Console.WriteLine(new String('-', 95)); // Adjust the length based on your column widths

    await foreach (var mem in memories)
    {
        Console.WriteLine(String.Format("{0,-20} {1,-50} {2,-10} {3,-15}",
            mem.Metadata.Id,
            mem.Metadata.Description.Length > 47 ? mem.Metadata.Description.Substring(0, 47) + "..." : mem.Metadata.Description, // Truncate long descriptions
            mem.Metadata.AdditionalMetadata,
            mem.Relevance.ToString("0.00"))); // Format relevance score to two decimal places
    }
}

A lot of this code is about user input and formatting the output. But let’s look at the lines of code that matter:

memory.SearchAsync is how we carry out the search. We pass it the name of where we want to search, a.k.a. the collection name, what we want to search, how many results to get back, and what score from 0 to 1 we consider a threshold for “relevant enough.”
await foreach (var mem in memories) is slightly different to the foreach you might be used to. The memories variable that was assigned the result of the search is of type IAsyncEnumerable<MemoryQueryResult>, so we have to perform an await foreach to iterate through it.
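If await foreach is new to you, the following self-contained sketch shows the pattern in isolation. FakeSearchAsync is a hypothetical stand-in for a search that yields results one at a time, and the titles are made-up placeholders; it does not call Semantic Kernel or MongoDB.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical demo types for illustration; not part of Semantic Kernel.
public static class AsyncStreamDemo
{
    // Produces results asynchronously, one at a time, like a streaming search would.
    public static async IAsyncEnumerable<string> FakeSearchAsync()
    {
        foreach (var title in new[] { "Movie A", "Movie B" })
        {
            await Task.Delay(10); // simulate waiting for the next result
            yield return title;
        }
    }

    // Consumes the async stream with await foreach and gathers the items into a list.
    public static async Task<List<string>> CollectAsync(IAsyncEnumerable<string> source)
    {
        var results = new List<string>();
        await foreach (var item in source)
        {
            results.Add(item);
        }
        return results;
    }
}
```

Each iteration awaits the next item, so results can be printed as they arrive rather than after the whole search has finished.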

Trying it out

We have everything in place now to run the application and actually ask it a question. Why not try asking it for a movie about sharks or another topic you love?

Summary

Just like that, you have created a simple movie chat recommendation bot using Semantic Kernel from Microsoft, MongoDB Atlas, and the awesome connector for MongoDB in Semantic Kernel.

If you want to learn more, I wrote a tutorial on how to use Atlas Vector Search natively in a .NET application!

You can view the full code by visiting the repo on GitHub.

There is also a main branch of this repo which uses Azure OpenAI, for those of you who have access.

Why not try it out today and see what movie you might want to watch tonight?
