Explore Developer Center's New Chatbot! MongoDB AI Chatbot can be accessed at the top of your navigation to answer all your MongoDB questions.

Learn why MongoDB was selected as a leader in the 2024 Gartner® Magic Quadrant™
MongoDB Developer
MongoDB
plus
Sign in to follow topics
MongoDB Developer Center
chevron-right
Developer Topics
chevron-right
Products
chevron-right
MongoDB
chevron-right

Generating MQL Shell Commands Using OpenAI and New mongosh Shell

Pavel Duchovny7 min read • Published Jul 23, 2021 • Updated Jul 11, 2023
AIMongoDBShell
Facebook Icontwitter iconlinkedin icon
Rate this article
star-empty
star-empty
star-empty
star-empty
star-empty

Generating MQL Shell Commands Using OpenAI and New mongosh Shell

OpenAI is a fascinating and growing AI platform sponsored by Microsoft, allowing you to digest text cleverly to produce AI content with stunning results considering how small the “learning data set” you actually provide is.
MongoDB’s Query Language (MQL) is an intuitive language for developers to interact with MongoDB Documents. For this reason, I wanted to put OpenAI to the test of quickly learning the MongoDB language and using its overall knowledge to build queries from simple sentences. The results were more than satisfying to me. Github is already working on a project called Github copilot which uses the same OpenAI engine to code.
In this article, I will show you my experiment, including the game-changing capabilities of the new MongoDB Shell (mongosh) which can extend scripting with npm modules integrations.

What is OpenAI and How Do I Get Access to It?

OpenAI is a unique project aiming to provide an API for many AI tasks built mostly on Natural Language Processing today. You can read more about their projects in this blog.
There are a variety of examples for its text processing capabilities.
If you want to use OpenAI, you will need to get a trial API key first by joining the waitlist on their main page. Once you are approved to get an API key, you will be granted about $18 for three months of testing. Each call in OpenAI is billed and this is something to consider when using in production. For our purposes, $18 is more than enough to test the most expensive engine named “davinci.”
Once you get the API key, you can use various clients to run their AI API from your script/application.
Since we will be using the new mongosh shell, I have used the JS API.

Preparing the mongosh to Use OpenAI

First, we need to install the new shell, if you haven’t done it so far. On my Mac laptop, I just issued:
1brew install mongosh
Windows users should download the MSI installer from our download page and follow the Windows instructions.
Once my mongosh is ready, I can start using it, but before I do so, let’s install OpenAI JS, which we will import in the shell later on:
1$ mkdir openai-test
2$ cd openai-test
3Openai-test $ npm i openai-api
I’ve decided to use the Questions and Answers pattern, in the form of Q: <Question> and A: <Answer>, provided to the text to command completion API to provide the learning material about MongoDB queries for the AI engine. To better feed it, I placed the training questions and answers in a file called AI-input.txt and its content:
1Q: What is the query syntax?
2A: db.collection.find(<filter>, <projection> , <options>)
3Q: Query users collection for username with value "boy"
4A: db.users.find({"username" : "boy"})
5Q: Query users collection for username with value "girl"A: db.users.find({"username" : "girl"})
6Q: Query users collection for username with age bigger than 16
7A: db.users.find({"age" : {$gt : 16}})n;
8Q: Query author collection for username with value "boy"
9A: db.authors.find({"username" : "boy"})
10Q:Query author collection for age lower than 7
11A: db.authors.find({"age" : {$lt : 7}});
12
13Q:insert a json document into collection authors with username equal to "girl"
14A: db.authors.insert({"username" : "girl"}, {"age" : 10, "gender" : "female"})
15Q: insert many documents into users collections
16A: db.users.insertMany([{username : "john doe"}, {username : "joe doe"}]);
17Q: The following aggregation uses $geoNear to find documents with a location at most 2 meters from the center [ -73.99279 , 40.719296 ] and a category equal to Parks.
18A: db.places.aggregate([{ $geoNear: {near: { type: "Point", coordinates: [ -73.99279 , 40.719296 ] },distanceField: "dist.calculated",maxDistance: 2, query: { category: "Parks" },includeLocs: "dist.location", spherical: true}}])
19Q: How to build atlas $search text query?
20A: db.collection.aggregate({$search: {"index": <index name> "text": {"query": "<search-string>","path": "<field-to-search>", "fuzzy": <options>,"score": <options> } }})
21
22Q: What is the aggregate syntax?
23A: db.collection.aggregate([<stage1>,<stage2>,<stage3>], <projection> , <options>);
24Q: aggregate users collection to calculate salary sum per user
25A: db.users.aggregate([{$group : { _id : "$username" , salary_sum : { $sum : "$salary" }}}]);
26Q: aggregate person collection to calculate salary sum per person
27A: db.persons.aggregate([{$group : { _id : "$person" , salary_sum : { $sum : "$salary" }}}]);
28Q: Lookup users and orders collection
29A: db.users.aggregate([{$lookup : {from: 'orders', localField : "_id", foreignField : "userId", as : "result" }} ]);
30
31Q: What is the update syntax?
32A:db.collection.update(query, update, options)
33Q: How to edit collection sports where sportname is 'football' and match is 'england vs portugal' to score of '3-3' and date to current date?
34A: db.sports.update({ sportname: "football", match: "england vs portugal"} , {$set : {score: "3-3" , date : new Date()}} })
35Q: Query and atomically update collection zoo where animal is "bear" with a counter increment on eat field, if the data does not exist user upsert
36A: db.zoo.findOneAndUpdate({animal : "bear"}, {$inc: { eat : 1 }} , {upsert : true})
We will use this file later in our code.
This way, the completion will be based on a similar pattern.

Prepare Your Atlas Cluster

MongoDB Atlas, the database-as-a-platform service, is a great way to have a running cluster in seconds with a sample dataset already there for our test. To prepare it, please use the following steps:
  1. Create an Atlas account (if you don’t have one already) and use/start a cluster. For detailed steps, follow this documentation.
Use the copied connection string, providing it to the mongosh binary to connect to the pre-populated Atlas cluster with sample data. Then, switch to sample_restaurants database.
1mongosh "mongodb+srv://<u>:<p>@<atlas-uri>/sample_restaurants"
2Using Mongosh : X.X.X
3Using MongoDB: X.X.X
4
5For mongosh info see: https://docs.mongodb.com/mongodb-shell/
6
7ATLAS atlas-ugld61-shard-0 [primary]> use sample_restaurants;

Using OpenAI Inside the mongosh Shell

Now, we can build our textToMql function by pasting it into the mongosh. The function will receive a text sentence, use our generated OpenAI API key, and will try to return the best MQL command for it:
1async function textToMql(query){
2
3const OpenAI = require('openai-api');
4const openai-client = new OpenAI("<YOUR-OPENAI-API-KEY>");
5
6const fs = require('fs');
7
8var data = await fs.promises.readFile('AI-input.txt', 'utf8');
9
10const learningPath = data;
11
12var aiInput = learningPath + "Q:" + query + "\nA:";
13
14 const gptResponse = await openai-client.complete({
15 engine: 'davinci',
16 prompt: aiInput,
17 "temperature": 0.3,
18 "max_tokens": 400,
19 "top_p": 1,
20 "frequency_penalty": 0.2,
21 "presence_penalty": 0,
22 "stop": ["\n"]
23 });
24
25 console.log(gptResponse.data.choices[0].text);
26}
In the above function, we first load the OpenAI npm module and initiate a client with the relevant API key from OpenAI.
1const OpenAI = require('openai-api');
2const openai-client = new OpenAI("<YOUR-OPENAI-API-KEY>");
3
4const fs = require('fs');
The new shell allows us to import built-in and external modules to produce an unlimited flexibility with our scripts.
Then, we read the learning data from our AI-input.txt file. Finally we add our Q: <query> input to the end followed by the A: value which tells the engine we expect an answer based on the provided learningPath and our query.
This data will go over to an OpenAI API call:
1 const gptResponse = await openai.complete({
2 engine: 'davinci',
3 prompt: aiInput,
4 "temperature": 0.3,
5 "max_tokens": 400,
6 "top_p": 1,
7 "frequency_penalty": 0.2,
8 "presence_penalty": 0,
9 "stop": ["\n"]
10 });
The call performs a completion API and gets the entire initial text as a prompt and receives some additional parameters, which I will elaborate on:
  • engine: OpenAI supports a few AI engines which differ in quality and purpose as a tradeoff for pricing. The “davinci” engine is the most sophisticated one, according to OpenAI, and therefore is the most expensive one in terms of billing consumption.
  • temperature: How creative will the AI be compared to the input we gave it? It can be between 0-1. 0.3 felt like a down-to-earth value, but you can play with it.
  • Max_tokens: Describes the amount of data that will be returned.
  • Stop: List of characters that will stop the engine from producing further content. Since we need to produce MQL statements, it will be one line based and “\n” is a stop character.
Once the content is returned, we parse the returned JSON and print it with console.log.

Lets Put OpenAI to the Test with MQL

Once we have our function in place, we can try to produce a simple query to test it:
1Atlas atlas-ugld61-shard-0 [primary] sample_restaurants> textToMql("query all restaurants where cuisine is American and name starts with 'Ri'")
2 db.restaurants.find({cuisine : "American", name : /^Ri/})
3
4Atlas atlas-ugld61-shard-0 [primary] sample_restaurants> db.restaurants.find({cuisine : "American", name : /^Ri/})
5[
6 {
7 _id: ObjectId("5eb3d668b31de5d588f4292a"),
8 address: {
9 building: '2780',
10 coord: [ -73.98241999999999, 40.579505 ],
11 street: 'Stillwell Avenue',
12 zipcode: '11224'
13 },
14 borough: 'Brooklyn',
15 cuisine: 'American',
16 grades: [
17 {
18 date: ISODate("2014-06-10T00:00:00.000Z"),
19 grade: 'A',
20 score: 5
21 },
22 {
23 date: ISODate("2013-06-05T00:00:00.000Z"),
24 grade: 'A',
25 score: 7
26 },
27 {
28 date: ISODate("2012-04-13T00:00:00.000Z"),
29 grade: 'A',
30 score: 12
31 },
32 {
33 date: ISODate("2011-10-12T00:00:00.000Z"),
34 grade: 'A',
35 score: 12
36 }
37 ],
38 name: 'Riviera Caterer',
39 restaurant_id: '40356018'
40 }
41...
Nice! We never taught the engine about the restaurants collection or how to filter with regex operators but it still made the correct AI decisions.
Let's do something more creative.
1Atlas atlas-ugld61-shard-0 [primary] sample_restaurants> textToMql("Generate an insert many command with random fruit names and their weight")
2 db.fruits.insertMany([{name: "apple", weight: 10}, {name: "banana", weight: 5}, {name: "grapes", weight: 15}])
3Atlas atlas-ugld61-shard-0 [primary]sample_restaurants> db.fruits.insertMany([{name: "apple", weight: 10}, {name: "banana", weight: 5}, {name: "grapes", weight: 15}])
4{
5 acknowledged: true,
6 insertedIds: {
7 '0': ObjectId("60e55621dc4197f07a26f5e1"),
8 '1': ObjectId("60e55621dc4197f07a26f5e2"),
9 '2': ObjectId("60e55621dc4197f07a26f5e3")
10 }
11}
Okay, now let's put it to the ultimate test: aggregations!
1Atlas atlas-ugld61-shard-0 [primary] sample_restaurants> use sample_mflix;
2Atlas atlas-ugld61-shard-0 [primary] sample_mflix> textToMql("Aggregate the count of movies per year (sum : 1) on collection movies")
3 db.movies.aggregate([{$group : { _id : "$year", count : { $sum : 1 }}}]);
4
5Atlas atlas-ugld61-shard-0 [primary] sample_mflix> db.movies.aggregate([{$group : { _id : "$year", count : { $sum : 1 }}}]);
6[
7 { _id: 1967, count: 107 },
8 { _id: 1986, count: 206 },
9 { _id: '2006è2012', count: 2 },
10 { _id: 2004, count: 741 },
11 { _id: 1918, count: 1 },
12 { _id: 1991, count: 252 },
13 { _id: 1968, count: 112 },
14 { _id: 1990, count: 244 },
15 { _id: 1933, count: 27 },
16 { _id: 1997, count: 458 },
17 { _id: 1957, count: 89 },
18 { _id: 1931, count: 24 },
19 { _id: 1925, count: 13 },
20 { _id: 1948, count: 70 },
21 { _id: 1922, count: 7 },
22 { _id: '2005è', count: 2 },
23 { _id: 1975, count: 112 },
24 { _id: 1999, count: 542 },
25 { _id: 2002, count: 655 },
26 { _id: 2015, count: 484 }
27]
Now that is the AI power of MongoDB pipelines!

DEMO

asciicast

Wrap-Up

MongoDB's new shell allows us to script with enormous power like never before by utilizing npm external packages. Together with the power of OpenAI sophisticated AI patterns, we were able to teach the shell how to prompt text to accurate complex MongoDB commands, and with further learning and tuning, we can probably get much better results.
Try this today using the new MongoDB shell.

Facebook Icontwitter iconlinkedin icon
Rate this article
star-empty
star-empty
star-empty
star-empty
star-empty
Related
Tutorial

Leveraging Atlas Vector Search With HashiCorp Terraform: Empowering Semantic Search in Modern Applications


May 02, 2024 | 4 min read
Tutorial

Handle Time Series Data with MongoDB


Nov 19, 2024 | 13 min read
Tutorial

Leafsteroid Resources


Sep 09, 2024 | 1 min read
Article

Building a Quarkus Application to Perform MongoDB Vector Search


Oct 07, 2024 | 9 min read
Table of Contents
  • Generating MQL Shell Commands Using OpenAI and New mongosh Shell