Generating MQL Shell Commands Using OpenAI and New mongosh Shell

Pavel Duchovny7 min read • Published Jul 23, 2021 • Updated Jul 11, 2023

AI MongoDB Shell

Rate this article

Generating MQL Shell Commands Using OpenAI and New mongosh Shell

OpenAI is a fascinating and growing AI platform sponsored by Microsoft, allowing you to digest text cleverly to produce AI content with stunning results considering how small the “learning data set” you actually provide is.

MongoDB’s Query Language (MQL) is an intuitive language for developers to interact with MongoDB Documents. For this reason, I wanted to put OpenAI to the test of quickly learning the MongoDB language and using its overall knowledge to build queries from simple sentences. The results were more than satisfying to me. Github is already working on a project called Github copilot which uses the same OpenAI engine to code.

In this article, I will show you my experiment, including the game-changing capabilities of the new MongoDB Shell (mongosh) which can extend scripting with npm modules integrations.

What is OpenAI and How Do I Get Access to It?

OpenAI is a unique project aiming to provide an API for many AI tasks built mostly on Natural Language Processing today. You can read more about their projects in this blog.

There are a variety of examples for its text processing capabilities.

If you want to use OpenAI, you will need to get a trial API key first by joining the waitlist on their main page. Once you are approved to get an API key, you will be granted about $18 for three months of testing. Each call in OpenAI is billed and this is something to consider when using in production. For our purposes, $18 is more than enough to test the most expensive engine named “davinci.”

Once you get the API key, you can use various clients to run their AI API from your script/application.

Since we will be using the new mongosh shell, I have used the JS API.

Preparing the mongosh to Use OpenAI

First, we need to install the new shell, if you haven’t done it so far. On my Mac laptop, I just issued:

1 brew install mongosh

Windows users should download the MSI installer from our download page and follow the Windows instructions.

Once my mongosh is ready, I can start using it, but before I do so, let’s install OpenAI JS, which we will import in the shell later on:

1 $ mkdir openai-test
2 $ cd openai-test
3 Openai-test $ npm i openai-api

I’ve decided to use the Questions and Answers pattern, in the form of Q: <Question> and A: <Answer>, provided to the text to command completion API to provide the learning material about MongoDB queries for the AI engine. To better feed it, I placed the training questions and answers in a file called AI-input.txt and its content:

1 Q: What is the query syntax?
2 A: db.collection.find(<filter>, <projection> , <options>)
3 Q:  Query users collection for username with value "boy"
4 A: db.users.find({"username" : "boy"})
5 Q:  Query users collection for username with value "girl"A:  db.users.find({"username" : "girl"})
6 Q: Query users collection for username with age bigger than 16
7 A:  db.users.find({"age" : {$gt : 16}})n;
8 Q: Query author collection for username with value "boy"
9 A: db.authors.find({"username" : "boy"})
10 Q:Query author collection for age lower than 7
11 A: db.authors.find({"age" : {$lt : 7}});
12 
13 Q:insert a json document into collection authors with username equal to "girl"
14 A: db.authors.insert({"username" : "girl"}, {"age" : 10, "gender" : "female"})
15 Q: insert many documents into users collections
16 A: db.users.insertMany([{username : "john doe"}, {username : "joe doe"}]);
17 Q: The following aggregation uses $geoNear to find documents with a location at most 2 meters from the center [ -73.99279 , 40.719296 ] and a category equal to Parks.
18 A: db.places.aggregate([{ $geoNear: {near: { type: "Point", coordinates: [ -73.99279 , 40.719296 ] },distanceField: "dist.calculated",maxDistance: 2, query: { category: "Parks" },includeLocs: "dist.location", spherical: true}}])
19 Q: How to build atlas $search text query?
20 A: db.collection.aggregate({$search: {"index": <index name> "text": {"query": "<search-string>","path": "<field-to-search>",  "fuzzy": <options>,"score": <options>  }  }})
21 
22 Q: What is the aggregate syntax?
23 A: db.collection.aggregate([<stage1>,<stage2>,<stage3>], <projection> , <options>);
24 Q: aggregate users collection to calculate salary sum per user
25 A: db.users.aggregate([{$group : { _id : "$username" , salary_sum : { $sum : "$salary" }}}]);
26 Q: aggregate person collection to calculate salary sum per person
27 A: db.persons.aggregate([{$group : { _id : "$person" , salary_sum : { $sum : "$salary" }}}]);
28 Q: Lookup users and orders collection
29 A: db.users.aggregate([{$lookup : {from: 'orders', localField : "_id", foreignField : "userId", as : "result" }} ]);
30 
31 Q: What is the update syntax?
32 A:db.collection.update(query, update, options)
33 Q:  How to edit collection sports where sportname is 'football' and match is 'england vs portugal' to score of '3-3' and date to current date?
34 A: db.sports.update({ sportname: "football", match: "england vs portugal"} , {$set : {score: "3-3" , date : new Date()}} })
35 Q: Query and atomically update collection zoo where animal is "bear" with a counter increment on eat field, if the data does not exist user upsert
36 A: db.zoo.findOneAndUpdate({animal : "bear"}, {$inc: { eat : 1 }} , {upsert : true})

We will use this file later in our code.

This way, the completion will be based on a similar pattern.

Prepare Your Atlas Cluster

MongoDB Atlas, the database-as-a-platform service, is a great way to have a running cluster in seconds with a sample dataset already there for our test. To prepare it, please use the following steps:

Create an Atlas account (if you don’t have one already) and use/start a cluster. For detailed steps, follow this documentation.
Load the sample data set.
Get your connection string.

Use the copied connection string, providing it to the mongosh binary to connect to the pre-populated Atlas cluster with sample data. Then, switch to sample_restaurants database.

1 mongosh "mongodb+srv://<u>:<p>@<atlas-uri>/sample_restaurants"
2 Using Mongosh : X.X.X
3 Using MongoDB:		X.X.X
4 
5 For mongosh info see: https://docs.mongodb.com/mongodb-shell/
6 
7 ATLAS atlas-ugld61-shard-0 [primary]> use sample_restaurants;

Using OpenAI Inside the mongosh Shell

Now, we can build our textToMql function by pasting it into the mongosh. The function will receive a text sentence, use our generated OpenAI API key, and will try to return the best MQL command for it:

1 async function textToMql(query){
2 
3 const OpenAI = require('openai-api');
4 const openai-client = new OpenAI("<YOUR-OPENAI-API-KEY>");
5 
6 const fs = require('fs');
7 
8 var data =  await  fs.promises.readFile('AI-input.txt', 'utf8');
9 
10 const learningPath = data;
11 
12 var aiInput =  learningPath + "Q:" + query + "\nA:";
13 
14  const gptResponse = await openai-client.complete({
15      engine: 'davinci',
16      prompt: aiInput,
17      "temperature": 0.3,
18      "max_tokens": 400,
19      "top_p": 1,
20      "frequency_penalty": 0.2,
21      "presence_penalty": 0,
22      "stop": ["\n"]
23  });
24 
25  console.log(gptResponse.data.choices[0].text);
26 }

In the above function, we first load the OpenAI npm module and initiate a client with the relevant API key from OpenAI.

1 const OpenAI = require('openai-api');
2 const openai-client = new OpenAI("<YOUR-OPENAI-API-KEY>");
3 
4 const fs = require('fs');

The new shell allows us to import built-in and external modules to produce an unlimited flexibility with our scripts.

Then, we read the learning data from our AI-input.txt file. Finally we add our Q: <query> input to the end followed by the A: value which tells the engine we expect an answer based on the provided learningPath and our query.

This data will go over to an OpenAI API call:

1  const gptResponse = await openai.complete({
2      engine: 'davinci',
3      prompt: aiInput,
4      "temperature": 0.3,
5      "max_tokens": 400,
6      "top_p": 1,
7      "frequency_penalty": 0.2,
8      "presence_penalty": 0,
9      "stop": ["\n"]
10  });

The call performs a completion API and gets the entire initial text as a prompt and receives some additional parameters, which I will elaborate on:

engine: OpenAI supports a few AI engines which differ in quality and purpose as a tradeoff for pricing. The “davinci” engine is the most sophisticated one, according to OpenAI, and therefore is the most expensive one in terms of billing consumption.
temperature: How creative will the AI be compared to the input we gave it? It can be between 0-1. 0.3 felt like a down-to-earth value, but you can play with it.
Max_tokens: Describes the amount of data that will be returned.
Stop: List of characters that will stop the engine from producing further content. Since we need to produce MQL statements, it will be one line based and “\n” is a stop character.

Once the content is returned, we parse the returned JSON and print it with console.log.

Lets Put OpenAI to the Test with MQL

Once we have our function in place, we can try to produce a simple query to test it:

1 Atlas atlas-ugld61-shard-0 [primary] sample_restaurants> textToMql("query all restaurants where cuisine is American and name starts with 'Ri'")
2  db.restaurants.find({cuisine : "American", name : /^Ri/})
3 
4 Atlas atlas-ugld61-shard-0 [primary] sample_restaurants> db.restaurants.find({cuisine : "American", name : /^Ri/})
5 [
6   {
7     _id: ObjectId("5eb3d668b31de5d588f4292a"),
8     address: {
9       building: '2780',
10       coord: [ -73.98241999999999, 40.579505 ],
11       street: 'Stillwell Avenue',
12       zipcode: '11224'
13     },
14     borough: 'Brooklyn',
15     cuisine: 'American',
16     grades: [
17       {
18         date: ISODate("2014-06-10T00:00:00.000Z"),
19         grade: 'A',
20         score: 5
21       },
22       {
23         date: ISODate("2013-06-05T00:00:00.000Z"),
24         grade: 'A',
25         score: 7
26       },
27       {
28         date: ISODate("2012-04-13T00:00:00.000Z"),
29         grade: 'A',
30         score: 12
31       },
32       {
33         date: ISODate("2011-10-12T00:00:00.000Z"),
34         grade: 'A',
35         score: 12
36       }
37     ],
38     name: 'Riviera Caterer',
39     restaurant_id: '40356018'
40   }
41 ...

Nice! We never taught the engine about the restaurants collection or how to filter with regex operators but it still made the correct AI decisions.

Let's do something more creative.

1 Atlas atlas-ugld61-shard-0 [primary] sample_restaurants> textToMql("Generate an insert many command with random fruit names and their weight")
2  db.fruits.insertMany([{name: "apple", weight: 10}, {name: "banana", weight: 5}, {name: "grapes", weight: 15}])
3 Atlas atlas-ugld61-shard-0 [primary]sample_restaurants>  db.fruits.insertMany([{name: "apple", weight: 10}, {name: "banana", weight: 5}, {name: "grapes", weight: 15}])
4 {
5   acknowledged: true,
6   insertedIds: {
7     '0': ObjectId("60e55621dc4197f07a26f5e1"),
8     '1': ObjectId("60e55621dc4197f07a26f5e2"),
9     '2': ObjectId("60e55621dc4197f07a26f5e3")
10   }
11 }

Okay, now let's put it to the ultimate test: aggregations!

1 Atlas atlas-ugld61-shard-0 [primary] sample_restaurants> use sample_mflix;
2 Atlas atlas-ugld61-shard-0 [primary] sample_mflix> textToMql("Aggregate the count of movies per year (sum : 1) on collection movies")
3  db.movies.aggregate([{$group : { _id : "$year", count : { $sum : 1 }}}]);
4 
5 Atlas atlas-ugld61-shard-0 [primary] sample_mflix>  db.movies.aggregate([{$group : { _id : "$year", count : { $sum : 1 }}}]);
6 [
7   { _id: 1967, count: 107 },
8   { _id: 1986, count: 206 },
9   { _id: '2006è2012', count: 2 },
10   { _id: 2004, count: 741 },
11   { _id: 1918, count: 1 },
12   { _id: 1991, count: 252 },
13   { _id: 1968, count: 112 },
14   { _id: 1990, count: 244 },
15   { _id: 1933, count: 27 },
16   { _id: 1997, count: 458 },
17   { _id: 1957, count: 89 },
18   { _id: 1931, count: 24 },
19   { _id: 1925, count: 13 },
20   { _id: 1948, count: 70 },
21   { _id: 1922, count: 7 },
22   { _id: '2005è', count: 2 },
23   { _id: 1975, count: 112 },
24   { _id: 1999, count: 542 },
25   { _id: 2002, count: 655 },
26   { _id: 2015, count: 484 }
27 ]

Now that is the AI power of MongoDB pipelines!

DEMO

Wrap-Up

MongoDB's new shell allows us to script with enormous power like never before by utilizing npm external packages. Together with the power of OpenAI sophisticated AI patterns, we were able to teach the shell how to prompt text to accurate complex MongoDB commands, and with further learning and tuning, we can probably get much better results.

Try this today using the new MongoDB shell.

Rate this article

Tutorial

Getting Started With MongoDB and C++

Aug 14, 2024 | 7 min read

Article

Resumable Initial Sync in MongoDB 4.4

Sep 11, 2024 | 5 min read

Article

The Top 4 Reasons Why You Should Use MongoDB

Sep 23, 2022 | 9 min read

Tutorial

How to Migrate Your Node.js App from SQL to MongoDB

Jul 09, 2024 | 16 min read

Generating MQL Shell Commands Using OpenAI and New mongosh Shell

MongoDB

Generating MQL Shell Commands Using OpenAI and New mongosh Shell

Generating MQL Shell Commands Using OpenAI and New mongosh Shell

What is OpenAI and How Do I Get Access to It?

Preparing the mongosh to Use OpenAI

Prepare Your Atlas Cluster

Using OpenAI Inside the mongosh Shell

Lets Put OpenAI to the Test with MQL

DEMO

Wrap-Up

Related

Getting Started With MongoDB and C++

Resumable Initial Sync in MongoDB 4.4

The Top 4 Reasons Why You Should Use MongoDB

How to Migrate Your Node.js App from SQL to MongoDB

Table of Contents

1	$ mkdir openai-test
2	$ cd openai-test
3	Openai-test $ npm i openai-api

1	Q: What is the query syntax?
2	A: db.collection.find(<filter>, <projection> , <options>)
3	Q: Query users collection for username with value "boy"
4	A: db.users.find({"username" : "boy"})
5	Q: Query users collection for username with value "girl"A: db.users.find({"username" : "girl"})
6	Q: Query users collection for username with age bigger than 16
7	A: db.users.find({"age" : {$gt : 16}})n;
8	Q: Query author collection for username with value "boy"
9	A: db.authors.find({"username" : "boy"})
10	Q:Query author collection for age lower than 7
11	A: db.authors.find({"age" : {$lt : 7}});
12
13	Q:insert a json document into collection authors with username equal to "girl"
14	A: db.authors.insert({"username" : "girl"}, {"age" : 10, "gender" : "female"})
15	Q: insert many documents into users collections
16	A: db.users.insertMany([{username : "john doe"}, {username : "joe doe"}]);
17	Q: The following aggregation uses $geoNear to find documents with a location at most 2 meters from the center [ -73.99279 , 40.719296 ] and a category equal to Parks.
18	A: db.places.aggregate([{ $geoNear: {near: { type: "Point", coordinates: [ -73.99279 , 40.719296 ] },distanceField: "dist.calculated",maxDistance: 2, query: { category: "Parks" },includeLocs: "dist.location", spherical: true}}])
19	Q: How to build atlas $search text query?
20	A: db.collection.aggregate({$search: {"index": <index name> "text": {"query": "<search-string>","path": "<field-to-search>", "fuzzy": <options>,"score": <options> } }})
21
22	Q: What is the aggregate syntax?
23	A: db.collection.aggregate([<stage1>,<stage2>,<stage3>], <projection> , <options>);
24	Q: aggregate users collection to calculate salary sum per user
25	A: db.users.aggregate([{$group : { _id : "$username" , salary_sum : { $sum : "$salary" }}}]);
26	Q: aggregate person collection to calculate salary sum per person
27	A: db.persons.aggregate([{$group : { _id : "$person" , salary_sum : { $sum : "$salary" }}}]);
28	Q: Lookup users and orders collection
29	A: db.users.aggregate([{$lookup : {from: 'orders', localField : "_id", foreignField : "userId", as : "result" }} ]);
30
31	Q: What is the update syntax?
32	A:db.collection.update(query, update, options)
33	Q: How to edit collection sports where sportname is 'football' and match is 'england vs portugal' to score of '3-3' and date to current date?
34	A: db.sports.update({ sportname: "football", match: "england vs portugal"} , {$set : {score: "3-3" , date : new Date()}} })
35	Q: Query and atomically update collection zoo where animal is "bear" with a counter increment on eat field, if the data does not exist user upsert
36	A: db.zoo.findOneAndUpdate({animal : "bear"}, {$inc: { eat : 1 }} , {upsert : true})

1	mongosh "mongodb+srv://<u>:<p>@<atlas-uri>/sample_restaurants"
2	Using Mongosh : X.X.X
3	Using MongoDB: X.X.X
4
5	For mongosh info see: https://docs.mongodb.com/mongodb-shell/
6
7	ATLAS atlas-ugld61-shard-0 [primary]> use sample_restaurants;

1	async function textToMql(query){
2
3	const OpenAI = require('openai-api');
4	const openai-client = new OpenAI("<YOUR-OPENAI-API-KEY>");
5
6	const fs = require('fs');
7
8	var data = await fs.promises.readFile('AI-input.txt', 'utf8');
9
10	const learningPath = data;
11
12	var aiInput = learningPath + "Q:" + query + "\nA:";
13
14	const gptResponse = await openai-client.complete({
15	engine: 'davinci',
16	prompt: aiInput,
17	"temperature": 0.3,
18	"max_tokens": 400,
19	"top_p": 1,
20	"frequency_penalty": 0.2,
21	"presence_penalty": 0,
22	"stop": ["\n"]
23	});
24
25	console.log(gptResponse.data.choices[0].text);
26	}

1	const gptResponse = await openai.complete({
2	engine: 'davinci',
3	prompt: aiInput,
4	"temperature": 0.3,
5	"max_tokens": 400,
6	"top_p": 1,
7	"frequency_penalty": 0.2,
8	"presence_penalty": 0,
9	"stop": ["\n"]
10	});

1	Atlas atlas-ugld61-shard-0 [primary] sample_restaurants> textToMql("query all restaurants where cuisine is American and name starts with 'Ri'")
2	db.restaurants.find({cuisine : "American", name : /^Ri/})
3
4	Atlas atlas-ugld61-shard-0 [primary] sample_restaurants> db.restaurants.find({cuisine : "American", name : /^Ri/})
5	[
6	{
7	_id: ObjectId("5eb3d668b31de5d588f4292a"),
8	address: {
9	building: '2780',
10	coord: [ -73.98241999999999, 40.579505 ],
11	street: 'Stillwell Avenue',
12	zipcode: '11224'
13	},
14	borough: 'Brooklyn',
15	cuisine: 'American',
16	grades: [
17	{
18	date: ISODate("2014-06-10T00:00:00.000Z"),
19	grade: 'A',
20	score: 5
21	},
22	{
23	date: ISODate("2013-06-05T00:00:00.000Z"),
24	grade: 'A',
25	score: 7
26	},
27	{
28	date: ISODate("2012-04-13T00:00:00.000Z"),
29	grade: 'A',
30	score: 12
31	},
32	{
33	date: ISODate("2011-10-12T00:00:00.000Z"),
34	grade: 'A',
35	score: 12
36	}
37	],
38	name: 'Riviera Caterer',
39	restaurant_id: '40356018'
40	}
41	...

1	Atlas atlas-ugld61-shard-0 [primary] sample_restaurants> textToMql("Generate an insert many command with random fruit names and their weight")
2	db.fruits.insertMany([{name: "apple", weight: 10}, {name: "banana", weight: 5}, {name: "grapes", weight: 15}])
3	Atlas atlas-ugld61-shard-0 [primary]sample_restaurants> db.fruits.insertMany([{name: "apple", weight: 10}, {name: "banana", weight: 5}, {name: "grapes", weight: 15}])
4	{
5	acknowledged: true,
6	insertedIds: {
7	'0': ObjectId("60e55621dc4197f07a26f5e1"),
8	'1': ObjectId("60e55621dc4197f07a26f5e2"),
9	'2': ObjectId("60e55621dc4197f07a26f5e3")
10	}
11	}

1	Atlas atlas-ugld61-shard-0 [primary] sample_restaurants> use sample_mflix;
2	Atlas atlas-ugld61-shard-0 [primary] sample_mflix> textToMql("Aggregate the count of movies per year (sum : 1) on collection movies")
3	db.movies.aggregate([{$group : { _id : "$year", count : { $sum : 1 }}}]);
4
5	Atlas atlas-ugld61-shard-0 [primary] sample_mflix> db.movies.aggregate([{$group : { _id : "$year", count : { $sum : 1 }}}]);
6	[
7	{ _id: 1967, count: 107 },
8	{ _id: 1986, count: 206 },
9	{ _id: '2006è2012', count: 2 },
10	{ _id: 2004, count: 741 },
11	{ _id: 1918, count: 1 },
12	{ _id: 1991, count: 252 },
13	{ _id: 1968, count: 112 },
14	{ _id: 1990, count: 244 },
15	{ _id: 1933, count: 27 },
16	{ _id: 1997, count: 458 },
17	{ _id: 1957, count: 89 },
18	{ _id: 1931, count: 24 },
19	{ _id: 1925, count: 13 },
20	{ _id: 1948, count: 70 },
21	{ _id: 1922, count: 7 },
22	{ _id: '2005è', count: 2 },
23	{ _id: 1975, count: 112 },
24	{ _id: 1999, count: 542 },
25	{ _id: 2002, count: 655 },
26	{ _id: 2015, count: 484 }
27	]