Generating MQL Shell Commands Using OpenAI and New mongosh Shell
Rate this article
OpenAI is a fascinating and growing AI platform sponsored by Microsoft, allowing you to digest text cleverly to produce AI content with stunning results considering how small the “learning data set” you actually provide is.
MongoDB’s Query Language (MQL) is an intuitive language for developers to interact with MongoDB Documents. For this reason, I wanted to put OpenAI to the test of quickly learning the MongoDB language and using its overall knowledge to build queries from simple sentences. The results were more than satisfying to me. Github is already working on a project called Github copilot which uses the same OpenAI engine to code.
In this article, I will show you my experiment, including the game-changing capabilities of the new MongoDB Shell (
mongosh
) which can extend scripting with npm modules integrations.OpenAI is a unique project aiming to provide an API for many AI tasks built mostly on Natural Language Processing today. You can read more about their projects in this blog.
If you want to use OpenAI, you will need to get a trial API key first by joining the waitlist on their main page. Once you are approved to get an API key, you will be granted about $18 for three months of testing. Each call in OpenAI is billed and this is something to consider when using in production. For our purposes, $18 is more than enough to test the most expensive engine named “davinci.”
Once you get the API key, you can use various clients to run their AI API from your script/application.
First, we need to install the new shell, if you haven’t done it so far. On my Mac laptop, I just issued:
1 brew install mongosh
Windows users should download the MSI installer from our download page and follow the Windows instructions.
Once my mongosh is ready, I can start using it, but before I do so, let’s install OpenAI JS, which we will import in the shell later on:
1 $ mkdir openai-test 2 $ cd openai-test 3 Openai-test $ npm i openai-api
I’ve decided to use the Questions and Answers pattern, in the form of
Q: <Question>
and A: <Answer>
, provided to the text to command completion API to provide the learning material about MongoDB queries for the AI engine. To better feed it, I placed the training questions and answers in a file called AI-input.txt
and its content:1 Q: What is the query syntax? 2 A: db.collection.find(<filter>, <projection> , <options>) 3 Q: Query users collection for username with value "boy" 4 A: db.users.find({"username" : "boy"}) 5 Q: Query users collection for username with value "girl"A: db.users.find({"username" : "girl"}) 6 Q: Query users collection for username with age bigger than 16 7 A: db.users.find({"age" : {$gt : 16}})n; 8 Q: Query author collection for username with value "boy" 9 A: db.authors.find({"username" : "boy"}) 10 Q:Query author collection for age lower than 7 11 A: db.authors.find({"age" : {$lt : 7}}); 12 13 Q:insert a json document into collection authors with username equal to "girl" 14 A: db.authors.insert({"username" : "girl"}, {"age" : 10, "gender" : "female"}) 15 Q: insert many documents into users collections 16 A: db.users.insertMany([{username : "john doe"}, {username : "joe doe"}]); 17 Q: The following aggregation uses $geoNear to find documents with a location at most 2 meters from the center [ -73.99279 , 40.719296 ] and a category equal to Parks. 18 A: db.places.aggregate([{ $geoNear: {near: { type: "Point", coordinates: [ -73.99279 , 40.719296 ] },distanceField: "dist.calculated",maxDistance: 2, query: { category: "Parks" },includeLocs: "dist.location", spherical: true}}]) 19 Q: How to build atlas $search text query? 20 A: db.collection.aggregate({$search: {"index": <index name> "text": {"query": "<search-string>","path": "<field-to-search>", "fuzzy": <options>,"score": <options> } }}) 21 22 Q: What is the aggregate syntax? 23 A: db.collection.aggregate([<stage1>,<stage2>,<stage3>], <projection> , <options>); 24 Q: aggregate users collection to calculate salary sum per user 25 A: db.users.aggregate([{$group : { _id : "$username" , salary_sum : { $sum : "$salary" }}}]); 26 Q: aggregate person collection to calculate salary sum per person 27 A: db.persons.aggregate([{$group : { _id : "$person" , salary_sum : { $sum : "$salary" }}}]); 28 Q: Lookup users and orders collection 29 A: db.users.aggregate([{$lookup : {from: 'orders', localField : "_id", foreignField : "userId", as : "result" }} ]); 30 31 Q: What is the update syntax? 32 A:db.collection.update(query, update, options) 33 Q: How to edit collection sports where sportname is 'football' and match is 'england vs portugal' to score of '3-3' and date to current date? 34 A: db.sports.update({ sportname: "football", match: "england vs portugal"} , {$set : {score: "3-3" , date : new Date()}} }) 35 Q: Query and atomically update collection zoo where animal is "bear" with a counter increment on eat field, if the data does not exist user upsert 36 A: db.zoo.findOneAndUpdate({animal : "bear"}, {$inc: { eat : 1 }} , {upsert : true})
We will use this file later in our code.
This way, the completion will be based on a similar pattern.
MongoDB Atlas, the database-as-a-platform service, is a great way to have a running cluster in seconds with a sample dataset already there for our test. To prepare it, please use the following steps:
- Create an Atlas account (if you don’t have one already) and use/start a cluster. For detailed steps, follow this documentation.
Use the copied connection string, providing it to the
mongosh
binary to connect to the pre-populated Atlas cluster with sample data. Then, switch to sample_restaurants
database.1 mongosh "mongodb+srv://<u>:<p>@<atlas-uri>/sample_restaurants" 2 Using Mongosh : X.X.X 3 Using MongoDB: X.X.X 4 5 For mongosh info see: https://docs.mongodb.com/mongodb-shell/ 6 7 ATLAS atlas-ugld61-shard-0 [primary]> use sample_restaurants;
Now, we can build our
textToMql
function by pasting it into the mongosh
. The function will receive a text sentence, use our generated OpenAI API key, and will try to return the best MQL command for it:1 async function textToMql(query){ 2 3 const OpenAI = require('openai-api'); 4 const openai-client = new OpenAI("<YOUR-OPENAI-API-KEY>"); 5 6 const fs = require('fs'); 7 8 var data = await fs.promises.readFile('AI-input.txt', 'utf8'); 9 10 const learningPath = data; 11 12 var aiInput = learningPath + "Q:" + query + "\nA:"; 13 14 const gptResponse = await openai-client.complete({ 15 engine: 'davinci', 16 prompt: aiInput, 17 "temperature": 0.3, 18 "max_tokens": 400, 19 "top_p": 1, 20 "frequency_penalty": 0.2, 21 "presence_penalty": 0, 22 "stop": ["\n"] 23 }); 24 25 console.log(gptResponse.data.choices[0].text); 26 }
In the above function, we first load the OpenAI npm module and initiate a client with the relevant API key from OpenAI.
1 const OpenAI = require('openai-api'); 2 const openai-client = new OpenAI("<YOUR-OPENAI-API-KEY>"); 3 4 const fs = require('fs');
The new shell allows us to import built-in and external modules to produce an unlimited flexibility with our scripts.
Then, we read the learning data from our
AI-input.txt
file. Finally we add our Q: <query>
input to the end followed by the A:
value which tells the engine we expect an answer based on the provided learningPath and our query.This data will go over to an OpenAI API call:
1 const gptResponse = await openai.complete({ 2 engine: 'davinci', 3 prompt: aiInput, 4 "temperature": 0.3, 5 "max_tokens": 400, 6 "top_p": 1, 7 "frequency_penalty": 0.2, 8 "presence_penalty": 0, 9 "stop": ["\n"] 10 });
The call performs a completion API and gets the entire initial text as a
prompt
and receives some additional parameters, which I will elaborate on:engine
: OpenAI supports a few AI engines which differ in quality and purpose as a tradeoff for pricing. The “davinci” engine is the most sophisticated one, according to OpenAI, and therefore is the most expensive one in terms of billing consumption.temperature
: How creative will the AI be compared to the input we gave it? It can be between 0-1. 0.3 felt like a down-to-earth value, but you can play with it.Max_tokens
: Describes the amount of data that will be returned.Stop
: List of characters that will stop the engine from producing further content. Since we need to produce MQL statements, it will be one line based and “\n” is a stop character.
Once the content is returned, we parse the returned JSON and print it with
console.log
.Once we have our function in place, we can try to produce a simple query to test it:
1 Atlas atlas-ugld61-shard-0 [primary] sample_restaurants> textToMql("query all restaurants where cuisine is American and name starts with 'Ri'") 2 db.restaurants.find({cuisine : "American", name : /^Ri/}) 3 4 Atlas atlas-ugld61-shard-0 [primary] sample_restaurants> db.restaurants.find({cuisine : "American", name : /^Ri/}) 5 [ 6 { 7 _id: ObjectId("5eb3d668b31de5d588f4292a"), 8 address: { 9 building: '2780', 10 coord: [ -73.98241999999999, 40.579505 ], 11 street: 'Stillwell Avenue', 12 zipcode: '11224' 13 }, 14 borough: 'Brooklyn', 15 cuisine: 'American', 16 grades: [ 17 { 18 date: ISODate("2014-06-10T00:00:00.000Z"), 19 grade: 'A', 20 score: 5 21 }, 22 { 23 date: ISODate("2013-06-05T00:00:00.000Z"), 24 grade: 'A', 25 score: 7 26 }, 27 { 28 date: ISODate("2012-04-13T00:00:00.000Z"), 29 grade: 'A', 30 score: 12 31 }, 32 { 33 date: ISODate("2011-10-12T00:00:00.000Z"), 34 grade: 'A', 35 score: 12 36 } 37 ], 38 name: 'Riviera Caterer', 39 restaurant_id: '40356018' 40 } 41 ...
Nice! We never taught the engine about the
restaurants
collection or how to filter with regex operators but it still made the correct AI decisions.Let's do something more creative.
1 Atlas atlas-ugld61-shard-0 [primary] sample_restaurants> textToMql("Generate an insert many command with random fruit names and their weight") 2 db.fruits.insertMany([{name: "apple", weight: 10}, {name: "banana", weight: 5}, {name: "grapes", weight: 15}]) 3 Atlas atlas-ugld61-shard-0 [primary]sample_restaurants> db.fruits.insertMany([{name: "apple", weight: 10}, {name: "banana", weight: 5}, {name: "grapes", weight: 15}]) 4 { 5 acknowledged: true, 6 insertedIds: { 7 '0': ObjectId("60e55621dc4197f07a26f5e1"), 8 '1': ObjectId("60e55621dc4197f07a26f5e2"), 9 '2': ObjectId("60e55621dc4197f07a26f5e3") 10 } 11 }
1 Atlas atlas-ugld61-shard-0 [primary] sample_restaurants> use sample_mflix; 2 Atlas atlas-ugld61-shard-0 [primary] sample_mflix> textToMql("Aggregate the count of movies per year (sum : 1) on collection movies") 3 db.movies.aggregate([{$group : { _id : "$year", count : { $sum : 1 }}}]); 4 5 Atlas atlas-ugld61-shard-0 [primary] sample_mflix> db.movies.aggregate([{$group : { _id : "$year", count : { $sum : 1 }}}]); 6 [ 7 { _id: 1967, count: 107 }, 8 { _id: 1986, count: 206 }, 9 { _id: '2006è2012', count: 2 }, 10 { _id: 2004, count: 741 }, 11 { _id: 1918, count: 1 }, 12 { _id: 1991, count: 252 }, 13 { _id: 1968, count: 112 }, 14 { _id: 1990, count: 244 }, 15 { _id: 1933, count: 27 }, 16 { _id: 1997, count: 458 }, 17 { _id: 1957, count: 89 }, 18 { _id: 1931, count: 24 }, 19 { _id: 1925, count: 13 }, 20 { _id: 1948, count: 70 }, 21 { _id: 1922, count: 7 }, 22 { _id: '2005è', count: 2 }, 23 { _id: 1975, count: 112 }, 24 { _id: 1999, count: 542 }, 25 { _id: 2002, count: 655 }, 26 { _id: 2015, count: 484 } 27 ]
Now that is the AI power of MongoDB pipelines!
MongoDB's new shell allows us to script with enormous power like never before by utilizing npm external packages. Together with the power of OpenAI sophisticated AI patterns, we were able to teach the shell how to prompt text to accurate complex MongoDB commands, and with further learning and tuning, we can probably get much better results.
Try this today using the new MongoDB shell.