Using Golang for AI

Jorge D. Ortiz-Fuentes • 19 min read • Published Nov 07, 2024 • Updated Nov 07, 2024
AWS • AI • Atlas • Vector Search • Go
AI has opened the door to solving problems that a decade ago seemed almost unapproachable by computers or, at the very least, excruciatingly complex. Self-driving cars, for example, must be fully aware of the environment and its changes to make decisions on what to do next. That means gathering data from all the available sensors, which include video cameras, and "understanding" the information they provide.
Neural networks are a great fit for this and other problems that require "interpretation" or "understanding" of the complex inputs. A neural network is a computational model that takes inputs and transforms them into outputs. But before we can use a neural network, we have to train it. A set of inputs and their expected matching outputs, known as the training set, is used to obtain the values of the network parameters that allow it to respond as expected to the inputs of the training set, and also to other inputs that didn't belong to the training set. This results in the model, i.e., the layout of the nodes and layers and the parameters obtained after training. Then, we can use the model with new inputs to provide outputs, and that is called inference.
In this article, we are going to find an interesting problem to solve that requires AI, explain how the solution works behind the curtains, and write a back end from scratch that implements the solution we have defined. So don't blink while reading if you don't want to miss any details.

A real-world problem

If you frequently brag to your friends about looking like Al Pacino, or that you are a doppelganger of Michael J. Fox, you would be better off if an artificial intelligence corroborated your claim. Let's not forget that some people trust AI more than other people or even experts. And if there is no such AI app, we can build it ourselves, write it in Golang, and have fun in the process.
The app that we are going to build throughout this article is going to accept a picture from a web page, taken with a webcam, and find the three celebrities that we resemble most closely (among the ones we have previously selected). This is an interesting problem to solve and something that would be really challenging to do without AI. So, how should we do that?

Top-level design of the solution

We should start by deciding that this is going to be a web application, with the work split between a front-end app and a back end. The front end will take the picture, make a request with it to the back end, and display the results it gets back. The back end will do the heavy lifting. Sorry for not elaborating more on the front-end app, but we are busy writing a back-end app in Golang (😄).
The back end will be responsible for all the tasks. Some are quite mundane, like offering an HTTP endpoint or parsing the arguments received through it. But we also have to implement the functionality required for finding out which of the available images of celebrities look most similar to the one provided by the front end and returning an explanation of how they are similar to it. We are going to use two AI models to achieve this functionality: one to find the celebrity matches and the other to explain the similarities.

The AI models

The first model that we will be using here will receive the image as the input and provide a vector of characteristics. AI-bros (😜) call this vector the embedding, and it encodes characteristics in a way that is not directly understandable by humans. The embedding for the image is a vector of floating-point numbers in the range (-1.0, 1.0). We will be using 1024 as the size for this vector – that is enough for encoding many characteristics of the image. A particular characteristic of your picture, like the way you smile or the shape of your ears, might be encoded in a given position of the vector or as a combination of several, but we won't know what a given vector or any of its elements describes. Instead, what we can do with the vectors is compare them and find the closest ones, which correspond to the images that are the best match. I will get into the details of how this is done below.
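To make "closeness" between embeddings a bit more tangible, here is a minimal sketch (not part of the app we are building, since Atlas Vector Search will do this work for us) of how the similarity of two embedding vectors could be measured with cosine similarity; the function name is just illustrative:
     // cosineSimilarity returns a value close to 1.0 for vectors pointing in the
     // same direction and close to -1.0 for opposite ones. It needs the standard
     // library "math" package and assumes both vectors have the same, non-zero length.
     func cosineSimilarity(a, b []float64) float64 {
         var dot, normA, normB float64
         for i := range a {
             dot += a[i] * b[i]
             normA += a[i] * a[i]
             normB += b[i] * b[i]
         }
         return dot / (math.Sqrt(normA) * math.Sqrt(normB))
     }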
Besides getting the best celebrity matches, we would like to understand what makes them so. Looking alike is a very subjective matter, and some explanation would help the users understand why they were provided with those matches instead of others that, in their eyes, are almost their twinsies. Thus, the second model acts in a very different way from the first one. We will provide it with the image of the user and the three selected ones and ask it for an explanation of how they are similar. I have been a little inaccurate when talking about the inputs. The complete story is that we will provide the model with the images and a piece of text describing what we want. This text where we tell the model what we want is known as the prompt and, since we aren't limiting our conversation to just text but instead use both written communication and images, this is called a multimodal prompt.
We use an AI model to extract the characteristics of the image, because trying to establish similarities between two images by comparing each of their pixels is clearly not the right approach. Assuming that we start with two pictures of the same person, a different background, different lighting, or even a slightly different position may cause a bigger difference than that of two pictures of different people with exactly the same background, lighting, and position.
But when we use the image of the user as input for the first model, the response is just the embedding for that image. It doesn't tell us anything about which other images are similar. Then, how do we use this data to find the best matches for the given image?
For starters, we have gathered some celebrity pictures that are the ones we will be using to compare to the user image. We have run each of them through the model in advance to get their corresponding embeddings. We have stored them in a MongoDB Atlas cluster, in a single collection where each document contains a picture with its corresponding embedding and any other relevant data. Thanks to the way MongoDB organizes the data around a document model, we can store the embedding as a proper array in a single attribute. And we can use vector search to find the closest images to a given one. This will allow us to find the closest neighbors in the n-dimensional space used for the embedding.
If n-dimensional space sounds in your head similar to "flux capacitor," "bantha poodoo," or "beam me up," let me explain myself. The most common searches in a database are the ones that compare the value of one of the attributes (a field of a document or a column of a record in a table) with the desired value. However, this is not what happens when you are searching for "restaurants nearby" or the "closest ATM." When you do geospatial queries, you are trying to find the documents whose coordinates are closer to your target, i.e., you are looking for the closest neighbors in a bi-dimensional space (the one formed by latitude and longitude). And the way it works is not by comparing one of the coordinates first and then the other. Instead, it uses geometry and ways to clip the space to obtain the closest candidates in an efficient way. The search in an n-dimensional space is very similar, but the data points have n components instead of just two. In our case, each data point has 1,024 coordinates, so don't try to imagine the 1,024-dimensional space in your head, or get ready for a really bad headache.
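One practical consequence: for these n-dimensional queries to work, the collection holding the embeddings needs an Atlas Vector Search index on the embedding field. As a rough sketch, and assuming the index name (vector_index), field name (embeddings), and cosine similarity that the query further down will rely on, the index definition would look something like this:
     {
       "fields": [
         {
           "type": "vector",
           "path": "embeddings",
           "numDimensions": 1024,
           "similarity": "cosine"
         }
       ]
     }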

Putting it all together

Having described all the parts that constitute the solution, let's see how they work together to produce the desired results.
  1. First, the user navigates with their web browser to the URL of the front-end page.
  2. The front end, assuming the right permissions are granted, captures the image of the user and sends it to the HTTP endpoint of the back end in a JSON request that contains the image in base64 encoded format, preceded by some metadata.
  3. The back end receives the request, deserializes it, and standardizes it. We need to use images with a resolution of 800x600, in JPEG format with a good quality factor, and base64 encoded.
  4. It then sends the image to AWS Bedrock using the amazon.titan-embed-image-v1 model to get the embedding.
  5. The vector returned by Bedrock is what the back end uses in a query to MongoDB Atlas. Using vector search, MongoDB finds the closest matches among the celebrities available in a pre-populated collection and returns them.
  6. The last task that the back end needs to solve is explaining why the images are similar. So we put the user image and the closest matches in a data structure, together with the textual description of what we want, and pass them on to AWS Bedrock again. However, this time we will be using a different model, namely anthropic.claude-3-sonnet-20240229-v1:0, which provides a message API. This model will respond with the textual explanation.
  7. Finally, we put the data back together into the structure that is used for the response to the front end.
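To make that contract a bit more concrete, the request and response bodies exchanged between the front end and the back end end up looking roughly like this (abbreviated, illustrative values; the field names match the structs defined later in the article):
     // Request from the front end: base64 image data preceded by data URL metadata
     { "img": "data:image/jpeg;base64,/9j/4AAQSkZJRg..." }

     // Response from the back end
     { "description": "The first image is similar to...", "images": ["<base64 JPEG>", "<base64 JPEG>", "<base64 JPEG>"] }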

"I want to play with this"

We knew you would. You have several ways to do it. We have both the front end and back end running in the cloud. Open your browser and find out who your doppelgangers are.

Show me the code

I can hear you saying, "Code or it didn't happen," so let's start typing. I will start by taking care of the infrastructure of the back end.
For the sake of simplicity, we are going to put all the code in a single file. Obviously, you could improve the code's maintainability and reusability by using more files and even packages to organize the code.

The HTTP back end

In this section, we will create an HTTP server with a single HTTP handler. From this handler, we will be sending the requests to AWS Bedrock and MongoDB Atlas.
  1. First things first. Let's initialize the module that we are going to be using:
     go mod init github.com/jdortiz/goai
  2. And in the same directory we create a file called server.go. This file will belong to the main package.
     package main

     func main() {
     }
  3. We are going to define a struct to hold the dependencies of our HTTP handler:
     type App struct {
     }
  4. This type will also have the methods to control the application. Let's start by creating one for launching the HTTP server and import the two required packages.
     func (app App) Start() error {
         const serverAddr string = "0.0.0.0:3001"
         log.Printf("Starting HTTP server: %s\n", serverAddr)

         return http.ListenAndServe(serverAddr, nil)
     }
  5. This method can now be used in the main function, creating an instance of the type first.
     app := App{}
     log.Fatal(app.Start())
  6. Run it to verify that it works so far.
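One way to run it at this point, assuming everything so far lives in the module we just initialized:
     go run .
If everything compiles, you should see the "Starting HTTP server: 0.0.0.0:3001" log line and the process should stay running.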

Add the endpoint

We now need to be able to get the request on the expected endpoint, which is where all the magic will take place.
  1. We are going to implement this handler as a method of the App type. Since this method doesn't need to be accessed from outside the package, we don't export it (the name starts with a lowercase letter).
     func (app App) imageSearch(w http.ResponseWriter, r *http.Request) {
         log.Println("Image search invoked")
     }
  2. We use this handler in the router and associate it with the HTTP POST verb.
     http.HandleFunc("POST /api/search", app.imageSearch)
  3. We run it again and test it from the command line.
     curl -IX POST localhost:3001/api/search
  4. This is just the beginning of a beautiful program. How do you like it so far?

Get the image and standardize

  1. The request is sent in JSON format and we are going to use Go's JSON decoder to parse it into this struct.
     type CelebMatchRequest struct {
         Image64 string `json:"img"`
     }
  2. Then, we decode it.
     // Deserialize request
     var imgReq CelebMatchRequest
     err := json.NewDecoder(r.Body).Decode(&imgReq)
     if err != nil {
         log.Println("ERR: parsing json data", err)
         http.Error(w, err.Error(), http.StatusBadRequest)
         return
     }
  3. And separate the metadata:
     // Split image into metadata and data
     imgParts := strings.Split(imgReq.Image64, ",")
     parts := len(imgParts)
     if parts != 2 {
         log.Printf("ERR: expecting metadata and data. Got %d parts\n", parts)
         http.Error(w, fmt.Sprintf("expecting metadata and data. Got %d parts", parts), http.StatusBadRequest)
         return
     }
  4. The next step is to standardize the image. We could add this task directly in the handler, but to maintain readability, we are going to create a private function.
     // Receives a base64 encoded image
     func standardizeImage(imageB64 string) (*string, error) {
     }
  5. The image can then be decoded from base64 and, in turn, as a JPEG.
     // Get the base64 decoder as an io.Reader and use it to decode the image from the data
     b64Decoder := base64.NewDecoder(base64.StdEncoding, strings.NewReader(imageB64))
     origImg, _, err := image.Decode(b64Decoder)
     if err != nil {
         return nil, fmt.Errorf("standardizing image failed: %w", err)
     }
  6. Go offers some basic image editing functionality that we are going to use to resize the image to 800x600.
     // Resize to 800x600
     resizedImg := image.NewRGBA(image.Rect(0, 0, 800, 600))
     draw.NearestNeighbor.Scale(resizedImg, resizedImg.Rect, origImg, origImg.Bounds(), draw.Over, nil)
  7. And encode it back as a JPEG, with a good quality factor, and to base64 again.
     // Re-encode the image to JPEG format with Q=85
     var jpegToSend bytes.Buffer
     // Encode the image into the buffer
     if err = jpeg.Encode(&jpegToSend, resizedImg, &jpeg.Options{Quality: 85}); err != nil {
         return nil, fmt.Errorf("standardizing image failed: %w", err)
     }
     // Re-encode to base64
     stdImgB64 := base64.StdEncoding.EncodeToString(jpegToSend.Bytes())
     return &stdImgB64, nil
  8. This function takes care of all the necessary steps, so let's use it from the handler.
     // Decode image from base 64, resize image to 800x600 with Q=85, and re-encode to base64
     stdImage, err := standardizeImage(imgParts[1])
     if err != nil {
         log.Println("ERR:", err)
         http.Error(w, "Error standardizing image", http.StatusInternalServerError)
         return
     }
  9. Before we can compile, we have to add the module for image editing.
     go get golang.org/x/image/draw
  10. That was some data wrangling, but it wasn't that complicated. Get ready for the meat.

Get the embedding

At this point, we have the image prepared to compute the embedding. Let's do that now using AWS Bedrock. You will need an active AWS account, and the models that we plan to use must be enabled in the Bedrock service.
  1. Before we compute the embedding, we would like to have the AWS configuration available, and we are going to use the AWS SDK for that.
     go get github.com/aws/aws-sdk-go-v2/config
     go get github.com/aws/aws-sdk-go-v2/service/bedrockruntime
  2. We add the configuration as a field of our application structure.
     config *aws.Config
  3. We initialize this configuration in a private function. This assumes that your credentials are stored in the canonical files (an example of the credentials file follows the code below).
     func connectToAWS(ctx context.Context) (*aws.Config, error) {
         const dfltRegion string = "us-east-1"
         const credAccount string = "your-account-name"
         // Load the Shared AWS Configuration (~/.aws/config)
         cfg, err := config.LoadDefaultConfig(ctx,
             config.WithSharedConfigProfile(credAccount), // this must be the name of the profile in ~/.aws/config and ~/.aws/credentials
             config.WithRegion(dfltRegion),
         )
         return &cfg, err
     }
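For reference, and with placeholder values only, the shared credentials file (~/.aws/credentials) that this code relies on typically contains a profile section whose name matches credAccount above:
     [your-account-name]
     aws_access_key_id = <your access key id>
     aws_secret_access_key = <your secret access key>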
  4. And we are going to initialize it in a new constructor for our App type.
     func NewApp(ctx context.Context) (*App, error) {
         cfg, err := connectToAWS(ctx)
         if err != nil {
             log.Println("ERR: Couldn't load default configuration. Have you set up your AWS account?", err)
             return nil, err
         }

         return &App{
             config: cfg,
         }, nil
     }
  5. We also want to have the connection to Bedrock available in the handler(s).
     bedrock *bedrockruntime.Client
  6. And initialize it in the constructor too.
     // Initialize bedrock client
     bedrockClient := bedrockruntime.NewFromConfig(*cfg)

     return &App{
         config:  cfg,
         bedrock: bedrockClient,
     }, nil
  7. We replace the initialization of the app to use the constructor.
     ctx := context.Background()
     app, err := NewApp(ctx)
     if err != nil {
         panic(err)
     }
  8. We create a new private method that will handle the communication with AWS Bedrock to compute the embedding. It will return the vector or an error if anything goes wrong.
     // Prepare request to titan-embed-img-v1
     func (app App) computeImageEmbedding(ctx context.Context, image string) ([]float64, error) {
     }
  9. As we mentioned before, the model we want to use to get the embedding is Titan, and we define a constant with its full name.
     const titanEmbedImgV1ModelId string = "amazon.titan-embed-image-v1"
  10. The request to this model has a predefined structure that we must use, so we define structures to accommodate the data.
     type EmbeddingConfig struct {
         OutputEmbeddingLength int `json:"outputEmbeddingLength"`
     }

     type BedrockRequest struct {
         InputImage      string          `json:"inputImage"`
         EmbeddingConfig EmbeddingConfig `json:"embeddingConfig"`
         InputText       *string         `json:"inputText,omitempty"`
     }
  11. And we have to put the data in to create the request, setting the vector size to 1024.
     // Prepare the request to bedrock
     payload := BedrockRequest{
         InputImage: image,
         EmbeddingConfig: EmbeddingConfig{
             OutputEmbeddingLength: 1024,
         },
         InputText: nil,
     }
  12. Bedrock uses a single interface to interact with different models that use different parameters and return objects with different fields. Hence, it uses a single attribute of its requests (Body) that contains the respective request for the model serialized as a stream of bytes. Let's do the serialization into a slice of bytes.
     bedrockBody, err := json.Marshal(payload)
     if err != nil {
         return nil, fmt.Errorf("failed to get embedding from bedrock: %w", err)
     }
  13. The slice of bytes goes into the actual request that we have to prepare for Bedrock.
     bedrockReq := bedrockruntime.InvokeModelInput{
         ModelId:     aws.String(titanEmbedImgV1ModelId),
         Body:        bedrockBody,
         ContentType: aws.String(contentTypeJson),
     }
  14. The content type constant that we have just used will be of use for some future requests, so we declare it as a global constant.
     const contentTypeJson = "application/json"
  15. Now that we have completed the request, we can invoke the model with it to make it infer the embedding.
     // Invoke model to obtain embedding for the image
     embeddingResp, err := app.bedrock.InvokeModel(ctx, &bedrockReq)
     if err != nil {
         return nil, fmt.Errorf("failed to get embedding from bedrock: %w", err)
     }
  16. The response from the model also comes in the Body field as a slice of bytes. We could deserialize the response with a new struct and then deserialize the field again. But instead, we are going to use a module that simplifies the task. It is called GJSON.
     go get github.com/tidwall/gjson
  17. GJSON can be used to extract the data from any part of a JSON document. We are interested in the slice of bytes returned in the Body field of the bedrockruntime.InvokeModelOutput response. We also have to convert that slice of bytes that contains the string representation of the embedding into an actual vector of floats and return the resulting vector.
     result := gjson.GetBytes(embeddingResp.Body, "embedding")
     var embedding []float64
     result.ForEach(func(key, value gjson.Result) bool {
         embedding = append(embedding, value.Float())
         return true
     })

     return embedding, nil
  18. With this method ready, we just have to invoke it from the handler.
     // Compute the embedding using titan-embed-image-v1
     embedding, err := app.computeImageEmbedding(r.Context(), *stdImage)
     if err != nil {
         log.Println("ERR:", err)
         http.Error(w, "Error computing embedding", http.StatusInternalServerError)
         return
     }
  19. You can now tell your friends that you have been using AI in one of your programs. Congrats!

Find the best matches

The embedding that we have received from Bedrock can be used to find the best matches among the celebrities in our database. If you want to test the code with your own database and demonstrate to your family-in-law that your kids are more similar to your side of the family, go to your photo library and choose some pictures from all the members. The more, the merrier. You can then run them through the previous code to obtain their corresponding embeddings and store them together in MongoDB Atlas –the free cluster will suffice– each one in a different document.
  1. Since we are going to query the database that contains the celebrity images (or your family's) and their embeddings, we should store the URI of the Atlas cluster, including the necessary credentials. We will keep a .env file with the connection data of your cluster, along the lines of the placeholder shown below (use your own URI, not this one).
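A purely illustrative placeholder for that .env file (the user, password, and cluster host are made up and must be replaced with your own):
     MONGODB_URI="mongodb+srv://<user>:<password>@<your-cluster>.mongodb.net/?retryWrites=true&w=majority"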
  2. There is a module that can help us get this configuration from either the file or the environment.
     go get github.com/joho/godotenv
  3. We load the module at the beginning of the main function.
     var uri string
     err := godotenv.Load()
     if err != nil {
         log.Fatal("Unable to load .env file")
     }
     if uri = os.Getenv("MONGODB_URI"); uri == "" {
         log.Fatal("You must set your 'MONGODB_URI' environment variable. See\n\t https://docs.mongodb.com/drivers/go/current/usage-examples/")
     }
  4. Now, we are going to add the MongoDB driver and we will be using version 2.0.
     go get go.mongodb.org/mongo-driver/v2
  5. We will establish a connection to the Atlas cluster and make it available to the HTTP handler. We start by adding another field to the App structure.
     client *mongo.Client
  6. And we initialize it in a private function.
     func newDBClient(uri string) (*mongo.Client, error) {
         // Use the SetServerAPIOptions() method to set the Stable API version to 1
         serverAPI := options.ServerAPI(options.ServerAPIVersion1)
         opts := options.Client().ApplyURI(uri).SetServerAPIOptions(serverAPI)
         // Create a new client and connect to the server
         client, err := mongo.Connect(opts)
         if err != nil {
             return nil, err
         }

         return client, nil
     }
  7. We add a URI parameter to the constructor.
     func NewApp(ctx context.Context, uri string) (*App, error) {
  8. And we pass the URI that we got from the .env file or from the environment onto the constructor.
     app, err := NewApp(ctx, uri)
  9. We use it in the constructor.
     client, err := newDBClient(uri)
     if err != nil {
         log.Println("ERR: connecting to MongoDB cluster:", err)
         return nil, err
     }

     return &App{
         client:  client,
         config:  cfg,
         bedrock: bedrockClient,
     }, nil
  10. We want to be sure that this client is closed properly when we finish, so we are going to create another method that will take care of that.
     func (app *App) Close() {
         if err := app.client.Disconnect(context.Background()); err != nil {
             panic(err)
         }
     }
  11. Using defer will ensure that this is run before closing the application. So we do it right after initializing the app.
     defer func() {
         app.Close()
     }()
  12. As we have done with the previous steps, we define a new private method to find the images in the database.
     func (app App) findSimilarImages(ctx context.Context, embedding []float64) ([]string, error) {
     }
  13. In this method, we obtain a reference to the collection that contains the documents with pictures and embeddings.
     // Get celeb image collection
     imgCollection := app.client.Database("celebrity_matcher").Collection("celeb_images")
  14. One of the nicest features of MongoDB is the ability to perform and refine searches through different steps: the aggregation pipeline. The outcome of one step is the input to the next one, and one of the possible steps is using vector search.
     // Aggregation pipeline stage to get the 3 closest images to the given embedding.
     vectorSchStage := bson.D{{"$vectorSearch", bson.D{
         {"index", "vector_index"},
         {"path", "embeddings"},
         {"queryVector", embedding},
         {"numCandidates", 15},
         {"limit", 3},
     }}}
  15. The second stage will take care of choosing only the relevant fields from the results. For that, we use projections.
     projectStage := bson.D{{"$project", bson.D{{"image", 1}}}}
  16. The aggregation pipeline is the result of putting those stages in an ordered list.
     pipeline := mongo.Pipeline{vectorSchStage, projectStage}
  17. And we use it to make the query that returns a cursor.
     // Make query
     imgCursor, err := imgCollection.Aggregate(ctx, pipeline)
     if err != nil {
         return nil, fmt.Errorf("failed to get similar images from the database: %w", err)
     }
  18. The cursor can be used to obtain the actual images.
     // Get all the results using the cursor
     similarImgs := []struct {
         Id    bson.ObjectID `bson:"_id,omitempty"`
         Image string        `bson:"image"`
     }{}
     if err = imgCursor.All(ctx, &similarImgs); err != nil {
         return nil, fmt.Errorf("failed to get similar images from the database: %w", err)
     }
  19. Finally, the images are standardized into the required format with the function that we created before, and returned.
     // Return just the standardized images in an array
     var images []string
     var stdImage *string
     for _, item := range similarImgs {
         stdImage, err = standardizeImage(item.Image)
         if err != nil {
             return nil, fmt.Errorf("failed to standardize similar images: %w", err)
         }
         images = append(images, *stdImage)
     }
     return images, nil
  20. With our function to get the best matches from the database ready, we can use it from our handler.
     // Find similar images using vector search in MongoDB
     images, err := app.findSimilarImages(r.Context(), embedding)
     if err != nil {
         log.Println("ERR:", err)
         http.Error(w, "Error getting similar images", http.StatusInternalServerError)
         return
     }
  21. Another step solved. Hooray!
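For reference, each document in the celeb_images collection is assumed to look roughly like this, with the image stored as base64 and the embedding as a 1,024-element array, matching the field names used by the pipeline above (abbreviated, illustrative values):
     {
       "_id": { "$oid": "..." },
       "image": "<base64-encoded JPEG>",
       "embeddings": [0.0123, -0.0456, ...]
     }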

Explain yourself

Getting images that look similar is interesting enough, but getting the system to explain why is even more appealing. We are going to use another model, Claude, and take advantage of its conversational and multimodal nature to obtain that explanation.
  1. We don't need any further dependencies in our app, because we already have a Bedrock client. Then, as in the previous steps, we are going to add a new private method that will take care of this functionality.
     // https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-anthropic-claude-messages.html
     func (app App) getImageSimilaritiesDescription(ctx context.Context, imgB64 string, similarImgB64 []string) (*string, error) {
     }
  2. Inside of it, we declare a constant with the name of the model that we will be using.
     const claude3SonnetV1ModelId string = "anthropic.claude-3-sonnet-20240229-v1:0"
  3. And the structures that we will use to interact with the message API.
     type ClaudeBodyMsgSource struct {
         Type      string  `json:"type"`
         MediaType *string `json:"media_type,omitempty"`
         Data      *string `json:"data,omitempty"`
     }
     type ClaudeBodyMsgContent struct {
         Type   string               `json:"type"`
         Source *ClaudeBodyMsgSource `json:"source,omitempty"`
         Text   *string              `json:"text,omitempty"`
     }
     type ClaudeBodyMsg struct {
         Role    string                 `json:"role"`
         Content []ClaudeBodyMsgContent `json:"content"`
     }
     type ClaudeBody struct {
         AnthropicVersion string          `json:"anthropic_version"`
         MaxTokens        int             `json:"max_tokens"`
         System           string          `json:"system"`
         Messages         []ClaudeBodyMsg `json:"messages"`
     }
  4. Then, we create an instance with the desired data.
     // Prepare the request to bedrock
     const mediaTypeImage = "image/jpeg"
     prompt := "Please let the user know how their first image is similar to the other 3 and which one is the most similar?"
     payload := ClaudeBody{
         AnthropicVersion: "bedrock-2023-05-31",
         MaxTokens:        1000,
         System:           "Please act as face comparison analyzer.",
         Messages: []ClaudeBodyMsg{
             {
                 Role: "user",
                 Content: []ClaudeBodyMsgContent{
                     {
                         Type: "image",
                         Source: &ClaudeBodyMsgSource{
                             Type:      "base64",
                             MediaType: aws.String(mediaTypeImage),
                             Data:      &imgB64,
                         },
                     },
                     {
                         Type: "image",
                         Source: &ClaudeBodyMsgSource{
                             Type:      "base64",
                             MediaType: aws.String(mediaTypeImage),
                             Data:      &similarImgB64[0],
                         },
                     },
                     {
                         Type: "image",
                         Source: &ClaudeBodyMsgSource{
                             Type:      "base64",
                             MediaType: aws.String(mediaTypeImage),
                             Data:      &similarImgB64[1],
                         },
                     },
                     {
                         Type: "image",
                         Source: &ClaudeBodyMsgSource{
                             Type:      "base64",
                             MediaType: aws.String(mediaTypeImage),
                             Data:      &similarImgB64[2],
                         },
                     },
                     {
                         Type: "text",
                         Text: &prompt,
                     },
                 },
             },
         },
     }
  5. As in the previous step where we worked with Bedrock, we are going to put all this data serialized in the Body field of the request.
     bedrockBody, err := json.Marshal(payload)
     if err != nil {
         return nil, fmt.Errorf("failed to get similarity description from bedrock: %w", err)
     }
     bedrockReq := bedrockruntime.InvokeModelInput{
         ModelId:     aws.String(claude3SonnetV1ModelId),
         Body:        bedrockBody,
         ContentType: aws.String(contentTypeJson),
         Accept:      aws.String(contentTypeJson),
     }
  6. We can use the magic wand and invoke the model.
     // Invoke the model with the request
     bedrockResp, err := app.bedrock.InvokeModel(ctx, &bedrockReq)
     if err != nil {
         return nil, fmt.Errorf("failed to get similarity description from bedrock: %w", err)
     }
  7. And we extract the explanation using GJSON as we did before, and return it.
     description := gjson.GetBytes(bedrockResp.Body, "content.0.text").String()

     return &description, nil
  8. Now, the handler can use this method and obtain the description explaining how the images look similar.
     description, err := app.getImageSimilaritiesDescription(r.Context(), *stdImage, images)
     if err != nil {
         log.Println("ERR: failed to describe similarities with images", err)
         http.Error(w, "Error describing similarities with images", http.StatusInternalServerError)
         return
     }
  9. And that is how you do magic. Well, actually, that is how you use inference from an AI model.
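For context, the deserialized Body of the Claude response contains, among other fields, a content array, which is why the code above reads the content.0.text path (abbreviated, illustrative values):
     {
       "content": [
         { "type": "text", "text": "All four photos show..." }
       ],
       ...
     }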

Return the results to the front end

Not much is left. We have to wrap the present and give it to the user. Let's finish triumphantly.
  1. Inside of the handler, we define the structure that will be used to serialize the response as JSON.
     type CelebMatchResponse struct {
         Description string   `json:"description"`
         Images      []string `json:"images"`
     }
  2. We use that structure to create an instance that contains the description and the images.
     response := CelebMatchResponse{
         Description: *description,
         Images:      images,
     }
  3. And write the response of the HTTP handler.
     jData, err := json.Marshal(response)
     if err != nil {
         // Don't kill the whole server on a serialization error; report it to the client instead
         log.Println("ERR: serializing json", err)
         http.Error(w, "Error serializing response", http.StatusInternalServerError)
         return
     }
     // Set response headers and return JSON
     w.Header().Set("Content-Type", contentTypeJson)
     w.Header().Set("Content-Length", strconv.Itoa(len(jData)))
     w.WriteHeader(http.StatusOK)
     w.Write(jData)
  4. Pat yourself on the back and celebrate loudly. You are officially riding the AI hype, and instead of some of the nonsense that you sometimes see, we have created something that is useful. No more discussions with your family-in-law. 😄

Conclusion

We have developed a fully operational back-end server that uses AI and vector search in MongoDB. We have used Golang and written all the code from scratch. I hope that this example illustrates some of the use cases of AI and how it can be used within a real application. And finally, you can confirm that you and that famous actress/actor are almost identical twinsies. I call it a day.
If you decide to give this a shot, be sure to let us know how it goes, over in the MongoDB Developer Community.
Stay curious. Hack your code. See you next time!