AI-Powered Playlist Generator: Crafting Custom Vibes With Deeplearning4j and MongoDB

Tim Kelly • 14 min read • Published Oct 25, 2024 • Updated Oct 25, 2024
What exactly is a "romcom coastal grandmother Wednesday afternoon"? How about a "sad girl wistful Friday evening"? Well, I don't entirely know, but Spotify seems to think that this is the kind of music I want to listen to. And they're right! By training on my listening habits, and how these fluctuate throughout the weeks and days, Spotify built the "daylist"—a custom playlist tailored exactly to my tastes, topped off with a funky little playlist name that captures the vibe of the playlist. But what if we want to work backward from there?
I want to be able to give my own funky little playlist name, and get a custom playlist that matches whatever obscure image I have in my head of what that is. Long gone are the days of manually scouring the music streaming platform of your choice to create "midwestern emo puppy love"—that will likely just be the discography of Car Seat Headrest because of the effort it takes to swap between the various artists that will only escape your mind the minute you're confronted with the never-fluid UI of playlist creation.
In this tutorial, we will use Deeplearning4j to import a model that we can use to embed our song lyrics. This will allow us to capture the semantic meaning of the songs. Deeplearning4j is an open-source deep learning framework built for the JVM (Java Virtual Machine), for training and deploying neural networks in Java. It has several submodules, like ND4J, which is like NumPy for Java and handles complex mathematical operations, and DataVec, which transforms raw data (like song lyrics) into tensors suitable for neural networks. Deeplearning4j also integrates with SameDiff, allowing the execution of complex computational graphs similar to TensorFlow or PyTorch. These tools let us embed song lyrics into vectors, capturing the "vibes" of each song, which we can then use with MongoDB Atlas to generate custom playlists based on the funky playlist name you provide.
We will then provide a playlist name that will be used to search the database for the most semantically similar songs, all with MongoDB Atlas Vector Search. There will be some limitations to this implementation. We won't be using the user listening history to tailor our results, or the audio files of the music to better capture the vibes of the songs. More importantly, we will be using just a generic, pre-trained model for embedding our data, which will limit the accuracy somewhat. That being said, Deeplearning4j allows you to import your own custom models from the likes of TensorFlow or Keras to be used in your applications. That just goes a bit beyond the scope of what we will be doing today.
The entire application is available in the GitHub repository.

Prerequisites

For this project, you'll need the following:
  • Java 11+ installed (I'll be using Java 21)
  • Maven version 3.9.6+
  • MongoDB Atlas with a cluster deployed
    • The full dataset exceeds the storage limit of an M0 cluster, our free tier. You can load only part of the dataset later and still follow along with the tutorial.
  • GloVe embeddings (glove.840B.300d.txt) available at GloVe: Global Vectors for Word Representation
    • store in src/main/resources
  • A CSV file of song lyrics (song_lyrics.csv) available from Kaggle: Genius Song Lyrics
    • store in src/main/resources

Setting up our POM

<dependencies>
    <!-- Spring Boot Starter for Web API -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <!-- ND4J: Core Numerical Processing -->
    <!-- ${nd4j.backend} must be defined under <properties> (for example, nd4j-native-platform for CPU) -->
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>${nd4j.backend}</artifactId>
        <version>1.0.0-M2.1</version>
    </dependency>

    <!-- DataVec for working with data -->
    <dependency>
        <groupId>org.datavec</groupId>
        <artifactId>datavec-api</artifactId>
        <version>1.0.0-M2.1</version>
    </dependency>

    <!-- DeepLearning4j Core -->
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-core</artifactId>
        <version>1.0.0-M2.1</version>
    </dependency>

    <!-- DeepLearning4j NLP for Text Processing -->
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-nlp</artifactId>
        <version>1.0.0-M2.1</version>
    </dependency>

    <!-- Apache Commons CSV -->
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-csv</artifactId>
        <version>1.9.0</version>
    </dependency>

    <!-- MongoDB Driver -->
    <dependency>
        <groupId>org.mongodb</groupId>
        <artifactId>mongodb-driver-sync</artifactId>
        <version>5.2.0</version>
    </dependency>

    <dependency>
        <groupId>org.mongodb</groupId>
        <artifactId>mongodb-driver-core</artifactId>
        <version>5.2.0</version>
    </dependency>

    <dependency>
        <groupId>org.mongodb</groupId>
        <artifactId>bson</artifactId>
        <version>5.2.0</version>
    </dependency>

    <!-- JUnit for testing -->
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter-api</artifactId>
        <version>5.7.0</version>
        <scope>test</scope>
    </dependency>
</dependencies>

What’s happening here?

  • Spring Boot Starter: This dependency sets up a Spring Boot web application with all the required web API components.
  • ND4J and DataVec: These are the core libraries from the Deeplearning4j ecosystem. ND4J handles numerical computations, and DataVec processes data for machine learning tasks.
  • Deeplearning4j Core and NLP: This provides us with the deep learning functionality and natural language processing (NLP) tools we'll use to embed song lyrics into vectors.
  • MongoDB Driver: This allows us to connect and interact with MongoDB from our Java application.
  • Apache Commons CSV: This is used to read song data from a CSV file for processing and storage.

Our data

This tutorial revolves around two models: Song and Playlist.

Song model

The Song model represents an individual song and includes fields for the song's title, artist, lyrics, and embedding vector. Feel free to modify this to store whatever data you need for your application. With MongoDB documents, your data is stored alongside your vectors:
import java.util.List;

public class Song {
    private String title;
    private String artist;
    private String lyrics;
    private List<Double> embedding;

    public Song(String title, String artist, String lyrics, List<Double> embedding) {
        this.title = title;
        this.artist = artist;
        this.lyrics = lyrics;
        this.embedding = embedding;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getArtist() {
        return artist;
    }

    public void setArtist(String artist) {
        this.artist = artist;
    }

    public String getLyrics() {
        return lyrics;
    }

    public void setLyrics(String lyrics) {
        this.lyrics = lyrics;
    }

    public List<Double> getEmbedding() {
        return embedding;
    }

    public void setEmbedding(List<Double> embedding) {
        this.embedding = embedding;
    }
}

Playlist model

The Playlist model consists of a playlist name and a list of Song objects:
import java.util.List;

public class Playlist {

    private String playlistName;
    private List<Song> songs;

    public Playlist() {
    }

    public Playlist(String playlistName, List<Song> songs) {
        this.playlistName = playlistName;
        this.songs = songs;
    }

    public String getPlaylistName() {
        return playlistName;
    }

    public void setPlaylistName(String playlistName) {
        this.playlistName = playlistName;
    }

    public List<Song> getSongs() {
        return songs;
    }

    public void setSongs(List<Song> songs) {
        this.songs = songs;
    }
}
These models will be used to structure the data we store in MongoDB and fetch for creating our funky little playlists.

Embedding data

In this section, we'll focus on how to convert song lyrics into vector embeddings using GloVe (Global Vectors for Word Representation) embeddings. These embeddings capture the semantic meaning of each word in the lyrics, allowing us to compare songs based on their lyrical content.
To do this, we'll create a service package, and the EmbeddingService class, which will:
  1. Load the pre-trained GloVe model: This model contains vector representations of words. We’ll use the 300-dimensional GloVe embeddings for this.
  2. Tokenize the song lyrics: We'll split the lyrics into individual words, removing any unwanted text.
  3. Generate a vector for each song: By averaging the embeddings for each word in the lyrics, we'll create a single vector that represents the entire song.
Let's get coding!

Loading the GloVe model

We need to load the GloVe model from a text file and store each word’s corresponding vector. Here's the code for loading the model:
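// Imports for the whole EmbeddingService class, including the methods added in later sections
import jakarta.annotation.PostConstruct; // javax.annotation.PostConstruct on Spring Boot 2.x
import org.deeplearning4j.text.tokenization.tokenizer.Tokenizer;
import org.deeplearning4j.text.tokenization.tokenizer.preprocessor.CommonPreprocessor;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

@Service
public class EmbeddingService {

    private static final int EMBEDDING_DIMENSIONS = 300; // glove.840B.300d vectors

    private final Map<String, INDArray> gloveEmbeddings = new HashMap<>();
    private final DefaultTokenizerFactory tokenizerFactory = new DefaultTokenizerFactory();
    private final Set<String> stopWords; // Set of stop words for filtering

    @Value("${embedding.model.path}")
    private String preTrainedGlovePath;

    public EmbeddingService() {
        tokenizerFactory.setTokenPreProcessor(new CommonPreprocessor());
        stopWords = loadStopWords();
    }

    @PostConstruct
    private void init() throws IOException {
        loadGloveModel(preTrainedGlovePath);
    }

    private void loadGloveModel(String preTrainedGlovePath) throws IOException {
        // The resource name is hardcoded; the configured path is only used in the error message
        InputStream gloveStream = getClass().getResourceAsStream("/glove.840B.300d.txt");
        if (gloveStream == null) {
            throw new IOException("GloVe model not found in resources: " + preTrainedGlovePath);
        }

        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(gloveStream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] split = line.split(" ");
                // Skip lines that are not "word + 300 values" (for example, tokens containing spaces)
                if (split.length != EMBEDDING_DIMENSIONS + 1) {
                    continue;
                }
                String word = split[0];
                float[] vector = new float[EMBEDDING_DIMENSIONS];
                for (int i = 1; i < split.length; i++) {
                    vector[i - 1] = Float.parseFloat(split[i]);
                }
                INDArray wordVector = Nd4j.create(vector);
                gloveEmbeddings.put(word, wordVector);
            }
        }
    }

    private Set<String> loadStopWords() {
        return new HashSet<>(Arrays.asList(
                "the", "a", "an", "and", "is", "in", "at", "of", "to", "for", "with", "on", "by", "this", "that", "it", "i", "you", "they", "we", "but", "or", "as", "if", "when"
        ));
    }

    // The tokenizing and embedding methods from the next sections go here, inside this class.
}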
We're loading a 300-dimensional embedding for each word from the glove.840B.300d.txt file. We then set up a tokenizer that will split text into individual words. Finally, we load a list of stop words to filter out before embedding. These are words that carry little semantic meaning and can dilute our embedding. This is not a comprehensive list but will do for a demo.
Note: @PostConstruct is an annotation that tells Spring to run init() once the bean has been constructed and its dependencies injected, so the GloVe model is loaded as the Spring application starts.
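
To make the file format concrete: each line of a GloVe text file is a token followed by its vector components, separated by single spaces. The sketch below parses one such line using only JDK types (no ND4J). It reads the vector from the end of the line, so a token that itself contains spaces would still parse; that such tokens occur in this particular file is an assumption worth checking against your copy. The names `GloveLineParser` and `parseLine`, and the 3-dimensional toy line, are made up for illustration and are not part of the tutorial's repository.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Map;

public class GloveLineParser {

    /**
     * Parses "token v1 v2 ... vN" into (token, vector).
     * The last `dimensions` fields are the vector; everything before them is the token.
     * Returns null for lines that have too few fields.
     */
    public static Map.Entry<String, float[]> parseLine(String line, int dimensions) {
        String[] parts = line.split(" ");
        int firstValue = parts.length - dimensions;
        if (firstValue < 1) {
            return null; // malformed line: no token, or too few components
        }
        float[] vector = new float[dimensions];
        for (int i = 0; i < dimensions; i++) {
            vector[i] = Float.parseFloat(parts[firstValue + i]);
        }
        String token = String.join(" ", Arrays.copyOfRange(parts, 0, firstValue));
        return new SimpleEntry<>(token, vector);
    }

    public static void main(String[] args) {
        Map.Entry<String, float[]> entry = parseLine("hello 0.1 -0.2 0.3", 3);
        System.out.println(entry.getKey() + " -> " + Arrays.toString(entry.getValue()));
    }
}
```

With the real file, `dimensions` would be 300, and each parsed array could be handed to `Nd4j.create(...)` as in the service above.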

Tokenizing the lyrics

Next, we need to split the song lyrics into words and filter out any stop words. We are also removing anything in bracketed text. This is because the dataset I am using provides information like chorus or verse 2 in bracketed text, and I want to clean up the data before I generate the embeddings.
private String removeBracketedText(String text) {
    return text.replaceAll("\\[.*?]", "").trim();
}

public List<String> tokenizeText(String text) {
    text = removeBracketedText(text);

    Tokenizer tokenizer = tokenizerFactory.create(text);
    List<String> tokens = new ArrayList<>();
    while (tokenizer.hasMoreTokens()) {
        String token = tokenizer.nextToken();
        if (!stopWords.contains(token)) {
            tokens.add(token);
        }
    }
    return tokens;
}
Tokenizer Factory: We're using the Deeplearning4j DefaultTokenizerFactory to tokenize the lyrics. This method breaks the input text into individual words (tokens).
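
For intuition about what comes out of this step, here is a dependency-free sketch of the same pipeline: strip bracketed section markers, split on whitespace, normalize each token, and drop stop words. The normalization only approximates what `CommonPreprocessor` does (lowercasing and stripping punctuation and digits). The class name `LyricsTokenizerSketch` and the shortened stop-word list are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class LyricsTokenizerSketch {

    // Shortened stop-word list, for illustration only
    private static final Set<String> STOP_WORDS =
            Set.of("the", "a", "an", "and", "is", "in", "of", "to", "i", "you");

    public static List<String> tokenize(String text) {
        // Remove section markers such as [Chorus] or [Verse 2]
        String cleaned = text.replaceAll("\\[.*?]", " ");
        List<String> tokens = new ArrayList<>();
        for (String raw : cleaned.trim().split("\\s+")) {
            // Approximate normalization: lowercase, keep ASCII letters only
            String token = raw.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("[Chorus] Hello, is it me you're looking for?"));
        // prints [hello, it, me, youre, looking, for]
    }
}
```

Note that "you're" becomes "youre", which is unlikely to have a GloVe vector. Tokens like this are silently skipped when the embedding is computed.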

Generating the song embedding

Once we have the individual words (tokens), we can create a single vector that represents the entire song by averaging the embeddings for each word.
public INDArray getEmbeddingForWord(String word) {
    return gloveEmbeddings.get(word); // null if the word is not in the GloVe vocabulary
}

public List<Double> getEmbeddingForText(List<String> tokens) {
    INDArray embedding = null;
    int validTokenCount = 0;

    for (String token : tokens) {
        INDArray wordVector = getEmbeddingForWord(token);
        if (wordVector != null) {
            if (embedding == null) {
                embedding = wordVector.dup(); // Duplicate the word vector
            } else {
                embedding.addi(wordVector); // Sum the word embeddings
            }
            validTokenCount++;
        }
    }

    if (embedding != null && validTokenCount > 0) {
        embedding.divi(validTokenCount);
        return convertINDArrayToDoubleList(embedding);
    }
    return Collections.emptyList(); // Return an empty list if no valid embeddings
}

private List<Double> convertINDArrayToDoubleList(INDArray indArray) {
    double[] array = indArray.toDoubleVector();
    List<Double> doubleList = new ArrayList<>();
    for (double value : array) {
        doubleList.add(value);
    }
    return doubleList;
}

public List<Double> embedText(String text) {
    List<String> tokens = tokenizeText(text);
    return getEmbeddingForText(tokens);
}
We calculate the average of all word embeddings in the lyrics to create a single vector for the song. If a word is not found in the GloVe embeddings, we skip it. This is where a model specifically trained on song lyrics would be particularly useful.
Since MongoDB does not directly support INDArray, we convert the result to a List<Double>.
This service provides the embedding that will later be stored in MongoDB and used for searching similar songs. With EmbeddingService written and ready, how do we store the embeddings in MongoDB and later query them to generate playlists?
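
The averaging step is easier to see on toy data. The following sketch does the same mean pooling with plain `double[]` arrays instead of `INDArray`. The class name `MeanPoolingSketch`, the two-dimensional vectors, and the tiny vocabulary are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class MeanPoolingSketch {

    /** Averages the vectors of all known tokens; returns an empty array if none are known. */
    public static double[] average(List<String> tokens, Map<String, double[]> vectors, int dimensions) {
        double[] sum = new double[dimensions];
        int known = 0;
        for (String token : tokens) {
            double[] v = vectors.get(token);
            if (v == null) {
                continue; // out-of-vocabulary token: skip it
            }
            for (int i = 0; i < dimensions; i++) {
                sum[i] += v[i];
            }
            known++;
        }
        if (known == 0) {
            return new double[0];
        }
        for (int i = 0; i < dimensions; i++) {
            sum[i] /= known;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, double[]> toy = Map.of(
                "sad", new double[] {1.0, 0.0},
                "evening", new double[] {0.0, 1.0});
        System.out.println(Arrays.toString(average(List.of("sad", "wistful", "evening"), toy, 2)));
        // prints [0.5, 0.5] ("wistful" is not in the toy vocabulary and is skipped)
    }
}
```

One consequence of averaging is that word order is lost: "sad girl" and "girl sad" produce the same vector. That is acceptable for matching a vibe, but it is one of the accuracy limits mentioned earlier.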

Storing in MongoDB

With our song embeddings generated, we need to connect to MongoDB and store the data.
Let’s break down the MongoDB code into two parts:
  1. MongoDB configuration: Setting up the connection to our MongoDB instance
  2. MongoDB repository: Storing and querying song data

MongoDB configuration

First, we need to configure our MongoDB connection. This is where the MongoDBConfig class comes in. We’ll create a shared MongoClient that will handle communication with MongoDB.
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.beans.factory.annotation.Value;

@Configuration
public class MongoDBConfig {

    @Value("${mongodb.uri}")
    private String mongoUri;

    @Bean
    public MongoClient mongoClient() {
        return MongoClients.create(mongoUri);
    }
}
@Configuration: This annotation marks the class as a configuration component in Spring. It tells Spring that this class contains bean definitions.
We use the MongoClients.create() method to create a MongoDB client using the URI specified in the application.properties file.
Add the MongoDB URI in application.properties:
mongodb.uri=mongodb+srv://<username>:<password>@cluster.mongodb.net/?retryWrites=true&w=majority
We'll also add the database and collection name:
mongodb.database=music
mongodb.collection=songs

Storing data in MongoDB

Next, we’ll create a MongoDBRepository class that interacts with the MongoDB database. This repository will house our database interactions to store song data, and perform a vector search to retrieve similar songs based on embeddings.
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import org.example.model.Song;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Repository;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

@Repository
public class MongoDBRepository {

    private final MongoCollection<Document> songCollection;

    public MongoDBRepository(MongoClient mongoClient,
                             @Value("${mongodb.database}") String databaseName,
                             @Value("${mongodb.collection}") String collectionName) {
        MongoDatabase database = mongoClient.getDatabase(databaseName);
        this.songCollection = database.getCollection(collectionName);
    }

    /**
     * Store the song embedding along with song details
     *
     * @param lyrics The song lyrics
     * @param title The song title
     * @param artist The artist name
     * @param embedding The vector embedding for the song
     */
    public void storeEmbedding(String lyrics, String title, String artist, List<Double> embedding) {
        Document songDocument = new Document()
                .append("title", title)
                .append("artist", artist)
                .append("lyrics", lyrics)
                .append("embedding", embedding);
        songCollection.insertOne(songDocument);
    }

    /**
     * Fetch similar songs using MongoDB's $vectorSearch aggregation based on the playlist embedding.
     *
     * @param playlistEmbedding The playlist title embedding used as the query vector
     * @return List of Song objects representing similar songs
     */
    public List<Song> getSimilarSongs(List<Double> playlistEmbedding) {
        List<Document> similarSongsDocs = new ArrayList<>();

        // Perform the vector search using the generated embedding
        String indexName = "vector_index";
        int numCandidates = 150;
        int limit = 10;

        List<Document> pipeline = Arrays.asList(
                new Document("$vectorSearch",
                        new Document("index", indexName)
                                .append("path", "embedding")
                                .append("queryVector", playlistEmbedding)
                                .append("numCandidates", numCandidates)
                                .append("limit", limit)
                ),
                new Document("$limit", limit)
        );

        try {
            songCollection.aggregate(pipeline).into(similarSongsDocs);
        } catch (Exception e) {
            throw new RuntimeException("Failed to retrieve similar songs", e);
        }

        // Map each returned document back to a Song
        List<Song> similarSongs = new ArrayList<>();
        for (Document doc : similarSongsDocs) {
            Song song = new Song(
                    doc.getString("title"),
                    doc.getString("artist"),
                    doc.getString("lyrics"),
                    doc.getList("embedding", Double.class)
            );
            similarSongs.add(song);
        }

        return similarSongs;
    }
}
The method storeEmbedding() stores the song’s title, artist, lyrics, and embedding as a document in MongoDB.
The method getSimilarSongs() performs a vector search using MongoDB's $vectorSearch operation. It takes the embedding for the playlist name and retrieves a list of songs with similar embeddings.
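
Conceptually, the query asks for the stored vectors closest to the query vector. Atlas answers this with an approximate nearest-neighbor search: it considers `numCandidates` candidates and returns the best `limit` of them. The sketch below is an exhaustive, in-memory version of the same idea, which can help when reasoning about results on a handful of vectors. It is not how Atlas implements the search. `BruteForceSearchSketch`, `topK`, and the toy data are hypothetical names for illustration, and cosine similarity is assumed as the metric.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BruteForceSearchSketch {

    /** Cosine similarity; both vectors must be non-zero and of equal length. */
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the titles of the k songs whose vectors are most similar to the query. */
    public static List<String> topK(Map<String, double[]> songs, double[] query, int k) {
        List<Map.Entry<String, double[]>> entries = new ArrayList<>(songs.entrySet());
        // Sort by similarity to the query, highest first
        entries.sort(Comparator.comparingDouble(
                (Map.Entry<String, double[]> e) -> cosine(e.getValue(), query)).reversed());
        List<String> titles = new ArrayList<>();
        for (Map.Entry<String, double[]> e : entries.subList(0, Math.min(k, entries.size()))) {
            titles.add(e.getKey());
        }
        return titles;
    }

    public static void main(String[] args) {
        Map<String, double[]> songs = new LinkedHashMap<>();
        songs.put("Upbeat Song", new double[] {0.9, 0.1});
        songs.put("Sad Song", new double[] {0.1, 0.9});
        songs.put("Mixed Song", new double[] {0.5, 0.5});
        System.out.println(topK(songs, new double[] {0.0, 1.0}, 2));
        // prints [Sad Song, Mixed Song]
    }
}
```

Scanning every vector like this is fine for a few thousand songs but scales linearly with the collection, which is the cost a vector index is there to avoid.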

Vector Search index

The last step in setting up our database is to create the Vector Search index for the song embeddings stored in it.
We will be indexing the embedding field from the songs collection in our MongoDB Atlas database. This field contains the vector representation of each song’s lyrics we generated.

Create a Vector Search index using the Atlas UI

  1. Log in to MongoDB Atlas and go to the Clusters page for your project.
  2. In the sidebar, navigate to Atlas Search under the Services heading.
  3. Click Create Search Index.
  4. In the modal that appears:
    • Index Name: Enter a unique name for your index (e.g., vector_index).
    • Database: Select your database (e.g., music).
    • Collection: Select your collection (e.g., songs).
  5. Choose JSON Editor and click Next.
  6. Define the index using the following JSON structure:
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 300,
      "similarity": "cosine"
    }
  ]
}
  • type: Specifies that the field is a vector type (used for embeddings).
  • path: The name of the field you’re indexing (embedding, in our case).
  • numDimensions: The number of dimensions in the vector. Since we’re using GloVe embeddings, this is 300.
  • similarity: Defines the similarity metric (cosine, euclidean, or dotProduct). In our case, we use "cosine" to measure similarity based on the angle between vectors.
  7. Click Next to review your index configuration.
  8. Click Create Search Index.
Once the index is created, Atlas will start building the index, and you’ll be able to use $vectorSearch queries to find songs based on their embeddings.
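
The choice of metric matters for averaged GloVe vectors because they are not unit length. Cosine similarity ignores vector magnitude, while the dot product does not, which is why dotProduct is normally paired with vectors normalized to unit length. The sketch below shows the difference on toy two-dimensional vectors, and how normalizing makes the two metrics agree. `SimilarityMetricsSketch` and its methods are names made up for this illustration.

```java
public class SimilarityMetricsSketch {

    public static double dot(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static double cosine(double[] a, double[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    /** Returns a unit-length copy of v (v must not be the zero vector). */
    public static double[] normalize(double[] v) {
        double norm = Math.sqrt(dot(v, v));
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] / norm;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] query = {1.0, 1.0};
        double[] sameDirection = {2.0, 2.0}; // points the same way as the query
        double[] longButOff = {10.0, 0.0};   // larger magnitude, different direction

        // Cosine ranks by direction: sameDirection wins
        System.out.println(cosine(query, sameDirection) > cosine(query, longButOff)); // true
        // The raw dot product is swayed by magnitude: longButOff wins
        System.out.println(dot(query, sameDirection) < dot(query, longButOff)); // true
        // After normalization, the dot product agrees with cosine again
        System.out.println(dot(normalize(query), normalize(sameDirection))
                > dot(normalize(query), normalize(longButOff))); // true
    }
}
```

In this application, a long song and a four-word playlist name produce averaged vectors of different magnitudes, so a direction-based metric is the safer default unless you normalize before storing and querying.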

Creating a playlist

The core functionality of our application lies in creating a playlist based on the playlist title we provide. To do this, we need to create an embedding for our title, just as we did for our song lyrics. We then use Atlas Vector Search with our embedded title to query the MongoDB database and find the most semantically similar songs.
This implementation will go in our PlaylistService, in our service package, which will rely on our existing EmbeddingService (to generate embeddings for the playlist name) and MongoDBRepository (to perform the operations on our MongoDB database).
Let’s break down the PlaylistService class step-by-step.

Define the service

We start by marking PlaylistService as a Spring service and injecting the necessary dependencies: EmbeddingService for generating the embeddings and MongoDBRepository for our MongoDB database operations.
@Service
public class PlaylistService {

    private final EmbeddingService embeddingService;
    private final MongoDBRepository songRepository;

    public PlaylistService(EmbeddingService embeddingService, MongoDBRepository songRepository) {
        this.embeddingService = embeddingService;
        this.songRepository = songRepository;
    }
}
Now, we're going to add the method generatePlaylist(String playlistName):
public Playlist generatePlaylist(String playlistName) {
    // Generate the embedding for the playlist title
    List<Double> playlistEmbedding = embeddingService.embedText(playlistName);

    if (playlistEmbedding == null || playlistEmbedding.isEmpty()) {
        throw new RuntimeException("Failed to generate embedding for playlist: " + playlistName);
    }

    // Query the database to find similar songs
    List<Song> similarSongs = songRepository.getSimilarSongs(playlistEmbedding);

    // Construct and return the Playlist
    return new Playlist(playlistName, similarSongs);
}
When a user provides a playlist name, we need to convert that name into a vector representation that can capture the semantic meaning of the text.
After generating the embedding, the next step is to find songs with similar embeddings. We pass the playlist embedding to MongoDBRepository, which performs a vector search to find songs that match the vibe of the playlist name.
Once we’ve retrieved the similar songs from MongoDB, we create and return a Playlist object that contains the playlist name and the list of songs.
The next step is to expose this functionality via a REST API (if you so desire).

Testing

To ensure our playlist generator is working correctly, we need to load sample data into MongoDB, then test generating a playlist based on a user-provided name. We're going to load data from a CSV file and create endpoints for testing the playlist generation.

Loading sample data

Before we can test the playlist generation, we need song data (lyrics) stored in MongoDB. We'll use a CSV file (song_lyrics.csv) that contains the song title, artist, lyrics, and other metadata. The SongLyricsProcessor class will handle reading this CSV and storing the processed data in MongoDB.
Here’s the code for SongLyricsProcessor:
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import org.example.repository.MongoDBRepository;
import org.example.service.EmbeddingService;
import org.springframework.stereotype.Component;

import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Objects;

@Component
public class SongLyricsProcessor {

    private final EmbeddingService embeddingService;
    private final MongoDBRepository mongoDBRepository;

    public SongLyricsProcessor(EmbeddingService embeddingService, MongoDBRepository mongoDBRepository) {
        this.embeddingService = embeddingService;
        this.mongoDBRepository = mongoDBRepository;
    }

    public void processAndStoreLyrics(String csvFilePath) throws Exception {
        CSVFormat csvFormat = CSVFormat.DEFAULT.builder()
                .setHeader()
                .setSkipHeaderRecord(true)
                .build();

        // try-with-resources closes the reader and parser even if processing fails
        try (Reader reader = new InputStreamReader(
                     Objects.requireNonNull(getClass().getClassLoader().getResourceAsStream(csvFilePath)),
                     StandardCharsets.UTF_8);
             CSVParser csvParser = new CSVParser(reader, csvFormat)) {

            for (CSVRecord record : csvParser) {
                String title = record.get("title");
                String artist = record.get("artist");
                String lyrics = record.get("lyrics");
                String languageCld3 = record.get("language_cld3");
                String languageFt = record.get("language_ft");

                // Skip the song unless both language_cld3 and language_ft are 'en' (English)
                if (!"en".equalsIgnoreCase(languageCld3) || !"en".equalsIgnoreCase(languageFt)) {
                    continue;
                }

                // Tokenize the lyrics and generate the embedding
                List<Double> lyricsEmbedding = embeddingService.embedText(lyrics);

                // Skip songs where no token had a GloVe vector; there is nothing to search on
                if (lyricsEmbedding.isEmpty()) {
                    continue;
                }

                // Store the title, artist, lyrics, and embedding in MongoDB
                mongoDBRepository.storeEmbedding(lyrics, title, artist, lyricsEmbedding);
            }
        }
    }
}
We read the song_lyrics.csv file. Each record in the file contains information about a song, such as its title, artist, and lyrics. We’ll store some of this information as metadata.
For each song, we generate an embedding for the lyrics. The processed data (title, artist, lyrics, and embedding) is stored in MongoDB.
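
Inserting one document per round trip is the simplest approach, but it contributes to a slow load on a dataset this size. One common alternative is to buffer documents and write them in batches. The sketch below shows only the buffering logic, using JDK types. `BatchWriter` is a hypothetical helper that is not part of the tutorial's code, the `sink` stands in for a bulk write such as the driver's `MongoCollection.insertMany(...)`, and the batch size is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchWriter<T> implements AutoCloseable {

    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    public BatchWriter(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffers one item and flushes when the buffer is full. */
    public void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends any buffered items to the sink as one batch. */
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // hand over a copy, then reuse the buffer
        buffer.clear();
    }

    @Override
    public void close() {
        flush(); // do not lose the final partial batch
    }

    public static void main(String[] args) {
        try (BatchWriter<String> writer =
                     new BatchWriter<>(2, batch -> System.out.println("writing " + batch))) {
            for (String title : List.of("song1", "song2", "song3")) {
                writer.add(title);
            }
        }
        // prints "writing [song1, song2]" and then "writing [song3]"
    }
}
```

In the processor, the loop would add each song document to a writer like this instead of calling storeEmbedding() directly. Whether that is worth the extra code for a demo is a judgment call.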

Loading the data via REST endpoint

To trigger the data loading, we expose an endpoint /loadSampleData in the PlaylistController. This will allow us to load the CSV data into MongoDB by sending a request with the file name.
This is quite a haphazard way of implementing this and is NOT recommended for production, but is absolutely fine for this silly little demo.
@RestController
public class PlaylistController {

    private final PlaylistService playlistService;
    private final SongLyricsProcessor songLyricsProcessor;

    public PlaylistController(PlaylistService playlistService, SongLyricsProcessor songLyricsProcessor) {
        this.playlistService = playlistService;
        this.songLyricsProcessor = songLyricsProcessor;
    }

    @GetMapping("/loadSampleData")
    public void loadSampleData(@RequestParam String fileName) throws Exception {
        songLyricsProcessor.processAndStoreLyrics(fileName);
    }
}
  • /loadSampleData Endpoint: This endpoint triggers the processAndStoreLyrics() method in SongLyricsProcessor. You pass the CSV file name as a query parameter (e.g., /loadSampleData?fileName=song_lyrics.csv).
  • SongLyricsProcessor: The injected SongLyricsProcessor reads the CSV file, processes the song data, generates embeddings, and stores everything in MongoDB.

Generating a playlist via REST endpoint

Once the data is loaded, we need a way to test playlist generation. For this, we’ll expose a /newPlaylist endpoint in PlaylistController. When a user provides a playlist name, we’ll generate an embedding for that name and query MongoDB for similar songs.
@GetMapping("/newPlaylist")
public Playlist newPlaylist(@RequestParam String playlistName) {
    return playlistService.generatePlaylist(playlistName);
}
/newPlaylist Endpoint takes a playlistName query parameter (e.g., /newPlaylist?playlistName=sad%20girl%20wistful%20Friday%20evening) and returns a playlist generated based on that name.
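
If you call this endpoint from Java rather than from a browser or cURL, the playlist name has to be percent-encoded first. Here is a small sketch using only the JDK. `URLEncoder` encodes spaces as `+` (form encoding), so the sketch replaces `+` with `%20` to produce the same form used in the example URL. The class name `PlaylistRequestSketch` and the localhost base URL are assumptions.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PlaylistRequestSketch {

    /** Builds the /newPlaylist URI for a playlist name, percent-encoding the name. */
    public static URI newPlaylistUri(String baseUrl, String playlistName) {
        String encoded = URLEncoder.encode(playlistName, StandardCharsets.UTF_8)
                .replace("+", "%20"); // URLEncoder emits '+' for spaces; a literal '+' is already %2B
        return URI.create(baseUrl + "/newPlaylist?playlistName=" + encoded);
    }

    public static void main(String[] args) {
        System.out.println(newPlaylistUri("http://localhost:8080", "sad girl wistful Friday evening"));
        // prints http://localhost:8080/newPlaylist?playlistName=sad%20girl%20wistful%20Friday%20evening
    }
}
```

The resulting URI can be passed to any HTTP client, for example `java.net.http.HttpClient` on Java 11+.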

A couple of cURLs

So here we are. Will it be time for a "club dance party Saturday night" or a "heartbroken scream Monday morning"?
Well, let's first load our data into the database:
curl -X GET "http://localhost:8080/loadSampleData?fileName=song_lyrics.csv"
Now, this will take some time. We have an awful lot of songs going up. So make a cuppa, put on some Willie Nelson, and forget the world around you (or monitor the MongoDB Atlas dashboard to verify the writes to the database). Once this has completed, or you lose patience and decide several thousand songs is enough for your proof of concept and manually interrupt the process, let's make our playlist.
Let's sit in our melancholy and ask for our worst-case scenario playlist.
curl -X GET "http://localhost:8080/newPlaylist?playlistName=sad%20girl%20wistful%20Friday%20evening"
Well, if everything goes right, you should see something like this in your console.
{
  "playlistName": "sad girl wistful Friday evening",
  "songs": [
    {
      "title": "When A Woman Loves",
      "artist": "R. Kelly",
      "lyrics": "...",
      "embedding": "..."
    },
    {
      "title": "Lonely",
      "artist": "Akon",
      "lyrics": "...",
      "embedding": "..."
    },
    {
      "title": "Monster",
      "artist": "Lady Gaga",
      "lyrics": "...",
      "embedding": "..."
    },
    {
      "title": "All the Boys",
      "artist": "Keri Hilson",
      "lyrics": "...",
      "embedding": "..."
    }
  ]
}
It’s far from a perfect implementation, but given the limitations we accepted by using a general-purpose embedding model on song lyrics, getting both Akon and Lady Gaga in the top four results makes for a pretty good playlist.

Conclusion

In this tutorial, we’ve walked through how to build a custom playlist generator using Deeplearning4j to embed song lyrics and MongoDB Atlas Vector Search to query similar songs. While this implementation has limitations, such as using pre-trained embeddings, it opens up a world of possibilities for generating playlists based on funky names and vibes.
If you found this tutorial useful, check out the MongoDB Developer Center for more Java tutorials with MongoDB, and learn how to do stuff like retrieval-augmented generation with MongoDB and Spring AI. Or head over to the MongoDB community forums to ask questions, and see what other people are building with MongoDB.