AI-Powered Playlist Generator: Crafting Custom Vibes With Deeplearning4j and MongoDB
What exactly is a "romcom coastal grandmother Wednesday afternoon"? How about a "sad girl wistful Friday evening"? Well, I don't entirely know, but Spotify seems to think that this is the kind of music I want to listen to. And they're right! By training on my listening habits, and how these fluctuate throughout the weeks and days, Spotify built the "daylist"—a custom playlist tailored exactly to my tastes, topped off with a funky little playlist name that captures the vibe of the playlist. But what if we want to work backward from there?
I want to be able to give my own funky little playlist name, and get a custom playlist that matches whatever obscure image I have in my head of what that is. Long gone are the days of manually scouring the music streaming platform of your choice to create "midwestern emo puppy love"—that will likely just be the discography of Car Seat Headrest because of the effort it takes to swap between the various artists that will only escape your mind the minute you're confronted with the never-fluid UI of playlist creation.
In this tutorial, we will use Deeplearning4j to import a model that we can use to embed our song lyrics. This will allow us to capture the semantic meaning of the songs. Deeplearning4j is an open-source deep learning framework built for the JVM (Java Virtual Machine), for training and deploying neural networks in Java. It has several submodules, like ND4J, which is like NumPy for Java and handles complex mathematical operations, and DataVec, which transforms raw data (like song lyrics) into tensors suitable for neural networks. Deeplearning4j also integrates with SameDiff, allowing the execution of complex computational graphs similar to TensorFlow or PyTorch. These tools let us embed song lyrics into vectors, capturing the "vibes" of each song, which we can then use with MongoDB Atlas to generate custom playlists based on the funky playlist name you provide.
We will then provide a playlist name that will be used to search the database for the most semantically similar songs, all with MongoDB Atlas Vector Search. There will be some limitations to this implementation. We won't be using the user listening history to tailor our results, or the audio files of the music to better capture the vibes of the songs. More importantly, we will be using just a generic, pre-trained model for embedding our data, which will limit the accuracy somewhat. That being said, Deeplearning4j allows you to import your own custom models from the likes of TensorFlow or Keras to be used in your applications. That just goes a bit beyond the scope of what we will be doing today.
For this project, you'll need the following:
- Java 11+ installed (I'll be using Java 21)
- Maven version 3.9.6+
- MongoDB Atlas with a cluster deployed
- The song lyrics dataset (song_lyrics.csv), stored in src/main/resources. The full dataset we're using exceeds the size limit of the free-tier MongoDB M0 cluster, but you can load just part of it and still follow along with the tutorial.
- The pre-trained GloVe word embeddings (glove.840B.300d.txt), stored in src/main/resources.
```xml
<dependencies>
    <!-- Spring Boot Starter for Web API (version managed by the Spring Boot parent/BOM) -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <!-- ND4J: Core Numerical Processing (define the nd4j.backend property, e.g., nd4j-native, in <properties>) -->
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>${nd4j.backend}</artifactId>
        <version>1.0.0-M2.1</version>
    </dependency>

    <!-- DataVec for working with data -->
    <dependency>
        <groupId>org.datavec</groupId>
        <artifactId>datavec-api</artifactId>
        <version>1.0.0-M2.1</version>
    </dependency>

    <!-- DeepLearning4j Core -->
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-core</artifactId>
        <version>1.0.0-M2.1</version>
    </dependency>

    <!-- DeepLearning4j NLP for Text Processing -->
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-nlp</artifactId>
        <version>1.0.0-M2.1</version>
    </dependency>

    <!-- Apache Commons CSV -->
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-csv</artifactId>
        <version>1.9.0</version>
    </dependency>

    <!-- MongoDB Driver -->
    <dependency>
        <groupId>org.mongodb</groupId>
        <artifactId>mongodb-driver-sync</artifactId>
        <version>5.2.0</version>
    </dependency>

    <dependency>
        <groupId>org.mongodb</groupId>
        <artifactId>mongodb-driver-core</artifactId>
        <version>5.2.0</version>
    </dependency>

    <dependency>
        <groupId>org.mongodb</groupId>
        <artifactId>bson</artifactId>
        <version>5.2.0</version>
    </dependency>

    <!-- JUnit for testing -->
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter-api</artifactId>
        <version>5.7.0</version>
        <scope>test</scope>
    </dependency>

</dependencies>
```
- Spring Boot Starter: This dependency sets up a Spring Boot web application with all the required web API components.
- ND4J and DataVec: These are the core libraries from the Deeplearning4j ecosystem. ND4J handles numerical computations, and DataVec processes data for machine learning tasks.
- Deeplearning4j Core and NLP: These provide the deep learning functionality and natural language processing (NLP) tools we'll use to embed song lyrics into vectors.
- MongoDB Driver: This allows us to connect and interact with MongoDB from our Java application.
- Apache Commons CSV: This is used to read song data from a CSV file for processing and storage.
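One thing the snippets below assume but never show is the application's entry point. Here's a minimal sketch of one, using the org.example root package that the rest of the code imports from (the class name itself is just a placeholder):

```java
package org.example;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// Minimal Spring Boot entry point; rename the class to match your own project layout.
@SpringBootApplication
public class PlaylistGeneratorApplication {
    public static void main(String[] args) {
        SpringApplication.run(PlaylistGeneratorApplication.class, args);
    }
}
```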
This tutorial revolves around two models: Song and Playlist.

The Song model represents an individual song and includes fields for the song's title, artist, lyrics, and embedding vector. Feel free to modify this to store whatever data you need for your application. With MongoDB documents, your data is stored alongside your vectors:

```java
import java.util.List;

public class Song {
    private String title;
    private String artist;
    private String lyrics;
    private List<Double> embedding;

    public Song(String title, String artist, String lyrics, List<Double> embedding) {
        this.title = title;
        this.artist = artist;
        this.lyrics = lyrics;
        this.embedding = embedding;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getArtist() {
        return artist;
    }

    public void setArtist(String artist) {
        this.artist = artist;
    }

    public String getLyrics() {
        return lyrics;
    }

    public void setLyrics(String lyrics) {
        this.lyrics = lyrics;
    }

    public List<Double> getEmbedding() {
        return embedding;
    }

    public void setEmbedding(List<Double> embedding) {
        this.embedding = embedding;
    }
}
```
The Playlist model consists of a playlist name and a list of Song objects:

```java
import java.util.List;

public class Playlist {

    private String playlistName;
    private List<Song> songs;

    public Playlist() {
    }

    public Playlist(String playlistName, List<Song> songs) {
        this.playlistName = playlistName;
        this.songs = songs;
    }

    public String getPlaylistName() {
        return playlistName;
    }

    public void setPlaylistName(String playlistName) {
        this.playlistName = playlistName;
    }

    public List<Song> getSongs() {
        return songs;
    }

    public void setSongs(List<Song> songs) {
        this.songs = songs;
    }
}
```
These models will be used to structure the data we store in MongoDB and fetch for creating our funky little playlists.
In this section, we'll focus on how to convert song lyrics into vector embeddings using GloVe (Global Vectors for Word Representation) embeddings. These embeddings capture the semantic meaning of each word in the lyrics, allowing us to compare songs based on their lyrical content.
To do this, we'll create a service package and an EmbeddingService class, which will:
- Load the pre-trained GloVe model: This model contains vector representations of words. We'll use the 300-dimensional GloVe embeddings for this.
- Tokenize the song lyrics: We'll split the lyrics into individual words, removing any unwanted text.
- Generate a vector for each song: By averaging the embeddings for each word in the lyrics, we'll create a single vector that represents the entire song.
Let's get coding!
We need to load the GloVe model from a text file and store each word’s corresponding vector. Here's the code for loading the model:
```java
import org.deeplearning4j.text.tokenization.tokenizer.Tokenizer;
import org.deeplearning4j.text.tokenization.tokenizer.preprocessor.CommonPreprocessor;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

import jakarta.annotation.PostConstruct; // use javax.annotation.PostConstruct on Spring Boot 2

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

@Service
public class EmbeddingService {

    private final Map<String, INDArray> gloveEmbeddings = new HashMap<>();
    private final DefaultTokenizerFactory tokenizerFactory = new DefaultTokenizerFactory();
    private final Set<String> stopWords; // Set of stop words for filtering

    // Classpath location of the GloVe file; the property name is an assumption — override it
    // in application.properties if you store the file somewhere else
    @Value("${glove.path:/glove.840B.300d.txt}")
    private String preTrainedGlovePath;

    public EmbeddingService() {
        tokenizerFactory.setTokenPreProcessor(new CommonPreprocessor());
        stopWords = loadStopWords();
    }

    @PostConstruct
    private void init() throws IOException {
        loadGloveModel(preTrainedGlovePath);
    }

    private void loadGloveModel(String preTrainedGlovePath) throws IOException {
        InputStream gloveStream = getClass().getResourceAsStream(preTrainedGlovePath);
        if (gloveStream == null) {
            throw new IOException("GloVe model not found in resources: " + preTrainedGlovePath);
        }

        try (BufferedReader reader = new BufferedReader(new InputStreamReader(gloveStream))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] split = line.split(" ");
                String word = split[0];
                float[] vector = new float[300]; // 300-dimensional GloVe vector
                for (int i = 1; i < split.length; i++) {
                    vector[i - 1] = Float.parseFloat(split[i]);
                }
                INDArray wordVector = Nd4j.create(vector);
                gloveEmbeddings.put(word, wordVector);
            }
        }
    }

    private Set<String> loadStopWords() {
        return new HashSet<>(Arrays.asList(
                "the", "a", "an", "and", "is", "in", "at", "of", "to", "for", "with", "on", "by",
                "this", "that", "it", "i", "you", "they", "we", "but", "or", "as", "if", "when"
        ));
    }
```
We're loading a 300-dimensional embedding for each word from the glove.840B.300d.txt file. We then set up a tokenizer that will split text into individual words. Lastly, we load a list of stop words to help with embedding the text. These are words that don't carry much semantic meaning and can dilute our embedding. This is not a comprehensive list, but it will do for a demo.

Note: @PostConstruct is an annotation that ensures the GloVe model is loaded as soon as the Spring application starts.
Next, we need to split the song lyrics into words and filter out any stop words. We also remove any bracketed text, because the dataset I'm using includes annotations like chorus or verse 2 in brackets, and I want to clean up the data before generating the embeddings.
```java
    private String removeBracketedText(String text) {
        return text.replaceAll("\\[.*?]", "").trim();
    }

    public List<String> tokenizeText(String text) {
        text = removeBracketedText(text);

        Tokenizer tokenizer = tokenizerFactory.create(text);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
```
Tokenizer Factory: We're using the Deeplearning4j DefaultTokenizerFactory to tokenize the lyrics. This breaks the input text into individual words (tokens).

Once we have the individual words (tokens), we can create a single vector that represents the entire song by averaging the embeddings for each word.
```java
    public INDArray getEmbeddingForWord(String word) {
        return gloveEmbeddings.getOrDefault(word, null);
    }

    public List<Double> getEmbeddingForText(List<String> tokens) {
        INDArray embedding = null;
        int validTokenCount = 0;

        for (String token : tokens) {
            INDArray wordVector = getEmbeddingForWord(token);
            if (wordVector != null) {
                if (embedding == null) {
                    embedding = wordVector.dup(); // Duplicate the word vector
                } else {
                    embedding.addi(wordVector); // Sum the word embeddings
                }
                validTokenCount++;
            }
        }

        if (embedding != null && validTokenCount > 0) {
            embedding.divi(validTokenCount);
            return convertINDArrayToDoubleList(embedding);
        }
        return Collections.emptyList(); // Return an empty list if no valid embeddings
    }

    private List<Double> convertINDArrayToDoubleList(INDArray indArray) {
        double[] array = indArray.toDoubleVector();
        List<Double> doubleList = new ArrayList<>();
        for (double value : array) {
            doubleList.add(value);
        }
        return doubleList;
    }

    public List<Double> embedText(String text) {
        List<String> tokens = tokenizeText(text);
        return getEmbeddingForText(tokens);
    }
```
We calculate the average of all word embeddings in the lyrics to create a single vector for the song. If a word is not found in the GloVe embeddings, we skip it. This is where a model specifically trained on song lyrics would be particularly useful.
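To make the averaging concrete, here's a tiny standalone sketch of the same ND4J operations getEmbeddingForText() performs, using made-up three-dimensional vectors rather than real 300-dimensional GloVe values:

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class AveragingSketch {
    public static void main(String[] args) {
        // Hypothetical word vectors (real GloVe vectors have 300 dimensions)
        INDArray sad = Nd4j.create(new float[]{0.2f, -0.4f, 0.9f});
        INDArray girl = Nd4j.create(new float[]{0.6f, 0.0f, -0.1f});

        INDArray embedding = sad.dup(); // duplicate so the original vector is untouched
        embedding.addi(girl);           // in-place sum of the word embeddings
        embedding.divi(2);              // divide by the number of valid tokens

        System.out.println(embedding);  // prints roughly [0.4, -0.2, 0.4]
    }
}
```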
Since MongoDB does not directly support INDArray, we convert the result to a List<Double>. This service provides the embedding that will later be stored in MongoDB and used for searching similar songs.

With EmbeddingService written and ready, how do we store the embeddings in MongoDB and later query them to generate playlists? With our song embeddings generated, we need to connect to MongoDB and store the data.
Let’s break down the MongoDB code into two parts:
- MongoDB configuration: Setting up the connection to our MongoDB instance
- MongoDB repository: Storing and querying song data
First, we need to configure our MongoDB connection. This is where the MongoDBConfig class comes in. We'll create a shared MongoClient bean that will handle communication with MongoDB.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MongoDBConfig {

    @Value("${mongodb.uri}")
    private String mongoUri;

    @Bean
    public MongoClient mongoClient() {
        return MongoClients.create(mongoUri);
    }
}
```
@Configuration: This annotation marks the class as a configuration component in Spring. It tells Spring that this class contains bean definitions.
We use the MongoClients.create() method to create a MongoDB client using the URI specified in the application.properties file.

Add the MongoDB URI in application.properties:

```properties
mongodb.uri=mongodb+srv://<username>:<password>@cluster.mongodb.net/?retryWrites=true&w=majority
```
We'll also add the database and collection name:
```properties
mongodb.database=music
mongodb.collection=songs
```
Next, we'll create a MongoDBRepository class that interacts with the MongoDB database. This repository will house our database interactions: storing song data and performing a vector search to retrieve similar songs based on embeddings.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import org.example.model.Song;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Repository;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

@Repository
public class MongoDBRepository {

    private final MongoCollection<Document> songCollection;

    public MongoDBRepository(MongoClient mongoClient,
                             @Value("${mongodb.database}") String databaseName,
                             @Value("${mongodb.collection}") String collectionName) {
        MongoDatabase database = mongoClient.getDatabase(databaseName);
        this.songCollection = database.getCollection(collectionName);
    }

    /**
     * Store the song embedding along with song details
     *
     * @param lyrics    The song lyrics
     * @param title     The song title
     * @param artist    The artist name
     * @param embedding The vector embedding for the song
     */
    public void storeEmbedding(String lyrics, String title, String artist, List<Double> embedding) {
        Document songDocument = new Document()
                .append("title", title)
                .append("artist", artist)
                .append("lyrics", lyrics)
                .append("embedding", embedding);
        songCollection.insertOne(songDocument);
    }

    /**
     * Fetch similar songs using MongoDB's $vectorSearch aggregation based on the playlist embedding.
     *
     * @param playlistEmbedding The playlist title embedding used as the query vector
     * @return List of Song objects representing similar songs
     */
    public List<Song> getSimilarSongs(List<Double> playlistEmbedding) {
        List<Document> similarSongsDocs = new ArrayList<>();

        // Perform the vector search using the generated embedding
        String indexName = "vector_index";
        int numCandidates = 150;
        int limit = 10;

        List<Document> pipeline = Arrays.asList(
                new Document("$vectorSearch",
                        new Document("index", indexName)
                                .append("path", "embedding")
                                .append("queryVector", playlistEmbedding)
                                .append("numCandidates", numCandidates)
                                .append("limit", limit)
                ),
                new Document("$limit", limit)
        );

        try {
            songCollection.aggregate(pipeline).into(similarSongsDocs);
        } catch (Exception e) {
            throw new RuntimeException("Failed to retrieve similar songs", e);
        }

        List<Song> similarSongs = new ArrayList<>();
        for (Document doc : similarSongsDocs) {
            Song song = new Song(
                    doc.getString("title"),
                    doc.getString("artist"),
                    doc.getString("lyrics"),
                    doc.getList("embedding", Double.class)
            );
            similarSongs.add(song);
        }

        return similarSongs;
    }
}
```
The storeEmbedding() method stores the song's title, artist, lyrics, and embedding as a document in MongoDB. The getSimilarSongs() method performs a vector search using MongoDB's $vectorSearch aggregation stage. It takes the embedding for the playlist name and retrieves a list of songs with similar embeddings.

The last step in configuring our database is to create the Vector Search index for the song embeddings stored in the database.
We will be indexing the embedding field from the songs collection in our MongoDB Atlas database. This field contains the vector representation of each song's lyrics that we generated.
- Log in to MongoDB Atlas and go to the Clusters page for our project.
- In the sidebar, we navigate to Atlas Search under the Services heading.
- Let’s click Create Search Index.
- In the modal that appears:
- Index Name: Enter a unique name for your index (e.g., vector_index).
- Database: Select your database (e.g., music).
- Collection: Select your collection (e.g., songs).
- Choose JSON Editor and click Next.
- Define the index using the following JSON structure:
```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 300,
      "similarity": "cosine"
    }
  ]
}
```
- type: Specifies that the field is a vector type (used for embeddings).
- path: The name of the field you're indexing (embedding, in our case).
- numDimensions: The number of dimensions in the vector. Since we're using GloVe embeddings, this is 300.
- similarity: Defines the similarity metric (cosine, euclidean, or dotProduct). In our case, we use cosine to measure similarity based on the angle between vectors.
- Click Next to review your index configuration.
- Click Create Search Index.
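If you'd rather create the index from code than click through the Atlas UI, the Java driver we're already using supports creating Vector Search indexes programmatically (this needs driver 5.2+, which added that support). Here's a rough sketch: the MONGODB_URI environment variable and the class name are placeholders, and it assumes the music database and songs collection from earlier:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.SearchIndexModel;
import com.mongodb.client.model.SearchIndexType;
import org.bson.Document;

import java.util.List;

public class CreateVectorIndex {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create(System.getenv("MONGODB_URI"))) {
            MongoCollection<Document> songs = client.getDatabase("music").getCollection("songs");

            // Same definition as the JSON above: a 300-dimensional vector field using cosine similarity
            Document definition = new Document("fields", List.of(
                    new Document("type", "vector")
                            .append("path", "embedding")
                            .append("numDimensions", 300)
                            .append("similarity", "cosine")));

            songs.createSearchIndexes(List.of(
                    new SearchIndexModel("vector_index", definition, SearchIndexType.vectorSearch())));
        }
    }
}
```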
Once the index is created, Atlas will start building it, and you'll be able to use $vectorSearch queries to find songs based on their embeddings.

The core functionality of our application lies in creating a playlist from the playlist title we provide. To do this, we need to create an embedding for our title, just as we did for our song lyrics. We then use Atlas Vector Search with our embedded title to query the MongoDB database and find the most semantically similar songs.
This implementation will go in our PlaylistService, in our service package. It will rely on our existing EmbeddingService (to generate embeddings for the playlist name) and MongoDBRepository (to perform the operations on our MongoDB database).

Let's break down the PlaylistService class step by step. We start by marking PlaylistService as a Spring service and injecting the necessary dependencies: EmbeddingService for generating the embeddings and MongoDBRepository for our MongoDB database operations.

```java
import org.example.model.Playlist;
import org.example.model.Song;
import org.example.repository.MongoDBRepository;
import org.springframework.stereotype.Service;

import java.util.List;

@Service
public class PlaylistService {

    private final EmbeddingService embeddingService;
    private final MongoDBRepository songRepository;

    public PlaylistService(EmbeddingService embeddingService, MongoDBRepository songRepository) {
        this.embeddingService = embeddingService;
        this.songRepository = songRepository;
    }
}
```
Now, we're going to add the method generatePlaylist(String playlistName):

```java
    public Playlist generatePlaylist(String playlistName) {
        // Generate the embedding for the playlist title
        List<Double> playlistEmbedding = embeddingService.embedText(playlistName);

        if (playlistEmbedding == null || playlistEmbedding.isEmpty()) {
            throw new RuntimeException("Failed to generate embedding for playlist: " + playlistName);
        }

        // Query the database to find similar songs
        List<Song> similarSongs = songRepository.getSimilarSongs(playlistEmbedding);

        // Construct and return the Playlist
        return new Playlist(playlistName, similarSongs);
    }
```
When a user provides a playlist name, we need to convert that name into a vector representation that can capture the semantic meaning of the text.
After generating the embedding, the next step is to find songs with similar embeddings. We pass the playlist embedding to MongoDBRepository, which performs a vector search to find songs that match the vibe of the playlist name. Once we've retrieved the similar songs from MongoDB, we create and return a Playlist object that contains the playlist name and the list of songs.

The next step is to expose this functionality via a REST API (if you so desire).
To ensure our playlist generator is working correctly, we need to load sample data into MongoDB, then test generating a playlist based on a user-provided name. We're going to load data from a CSV file and create endpoints for testing the playlist generation.
Before we can test the playlist generation, we need song data (lyrics) stored in MongoDB. We'll use a CSV file (song_lyrics.csv) that contains the song title, artist, lyrics, and other metadata. The SongLyricsProcessor class will handle reading this CSV and storing the processed data in MongoDB.

Here's the code for SongLyricsProcessor:

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import org.example.repository.MongoDBRepository;
import org.example.service.EmbeddingService;
import org.springframework.stereotype.Component;

import java.io.InputStreamReader;
import java.io.Reader;
import java.util.List;
import java.util.Objects;

@Component
public class SongLyricsProcessor {

    private final EmbeddingService embeddingService;
    private final MongoDBRepository mongoDBRepository;

    public SongLyricsProcessor(EmbeddingService embeddingService, MongoDBRepository mongoDBRepository) {
        this.embeddingService = embeddingService;
        this.mongoDBRepository = mongoDBRepository;
    }

    public void processAndStoreLyrics(String csvFilePath) throws Exception {
        Reader reader = new InputStreamReader(
                Objects.requireNonNull(getClass().getClassLoader().getResourceAsStream(csvFilePath))
        );

        CSVFormat csvFormat = CSVFormat.DEFAULT.builder()
                .setHeader()
                .setSkipHeaderRecord(true)
                .build();

        CSVParser csvParser = new CSVParser(reader, csvFormat);

        for (CSVRecord record : csvParser) {
            String title = record.get("title");
            String artist = record.get("artist");
            String lyrics = record.get("lyrics");
            String languageCld3 = record.get("language_cld3");
            String languageFt = record.get("language_ft");

            // Check if both language_cld3 and language_ft are 'en' (English)
            if (!"en".equalsIgnoreCase(languageCld3) || !"en".equalsIgnoreCase(languageFt)) {
                continue;
            }

            // Tokenize the lyrics and generate the embedding
            List<Double> lyricsEmbedding = embeddingService.embedText(lyrics);

            // Store the title, artist, lyrics, and embedding in MongoDB
            mongoDBRepository.storeEmbedding(lyrics, title, artist, lyricsEmbedding);
        }

        csvParser.close();
    }
}
```
We read the song_lyrics.csv file. Each record in the file contains information about a song, such as its title, artist, and lyrics. We'll store some of this information as metadata. For each song, we generate an embedding for the lyrics. The processed data (title, artist, lyrics, and embedding) is stored in MongoDB.
To trigger the data loading, we expose an endpoint /loadSampleData in the PlaylistController. This will allow us to load the CSV data into MongoDB by sending a request with the file name. This is quite a haphazard way of implementing this and is NOT recommended for production, but it is absolutely fine for this silly little demo.
```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class PlaylistController {

    private final PlaylistService playlistService;
    private final SongLyricsProcessor songLyricsProcessor;

    public PlaylistController(PlaylistService playlistService, SongLyricsProcessor songLyricsProcessor) {
        this.playlistService = playlistService;
        this.songLyricsProcessor = songLyricsProcessor;
    }

    @GetMapping("/loadSampleData")
    public void loadSampleData(@RequestParam String fileName) throws Exception {
        songLyricsProcessor.processAndStoreLyrics(fileName);
    }
}
```
- /loadSampleData Endpoint: This endpoint triggers the processAndStoreLyrics() method in SongLyricsProcessor. You pass the CSV file name as a query parameter (e.g., /loadSampleData?fileName=song_lyrics.csv).
- SongLyricsProcessor: The injected SongLyricsProcessor reads the CSV file, processes the song data, generates embeddings, and stores everything in MongoDB.
Once the data is loaded, we need a way to test playlist generation. For this, we'll expose a /newPlaylist endpoint in PlaylistController. When a user provides a playlist name, we'll generate an embedding for that name and query MongoDB for similar songs.

```java
    @GetMapping("/newPlaylist")
    public Playlist newPlaylist(@RequestParam String playlistName) {
        return playlistService.generatePlaylist(playlistName);
    }
```
The /newPlaylist endpoint takes a playlistName query parameter (e.g., /newPlaylist?playlistName=sad%20girl%20wistful%20Friday%20evening) and returns a playlist generated based on that name.

So here we are. Will it be time for a "club dance party Saturday night" or a "heartbroken scream Monday morning"?
Well, let's first load our data into the database:
```bash
curl -X GET "http://localhost:8080/loadSampleData?fileName=song_lyrics.csv"
```
Now, this will take some time. We have an awful lot of songs going up. So make a cuppa, put on some Willie Nelson, and forget the world around you (or monitor the MongoDB Atlas dashboard to verify the writes to the database). Once this has completed, or you lose patience and decide several thousand songs is enough for your proof of concept and manually interrupt the process, let's make our playlist.
Let's sit in our melancholy and ask for our worst case scenario playlist.
```bash
curl -X GET "http://localhost:8080/newPlaylist?playlistName=sad%20girl%20wistful%20Friday%20evening"
```
Well, if everything goes right, you should see something like this in your console.
```json
{
  "playlistName": "sad girl wistful Friday evening",
  "songs": [
    {
      "title": "When A Woman Loves",
      "artist": "R. Kelly",
      "lyrics": "...",
      "embedding": "..."
    },
    {
      "title": "Lonely",
      "artist": "Akon",
      "lyrics": "...",
      "embedding": "..."
    },
    {
      "title": "Monster",
      "artist": "Lady Gaga",
      "lyrics": "...",
      "embedding": "..."
    },
    {
      "title": "All the Boys",
      "artist": "Keri Hilson",
      "lyrics": "...",
      "embedding": "..."
    }
  ]
}
```
It's a far from perfect implementation, but given the limitations we accepted by using general-purpose word embeddings on song lyrics, getting both Akon and Lady Gaga in the top four results is a pretty good playlist.
In this tutorial, we’ve walked through how to build a custom playlist generator using Deeplearning4j to embed song lyrics and MongoDB Atlas Vector Search to query similar songs. While this implementation has limitations, such as using pre-trained embeddings, it opens up a world of possibilities for generating playlists based on funky names and vibes.
If you found this tutorial useful, check out the MongoDB Developer Center for more Java tutorials with MongoDB, and learn how to do stuff like retrieval-augmented generation with MongoDB and Spring AI. Or head over to the MongoDB community forums to ask questions, and see what other people are building with MongoDB.