Store Large Files by Using GridFS

On this page

  • Overview
  • How GridFS Works
  • Create a GridFS Bucket
  • Upload Files
  • Retrieve File Information
  • Download Files
  • Rename Files
  • Delete Files
  • API Documentation

In this guide, you can learn how to store and retrieve large files in MongoDB by using GridFS. GridFS is a specification that describes how to split files into chunks when storing them and reassemble them when retrieving them. The Scala driver's implementation of GridFS is an abstraction that manages the operations and organization of the file storage.

Use GridFS if the size of your files exceeds the BSON document size limit of 16 MB. To determine whether GridFS is suitable for your use case, see GridFS in the MongoDB Server manual.

The following sections describe GridFS operations and how to perform them.

GridFS organizes files in a bucket, a group of MongoDB collections that contain the chunks of files and information describing them. The bucket contains the following collections, named using the convention defined in the GridFS specification:

  • The chunks collection stores the binary file chunks.

  • The files collection stores the file metadata.

By default, the driver names these collections fs.chunks and fs.files. To use a different prefix, specify a bucket name in the GridFSBucket() constructor. The driver creates the bucket's collections, if they don't exist, only when you perform the first write operation. The driver also creates an index on each collection to ensure efficient retrieval of the files and related metadata. It creates these indexes only if they don't exist and the bucket is empty. For more information about GridFS indexes, see GridFS Indexes in the MongoDB Server manual.

When storing files with GridFS, the driver splits the files into smaller chunks, each represented by a separate document in the chunks collection. It also creates a document in the files collection that contains a file ID, file name, and other file metadata. You can upload the file from memory or from a stream. The following diagram shows how GridFS splits files when they are uploaded to a bucket.

A diagram that shows how GridFS uploads a file to a bucket

When retrieving files, GridFS fetches the metadata from the files collection in the specified bucket and uses the information to reconstruct the file from documents in the chunks collection. You can read the file into memory or output it to a stream.
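The split-and-reassemble behavior described above can be illustrated with a small, driver-free sketch. The Chunk case class and the ChunkingSketch object are hypothetical names used only for illustration and are not part of the Scala driver. The sketch assumes the GridFS default chunk size of 255 KiB and models only the chunk index n and the chunk data, not the full document layout.

```scala
// Hypothetical in-memory stand-in for a document in the chunks collection.
final case class Chunk(n: Int, data: Array[Byte])

object ChunkingSketch {
  // 255 KiB, assumed here as the default GridFS chunk size.
  val DefaultChunkSizeBytes: Int = 255 * 1024

  // Split the contents into fixed-size chunks; only the last chunk may be shorter.
  def split(bytes: Array[Byte], chunkSizeBytes: Int = DefaultChunkSizeBytes): Seq[Chunk] =
    bytes.grouped(chunkSizeBytes).zipWithIndex.map { case (data, n) => Chunk(n, data) }.toSeq

  // Rebuild the contents by concatenating chunk data in ascending order of n.
  def reassemble(chunks: Seq[Chunk]): Array[Byte] =
    chunks.sortBy(_.n).flatMap(_.data).toArray
}
```

With these assumptions, a 600,000-byte file is split into three chunks of 261,120, 261,120, and 77,760 bytes, and reassembling the chunks in order of n yields the original bytes even if the chunks are read out of order.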

To store files in or retrieve files from GridFS, create a GridFS bucket by calling the GridFSBucket() constructor and passing in a MongoDatabase instance. You can use the GridFSBucket instance to perform read and write operations on the files in your bucket.

val bucket = GridFSBucket(database)

To create or reference a bucket with a custom name other than the default name fs, pass your bucket name as the second parameter to the GridFSBucket() constructor, as shown in the following example:

val filesBucket = GridFSBucket(database, "files")

The GridFSBucket.uploadFromObservable() method reads the contents of an Observable[ByteBuffer] and saves it to the GridFSBucket instance.

You can use the GridFSUploadOptions type to configure the chunk size or include additional metadata.

The following example uploads the contents of an Observable[ByteBuffer] to the filesBucket bucket as a file named mongodb-tutorial:

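import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.util.concurrent.TimeUnit
import scala.concurrent.Await
import scala.concurrent.duration.Duration
import org.mongodb.scala._
import org.mongodb.scala.gridfs._

// Create the source Observable that emits the file contents
val observableToUploadFrom = Observable(
  Seq(ByteBuffer.wrap("MongoDB Tutorial".getBytes(StandardCharsets.UTF_8)))
)
// Set a custom chunk size and attach custom metadata
val options = new GridFSUploadOptions()
  .chunkSizeBytes(358400)
  .metadata(Document("type" -> "presentation"))
// Upload the file and wait up to 10 seconds for the generated file ID
val fileIdObservable = filesBucket.uploadFromObservable("mongodb-tutorial", observableToUploadFrom, options)
val fileId = Await.result(fileIdObservable.toFuture(), Duration(10, TimeUnit.SECONDS))
println(s"File uploaded with id: ${fileId.toHexString}")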

In this section, you can learn how to retrieve file metadata stored in the files collection of the GridFS bucket. The metadata contains information about the file it refers to, including:

  • The _id of the file

  • The name of the file

  • The length of the file, in bytes

  • The upload date and time

  • A metadata document in which you can store any other information

To learn more about fields you can retrieve from the files collection, see the GridFS Files Collection documentation in the MongoDB Server manual.

To retrieve file metadata from a GridFS bucket, call the find() method on the GridFSBucket instance. The following code example retrieves the metadata of all files in a GridFS bucket and prints each file name:

val filesObservable = filesBucket.find()
val results = Await.result(filesObservable.toFuture(), Duration(10, TimeUnit.SECONDS))
results.foreach(file => println(s" - ${file.getFilename}"))
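To narrow the results, you can pass a query filter to find(). The following sketch assumes that files were uploaded with a metadata document containing a type field, as in the upload example, and prints several of the metadata fields listed above. The FindExample object and printPresentations function are hypothetical names, and the code requires a reachable MongoDB deployment to run.

```scala
import java.util.concurrent.TimeUnit
import scala.concurrent.Await
import scala.concurrent.duration.Duration
import org.mongodb.scala._
import org.mongodb.scala.gridfs.GridFSBucket
import org.mongodb.scala.model.Filters

object FindExample {
  // Print metadata for files whose custom metadata has type "presentation".
  def printPresentations(bucket: GridFSBucket): Unit = {
    val matches = bucket.find(Filters.equal("metadata.type", "presentation"))
    val files = Await.result(matches.toFuture(), Duration(10, TimeUnit.SECONDS))
    files.foreach { file =>
      // getObjectId assumes the file uses a driver-generated ObjectId as its _id
      println(
        s"${file.getObjectId.toHexString} ${file.getFilename}: " +
          s"${file.getLength} bytes, uploaded ${file.getUploadDate}, metadata ${file.getMetadata}"
      )
    }
  }
}
```

Calling FindExample.printPresentations(filesBucket) prints one line per matching file.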

To learn more about querying MongoDB, see Retrieve Data.

The downloadToObservable() method returns an Observable[ByteBuffer] that reads the contents from MongoDB.

To download a file by its file _id, pass the _id to the method. The following example downloads a file by its file _id:

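// Wrap the ID in an ObjectId (org.bson.types.ObjectId); the method treats a bare String as a file name
val downloadObservable = filesBucket.downloadToObservable(new ObjectId("<example file ID>"))
val downloadById = Await.result(downloadObservable.toFuture(), Duration(10, TimeUnit.SECONDS))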

If you don't know the _id of the file but know the filename, then you can pass the filename to the downloadToObservable() method. The following example downloads a file named mongodb-tutorial:

val downloadObservable = filesBucket.downloadToObservable("mongodb-tutorial")
val downloadByName = Await.result(downloadObservable.toFuture(), Duration(10, TimeUnit.SECONDS))

Note

If multiple documents have the same filename value, GridFS fetches the most recent file with the given name, as determined by the uploadDate field.
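Awaiting toFuture() on a download Observable yields a Seq[ByteBuffer] rather than a single byte array. The following driver-free sketch shows one way to combine those buffers into an Array[Byte]. DownloadBuffers and toByteArray are hypothetical helper names, not part of the driver API, and the sample buffers stand in for a real download result.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object DownloadBuffers {
  // Concatenate the remaining bytes of each buffer, in order, into one array.
  def toByteArray(buffers: Seq[ByteBuffer]): Array[Byte] = {
    val out = new Array[Byte](buffers.map(_.remaining()).sum)
    var offset = 0
    buffers.foreach { buffer =>
      val view = buffer.duplicate() // read through a copy so the caller's position is unchanged
      val length = view.remaining()
      view.get(out, offset, length)
      offset += length
    }
    out
  }
}

// Usage with sample buffers in place of a download result
val sampleBuffers = Seq("MongoDB ", "Tutorial").map(s => ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8)))
val text = new String(DownloadBuffers.toByteArray(sampleBuffers), StandardCharsets.UTF_8)
```

Holding the whole file in memory this way is reasonable only for files that fit comfortably in the heap. For larger files, consider processing each ByteBuffer as the Observable emits it.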

Use the rename() method to update the name of a GridFS file in your bucket. You must specify the file to rename by its _id field rather than its file name.

The following example renames a file to mongodbTutorial:

val renameObservable = filesBucket.rename(new ObjectId("<example file ID>"), "mongodbTutorial")
Await.result(renameObservable.toFuture(), Duration(10, TimeUnit.SECONDS))

Note

The rename() method supports updating the name of only one file at a time. To rename multiple files, retrieve a list of files matching the file name from the bucket, extract the _id field from the files you want to rename, and pass each value in separate calls to the rename() method.
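The multi-file approach described in the preceding note might look like the following sketch. RenameExample and renameAll are hypothetical names, the 10-second timeout mirrors the other examples in this guide, and the code assumes that each file uses a driver-generated ObjectId as its _id. It requires a reachable MongoDB deployment to run.

```scala
import java.util.concurrent.TimeUnit
import scala.concurrent.Await
import scala.concurrent.duration.Duration
import org.mongodb.scala._
import org.mongodb.scala.gridfs.GridFSBucket
import org.mongodb.scala.model.Filters

object RenameExample {
  // Rename every file stored under oldName; returns the number of files renamed.
  def renameAll(bucket: GridFSBucket, oldName: String, newName: String): Int = {
    val timeout = Duration(10, TimeUnit.SECONDS)
    // Look up the metadata of all files that currently have the old name
    val files = Await.result(bucket.find(Filters.equal("filename", oldName)).toFuture(), timeout)
    files.foreach { file =>
      // rename() accepts a single _id, so issue one call per file
      Await.result(bucket.rename(file.getObjectId, newName).toFuture(), timeout)
    }
    files.size
  }
}
```

For example, RenameExample.renameAll(filesBucket, "mongodb-tutorial", "mongodbTutorial") renames each stored revision of the file in turn.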

Use the delete() method to remove a file's document from the files collection, along with its associated chunks, from your bucket. You must specify the file by its _id field rather than its file name.

The following example deletes a file by its _id:

val deleteObservable = filesBucket.delete(new ObjectId("<example file ID>"))
Await.result(deleteObservable.toFuture(), Duration(10, TimeUnit.SECONDS))

Note

The delete() method supports deleting only one file at a time. To delete multiple files, retrieve the files from the bucket, extract the _id field from the files you want to delete, and pass each value in separate calls to the delete() method.
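As a sketch of the multi-file approach in the preceding note, the following code deletes every file whose custom metadata matches a given type. DeleteExample and deleteByType are hypothetical names, and the metadata.type field is an assumption carried over from the upload example. The code assumes driver-generated ObjectId values for _id and requires a reachable MongoDB deployment to run.

```scala
import java.util.concurrent.TimeUnit
import scala.concurrent.Await
import scala.concurrent.duration.Duration
import org.mongodb.scala._
import org.mongodb.scala.gridfs.GridFSBucket
import org.mongodb.scala.model.Filters

object DeleteExample {
  // Delete every file whose metadata.type equals fileType; returns the number of files deleted.
  def deleteByType(bucket: GridFSBucket, fileType: String): Int = {
    val timeout = Duration(10, TimeUnit.SECONDS)
    // Collect the _id values first, then delete the files one at a time
    val ids = Await
      .result(bucket.find(Filters.equal("metadata.type", fileType)).toFuture(), timeout)
      .map(_.getObjectId)
    ids.foreach(id => Await.result(bucket.delete(id).toFuture(), timeout))
    ids.size
  }
}
```

Because each delete() call is a separate operation, a failure partway through leaves the earlier files deleted and the remaining files in place.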

To learn more about using GridFS to store and retrieve large files, see the API documentation for the GridFSBucket and GridFSUploadOptions types used in this guide.
