/ /

Store Large Files with GridFS

Overview

In this guide, you can learn how to store and retrieve large files in MongoDB using GridFS. GridFS is a specification that describes how to split files into chunks during storage and reassemble them during retrieval. The driver implementation of GridFS manages the operations and organization of the file storage.

Use GridFS if the size of your file exceeds the BSON-document size limit of 16 megabytes. For more detailed information on whether GridFS is suitable for your use case, see the GridFS Server manual page.

Navigate the following sections to learn more about GridFS operations and implementation:

Create a GridFS Bucket
Upload Files
Retrieve File Information
Download Files
Rename Files
Delete Files
Delete a GridFS Bucket

How GridFS Works

GridFS organizes files in a bucket, a group of MongoDB collections that contain the chunks of files and descriptive information. Buckets contain the following collections, named using the convention defined in the GridFS specification:

The chunks collection stores the binary file chunks.
The files collection stores the file metadata.

When you create a new GridFS bucket, the driver creates the chunks and files collections, prefixed with the default bucket name fs, unless you specify a different name. The driver also creates an index on each collection to ensure efficient retrieval of files and related metadata. The driver only creates the GridFS bucket on the first write operation if it does not already exist. The driver only creates indexes if they do not exist and when the bucket is empty. For more information on GridFS indexes, see the Server manual page on GridFS Indexes.

When storing files with GridFS, the driver splits the files into smaller pieces, each represented by a separate document in the chunks collection. It also creates a document in the files collection that contains a unique file id, file name, and other file metadata. You can upload the file from memory or from a stream. The following diagram describes how GridFS splits files when uploading to a bucket:

A diagram that shows how GridFS uploads a file to a bucket

When retrieving files, GridFS fetches the metadata from the files collection in the specified bucket and uses the information to reconstruct the file from documents in the chunks collection. You can read the file into memory or output it to a stream.

Create a GridFS Bucket

Create a bucket or get a reference to an existing one to begin storing or retrieving files from GridFS. Create a GridFSBucket instance, passing a database as the parameter. You can then use the GridFSBucket instance to call read and write operations on the files in your bucket:

const db = client.db(dbName);
const bucket = new mongodb.GridFSBucket(db);

Pass your bucket name as the second parameter to the create() method to create or reference a bucket with a custom name other than the default name fs, as shown in the following example:

const bucket = new mongodb.GridFSBucket(db, { bucketName: 'myCustomBucket' });

For more information, see the GridFSBucket API documentation.

Upload Files

Use the openUploadStream() method from GridFSBucket to create an upload stream for a given file name. You can use the pipe() method to connect a Node.js read stream to the upload stream. The openUploadStream() method allows you to specify configuration information such as file chunk size and other field/value pairs to store as metadata.

The following example shows how to pipe a Node.js read stream, represented by the variable fs, to the openUploadStream() method of a GridFSBucket instance:

fs.createReadStream('./myFile').
     pipe(bucket.openUploadStream('myFile', {
         chunkSizeBytes: 1048576,
         metadata: { field: 'myField', value: 'myValue' }
     }));

See the openUploadStream() API documentation for more information.

Retrieve File Information

In this section, you can learn how to retrieve file metadata stored in the files collection of the GridFS bucket. The metadata contains information about the file it refers to, including:

The _id of the file
The name of the file
The length/size of the file
The upload date and time
A metadata document in which you can store any other information

Call the find() method on the GridFSBucket instance to retrieve files from a GridFS bucket. The method returns a FindCursor instance from which you can access the results.

The following code example shows you how to retrieve and print file metadata from all your files in a GridFS bucket. Among the different ways that you can traverse the retrieved results from the FindCursor iterable, the following example uses the for await...of syntax to display the results:

const cursor = bucket.find({});
for await (const doc of cursor) {
   console.log(doc);
}

The find() method accepts various query specifications and can be combined with other methods such as sort(), limit(), and project().

For more information on the classes and methods mentioned in this section, see the following resources:

Download Files

You can download files from your MongoDB database by using the openDownloadStreamByName() method from GridFSBucket to create a download stream.

The following example shows you how to download a file referenced by the file name, stored in the filename field, into your working directory:

bucket.openDownloadStreamByName('myFile').
     pipe(fs.createWriteStream('./outputFile'));

Note

If there are multiple documents with the same filename value, GridFS will stream the most recent file with the given name (as determined by the uploadDate field).

Alternatively, you can use the openDownloadStream() method, which takes the _id field of a file as a parameter:

bucket.openDownloadStream(ObjectId("60edece5e06275bf0463aaf3")).
     pipe(fs.createWriteStream('./outputFile'));

Note

The GridFS streaming API cannot load partial chunks. When a download stream needs to pull a chunk from MongoDB, it pulls the entire chunk into memory. The 255 kilobyte default chunk size is usually sufficient, but you can reduce the chunk size to reduce memory overhead.

For more information on the openDownloadStreamByName() method, see its API documentation.

Rename Files

Use the rename() method to update the name of a GridFS file in your bucket. You must specify the file to rename by its _id field rather than its file name.

Note

The rename() method only supports updating the name of one file at a time. To rename multiple files, retrieve a list of files matching the file name from the bucket, extract the _id field from the files you want to rename, and pass each value in separate calls to the rename() method.

The following example shows how to update the filename field to "newFileName" by referencing a document's _id field:

bucket.rename(ObjectId("60edece5e06275bf0463aaf3"), "newFileName");

For more information on this method, see the rename() API documentation.

Delete Files

Use the delete() method to remove a file from your bucket. You must specify the file by its _id field rather than its file name.

Note

The delete() method only supports deleting one file at a time. To delete multiple files, retrieve the files from the bucket, extract the _id field from the files you want to delete, and pass each value in separate calls to the delete() method.

The following example shows you how to delete a file by referencing its _id field:

bucket.delete(ObjectId("60edece5e06275bf0463aaf3"));

For more information on this method, see the delete() API documentation.

Delete a GridFS Bucket

Use the drop() method to remove a bucket's files and chunks collections, which effectively deletes the bucket. The following code example shows you how to delete a GridFS bucket:

bucket.drop();

For more information on this method, see the drop() API documentation.

Additional Resources

MongoDB GridFS specification

Back

Generate Custom _id Values

Operations on Replica Sets