Store Large Files

On this page

Overview

How GridFS Works
Create a GridFS Bucket
Customize the Bucket
Upload Files
Write to an Upload Stream
Upload an Existing Stream
Retrieve File Information
Example
Download Files
Read From a Download Stream
Download to an Existing Stream
Delete Files
API Documentation

Overview

In this guide, you can learn how to store and retrieve large files in MongoDB by using GridFS. GridFS is a specification implemented by the C++ driver that describes how to split files into chunks when storing them and reassemble those files when retrieving them. The driver's implementation of GridFS is an abstraction that manages the operations and organization of the file storage.

Use GridFS if the size of your files exceeds the BSON document size limit of 16MB. For more detailed information about whether GridFS is suitable for your use case, see GridFS in the MongoDB Server manual.

How GridFS Works

GridFS organizes files in a bucket, a group of MongoDB collections that contain the chunks of files and information describing them. The bucket contains the following collections, named using the convention defined in the GridFS specification:

chunks collection, which stores the binary file chunks
files collection, which stores the file metadata

The driver creates the GridFS bucket, if it doesn't exist, when you first write data to it. The bucket contains the preceding collections prefixed with the default bucket name fs, unless you specify a different name. To ensure efficient retrieval of the files and related metadata, the driver creates an index on each collection. The driver ensures that these indexes exist before performing read and write operations on the GridFS bucket.

For more information about GridFS indexes, see GridFS Indexes in the MongoDB Server manual.

When using GridFS to store files, the driver splits the files into smaller chunks, each represented by a separate document in the chunks collection. It also creates a document in the files collection that contains a file ID, file name, and other file metadata. You can upload the file by passing a stream to the C++ driver to consume or by creating a new stream and writing to it directly.

The following diagram shows how GridFS splits files when they are uploaded to a bucket:

A diagram that shows how GridFS uploads a file to a bucket

When retrieving files, GridFS fetches the metadata from the files collection in the specified bucket and uses the information to reconstruct the file from documents in the chunks collection. You can read the file by writing its contents to an existing stream or by creating a new stream that points to the file.

Create a GridFS Bucket

To begin storing or retrieving files from GridFS, call the gridfs_bucket() method on your database. This method accesses an existing bucket or creates a new bucket if one does not exist.

The following example calls the gridfs_bucket() method on the db database:

auto bucket = db.gridfs_bucket();

Customize the Bucket

You can customize the GridFS bucket configuration by passing an instance of the mongocxx::options::gridfs::bucket class as an optional argument to the gridfs_bucket() method. The following table describes the fields you can set in a mongocxx::options::gridfs::bucket instance:

Field	Description
`bucket_name`	Specifies the bucket name to use as a prefix for the files and chunks collections. The default value is `"fs"`. Type: `std::string`
`chunk_size_bytes`	Specifies the chunk size that GridFS splits files into. The default value is `261120`. Type: `std::int32_t`
`read_concern`	Specifies the read concern to use for bucket operations. The default value is the database's read concern. Type: `mongocxx::read_concern`
`read_preference`	Specifies the read preference to use for bucket operations. The default value is the database's read preference. Type: `mongocxx::read_preference`
`write_concern`	Specifies the write concern to use for bucket operations. The default value is the database's write concern. Type: `mongocxx::write_concern`

The following example creates a bucket named "myCustomBucket" by setting the bucket_name field of a mongocxx::options::gridfs::bucket instance:

mongocxx::options::gridfs::bucket opts;
opts.bucket_name("myCustomBucket");
auto bucket = db.gridfs_bucket(opts);

Upload Files

You can upload files to a GridFS bucket by using the following methods:

open_upload_stream(): Opens a new upload stream to which you can write file contents
upload_from_stream(): Uploads the contents of an existing stream to a GridFS file

Write to an Upload Stream

Use the open_upload_stream() method to create an upload stream for a given file name. The open_upload_stream() method allows you to specify configuration information in an options::gridfs::upload instance, which you can pass as a parameter.

This example uses an upload stream to perform the following actions:

Sets the chunk_size_bytes field of an options instance
Opens a writable stream for a new GridFS file named "my_file" and applies the chunk_size_bytes option
Calls the write() method to write data to my_file, which the stream points to
Calls the close() method to close the stream pointing to my_file

mongocxx::options::gridfs::upload opts;
opts.chunk_size_bytes(1048576);
auto uploader = bucket.open_upload_stream("my_file", opts);
// ASCII for "HelloWorld"
std::uint8_t bytes[10] = {72, 101, 108, 108, 111, 87, 111, 114, 108, 100};
for (auto i = 0; i < 5; ++i) {
    uploader.write(bytes, 10);
}
uploader.close();

Upload an Existing Stream

Use the upload_from_stream() method to upload the contents of a stream to a new GridFS file. The upload_from_stream() method allows you to specify configuration information in an options::gridfs::upload instance, which you can pass as a parameter.

This example performs the following actions:

Opens a file located at /path/to/input_file as a stream in binary read mode
Calls the upload_from_stream() method to upload the contents of the stream to a GridFS file named "new_file"

std::ifstream file("/path/to/input_file", std::ios::binary);
bucket.upload_from_stream("new_file", &file);

Retrieve File Information

In this section, you can learn how to retrieve file metadata stored in the files collection of the GridFS bucket. The metadata contains information about the file it refers to, including:

The _id of the file
The name of the file
The length/size of the file
The upload date and time
A metadata document in which you can store other information

To retrieve files from a GridFS bucket, call the mongocxx::gridfs::bucket::find() method on your bucket. The method returns a mongocxx::cursor instance from which you can access the results. To learn more about cursors, see the Access Data From a Cursor guide.

Example

The following code example shows how to retrieve and print file metadata from files in a GridFS bucket. It uses a for loop to iterate through the returned cursor and display the contents of the files uploaded in the Upload Files examples:

auto cursor = bucket.find({});
for (auto&& doc : cursor) {
    std::cout << bsoncxx::to_json(doc) << std::endl;
}

{ "_id" : { "$oid" : "..." }, "length" : 13, "chunkSize" : 261120, "uploadDate" :
{ "$date" : ... }, "filename" : "new_file" }
{ "_id" : { "$oid" : "..." }, "length" : 50, "chunkSize" : 1048576, "uploadDate" :
{ "$date" : ... }, "filename" : "my_file" }

The find() method accepts various query specifications. You can use its mongocxx::options::find parameter to specify the sort order, maximum number of documents to return, and the number of documents to skip before returning. To view a list of available options, see the API documentation.

Download Files

You can download files from a GridFS bucket by using the following methods:

open_download_stream(): Opens a new download stream from which you can read the file contents
download_to_stream(): Writes the entire file to an existing download stream

Read From a Download Stream

You can download files from your MongoDB database by using the open_download_stream() method to create a download stream.

This example uses a download stream to perform the following actions:

Retrieves the _id value of the GridFS file named "new_file"
Passes the _id value to the open_download_stream() method to open the file as a readable stream
Creates a buffer vector to store the file contents
Calls the read() method to read the file contents from the downloader stream into the vector

auto doc = db["fs.files"].find_one(make_document(kvp("filename", "new_file")));
auto id = doc->view()["_id"].get_value();
auto downloader = bucket.open_download_stream(id);
std::vector<uint8_t> buffer(downloader.file_length());
downloader.read(buffer.data(), buffer.size());

Download to an Existing Stream

You can download the contents of a GridFS file to an existing stream by calling the download_to_stream() method on your bucket.

This example performs the following actions:

Opens a file located at /path/to/output_file as a stream in binary write mode
Retrieves the _id value of the GridFS file named "new_file"
Passes the _id value to download_to_stream() to download the file to the stream

std::ofstream output_file("/path/to/output_file", std::ios::binary);
auto doc = db["fs.files"].find_one(make_document(kvp("filename", "new_file")));
auto id = doc->view()["_id"].get_value();
bucket.download_to_stream(id, &output_file);

Delete Files

Use the delete_file() method to remove a file's collection document and associated chunks from your bucket. This effectively deletes the file. You must specify the file by its _id field rather than its file name.

The following example shows how to delete a file named "my_file" by passing its _id value to delete_file():

auto doc = db["fs.files"].find_one(make_document(kvp("filename", "my_file")));
auto id = doc->view()["_id"].get_value();
bucket.delete_file(id);

Note

File Revisions

The delete_file() method supports deleting only one file at a time. If you want to delete each file revision, or files with different upload times that share the same file name, collect the _id values of each revision. Then, pass each _id value in separate calls to the delete_file() method.

API Documentation

To learn more about using the C++ driver to store and retrieve large files, see the following API documentation:

Back

Bulk Write

Indexes