Store Large Files

In this guide, you can learn how to store and retrieve large files in MongoDB by using GridFS. GridFS is a specification implemented by the C driver that describes how to split files into chunks when storing them and reassemble them when retrieving them. The driver's implementation of GridFS is an abstraction that manages the operations and organization of the file storage.

Use GridFS if the size of your files exceeds the BSON document size limit of 16 MiB. For more detailed information on whether GridFS is suitable for your use case, see GridFS in the MongoDB Server manual.

GridFS organizes files in a bucket, a group of MongoDB collections that contain the chunks of files and information describing them. The bucket contains the following collections, named using the convention defined in the GridFS specification:

  • The chunks collection stores the binary file chunks.

  • The files collection stores the file metadata.

The driver creates the GridFS bucket, if it doesn't exist, when you perform the first write operation. The bucket contains the preceding collections prefixed with the default bucket name fs, unless you specify a different name. To ensure efficient retrieval of files and their metadata, the driver also creates an index on each collection if the index doesn't already exist and the bucket is empty.

For more information about GridFS indexes, see GridFS Indexes in the MongoDB Server manual.
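The collection naming described above can be sketched in plain C. This is a minimal illustration that only formats a collection name from a bucket name and a suffix. The helper gridfs_collection_name is hypothetical and is not part of the C driver, which applies this naming for you.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for illustration only; not a libmongoc function.
 * Writes "<bucket_name>.<suffix>" into out.
 * Returns 0 on success, or -1 if out is too small to hold the result. */
int
gridfs_collection_name (char *out, size_t out_len, const char *bucket_name, const char *suffix)
{
   assert (out != NULL && bucket_name != NULL && suffix != NULL);

   /* bucket name + '.' + suffix + terminating NUL */
   size_t needed = strlen (bucket_name) + 1 + strlen (suffix) + 1;
   if (needed > out_len) {
      return -1;
   }

   snprintf (out, out_len, "%s.%s", bucket_name, suffix);
   return 0;
}
```

For the default bucket name fs and the suffixes files and chunks, this yields fs.files and fs.chunks. For a bucket named myCustomBucket, it yields myCustomBucket.files and myCustomBucket.chunks.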

When you use GridFS to store files, the driver splits each file into smaller chunks, each represented by a separate document in the chunks collection. It also creates a document in the files collection that contains a file ID, file name, and other file metadata. You can upload a file by passing a stream to the C driver to consume or by creating a new stream and writing to it directly.

The following diagram shows how GridFS splits files when they are uploaded to a bucket:

A diagram that shows how GridFS uploads a file to a bucket
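To make the splitting concrete, the following sketch computes how many chunk documents a file of a given length needs and how large the final chunk is. It is plain arithmetic, not driver code. The function names are hypothetical, and the default chunk size is assumed to be 261120 bytes (255 KiB).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default chunk size: 255 KiB = 261120 bytes. */
#define GRIDFS_DEFAULT_CHUNK_SIZE (255 * 1024)

/* Hypothetical helper: number of chunk documents needed for a file of
 * file_len bytes. An empty file needs no chunks. */
int64_t
gridfs_chunk_count (int64_t file_len, int32_t chunk_size)
{
   assert (file_len >= 0 && chunk_size > 0);
   return (file_len + chunk_size - 1) / chunk_size; /* ceiling division */
}

/* Hypothetical helper: size in bytes of the final chunk, or 0 for an
 * empty file. Every chunk before the final one is exactly chunk_size. */
int32_t
gridfs_last_chunk_len (int64_t file_len, int32_t chunk_size)
{
   assert (file_len >= 0 && chunk_size > 0);
   if (file_len == 0) {
      return 0;
   }
   int64_t remainder = file_len % chunk_size;
   return (int32_t) (remainder == 0 ? chunk_size : remainder);
}
```

For example, a 1,000,000-byte file stored with the assumed default chunk size occupies four chunks: three full chunks of 261,120 bytes and a final chunk of 216,640 bytes.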

When you retrieve files from GridFS, the driver fetches the metadata from the files collection in the specified bucket and uses that information to reconstruct the file from documents in the chunks collection. You can read the file by writing its contents to an existing stream or by creating a new stream that points to the file.
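The reconstruction step can be pictured with a simplified in-memory model. The sketch below is not the driver's implementation. It assumes each chunk carries its zero-based position in a field named n, as GridFS chunk documents do, and uses a hypothetical chunk_t type and reassemble function to order the chunks and concatenate their payloads.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a chunks collection document. */
typedef struct {
   int n;                     /* zero-based position of the chunk in the file */
   const unsigned char *data; /* chunk payload */
   size_t len;                /* payload length in bytes */
} chunk_t;

static int
compare_chunks (const void *a, const void *b)
{
   const chunk_t *ca = a, *cb = b;
   return (ca->n > cb->n) - (ca->n < cb->n);
}

/* Sorts chunks by n, then copies the payloads into out in order.
 * Returns the number of bytes written, or -1 if a chunk is missing or
 * duplicated, or if out is too small. */
long
reassemble (chunk_t *chunks, size_t count, unsigned char *out, size_t out_len)
{
   assert (out != NULL && (chunks != NULL || count == 0));
   if (count == 0) {
      return 0;
   }

   qsort (chunks, count, sizeof *chunks, compare_chunks);

   size_t written = 0;
   for (size_t i = 0; i < count; i++) {
      /* After sorting, chunk i must have n == i; otherwise the file is incomplete. */
      if (chunks[i].n != (int) i || chunks[i].len > out_len - written) {
         return -1;
      }
      memcpy (out + written, chunks[i].data, chunks[i].len);
      written += chunks[i].len;
   }
   return (long) written;
}
```

The check on n is what detects a gap: if any chunk is absent, the file cannot be rebuilt correctly, so the sketch reports failure instead of returning partial data.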

To use GridFS, first call the mongoc_gridfs_bucket_new() function. This function returns a mongoc_gridfs_bucket_t structure that represents a new or existing bucket, or NULL if an error occurs. It accepts the following parameters:

  • Database: Specifies the database in which to create the bucket

  • Options document: Specifies options to customize the bucket, or NULL

  • Read preference: Specifies the read preference to use for read operations, or NULL to inherit the database's read preference

  • Error location: Specifies a location for an error value, or NULL

The following example calls the mongoc_gridfs_bucket_new() function and passes the db database as a parameter:

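mongoc_database_t *db = mongoc_client_get_database (client, "db");
bson_error_t error;
// Keep the returned handle; the later examples refer to it as bucket
mongoc_gridfs_bucket_t *bucket = mongoc_gridfs_bucket_new (db, NULL, NULL, &error);
if (!bucket) {
    fprintf (stderr, "Failed to create bucket: %s\n", error.message);
}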

You can customize the GridFS bucket configuration by passing a BSON document that specifies option values to the mongoc_gridfs_bucket_new() function. The following table describes the options you can set in the document:

| Option | Description |
| --- | --- |
| bucketName | Specifies the bucket name to use as a prefix for the files and chunks collections. The default value is "fs". Type: string |
| chunkSizeBytes | Specifies the chunk size, in bytes, that GridFS splits files into. The default value is 255 KiB (261,120 bytes). Type: int32 |
| readConcern | Specifies the read concern to use for bucket operations. The default value is the database's read concern. Type: mongoc_read_concern_t |
| writeConcern | Specifies the write concern to use for bucket operations. The default value is the database's write concern. Type: mongoc_write_concern_t |

The following example creates a bucket named "myCustomBucket" by passing an options document to mongoc_gridfs_bucket_new() that sets the bucketName option:

mongoc_database_t *db = mongoc_client_get_database (client, "db");
bson_t opts = BSON_INITIALIZER;
BSON_APPEND_UTF8 (&opts, "bucketName", "myCustomBucket");
bson_error_t error;
mongoc_gridfs_bucket_t *bucket = mongoc_gridfs_bucket_new (db, &opts, NULL, &error);
if (!bucket) {
    fprintf (stderr, "Failed to create bucket: %s\n", error.message);
}
// Release the options document after creating the bucket
bson_destroy (&opts);

You can upload files to a GridFS bucket by using the following functions:

  • mongoc_gridfs_bucket_open_upload_stream(): Opens a new upload stream to which you can write file contents

  • mongoc_gridfs_bucket_upload_from_stream(): Uploads the contents of an existing stream to a GridFS file

Use the mongoc_gridfs_bucket_open_upload_stream() function to create an upload stream for a given file name. The mongoc_gridfs_bucket_open_upload_stream() function allows you to specify configuration information in an options document, which you can pass as a parameter.

This example uses an upload stream to perform the following actions:

  • Opens a writable stream for a new GridFS file named "my_file"

  • Calls the mongoc_stream_write() function to write data to "my_file", which the stream points to

  • Calls the mongoc_stream_close() and mongoc_stream_destroy() functions to close and destroy the stream pointing to "my_file"

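bson_error_t error;
mongoc_stream_t *upload_stream = mongoc_gridfs_bucket_open_upload_stream (bucket, "my_file", NULL, NULL, &error);
if (upload_stream == NULL) {
    fprintf (stderr, "Failed to create upload stream: %s\n", error.message);
} else {
    const char *data = "Data to store";
    // mongoc_stream_write() returns the number of bytes written, or -1 on failure
    if (mongoc_stream_write (upload_stream, (void *) data, strlen (data), -1) < 0) {
        fprintf (stderr, "Failed to write to upload stream\n");
    }
    // Close and destroy the stream only if it was opened successfully
    mongoc_stream_close (upload_stream);
    mongoc_stream_destroy (upload_stream);
}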

Use the mongoc_gridfs_bucket_upload_from_stream() function to upload the contents of a stream to a new GridFS file. The mongoc_gridfs_bucket_upload_from_stream() function allows you to specify configuration information in an options document, which you can pass as a parameter.

This example performs the following actions:

  • Calls the mongoc_stream_file_new_for_path() function to open a file located at /path/to/input_file as a stream in read only (O_RDONLY) mode

  • Calls the mongoc_gridfs_bucket_upload_from_stream() function to upload the contents of the stream to a GridFS file named "new_file"

  • Calls the mongoc_stream_close() and mongoc_stream_destroy() functions to close and destroy the stream

mongoc_stream_t *file_stream = mongoc_stream_file_new_for_path ("/path/to/input_file", O_RDONLY, 0);
bson_error_t error;
if (!file_stream) {
    // mongoc_stream_file_new_for_path() returns NULL and sets errno on failure
    fprintf (stderr, "Failed to open input file: %s\n", strerror (errno));
} else {
    if (!mongoc_gridfs_bucket_upload_from_stream (bucket, "new_file", file_stream, NULL, NULL, &error)) {
        fprintf (stderr, "Failed to upload file: %s\n", error.message);
    }
    mongoc_stream_close (file_stream);
    mongoc_stream_destroy (file_stream);
}

In this section, you can learn how to retrieve file metadata stored in the files collection of the GridFS bucket. A file's metadata contains information about the file it refers to, including:

  • The _id of the file

  • The name of the file

  • The length/size of the file

  • The upload date and time

  • A metadata document in which you can store any other information

To retrieve file metadata from a GridFS bucket, call the mongoc_gridfs_bucket_find() function and pass your bucket and a query filter as parameters. The function returns a cursor from which you can access the results.

Tip

To learn more about cursors in the C driver, see the Access Data From a Cursor guide.

The following code example shows you how to retrieve and print file metadata from files in a GridFS bucket. It uses a while loop to iterate through the returned cursor and display the metadata of the files uploaded in the Upload Files examples:

// An empty filter matches every file in the bucket
bson_t filter = BSON_INITIALIZER;
mongoc_cursor_t *cursor = mongoc_gridfs_bucket_find (bucket, &filter, NULL);
const bson_t *file_doc;
while (mongoc_cursor_next (cursor, &file_doc)) {
    char *json = bson_as_json (file_doc, NULL);
    printf ("%s\n", json);
    bson_free (json);
}
mongoc_cursor_destroy (cursor);
bson_destroy (&filter);

The code prints output similar to the following:

{ "_id" : { "$oid" : "..." }, "length" : 13, "chunkSize" : 261120, "uploadDate" : { "$date" : ... }, "filename" : "my_file", "metadata" : { } }
{ "_id" : { "$oid" : "..." }, "length" : 13, "chunkSize" : 261120, "uploadDate" : { "$date" : ... }, "filename" : "new_file", "metadata" : { } }

The mongoc_gridfs_bucket_find() function accepts various query specifications. You can use its options parameter to specify the sort order, maximum number of documents to return, and the number of documents to skip before returning. To view a list of available options, see the mongoc_collection_find_with_opts() API documentation.

You can download files from a GridFS bucket by using the following functions:

  • mongoc_gridfs_bucket_open_download_stream(): Opens a new download stream from which you can read the file contents

  • mongoc_gridfs_bucket_download_to_stream(): Writes the entire file to an existing stream

You can download files from your MongoDB database by using the mongoc_gridfs_bucket_open_download_stream() function to create a download stream.

This example uses a download stream to perform the following actions:

  • Calls the mongoc_gridfs_bucket_open_download_stream() function to select a GridFS file with the specified _id value and opens it as a readable stream

  • Calls the mongoc_stream_read() function to read the contents of the file

  • Calls the mongoc_stream_close() and mongoc_stream_destroy() functions to close and destroy the download stream pointing to the file

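char buf[512];
bson_value_t file_id;
file_id.value_type = BSON_TYPE_OID;
bson_oid_init_from_string (&file_id.value.v_oid, "66fb1b8ea0f84a74ee099e71");
bson_error_t error;
mongoc_stream_t *download_stream = mongoc_gridfs_bucket_open_download_stream (bucket, &file_id, &error);
if (!download_stream) {
    fprintf (stderr, "Failed to create download stream: %s\n", error.message);
} else {
    // Read up to sizeof (buf) bytes; returns the number of bytes read, or -1 on failure
    ssize_t bytes_read = mongoc_stream_read (download_stream, buf, sizeof (buf), 1, 0);
    if (bytes_read < 0) {
        fprintf (stderr, "Failed to read from download stream\n");
    }
    // Close and destroy the stream only if it was opened successfully
    mongoc_stream_close (download_stream);
    mongoc_stream_destroy (download_stream);
}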

Note

If there are multiple documents with the same file name, GridFS will stream the most recent file with the given name (as determined by the uploadDate field).

You can download the contents of a GridFS file to an existing stream by calling the mongoc_gridfs_bucket_download_to_stream() function.

This example performs the following actions:

  • Calls the mongoc_stream_file_new_for_path() function to open a file located at /path/to/output_file as a stream in read and write (O_RDWR) mode

  • Downloads a GridFS file that has the specified _id value to the stream

  • Calls the mongoc_stream_close() and mongoc_stream_destroy() functions to close and destroy the file stream

mongoc_stream_t *file_stream = mongoc_stream_file_new_for_path ("/path/to/output_file", O_RDWR, 0);
bson_error_t error;
if (!file_stream) {
    // File stream creation reports failure through errno, not through error
    fprintf (stderr, "Error opening file stream: %s\n", strerror (errno));
} else {
    bson_value_t file_id;
    file_id.value_type = BSON_TYPE_OID;
    bson_oid_init_from_string (&file_id.value.v_oid, "66fb1b8ea0f84a74ee099e71");
    if (!mongoc_gridfs_bucket_download_to_stream (bucket, &file_id, file_stream, &error)) {
        fprintf (stderr, "Failed to download file: %s\n", error.message);
    }
    mongoc_stream_close (file_stream);
    mongoc_stream_destroy (file_stream);
}

Use the mongoc_gridfs_bucket_delete_by_id() function to remove a file's document from the files collection and its associated chunks from the chunks collection. This effectively deletes the file.

The following example shows you how to delete a file by referencing its _id field:

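bson_error_t error;
// mongoc_gridfs_bucket_delete_by_id() expects the _id wrapped in a bson_value_t
bson_value_t file_id;
file_id.value_type = BSON_TYPE_OID;
bson_oid_init_from_string (&file_id.value.v_oid, "66fb1b365fd1cc348b031b01");
if (!mongoc_gridfs_bucket_delete_by_id (bucket, &file_id, &error)) {
    fprintf (stderr, "Failed to delete file: %s\n", error.message);
}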

Note

File Revisions

The mongoc_gridfs_bucket_delete_by_id() function supports deleting only one file at a time. If you want to delete each file revision, or files with different upload times that share the same file name, collect the _id values of each revision. Then, pass each _id value in separate calls to the mongoc_gridfs_bucket_delete_by_id() function.
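The "collect the _id values" step can be sketched over a simplified in-memory list of file records. This is an illustration under stated assumptions, not driver code: file_record_t and collect_revision_ids are hypothetical names, and in a real application the records would come from iterating a mongoc_gridfs_bucket_find() cursor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a files collection document. */
typedef struct {
   const char *id;       /* _id as a 24-character hex string */
   const char *filename; /* value of the filename field */
} file_record_t;

/* Stores in ids_out the _id of every record whose filename matches.
 * Returns the number of _id values stored, which is at most max_ids. */
size_t
collect_revision_ids (const file_record_t *files,
                      size_t count,
                      const char *filename,
                      const char **ids_out,
                      size_t max_ids)
{
   assert (filename != NULL && ids_out != NULL && (files != NULL || count == 0));

   size_t found = 0;
   for (size_t i = 0; i < count && found < max_ids; i++) {
      if (strcmp (files[i].filename, filename) == 0) {
         ids_out[found++] = files[i].id;
      }
   }
   return found;
}
```

Each collected _id would then be passed to mongoc_gridfs_bucket_delete_by_id() in a separate call, checking the return value each time so that one failed deletion does not go unnoticed.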

To learn more about using the C driver to store and retrieve large files, see the following API documentation: