Store Large Files

Overview

In this guide, you can learn how to store and retrieve large files in MongoDB by using GridFS. The GridFS storage system splits files into chunks when storing them and reassembles those files when retrieving them. The driver's implementation of GridFS is an abstraction that manages the operations and organization of the file storage.

Use GridFS if the size of any of your files exceeds the BSON document size limit of 16 MB. For more detailed information about whether GridFS is suitable for your use case, see GridFS in the MongoDB Server manual.

How GridFS Works

GridFS organizes files in a bucket, a group of MongoDB collections that contain the chunks of files and information describing them. The bucket contains the following collections:

chunks: Stores the binary file chunks
files: Stores the file metadata

The driver creates the GridFS bucket, if it doesn't already exist, when you first write data to it. The bucket contains the chunks and files collections prefixed with the default bucket name fs, unless you specify a different name. To ensure efficient retrieval of the files and related metadata, the driver creates an index on each collection. The driver ensures that these indexes exist before performing read and write operations on the GridFS bucket.

For more information about GridFS indexes, see GridFS Indexes in the MongoDB Server manual.

When using GridFS to store files, the driver splits the files into smaller chunks, each represented by a separate document in the chunks collection. It also creates a document in the files collection that contains a file ID, file name, and other file metadata.

The following diagram shows how GridFS splits files when they are uploaded to a bucket:

A diagram that shows how GridFS uploads a file to a bucket

When retrieving files, GridFS fetches the metadata from the files collection in the specified bucket and uses the information to reconstruct the file from documents in the chunks collection.
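
As a rough illustration of the splitting described above, the following sketch computes how many chunk documents a file of a given length would occupy, assuming the default chunk size of 255 KB (261,120 bytes). The `ChunkMath` class and its `CountChunks()` helper are hypothetical names used only for this illustration; they are not part of the driver.

```csharp
using System;

public static class ChunkMath
{
    // Default GridFS chunk size: 255 KB expressed in bytes (261120)
    public const int DefaultChunkSizeBytes = 255 * 1024;

    // Hypothetical helper: number of chunk documents needed to hold a file
    public static long CountChunks(long fileLengthBytes, int chunkSizeBytes = DefaultChunkSizeBytes)
    {
        if (fileLengthBytes < 0) throw new ArgumentOutOfRangeException(nameof(fileLengthBytes));
        if (chunkSizeBytes <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSizeBytes));

        // Round up so that a partial final chunk is counted
        return (fileLengthBytes + chunkSizeBytes - 1) / chunkSizeBytes;
    }

    public static void Main()
    {
        // A 1,000,000-byte file fills three chunks and part of a fourth
        Console.WriteLine(CountChunks(1_000_000)); // prints 4
    }
}
```

Under these assumptions, a file that is smaller than one chunk still occupies a single document in the chunks collection, alongside its one document in the files collection.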

Create a GridFS Bucket

To begin using GridFS to store or retrieve files, create a new instance of the GridFSBucket class, passing in an IMongoDatabase object that represents your database. The resulting instance references an existing bucket or, if the bucket does not exist, a new bucket that the driver creates when you first write data to it.

The following example creates a new instance of the GridFSBucket class for the db database:

var client = new MongoClient("<connection string>");
var database = client.GetDatabase("db");
// Creates a GridFS bucket or references an existing one
var bucket = new GridFSBucket(database);

Customize the Bucket

You can customize the GridFS bucket configuration by passing an instance of the GridFSBucketOptions class to the GridFSBucket() constructor. The following table describes the properties in the GridFSBucketOptions class:

Property	Description
`BucketName`	The bucket name to use as a prefix for the files and chunks collections. The default value is `"fs"`. Data type: `string`
`ChunkSizeBytes`	The chunk size that GridFS splits files into. The default value is 255 KB. Data type: `int`
`ReadConcern`	The read concern to use for bucket operations. The default value is the database's read concern. Data type: ReadConcern
`ReadPreference`	The read preference to use for bucket operations. The default value is the database's read preference. Data type: ReadPreference
`WriteConcern`	The write concern to use for bucket operations. The default value is the database's write concern. Data type: WriteConcern

The following example creates a bucket named "myCustomBucket" by passing an instance of the GridFSBucketOptions class to the GridFSBucket() constructor:

var options = new GridFSBucketOptions { BucketName = "myCustomBucket" };
var customBucket = new GridFSBucket(database, options);
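
The following sketch shows how several of the properties in the preceding table might be combined in one GridFSBucketOptions instance. The bucket name "videos", the 1 MB chunk size, and the choice of write concern and read preference are illustrative assumptions, not recommendations.

```csharp
using MongoDB.Driver;
using MongoDB.Driver.GridFS;

var client = new MongoClient("<connection string>");
var database = client.GetDatabase("db");

var options = new GridFSBucketOptions
{
    // Collections are named videos.files and videos.chunks
    BucketName = "videos",
    // Split files into 1 MB chunks instead of the 255 KB default
    ChunkSizeBytes = 1048576,
    // Override the database's write concern and read preference
    WriteConcern = WriteConcern.WMajority,
    ReadPreference = ReadPreference.Secondary
};
var videoBucket = new GridFSBucket(database, options);
```

Any property that you leave unset keeps the default described in the table.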

Upload Files

You can upload files to a GridFS bucket by using the following methods:

OpenUploadStream() or OpenUploadStreamAsync(): Opens a new upload stream to which you can write file contents
UploadFromStream() or UploadFromStreamAsync(): Uploads the contents of an existing stream to a GridFS file

The following sections describe how to use these methods.

Write to an Upload Stream

Use the OpenUploadStream() or OpenUploadStreamAsync() method to create an upload stream for a given file name. These methods accept the following parameters:

Parameter	Description
`filename`	The name of the file to upload. Data type: `string`
`options`	Optional. An instance of the `GridFSUploadOptions` class that specifies the configuration for the upload stream. The default value is `null`. Data type: GridFSUploadOptions
`cancellationToken`	Optional. A token that you can use to cancel the operation. Data type: CancellationToken

This code example demonstrates how to open an upload stream by performing the following steps:

Calls the OpenUploadStream() method to open a writable GridFS stream for a file named "my_file"
Calls the Write() method to write data to my_file
Calls the Close() method to close the stream that points to my_file

The following code shows the synchronous version first, followed by the asynchronous version:

using (var uploader = bucket.OpenUploadStream("my_file"))
{
    // ASCII for "HelloWorld"
    byte[] bytes = { 72, 101, 108, 108, 111, 87, 111, 114, 108, 100 };
    uploader.Write(bytes, 0, bytes.Length);
    uploader.Close();
}

using (var uploader = await bucket.OpenUploadStreamAsync("my_file"))
{
    // ASCII for "HelloWorld"
    byte[] bytes = { 72, 101, 108, 108, 111, 87, 111, 114, 108, 100 };
    await uploader.WriteAsync(bytes, 0, bytes.Length);
    await uploader.CloseAsync();
}

To customize the upload stream configuration, pass an instance of the GridFSUploadOptions class to the OpenUploadStream() or OpenUploadStreamAsync() method. The GridFSUploadOptions class contains the following properties:

Property	Description
`BatchSize`	The number of chunks to upload in each batch. The default value is 16 MB divided by the value of the `ChunkSizeBytes` property. Data type: `int?`
`ChunkSizeBytes`	The size of each chunk except the last, which can be smaller. The default value is 255 KB. Data type: `int?`
`Metadata`	Metadata to store with the file, including the following elements: the `_id` of the file; the name of the file; the length and size of the file; the upload date and time; and a `metadata` document in which you can store other information. The default value is `null`. Data type: BsonDocument

The following example performs the same steps as the preceding example, but also uses the ChunkSizeBytes option to specify the size of each chunk. The synchronous version appears first, followed by the asynchronous version.

var options = new GridFSUploadOptions
{
    ChunkSizeBytes = 1048576 // 1 MB
};
using (var uploader = bucket.OpenUploadStream("my_file", options))
{
    // ASCII for "HelloWorld"
    byte[] bytes = { 72, 101, 108, 108, 111, 87, 111, 114, 108, 100 };
    uploader.Write(bytes, 0, bytes.Length);
    uploader.Close();
}

var options = new GridFSUploadOptions
{
    ChunkSizeBytes = 1048576 // 1 MB
};
using (var uploader = await bucket.OpenUploadStreamAsync("my_file", options))
{
    // ASCII for "HelloWorld"
    byte[] bytes = { 72, 101, 108, 108, 111, 87, 111, 114, 108, 100 };
    await uploader.WriteAsync(bytes, 0, bytes.Length);
    await uploader.CloseAsync();
}
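
The Metadata option can be set in the same way. The following sketch, written as a C# top-level program, attaches a small BsonDocument to the uploaded file and prints the generated `_id` by reading the upload stream's Id property. The field names "contentType" and "owner" and their values are hypothetical; GridFS does not require any particular metadata fields.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.GridFS;

var client = new MongoClient("<connection string>");
var bucket = new GridFSBucket(client.GetDatabase("db"));

var options = new GridFSUploadOptions
{
    // Hypothetical application-defined fields stored with the file
    Metadata = new BsonDocument
    {
        { "contentType", "text/plain" },
        { "owner", "docs-team" }
    }
};

using (var uploader = bucket.OpenUploadStream("my_file", options))
{
    // ASCII for "HelloWorld"
    byte[] bytes = { 72, 101, 108, 108, 111, 87, 111, 114, 108, 100 };
    uploader.Write(bytes, 0, bytes.Length);
    uploader.Close();

    // The _id that the driver assigned to the new GridFS file
    Console.WriteLine(uploader.Id);
}
```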

Upload an Existing Stream

Use the UploadFromStream() or UploadFromStreamAsync() method to upload the contents of a stream to a new GridFS file. These methods accept the following parameters:

Parameter	Description
`filename`	The name of the file to upload. Data type: `string`
`source`	The stream from which to read the file contents. Data type: Stream
`options`	Optional. An instance of the `GridFSUploadOptions` class that specifies the configuration for the upload stream. The default value is `null`. Data type: GridFSUploadOptions
`cancellationToken`	Optional. A token that you can use to cancel the operation. Data type: CancellationToken

This code example demonstrates how to upload the contents of an existing stream by performing the following steps:

Opens a file located at /path/to/input_file as a stream in binary read mode
Calls the UploadFromStream() method to write the contents of the stream to a GridFS file named "new_file"

The following code shows the synchronous version first, followed by the asynchronous version.

using (var fileStream = new FileStream("/path/to/input_file", FileMode.Open, FileAccess.Read))
{
    bucket.UploadFromStream("new_file", fileStream);
}

using (var fileStream = new FileStream("/path/to/input_file", FileMode.Open, FileAccess.Read))
{
    await bucket.UploadFromStreamAsync("new_file", fileStream);
}
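
The optional cancellationToken parameter and the return value can be used together. The following sketch, written as a C# top-level program, cancels the upload if it runs longer than 30 seconds and otherwise prints the `_id` that UploadFromStreamAsync() returns. The 30-second limit is an arbitrary value chosen for illustration.

```csharp
using System;
using System.IO;
using System.Threading;
using MongoDB.Driver;
using MongoDB.Driver.GridFS;

var client = new MongoClient("<connection string>");
var bucket = new GridFSBucket(client.GetDatabase("db"));

// Request cancellation after 30 seconds (illustrative limit)
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
using var fileStream = new FileStream("/path/to/input_file", FileMode.Open, FileAccess.Read);

try
{
    // The method returns the _id of the newly created GridFS file
    var fileId = await bucket.UploadFromStreamAsync(
        "new_file", fileStream, cancellationToken: cts.Token);
    Console.WriteLine($"Uploaded file with _id {fileId}");
}
catch (OperationCanceledException)
{
    Console.WriteLine("The upload was canceled before it finished.");
}
```

Keeping the returned `_id` lets you download or delete the file later without querying by file name.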

Download Files

You can download files from a GridFS bucket by using the following methods:

OpenDownloadStream() or OpenDownloadStreamAsync(): Opens a new download stream from which you can read file contents
DownloadToStream() or DownloadToStreamAsync(): Writes the contents of a GridFS file to an existing stream

The following sections describe these methods in more detail.

Read From a Download Stream

Use the OpenDownloadStream() or OpenDownloadStreamAsync() method to create a download stream. These methods accept the following parameters:

Parameter	Description
`id`	The `_id` value of the file to download. Data type: BsonValue
`options`	Optional. An instance of the `GridFSDownloadOptions` class that specifies the configuration for the download stream. The default value is `null`. Data type: GridFSDownloadOptions
`cancellationToken`	Optional. A token that you can use to cancel the operation. Data type: CancellationToken

The following code example demonstrates how to open a download stream by performing the following steps:

Retrieves the _id value of the GridFS file named "new_file"
Calls the OpenDownloadStream() method and passes the _id value to open the file as a readable GridFS stream
Creates a byte array buffer to store the file contents
Calls the Read() method to read the file contents from the downloader stream into the buffer

The following code shows the synchronous version first, followed by the asynchronous version.

var filter = Builders<GridFSFileInfo>.Filter.Eq(x => x.Filename, "new_file");
var doc = bucket.Find(filter).FirstOrDefault();
if (doc != null)
{
    using (var downloader = bucket.OpenDownloadStream(doc.Id))
    {
        var buffer = new byte[downloader.Length];
        downloader.Read(buffer, 0, buffer.Length);
        // Process the buffer as needed
    }
}

var filter = Builders<GridFSFileInfo>.Filter.Eq(x => x.Filename, "new_file");
var cursor = await bucket.FindAsync(filter);
var fileInfoList = await cursor.ToListAsync();
var doc = fileInfoList.FirstOrDefault();
if (doc != null)
{
    using (var downloader = await bucket.OpenDownloadStreamAsync(doc.Id))
    {
        var buffer = new byte[downloader.Length];
        await downloader.ReadAsync(buffer, 0, buffer.Length);
        // Process the buffer as needed
    }
}
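
The general .NET Stream contract allows a single Read() call to return fewer bytes than requested, so code that must have the entire file in memory might prefer to copy the stream instead. The following sketch, written as a C# top-level program, is one way to do that by using the standard CopyTo() method and a MemoryStream. It is an alternative under that assumption, not a replacement for the preceding examples.

```csharp
using System;
using System.IO;
using MongoDB.Driver;
using MongoDB.Driver.GridFS;

var client = new MongoClient("<connection string>");
var bucket = new GridFSBucket(client.GetDatabase("db"));

var filter = Builders<GridFSFileInfo>.Filter.Eq(x => x.Filename, "new_file");
var doc = bucket.Find(filter).FirstOrDefault();
if (doc != null)
{
    using (var downloader = bucket.OpenDownloadStream(doc.Id))
    using (var memory = new MemoryStream())
    {
        // CopyTo() keeps reading until the download stream is exhausted
        downloader.CopyTo(memory);
        byte[] contents = memory.ToArray();
        Console.WriteLine($"Read {contents.Length} bytes");
    }
}
```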

To customize the download stream configuration, pass an instance of the GridFSDownloadOptions class to the OpenDownloadStream() or OpenDownloadStreamAsync() method. The GridFSDownloadOptions class contains the following property:

Property	Description
`Seekable`	Indicates whether the stream supports seeking, the ability to query and change the current position in a stream. The default value is `false`. Data type: `bool?`

The following example performs the same steps as the preceding example, but also sets the Seekable option to true to specify that the stream is seekable.

The following code shows the synchronous version first, followed by the asynchronous version.

var filter = Builders<GridFSFileInfo>.Filter.Eq(x => x.Filename, "new_file");
var doc = bucket.Find(filter).FirstOrDefault();
if (doc != null)
{
    var options = new GridFSDownloadOptions
    {
        Seekable = true
    };
    using (var downloader = bucket.OpenDownloadStream(doc.Id, options))
    {
        var buffer = new byte[downloader.Length];
        downloader.Read(buffer, 0, buffer.Length);
        // Process the buffer as needed
    }
}

var filter = Builders<GridFSFileInfo>.Filter.Eq(x => x.Filename, "new_file");
var cursor = await bucket.FindAsync(filter);
var fileInfoList = await cursor.ToListAsync();
var doc = fileInfoList.FirstOrDefault();
if (doc != null)
{
    var options = new GridFSDownloadOptions
    {
        Seekable = true
    };
    using (var downloader = await bucket.OpenDownloadStreamAsync(doc.Id, options))
    {
        var buffer = new byte[downloader.Length];
        await downloader.ReadAsync(buffer, 0, buffer.Length);
        // Process the buffer as needed
    }
}
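
A seekable stream is mainly useful when you want to read part of a file. The following sketch, written as a C# top-level program, moves to an offset in the file before reading. The 5-byte offset and buffer size are arbitrary, and the example assumes that the file named "my_file" is longer than 5 bytes.

```csharp
using System;
using System.IO;
using System.Text;
using MongoDB.Driver;
using MongoDB.Driver.GridFS;

var client = new MongoClient("<connection string>");
var bucket = new GridFSBucket(client.GetDatabase("db"));

var filter = Builders<GridFSFileInfo>.Filter.Eq(x => x.Filename, "my_file");
var doc = bucket.Find(filter).FirstOrDefault();
if (doc != null)
{
    var options = new GridFSDownloadOptions { Seekable = true };
    using (var downloader = bucket.OpenDownloadStream(doc.Id, options))
    {
        // Skip the first 5 bytes of the file (assumes the file is longer than that)
        downloader.Seek(5, SeekOrigin.Begin);

        // Read up to 5 bytes starting at the new position
        var buffer = new byte[5];
        int bytesRead = downloader.Read(buffer, 0, buffer.Length);
        Console.WriteLine(Encoding.ASCII.GetString(buffer, 0, bytesRead));
    }
}
```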

Download to an Existing Stream

Use the DownloadToStream() or DownloadToStreamAsync() method to download the contents of a GridFS file to an existing stream. These methods accept the following parameters:

Parameter	Description
`id`	The `_id` value of the file to download. Data type: BsonValue
`destination`	The stream that the .NET/C# Driver downloads the GridFS file to. This parameter's value must be an object that derives from the `Stream` class. Data type: Stream
`options`	Optional. An instance of the `GridFSDownloadOptions` class that specifies the configuration for the download stream. The default value is `null`. Data type: GridFSDownloadOptions
`cancellationToken`	Optional. A token that you can use to cancel the operation. Data type: CancellationToken

The following code example demonstrates how to download to an existing stream by performing the following actions:

Opens a file located at /path/to/output_file as a stream in binary write mode
Retrieves the _id value of the GridFS file named "new_file"
Calls the DownloadToStream() method and passes the _id value to download the contents of "new_file" to a stream

The following code shows the synchronous version first, followed by the asynchronous version.

var filter = Builders<GridFSFileInfo>.Filter.Eq(x => x.Filename, "new_file");
var doc = bucket.Find(filter).FirstOrDefault();
if (doc != null)
{
    using (var outputFile = new FileStream("/path/to/output_file", FileMode.Create, FileAccess.Write))
    {
        bucket.DownloadToStream(doc.Id, outputFile);
    }
}

var filter = Builders<GridFSFileInfo>.Filter.Eq(x => x.Filename, "new_file");
var cursor = await bucket.FindAsync(filter);
var fileInfoList = await cursor.ToListAsync();
var doc = fileInfoList.FirstOrDefault();
if (doc != null)
{
    using (var outputFile = new FileStream("/path/to/output_file", FileMode.Create, FileAccess.Write))
    {
        await bucket.DownloadToStreamAsync(doc.Id, outputFile);
    }
}

Find Files

To find files in a GridFS bucket, call the Find() or FindAsync() method on your GridFSBucket instance. These methods accept the following parameters:

Parameter	Description
`filter`	A query filter that specifies the entries to match in the `files` collection. Data type: `FilterDefinition<GridFSFileInfo>`. For more information, see the API documentation for the Find() method.
`options`	Optional. An instance of the `GridFSFindOptions` class that specifies the configuration for the find operation. The default value is `null`. Data type: GridFSFindOptions
`cancellationToken`	Optional. A token that you can use to cancel the operation. Data type: CancellationToken

The following code example shows how to retrieve and print file metadata from files in a GridFS bucket. The Find() method returns an IAsyncCursor<GridFSFileInfo> instance from which you can access the results. The synchronous example uses a foreach loop to iterate through the returned cursor, and the asynchronous example uses the ForEachAsync() method, to display the metadata of the files uploaded in the Upload Files examples.

The following code shows the synchronous version and its output first, followed by the asynchronous version and its output.

var filter = Builders<GridFSFileInfo>.Filter.Empty;
var files = bucket.Find(filter);
foreach (var file in files.ToEnumerable())
{
    Console.WriteLine(file.ToJson());
}

{ "_id" : { "$oid" : "..." }, "length" : 13, "chunkSize" : 261120, "uploadDate" :
{ "$date" : ... }, "filename" : "new_file" }
{ "_id" : { "$oid" : "..." }, "length" : 50, "chunkSize" : 1048576, "uploadDate" :
{ "$date" : ... }, "filename" : "my_file" }

var filter = Builders<GridFSFileInfo>.Filter.Empty;
var files = await bucket.FindAsync(filter);
await files.ForEachAsync(file => Console.Out.WriteLineAsync(file.ToJson()));

{ "_id" : { "$oid" : "..." }, "length" : 13, "chunkSize" : 261120, "uploadDate" :
{ "$date" : ... }, "filename" : "new_file" }
{ "_id" : { "$oid" : "..." }, "length" : 50, "chunkSize" : 1048576, "uploadDate" :
{ "$date" : ... }, "filename" : "my_file" }

To customize the find operation, pass an instance of the GridFSFindOptions class to the Find() or FindAsync() method. The GridFSFindOptions class contains the following properties:

Property	Description
`Sort`	The sort order of the results. If you don't specify a sort order, the method returns the results in the order in which they were inserted. Data type: `SortDefinition<GridFSFileInfo>`. For more information, see the API documentation for the Sort property.
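
As an illustration of the Sort property, the following sketch, written as a C# top-level program, finds the most recently uploaded revision of a file by sorting on the upload date in descending order and taking the first result. The file name "my_file" is reused from the earlier examples.

```csharp
using System;
using MongoDB.Driver;
using MongoDB.Driver.GridFS;

var client = new MongoClient("<connection string>");
var bucket = new GridFSBucket(client.GetDatabase("db"));

var filter = Builders<GridFSFileInfo>.Filter.Eq(x => x.Filename, "my_file");
var options = new GridFSFindOptions
{
    // Newest upload first
    Sort = Builders<GridFSFileInfo>.Sort.Descending(x => x.UploadDateTime)
};

var newest = bucket.Find(filter, options).FirstOrDefault();
if (newest != null)
{
    Console.WriteLine($"{newest.Id} uploaded at {newest.UploadDateTime}");
}
```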

Delete Files

To delete files from a GridFS bucket, call the Delete() or DeleteAsync() method on your GridFSBucket instance. These methods remove a file's metadata document and its associated chunks from your bucket.

The Delete() and DeleteAsync() methods accept the following parameters:

Parameter	Description
`id`	The `_id` of the file to delete. Data type: BsonValue
`cancellationToken`	Optional. A token that you can use to cancel the operation. Data type: CancellationToken

The following code example shows how to delete a file named "new_file" by passing its _id value to the Delete() method. The example performs the following steps:

Uses the Builders class to create a filter that matches the file named "new_file"
Uses the Find() method to find the file named "new_file"
Passes the _id value of the file to the Delete() method to delete the file

The following code shows the synchronous version first, followed by the asynchronous version.

var filter = Builders<GridFSFileInfo>.Filter.Eq(x => x.Filename, "new_file");
var doc = bucket.Find(filter).FirstOrDefault();
if (doc != null)
{
    bucket.Delete(doc.Id);
}

var filter = Builders<GridFSFileInfo>.Filter.Eq(x => x.Filename, "new_file");
var cursor = await bucket.FindAsync(filter);
var fileInfoList = await cursor.ToListAsync();
var doc = fileInfoList.FirstOrDefault();
if (doc != null)
{
    await bucket.DeleteAsync(doc.Id);
}

Note

File Revisions

The Delete() and DeleteAsync() methods support deleting only one file at a time. If you want to delete each file revision, or files with different upload times that share the same file name, collect the _id values of each revision. Then, pass each _id value in separate calls to the Delete() or DeleteAsync() method.
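
The approach described in this note can be sketched as follows, written as a C# top-level program. The code collects every file document that shares the name "my_file" and deletes each revision in a separate call. The file name is reused from the earlier examples.

```csharp
using System;
using MongoDB.Driver;
using MongoDB.Driver.GridFS;

var client = new MongoClient("<connection string>");
var bucket = new GridFSBucket(client.GetDatabase("db"));

// Match every revision that was uploaded under the same file name
var filter = Builders<GridFSFileInfo>.Filter.Eq(x => x.Filename, "my_file");
var revisions = bucket.Find(filter).ToList();

foreach (var revision in revisions)
{
    // Each call removes one file document and its chunks
    bucket.Delete(revision.Id);
    Console.WriteLine($"Deleted revision {revision.Id}");
}
```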

API Documentation

To learn more about the classes and methods used on this page, see the API documentation for the GridFSBucket, GridFSBucketOptions, GridFSUploadOptions, GridFSDownloadOptions, GridFSFindOptions, and GridFSFileInfo classes.
