
Universally Unique IDs (UUIDs)

On this page

  • Overview
  • A Short History of MongoDB UUIDs
  • Specify a UUID Representation
  • Supported UUID Representations
  • UNSPECIFIED
  • STANDARD
  • PYTHON_LEGACY
  • JAVA_LEGACY
  • CSHARP_LEGACY
  • Troubleshooting
  • ValueError: cannot encode native uuid.UUID with UuidRepresentation.UNSPECIFIED
  • API Documentation

Overview

MongoDB drivers have historically differed in how they encode universally unique identifiers (UUIDs). In this guide, you can learn how to use PyMongo's UuidRepresentation configuration option to maintain cross-language compatibility when working with UUIDs.

Tip

In MongoDB applications, you can use the ObjectId type as a unique identifier for a document. Consider using ObjectId in place of a UUID where possible.

A Short History of MongoDB UUIDs

Consider a UUID with the following canonical textual representation:

00112233-4455-6677-8899-aabbccddeeff

Originally, MongoDB represented UUIDs as BSON Binary values of subtype 3. Because subtype 3 didn't standardize the byte order of UUIDs during encoding, different MongoDB drivers encoded UUIDs with different byte orders. The following list compares how different MongoDB language drivers encoded the preceding UUID to Binary subtype 3:

  • PyMongo: 00112233-4455-6677-8899-aabbccddeeff
  • .NET/C# Driver: 33221100-5544-7766-8899-aabbccddeeff
  • Java Driver: 77665544-3322-1100-ffee-ddccbbaa9988
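As a minimal sketch, the three legacy subtype 3 byte orders can be reproduced with Python's standard uuid module alone. The helper names below (python_legacy_bytes, csharp_legacy_bytes, java_legacy_bytes, as_text) are hypothetical and are not part of PyMongo. The sketch only illustrates the byte shuffling and isn't a description of how any driver is implemented internally.

```python
import uuid

u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")


def python_legacy_bytes(value: uuid.UUID) -> bytes:
    # Same order as UUID.bytes (big-endian fields).
    return value.bytes


def csharp_legacy_bytes(value: uuid.UUID) -> bytes:
    # First three fields little-endian, remaining 8 bytes unchanged.
    return value.bytes_le


def java_legacy_bytes(value: uuid.UUID) -> bytes:
    # Each 8-byte half is reversed independently.
    raw = value.bytes
    return raw[:8][::-1] + raw[8:][::-1]


def as_text(raw: bytes) -> str:
    # Render 16 raw bytes in canonical UUID text form for comparison.
    return str(uuid.UUID(bytes=raw))


for name, encoder in (
    ("PyMongo", python_legacy_bytes),
    (".NET/C#", csharp_legacy_bytes),
    ("Java", java_legacy_bytes),
):
    print(f"{name:<10}{as_text(encoder(u))}")
```

Printing the results in canonical text form makes it easy to compare them against the three values listed above.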

To standardize UUID byte order, we created Binary subtype 4. Although this subtype is handled consistently across MongoDB drivers, some MongoDB deployments still contain UUID values of subtype 3.

Important

Use caution when storing or retrieving UUIDs of subtype 3. A UUID of this type stored by one MongoDB driver might have a different value when retrieved by a different driver.

Specify a UUID Representation

To ensure that your PyMongo application handles UUIDs correctly, use the UuidRepresentation option. This option determines how the driver encodes UUID objects to BSON and decodes Binary subtype 3 and 4 values from BSON.

You can set the UUID representation option in the following ways:

  • Pass the uuidRepresentation parameter when constructing a MongoClient. PyMongo uses the specified UUID representation for all operations performed with this MongoClient instance.

  • Include the uuidRepresentation parameter in the MongoDB connection string. PyMongo uses the specified UUID representation for all operations performed with this MongoClient instance.

  • Pass the codec_options parameter when calling the get_database() method. PyMongo uses the specified UUID representation for all operations performed on the retrieved database.

  • Pass the codec_options parameter when calling the get_collection() method. PyMongo uses the specified UUID representation for all operations performed on the retrieved collection.

The following examples show how to specify each of the preceding options. To learn more about the available UUID representations, see Supported UUID Representations.

The uuidRepresentation parameter accepts the values defined in the UuidRepresentation enum. The following code example specifies STANDARD for the UUID representation:

import pymongo
from bson.binary import UuidRepresentation

client = pymongo.MongoClient("mongodb://<hostname>:<port>",
                             uuidRepresentation=UuidRepresentation.STANDARD)

In a connection string, the uuidRepresentation parameter accepts the following values:

  • unspecified

  • standard

  • pythonLegacy

  • javaLegacy

  • csharpLegacy

The following code example specifies standard for the UUID representation:

from pymongo import MongoClient

uri = "mongodb://<hostname>:<port>/?uuidRepresentation=standard"
client = MongoClient(uri)
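If an application assembles its connection string at run time, a small helper can reject misspelled values before the URI reaches the driver. The following is a sketch under stated assumptions: build_uri and ALLOWED_REPRESENTATIONS are hypothetical names, not PyMongo APIs, and the allowed values are copied from the list above.

```python
from urllib.parse import urlencode

# Connection-string values for uuidRepresentation, as listed in this guide.
ALLOWED_REPRESENTATIONS = (
    "unspecified",
    "standard",
    "pythonLegacy",
    "javaLegacy",
    "csharpLegacy",
)


def build_uri(host: str, port: int, uuid_representation: str = "standard") -> str:
    """Return a MongoDB URI that sets the uuidRepresentation option."""
    if uuid_representation not in ALLOWED_REPRESENTATIONS:
        raise ValueError(
            f"unknown uuidRepresentation {uuid_representation!r}; "
            f"expected one of {ALLOWED_REPRESENTATIONS}"
        )
    query = urlencode({"uuidRepresentation": uuid_representation})
    return f"mongodb://{host}:{port}/?{query}"


uri = build_uri("localhost", 27017, "javaLegacy")
print(uri)
```

The values are case-sensitive in this sketch, so "JavaLegacy" or "java_legacy" raises a ValueError instead of producing a URI.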

To specify the UUID format when calling the get_database() method, create an instance of the CodecOptions class and pass the uuid_representation argument to the constructor. The following example shows how to obtain a database reference while using the CSHARP_LEGACY UUID format:

from bson.binary import UuidRepresentation
from bson.codec_options import CodecOptions

# Assumes `client` is an existing MongoClient instance
csharp_opts = CodecOptions(uuid_representation=UuidRepresentation.CSHARP_LEGACY)
csharp_database = client.get_database("database_name", codec_options=csharp_opts)

Tip

You can also specify the codec_options argument when calling the database.with_options() method. For more information about this method, see Configure Read and Write Operations in the Databases and Collections guide.

To specify the UUID format when calling the get_collection() method, create an instance of the CodecOptions class and pass the uuid_representation argument to the constructor. The following example shows how to obtain a collection reference while using the CSHARP_LEGACY UUID format:

from bson.binary import UuidRepresentation
from bson.codec_options import CodecOptions

# Assumes `client` is an existing MongoClient instance
csharp_opts = CodecOptions(uuid_representation=UuidRepresentation.CSHARP_LEGACY)
csharp_collection = client.testdb.get_collection("collection_name", codec_options=csharp_opts)

Tip

You can also specify the codec_options argument when calling the collection.with_options() method. For more information about this method, see Configure Read and Write Operations in the Databases and Collections guide.

Supported UUID Representations

The following table summarizes the UUID representations that PyMongo supports:

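UUID Representation    | Encode UUID to                                | Decode Binary subtype 4 to | Decode Binary subtype 3 to
-----------------------+-----------------------------------------------+----------------------------+---------------------------
UNSPECIFIED (default)  | Raise ValueError                              | Binary subtype 4           | Binary subtype 3
STANDARD               | Binary subtype 4                              | UUID                       | Binary subtype 3
PYTHON_LEGACY          | Binary subtype 3 with standard byte order     | Binary subtype 4           | UUID
JAVA_LEGACY            | Binary subtype 3 with Java legacy byte order  | Binary subtype 4           | UUID
CSHARP_LEGACY          | Binary subtype 3 with C# legacy byte order    | Binary subtype 4           | UUID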
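The rules in the table above can be modeled in a few lines of standard-library Python. This is an illustrative sketch, not PyMongo's implementation: RULES, encode, and decode are hypothetical names, representations are identified by their connection-string spellings, and a (subtype, bytes) tuple stands in for a Binary object.

```python
import uuid


def _reverse_halves(raw: bytes) -> bytes:
    # Java legacy order: each 8-byte half reversed (the operation is its own inverse).
    return raw[:8][::-1] + raw[8:][::-1]


# representation -> (Binary subtype, UUID-to-bytes encoder, bytes-to-UUID decoder)
RULES = {
    "standard": (4, lambda u: u.bytes, lambda d: uuid.UUID(bytes=d)),
    "pythonLegacy": (3, lambda u: u.bytes, lambda d: uuid.UUID(bytes=d)),
    "javaLegacy": (3, lambda u: _reverse_halves(u.bytes),
                   lambda d: uuid.UUID(bytes=_reverse_halves(d))),
    "csharpLegacy": (3, lambda u: u.bytes_le, lambda d: uuid.UUID(bytes_le=d)),
}


def encode(value: uuid.UUID, representation: str):
    """Return (subtype, payload), or raise ValueError for 'unspecified'."""
    if representation == "unspecified":
        raise ValueError("cannot encode a native UUID with an unspecified representation")
    subtype, to_bytes, _ = RULES[representation]
    return subtype, to_bytes(value)


def decode(subtype: int, payload: bytes, representation: str):
    """Return a UUID only when the subtype matches the representation."""
    rule = RULES.get(representation)
    if rule is None or rule[0] != subtype:
        return subtype, payload  # left as binary data, as in the table
    return rule[2](payload)


original = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")
stored = encode(original, "csharpLegacy")
print(decode(*stored, "csharpLegacy") == original)  # decoded to a UUID
print(decode(*stored, "standard") == stored)        # subtype 3 stays binary
```

The model makes one design point of the table explicit: a value is decoded to a UUID only when its subtype matches the subtype that the configured representation writes. Otherwise, the value is returned unchanged as binary data.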

The following sections describe the preceding UUID representation options in more detail.

UNSPECIFIED

Note

UNSPECIFIED is the default UUID representation in PyMongo.

When using the UNSPECIFIED representation, PyMongo decodes BSON Binary values to Binary objects of the same subtype. To convert a Binary object into a native UUID object, call the Binary.as_uuid() method and specify a UUID representation format.

If you try to encode a UUID object while using this representation, PyMongo raises a ValueError. To avoid this, call the Binary.from_uuid() method on the UUID, as shown in the following example:

explicit_binary = Binary.from_uuid(uuid4(), UuidRepresentation.STANDARD)

The following code example shows how to retrieve a document containing a UUID with the UNSPECIFIED representation, then convert the value to a UUID object. To do so, the code performs the following steps:

  • Inserts a document that contains a uuid field using the CSHARP_LEGACY UUID representation.

  • Retrieves the same document using the UNSPECIFIED representation. PyMongo decodes the value of the uuid field as a Binary object.

  • Calls the as_uuid() method to convert the value of the uuid field to a UUID object, interpreting the stored bytes with the CSHARP_LEGACY representation. After it's converted, this value is identical to the original UUID inserted by PyMongo.

from uuid import uuid4

from bson.binary import Binary, UuidRepresentation
from bson.codec_options import CodecOptions

# Assumes `client` is an existing MongoClient instance

# Using UuidRepresentation.CSHARP_LEGACY
csharp_opts = CodecOptions(uuid_representation=UuidRepresentation.CSHARP_LEGACY)

# Store a legacy C#-formatted UUID
input_uuid = uuid4()
collection = client.testdb.get_collection('test', codec_options=csharp_opts)
collection.insert_one({'_id': 'foo', 'uuid': input_uuid})

# Using UuidRepresentation.UNSPECIFIED
unspec_opts = CodecOptions(uuid_representation=UuidRepresentation.UNSPECIFIED)
unspec_collection = client.testdb.get_collection('test', codec_options=unspec_opts)

# UUID fields are decoded as Binary when UuidRepresentation.UNSPECIFIED is configured
document = unspec_collection.find_one({'_id': 'foo'})
decoded_field = document['uuid']
assert isinstance(decoded_field, Binary)

# Binary.as_uuid() can be used to convert the decoded value to a native UUID
decoded_uuid = decoded_field.as_uuid(UuidRepresentation.CSHARP_LEGACY)
assert decoded_uuid == input_uuid

STANDARD

When using the STANDARD UUID representation, PyMongo encodes native UUID objects to Binary subtype 4 objects. All MongoDB drivers using the STANDARD representation treat these objects in the same way, with no changes to byte order.

Use the STANDARD UUID representation in all new applications, and in all applications working with MongoDB UUIDs for the first time.

PYTHON_LEGACY

The PYTHON_LEGACY UUID representation corresponds to the legacy representation of UUIDs used by versions of PyMongo earlier than v4.0. When using the PYTHON_LEGACY UUID representation, PyMongo encodes native UUID objects to Binary subtype 3 objects, preserving the same byte order as the UUID.bytes property.

Use the PYTHON_LEGACY UUID representation if the UUID you're reading from MongoDB was inserted using the PYTHON_LEGACY representation. This will be true if both of the following criteria are met:

  • The UUID was inserted by an application using a version of PyMongo earlier than v4.0.

  • The application that inserted the UUID didn't specify the STANDARD UUID representation.

JAVA_LEGACY

The JAVA_LEGACY UUID representation corresponds to the legacy representation of UUIDs used by the MongoDB Java Driver. When using the JAVA_LEGACY UUID representation, PyMongo encodes native UUID objects to Binary subtype 3 objects with Java legacy byte order.

Use the JAVA_LEGACY UUID representation if the UUID you're reading from MongoDB was inserted using the JAVA_LEGACY representation. This will be true if both of the following criteria are met:

  • The UUID was inserted by an application using the MongoDB Java Driver.

  • The application didn't specify the STANDARD UUID representation.

CSHARP_LEGACY

The CSHARP_LEGACY UUID representation corresponds to the legacy representation of UUIDs used by the MongoDB .NET/C# Driver. When using the CSHARP_LEGACY UUID representation, PyMongo encodes native UUID objects to Binary subtype 3 objects with C# legacy byte order.

Use the CSHARP_LEGACY UUID representation if the UUID you're reading from MongoDB was inserted using the CSHARP_LEGACY representation. This will be true if both of the following criteria are met:

  • The UUID was inserted by an application using the MongoDB .NET/C# Driver.

  • The application didn't specify the STANDARD UUID representation.

Troubleshooting

ValueError: cannot encode native uuid.UUID with UuidRepresentation.UNSPECIFIED

This error results from trying to encode a native UUID object to a Binary object when the UUID representation is UNSPECIFIED, as shown in the following code example:

unspec_collection.insert_one({'_id': 'bar', 'uuid': uuid4()})
Traceback (most recent call last):
...
ValueError: cannot encode native uuid.UUID with UuidRepresentation.UNSPECIFIED.
UUIDs can be manually converted to bson.Binary instances using bson.Binary.from_uuid()
or a different UuidRepresentation can be configured. See the documentation for
UuidRepresentation for more information.

Instead, you must explicitly convert a native UUID to a Binary object by using the Binary.from_uuid() method, as shown in the following example:

explicit_binary = Binary.from_uuid(uuid4(), UuidRepresentation.STANDARD)
unspec_collection.insert_one({'_id': 'bar', 'uuid': explicit_binary})

API Documentation

To learn more about UUIDs and PyMongo, see the following API documentation:
