Universally Unique IDs (UUIDs)
Overview
MongoDB drivers have historically differed in how they encode
universally unique identifiers (UUIDs). In this guide, you can learn how to use
PyMongo's UuidRepresentation
configuration option to maintain cross-language
compatibility when working with UUIDs.
Tip
In MongoDB applications, you can use the ObjectId
type as a unique identifier for
a document. Consider using ObjectId
in place of a UUID where possible.
A Short History of MongoDB UUIDs
Consider a UUID with the following canonical textual representation:
00112233-4455-6677-8899-aabbccddeeff
Originally, MongoDB represented UUIDs as BSON Binary
values of subtype 3. Because subtype 3 didn't standardize the byte order of UUIDs
during encoding, different MongoDB drivers encoded UUIDs with different byte orders.
The following list compares how different MongoDB language drivers
encoded the preceding UUID to Binary
subtype 3:
PyMongo: 00112233-4455-6677-8899-aabbccddeeff
.NET/C# Driver: 33221100-5544-7766-8899-aabbccddeeff
Java Driver: 77665544-3322-1100-ffee-ddccbbaa9988
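The three byte orders can be reproduced with Python's standard uuid module. The following is a minimal sketch rather than driver code. The helper name legacy_byte_orders is hypothetical, and the sketch assumes that the C# legacy layout stores the first three UUID fields in little-endian order and that the Java legacy layout reverses each 8-byte half:

```python
import uuid


def legacy_byte_orders(value: uuid.UUID) -> dict:
    """Return the 16-byte payload for each legacy (subtype 3) layout."""
    raw = value.bytes  # big-endian, same order as the textual form
    return {
        "python": raw,
        # Assumed C# legacy layout: first three fields little-endian
        "csharp": value.bytes_le,
        # Assumed Java legacy layout: each 8-byte half reversed
        "java": raw[:8][::-1] + raw[8:][::-1],
    }


example = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")
for driver, payload in legacy_byte_orders(example).items():
    # Format each payload as a UUID string for easy comparison
    print(f"{driver:7} {uuid.UUID(bytes=payload)}")
```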
To standardize UUID byte order, we created Binary
subtype 4. Although this subtype
is handled consistently across MongoDB drivers, some MongoDB deployments still contain
UUID values of subtype 3.
Important
Use caution when storing or retrieving UUIDs of subtype 3. A UUID of this type stored by one MongoDB driver might have a different value when retrieved by a different driver.
Specify a UUID Representation
To ensure that your PyMongo application handles UUIDs correctly, use the
UuidRepresentation
option. This option
determines how the driver encodes UUID objects to BSON and decodes Binary
subtype
3 and 4 values from BSON.
You can set the UUID representation option in the following ways:
Pass the uuidRepresentation parameter when constructing a MongoClient. PyMongo uses the specified UUID representation for all operations performed with this MongoClient instance.
Include the uuidRepresentation parameter in the MongoDB connection string. PyMongo uses the specified UUID representation for all operations performed with this MongoClient instance.
Pass the codec_options parameter when calling the get_database() method. PyMongo uses the specified UUID representation for all operations performed on the retrieved database.
Pass the codec_options parameter when calling the get_collection() method. PyMongo uses the specified UUID representation for all operations performed on the retrieved collection.
The following examples show how to specify each of the preceding options, in the same order. To learn more about the available UUID representations, see Supported UUID Representations.
The uuidRepresentation
parameter accepts the values defined in the
UuidRepresentation
enum. The following code example specifies STANDARD
for the UUID representation:
import pymongo
from bson.binary import UuidRepresentation

client = pymongo.MongoClient("mongodb://<hostname>:<port>", uuidRepresentation=UuidRepresentation.STANDARD)
In a connection string, the uuidRepresentation
parameter accepts the following values:
unspecified
standard
pythonLegacy
javaLegacy
csharpLegacy
The following code example specifies standard
for the UUID representation:
from pymongo import MongoClient

uri = "mongodb://<hostname>:<port>/?uuidRepresentation=standard"
client = MongoClient(uri)
To specify the UUID format when calling the get_database()
method,
create an instance of the CodecOptions
class and pass the uuid_representation
argument to the constructor. The following example shows how to obtain a database
reference while using the CSHARP_LEGACY
UUID format:
from bson.binary import UuidRepresentation
from bson.codec_options import CodecOptions

csharp_opts = CodecOptions(uuid_representation=UuidRepresentation.CSHARP_LEGACY)
csharp_database = client.get_database("database_name", codec_options=csharp_opts)
Tip
You can also specify the codec_options
argument when calling the
database.with_options()
method. For more information about this method,
see Configure Read and Write Operations in the Databases and Collections guide.
To specify the UUID format when calling the get_collection()
method,
create an instance of the CodecOptions
class and pass the uuid_representation
argument to the constructor. The following example shows how to obtain a collection
reference while using the CSHARP_LEGACY
UUID format:
from bson.binary import UuidRepresentation
from bson.codec_options import CodecOptions

csharp_opts = CodecOptions(uuid_representation=UuidRepresentation.CSHARP_LEGACY)
csharp_collection = client.testdb.get_collection("collection_name", codec_options=csharp_opts)
Tip
You can also specify the codec_options
argument when calling the
collection.with_options()
method. For more information about this method,
see Configure Read and Write Operations in the Databases and Collections guide.
Supported UUID Representations
The following table summarizes the UUID representations that PyMongo supports:
UUID Representation | Encode UUID to | Decode Binary subtype 4 to | Decode Binary subtype 3 to |
---|---|---|---|
UNSPECIFIED (default) | Raise ValueError | Binary subtype 4 | Binary subtype 3 |
STANDARD | Binary subtype 4 | UUID | Binary subtype 3 |
PYTHON_LEGACY | Binary subtype 3 with standard byte order | Binary subtype 4 | UUID |
JAVA_LEGACY | Binary subtype 3 with Java legacy byte order | Binary subtype 4 | UUID |
CSHARP_LEGACY | Binary subtype 3 with C# legacy byte order | Binary subtype 4 | UUID |
The following sections describe the preceding UUID representation options in more detail.
UNSPECIFIED
Note
UNSPECIFIED
is the default UUID representation in PyMongo.
When using the UNSPECIFIED
representation, PyMongo decodes BSON
Binary
values to Binary
objects of the same subtype.
To convert a Binary
object into a native
UUID
object, call the Binary.as_uuid()
method and specify a UUID representation
format.
If you try to encode a UUID object while using this representation, PyMongo
raises a ValueError. To avoid this, call the Binary.from_uuid()
method on the UUID, as shown in the following example:
explicit_binary = Binary.from_uuid(uuid4(), UuidRepresentation.STANDARD)
The following code example shows how to retrieve a document containing a UUID with the
UNSPECIFIED
representation, then convert the value to a UUID
object.
To do so, the code performs the following steps:
Inserts a document that contains a uuid field using the CSHARP_LEGACY UUID representation.
Retrieves the same document using the UNSPECIFIED representation. PyMongo decodes the value of the uuid field as a Binary object.
Calls the as_uuid() method to convert the value of the uuid field to a UUID object of type CSHARP_LEGACY. After it's converted, this value is identical to the original UUID inserted by PyMongo.
from bson.codec_options import CodecOptions, DEFAULT_CODEC_OPTIONS
from bson.binary import Binary, UuidRepresentation
from uuid import uuid4

# Using UuidRepresentation.CSHARP_LEGACY
csharp_opts = CodecOptions(uuid_representation=UuidRepresentation.CSHARP_LEGACY)

# Store a legacy C#-formatted UUID
input_uuid = uuid4()
collection = client.testdb.get_collection('test', codec_options=csharp_opts)
collection.insert_one({'_id': 'foo', 'uuid': input_uuid})

# Using UuidRepresentation.UNSPECIFIED
unspec_opts = CodecOptions(uuid_representation=UuidRepresentation.UNSPECIFIED)
unspec_collection = client.testdb.get_collection('test', codec_options=unspec_opts)

# UUID fields are decoded as Binary when UuidRepresentation.UNSPECIFIED is configured
document = unspec_collection.find_one({'_id': 'foo'})
decoded_field = document['uuid']
assert isinstance(decoded_field, Binary)

# Binary.as_uuid() can be used to convert the decoded value to a native UUID
decoded_uuid = decoded_field.as_uuid(UuidRepresentation.CSHARP_LEGACY)
assert decoded_uuid == input_uuid
STANDARD
When using the STANDARD
UUID representation, PyMongo encodes native UUID
objects to Binary
subtype 4 objects. All MongoDB drivers using the STANDARD
representation treat these objects in the same way, with no changes to byte order.
Use the STANDARD
UUID representation in all new applications, and in all
applications working with MongoDB UUIDs for the first time.
PYTHON_LEGACY
The PYTHON_LEGACY
UUID representation
corresponds to the legacy representation of UUIDs used by versions of PyMongo
earlier than v4.0.
When using the PYTHON_LEGACY
UUID representation, PyMongo encodes native
UUID
objects to Binary
subtype 3 objects, preserving the same
byte order as the UUID.bytes
property.
Use the PYTHON_LEGACY
UUID representation if the
UUID you're reading from MongoDB was inserted using the PYTHON_LEGACY
representation.
This will be true if both of the following criteria are met:
The UUID was inserted by an application using a version of PyMongo earlier than v4.0.
The application that inserted the UUID didn't specify the
STANDARD
UUID representation.
JAVA_LEGACY
The JAVA_LEGACY
UUID representation
corresponds to the legacy representation of UUIDs used by the MongoDB Java
Driver. When using the JAVA_LEGACY
UUID representation, PyMongo encodes native
UUID
objects to Binary
subtype 3 objects with Java legacy byte order.
Use the JAVA_LEGACY
UUID representation if the
UUID you're reading from MongoDB was inserted using the JAVA_LEGACY
representation.
This will be true if both of the following criteria are met:
The UUID was inserted by an application using the MongoDB Java Driver.
The application didn't specify the
STANDARD
UUID representation.
CSHARP_LEGACY
The CSHARP_LEGACY
UUID representation
corresponds to the legacy representation of UUIDs used by the MongoDB .NET/C#
Driver. When using the CSHARP_LEGACY
UUID representation, PyMongo encodes
native UUID
objects to Binary
subtype 3 objects with C# legacy byte order.
Use the CSHARP_LEGACY
UUID representation if the
UUID you're reading from MongoDB was inserted using the CSHARP_LEGACY
representation.
This will be true if both of the following criteria are met:
The UUID was inserted by an application using the MongoDB .NET/C# Driver.
The application didn't specify the
STANDARD
UUID representation.
Troubleshooting
ValueError: cannot encode native uuid.UUID with UuidRepresentation.UNSPECIFIED
This error results from trying to encode a native UUID object to a Binary
object when the UUID representation is UNSPECIFIED, as shown in the following code
example:
unspec_collection.insert_one({'_id': 'bar', 'uuid': uuid4()})

Traceback (most recent call last):
...
ValueError: cannot encode native uuid.UUID with UuidRepresentation.UNSPECIFIED. UUIDs can be manually converted to bson.Binary instances using bson.Binary.from_uuid() or a different UuidRepresentation can be configured. See the documentation for UuidRepresentation for more information.
Instead, you must explicitly convert a native UUID to a Binary
object by using the
Binary.from_uuid()
method, as shown in the following example:
explicit_binary = Binary.from_uuid(uuid4(), UuidRepresentation.STANDARD)
unspec_collection.insert_one({'_id': 'bar', 'uuid': explicit_binary})
API Documentation
To learn more about UUIDs and PyMongo, see the following API documentation: