PyMongo

Frequently Asked Questions

On this page

  • Is PyMongo Thread-Safe?
  • Is PyMongo Fork-Safe?
  • Can I Use PyMongo with Multiprocessing?
  • Can PyMongo Load the Results of a Query as a Pandas DataFrame?
  • How Does Connection Pooling Work in PyMongo?
  • Why Does PyMongo Add an _id Field to All My Documents?
  • How Do I Change the Timeout Value for Cursors?
  • How Can I Store Decimal Instances?
  • Why Does PyMongo Convert 9.99 to 9.9900000000000002?
  • Does PyMongo Support Attribute-style Access for Documents?
  • Does PyMongo Support Asynchronous Frameworks?
  • Does PyMongo Work with mod_wsgi?
  • Does PyMongo Work with PythonAnywhere?
  • How Can I Encode My Documents to JSON?
  • Does PyMongo Behave Differently in Python 3?
  • Can I Share Pickled ObjectIds Between Python 2 and Python 3?

Is PyMongo Thread-Safe?

Yes. PyMongo is thread-safe and provides built-in connection pooling for threaded applications.

Is PyMongo Fork-Safe?

No. If you use fork() to create a new process, don't pass an instance of the MongoClient class from the parent process to the child process. Doing so creates a high probability of deadlock among MongoClient instances in the child process. Instead, create a new MongoClient instance in the child process.

Note

PyMongo tries to issue a warning if this deadlock might occur.

Can I Use PyMongo with Multiprocessing?

Yes. However, on Unix systems, the multiprocessing module creates new processes by using fork(). This carries the same risks described in Is PyMongo Fork-Safe?

To use multiprocessing with PyMongo, write code similar to the following example:

import multiprocessing

import pymongo

# Each process creates its own instance of MongoClient.
def func():
    db = pymongo.MongoClient().mydb
    # Do something with db.

proc = multiprocessing.Process(target=func)
proc.start()

Important

Do not copy an instance of the MongoClient class from the parent process to a child process.

Can PyMongo Load the Results of a Query as a Pandas DataFrame?

You can use the PyMongoArrow library to work with numerical or columnar data. PyMongoArrow lets you load MongoDB query result-sets as Pandas DataFrames, NumPy ndarrays, or Apache Arrow Tables.
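
The following sketch loads a query result-set as a Pandas DataFrame. It assumes that the pymongoarrow package is installed and that a MongoDB deployment is running locally; the test.coll namespace is a placeholder.

from pymongo import MongoClient
from pymongoarrow.monkey import patch_all

# patch_all() adds PyMongoArrow methods, such as find_pandas_all(),
# to the Collection class.
patch_all()

client = MongoClient()
coll = client.test.coll

# Load every document matching the query as a Pandas DataFrame.
df = coll.find_pandas_all({"x": {"$gt": 0}})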

How Does Connection Pooling Work in PyMongo?

Every MongoClient instance has a built-in connection pool for each server in your MongoDB topology. Connection pools open sockets on demand to support concurrent requests to MongoDB in your application.

The maximum size of each connection pool is set by the maxPoolSize option, which defaults to 100. If the number of in-use connections to a server reaches the value of maxPoolSize, the next request to that server will wait until a connection becomes available.

In addition to the sockets needed to support your application's requests, each MongoClient instance opens two more sockets per server in your MongoDB topology for monitoring the server's state. For example, a client connected to a three-node replica set opens six monitoring sockets. If the application uses the default setting for maxPoolSize and only queries the primary (default) node, then there can be at most 106 total connections in the connection pool. If the application uses a read preference to query the secondary nodes, those connection pools grow and there can be 306 total connections.

To support high numbers of concurrent MongoDB requests within one process, you can increase maxPoolSize.
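
For example, the following sketch raises the per-server limit to 200 connections:

from pymongo import MongoClient

# Allow up to 200 concurrent connections to each server.
client = MongoClient(maxPoolSize=200)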

Connection pools are rate-limited. The maxConnecting option determines the number of connections that the pool can create in parallel at any time. For example, if the value of maxConnecting is 2, a third request that attempts to concurrently check out a connection succeeds only when one of the following cases occurs:

  • The connection pool finishes creating a connection and there are fewer than maxPoolSize connections in the pool.

  • An existing connection is checked back into the pool.

Because of this rate limit on connection creation, the driver is more likely to reuse existing connections.

You can set the minimum number of concurrent connections to each server with the minPoolSize option, which defaults to 0. The driver initializes the connection pool with this number of sockets. If sockets are closed, causing the total number of sockets (both in use and idle) to drop below the minimum, more sockets are opened until the minimum is reached.

You can set the maximum number of milliseconds that a connection can remain idle in the pool by setting the maxIdleTimeMS option. Once a connection has been idle for maxIdleTimeMS, the connection pool removes and replaces it. This option defaults to 0 (no limit).
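
For example, the following sketch keeps at least 10 sockets open per server and replaces any connection that has been idle for more than one minute:

from pymongo import MongoClient

# Maintain at least 10 connections per server, and replace connections
# that have been idle for more than 60 seconds.
client = MongoClient(minPoolSize=10, maxIdleTimeMS=60000)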

The following default configuration for a MongoClient works for most applications:

client = MongoClient(host, port)

MongoClient supports multiple concurrent requests. Create one client per process and reuse it for all operations; this is more efficient than creating a new client for each request.

The driver does not limit the number of requests that can wait for sockets to become available; it is the application's responsibility to limit the size of its thread pool to bound queuing during a load spike. Requests wait for the amount of time specified in the waitQueueTimeoutMS option, which defaults to 0 (no limit).

If a request waits longer than waitQueueTimeoutMS for a socket, it raises a ConnectionFailure error. Use this option when it is more important to bound the duration of operations during a load spike than it is to complete every operation.
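
For example, the following sketch fails any request that waits more than 100 milliseconds for a socket:

from pymongo import MongoClient

# Raise ConnectionFailure for requests that wait more than 100 ms
# for a socket, instead of letting them queue indefinitely.
client = MongoClient(waitQueueTimeoutMS=100)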

When any request calls MongoClient.close(), the driver closes all idle sockets and closes in-use sockets as they are returned to the pool. Because only inactive sockets are closed immediately, you cannot interrupt or terminate ongoing operations by calling this method; the driver closes in-use sockets only when their operations complete.

For more information, see the Connection Pool Overview page in the MongoDB Server documentation.

Why Does PyMongo Add an _id Field to All My Documents?

When you use the Collection.insert_one() method, Collection.insert_many() method, or Collection.bulk_write() method to insert a document into MongoDB, and that document does not include an _id field, PyMongo automatically adds this field for you. It also sets the value of the field to an instance of ObjectId.

The following code example inserts a document without an _id field into MongoDB, then prints the document. After it's inserted, the document contains an _id field whose value is an instance of ObjectId.

>>> my_doc = {'x': 1}
>>> collection.insert_one(my_doc)
InsertOneResult(ObjectId('560db337fba522189f171720'), acknowledged=True)
>>> my_doc
{'x': 1, '_id': ObjectId('560db337fba522189f171720')}

PyMongo adds an _id field in this manner for a few reasons:

  • All MongoDB documents must have an _id field.

  • If PyMongo inserts a document without an _id field, MongoDB adds one itself, but doesn't report the value back to PyMongo for your application to use.

  • Copying the document before adding the _id field is prohibitively expensive for most high-write-volume applications.

Tip

If you don't want PyMongo to add an _id field to your documents, add the _id field to each document yourself before inserting it.

How Do I Change the Timeout Value for Cursors?

MongoDB doesn't support custom timeouts for cursors, but you can turn off cursor timeouts. To do so, pass the no_cursor_timeout=True option to the find() method.
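
The following sketch shows this option, assuming an existing collection object. Because the server won't time out the cursor, close it explicitly when you're done:

cursor = collection.find({}, no_cursor_timeout=True)
try:
    for doc in cursor:
        print(doc)
finally:
    # The server never times out this cursor, so release it explicitly.
    cursor.close()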

How Can I Store Decimal Instances?

MongoDB v3.4 introduced the Decimal128 BSON type, a 128-bit decimal-based floating-point value capable of emulating decimal rounding with exact precision. PyMongo versions 3.4 and later also support this type. Earlier MongoDB versions, however, support only IEEE 754 floating points, equivalent to the Python float type. PyMongo can store Decimal instances to these versions of MongoDB only by converting them to the float type. You must perform this conversion explicitly.

For more information, see the PyMongo API documentation for decimal128.
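
The following sketch shows both approaches, assuming an existing collection object:

from decimal import Decimal

from bson.decimal128 import Decimal128

price = Decimal("9.99")

# On MongoDB 3.4 or later, store the value losslessly as Decimal128.
collection.insert_one({"price": Decimal128(price)})

# Read the value back and convert it to a Python Decimal.
doc = collection.find_one()
assert doc["price"].to_decimal() == price

# On earlier server versions, convert to float explicitly before inserting.
collection.insert_one({"price": float(price)})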

Why Does PyMongo Convert 9.99 to 9.9900000000000002?

MongoDB represents 9.99 as an IEEE 754 floating-point value, which can't represent the value precisely. This is also true in some versions of Python. In this regard, PyMongo behaves the same way as the JavaScript shell, all other MongoDB drivers, and the Python language itself.

Does PyMongo Support Attribute-style Access for Documents?

No. PyMongo doesn't implement this feature, for the following reasons:

  1. Adding attributes pollutes the attribute namespace for documents and could lead to subtle bugs or confusing errors when using a key with the same name as a dictionary method.

  2. PyMongo uses SON objects instead of regular dictionaries only to maintain key ordering, because the server requires this for certain operations. Adding this feature would complicate the SON class and could break backwards compatibility if PyMongo ever reverts to using dictionaries.

  3. Documents behave just like dictionaries, which makes them relatively simple for new PyMongo users to understand. Changing the behavior of documents adds a barrier to entry for these users.

For more information, see the relevant Jira case.

Does PyMongo Support Asynchronous Frameworks?

Yes. For more information, see the Third-Party Tools guide.

Does PyMongo Work with mod_wsgi?

Yes. See mod_wsgi in the Tools guide.

Does PyMongo Work with PythonAnywhere?

No. PyMongo creates Python threads, which PythonAnywhere does not support.

For more information, see the relevant Jira ticket.

How Can I Encode My Documents to JSON?

PyMongo supports some special types, like ObjectId and DBRef, that aren't supported in JSON. Therefore, Python's json module won't work with all documents in PyMongo. Instead, PyMongo includes the json_util module, a tool for using Python's json module with BSON documents and MongoDB Extended JSON.
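
The following sketch round-trips a document containing an ObjectId through MongoDB Extended JSON:

from bson import ObjectId
from bson.json_util import dumps, loads

doc = {"_id": ObjectId(), "x": 1}

# dumps() produces MongoDB Extended JSON, which can represent ObjectId.
json_str = dumps(doc)

# loads() restores the original BSON types.
assert loads(json_str) == doc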

python-bsonjs is another BSON-to-MongoDB-Extended-JSON converter, built on top of libbson. python-bsonjs doesn't depend on PyMongo and might offer a performance improvement over json_util in certain cases.

Tip

python-bsonjs works best with PyMongo when using the RawBSONDocument type.

Does PyMongo Behave Differently in Python 3?

PyMongo encodes instances of the bytes class as BSON type 5 (binary data) with subtype 0. In Python 2, these instances are decoded to Binary with subtype 0. In Python 3, they are decoded back to bytes.

The following code examples use PyMongo to insert a bytes instance into MongoDB, and then find the instance. In Python 2, the byte string is decoded to Binary. In Python 3, the byte string is decoded back to bytes.
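
For example, the following sketch, run under Python 3 and assuming an existing collection object, inserts a bytes instance and reads it back:

>>> collection.insert_one({'data': b'foo'})
InsertOneResult(ObjectId('...'), acknowledged=True)
>>> collection.find_one()['data']
b'foo'

Under Python 2, the same query returns Binary('foo', 0).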

Similarly, Python 2 and 3 behave differently when PyMongo parses JSON binary values with subtype 0. In Python 2, these values are decoded to instances of Binary with subtype 0. In Python 3, they're decoded into instances of bytes.

The following code examples use the json_util module to decode a JSON binary value with subtype 0. In Python 2, the byte string is decoded to Binary. In Python 3, the byte string is decoded back to bytes.
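
For example, the following sketch, run under Python 3, decodes a binary value whose base64 payload "Zm9v" encodes the bytes b'foo':

>>> from bson.json_util import loads
>>> loads('{"b": {"$binary": "Zm9v", "$type": "00"}}')['b']
b'foo'

Under Python 2, the same call returns Binary('foo', 0).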

Can I Share Pickled ObjectIds Between Python 2 and Python 3?

If you use Python 2 to pickle an instance of ObjectId, you can always unpickle it with Python 3. To do so, you must pass the encoding='latin-1' option to the pickle.loads() method. The following code example shows how to pickle an ObjectId in Python 2.7, and then unpickle it in Python 3.7:

# Python 2.7
>>> import pickle
>>> from bson.objectid import ObjectId
>>> oid = ObjectId()
>>> oid
ObjectId('4f919ba2fba5225b84000000')
>>> pickle.dumps(oid)
'ccopy_reg\n_reconstructor\np0\n(cbson.objectid\...'
# Python 3.7
>>> import pickle
>>> pickle.loads(b'ccopy_reg\n_reconstructor\np0\n(cbson.objectid\...', encoding='latin-1')
ObjectId('4f919ba2fba5225b84000000')

If you pickle an ObjectId in Python 3 and want to unpickle it in Python 2, you must pass the protocol argument with a value of 2 or less to the pickle.dumps() method. The following code example shows how to pickle an ObjectId in Python 3.7, and then unpickle it in Python 2.7:

# Python 3.7
>>> import pickle
>>> from bson.objectid import ObjectId
>>> oid = ObjectId()
>>> oid
ObjectId('4f96f20c430ee6bd06000000')
>>> pickle.dumps(oid, protocol=2)
b'\x80\x02cbson.objectid\nObjectId\nq\x00)\x81q\x01c_codecs\nencode\...'
# Python 2.7
>>> import pickle
>>> pickle.loads('\x80\x02cbson.objectid\nObjectId\nq\x00)\x81q\x01c_codecs\nencode\...')
ObjectId('4f96f20c430ee6bd06000000')