Databases and Collections

In this guide, you can learn how to use MongoDB databases and collections with PyMongo.

MongoDB organizes data into a hierarchy of the following levels:

  • Databases: The top level of data organization in a MongoDB instance.

  • Collections: MongoDB stores documents in collections. They are analogous to tables in relational databases.

  • Documents: Contain literal data such as strings, numbers, dates, and other embedded documents.

For more information about document field types and structure, see the Documents guide in the MongoDB Server manual.

Access a database by using dictionary-style access on your MongoClient instance.

The following example accesses a database named "test_database":

database = client["test_database"]

Access a collection by using dictionary-style access on an instance of your database.

The following example accesses a collection named "test_collection":

database = client["test_database"]
collection = database["test_collection"]

Tip

If the provided collection name does not already exist in the database, MongoDB implicitly creates the collection when you first insert data into it.

Use the create_collection() method to explicitly create a collection in a MongoDB database.

The following example creates a collection called "example_collection":

database = client["test_database"]
database.create_collection("example_collection")

You can specify collection options, such as maximum size and document validation rules, by passing them in as keyword arguments. For a full list of optional parameters, see the create_collection() API documentation.
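As a minimal sketch of passing options as keyword arguments, the helper below creates a capped collection. The helper name `create_capped_collection` and the default limits are illustrative assumptions, not part of PyMongo. `capped`, `size`, and `max` are options of the server's create command, which `create_collection()` forwards.

```python
def create_capped_collection(database, name, size_bytes=1024 * 1024, max_docs=1000):
    """Create a capped collection (hypothetical helper, not a PyMongo API).

    `database` is assumed to be a pymongo Database, for example
    client["test_database"]. The options are passed as keyword arguments.
    """
    return database.create_collection(
        name,
        capped=True,       # fixed-size collection
        size=size_bytes,   # maximum size of the collection in bytes
        max=max_docs,      # maximum number of documents
    )
```

For example, `create_capped_collection(client["test_database"], "example_capped")` would request a capped collection limited to 1 MiB or 1000 documents.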

You can query for a list of collections in a database by calling the list_collections() method. The method returns a cursor containing all collections in the database and their associated metadata.

The following example calls the list_collections() method and iterates over the cursor to print the results:

collection_list = database.list_collections()
for c in collection_list:
    print(c)

To query for only the names of the collections in the database, call the list_collection_names() method, which returns a list of names:

collection_list = database.list_collection_names()
for c in collection_list:
    print(c)

For more information about iterating over a cursor, see Access Data From a Cursor.
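Explicit creation and listing can be combined. The sketch below creates a collection only when its name is not already listed, which avoids the CollectionInvalid error that create_collection() raises for an existing collection. The helper name `ensure_collection` is a hypothetical illustration, and the check is not atomic, so another client could still create the collection between the two calls.

```python
def ensure_collection(database, name, **options):
    """Return the named collection, creating it first if it is missing.

    Hypothetical helper: `database` is assumed to be a pymongo Database.
    Extra keyword arguments are forwarded to create_collection().
    """
    if name in database.list_collection_names():
        # The collection already exists, so use dictionary-style access.
        return database[name]
    return database.create_collection(name, **options)
```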

You can delete a collection from the database by calling the drop() method on the collection, or by passing the collection name to the database's drop_collection() method.

The following example deletes the test_collection collection by calling drop():

collection = database["test_collection"]
collection.drop()

Warning

Dropping a Collection Deletes All Data in the Collection

Dropping a collection from your database permanently deletes all documents and all indexes within that collection.

Drop a collection only if the data in it is no longer needed.

You can control how the driver routes read operations by setting a read preference. You can also control options for how the driver waits for acknowledgment of read and write operations on a replica set by setting a read concern and a write concern.

By default, databases inherit these settings from the MongoClient instance, and collections inherit them from the database. However, you can change these settings on your database or collection by using one of the following methods:

  • get_database(): Gets the database and, by default, applies the client's read preference, read concern, and write concern.

  • database.with_options(): Gets a copy of the database and, by default, applies its current read preference, read concern, and write concern.

  • get_collection(): Gets the collection and, by default, applies the database's read preference, read concern, and write concern.

  • collection.with_options(): Gets a copy of the collection and, by default, applies its current read preference, read concern, and write concern.

To change read or write settings with the preceding methods, call the method and pass in the new read preference, read concern, or write concern. The get_database() and get_collection() methods also take the database or collection name as their first argument.

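The following example shows how to change the read preference, read concern, and write concern of a database called test-database with the get_database() method:

from pymongo import ReadPreference
from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

database = client.get_database("test-database",
                               read_preference=ReadPreference.SECONDARY,
                               read_concern=ReadConcern("local"),
                               write_concern=WriteConcern(w="majority"))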

The following example shows how to change read and write settings of a collection called test-collection with the get_collection() method:

collection = database.get_collection("test-collection",
                                     read_preference=ReadPreference.SECONDARY,
                                     read_concern=ReadConcern("local"),
                                     write_concern=WriteConcern(w="majority"))

The following example shows how to change read and write settings of a collection called test-collection with the with_options() method:

collection = collection.with_options(read_preference=ReadPreference.SECONDARY,
                                     read_concern=ReadConcern("local"),
                                     write_concern=WriteConcern(w="majority"))

Tip

To see the types of read preferences available in the ReadPreference enum, see the API documentation.

To learn more about the read and write settings, see the Read Preference, Read Concern, and Write Concern guides in the MongoDB Server manual.

In MongoDB Server, you can apply key-value tags to replica-set members according to any criteria you choose. You can then use those tags to target one or more members for a read operation.

By default, PyMongo ignores tags when choosing a member to read from. To instruct PyMongo to prefer certain tags, pass them as a parameter to your read preference class constructor.

In the following code example, the tag set passed to the read_preference parameter instructs PyMongo to prefer reads from the New York data center ('dc': 'ny') and to fall back to the San Francisco data center ('dc': 'sf'):

from pymongo.read_preferences import Secondary

db = client.get_database(
    'test', read_preference=Secondary([{'dc': 'ny'}, {'dc': 'sf'}]))

If multiple replica-set members match the read preference and tag sets you specify, PyMongo reads from the nearest replica-set members, chosen according to their ping time.

By default, the driver uses only those members whose ping times are within 15 milliseconds of the nearest member for queries. To distribute reads between members with higher latencies, pass the localThresholdMS option to the MongoClient() constructor.

The following example specifies a local threshold of 35 milliseconds:

client = MongoClient(replicaSet='repl0',
                     readPreference='secondaryPreferred',
                     localThresholdMS=35)

In the preceding example, PyMongo distributes reads between matching members within 35 milliseconds of the closest member's ping time.
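The effect of the threshold can be illustrated with a simplified model. The sketch below is not PyMongo's implementation. The function name, host names, and ping times are assumptions made for illustration, and the model treats the threshold as inclusive.

```python
def members_within_threshold(ping_times_ms, local_threshold_ms=15):
    """Return the hosts whose ping time is within the threshold of the nearest host.

    `ping_times_ms` maps a host name to its ping time in milliseconds.
    Simplified model for illustration only.
    """
    if not ping_times_ms:
        return []
    nearest = min(ping_times_ms.values())
    return sorted(
        host
        for host, ping in ping_times_ms.items()
        if ping - nearest <= local_threshold_ms
    )


# Hypothetical ping times for four matching members.
pings = {"a:27017": 5, "b:27017": 18, "c:27017": 38, "d:27017": 60}

default_window = members_within_threshold(pings)    # default 15 ms window
wider_window = members_within_threshold(pings, 35)  # 35 ms window
```

With these sample values, the default window contains only hosts a and b, while the 35-millisecond window also admits host c.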

Note

PyMongo ignores the value of localThresholdMS when communicating with a replica set through a mongos instance. In this case, use the localThreshold command-line option.

You receive an AutoReconnect error if you specify tag sets in your read preference and MongoDB is unable to find replica set members with the specified tags. To avoid this error, include an empty dictionary ({}) at the end of the tag-set list. This instructs PyMongo to read from any member that matches the read preference mode when it can't find matching tags.
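The fallback behavior of a trailing empty dictionary can be shown with a simplified model of tag-set matching. This is a sketch for illustration, not PyMongo's implementation. The function name and the member tags are assumptions.

```python
def select_by_tag_sets(members, tag_sets):
    """Return the members that match the first tag set with any match.

    `members` maps a member name to its tags. Tag sets are tried in order,
    and an empty tag set ({}) matches every member. Simplified model only.
    """
    for tag_set in tag_sets:
        matches = [
            name
            for name, tags in members.items()
            if all(tags.get(key) == value for key, value in tag_set.items())
        ]
        if matches:
            return matches
    return []


# Hypothetical replica set in which no member is tagged 'dc': 'ny'.
members = {"m1": {"dc": "sf"}, "m2": {"dc": "sf", "rack": "2"}}

strict = select_by_tag_sets(members, [{"dc": "ny"}])             # no match
with_fallback = select_by_tag_sets(members, [{"dc": "ny"}, {}])  # any member
```

In PyMongo terms, the second call corresponds to a read preference such as Secondary([{'dc': 'ny'}, {}]).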

To learn more about any of the methods or types discussed in this guide, see the PyMongo API documentation.
