Databases and Collections
Overview
In this guide, you can learn how to use MongoDB databases and collections with PyMongo.
MongoDB organizes data into a hierarchy of the following levels:
Databases: The top level of data organization in a MongoDB instance.
Collections: MongoDB stores documents in collections. They are analogous to tables in relational databases.
Documents: Contain literal data such as strings, numbers, dates, and other embedded documents.
For more information about document field types and structure, see the Documents guide in the MongoDB Server manual.
Access a Database
Access a database by using dictionary-style access on your MongoClient instance.
The following example accesses a database named test_database:
database = client["test_database"]
Access a Collection
Access a collection by using dictionary-style access on an instance of your database.
The following example accesses a collection named test_collection:
database = client["test_database"]
collection = database["test_collection"]
Tip
If the provided collection name does not already exist in the database, MongoDB implicitly creates the collection when you first insert data into it.
Create a Collection
Use the create_collection() method to explicitly create a collection in a MongoDB database.
The following example creates a collection called example_collection:
database = client["test_database"]
database.create_collection("example_collection")
You can specify collection options, such as maximum size and document validation rules, by passing them in as keyword arguments. For a full list of optional parameters, see the create_collection() API documentation.
Time Series Collection
Time series collections efficiently store sequences of measurements over a period of time.
The following example creates a time series collection called example_ts_collection in which the documents' time field is called timestamp:
database = client["test_database"]
database.create_collection("example_ts_collection", timeseries={"timeField": "timestamp"})
For more information about using time series data with PyMongo, see the Time Series Data guide.
Capped Collection
You can create a capped collection that cannot grow beyond a specified size in bytes or document count. The following example creates a capped collection called example_capped_collection that has a maximum size of 1000 bytes:
database = client["test_database"]
database.create_collection("example_capped_collection", capped=True, size=1000)
To learn more about capped collections, see Capped Collections in the MongoDB Server manual.
Collation
When you create a collection, you can specify a default collation for all operations you perform on the collection.
A collation is a set of language-specific rules for string comparison, such as for letter case and accent marks.
To specify a collation, create an instance of the Collation class or a Python dictionary.
For a list of options to pass to the Collation constructor or include as keys in the dictionary, see Collation in the MongoDB Server manual.
Tip
Import Collation
To create an instance of the Collation class, you must import it from pymongo.collation.
The following example creates a collection called example_collection, as in the Create a Collection section, but with a default collation of fr_CA:
from pymongo.collation import Collation

database = client["test_database"]
database.create_collection("example_collection", collation=Collation(locale="fr_CA"))
Get a List of Collections
You can query for a list of collections in a database by calling the list_collections() method. The method returns a cursor containing all collections in the database and their associated metadata.
The following example calls the list_collections() method and iterates over the cursor to print the results:
collection_list = database.list_collections()
for c in collection_list:
    print(c)
To query for only the names of the collections in the database, call the list_collection_names() method as follows:
collection_list = database.list_collection_names()
for c in collection_list:
    print(c)
For more information about iterating over a cursor, see Access Data From a Cursor.
Delete a Collection
You can delete a collection from the database by calling the drop() method on the collection, or by passing the collection name to the database's drop_collection() method.
The following example deletes the test_collection collection by calling drop():
collection = database["test_collection"]
collection.drop()
Warning
Dropping a Collection Deletes All Data in the Collection
Dropping a collection from your database permanently deletes all documents and all indexes within that collection.
Drop a collection only if the data in it is no longer needed.
Configure Read and Write Operations
You can control how the driver routes read operations by setting a read preference. You can also control options for how the driver waits for acknowledgment of read and write operations on a replica set by setting a read concern and a write concern.
By default, databases inherit these settings from the MongoClient instance, and collections inherit them from the database. However, you can change these settings on your database or collection by using one of the following methods:
get_database(): Gets the database and applies the read preference, read concern, and write concern you specify. Any setting you omit is inherited from the client.
database.with_options(): Returns a copy of the database with the settings you specify. Any setting you omit keeps the database's current value.
get_collection(): Gets the collection and applies the settings you specify. Any setting you omit is inherited from the database.
collection.with_options(): Returns a copy of the collection with the settings you specify. Any setting you omit keeps the collection's current value.
To change read or write settings with the preceding methods, call the method and pass in the new read preference, read concern, or write concern. The get_database() and get_collection() methods also take the database or collection name as their first argument.
The following example shows how to change the read preference, read concern, and write concern of a database called test-database with the get_database() method. The read_concern and write_concern parameters expect ReadConcern and WriteConcern instances, not strings:
from pymongo import ReadPreference
from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

client.get_database(
    "test-database",
    read_preference=ReadPreference.SECONDARY,
    read_concern=ReadConcern("local"),
    write_concern=WriteConcern(w="majority"),
)
The following example shows how to change read and write settings of a collection called test-collection with the get_collection() method:
database.get_collection(
    "test-collection",
    read_preference=ReadPreference.SECONDARY,
    read_concern=ReadConcern("local"),
    write_concern=WriteConcern(w="majority"),
)
The following example shows how to change read and write settings of a collection called test-collection with the with_options() method:
collection.with_options(
    read_preference=ReadPreference.SECONDARY,
    read_concern=ReadConcern("local"),
    write_concern=WriteConcern(w="majority"),
)
Tip
To see the types of read preferences available in the ReadPreference enum, see the API documentation.
To learn more about the read and write settings, see the Read Preference, Read Concern, and Write Concern guides in the MongoDB Server manual.
Tag Sets
In MongoDB Server, you can apply key-value tags to replica-set members according to any criteria you choose. You can then use those tags to target one or more members for a read operation.
By default, PyMongo ignores tags when choosing a member to read from. To instruct PyMongo to prefer certain tags, pass them as a parameter to your read preference class constructor.
In the following code example, the tag set passed to the read_preference parameter instructs PyMongo to prefer reads from the New York data center ('dc': 'ny') and to fall back to the San Francisco data center ('dc': 'sf'):
from pymongo.read_preferences import Secondary

db = client.get_database('test', read_preference=Secondary([{'dc': 'ny'}, {'dc': 'sf'}]))
Local Threshold
If multiple replica-set members match the read preference and tag sets you specify, PyMongo reads from the nearest replica-set members, chosen according to their ping time.
By default, the driver uses only those members whose ping times are within 15 milliseconds of the nearest member for queries. To distribute reads between members with higher latencies, pass the localThresholdMS option to the MongoClient() constructor.
The following example specifies a local threshold of 35 milliseconds. When you pass readPreference to the constructor, specify the mode as a string:
client = MongoClient(replicaSet='repl0', readPreference='secondaryPreferred', localThresholdMS=35)
In the preceding example, PyMongo distributes reads between matching members within 35 milliseconds of the closest member's ping time.
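To make the selection rule concrete, the following pure-Python sketch applies it to a set of made-up ping times. It only mirrors the rule as described in this section and is not PyMongo's internal implementation; the function name and host names are hypothetical.

```python
def members_in_latency_window(ping_times_ms, local_threshold_ms=15):
    """Return members whose ping time is within the threshold of the nearest.

    Illustrative only; this is not PyMongo's internal implementation.
    """
    nearest = min(ping_times_ms.values())
    return sorted(
        name
        for name, ping in ping_times_ms.items()
        if ping - nearest <= local_threshold_ms
    )


# Hypothetical ping times, in milliseconds, for three matching members.
pings = {"host-a": 10, "host-b": 22, "host-c": 41}

print(members_in_latency_window(pings))
# ['host-a', 'host-b']
print(members_in_latency_window(pings, local_threshold_ms=35))
# ['host-a', 'host-b', 'host-c']
```

With the default threshold, host-c is excluded because it is 31 milliseconds slower than the nearest member. Raising the threshold to 35 milliseconds makes all three members eligible.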
Note
PyMongo ignores the value of localThresholdMS when communicating with a replica set through a mongos instance. In this case, use the localThreshold command-line option.
Retryable Reads and Writes
PyMongo automatically retries certain read and write operations a single time if they fail due to a network or server error.
You can explicitly disable retryable reads or retryable writes by setting the retryReads or retryWrites option to False in the MongoClient() constructor. The following example disables retryable reads and writes for a client:
client = MongoClient("<connection string>", retryReads=False, retryWrites=False)
To learn more about supported retryable read operations, see Retryable Reads in the MongoDB Server manual. To learn more about supported retryable write operations, see Retryable Writes in the MongoDB Server manual.
Type Hints
If your application uses Python 3.5 or later, you can add type hints, as described in PEP 484, to your code. Type hints denote the data types of variables, parameters, and function return values, and the structure of documents. Some IDEs can use type hints to check your code for type errors and suggest appropriate options for code completion.
Note
TypedDict in Python 3.7 and Earlier
The TypedDict class is part of the typing module only in Python 3.8 and later. To use the TypedDict class in earlier versions of Python, install the typing_extensions package.
Database
If all documents in a database match a well-defined schema, you can specify a type hint that uses a Python class to represent the documents' structure. By including this class in the type hint for your Database object, you can ensure that all documents you store or retrieve have the required structure. This provides more accurate type checking and code completion than the default Dict[str, Any] type.
First, define a class to represent a document from the database. The class must inherit from the TypedDict class and must contain the same fields as the documents in the database. After you define your class, include its name as the generic type for the Database type hint.
The following example defines a Movie class and uses it as the generic type for a Database type hint:
from typing import TypedDict
from pymongo import MongoClient
from pymongo.database import Database

class Movie(TypedDict):
    name: str
    year: int

client: MongoClient = MongoClient()
database: Database[Movie] = client["test_database"]
Collection
Adding a generic type to a Collection type hint is similar to adding a generic type to a Database type hint. First, define a class that inherits from the TypedDict class and represents the structure of the documents in the collection. Then, include the class name as the generic type for the Collection type hint, as shown in the following example:
from typing import TypedDict
from pymongo import MongoClient
from pymongo.collection import Collection

class Movie(TypedDict):
    name: str
    year: int

client: MongoClient = MongoClient()
database = client["test_database"]
collection: Collection[Movie] = database["test_collection"]
Troubleshooting
Client Type Annotations
If you don't add a type annotation for your MongoClient object, your type checker might show an error similar to the following:
from pymongo import MongoClient
client = MongoClient()  # error: Need type annotation for "client"
The solution is to annotate the MongoClient object as client: MongoClient or client: MongoClient[Dict[str, Any]].
Incompatible Type
If you specify MongoClient as a type hint but don't include data types for the document, keys, and values, your type checker might show an error similar to the following:
error: Dict entry 0 has incompatible type "str": "int"; expected "Mapping[str, Any]": "int"
The solution is to add the following type hint to your MongoClient object:
client: MongoClient[Dict[str, Any]]
AutoReconnect Error
You receive this error if you specify tag sets in your read preference and MongoDB is unable to find replica set members with the specified tags. To avoid this error, include an empty dictionary ({}) at the end of the tag-set list. This instructs PyMongo to read from any member that matches the read-preference mode when it can't find matching tags.
API Documentation
To learn more about any of the methods or types discussed in this guide, see the following API documentation: