Today's web applications rely on massive volumes of structured and unstructured data. That data is complex and constantly changing. To work with this data more effectively, and to gain the scalability and flexibility that modern applications require, you may choose to migrate your data from an older, monolithic relational database such as PostgreSQL to a modern, general-purpose database such as MongoDB.
While the idea of migrating from PostgreSQL to MongoDB may seem daunting, the process can be relatively straightforward, especially if you follow the steps we cover in this article. Once the migration is complete, you'll be able to enjoy MongoDB's many advantages, including its native drivers, flexible schema and query language, built-in autoscaling, and much more.
MongoDB is a scalable and flexible NoSQL document database. Rather than storing data in tables, MongoDB stores its data in collections of BSON (Binary JSON) documents. MongoDB can manage structured and unstructured data. That allows you to build your application without needing to first define the schema.
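For example, here is a minimal sketch in the MongoDB shell, using a hypothetical users collection (field names borrowed from the example later in this article); no schema needs to be defined before inserting:
// Insert a document with nested fields; no table or schema definition required first
db.users.insertOne({
  firstName: "Bob",
  lastName: "Smith",
  address: { city: "NYC", state: "NY" }
})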
PostgreSQL is a relational database management system (RDBMS). In an RDBMS, data is stored in tables, and the schema of the database must be defined on creation. While it is possible to alter tables after their creation as an application's needs change, this process can be complicated and error-prone.
For more detailed information, check out this article comparing MongoDB and PostgreSQL.
If your data model and schema evolve regularly, such as in an agile environment, MongoDB, with its flexible schema, is an excellent choice. With MongoDB, you can change the structure of documents on the fly without searching through application code to update queries and table references. In addition, MongoDB Atlas (the cloud offering of MongoDB as a DBaaS) lets you quickly start and scale your MongoDB clusters. MongoDB Atlas offers a free tier, which is a great way to experiment and learn.
On the other hand, if you have a stable, relational schema that does not change over time, an RDBMS like PostgreSQL might be sufficient for your application needs. Your application might already be tightly coupled to SQL-based clients and frameworks, in which case it might make sense to hold off on a migration and consider redesigning the application as a long-term solution.
While data models and schema flexibility are a consideration, scalability is also important. MongoDB is designed for fast querying and seamless scaling, and it scales both vertically and horizontally with relative ease.
Improving performance by increasing the power (CPU, memory, etc.) of a database server—called vertical scaling—is possible on both MongoDB and PostgreSQL. However, vertical scaling has physical limitations and can quickly become expensive.
Horizontal scaling—adding new nodes to a database cluster to share the workload—is often a better choice. However, horizontal scaling is very difficult with relational databases, due to the difficulty in spreading out related data across nodes. With MongoDB, horizontal scaling is easier as collections are self-contained and not coupled relationally. Scaling can even be reconfigured online, without downtime, thanks to online resharding.
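As an illustrative sketch (the database name and shard keys here are hypothetical, not from this article's examples), sharding a collection and resharding it online look like this in the MongoDB shell:
// Distribute a collection across shards using a hashed shard key
sh.shardCollection("mydb.users", { userId: "hashed" })

// Later, change the shard key online without downtime (MongoDB 5.0+)
db.adminCommand({ reshardCollection: "mydb.users", key: { "address.state": 1, userId: 1 } })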
While switching from PostgreSQL to MongoDB is not difficult, the process often involves more than just extracting and migrating data. You'll also need to examine the details of the applications that access your database. For example, if you're using an ORM that does not support both relational and document databases, you’ll need to find a new library that can connect to MongoDB.
If you're just diving into document databases for the first time, MongoDB provides excellent documentation on SQL to MongoDB Mappings.
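As a quick, hedged illustration of that mapping (reusing field names from the users example later in this article):
// SQL: SELECT * FROM users WHERE state = 'NY'
db.users.find({ "address.state": "NY" })

// SQL: SELECT first_name, last_name FROM users WHERE logins > 5
db.users.find({ loginCount: { $gt: 5 } }, { firstName: 1, lastName: 1 })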
Once you've considered any changes needed to your application, the next step is to migrate the data. The migration for some of your tables might be simple. For a complex relational schema, Relational Migrator can help you map it to a MongoDB schema design in a visual and intuitive manner. However, you might want to restructure your data to fit better within a MongoDB schema design. In that case, you should become familiar with best practices for MongoDB schema design, including anti-patterns. To keep the following example simple, we'll only consider migrating data with a one-to-one mapping.
The process for transferring data from PostgreSQL to MongoDB is clear-cut. Ultimately, the ease of your task depends on the complexity of the PostgreSQL database and the structure of document collections needed in the new MongoDB database.
To migrate data, you'll extract it from PostgreSQL and then import it into MongoDB using the mongoimport tool. Let's look at the two different ways to extract the data: returning queries as tab-separated values (TSV) or as JSON.
The TSV format is the quick and easy option. However, it only works if the original PostgreSQL schema is relatively simple and you don't need to embed documents in other documents with a one-to-many relationship.
For example, if you need to create a collection of documents that represent customers and plan on embedding each customer's order in these documents, TSV won’t work because it doesn’t support hierarchical data structures. Each row in the TSV will become a document. You can still map values in each row to fields deeper in your documents; you just can’t embed documents.
You could create an address field with nested state and city fields, as in the example below. However, you could not store multiple address entities. Let's look at an example query to see how this works.
Consider the PostgreSQL table “users”:
postgres=# select * from users;
userid | first_name | last_name | logins | upgraded | city | state
--------+------------+-----------+--------+----------+------+-------
1 | Bob | Smith | 10 | t | NYC | NY
We can export this table with the following COPY query:
COPY (SELECT
userid AS "userId.int32()",
logins AS "loginCount.int32()",
first_name AS "firstName.auto()",
last_name AS "lastName.auto()",
CASE WHEN upgraded=TRUE THEN 'TRUE' ELSE 'FALSE' END AS "upgradedAccount.boolean()",
city AS "address.city.auto()",
state AS "address.state.auto()"
FROM
users
) TO '/tmp/users.tsv' WITH (FORMAT CSV, HEADER TRUE, DELIMITER E'\t');
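If everything works as intended, the resulting /tmp/users.tsv should look roughly like this (columns are tab-separated):
userId.int32()	loginCount.int32()	firstName.auto()	lastName.auto()	upgradedAccount.boolean()	address.city.auto()	address.state.auto()
1	10	Bob	Smith	TRUE	NYC	NY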
Notice that we renamed each column we are exporting with the AS keyword. The format of the alias is mongoFieldName.type(), for example, userId.int32(). When we execute the import, mongoimport will parse this header and create the fields with the correct types. On most of the columns, we use the auto() type and let mongoimport determine the type based on context.
For the upgraded column, which is a boolean, PostgreSQL will return a t for true and an f for false. These default values won't be recognized by MongoDB, so we use a CASE statement to set values that will work.
You can also do some limited data nesting using the TSV migration process; the city and state fields are an example.
The final line in the query formats the export as CSV with a header, using a tab delimiter, which effectively makes the file TSV. We use this command to import the file into MongoDB Atlas:
mongoimport --uri mongodb+srv://<mongodb_user>:<mongodb_password>@<atlas-cluster>.mongodb.net/<DATABASE>
--collection users --type tsv --file users.tsv --headerline --columnsHaveTypes
This will result in the following document in the “users” collection:
{
  userId: 1,
  loginCount: 10,
  firstName: "Bob",
  lastName: "Smith",
  upgradedAccount: true,
  address: {
    city: "NYC",
    state: "NY"
  }
}
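To sanity-check the import, you can query the nested fields directly in the MongoDB shell using dot notation:
// Find the imported user by a nested field
db.users.findOne({ "address.state": "NY" })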
Using JSON for data migration is preferable if your PostgreSQL schema is complex and you want to nest arrays of related records inside a MongoDB document.
To return the results of a PostgreSQL query as JSON, you will need three functions:
- row_to_json: Returns a row as a JSON object, with column names as keys and the values from the row
- array_to_json: Returns an array of data as a JSON array
- array_agg: Accepts a set of values and returns an array where each value in the set becomes an element in the array

Let's look at an orders table which, in our relational schema, keeps a record for every product ordered by the user:
id | userid | product | quantity | price
----+--------+---------+----------+-------
1 | 1 | shoes | 4 | 50.75
2 | 1 | razer | 20 | 1.75
Here is an example query using all three functions:
COPY (SELECT row_to_json(results)
FROM (
SELECT userid,first_name, last_name,
(
SELECT array_to_json(array_agg(o))
FROM (
SELECT id, product, quantity, price
FROM orders
WHERE orders.userid = users.userid
) o
) AS orders
FROM users
) results) TO '/tmp/orders.json' WITH (FORMAT text, HEADER FALSE);
The query above will create a file, orders.json, with one JSON document per user from the users table:
{
  "userid": 1,
  "first_name": "Bob",
  "last_name": "Smith",
  "orders": [
    {
      "id": 1,
      "product": "shoes",
      "quantity": 4,
      "price": 50.75
    },
    {
      "id": 2,
      "product": "razer",
      "quantity": 20,
      "price": 1.75
    }
  ]
}
Once you have written the query and saved it, you can use mongoimport to import the file. Because COPY writes one JSON document per line, mongoimport can read the file in its default mode, with no need for the --jsonArray flag:
mongoimport --uri mongodb+srv://<mongodb_user>:<mongodb_password>@<atlas-cluster>.mongodb.net/<DATABASE>
--collection orders --file orders.json
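As with the TSV import, a quick check in the MongoDB shell confirms the embedded orders array; dot notation matches fields inside the embedded documents in the array:
// Find users who ordered a given product, by reaching into the embedded array
db.orders.findOne({ "orders.product": "shoes" })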
A modern, document-based database such as MongoDB can be a great choice over an RDBMS like PostgreSQL. Migrating from PostgreSQL to MongoDB can be a simple process, as we saw in the steps covered by this article:
- Review your application for the changes needed to connect to MongoDB, such as driver and ORM support.
- Decide whether to keep a one-to-one mapping of your tables or restructure your data to fit a MongoDB schema design.
- Extract the data from PostgreSQL as TSV or JSON.
- Import the data into MongoDB with mongoimport (or as an alternative: use bulkWrite operations to load the data, as in the sketch below).
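A minimal, hypothetical sketch of that bulkWrite alternative in the MongoDB shell, assuming the documents have already been read into memory (the sample documents here are illustrative, not from the export above):
// Load pre-built documents in one batch instead of using mongoimport
db.users.bulkWrite([
  { insertOne: { document: { userId: 1, firstName: "Bob", lastName: "Smith" } } },
  { insertOne: { document: { userId: 2, firstName: "Alice", lastName: "Jones" } } }
], { ordered: false })  // unordered lets the server continue past individual errors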
Switching to MongoDB gives you benefits such as a more flexible schema and easier scalability. And while extracting and migrating a PostgreSQL database to MongoDB does take some thought and planning, the process itself is relatively pain-free.