Read from MongoDB in Streaming Mode
Overview
When reading a stream from a MongoDB database, the MongoDB Spark Connector supports both micro-batch processing and continuous processing. Micro-batch processing, the default processing engine, achieves end-to-end latencies as low as 100 milliseconds with exactly-once fault-tolerance guarantees. Continuous processing is an experimental feature introduced in Spark version 2.3 that achieves end-to-end latencies as low as 1 millisecond with at-least-once guarantees.
To learn more about continuous processing, see the Spark documentation.
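You choose between the two engines with the trigger you set when starting the streaming query. The following is a minimal sketch of that choice in PySpark; it uses Spark's built-in rate source purely for illustration (an assumption, not part of the connector example), and the same writeStream settings apply to a DataFrame read from MongoDB.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("triggerModes").getOrCreate()

# A built-in test source, used here only to illustrate trigger selection.
streamingDF = spark.readStream.format("rate").load()

query = (
    streamingDF.writeStream
    .format("console")
    .trigger(processingTime="1 second")   # micro-batch (default): one batch roughly per second
    # .trigger(continuous="1 second")     # continuous (experimental): lowest latency, at-least-once
    .start()
)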
Note
The connector reads from your MongoDB deployment's change stream. To generate change events on the change stream, perform update operations on your database.
To learn more about change streams, see Change Streams in the MongoDB manual.
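For example, an update issued from any client generates events that the connector can read. A minimal sketch using PyMongo, assuming a local deployment and a hypothetical spark.movies database and collection:

from pymongo import MongoClient

# Connect to the same deployment the connector streams from (assumed local here).
client = MongoClient("mongodb://localhost:27017")
collection = client["spark"]["movies"]

# Each update generates a change event on the collection's change stream.
collection.update_one({"title": "The Matrix"}, {"$set": {"year": 1999}})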
Example
The following example shows how to stream data from MongoDB to your console.
Important
Inferring the Schema of a Change Stream
If you set the change.stream.publish.full.document.only option to true, the Spark Connector infers the schema of a DataFrame by using the schema of the scanned documents. If you set the option to false, you must specify a schema.
For more information about this setting, and to see a full list of change stream configuration options, see the Read Configuration Options guide.
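A minimal PySpark sketch of such a pipeline follows. It assumes a Spark Connector 10.x artifact already on the Spark classpath, a local MongoDB deployment at mongodb://localhost:27017, and a hypothetical spark.movies database and collection; verify the option names against the configuration guides for your connector version.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streamFromMongoDB").getOrCreate()

# Read the collection's change stream. With full-document-only publishing
# enabled, the connector infers the DataFrame schema from scanned documents.
streamingDF = (
    spark.readStream
    .format("mongodb")
    .option("spark.mongodb.connection.uri", "mongodb://localhost:27017")
    .option("spark.mongodb.database", "spark")
    .option("spark.mongodb.collection", "movies")
    .option("change.stream.publish.full.document.only", "true")
    .load()
)

# Write each change event to the console; the checkpoint directory lets Spark
# recover the stream's position after a failure.
query = (
    streamingDF.writeStream
    .format("console")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/mongo-stream-checkpoint")
    .start()
)

query.awaitTermination()

With the query running, performing updates on the source collection from another client (for example, with the PyMongo snippet shown earlier) causes the corresponding change events to appear in the console output.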
API Documentation
To learn more about the types used in these examples, see the following Apache Spark API documentation: