Read from MongoDB in Streaming Mode

When reading a stream from a MongoDB database, the MongoDB Spark Connector supports both micro-batch processing and continuous processing. Micro-batch processing, the default processing engine, achieves end-to-end latencies as low as 100 milliseconds with exactly-once fault-tolerance guarantees. Continuous processing is an experimental feature introduced in Spark version 2.3 that achieves end-to-end latencies as low as 1 millisecond with at-least-once guarantees.

To learn more about continuous processing, see the Spark documentation.
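Which engine Spark uses is determined by the trigger that you set on the streaming query. The following is a minimal PySpark sketch of the two trigger styles; the streaming DataFrame name (streamingDF) is a placeholder for a DataFrame already loaded from MongoDB:

```python
# Micro-batch processing (the default engine): a processing-time trigger
# controls how often Spark starts a new micro-batch.
microBatchQuery = (
    streamingDF.writeStream
    .format("console")
    .outputMode("append")
    .trigger(processingTime="1 second")
    .start()
)

# Continuous processing (experimental): the interval controls how often
# Spark checkpoints the query's progress.
continuousQuery = (
    streamingDF.writeStream
    .format("console")
    .outputMode("append")
    .trigger(continuous="1 second")
    .start()
)
```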

Note

The connector reads from your MongoDB deployment's change stream. To generate change events on the change stream, perform update operations on your database.

To learn more about change streams, see Change Streams in the MongoDB manual.
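For example, the following is a minimal PyMongo sketch that performs an update to generate a change event on the collection the connector is watching. The connection string, database name, and collection name are placeholders:

```python
from pymongo import MongoClient

# Connect to the same deployment that the Spark Connector streams from.
client = MongoClient("mongodb://localhost:27017")
collection = client["sampleDB"]["sampleCollection"]

# Each write operation produces a change event on the collection's
# change stream, which the connector then reads.
collection.update_one(
    {"_id": 1},
    {"$set": {"status": "processed"}},
    upsert=True,
)
```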

The following example shows how to stream data from MongoDB to your console.

Important

Inferring the Schema of a Change Stream

If you set the change.stream.publish.full.document.only option to true, the Spark Connector infers the schema of a DataFrame by using the schema of the scanned documents. If you set the option to false, you must specify a schema.

For more information about this setting, and to see a full list of change stream configuration options, see the Read Configuration Options guide.
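A minimal PySpark sketch of this workflow reads the collection's change stream and writes each event to the console. The connection string, database name, and collection name are placeholders, the MongoDB Spark Connector package is assumed to be available to the session (for example through --packages when submitting the job), and the exact option keys can vary by connector version, so check the Read Configuration Options guide for your release:

```python
from pyspark.sql import SparkSession

# Create a session. The MongoDB Spark Connector must be on the classpath,
# for example by submitting the job with --packages.
spark = SparkSession.builder.appName("streamFromMongoDB").getOrCreate()

# Read the change stream of sampleDB.sampleCollection. With
# change.stream.publish.full.document.only set to true, the connector
# infers the DataFrame schema from the scanned documents.
streamingDF = (
    spark.readStream
    .format("mongodb")
    .option("spark.mongodb.connection.uri", "mongodb://localhost:27017")
    .option("spark.mongodb.database", "sampleDB")
    .option("spark.mongodb.collection", "sampleCollection")
    .option("change.stream.publish.full.document.only", "true")
    .load()
)

# Write each change event to the console. With no trigger set, Spark uses
# the default micro-batch engine; add .trigger(continuous="1 second") to
# opt in to continuous processing instead.
query = (
    streamingDF.writeStream
    .format("console")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```

If you set change.stream.publish.full.document.only to false, call schema() on the stream reader with an explicit schema before load().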

To learn more about the types used in these examples, see the Apache Spark API documentation.
