Hi everyone,
I’m trying to launch a Spark job locally that connects to my production Atlas cluster (M20). For testing purposes, I have opened the cluster’s IP access list to the entire internet (0.0.0.0/0).
It seems to connect correctly: when I create a DataFrame from a collection and call df.printSchema(), the collection schema is printed correctly on screen.
However, if I run other commands such as df.show(), I get this error from the MongoDB Spark connector library:
Py4JJavaError: An error occurred while calling o49.showString.
: java.lang.NoSuchMethodError: org.apache.spark.sql.types.StructType.toAttributes()Lscala/collection/immutable/Seq;
at com.mongodb.spark.sql.connector.schema.InternalRowToRowFunction.<init>
...
I’m using:
Spark version: 3.4.1
Scala version: 2.12
Jars passed to the Spark configuration:
jars = [
    "mongo-spark-connector_2.13-10.1.1.jar",
    "mongodb-driver-sync-4.10.0.jar",
    "mongodb-driver-core-4.10.0.jar",
    "bson-4.10.0.jar",
]
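Since the failure is a Scala-level NoSuchMethodError, a quick way to double-check which Spark/Scala build the local session actually runs on would be something like the following (the py4j call into scala.util.Properties is an assumption on my side; spark-submit --version prints the same information):

# Sanity check: which Spark/Scala versions is the session really on?
print(spark.version)  # prints: 3.4.1
# Assumption: reading the JVM-side Scala version through py4j
print(spark.sparkContext._jvm.scala.util.Properties.versionString())  # prints e.g.: version 2.12.17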
For full clarity and transparency, this is the complete code:
from pyspark.sql import SparkSession

# Jars to pass to the Spark configuration through the "spark.driver.extraClassPath" property
jars = [
    "mongo-spark-connector_2.13-10.1.1.jar",
    "mongodb-driver-sync-4.10.0.jar",
    "mongodb-driver-core-4.10.0.jar",
    "bson-4.10.0.jar",
]
jar_path = "/Users/matt/Downloads"

# Build a colon-separated classpath string from the jar list
mongo_jar = ":".join(jar_path + "/" + jar for jar in jars)

# Create a Spark session
uri = "mongodb+srv://<username>:<pwd>@<cluster_network>/<database>"
database = "maps"
collection = "users"

spark = SparkSession.builder \
    .appName("MongoDB Spark Connector") \
    .config("spark.driver.extraClassPath", mongo_jar) \
    .getOrCreate()

# Read data from MongoDB
df = spark.read.format("mongodb") \
    .option("connection.uri", uri) \
    .option("database", database) \
    .option("collection", collection) \
    .load()

# Print schema
df.printSchema()  # This correctly prints the schema

# Show rows
df.show()  # This throws the error above
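For reference, I believe an equivalent setup would let Spark resolve the connector and its driver dependencies from Maven instead of passing local jars; the coordinates below (including the _2.12 suffix matching my Scala version) are my assumption of the right artifact:

# Sketch: same session, resolving the connector via Maven instead of local jars
# (assuming org.mongodb.spark:mongo-spark-connector_2.12:10.1.1 is the artifact
# matching a Scala 2.12 build of Spark; the driver jars come in transitively)
spark = SparkSession.builder \
    .appName("MongoDB Spark Connector") \
    .config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.12:10.1.1") \
    .getOrCreate()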