Hi there, I want to read data from MongoDB Atlas into PySpark.
The read script is from this forum:
from Trademe_MongoDB.Credentials.Credentials import uri
from datetime import datetime
# from motor.motor_asyncio import AsyncIOMotorClient
from Trademe_MongoDB.logger_config import Logger_config
from pyspark.sql import SparkSession

logger = Logger_config().get_logger()

# Build the session with the MongoDB connector package and read URI
spark = SparkSession.\
    builder.\
    config("spark.executor.memory", "1g").\
    config("spark.mongodb.read.connection.uri", uri).\
    config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector:10.0.3").\
    getOrCreate()

# Load one collection into a DataFrame
df = spark.read.format("mongodb").option('database', '1').option('collection', '2').load()
It throws these errors:
The system cannot find the path specified.
Error: Missing application resource.
Traceback (most recent call last):
  File "C:\Projects\Web projects\Trademe_MongoDB\Data analysis\Load From Spark.py", line 11, in <module>
    spark = SparkSession.\
  File "C:\Projects\Web projects\venv\lib\site-packages\pyspark\sql\session.py", line 477, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "C:\Projects\Web projects\venv\lib\site-packages\pyspark\context.py", line 512, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "C:\Projects\Web projects\venv\lib\site-packages\pyspark\context.py", line 198, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "C:\Projects\Web projects\venv\lib\site-packages\pyspark\context.py", line 432, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "C:\Projects\Web projects\venv\lib\site-packages\pyspark\java_gateway.py", line 106, in launch_gateway
    raise RuntimeError("Java gateway process exited before sending its port number")
RuntimeError: Java gateway process exited before sending its port number
Process finished with exit code 1
Background: I downloaded PySpark, not Hadoop. The SPARK_HOME and Java environment variables are set up, and both can be read from the system path.
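In case it helps with diagnosing this, here is a small sketch for checking what the Python process itself sees for those variables. It assumes the variable names are SPARK_HOME and JAVA_HOME (the Java one is a guess, since it is not named above), and describe_env is just a hypothetical helper name. It only reports values and whether they point at existing directories; it does not prove that either one is the cause of the error.

```python
import os
import shutil


def describe_env(environ=None):
    """Report whether SPARK_HOME and JAVA_HOME are set and point at existing directories.

    The variable names are assumptions; adjust them to match your own setup.
    """
    environ = os.environ if environ is None else environ
    report = {}
    for name in ("SPARK_HOME", "JAVA_HOME"):
        value = environ.get(name)
        report[name] = {
            "value": value,
            # False when the variable is unset, empty, or not a directory
            "is_dir": bool(value) and os.path.isdir(value),
        }
    return report


if __name__ == "__main__":
    for name, info in describe_env().items():
        print(f"{name} = {info['value']!r} (existing directory: {info['is_dir']})")
    # shutil.which returns None when no java executable is found on PATH
    print("java on PATH:", shutil.which("java"))
```

Running this from the same virtual environment and IDE configuration as the failing script matters, because an IDE can launch Python with a different environment than the system shell.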