
View Atlas Data Lake Pipelines - Preview

You can view all of your Data Lake pipelines, or the details of a specific Data Lake pipeline in your project, through the Atlas UI, the Data Lake Pipelines API, or the Atlas CLI. You can also retrieve all of your completed Data Lake pipeline data ingestion jobs through the API or the Atlas CLI.

To return all data lake pipelines for your project using the Atlas CLI, run the following command:

atlas dataLakePipelines list [options]

To learn more about the command syntax and parameters, see the Atlas CLI documentation for atlas dataLakePipelines list.

To return the details for the specified data lake pipeline for your project using the Atlas CLI, run the following command:

atlas dataLakePipelines describe <pipelineName> [options]

To learn more about the command syntax and parameters, see the Atlas CLI documentation for atlas dataLakePipelines describe.

To return all available schedules for the specified data lake pipeline using the Atlas CLI, run the following command:

atlas dataLakePipelines availableSchedules list [options]

To learn more about the command syntax and parameters, see the Atlas CLI documentation for atlas dataLakePipelines availableSchedules list.

To return all available backup snapshots for the specified data lake pipeline using the Atlas CLI, run the following command:

atlas dataLakePipelines availableSnapshots list [options]

To learn more about the command syntax and parameters, see the Atlas CLI documentation for atlas dataLakePipelines availableSnapshots list.

To return all data lake pipeline runs for your project using the Atlas CLI, run the following command:

atlas dataLakePipelines runs list [options]

To learn more about the command syntax and parameters, see the Atlas CLI documentation for atlas dataLakePipelines runs list.

To return the details for the specified data lake pipeline run for your project using the Atlas CLI, run the following command:

atlas dataLakePipelines runs describe <pipelineRunId> [options]

To learn more about the command syntax and parameters, see the Atlas CLI documentation for atlas dataLakePipelines runs describe.
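
If you script these commands, a small wrapper can make their output easier to process. The following Python sketch assumes that the Atlas CLI is installed and authenticated, and that the commands accept the --output json and --projectId options. The helper names atlas_command and run_atlas are hypothetical and are not part of the Atlas CLI.

```python
import json
import subprocess


def atlas_command(*args, project_id=None):
    """Build the argument list for an `atlas dataLakePipelines` subcommand.

    Assumes the subcommand accepts `--output json` and `--projectId`.
    """
    command = ["atlas", "dataLakePipelines", *args, "--output", "json"]
    if project_id is not None:
        command += ["--projectId", project_id]
    return command


def run_atlas(*args, project_id=None):
    """Run the command and parse its JSON output.

    Requires an installed, authenticated Atlas CLI on the PATH.
    Raises subprocess.CalledProcessError if the CLI exits with an error.
    """
    result = subprocess.run(
        atlas_command(*args, project_id=project_id),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

For example, run_atlas("list", project_id="<projectId>") would return the parsed JSON for all pipelines in that project, and run_atlas("describe", "<pipelineName>") would return the details of one pipeline.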

To retrieve all your Data Lake pipelines for a project through the API, send a GET request to the Data Lake pipelines endpoint. To learn more about the pipelines endpoint syntax and parameters for retrieving all of your Data Lake pipelines, see Return All Data Lake Pipelines from One Project.

To retrieve one of your Data Lake pipelines through the API, send a GET request to the Data Lake pipelines endpoint with the name of the Data Lake pipeline that you want to retrieve. To learn more about the pipelines endpoint syntax and parameters for retrieving one of your Data Lake pipelines, see Return One Data Lake Pipeline.
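
The two pipeline requests above can be sketched in Python as follows. This is a minimal sketch, not the reference implementation: the base URL, the /groups/{groupId}/pipelines path, the versioned Accept header, and the use of HTTP digest authentication with an API key pair are assumptions that you should verify against the API reference. The helper names pipelines_url and get_json are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed base URL of the Atlas Administration API; verify in the API reference.
BASE_URL = "https://cloud.mongodb.com/api/atlas/v2"
# Assumed versioned media type for the Accept header.
ACCEPT = "application/vnd.atlas.2023-01-01+json"


def pipelines_url(project_id, pipeline_name=None):
    """Build the URL for all pipelines in a project, or for one named pipeline."""
    url = f"{BASE_URL}/groups/{quote(project_id, safe='')}/pipelines"
    if pipeline_name is not None:
        url += "/" + quote(pipeline_name, safe="")
    return url


def get_json(url, public_key, private_key):
    """Send a GET request with HTTP digest authentication and decode the JSON body."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, BASE_URL, public_key, private_key)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(password_mgr)
    )
    request = urllib.request.Request(url, headers={"Accept": ACCEPT})
    with opener.open(request) as response:
        return json.load(response)
```

Calling get_json(pipelines_url("<projectId>"), public_key, private_key) would retrieve all pipelines for the project. Passing a pipeline name as the second argument to pipelines_url would retrieve a single pipeline.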

To retrieve all the completed Data Lake pipeline data ingestion jobs for a project through the API, send a GET request to the Data Lake runs endpoint. To learn more about the API syntax and options for the runs endpoint, see Return All Data Lake Pipeline Runs from One Project.

To retrieve the details of one of your completed Data Lake pipeline data ingestion jobs through the API, send a GET request to the Data Lake runs endpoint with the unique identifier of the completed Data Lake pipeline data ingestion job that you want to retrieve. To learn more about the API syntax and options for the runs endpoint, see Return One Data Lake Pipeline Run.
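
The runs requests follow the same pattern. The sketch below only builds the request objects and leaves sending and authentication to the caller. It assumes that runs are addressed under a specific pipeline, at /groups/{groupId}/pipelines/{pipelineName}/runs, which you should confirm in the API reference. The helper names runs_url and runs_request are hypothetical.

```python
import urllib.request
from urllib.parse import quote

# Assumed base URL and versioned media type; verify in the API reference.
BASE_URL = "https://cloud.mongodb.com/api/atlas/v2"
ACCEPT = "application/vnd.atlas.2023-01-01+json"


def runs_url(project_id, pipeline_name, run_id=None):
    """Build the URL for all runs of a pipeline, or for one run by its identifier."""
    url = (
        f"{BASE_URL}/groups/{quote(project_id, safe='')}"
        f"/pipelines/{quote(pipeline_name, safe='')}/runs"
    )
    if run_id is not None:
        url += "/" + quote(run_id, safe="")
    return url


def runs_request(project_id, pipeline_name, run_id=None):
    """Create an unsent GET request for the runs endpoint."""
    return urllib.request.Request(
        runs_url(project_id, pipeline_name, run_id),
        headers={"Accept": ACCEPT},
        method="GET",
    )
```

The returned request can be passed to any opener that handles authentication for your API keys.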

To view your Data Lake pipelines in the Atlas UI, go to the Data Lake page for your project:

  1. If it's not already displayed, select the organization that contains your project from the Organizations menu in the navigation bar.

  2. If it's not already displayed, select your project from the Projects menu in the navigation bar.

  3. In the sidebar, click Data Lake under the Deployment heading.

The page displays all the Data Lake pipelines in the project. For each Data Lake pipeline, the service also displays the following information:

Pipeline Name: Name of your Data Lake pipeline. Each pipeline can produce multiple datasets. You can expand the name to view the datasets in the pipeline.

Data Source: Source for the data in the pipeline datasets. For data from a collection on the Atlas cluster, this column shows the cluster name, the database name, and the collection name separated by |.

Data Size: Size of data for each dataset.

Last Run Time: Date and time when the pipeline last ran to ingest data for each dataset.

Status: Status of the pipeline. The value can be one of the following:

  • Active - indicates that the pipeline is active

  • Paused - indicates that data ingestion for the pipeline is paused

Frequency: Frequency at which cluster data is ingested and stored for querying.

Actions: Actions you can take for each pipeline. You can click one of the following:

  • The pause (||) or resume icon to pause or resume data ingestion. You can't pause on-demand ingestion of data.

  • The edit icon to edit the data ingestion schedule for the pipeline.

  • The icon for additional actions to do the following:

    • Delete a pipeline. You can't undo this action. If you delete a pipeline, Atlas Data Lake deletes the datasets, including the data, and removes the datasets from the Federated Database Instances where they are referenced. If you delete a dataset inside a pipeline, Atlas Data Lake removes the dataset from the Federated Database Instance storage configuration where the dataset is referenced.

    • Trigger an on-demand pipeline run.
