Set Up a Federated Database Instance for Your Dataset - Preview

On this page

  • Prerequisites
  • Procedure
  • Next Steps

This page guides you through creating a federated database instance for your Data Lake dataset.

Before you begin, you must have the following:

1
2
  1. If it's not already displayed, select the organization that contains your project from the Organizations menu in the navigation bar.

  2. If it's not already displayed, select your project from the Projects menu in the navigation bar.

  3. In the sidebar, click Data Federation under the Services heading.

3
4
  • For a guided experience, enable Visual Editor. (Default)

  • To edit the raw JSON, disable Visual Editor.

5

Follow the steps in the tab below for your preferred Editor view in the UI.

  1. (Optional) Click the edit icon for the:

    • Federated Database Instance to specify a name for your federated database instance. Defaults to FederatedDatabaseInstance[n].

    • Database to edit the database name. Defaults to Database[n].

      Corresponds to databases.[n].name JSON configuration setting.

    • Collection to edit the collection name. Defaults to Collection[n].

      Corresponds to databases.[n].collections.name JSON configuration setting.

    • View to edit the view name.

    You can click:

    • Add Database to add databases and collections.

    • The add icon associated with the database to add collections to the database.

    • The add icon associated with the collection to add views on the collection. To create a view, you must specify:

      • The name of the view.

      • The pipeline to apply to the view.

        Note

        The view definition pipeline can't include the $out or the $merge stage. If the view definition includes nested pipeline stages such as $lookup or $facet, this restriction applies to those nested pipelines as well.

        To learn more about views, see:

    • The delete icon associated with the database, collection, or view to remove it.

    Note

    The sample queries that you can run later in this tutorial use Database0 as the virtual database name and Collection0 as the virtual collection name. If you change the names here, change them in the sample queries as well before you run them.
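
The restriction on view definition pipelines noted in this step can be checked before you save a view. The following is a minimal sketch, not part of the Atlas UI or any MongoDB driver: the function name find_forbidden_stages is hypothetical, and it inspects only the two nested stages named in the note ($lookup and $facet). Other stages that accept nested pipelines would need their own branch.

```python
from typing import Any

# Stages that the note says a view definition pipeline can't include.
FORBIDDEN_STAGES = {"$out", "$merge"}


def find_forbidden_stages(pipeline: list[dict[str, Any]]) -> list[str]:
    """Return forbidden stage names found in a pipeline, including nested ones.

    Hypothetical helper for illustration; only $lookup and $facet are
    searched for nested pipelines.
    """
    found: list[str] = []
    for stage in pipeline:
        for name, spec in stage.items():
            if name in FORBIDDEN_STAGES:
                found.append(name)
            elif name == "$lookup" and isinstance(spec, dict):
                # $lookup can carry a nested pipeline under the "pipeline" key.
                found.extend(find_forbidden_stages(spec.get("pipeline", [])))
            elif name == "$facet" and isinstance(spec, dict):
                # $facet maps each output field to its own sub-pipeline.
                for sub_pipeline in spec.values():
                    found.extend(find_forbidden_stages(sub_pipeline))
    return found


# A view pipeline that hides a $merge stage inside a $lookup.
candidate = [
    {"$match": {"year": 1999}},
    {
        "$lookup": {
            "from": "Collection0",
            "pipeline": [{"$merge": {"into": "archive"}}],
            "as": "joined",
        }
    },
]
violations = find_forbidden_stages(candidate)
```

An empty result means the sketch found none of the disallowed stages; any other result lists the stages to remove before you create the view.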

  2. Drag the Data Lake dataset and drop it on the collection that you want to map it to.

    Example

    If you are creating a Federated Database Instance for the Atlas Data Lake dataset that you created for the sample data using the examples in Create an Atlas Data Lake Pipeline - Preview:

    1. Under Datasets, select Ingestion Pipeline from the dropdown if it isn't already selected.

    2. In the Data Lake Dataset section, drag the dataset named sample_mflix.movies and drop it under the collection.

    Corresponds to databases.[n].collections.[n].dataSources JSON configuration setting.
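
To make the mapping concrete, the following sketch shows what dropping a dataset on a collection amounts to in the JSON configuration: one entry appended to that collection's dataSources array. The helper name map_dataset is hypothetical, the storeName and datasetName fields are the ones used in the JSON Editor samples on this page, and the dataset name is left as a placeholder.

```python
from typing import Any


def map_dataset(
    config: dict[str, Any],
    database: str,
    collection: str,
    store_name: str,
    dataset_name: str,
) -> dict[str, Any]:
    """Append a data source to databases.[n].collections.[n].dataSources.

    Hypothetical helper for illustration; it edits the dict in place.
    """
    for db in config["databases"]:
        if db["name"] != database:
            continue
        for coll in db["collections"]:
            if coll["name"] == collection:
                coll.setdefault("dataSources", []).append(
                    {"storeName": store_name, "datasetName": dataset_name}
                )
                return config
    raise KeyError(f"{database}.{collection} is not defined in the configuration")


# Default names from the Visual Editor; the dataset name is a placeholder.
config = {
    "databases": [
        {"name": "Database0", "collections": [{"name": "Collection0"}], "views": []}
    ]
}
map_dataset(config, "Database0", "Collection0", "dls-store-us-east-1", "<dataset-name>")
```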

  1. Define your dataset as a data store in your Federated Database Instance storage configuration.

    Edit the JSON configuration settings shown in the UI for stores. Your stores configuration setting should resemble the following:

    {
      "stores": [
        {
          "name": "<store-name>",
          "provider": "<cloud-storage-provider-name>",
          "region": "<cloud-storage-provider-region>"
        }
      ]
    }

    To learn more about these settings, see Storage Configuration For Atlas Data Lake Datasets.

    Example

    If you are creating a Federated Database Instance for the Atlas Data Lake pipeline that you created for the sample data using the examples in Create an Atlas Data Lake Pipeline - Preview, replace the stores in the JSON configuration settings shown in the UI with the following:

    {
      "stores": [
        {
          "name": "dls-store-us-east-1",
          "provider": "dls:aws",
          "region": "US_EAST_1"
        }
      ]
    }

  2. Define virtual databases, collections, and views for your dataset in the Atlas Data Federation storage configuration.

    {
      "databases": [
        {
          "name": "<database-name>",
          "collections": [
            {
              "name": "<collection-name>",
              "dataSources": [
                {
                  "storeName": "<store-name>",
                  "datasetName": "<snapshot-name>"
                }
              ]
            }
          ],
          "views": []
        }
      ]
    }

    To learn more about these settings, see Storage Configuration For Atlas Data Lake Datasets.

    Example

    If you are creating a Federated Database Instance for the Atlas Data Lake dataset that you created for the sample data using the examples in Create an Atlas Data Lake Pipeline - Preview, replace the databases in the JSON configuration settings shown in the UI with the following:

    {
      "databases": [
        {
          "name": "Database0",
          "collections": [
            {
              "name": "Collection0",
              "dataSources": [
                {
                  "storeName": "dls-store-us-east-1",
                  "datasetName": "v1$atlas$snapshot$dlsTest$sample_mflix$movies$$.<snapshot-id>"
                }
              ]
            }
          ],
          "views": []
        }
      ]
    }
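
Because each storeName under databases must match the name of an entry under stores, it can help to check the combined configuration before you paste it into the JSON Editor. The following is a minimal sketch under that assumption. The function name undefined_store_names is hypothetical, and the check covers only this one cross-reference, not the full set of rules that Atlas applies.

```python
import json
from typing import Any


def undefined_store_names(config: dict[str, Any]) -> set[str]:
    """Return store names referenced by dataSources but missing from stores.

    Hypothetical helper for illustration only.
    """
    defined = {store["name"] for store in config.get("stores", [])}
    referenced = {
        source["storeName"]
        for db in config.get("databases", [])
        for coll in db.get("collections", [])
        for source in coll.get("dataSources", [])
    }
    return referenced - defined


# The sample stores and databases settings from this page, combined.
# <snapshot-id> stays a placeholder for your own snapshot.
storage_config = {
    "stores": [
        {"name": "dls-store-us-east-1", "provider": "dls:aws", "region": "US_EAST_1"}
    ],
    "databases": [
        {
            "name": "Database0",
            "collections": [
                {
                    "name": "Collection0",
                    "dataSources": [
                        {
                            "storeName": "dls-store-us-east-1",
                            "datasetName": "v1$atlas$snapshot$dlsTest$sample_mflix$movies$$.<snapshot-id>",
                        }
                    ],
                }
            ],
            "views": [],
        }
    ],
}

missing = undefined_store_names(storage_config)
if missing:
    print("Undefined stores:", sorted(missing))
else:
    # Print the configuration as JSON, ready to paste into the editor.
    print(json.dumps(storage_config, indent=2))
```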
6

Now that you've created a federated database instance for your Data Lake dataset, proceed to Connect to Your Federated Database Instance - Preview.
