AtlasDataFederation Custom Resource

Note

Atlas Kubernetes Operator doesn't support the AtlasDataFederation custom resource for Atlas for Government.

The AtlasDataFederation custom resource configures a federated database instance in Atlas. When you create the AtlasDataFederation custom resource, Atlas Kubernetes Operator tries to create or update a federated database instance in Atlas. You can use a federated database instance to run federated queries.

Important

Custom Resources No Longer Delete Objects by Default

  • Atlas Kubernetes Operator uses custom resource configuration files to manage your Atlas configuration, but as of Atlas Kubernetes Operator 2.0, custom resources you delete in Kubernetes are no longer (by default) deleted in Atlas. Instead, Atlas Kubernetes Operator simply stops managing those resources in Atlas. For example, if you delete an AtlasProject Custom Resource in Kubernetes, by default the Atlas Kubernetes Operator no longer automatically deletes the corresponding project from Atlas. This change in behavior is intended to help prevent accidental or unexpected deletions. To learn more, including how to revert this behavior to the default used prior to Atlas Kubernetes Operator 2.0, see New Default: Deletion Protection in Atlas Kubernetes Operator 2.0.

    Similarly, Atlas Kubernetes Operator does not delete teams from Atlas if you remove them from an Atlas project in Kubernetes with the Atlas Kubernetes Operator.

  • Explicitly define your desired configuration details in order to avoid implicitly using default Atlas configuration values. In some cases, inheriting Atlas defaults may result in a reconciliation loop which can prevent your custom resource from achieving a READY state. For example, explicitly defining your desired autoscaling behavior in your AtlasDeployment custom resource, as shown in the included example, ensures that a static instance size in your custom resource is not being repeatedly applied to an Atlas deployment which has autoscaling enabled.

    autoScaling:
      diskGB:
        enabled: true
      compute:
        enabled: true
        scaleDownEnabled: true
        minInstanceSize: M30
        maxInstanceSize: M40

Atlas Kubernetes Operator uses the Atlas Clusters API Resource and Advanced Clusters API Resource to create a new federated database instance or update an existing federated database instance. If you specify values for fields under spec.serverlessSpec, Atlas Kubernetes Operator uses the Atlas Serverless Instance API Resource to create or configure private endpoints for your federated database instance.

If you remove the AtlasDataFederation resource from your Kubernetes cluster, Atlas Kubernetes Operator removes the federated database instance from Atlas.

The following example shows an AtlasDataFederation custom resource specification with configured private endpoints:

apiVersion: atlas.mongodb.com/v1
kind: AtlasDataFederation
metadata:
  name: my-federated-deployment
spec:
  projectRef:
    name: my-project
    namespace: default
  cloudProviderConfig:
    aws:
      roleId: 12345678
      testS3Bucket: my-bucket
  dataProcessRegion:
    cloudProvider: AWS
    region: OREGON_USA
  name: my-fdi
  storage:
    databases:
    - collections:
      - dataSources:
        - allowInsecure: false
          collection: my-collection
          collectionRegex:
          database: my-database
          databaseRegex:
          defaultFormat: ".avro"
          path: /
          provenanceFieldName: string
          storeName: my-data-store
          urls:
          - string:
        name: my-collection-mdb
      maxWildcardCollections: 100
      name: my-database-mdb
      views:
      - name: my-view
        pipeline:
        source: my-source-collection
    stores:
    - name: my-store
      provider: S3
      additionalStorageClasses:
      - STANDARD
      bucket: my-bucket
      delimiter: /
      includeTags: false
      prefix: data-
      public: false
      region: US_WEST_1
  privateEndpoints:
  - endpointId: vpce-3bf78b0ddee411ba1
    provider: AWS
    type: DATA_LAKE
  - endpointId: vpce-3bf78b0ddee411ba2
    provider: AWS
    type: DATA_LAKE
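
For comparison, a smaller specification might look like the following sketch. It assumes that only the fields this page marks as Required are needed to create the resource. The resource name, project name, role ID, and bucket name are placeholders, not values from a real deployment.

```yaml
# Minimal sketch: only the fields this page marks as Required.
apiVersion: atlas.mongodb.com/v1
kind: AtlasDataFederation
metadata:
  name: my-minimal-fdi            # placeholder resource name
spec:
  projectRef:
    name: my-project              # must name an existing AtlasProject resource
    namespace: default
  cloudProviderConfig:
    aws:
      roleId: "12345678"          # placeholder; quoted because the type is string
      testS3Bucket: my-bucket     # placeholder bucket that the role can access
  dataProcessRegion:
    cloudProvider: AWS
    region: OREGON_USA
```

Optional blocks such as spec.storage and spec.privateEndpoints can then be added as needed.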

This section describes some of the key AtlasDataFederation custom resource parameters available. For a full list of available parameters, see the Atlas Data Federation API.

Refer to these descriptions, the available examples, and the API documentation to customize your specifications.

metadata.name

Type: string

Required

Label that identifies the AtlasDataFederation Custom Resource that Atlas Kubernetes Operator uses to add this federated database instance to a project.

spec.cloudProviderConfig

Type: object

Required

Object that contains the cloud provider configurations for the federated database instance.

spec.cloudProviderConfig.aws

Type: object

Required

Name of the cloud service provider that hosts the federated database instance.

spec.cloudProviderConfig.aws.roleId

Type: string

Required

Unique identifier of the role that the federated database instance can use to access the data stores.

spec.cloudProviderConfig.aws.testS3Bucket

Type: string

Required

Name of the S3 data bucket that the provided role ID is authorized to access.

spec.dataProcessRegion

Type: object

Required

Information about the cloud provider region to which the federated database instance routes client connections. Atlas Kubernetes Operator supports only AWS.

spec.dataProcessRegion.cloudProvider

Type: string

Required

Name of the cloud service provider that hosts the federated database instance's data stores. Atlas Kubernetes Operator accepts the following values:

  • AWS

  • TENANT

  • SERVERLESS

spec.dataProcessRegion.region

Type: string

Required

Label that indicates the geographical location of the federated database instance's data stores. Atlas Kubernetes Operator accepts the following values:

  • SYDNEY_AUS

  • MUMBAI_IND

  • FRANKFURT_DEU

  • DUBLIN_IRL

  • LONDON_GBR

  • VIRGINIA_USA

  • OREGON_USA

  • SAOPAULO_BRA

  • SINGAPORE_SGP

spec.name

Type: string

Optional

Label that identifies the federated database instance in Atlas.

spec.storage

Type: object

Optional

Configuration information for each data store and its mapping to Atlas databases.

spec.storage.databases

Type: array

Optional

List that contains the queryable databases and collections for this federated database instance.

spec.storage.databases.collections

Type: array

Optional

List of collections and data sources that map to a data store defined in stores.

spec.storage.databases.collections.dataSources

Type: array

Optional

List that contains the data stores that map to a collection for this federated database instance.

spec.storage.databases.collections.dataSources.allowInsecure

Type: boolean

Optional

Flag that validates the scheme in the specified URLs. If true, Atlas Kubernetes Operator allows the insecure HTTP scheme, doesn't verify the server's certificate chain and hostname, and accepts any certificate with any hostname presented by the server. If false, Atlas Kubernetes Operator allows only the secure HTTPS scheme.

spec.storage.databases.collections.dataSources.collection

Type: string

Optional

Human-readable label that identifies the collection in the database. To create a wildcard (*) collection, you must omit this parameter.

spec.storage.databases.collections.dataSources.collectionRegex

Type: string

Optional

Regex pattern to use to create a wildcard (*) collection.

spec.storage.databases.collections.dataSources.database

Type: string

Optional

Human-readable label that identifies the database that contains the collection in the cluster. You must omit this parameter to generate wildcard (*) collections for dynamically generated databases.

spec.storage.databases.collections.dataSources.databaseRegex

Type: string

Optional

Regex pattern to use to create the wildcard (*) database.
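
The collection, collectionRegex, database, and databaseRegex parameters work together: a named collection or database pins the data source to it, while omitting the name and supplying a regex requests wildcard matching. The fragment below is a sketch of a data source that omits collection and sets collectionRegex instead. The store name, database name, pattern, and the use of "*" as the mapped collection name are illustrative assumptions and are not taken from the operator reference.

```yaml
# Fragment of spec.storage.databases (sketch; all names and the pattern are hypothetical).
- name: my-database-mdb
  collections:
  - name: "*"                       # assumed name for a wildcard collection
    dataSources:
    - storeName: my-data-store      # hypothetical data store
      database: my-database         # source database is fixed
      collectionRegex: "^sales_"    # collection is omitted; the regex selects source collections
```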

spec.storage.databases.collections.dataSources.defaultFormat

Type: string

Optional

File format that Atlas Kubernetes Operator uses if it encounters a file without a file extension while searching storeName. Atlas Kubernetes Operator accepts the following values:

  • .avro

  • .avro.bz2

  • .avro.gz

  • .bson

  • .bson.bz2

  • .bson.gz

  • .bsonx

  • .csv

  • .csv.bz2

  • .csv.gz

  • .json

  • .json.bz2

  • .json.gz

  • .orc

  • .parquet

  • .tsv

  • .tsv.bz2

  • .tsv.gz

spec.storage.databases.collections.dataSources.path

Type: string

Optional

File path that controls how Atlas Kubernetes Operator searches for and parses files in the storeName before mapping them to a collection. Specify / to capture all files and folders from the prefix path.

spec.storage.databases.collections.dataSources.provenanceFieldName

Type: string

Optional

Human-readable label that identifies the field that includes the provenance of the documents in the results. Atlas Kubernetes Operator returns different fields in the results for each supported provider.

spec.storage.databases.collections.dataSources.storeName

Type: string

Optional

Human-readable label that identifies the data store that Atlas Kubernetes Operator maps to the collection.

spec.storage.databases.collections.dataSources.urls

Type: array

Optional

URLs of the publicly accessible data files. You can't specify URLs that require authentication. Atlas Data Federation creates a partition for each URL. If empty or omitted, Atlas Data Federation uses the URLs from the store specified in the dataSources.storeName parameter.
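
As an illustration of the urls parameter, the fragment below sketches a data source that lists two data files as plain strings. The URLs, the collection name, and the store name are placeholders, and whether a given store type accepts URLs is an assumption to check against the Atlas Data Federation API.

```yaml
# Fragment of spec.storage.databases.collections (sketch; URLs and names are placeholders).
- name: my-collection-mdb
  dataSources:
  - storeName: my-data-store        # hypothetical store for these files
    allowInsecure: false            # accept HTTPS URLs only
    defaultFormat: ".json"          # applied to files that have no extension
    urls:                           # one partition is created per URL
    - https://example.com/data/part-1.json
    - https://example.com/data/part-2.json
```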

spec.storage.databases.collections.name

Type: string

Optional

Human-readable label that identifies the collection to which Atlas Kubernetes Operator maps the data in the data stores.

spec.storage.databases.maxWildcardCollections

Type: int32

Optional

Maximum number of wildcard collections in the database. This only applies to S3 data sources. The default value is 100.

spec.storage.databases.name

Type: string

Optional

Human-readable label that identifies the database to which the federated database instance maps data.

spec.storage.databases.views

Type: array

Optional

List of aggregation pipelines that apply to the collection. This only applies to S3 data sources.

spec.storage.databases.views.name

Type: string

Optional

Human-readable label that identifies the view, which corresponds to an aggregation pipeline on a collection.

spec.storage.databases.views.pipeline

Type: string

Optional

Aggregation pipeline stages to apply to the source collection.

spec.storage.databases.views.source

Type: string

Optional

Human-readable label that identifies the source collection for the view.
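
A view ties the three parameters above together. The following fragment is a sketch that assumes the pipeline string holds the aggregation stages encoded as JSON; the view name, source collection, and the $match stage are hypothetical.

```yaml
# Fragment of spec.storage.databases (sketch; names and pipeline are hypothetical).
- name: my-database-mdb
  views:
  - name: active-orders                              # label for the view
    source: my-collection-mdb                        # collection the view reads from
    pipeline: '[{"$match": {"status": "active"}}]'   # assumed JSON-encoded stages
```

Single quotes around the pipeline keep YAML from interpreting the brackets and the dollar sign.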

spec.storage.stores

Type: array

Optional

List that contains the data stores for the federated database instance.

spec.storage.stores.name

Type: string

Optional

Human-readable label that identifies the data store. The spec.storage.databases.collections.dataSources.storeName field references this value as part of the mapping configuration.

spec.storage.stores.provider

Type: string

Conditional

Provider for the store. Atlas Kubernetes Operator supports only S3. You must specify this field to use a data store.

spec.storage.stores.additionalStorageClasses

Type: array

Optional

Collection of AWS S3 storage classes. Atlas Data Federation includes the files in these storage classes in the query results. Atlas Kubernetes Operator accepts the following values:

  • STANDARD

  • INTELLIGENT_TIERING

  • STANDARD_IA

spec.storage.stores.bucket

Type: string

Optional

Human-readable label that identifies the AWS S3 bucket. This label must exactly match the name of an S3 bucket that the federated database instance can access with the configured AWS IAM credentials.

spec.storage.stores.delimiter

Type: string

Optional

Delimiter that separates spec.storage.databases.collections.dataSources.path segments in the data store. Atlas Kubernetes Operator uses the delimiter to efficiently traverse S3 buckets with a hierarchical directory structure. You can specify any character that S3 object keys support as the delimiter. For example, you can specify an underscore (_), a plus sign (+), or multiple characters, such as double underscores (__). If omitted, defaults to /.

spec.storage.stores.includeTags

Type: boolean

Optional

Flag that indicates whether to use S3 tags on the files in the given path as additional partition attributes. If set to true, Atlas Kubernetes Operator adds the S3 tags as additional partition attributes and adds new top-level BSON elements associating each tag to each document. If omitted, defaults to false.

spec.storage.stores.prefix

Type: string

Optional

Prefix that Atlas Kubernetes Operator applies when searching for files in the S3 bucket. The data store prepends the value of prefix to the spec.storage.databases.collections.dataSources.path to create the full path for files to ingest. If omitted, Atlas Kubernetes Operator searches all files from the root of the S3 bucket.

spec.storage.stores.public

Type: boolean

Optional

Flag that indicates whether the bucket is public. If set to true, Atlas Kubernetes Operator doesn't use the configured AWS IAM role to access the S3 bucket. If set to false, the configured AWS IAM role must include permissions to access the S3 bucket.

spec.storage.stores.region

Type: string

Optional

AWS region that indicates the physical location of the S3 bucket.
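
The store parameters above connect to the data source parameters through the store name. The fragment below sketches that link: a store definition and a collection whose data source points back to it. The bucket, prefix, and all names are placeholders.

```yaml
# Fragment of spec (sketch; bucket, prefix, and names are placeholders).
storage:
  stores:
  - name: my-store                  # referenced below by storeName
    provider: S3
    bucket: my-bucket
    region: US_WEST_1
    prefix: data/                   # file search starts under this prefix
    delimiter: /                    # the default, shown explicitly
    public: false                   # the configured IAM role must be able to read the bucket
  databases:
  - name: my-database-mdb
    collections:
    - name: my-collection-mdb
      dataSources:
      - storeName: my-store         # must match spec.storage.stores.name
        path: /                     # all files and folders under the prefix
```

Keeping storeName identical to the store's name is what maps the collection to the bucket.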

spec.privateEndpoints

Type: array

Optional

List that contains the private endpoint configurations for the federated database instance.

spec.privateEndpoints.endpointId

Type: string

Required

Unique 22-character alphanumeric string starting with vpce- that identifies the private endpoint in AWS.

spec.privateEndpoints.provider

Type: string

Optional

Human-readable label that identifies the cloud service provider. Atlas Data Federation supports only AWS.

spec.privateEndpoints.type

Type: string

Optional

Human-readable label that identifies the resource type associated with this private endpoint. Atlas Data Federation supports only DATA_LAKE.

spec.projectRef.name

Type: string

Required

Name of the project to which the federated database instance belongs. You must specify an existing AtlasProject Custom Resource.

spec.projectRef.namespace

Type: string

Required

Namespace in which the AtlasProject Custom Resource specified in spec.projectRef.name exists.
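
To show how the reference resolves, the sketch below pairs an AtlasProject resource with the projectRef fragment that points to it. The project's display name and both resource names are placeholders, and the AtlasDataFederation document is deliberately incomplete: its other required fields are omitted for brevity.

```yaml
# Sketch: an AtlasProject and the projectRef that points to it (names are placeholders).
apiVersion: atlas.mongodb.com/v1
kind: AtlasProject
metadata:
  name: my-project                  # value used in spec.projectRef.name
  namespace: default                # value used in spec.projectRef.namespace
spec:
  name: Test Project                # placeholder project name in Atlas
---
apiVersion: atlas.mongodb.com/v1
kind: AtlasDataFederation
metadata:
  name: my-federated-deployment
spec:
  projectRef:
    name: my-project
    namespace: default
  # cloudProviderConfig and dataProcessRegion omitted for brevity
```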
