AtlasDataFederation Custom Resource
Note
Atlas Kubernetes Operator doesn't support the AtlasDataFederation custom resource for Atlas for Government.
The AtlasDataFederation custom resource configures a federated database instance in Atlas. When you create the AtlasDataFederation custom resource, Atlas Kubernetes Operator tries to create or update a federated database instance in Atlas. You can use a federated database instance to run federated queries.
Important
Custom Resources No Longer Delete Objects by Default
Atlas Kubernetes Operator uses custom resource configuration files to manage your Atlas configuration, but as of Atlas Kubernetes Operator 2.0, custom resources you delete in Kubernetes are no longer deleted in Atlas. Instead, Atlas Kubernetes Operator simply stops managing those resources. For example, if you delete an AtlasProject Custom Resource in Kubernetes, Atlas Kubernetes Operator no longer automatically deletes the corresponding project from Atlas, preventing accidental or unexpected deletions. To learn more, including how to revert this behavior to the default used prior to Atlas Kubernetes Operator 2.0, see New Default: Deletion Protection in Atlas Kubernetes Operator 2.0.
Atlas Kubernetes Operator uses the Atlas Data Federation API Resource to create a new federated database instance or update an existing federated database instance. If you specify values for fields under spec.privateEndpoints, Atlas Kubernetes Operator uses the same API to create or configure private endpoints for your federated database instance.
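If you remove the AtlasDataFederation resource from your Kubernetes cluster while deletion protection is disabled, Atlas Kubernetes Operator removes the federated database instance from Atlas. With the default deletion protection described above, Atlas Kubernetes Operator stops managing the federated database instance instead.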
Examples
The following example shows an AtlasDataFederation custom resource specification with configured private endpoints:
apiVersion: atlas.mongodb.com/v1
kind: AtlasDataFederation
metadata:
  name: my-federated-deployment
spec:
  projectRef:
    name: my-project
    namespace: default
  cloudProviderConfig:
    aws:
      # Quoted because roleId is a string field.
      roleId: "12345678"
      testS3Bucket: my-bucket
  dataProcessRegion:
    cloudProvider: AWS
    region: OREGON_USA
  name: my-fdi
  storage:
    databases:
      - collections:
          - dataSources:
              - allowInsecure: false
                collection: my-collection
                collectionRegex:
                database: my-database
                databaseRegex:
                defaultFormat: ".avro"
                path: /
                provenanceFieldName: string
                # Must match the name of a store under spec.storage.stores.
                storeName: my-store
                urls:
                  - string:
            name: my-collection-mdb
        maxWildcardCollections: 100
        name: my-database-mdb
        views:
          - name: my-view
            pipeline:
            source: my-source-collection
    stores:
      - name: my-store
        provider: S3
        additionalStorageClasses:
          - STANDARD
        bucket: my-bucket
        delimiter: /
        includeTags: false
        prefix: data-
        public: false
        region: US_WEST_1
  privateEndpoints:
    - endpointId: vpce-3bf78b0ddee411ba1
      provider: AWS
      type: DATA_LAKE
    - endpointId: vpce-3bf78b0ddee411ba2
      provider: AWS
      type: DATA_LAKE
Parameters
This section describes some of the key AtlasDataFederation custom resource parameters available. For a full list of available parameters, see the Atlas Data Federation API.
Refer to these descriptions, the available examples, and the API documentation to customize your specifications.
metadata.name
Type: string
Required
Label that identifies the AtlasDataFederation Custom Resource that Atlas Kubernetes Operator uses to add this federated database instance to a project.
spec.cloudProviderConfig
Type: object
Required
Object that contains the cloud provider configuration for the federated database instance.
spec.cloudProviderConfig.aws
Type: object
Required
Configuration for AWS, the cloud service provider that hosts the federated database instance.
spec.cloudProviderConfig.aws.roleId
Type: string
Required
Unique identifier of the role that the federated database instance can use to access the data stores.
spec.cloudProviderConfig.aws.testS3Bucket
Type: string
Required
Name of the S3 data bucket that the provided role ID is authorized to access.
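As a minimal sketch of how the cloudProviderConfig fields nest, the fragment below sets the AWS role and the test bucket. The role ID and bucket name are placeholders taken from the example on this page, not real identifiers.

```yaml
# Fragment of an AtlasDataFederation spec; all values are placeholders.
spec:
  cloudProviderConfig:
    aws:
      # Quoted so that YAML parses the value as a string, not an integer.
      roleId: "12345678"
      # Bucket that the role above is authorized to access.
      testS3Bucket: my-bucket
```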
spec.dataProcessRegion
Type: object
Required
Information about the cloud provider region to which the federated database instance routes client connections. Atlas Kubernetes Operator supports only AWS.
spec.dataProcessRegion.cloudProvider
Type: string
Required
Name of the cloud service provider that hosts the federated database instance's data stores. Atlas Kubernetes Operator accepts the following values:
AWS
TENANT
SERVERLESS
spec.dataProcessRegion.region
Type: string
Required
Label that indicates the geographical location of the federated database instance's data stores. Atlas Kubernetes Operator accepts the following values:
SYDNEY_AUS
MUMBAI_IND
FRANKFURT_DEU
DUBLIN_IRL
LONDON_GBR
VIRGINIA_USA
OREGON_USA
SAOPAULO_BRA
SINGAPORE_SGP
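For illustration, a dataProcessRegion block that pairs the AWS provider with one of the region labels listed above might look like the following. The region chosen here is only an example.

```yaml
# Fragment of an AtlasDataFederation spec.
spec:
  dataProcessRegion:
    cloudProvider: AWS
    # Any region label from the list above; OREGON_USA is an example.
    region: OREGON_USA
```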
spec.name
Type: string
Optional
Label that identifies the federated database instance in Atlas.
spec.storage
Type: object
Optional
Configuration information for each data store and its mapping to Atlas databases.
spec.storage.databases
Type: array
Optional
List that contains the queryable databases and collections for this federated database instance.
spec.storage.databases.collections
Type: array
Optional
List of collections and data sources that map to a stores data store.
spec.storage.databases.collections.dataSources
Type: array
Optional
List that contains the data stores that map to a collection for this federated database instance.
spec.storage.databases.collections.dataSources.allowInsecure
Type: boolean
Optional
Flag that validates the scheme in the specified URLs. If true, Atlas Kubernetes Operator allows the insecure HTTP scheme, doesn't verify the server's certificate chain and hostname, and accepts any certificate with any hostname presented by the server. If false, Atlas Kubernetes Operator allows the secure HTTPS scheme only.
spec.storage.databases.collections.dataSources.collection
Type: string
Optional
Human-readable label that identifies the collection in the database. To create a wildcard (*) collection, you must omit this parameter.
spec.storage.databases.collections.dataSources.collectionRegex
Type: string
Optional
Regex pattern to use to create a wildcard (*) collection.
spec.storage.databases.collections.dataSources.database
Type: string
Optional
Human-readable label that identifies the database that contains the collection in the cluster. You must omit this parameter to generate wildcard (*) collections for dynamically generated databases.
spec.storage.databases.collections.dataSources.databaseRegex
Type: string
Optional
Regex pattern to use to create the wildcard (*) database.
spec.storage.databases.collections.dataSources.defaultFormat
Type: string
Optional
File format that Atlas Kubernetes Operator uses if it encounters a file without a file extension while searching storeName. Atlas Kubernetes Operator accepts the following values:
.avro
.avro.bz2
.avro.gz
.bson
.bson.bz2
.bson.gz
.bsonx
.csv
.csv.bz2
.csv.gz
.json
.json.bz2
.json.gz
.orc
.parquet
.tsv
.tsv.bz2
.tsv.gz
spec.storage.databases.collections.dataSources.path
Type: string
Optional
File path that controls how Atlas Kubernetes Operator searches for and parses files in the storeName before mapping them to a collection. Specify / to capture all files and folders from the prefix path.
spec.storage.databases.collections.dataSources.provenanceFieldName
Type: string
Optional
Human-readable label that identifies the field that includes the provenance of the documents in the results. Atlas Kubernetes Operator returns different fields in the results for each supported provider.
spec.storage.databases.collections.dataSources.storeName
Type: string
Optional
Human-readable label that identifies the data store that Atlas Kubernetes Operator maps to the collection.
spec.storage.databases.collections.dataSources.urls
Type: array
Optional
URLs of the publicly accessible data files. You can't specify URLs that require authentication. Atlas Data Federation creates a partition for each URL. If empty or omitted, Atlas Data Federation uses the URLs from the store specified in the dataSources.storeName parameter.
spec.storage.databases.collections.name
Type: string
Optional
Human-readable label that identifies the collection to which Atlas Kubernetes Operator maps the data in the data stores.
spec.storage.databases.maxWildcardCollections
Type: int32
Optional
Maximum number of wildcard collections in the database. This only applies to S3 data sources. The default value is 100.
spec.storage.databases.name
Type: string
Optional
Human-readable label that identifies the database to which the federated database instance maps data.
spec.storage.databases.views
Type: array
Optional
List of aggregation pipelines that apply to the collection. This only applies to S3 data sources.
spec.storage.databases.views.name
Type: string
Optional
Human-readable label that identifies the view, which corresponds to an aggregation pipeline on a collection.
spec.storage.databases.views.pipeline
Type: string
Optional
Aggregation pipeline stages to apply to the source collection.
spec.storage.databases.views.source
Type: string
Optional
Human-readable label that identifies the source collection for the view.
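The sketch below shows how a view could sit alongside the collection it reads from. It assumes the pipeline string holds a JSON array of aggregation stages; that encoding, the status field, and the "active" value are illustrative assumptions rather than values taken from this page.

```yaml
# Fragment of an AtlasDataFederation spec; names are placeholders.
spec:
  storage:
    databases:
      - name: my-database-mdb
        collections:
          - name: my-collection-mdb
            dataSources:
              - storeName: my-store
                path: /
        views:
          - name: my-view
            # Collection in this database that the view reads from.
            source: my-collection-mdb
            # Assumed encoding: aggregation stages as a JSON array in a string.
            pipeline: '[{"$match": {"status": "active"}}]'
```

Single quotes around the pipeline keep the inner double quotes and the $ characters literal in YAML.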
spec.storage.stores
Type: array
Optional
List that contains the data stores for the federated database instance.
spec.storage.stores.name
Type: string
Optional
Human-readable label that identifies the data store. The spec.storage.databases.collections.dataSources.storeName field references this value as part of the mapping configuration.
spec.storage.stores.provider
Type: string
Conditional
Provider for the store. Atlas Kubernetes Operator supports only S3. You must specify this field to use a data store.
spec.storage.stores.additionalStorageClasses
Type: array
Optional
Collection of AWS S3 storage classes. Atlas Data Federation includes the files in these storage classes in the query results. Atlas Kubernetes Operator accepts the following values:
STANDARD
INTELLIGENT_TIERING
STANDARD_IA
spec.storage.stores.bucket
Type: string
Optional
Human-readable label that identifies the AWS S3 bucket. This label must exactly match the name of an S3 bucket that the federated database instance can access with the configured AWS IAM credentials.
spec.storage.stores.delimiter
Type: string
Optional
The delimiter that separates spec.storage.databases.collections.dataSources.path segments in the data store. Atlas Kubernetes Operator uses the delimiter to efficiently traverse S3 buckets with a hierarchical directory structure. You can specify any character supported by the S3 object keys as the delimiter. For example, you can specify an underscore (_), a plus sign (+), or multiple characters, such as double underscores (__), as the delimiter. If omitted, defaults to /.
spec.storage.stores.includeTags
Type: boolean
Optional
Flag that indicates whether to use S3 tags on the files in the given path as additional partition attributes. If set to true, Atlas Kubernetes Operator adds the S3 tags as additional partition attributes and adds new top-level BSON elements associating each tag to each document. If omitted, defaults to false.
spec.storage.stores.prefix
Type: string
Optional
Prefix that Atlas Kubernetes Operator applies when searching for files in the S3 bucket. The data store prepends the value of prefix to the spec.storage.databases.collections.dataSources.path to create the full path for files to ingest. If omitted, Atlas Kubernetes Operator searches all files from the root of the S3 bucket.
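To make the relationship between prefix and path concrete, the following sketch uses a hypothetical bucket layout with a /software folder that contains a /computed subfolder. The folder names are assumptions made for illustration; the store, bucket, and region values reuse the placeholders from the example on this page.

```yaml
# Fragment of an AtlasDataFederation spec; folder names are hypothetical.
spec:
  storage:
    stores:
      - name: my-store
        provider: S3
        bucket: my-bucket
        region: US_WEST_1
        # Restricts every data source that uses this store to /software.
        prefix: /software
        delimiter: /
    databases:
      - name: my-database-mdb
        collections:
          - name: my-collection-mdb
            dataSources:
              - storeName: my-store
                # Prepended with the store prefix: /software/computed
                path: /computed
```

Omitting prefix in this sketch would make the path resolve from the root of the bucket instead.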
spec.storage.stores.public
Type: boolean
Optional
Flag that indicates whether the bucket is public. If set to true, Atlas Kubernetes Operator doesn't use the configured AWS IAM role to access the S3 bucket. If set to false, the configured AWS IAM role must include permissions to access the S3 bucket.
spec.storage.stores.region
Type: string
Optional
AWS region that indicates the physical location of the S3 bucket.
spec.privateEndpoints
Type: array
Optional
List that contains the private endpoint configurations for the federated database instance.
spec.privateEndpoints.endpointId
Type: string
Required
Unique 22-character alphanumeric string starting with vpce- that identifies the private endpoint in AWS.
spec.privateEndpoints.provider
Type: string
Optional
Human-readable label that identifies the cloud service provider. Atlas Data Federation supports only AWS.
spec.privateEndpoints.type
Type: string
Optional
Human-readable label that identifies the resource type associated with this private endpoint. Atlas Data Federation supports only DATA_LAKE.
spec.projectRef.name
Type: string
Required
Name of the project to which the federated database instance belongs. You must specify an existing AtlasProject Custom Resource.
spec.projectRef.namespace
Type: string
Required
Namespace in which the AtlasProject Custom Resource specified in spec.projectRef.name exists.
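As a sketch of how projectRef resolves, the two manifests below define an AtlasProject and an AtlasDataFederation resource that points to it by name and namespace. The project display name is a made-up placeholder, and the AtlasProject shown here is reduced to the fields needed for the reference; a real project resource may need further settings.

```yaml
# Hypothetical AtlasProject that the federated database instance belongs to.
apiVersion: atlas.mongodb.com/v1
kind: AtlasProject
metadata:
  name: my-project
  namespace: default
spec:
  # Display name of the project in Atlas (placeholder).
  name: My Atlas Project
---
apiVersion: atlas.mongodb.com/v1
kind: AtlasDataFederation
metadata:
  name: my-federated-deployment
spec:
  projectRef:
    # Must match metadata.name and metadata.namespace of the AtlasProject.
    name: my-project
    namespace: default
  # Other spec fields described on this page are omitted from this sketch.
```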