Generate Wildcard Collections
You can dynamically generate collection names that map to data in your Atlas Data Lake datasets. To dynamically generate collection names, specify the wildcard, *, as the value for the collection name setting in your federated database instance storage configuration.
You can use the storageSetConfig command, the Atlas UI, and the create collections command to configure the settings for generating wildcard (*) collections.
To generate wildcard collections for your Data Lake datasets, you must configure the following settings in your federated database instance storage configuration:
Specify * as the value for the databases.[n].collections.[n].name setting.
Specify the dataset prefix name as the value for databases.[n].collections.[n].dataSources.[n].datasetPrefix to map collections to the dataset names with the specified prefix.
You can optionally specify the following settings:
A trim level, set through the databases.[n].collections.[n].dataSources.[n].trimLevel option, that specifies the number of fields to trim from the left of the dataset name before mapping the remaining fields to a wildcard collection name.
The maximum number of datasets from which to dynamically generate collections for the data source, set through the databases.[n].collections.[n].dataSources.[n].maxDatasets parameter. Atlas Data Lake selects the datasets in reverse lexicographical order. To select only the latest dataset, set the value to 1.
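The interaction between the prefix, the trim level, and the dataset limit can be easier to follow with a small model. The following Python sketch is only an illustration, not how Atlas implements the feature. The function name preview_wildcard_collections, the $ field delimiter, the sample dataset names, and the choice to join the remaining fields with the same delimiter are all assumptions made for this example.

```python
def preview_wildcard_collections(dataset_names, dataset_prefix,
                                 trim_level=0, max_datasets=None,
                                 delimiter="$"):
    """Model which datasets a wildcard collection might map to.

    Hypothetical helper: the delimiter and the way remaining fields
    are joined into a collection name are assumptions, not documented
    Atlas behavior.
    """
    # Keep only the datasets whose names start with the prefix.
    matching = [n for n in dataset_names if n.startswith(dataset_prefix)]
    # Reverse lexicographical order, as described for maxDatasets.
    matching.sort(reverse=True)
    if max_datasets is not None:
        matching = matching[:max_datasets]

    mapping = {}
    for name in matching:
        fields = name.split(delimiter)
        # Drop trim_level fields from the left; join the rest (assumed).
        collection = delimiter.join(fields[trim_level:])
        mapping.setdefault(collection, []).append(name)
    return mapping


# Hypothetical dataset names, used only for illustration.
datasets = [
    "v1$atlas$snapshot$Cluster0$sales$orders$20230101",
    "v1$atlas$snapshot$Cluster0$sales$orders$20230201",
    "v1$atlas$snapshot$Cluster0$sales$users$20230101",
]

latest = preview_wildcard_collections(
    datasets,
    dataset_prefix="v1$atlas$snapshot$Cluster0$sales$orders",
    trim_level=5,
    max_datasets=1,
)
print(latest)
```

In this model, setting max_datasets to 1 keeps only the dataset whose name sorts last, which corresponds to the latest dataset when names end in a sortable timestamp.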
Example
"databases": [
  {
    "name": "<db-name>",
    "collections": [
      {
        "name": "*",
        "dataSources": [
          {
            "storeName": "<adl-store-name>",
            "datasetPrefix": "<dl-dataset-prefix>",
            "trimLevel": 5,
            "maxDatasets": 10
          }
        ]
      }
    ]
  }
]
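If you generate storage configurations from a script, a small helper can assemble the databases fragment shown in the example. The following Python sketch is one possible approach. The function name build_wildcard_databases and its parameters are hypothetical. Only the field names (name, collections, dataSources, storeName, datasetPrefix, trimLevel, maxDatasets) come from the settings described in this section. The sketch does not validate values or send anything to Atlas.

```python
import json


def build_wildcard_databases(db_name, store_name, dataset_prefix,
                             trim_level=None, max_datasets=None):
    """Build the "databases" fragment for a wildcard collection.

    Hypothetical helper for illustration only.
    """
    data_source = {
        "storeName": store_name,
        "datasetPrefix": dataset_prefix,
    }
    # trimLevel and maxDatasets are optional, so add them only when set.
    if trim_level is not None:
        data_source["trimLevel"] = trim_level
    if max_datasets is not None:
        data_source["maxDatasets"] = max_datasets

    return [
        {
            "name": db_name,
            "collections": [
                {
                    "name": "*",  # wildcard collection name
                    "dataSources": [data_source],
                }
            ],
        }
    ]


# Placeholder values mirror the example; replace them with your own.
databases = build_wildcard_databases(
    "<db-name>",
    "<adl-store-name>",
    "<dl-dataset-prefix>",
    trim_level=5,
    max_datasets=10,
)
print(json.dumps({"databases": databases}, indent=2))
```

The printed JSON has the same structure as the example, wrapped in an enclosing object so that it parses as a complete JSON document.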