Google Cloud Platform Storage Bucket

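Atlas Data Federation supports GCP storage buckets as federated database instance stores. You must define mappings in your federated database instance to your Cloud Storage bucket before you can run queries against your data.

Note

On this page, we refer to objects as files and to delimiter-separated prefixes as directories. However, these object storage services are not actually file systems and do not behave in all cases like files on a hard drive.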

Example Configuration for a Google Cloud Platform Storage Bucket

Consider a Google Cloud Platform storage bucket datacenter-alpha that contains data collected from a datacenter:

|--metrics
  |--hardware

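The /metrics/hardware path stores JSON files with metrics derived from the datacenter hardware. Each filename is the UNIX timestamp, in milliseconds, of the 24-hour period that the file covers: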

/hardware/1564671291998.json

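The following configuration:

Defines a federated database instance store on the datacenter-alpha Google Cloud Platform storage bucket in the us-central1 Google Cloud Platform region. The federated database instance store is explicitly restricted to data files in the metrics directory path. It defines the delimiter / to simulate a file system hierarchy for easy navigation and retrieval.
Maps files from the hardware directory to the MongoDB database datacenter-alpha-metrics and the collection hardware. The configuration mapping includes parsing logic that captures the timestamp implied in the filename.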

{
  "stores" : [
    {
      "name" : "datacenter-alpha",
      "provider" : "gcs",
      "region" : "us-central1",
      "bucket" : "datacenter-alpha",
      "prefix": "metrics",
      "delimiter": "/"
    }
  ],
  "databases" : [
    {
      "name" : "datacenter-alpha-metrics",
      "collections" : [
        {
          "name" : "hardware",
          "dataSources" : [
            {
              "storeName" : "datacenter-alpha",
              "path" : "/hardware/{date date}"
            }
          ]
        }
      ]
    }
  ]
}

Atlas Data Federation parses the Google Cloud Platform storage bucket datacenter-alpha and processes all files under /metrics/hardware/. The collections object uses the path parsing syntax to map the filename to the date field, which is an ISO-8601 date, in each document. If a matching date field does not exist in a document, Atlas Data Federation adds it.
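
To make the relationship between the filename and the date value concrete, the following Python sketch converts the millisecond timestamp in the example filename to an ISO-8601 string. It only illustrates the arithmetic. The helper name filename_to_iso_date is hypothetical and is not part of any Atlas API; Atlas Data Federation does its own parsing based on the path syntax.

```python
from datetime import datetime, timedelta, timezone
from pathlib import PurePosixPath

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def filename_to_iso_date(path: str) -> str:
    """Convert a filename such as '1564671291998.json' to an ISO-8601 UTC string.

    Hypothetical helper for illustration only.
    """
    millis = int(PurePosixPath(path).stem)  # '1564671291998' -> 1564671291998
    # Adding integer milliseconds to the epoch avoids floating-point rounding.
    moment = EPOCH + timedelta(milliseconds=millis)
    return moment.isoformat(timespec="milliseconds")


print(filename_to_iso_date("/hardware/1564671291998.json"))
# 2019-08-01T14:54:51.998+00:00
```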

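Users who connect to the federated database instance can use the MongoDB Query Language and supported aggregations to analyze the data in the GCP storage bucket through the datacenter-alpha-metrics.hardware collection.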
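
As a rough illustration of such a query, the sketch below builds an aggregation pipeline that counts the documents for one day. It relies only on the date field from the example configuration. The helper name build_daily_count_pipeline is hypothetical, and the sketch assumes that $match and $count are among the aggregation stages your federated database instance supports.

```python
from datetime import datetime, timezone


def build_daily_count_pipeline(start: datetime, end: datetime) -> list:
    """Build a pipeline that counts documents whose `date` is in [start, end).

    Hypothetical helper; `date` is the only field taken from the example.
    """
    return [
        {"$match": {"date": {"$gte": start, "$lt": end}}},
        {"$count": "documents"},
    ]


pipeline = build_daily_count_pipeline(
    datetime(2019, 8, 1, tzinfo=timezone.utc),
    datetime(2019, 8, 2, tzinfo=timezone.utc),
)
print(pipeline)
```

With PyMongo, you could pass this list to the aggregate method of the hardware collection in the datacenter-alpha-metrics database, using a MongoClient created from the connection string of your federated database instance (not shown here).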

Configuration Format

To support Atlas Data Federation on Google Cloud Platform, the federated database instance configuration has the following prototypical form:

{
  "stores" : [
    {
      "name" : "<string>",
      "provider" : "<string>",
      "region" : "<string>",
      "bucket" : "<string>",
      "prefix": "<string>",
      "delimiter": "<string>"
    }
  ],
  "databases" : [
    {
      "name" : "<string>",
      "collections" : [
        {
          "name" : "<string>",
          "dataSources" : [
            {
              "storeName" : "<string>",
              "path" : "<string>",
              "defaultFormat" : "<string>",
              "provenanceFieldName": "<string>",
              "omitAttributes": <boolean>
            }
          ]
        }
      ],
      "maxWildcardCollections" : <integer>,
      "views" : [
        {
          "name" : "<string>",
          "source" : "<string>",
          "pipeline" : "<string>"
        }
      ]
    }
  ]
}

Field

Type

Necessity

Description

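stores

array

Required

Array of objects where each object represents a data store to associate with the federated database instance. The federated database instance store captures:

Files in a Google Cloud Platform storage bucket
Documents in an Atlas cluster
Files stored at publicly accessible URLs

Atlas Data Federation can only access data stores defined in the stores object.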

stores.[n].name

string

Required

Name of the federated database instance store. The databases.[n].collections.[n].dataSources.[n].storeName field references this value as part of the mapping configuration.

stores.[n].provider

string

Required

Name of the cloud provider where the data is stored. For a Google Cloud Platform storage bucket, the value must be gcs.

stores.[n].region

string

Required

Name of the Google Cloud Platform region that hosts the Google Cloud Platform storage bucket. For a list of valid region names, see Google Cloud Platform (GCP).

stores.[n].bucket

string

Required

Name of the Google Cloud Platform storage bucket. The value must exactly match the name of a Google Cloud Platform storage bucket that Atlas Data Federation has access to.

stores.[n].prefix

string

Optional

Prefix that Atlas Data Federation applies when it searches for files in the Google Cloud Platform Storage bucket. For example, consider a Google Cloud Platform Storage bucket metrics with the following structure:

metrics
  |--hardware
  |--software
    |--computed

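The federated database instance store prepends the value of prefix to the databases.[n].collections.[n].dataSources.[n].path to create the full path of the files to ingest. Setting prefix to /software restricts any databases objects that use the federated database instance store to subpaths of /software only.

If omitted, defaults to the root of the Google Cloud Platform Storage bucket, which retrieves all files.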
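
The way a prefix and a path combine can be sketched in a few lines of Python. This is an illustration only: the function name full_search_path and the way it normalizes leading and trailing delimiters are assumptions, not a description of how Atlas Data Federation is implemented.

```python
def full_search_path(prefix: str, path: str, delimiter: str = "/") -> str:
    """Join a store prefix and a dataSources path into one search path.

    Hypothetical helper; the delimiter normalization is an assumption.
    """
    # Drop leading and trailing delimiters, then skip empty pieces so that
    # an omitted prefix ("") or a bare "/" path does not add empty segments.
    parts = [p for p in (prefix.strip(delimiter), path.strip(delimiter)) if p]
    return delimiter + delimiter.join(parts)


print(full_search_path("/software", "/computed"))  # /software/computed
print(full_search_path("", "/hardware"))           # /hardware
print(full_search_path("metrics", "/"))            # /metrics
```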

stores.[n].delimiter

string

Optional

Delimiter that separates databases.[n].collections.[n].dataSources.[n].path segments in the federated database instance store. Atlas Data Federation uses the delimiter to efficiently traverse Google Cloud Platform storage buckets through a simulated hierarchical directory structure.

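databases

array

Required

Array of objects where each object represents a database, its collections, and, optionally, any views on the collections. Each database can have multiple collections and views objects.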

databases.[n].name

string

Required

Name of the database to which Atlas Data Federation maps the data contained in the data store.

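databases.[n].collections

array

Required

Array of objects where each object represents a collection and the data sources that map to a stores federated database instance store.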

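databases.[n].collections.[n].name

string

Required

Name of the collection to which Atlas Data Federation maps the data contained in each databases.[n].collections.[n].dataSources.[n].storeName. Each object in the array represents the mapping between the collection and an object in the stores array.

You can generate collection names dynamically from file paths by specifying * for the collection name and the collectionName() function in the path field. For examples, see Generate Dynamic Collection Names from File Path.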

databases.[n].collections.[n].dataSources

array

Required

Array of objects where each object represents a stores federated database instance store to map with the collection.

databases.[n].collections.[n].dataSources.[n].storeName

string

Required

Name of a federated database instance store to map to the <collection>. Must match the name of an object in the stores array.
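
Because every storeName must match a stores.[n].name, a quick check before you save a configuration can catch typos. The following Python sketch is one possible way to do this. The function unknown_store_names and the datacenter-beta store name are hypothetical and serve only as an illustration.

```python
def unknown_store_names(config: dict) -> list:
    """Return (database, collection, storeName) triples whose storeName
    does not match any stores[n].name. Hypothetical checker."""
    known = {store["name"] for store in config.get("stores", [])}
    problems = []
    for db in config.get("databases", []):
        for coll in db.get("collections", []):
            for source in coll.get("dataSources", []):
                if source.get("storeName") not in known:
                    problems.append(
                        (db.get("name"), coll.get("name"), source.get("storeName"))
                    )
    return problems


config = {
    "stores": [{"name": "datacenter-alpha", "provider": "gcs",
                "region": "us-central1", "bucket": "datacenter-alpha"}],
    "databases": [{
        "name": "datacenter-alpha-metrics",
        "collections": [{
            "name": "hardware",
            "dataSources": [
                {"storeName": "datacenter-alpha", "path": "/hardware/{date date}"},
                # Deliberately wrong store name to show what the check reports.
                {"storeName": "datacenter-beta", "path": "/hardware/*"},
            ],
        }],
    }],
}

print(unknown_store_names(config))
# [('datacenter-alpha-metrics', 'hardware', 'datacenter-beta')]
```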

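databases.[n].collections.[n].dataSources.[n].path

string

Required

Controls how Atlas Data Federation searches for and parses files in the databases.[n].collections.[n].dataSources.[n].storeName before it maps them to the <collection>. The federated database instance prepends the stores.[n].prefix to the path to build the full path to search within. Specify / to capture all files and folders from the prefix path.

For example, consider a Google Cloud Platform Storage bucket named metrics with the following structure: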

metrics
|--hardware
|--software
  |--computed

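A path of / directs Atlas Data Federation to search all files and folders in the metrics bucket.

A path of /hardware directs Atlas Data Federation to search only that path for files to ingest.

If the stores.[n].prefix is software, Atlas Data Federation searches for files only in the path /software/computed.

Appending the * wildcard character to the path directs Atlas Data Federation to include all files and folders from that point in the path. For example, /software/computed* matches files such as /software/computed-detailed, /software/computedArchive, and /software/computed/errors.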
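
You can approximate the trailing-wildcard behavior with the fnmatch module from the Python standard library, whose * also matches across / characters. This is only a stand-in for illustration: other pattern characters that fnmatch supports, such as ? and [], are not described on this page and may not behave the same way in Atlas Data Federation. The /software/raw path is a made-up counterexample.

```python
from fnmatch import fnmatchcase

pattern = "/software/computed*"
candidates = [
    "/software/computed-detailed",
    "/software/computedArchive",
    "/software/computed/errors",
    "/software/raw",  # hypothetical path that should not match
]

# fnmatchcase compares case-sensitively; its * matches any characters,
# including the / delimiter.
matches = [c for c in candidates if fnmatchcase(c, pattern)]
print(matches)
# ['/software/computed-detailed', '/software/computedArchive', '/software/computed/errors']
```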

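databases.[n].collections.[n].dataSources.[n].path supports additional syntax for parsing filenames, including:

Generating document fields from filenames.
Using regular expressions to control field generation.
Setting boundaries for bucketing filenames by timestamp.

To learn more, see Define Path for S3 Data.

When you specify the path:

Specify the data type for the partition attribute.
Ensure that the partition attribute type matches the data type to parse.
Use the delimiter specified in delimiter.

When you specify attributes of the same type, do any of the following:

Add a constant separator between the attributes.
Use regular expressions to describe the search pattern. To learn more, see Unsupported Parsing Functions.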

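databases.[n].collections.[n].dataSources.[n].defaultFormat

string

Optional

Default format that Data Federation assumes if it encounters a file without an extension while searching the databases.[n].collections.[n].dataSources.[n].storeName.

The following values are valid for the defaultFormat field:

.json, .json.gz, .bson, .bson.gz, .avro, .avro.gz, .orc, .tsv, .tsv.gz, .csv, .csv.gz, .parquet

If your file format is CSV or TSV, you must include a header row in your data. For more information, see CSV and TSV.

If omitted, Data Federation attempts to detect the file type by processing a few bytes of the file.

See also: Supported Data Formats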
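
The fallback order described for defaultFormat can be sketched as follows. The function resolve_format is hypothetical. It returns None to stand for "detect the type from the first bytes of the file", and it treats files with an unrecognized extension like files without an extension, which is an assumption that this page does not confirm.

```python
VALID_FORMATS = (
    ".json", ".json.gz", ".bson", ".bson.gz", ".avro", ".avro.gz",
    ".orc", ".tsv", ".tsv.gz", ".csv", ".csv.gz", ".parquet",
)


def resolve_format(filename, default_format=None):
    """Pick a format from the extension, else fall back to default_format.

    Hypothetical helper; None means "detect from the file contents".
    """
    if default_format is not None and default_format not in VALID_FORMATS:
        raise ValueError(f"invalid defaultFormat: {default_format}")
    name = filename.rsplit("/", 1)[-1]  # keep only the last path segment
    for ext in VALID_FORMATS:
        if name.endswith(ext):
            return ext
    return default_format


print(resolve_format("/hardware/1564671291998.json"))  # .json
print(resolve_format("/hardware/report", ".csv"))      # .csv
print(resolve_format("/hardware/report"))              # None
```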

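databases.[n].collections.[n].dataSources.[n].provenanceFieldName

string

Optional

Name of the field that contains the provenance of the documents in the results. If you specify this setting in the storage configuration, Atlas Data Federation returns the following fields for each document in the result:

Field Name	Description
`provider`	Provider (`stores.[n].provider`) in the federated database instance storage configuration
`region`	Google Cloud Platform region (`stores.[n].region`)
`bucket`	Name of the Google Cloud Platform storage bucket (`stores.[n].bucket`)
`key`	Path to the document (`databases.[n].collections.[n].dataSources.[n].path`)
`lastModified`	Date and time the document was last modified.

You cannot configure this setting using the Visual Editor in the Atlas UI.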

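databases.[n].collections.[n].dataSources.[n].omitAttributes

boolean

Optional

Flag that specifies whether to omit the attributes (key and value pairs) that Atlas Data Federation adds to documents in the collection. You can specify one of the following values:

false - add the attributes
true - omit the attributes

If omitted, defaults to false and Atlas Data Federation adds the attributes.

For example, consider a file named /employees/949-555-0195.json for which you configure the databases.[n].collections.[n].dataSources.[n].path /employees/{phone string}. If omitAttributes is false, Atlas Data Federation adds the attribute phone: 949-555-0195 to the documents in this file, regardless of whether the key-value pair already exists in the document. If you set omitAttributes to true, Atlas Data Federation does not add the attribute to the documents in the virtual collection.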
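
The effect of omitAttributes on the employees example can be simulated in Python. The function apply_path_attributes, the regular expression, and the sample document are hypothetical. The sketch overwrites an existing phone key, which is an assumption: the text above only states that the attribute is added whether or not the pair already exists.

```python
import re


def apply_path_attributes(file_path, document, omit_attributes=False):
    """Simulate the path /employees/{phone string} for a single document.

    Hypothetical helper; not how Atlas Data Federation is implemented.
    """
    match = re.fullmatch(r"/employees/(?P<phone>[^/]+)\.json", file_path)
    if match is None or omit_attributes:
        return dict(document)  # unchanged copy: no attribute is added
    # Assumption: the value parsed from the path replaces any existing key.
    return {**document, "phone": match.group("phone")}


doc = {"name": "Example Employee"}
print(apply_path_attributes("/employees/949-555-0195.json", doc))
# {'name': 'Example Employee', 'phone': '949-555-0195'}
print(apply_path_attributes("/employees/949-555-0195.json", doc, omit_attributes=True))
# {'name': 'Example Employee'}
```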
