Language Analyzers

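Use a language-specific analyzer to create an index tailored to a particular language. Each language analyzer has built-in stopwords and word divisions based on that language's usage patterns.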

MongoDB Search provides the following language analyzers:

`lucene.arabic`	`lucene.armenian`	`lucene.basque`	`lucene.bengali`
`lucene.brazilian`	`lucene.bulgarian`	`lucene.catalan`	`lucene.chinese`
`lucene.cjk` ¹	`lucene.czech`	`lucene.danish`	`lucene.dutch`
`lucene.english`	`lucene.finnish`	`lucene.french`	`lucene.galician`
`lucene.german`	`lucene.greek`	`lucene.hindi`	`lucene.hungarian`
`lucene.indonesian`	`lucene.irish`	`lucene.italian`	`lucene.japanese`
`lucene.korean`	`lucene.kuromoji` ²	`lucene.latvian`	`lucene.lithuanian`
`lucene.morfologik` ³	`lucene.nori` ⁴	`lucene.norwegian`	`lucene.persian`
`lucene.polish`	`lucene.portuguese`	`lucene.romanian`	`lucene.russian`
`lucene.smartcn` ⁵	`lucene.sorani`	`lucene.spanish`	`lucene.swedish`
`lucene.thai`	`lucene.turkish`	`lucene.ukrainian`

¹ cjk is a generic Chinese, Japanese, and Korean analyzer

² kuromoji is a Japanese analyzer

³ morfologik is a Polish analyzer

⁴ nori is a Korean analyzer

⁵ smartcn is a Chinese analyzer

Examples

Consider a collection named cars that contains the following documents:

{
  "_id": 1,
  "subject": {
    "en": "It is better to equip our cars to understand the causes of the accident.",
    "fr": "Mieux équiper nos voitures pour comprendre les causes d'un accident.",
    "he": "עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה."
  }
}

{
  "_id": 2,
  "subject": {
    "en": "The best time to do this is immediately after you've filled up with fuel",
    "fr": "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant.",
    "he": "הזמן הטוב ביותר לעשות זאת הוא מיד לאחר שמילאת דלק."
  }
}

Built-In Language Analyzer Example

The following example index definition specifies an index on the subject.fr field using the lucene.french analyzer:

{
  "mappings": {
    "fields": {
      "subject": {
        "fields": {
          "fr": {
            "analyzer": "lucene.french",
            "type": "string"
          }
        },
        "type": "document"
      }
    }
  }
}

The following query searches the subject.fr field for the string pour:

db.cars.aggregate([
  {
    $search: {
      "text": {
        "query": "pour",
        "path": "subject.fr"
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "subject.fr": 1
    }
  }
])

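With the lucene.french analyzer, the previous query returns no results because pour is a built-in stopword. With the lucene.standard analyzer, the same query returns both documents.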
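To see why a stopword removes the match, here is a minimal JavaScript sketch that models an analyzer as tokenize, lowercase, then drop stopwords. The `FRENCH_STOPWORDS` set is a small hypothetical subset chosen for illustration (not Lucene's actual French list), and the real lucene.french analyzer also applies elision handling and stemming, which this sketch omits. The point it shows: once pour is removed from the query, there is no token left to match.

```javascript
// Toy analyzer: split into runs of letters/digits, lowercase, drop stopwords.
// Assumption: FRENCH_STOPWORDS is a small illustrative subset, not Lucene's real list.
const FRENCH_STOPWORDS = new Set(["pour", "le", "les", "de", "un"]);

function analyze(text, stopwords = new Set()) {
  const tokens = text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
  return tokens.filter((t) => !stopwords.has(t));
}

// A text query matches a document if any analyzed query token
// appears among the document's analyzed tokens.
function matches(query, doc, stopwords) {
  const docTokens = new Set(analyze(doc, stopwords));
  return analyze(query, stopwords).some((t) => docTokens.has(t));
}

const docs = [
  "Mieux équiper nos voitures pour comprendre les causes d'un accident.",
  "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant.",
];

// French-like analyzer: the query "pour" is reduced to zero tokens, so nothing matches.
const frenchHits = docs.filter((d) => matches("pour", d, FRENCH_STOPWORDS));
// Standard-like analyzer without stopwords: both documents contain "pour".
const standardHits = docs.filter((d) => matches("pour", d));

console.log(analyze("pour", FRENCH_STOPWORDS)); // []
console.log(frenchHits.length, standardHits.length); // 0 2
```

The same filter is applied at index time and at query time, which is why the stopword disappears from both sides.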

The following query searches the subject.fr field for the string carburant:

db.cars.aggregate([
  {
    $search: {
      "text": {
        "query": "carburant",
        "path": "subject.fr"
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "subject.fr": 1
    }
  }
])

{ subject: { fr: "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." } }

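MongoDB Search returns the document with _id: 2 in the results because the query matches the tokens that the lucene.french analyzer created for that document. The lucene.french analyzer creates the following tokens for the subject.fr field in the document with _id: 2: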

`meileu`	`moment`	`fair`
`est`	`imediat`	`aprè`
`fait`	`plein`	`carburant`
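Matching happens against these stored tokens, not against the raw text. The sketch below builds a tiny inverted index (token to document ids) from the token list above for the document containing carburant, then looks the query token up. `buildInvertedIndex` and `lookup` are hypothetical helpers, and the sketch assumes lucene.french reduces the query carburant to the single token carburant, as the stored token suggests. The tokens for the other document are not listed on this page, so only one document is indexed.

```javascript
// Hypothetical helper: map each token to the set of document ids containing it.
function buildInvertedIndex(tokensById) {
  const index = new Map();
  for (const [id, tokens] of Object.entries(tokensById)) {
    for (const token of tokens) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(Number(id));
    }
  }
  return index;
}

// Hypothetical helper: return the ids of documents containing any query token.
function lookup(index, queryTokens) {
  const hits = new Set();
  for (const token of queryTokens) {
    for (const id of index.get(token) ?? []) hits.add(id);
  }
  return [...hits];
}

// Tokens listed above for the document whose subject.fr mentions "carburant" (_id: 2).
const index = buildInvertedIndex({
  2: ["meileu", "moment", "fair", "est", "imediat", "aprè", "fait", "plein", "carburant"],
});

console.log(lookup(index, ["carburant"])); // [ 2 ]
// The unanalyzed word is not stored, only its analyzed form "meileu",
// which is why queries must pass through the same analyzer.
console.log(lookup(index, ["meilleur"])); // []
```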

Custom Language Analyzer Example

You can also create a custom analyzer with the icuFolding and stopword token filters to index text in languages that have no built-in analyzer.

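The following example index definition specifies an index on the subject.he field using a custom analyzer named myHebrewAnalyzer, which analyzes Hebrew text and creates tokens suited to it: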

{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "subject": {
        "fields": {
          "he": {
            "analyzer": "myHebrewAnalyzer",
            "type": "string"
          }
        },
        "type": "document"
      }
    }
  },
  "analyzers": [
    {
      "charFilters": [],
      "name": "myHebrewAnalyzer",
      "tokenFilters": [
        {
          "type": "icuFolding"
        },
        {
          "tokens": [
            "אן",
            "שלנו",
            "זה",
            "אל"
          ],
          "type": "stopword"
        }
      ],
      "tokenizer": {
        "type": "standard"
      }
    }
  ]
}

The following query searches the subject.he field for the string המכוניות:

db.cars.aggregate([
  {
    $search: {
      "text": {
        "query": "המכוניות",
        "path": "subject.he"
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "subject.he": 1
    }
  }
])

{ subject: { he: 'עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה.' } }

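MongoDB Search returns the document with _id: 1 in the results because the query matches the tokens that the myHebrewAnalyzer analyzer created for that document. The myHebrewAnalyzer analyzer creates the following tokens for the subject.he field in the document with _id: 1: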

`עדיף`	`לצייד`	`את`
`המכוניות`	`כדי`	`להבין`
`את`	`הגורמים`	`לתאונה`
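The following JavaScript sketch approximates myHebrewAnalyzer under stated assumptions: a Unicode letter/digit regex stands in for the standard tokenizer, NFKD normalization plus removal of combining marks and lowercasing stands in for icuFolding (real ICU folding applies a broader set of Unicode foldings), and the stopword list is copied from the index definition. With those simplifications it reproduces the token list above, with שלנו removed as a stopword.

```javascript
// Stopwords copied from the myHebrewAnalyzer index definition.
const HEBREW_STOPWORDS = new Set(["אן", "שלנו", "זה", "אל"]);

// Rough stand-in for the standard tokenizer: runs of Unicode letters and digits.
function tokenize(text) {
  return text.match(/[\p{L}\p{N}]+/gu) ?? [];
}

// Rough stand-in for icuFolding: decompose, drop combining marks, lowercase.
function fold(token) {
  return token.normalize("NFKD").replace(/\p{M}/gu, "").toLowerCase();
}

// Same filter order as the index definition: icuFolding, then stopword.
function myHebrewAnalyzer(text) {
  return tokenize(text)
    .map(fold)
    .filter((token) => !HEBREW_STOPWORDS.has(token));
}

const docTokens = myHebrewAnalyzer("עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה.");
console.log(docTokens.join(" "));

// The query goes through the same analyzer, then must match a stored token.
const queryTokens = myHebrewAnalyzer("המכוניות");
console.log(queryTokens.every((t) => docTokens.includes(t))); // true
```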

Multilingual Search Example

You can also create an index that uses multiple language analyzers to perform multilingual searches.

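The following example index definition specifies an index with dynamic mappings on the sample_mflix.movies collection. The definition applies the lucene.italian language analyzer to index the fullplot field and uses the multi option to specify lucene.english as an alternate language analyzer. MongoDB Search uses the default lucene.standard analyzer for all other fields that it dynamically indexes in the movies collection.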

{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": true,
    "fields": {
      "fullplot": {
        "type": "string",
        "analyzer": "lucene.italian",
        "multi": {
          "fullplot_english": {
            "type": "string",
            "analyzer": "lucene.english"
          }
        }
      }
    }
  }
}
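If several fields need the same primary-plus-alternate pattern, the mapping can be generated rather than hand-written. A minimal JavaScript sketch, where `multiAnalyzerField` is a hypothetical helper (not part of any MongoDB driver) that returns a string field mapping with one multi entry per alternate analyzer; it rebuilds the fullplot definition above.

```javascript
// Hypothetical helper: a string field indexed with `primary`, plus one
// alternate index per entry in `alternates` (multi name -> analyzer).
function multiAnalyzerField(primary, alternates) {
  const multi = {};
  for (const [name, analyzer] of Object.entries(alternates)) {
    multi[name] = { type: "string", analyzer };
  }
  return { type: "string", analyzer: primary, multi };
}

const indexDefinition = {
  analyzer: "lucene.standard",
  mappings: {
    dynamic: true,
    fields: {
      fullplot: multiAnalyzerField("lucene.italian", {
        fullplot_english: "lucene.english",
      }),
    },
  },
};

// Print the JSON so it can be pasted into an index definition.
console.log(JSON.stringify(indexDefinition, null, 2));
```

In recent mongosh versions the same object can likely be passed as the definition argument of db.movies.createSearchIndex(); check your deployment's support before relying on that.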

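The following MongoDB Search query uses these compound operator clauses to query the collection:

The must clause uses the text operator to search English and Italian movie plots for the term Bella
The mustNot clause uses the range operator to exclude movies released between 1984 and 2016
The should clause uses the text operator to specify a preference for the Comedy genre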

[
  {
    $search: {
      "index": "multilingual-tutorial",
      "compound": {
        "must": [{
          "text": {
            "query": "Bella",
            "path": { "value": "fullplot", "multi": "fullplot_english" }
          }
        }],
        "mustNot": [{
          "range": {
            "path": "released",
            "gt": ISODate("1984-01-01T00:00:00.000Z"),
            "lt": ISODate("2016-01-01T00:00:00.000Z")
          }
        }],
        "should": [{
          "text": {
            "query": "Comedy",
            "path": "genres"
          }
        }]
      }
    }
  }
]

SCORE: 3.909510850906372  _id: "573a1397f29313caabce8bad"
  plot: "He is a revenge-obssessed stevedore whose sister was brutally raped an…"
  genres:
    0: "Drama"
  runtime: 137
  fullplot: "In Marseilles, a woman commits suicide after she is raped in an alley.…"
  released: 1983-05-18T00:00:00.000+00:00
SCORE: 3.4253346920013428  _id: "573a1396f29313caabce5735"
  plot: "Giovanna e' una bella ragazza, ma ha qualche problema con gli uomini: …"
  genres:
    0: "Comedy"
  runtime: 100
  fullplot: "Giovanna e' una bella ragazza, ma ha qualche problema con gli uomini: …"
  released: 1974-11-15T00:00:00.000+00:00
SCORE: 3.363344430923462  _id: "573a1395f29313caabce13cf"
  plot: "Gerardo è un attore o almeno cerca di esserlo, ma il pubblico non è de…"
  genres:
    0: "Comedy"
  runtime: 95
  fullplot: "Gerardo è un attore o almeno cerca di esserlo, ma il pubblico non è de…"
  released: 1960-02-10T00:00:00.000+00:00
SCORE: 1.9502882957458496  _id: "573a1396f29313caabce5299"
  plot: "Dr Tremayne is an enigmatic Psychiatrist running a Futuristic asylum h…"
  genres:
    0: "Horror"
  runtime: 90
  fullplot: "Dr Tremayne is an enigmatic Psychiatrist running a Futuristic asylum h…"
  released: 1973-10-31T00:00:00.000+00:00
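The compound clauses combine as follows: every must clause has to match, no mustNot clause may match, and should clauses are optional here but raise the score of documents that satisfy them, which is why a Drama title can still appear in the results. The JavaScript sketch below models only that boolean logic on hypothetical in-memory documents; its scoring (one point per matching must or should clause) is a deliberate simplification, not MongoDB Search's relevance scoring.

```javascript
// Toy model of compound: must = all required, mustNot = all forbidden,
// should = optional, each matching clause adds one point. Not real relevance scoring.
function compound({ must = [], mustNot = [], should = [] }, docs) {
  return docs
    .filter((doc) => must.every((clause) => clause(doc)))
    .filter((doc) => !mustNot.some((clause) => clause(doc)))
    .map((doc) => ({
      ...doc,
      score: must.length + should.filter((clause) => clause(doc)).length,
    }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical documents for illustration only.
const movies = [
  { title: "A", plot: "una bella ragazza", genres: ["Comedy"], released: new Date("1974-11-15") },
  { title: "B", plot: "bella vista", genres: ["Drama"], released: new Date("1983-05-18") },
  { title: "C", plot: "bella notte", genres: ["Comedy"], released: new Date("1999-01-01") },
  { title: "D", plot: "no match here", genres: ["Comedy"], released: new Date("1970-01-01") },
];

const rangeStart = new Date("1984-01-01");
const rangeEnd = new Date("2016-01-01");

const results = compound(
  {
    must: [(m) => /\bbella\b/i.test(m.plot)],                             // text: "Bella"
    mustNot: [(m) => m.released > rangeStart && m.released < rangeEnd],   // range: gt/lt are exclusive
    should: [(m) => m.genres.includes("Comedy")],                         // text: "Comedy"
  },
  movies,
);

console.log(results.map((m) => `${m.title}:${m.score}`)); // [ 'A:2', 'B:1' ]
```

Here C is dropped by mustNot, D fails must, and B (Drama) survives with a lower score than the Comedy title A.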
