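Language Analyzers

Use language-specific analyzers to create indexes tailored to a particular language. Each language analyzer has built-in stop words and word divisions based on that language's usage patterns.

MongoDB Search provides the following language analyzers: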

`lucene.arabic`	`lucene.armenian`	`lucene.basque`	`lucene.bengali`
`lucene.brazilian`	`lucene.bulgarian`	`lucene.catalan`	`lucene.chinese`
`lucene.cjk` ¹	`lucene.czech`	`lucene.danish`	`lucene.dutch`
`lucene.english`	`lucene.finnish`	`lucene.french`	`lucene.galician`
`lucene.german`	`lucene.greek`	`lucene.hindi`	`lucene.hungarian`
`lucene.indonesian`	`lucene.irish`	`lucene.italian`	`lucene.japanese`
`lucene.korean`	`lucene.kuromoji` ²	`lucene.latvian`	`lucene.lithuanian`
`lucene.morfologik` ³	`lucene.nori` ⁴	`lucene.norwegian`	`lucene.persian`
`lucene.polish`	`lucene.portuguese`	`lucene.romanian`	`lucene.russian`
`lucene.smartcn` ⁵	`lucene.sorani`	`lucene.spanish`	`lucene.swedish`
`lucene.thai`	`lucene.turkish`	`lucene.ukrainian`

¹ cjk is a generic Chinese, Japanese, and Korean analyzer.

² kuromoji is a Japanese analyzer.

³ morfologik is a Polish analyzer.

⁴ nori is a Korean analyzer.

⁵ smartcn is a Chinese analyzer.

Examples

Consider a collection named cars that contains the following documents:

{
  "_id": 1,
  "subject": {
    "en": "It is better to equip our cars to understand the causes of the accident.",
    "fr": "Mieux équiper nos voitures pour comprendre les causes d'un accident.",
    "he": "עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה."
  }
}

{
  "_id": 2,
  "subject": {
    "en": "The best time to do this is immediately after you've filled up with fuel",
    "fr": "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant.",
    "he": "הזמן הטוב ביותר לעשות זאת הוא מיד לאחר שמילאת דלק."
  }
}
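
To follow along, the two documents can be loaded from mongosh. The snippet below is a minimal sketch that assumes a mongosh session connected to a deployment where you are allowed to create the cars collection. The variable name carsDocs is illustrative, and the insert runs only where the mongosh db global exists.

```javascript
// Sample documents copied from the example above.
const carsDocs = [
  {
    _id: 1,
    subject: {
      en: "It is better to equip our cars to understand the causes of the accident.",
      fr: "Mieux équiper nos voitures pour comprendre les causes d'un accident.",
      he: "עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה."
    }
  },
  {
    _id: 2,
    subject: {
      en: "The best time to do this is immediately after you've filled up with fuel",
      fr: "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant.",
      he: "הזמן הטוב ביותר לעשות זאת הוא מיד לאחר שמילאת דלק."
    }
  }
];

// In mongosh, `db` is a global handle to the current database.
// Outside mongosh this block only defines the array.
if (typeof db !== "undefined") {
  db.cars.insertMany(carsDocs);
}
```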

Built-In Language Analyzer Example

The following example index definition specifies an index on the subject.fr field using the lucene.french analyzer:

{
  "mappings": {
    "fields": {
      "subject": {
        "fields": {
          "fr": {
            "analyzer": "lucene.french",
            "type": "string"
          }
        },
        "type": "document"
      }
    }
  }
}
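
One way to create this index is the mongosh helper db.collection.createSearchIndex(). The sketch below assumes the index is named default, so that a $search stage with no index option picks it up. The variable name frenchIndexDefinition is illustrative.

```javascript
// Same definition as above, written as a JavaScript object.
const frenchIndexDefinition = {
  mappings: {
    fields: {
      subject: {
        type: "document",
        fields: {
          fr: { type: "string", analyzer: "lucene.french" }
        }
      }
    }
  }
};

// Runs only inside mongosh, where `db` is defined.
// "default" is an assumed index name, not one stated in the text.
if (typeof db !== "undefined") {
  db.cars.createSearchIndex("default", frenchIndexDefinition);
}
```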

The following query searches for the string pour in the subject.fr field:

db.cars.aggregate([
  {
    $search: {
      "text": {
        "query": "pour",
        "path": "subject.fr"
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "subject.fr": 1
    }
  }
])

With the lucene.french analyzer, the preceding query returns no results because pour is a built-in stop word. With the lucene.standard analyzer, the same query returns both documents.
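
The effect of a stop word can be illustrated with a toy analyzer in plain JavaScript. This is only a sketch of the idea, not Lucene's implementation: the function name toyAnalyze is hypothetical, the stop-word set is a small assumed sample (only pour is confirmed by the text above), and the code does no stemming or elision handling.

```javascript
// Small sample stop-word list for illustration only.
// "pour" comes from the text; the other entries are assumptions.
const SAMPLE_FRENCH_STOP_WORDS = new Set(["pour", "le", "de", "que", "vous"]);

// Toy analyzer: lowercase, split on anything that is not a letter or digit,
// then drop empty strings and stop words.
function toyAnalyze(text, stopWords) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0 && !stopWords.has(token));
}

// With the stop list, the query "pour" produces no tokens, so nothing can match.
const pourTokens = toyAnalyze("pour", SAMPLE_FRENCH_STOP_WORDS);
// A term that is not a stop word survives analysis.
const carburantTokens = toyAnalyze("carburant", SAMPLE_FRENCH_STOP_WORDS);
// With no stop words (closer to a standard analyzer), "pour" is kept.
const pourWithoutStopWords = toyAnalyze("pour", new Set());

console.log(pourTokens);           // []
console.log(carburantTokens);      // [ 'carburant' ]
console.log(pourWithoutStopWords); // [ 'pour' ]
```

Because the query text is analyzed as well, a query made up only of stop words has no tokens left to match against the index.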

The following query searches for the string carburant in the subject.fr field:

db.cars.aggregate([
  {
    $search: {
      "text": {
        "query": "carburant",
        "path": "subject.fr"
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "subject.fr": 1
    }
  }
])

{ subject: { fr: "Le meilleur moment pour le faire c'est immédiatement après que vous aurez fait le plein de carburant." } }

MongoDB Search returns the document with _id: 2 in the results because the query matched a token that the lucene.french analyzer created for that document. The lucene.french analyzer creates the following tokens for the subject.fr field of the document with _id: 2:

`meileu`	`moment`	`fair`
`est`	`imediat`	`aprè`
`fait`	`plein`	`carburant`

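Custom Language Analyzer Example

You can also index languages that are not supported out of the box by creating a custom analyzer with the icuFolding and stopword token filters.

The following example index definition specifies an index on the subject.he field using a custom analyzer named myHebrewAnalyzer, which analyzes Hebrew text and creates tokens for it: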

{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": false,
    "fields": {
      "subject": {
        "fields": {
          "he": {
            "analyzer": "myHebrewAnalyzer",
            "type": "string"
          }
        },
        "type": "document"
      }
    }
  },
  "analyzers": [
    {
      "charFilters": [],
      "name": "myHebrewAnalyzer",
      "tokenFilters": [
        {
          "type": "icuFolding"
        },
        {
          "tokens": [
            "אן",
            "שלנו",
            "זה",
            "אל"
          ],
          "type": "stopword"
        }
      ],
      "tokenizer": {
        "type": "standard"
      }
    }
  ]
}

The following query searches for the string המכוניות in the subject.he field:

db.cars.aggregate([
  {
    $search: {
      "text": {
        "query": "המכוניות",
        "path": "subject.he"
      }
    }
  },
  {
    $project: {
      "_id": 0,
      "subject.he": 1
    }
  }
])

{ subject: { he: 'עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה.' } }

MongoDB Search returns the document with _id: 1 in the results because the query matched a token that the myHebrewAnalyzer analyzer created for that document. The myHebrewAnalyzer analyzer creates the following tokens for the subject.he field of the document with _id: 1:

`עדיף`	`לצייד`	`את`
`המכוניות`	`כדי`	`להבין`
`את`	`הגורמים`	`לתאונה`
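
The token list above can be approximated in plain JavaScript to show how the tokenizer and the two token filters combine. This is a rough sketch under stated assumptions: approximateFold is only a stand-in for icuFolding (which applies many more Unicode foldings), the regular-expression split is only a stand-in for the standard tokenizer, and both function names are hypothetical.

```javascript
// Stop words copied from the myHebrewAnalyzer definition.
const HEBREW_STOP_WORDS = new Set(["אן", "שלנו", "זה", "אל"]);

// Rough stand-in for icuFolding: decompose, strip combining marks, lowercase.
function approximateFold(token) {
  return token.normalize("NFKD").replace(/\p{M}+/gu, "").toLowerCase();
}

// Rough stand-in for the standard tokenizer followed by the token filters:
// split on non-letter/non-digit runs, fold each token, then drop stop words.
function approximateHebrewAnalyze(text) {
  return text
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0)
    .map(approximateFold)
    .filter((token) => !HEBREW_STOP_WORDS.has(token));
}

const hebrewTokens = approximateHebrewAnalyze(
  "עדיף לצייד את המכוניות שלנו כדי להבין את הגורמים לתאונה."
);

console.log(hebrewTokens.length);               // 9
console.log(hebrewTokens.includes("המכוניות")); // true
console.log(hebrewTokens.includes("שלנו"));     // false
```

The sentence has ten words. The stop word שלנו is dropped, which leaves the nine tokens listed above, including the query term המכוניות.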

Multilingual Search Example

You can also create an index that uses more than one language analyzer to support multilingual search.

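The following example index definition specifies an index with dynamic mappings on the sample_mflix.movies collection. The definition applies the lucene.italian language analyzer to index the fullplot field and uses the multi option to specify lucene.english as an alternate language analyzer. For all other fields in the movies collection that it indexes dynamically, MongoDB Search uses the index's default analyzer, lucene.standard.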

{
  "analyzer": "lucene.standard",
  "mappings": {
    "dynamic": true,
    "fields": {
      "fullplot": {
        "type": "string",
        "analyzer": "lucene.italian",
        "multi": {
          "fullplot_english": {
            "type": "string",
            "analyzer": "lucene.english"
          }
        }
      }
    }
  }
}
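
As a sketch, this index could be created from mongosh as follows. The index name multilingual-tutorial is taken from the index option of the query in this example. The variable name multilingualIndexDefinition is illustrative, and the call assumes the sample_mflix sample dataset is loaded.

```javascript
// Same definition as above, written as a JavaScript object.
const multilingualIndexDefinition = {
  analyzer: "lucene.standard", // default analyzer for dynamically indexed fields
  mappings: {
    dynamic: true,
    fields: {
      fullplot: {
        type: "string",
        analyzer: "lucene.italian", // primary analyzer for fullplot
        multi: {
          // alternate analyzer, addressed in queries via the multi path option
          fullplot_english: { type: "string", analyzer: "lucene.english" }
        }
      }
    }
  }
};

// Runs only inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  db.getSiblingDB("sample_mflix")
    .movies.createSearchIndex("multilingual-tutorial", multilingualIndexDefinition);
}
```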

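The following MongoDB Search query uses these compound operator clauses to query the collection:

The must clause uses the text operator to search for movie plots in English and Italian that contain the term Bella.
The mustNot clause uses the range operator to exclude movies released between 1984 and 2016.
The should clause uses the text operator to specify a preference for the Comedy genre.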

[
  {
    $search: {
      "index": "multilingual-tutorial",
      "compound": {
        "must": [{
          "text": {
            "query": "Bella",
            "path": { "value": "fullplot", "multi": "fullplot_english" }
          }
        }],
        "mustNot": [{
          "range": {
            "path": "released",
            "gt": ISODate("1984-01-01T00:00:00.000Z"),
            "lt": ISODate("2016-01-01T00:00:00.000Z")
          }
        }],
        "should": [{
          "text": {
            "query": "Comedy",
            "path": "genres"
          }
        }]
      }
    }
  }
]

SCORE: 3.909510850906372  _id: "573a1397f29313caabce8bad"
  plot: "He is a revenge-obssessed stevedore whose sister was brutally raped an…"
  genres:
    0: "Drama"
  runtime: 137
  fullplot: "In Marseilles, a woman commits suicide after she is raped in an alley.…"
  released: 1983-05-18T00:00:00.000+00:00
SCORE: 3.4253346920013428  _id: "573a1396f29313caabce5735"
  plot: "Giovanna e' una bella ragazza, ma ha qualche problema con gli uomini: …"
  genres:
    0: "Comedy"
  runtime: 100
  fullplot: "Giovanna e' una bella ragazza, ma ha qualche problema con gli uomini: …"
  released: 1974-11-15T00:00:00.000+00:00
SCORE: 3.363344430923462  _id: "573a1395f29313caabce13cf"
  plot: "Gerardo è un attore o almeno cerca di esserlo, ma il pubblico non è de…"
  genres:
    0: "Comedy"
  runtime: 95
  fullplot: "Gerardo è un attore o almeno cerca di esserlo, ma il pubblico non è de…"
  released: 1960-02-10T00:00:00.000+00:00
SCORE: 1.9502882957458496  _id: "573a1396f29313caabce5299"
  plot: "Dr Tremayne is an enigmatic Psychiatrist running a Futuristic asylum h…"
  genres:
    0: "Horror"
  runtime: 90
  fullplot: "Dr Tremayne is an enigmatic Psychiatrist running a Futuristic asylum h…"
  released: 1973-10-31T00:00:00.000+00:00
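
To see scores like those in the output above from mongosh, the $search stage can be followed by a $project stage that adds the search score. This is a sketch, not the exact pipeline that produced the output: the $limit value and the projected field list are assumptions, the name bellaPipeline is illustrative, and new Date(...) is used in place of ISODate(...) so the pipeline object can also be built outside mongosh.

```javascript
const bellaPipeline = [
  {
    $search: {
      index: "multilingual-tutorial",
      compound: {
        // Required: the term "Bella" in the fullplot_english multi field.
        must: [{
          text: {
            query: "Bella",
            path: { value: "fullplot", multi: "fullplot_english" }
          }
        }],
        // Excluded: release dates strictly between the two bounds.
        mustNot: [{
          range: {
            path: "released",
            gt: new Date("1984-01-01T00:00:00.000Z"),
            lt: new Date("2016-01-01T00:00:00.000Z")
          }
        }],
        // Preferred: matching documents score higher, but it is not required.
        should: [{
          text: { query: "Comedy", path: "genres" }
        }]
      }
    }
  },
  { $limit: 4 }, // assumed limit, chosen to keep the output short
  {
    $project: {
      plot: 1,
      genres: 1,
      runtime: 1,
      fullplot: 1,
      released: 1,
      score: { $meta: "searchScore" } // relevance score computed by $search
    }
  }
];

// Runs only inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  db.getSiblingDB("sample_mflix").movies.aggregate(bellaPipeline).forEach(printjson);
}
```

Results are returned in descending score order. A should match raises the score, but it does not outrank every document that lacks one: in the output above, the top result is a Drama.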
