Atlas Data Federation und Online-Archiv können jetzt in Azure bereitgestellt werden

Karthic Subramanian and Sabina Friden
February 29, 2024 | Updated: March 18, 2024

Für Benutzer von Microsoft Azure zeichnen sich spannende Entwicklungen ab, die einen bedeutenden Sprung in den Möglichkeiten der Datenverwaltung darstellen. Zunächst einmal ist Atlas Data Federation jetzt allgemein auf Azure verfügbar. Das bedeutet, dass Sie es jetzt direkt in Azure einsetzen und sogar Daten aus Microsoft Azure Blob Storage abfragen können. Und das ist noch nicht alles. Wir haben auch die allgemeine Verfügbarkeit von Atlas Online Archive auf Azure gestartet. Diese Neuerungen leiten eine neue Ära effizienter Archivierungslösungen für Azure-basierte Datenlösungen ein.

Beide Updates sind ein großer Schritt nach vorn, um die Datenverwaltung auf Azure leistungsfähiger und flexibler zu machen. Was das für Sie bedeutet, erfahren Sie im Folgenden!

Azure-Unterstützung in Atlas Data Federation (allgemeine Verfügbarkeit)

Mit Atlas Data Federation können Benutzer nahtlos Abfragen, Transformationen und Ansichten über mehrere Atlas-Datenbanken und Cloud-Objektspeicherlösungen wie Amazon S3 und jetzt auch Microsoft Azure Blob Storage erstellen. Diese Funktion, die bisher ausschließlich AWS vorbehalten war, ist ein echter Durchbruch. Sie ermöglicht die direkte Bereitstellung in Azure und den Zugriff auf Microsoft Azure Blob Storage für Dateneinblicke.

Abbildung 1: Nutzen Sie Azure Blob Storage ganz einfach über die Atlas-Benutzeroberfläche

Hauptmerkmale von Atlas Data Federation

Cloud-Flexibilität: Wählen Sie zwischen AWS und Azure, um föderierte Datenbankinstanzen zu hosten.
Diverse Datenquellen: Binden Sie MongoDB Atlas-Cluster oder Azure-Speicherlösungen (Azure Blob Storage und Azure Data Lake Storage Gen2) als Datenquellen für umfassende Abfragen ein, einschließlich regionsübergreifender Abfragen.
Erweiterte Aggregation: Umfassende Aggregationsmöglichkeiten mit Operatoren wie $match, $lookup, $queryHistory, $merge, $out usw.
- Direkte $out-Unterstützung für Azure Blob Storage und Azure Data Lake Storage Gen2.
Atlas SQL-Abfragen auf Azure: Führen Sie SQL-Abfragen auf Azure aus und integrieren Sie dabei MongoDB-Daten für eine einheitliche Analyseerfahrung.

Atlas Data Federation vereinfacht den Zugriff auf und die Analyse von komplexen Datensätzen, indem es Daten aus verschiedenen Quellen in einer einzigen, föderierten Ansicht zusammenfasst und so wertvolle Erkenntnisse für fundiertere Geschäftsentscheidungen liefert.

Entdecken Sie noch heute Atlas Data Federation auf Azure.

Azure-Unterstützung im Atlas Online-Archiv (allgemeine Verfügbarkeit)

Die Ausweitung von Atlas Online Archive auf Azure stellt sicher, dass das Daten-Tiering nicht nur effizient, sondern auch integriert ist und die Archivdaten innerhalb der Azure-Umgebung bleiben. Diese Integration behebt die bisherige Einschränkung, dass AWS für die Speicherung standardmäßig verwendet wird, selbst bei in Azure gehosteten Clustern.

Abbildung 2: Nahtlose Auswahl von Azure bei der Auswahl einer Region

Hauptmerkmale von Atlas Online Archive

Wahl des Anbieters: Entscheiden Sie sich für AWS oder Azure, um Ihre Cloud-Strategie zu unterstützen.
Automatische Archivierung: Legen Sie Regeln fest, um ältere Daten automatisch in einen kostengünstigen Cloud-Speicher zu verschieben, so dass die manuelle Verlagerung entfällt.
Einheitlicher Abfrage-Endpunkt: Greifen Sie auf alle Daten über einen einzigen Endpunkt zu, um schnelle Einblicke zu erhalten, ohne die Datenverfügbarkeit zu beeinträchtigen.
Integrierte MongoDB Atlas UI-Verwaltung: Verwalten Sie Ihr Daten-Tiering und Ihre Archivierung über die vertraute Atlas-Oberfläche und optimieren Sie so den Betrieb und die Wartung.

Verwalten Sie Ihr MongoDB Atlas-Daten-Tiering nahtlos und in großem Umfang mit Atlas Online Archive. Mit Atlas Online Archive können Sie den Lebenszyklus Ihrer Daten effizient verwalten und dabei ein ausgewogenes Verhältnis zwischen Kosten und Zugänglichkeit schaffen.

Abschließend noch ein paar Punkte, die es zu beachten gilt:

Jedes neu erstellte Archiv auf einem Azure-Cluster am oder nach dem 28.02. wird standardmäßig in Azure-Regionen erstellt. Beachten Sie, dass Speicherregionen für Online-Archive nur dann standardmäßig auf Azure-Clustern angelegt werden, wenn auf dem betreffenden Azure-Cluster noch keine AWS-Archive vorhanden sind
Wenn es bereits bestehende AWS Online-Archive auf Azure-Clustern gibt, verbleiben alle neu erstellten Archive auf diesem speziellen Cluster auf AWS
Einmal konfigurierte Cloud-Anbieter oder Speicherregionen können nicht bearbeitet oder geändert werden

Nutzen Sie noch heute das volle Potenzial von Atlas Online Archive auf Azure.

Wir freuen uns darauf, Sie bei der Datenverwaltung zu unterstützen und Ihnen mit diesen neuen Azure-Funktionen mehr Kontrolle und Flexibilität über Ihre Daten zu bieten. Da MongoDB Atlas als Multi-Cloud-Lösung weiter expandiert, sind wir hier, um sicherzustellen, dass Ihre Datenstrategie so dynamisch und vielseitig ist, wie Ihr Unternehmen es benötigt.

Eine Anleitung für die ersten Schritte finden Sie in unserer Dokumentation zu Atlas Data Federation oder Atlas Online Archive.

Vielen Dank, dass Sie sich für MongoDB Atlas als Ihre Datenplattform für Entwickler entschieden haben. Willkommen in der Zukunft der Multi-Cloud-Datenverwaltung!

← Previous

Ahora se puede implementar Atlas Data Federation y Online Archive en Azure

Se avecinan desarrollos emocionantes para los usuarios de Microsoft Azure, lo que marca un salto significativo en las capacidades de administración de datos. En primer lugar, Atlas Data Federation ahora cuenta con disponibilidad general en Azure. Esto significa que ahora puede implementarlo de forma directa dentro de Azure e incluso consultar datos de Microsoft Azure Blob Storage. Y eso no es todo. También lanzamos la disponibilidad general de Atlas Online Archive en Azure. Estos avances marcan el comienzo de una nueva era de soluciones de archivado eficaces para soluciones de datos basadas en Azure. Ambas actualizaciones son grandes avances para hacer que la administración de datos en Azure sea más eficaz y flexible. ¡Veamos qué significa esto para usted! Compatibilidad con Azure en Atlas Data Federation (disponibilidad general) Con Atlas Data Federation, los usuarios pueden consultar, transformar y crear vistas sin problemas en múltiples bases de datos de Atlas y soluciones de almacenamiento de objetos en cloud, como Amazon S3 y ahora Microsoft Azure Blob Storage. Esta característica, antes exclusiva para AWS, es revolucionaria, lo que permite la implementación directa dentro de Azure y la capacidad de aprovechar Microsoft Azure Blob Storage para obtener información estratégica de datos. Figura 1: Acceda fácilmente a Azure Blob Storage desde la IU de Atlas Características principales de Atlas Data Federation Flexibilidad en cloud: elija entre AWS y Azure para alojar instancias de bases de datos federadas. Diversas fuentes de datos: incorpora clusters de MongoDB Atlas o soluciones de almacenamiento de Azure (Azure Blob Storage y Azure Data Lake Storage Gen2) como fuentes de datos para realizar consultas completas, incluso entre regiones. Agregación avanzada: capacidades de agregación integrales con operadores que incluyen $match, $lookup, $queryHistory, $merge, $out, etc. Compatibilidad directa de $out para Azure Blob Storage y Azure Data Lake Storage Gen2. Consultas de SQL de Atlas en Azure: ejecute consultas de SQL en Azure, que integra datos de MongoDB para una experiencia de análisis unificada. Atlas Data Federation simplifica el acceso y el análisis de conjuntos de datos complejos al combinar datos de múltiples fuentes en una única visión federada, proporcionando información valiosa para tomar decisiones empresariales más informadas. Explore Atlas Data Federation en Azure hoy mismo . Compatibilidad con Azure en Atlas Online Archive (disponibilidad general) La expansión de Atlas Online Archive a Azure garantiza que la organización de datos por niveles no solo sea eficiente, sino que también esté integrada, manteniendo los datos de archivo dentro del ecosistema de Azure. Esta integración resuelve la limitación anterior de utilizar por defecto AWS para el almacenamiento, incluso para los clusters alojados en Azure. Figura 2: Seleccione Azure al elegir una región Características clave de Atlas Online Archive Opción de proveedor: opte por AWS o Azure para alinearse con su estrategia de cloud. Archivado automático: establezca reglas para mover datos antiguos a un almacenamiento rentable en cloud de forma automática y elimina la descarga manual. Punto de conexión de consulta unificado: acceda a todos los datos a través de un único punto de conexión, lo que garantiza información rápida sin comprometer la disponibilidad de los datos. Gestión integrada de la IU de MongoDB Atlas: gestione el archivado de datos y su organización por niveles desde la conocida interfaz de Atlas, lo que agiliza las operaciones y el mantenimiento. Gestione sin problemas la organización de datos por niveles de MongoDB Atlas a escala con Atlas Online Archive. Atlas Online Archive le permite administrar su ciclo de vida de datos de manera eficiente, equilibrando costos y accesibilidad con facilidad. Finalmente, aquí hay algunos puntos a considerar: Cualquier archivo recién creado en un cluster de Azure a partir del 02/28 será predeterminado para las regiones de Azure. Tenga en cuenta que las regiones de almacenamiento para Online Archive se asignarán por defecto a clústeres de Azure sólo si no hay archivos de AWS preexistentes en ese cluster de Azure específico Si hay archivos AWS Online preexistentes en clusters de Azure, todos los archivos recién creados en ese cluster específico permanecerán en AWS. Los proveedores de cloud o las regiones de almacenamiento no se pueden editar ni modificar una vez configurados Aproveche todo el potencial de Atlas Online Archive en Azure hoy mismo . Estamos encantados de dar asistencia técnica a su recorrido de gestión de datos, lo que ofrece mayor control y flexibilidad sobre sus datos a través de estas nuevas capacidades de Azure. A medida que MongoDB Atlas continúa expandiéndose como una solución multi-cloud, estamos aquí para garantizar que su estrategia de datos sea tan dinámica y versátil como su negocio. Para obtener una guía introductoria, consulte nuestra documentación sobre Atlas Data Federation o Atlas Online Archive . Gracias por confiar en MongoDB Atlas como su plataforma de datos para desarrolladores. Bienvenido al futuro de la administración de datos multi-cloud

February 29, 2024

Next →

Improving MongoDB Queries by Simplifying Boolean Expressions

Key takeaways Document databases encourage storing data in fewer collections with many fields, in contrast to relational databases’ normalization principle. This approach improves efficiency but requires careful handling of complex filters. Simplifying Boolean expressions can improve query performance by reducing computational overhead and enabling better plan generation. The solution uses a modified Quine–McCluskey algorithm and Petrick’s method on an efficient bitset representation of Boolean expressions. MongoDB's culture supports innovation by empowering engineers to tackle problems and solve them from beginning to end. Introduction Document databases like MongoDB encourage a strategy that involves storing data in fewer collections with a large number of fields. This is in contrast to the normalization principle of relational databases, which recommends spreading data across numerous tables with a smaller number of fields. Denormalization and avoiding complex joins is a source of efficiency for document databases. However, as the filters become complex, you must take care to handle them properly. Poorly handled complex filters can negatively affect database performance, resulting in slower query responses and higher resource usage. Addressing complex filters becomes a critical task. One optimization technique to mitigate the performance issues with complex filters is Boolean expression simplification. This involves reducing complex Boolean expressions into simpler forms, which the query engine can evaluate more efficiently. As a result, the database can execute queries faster and with less computational overhead. To demonstrate the importance of filter simplification, consider a real MongoDB customer case. The query in the case was enormous in size and clearly machine-generated, which the optimizer couldn't handle efficiently. It was clear that simplifying the query would help the optimizer find a better plan. Here's a smaller example inspired by that case: simplifying filters can lead to more efficient query plans. db.collection.createIndex({b: 1}) filter = {$or: [{$and: [{a: 1}, {a: {$ne: 1}}]}, {b: 2}]} db.collection.find(filter) The query involves predicates on the field “a” so the optimizer cannot generate an Index Scan plan for the unoptimized version of the query and opts for a collection scan plan. However, the simplified expression: {b: 2} makes the IndexScan plan possible. The difference between the plans can be drastic for large collections and selective indexes. In our benchmark, the simplifier showcased a remarkable 18,100% throughput improvement in this scenario. The benefits of the filter simplification can come in different flavors: in one of our benchmarks testing large queries, execution time was cut in half in a collection scan plan. This improvement is due to the faster execution of the simplified queries. Simplifying Boolean expressions: Modified Quine–McCluskey algorithm and Petrick’s method The journey to find a good solution for Boolean simplification was surprisingly complex. The final solution can be expressed in just four steps: Convert the simplifying expression into bitset DNF form Apply QMC reduction operation of DNF terms: (x ∧ y) ∨ (x ∧ ¬y) = x Apply Absorption law: x ∨ (x ∧ y) = x Use Petrick’s method for further simplification Yet we faced a few challenges along the way. This section will explore the challenges and how we addressed them. Simplifying AST The initial solution, applying Boolean laws directly to filters in the Abstract Syntax Tree (AST) format, which MongoDB's query engine uses, proved to be resource-intensive, consuming significant memory and CPU time. This outlines the first challenge in the journey: finding an efficient method to simplify Boolean expressions. One issue with the initial solution was that transformations could be applied repeatedly, leading to the same expression appearing multiple times. To address this, we needed to store previously observed expressions in memory and check each time if an expression had already been seen. Modified Quine–McCluskey The Quine–McCluskey algorithm, commonly used in digital circuit design, offers a different approach with a finite number of steps. The explanation of the algorithm might seem daunting but in essence, we just apply the following reduction rule to a pair of expressions: (x ∧ y) ∨ (x ∧ ¬y) = x This allows the pair of expressions to be reduced into one. A challenge with the Quine–McCluskey algorithm is that it requires a list of prime implicants as input. An implicant of a Boolean expression is a combination of predicates that makes the expression true, essentially a row of the truth table where the Boolean expression is true. To obtain a list of prime implicants from a given Boolean expression, we need to calculate a truth table. This involves executing the expression (2^n) times, producing up to (2^n) implicants. For an expression with 10 predicates, we need to evaluate the expression 1024 times. With 20 predicates, the number of evaluations is 1,048,576. This is impractical. Implicants are similar to Disjunctive Normal Form (DNF) minterms, as both evaluate the expression to be true. We could use DNF instead of implicants, but here is another challenge: even though the expressions below are equivalent: (a ∧ b) ∨ (a ∧ ¬b) a ∨ (a ∧ ¬b) Only the first one is represented via implicants: (a ∧ b) is an implicant of the expressions, and can be simplified by QMC, while (a) is not, as it says nothing about the value of predicate (b). Absorption law Tackling the previous challenge, we face another one: now we need to find a way where we can compensate for the reduced power of QMC. Fortunately, this one can be resolved using Absorption Law: x ∨ (x ∧ y) = x With its help, we can simplify the expression from the previous example a ∨ (a ∧ ¬b) to just a. Effective bitset representation One proved to be good optimization is to use bitwise instructions on a bitset representation of Boolean expressions instead of working with the AST representation. This approach boosts performance by taking advantage of the speed and simplicity of bitwise operations, which are generally quicker and more straightforward than working with the more complex AST structure. Petrick’s method Can we do better and ensure the expression stays accurate while further reducing redundancy? For example, for the given expression: (¬a ∧ ¬b ∧ ¬c) ∨ (¬a ∧ b ∧ ¬c) ∨ (a ∧ ¬b ∧ ¬c) ∨ (¬a ∧ b ∧ c) After some QMC reduction steps are applied, we end up with the following expression: (¬a ∧ ¬c) ∨ (¬b ∧ ¬c) ∨ (¬a ∧ b) However, this can be reduced further down to: (¬b ∧ ¬c) ∨ (¬a ∧ b) This redundancy comes from the fact that when we reduce two terms into one - the new term sort of covers (or we can say represents) two original ones, and some of the derivative terms can be removed without breaking the “coverage” of the original expression. To find the minimal “coverage” we use Petrick’s method . The idea of this method is just to express the coverage as a Boolean expression and then find the minimal combination of its predicates which evaluate the expression to be true. It can be done by transforming the expression into DNF and picking up the smallest minterm. Boolean expressions in MongoDB In the previous section, we developed the algorithm for simple Boolean expressions. However, we can't use it to simplify filters in MongoDB just yet. The MongoDB Query Language (MQL) has unique logical semantics, particularly regarding negation, handling missing values, and array values. Logical operators MQL supports the following logical operators: $and - regular conjunction operator. $or - regular disjunction operator. $not - Negation operator with special semantics, see below for details $nor - Negation of disjunction, which is equivalent to conjunction of negations (DeMorgan Laws) The only specific operator is $nor, which can be easily converted to the conjunction of negations. MQL negation semantics MongoDB's flexible schema can lead to some fields being missing in documents. For instance, in a collection with documents {a: 1, b: 1} and {a: 2}, the field b is missing in the second document. When dealing with negations in MQL, missing values play an important role. Negations will match documents where the specified field is missing. Let's look at a collection of four documents and see how they answer to different queries (see Table 1): Table 1:   Negating comparison expressions in MQL. As shown, the $not operator matches documents where the field "a" is missing. The $ne operator in MQL is only a syntactic sugar for {$not: {$eq: ...}}, which means it matches documents where the specified field is missing (see Table 2): Table 2:   Negating Equality Expressions in MQL. Let’s prove that for MQL negations Boolean algebra laws still hold. To do that we need to introduce new notations: not* - MQL negation exists - exists A means that the field of predicate A does exist not-exists - is a short for not (exists A), and means that the field of predicate A does not exist Being short for not (exists), not-exists behaves in the same way as a normal not: not-exists(A and B) = not-exists(A) or non-exists(B) not-exists(A or B) = not-exists(A) and non-exists(B) The following equation is always true by the definition of MQL negation: not* A = not A or not (exists A) We also state that the predicate A cannot be true when its field is missing: A and not-exists A = false This is true even for operator {$exists: 0} in MQL: e.g. {$and: [{a: {$not: {$exists: 0}}}, {a: {$exists: 0}}]} always returns an empty set of documents. Internally {$exists: false} is implemented through negation of {$exists: 1}, hence the following statements are also true in MQL: not* (not-exists A) = exists A not* (exists A) = not-exists A Complement laws We need to prove that: A and not* A = false A or not* A = true The proof of the first Complement law: A and not* A = A and (not A or not-exists A) = (A and not A) or (A and not-exists A) = false or false = false The proof of the second Complement law: A or not* A = A or (not A or not-exists A) = A or not A or not-exists A = true DeMorgan laws Proof of the first DeMorgan law not* (A and B) = (not* A) or (not* B): not* (A and B) = not (A and B) or not-exists (A and B) = (not A or not B) or (not-exists A or not-exists B) = (not A or not-exists A) or (not B or not-exists B) = not* A or not* B The proof of the second DeMorgan law not* (A or B) = (not* A) and (not* B) is a bit more complicated: not* (A or B) = not (A or B) or not-exists (A or B) = (not A and not B) or (not-exists A and not-exists B) = (not A or not-exists A) and (not B or not-exists A) and (not A or not-exists B) and (not B or not-exists B) = ((not* A) and (not* B)) and ((not A and not B) or (not A and not-exist B) or (not B and not-exists A) or (not-exists A and not-exists B)) = ((not* A) and (not* B)) and ((not* A) and (not* B)) = (not* A) and (not* B) // because (not* A) and (not* B) = (notA and not B) or (not A and not-exist B) or (not B and not-exists A) or (not-exists A and not-exists B) Involution law We need to prove that not* (not* A) = A: not* (not* A) = not* (not A or not-exists A) = not* (not A) and not* (not-exists A) = (A or not-exists A) and (exists A) = (A and exists A) or (not-exists A and exists A) = A or false = A Here we used DeMorgan law for not* proved in the previous section. MQL array values semantics In MQL, array values behave as expected and don’t change the semantics of Boolean operators. However, they alter the semantics of operands within predicates. It’s helpful to think of it as implicitly adding the word “contains” to the operators interpretation: {a: {$eq: 7}} means “array a contains a value equal 7” {a: {$ne: 7}}, which is always equivalent to {a: {$not: {$eq: 7}}, means “array does not contain a value equal 7” {a: {$gt: 7}} - “array a contains a value greater than 7”. Since the Boolean simplifier focuses on the logical structure of expressions, rather than the internal meaning of predicates, it's irrelevant whether the value is an array or scalar. The one important difference with array values semantics is that they behave differently on interval simplifications. If for a scalar value the following interval transformations are always valid, and invalid for arrays: scalar > 100 && scalar <= 1 => scalar: () // empty interval scalar < 100 && scalar >= 1 => scalar: [1, 100) These differences are important for building Index Scan Bounds. However, as previously mentioned, they are not significant for the Boolean simplifier. Implementation considerations We faced several challenges during implementation. To achieve maximum performance, we created our own version of bitsets in C++. The existing options, such as boost::dynamic_bitset, were too slow, and std::bitset lacked flexibility. Memory management was crucial, so we worked hard to reduce memory allocations. This included using an optimized storage for our bitset and a C++ polymorphic allocator for some algorithms. We released the simplifier cautiously, enabling it only for specific types of expressions. We plan to expand the range of expressions that can be simplified in the future. We also set a limit on the maximum number of terms in DNF. Before each transformation, we estimate the number of resulting terms. If the number is too high, we cancel the simplification. Conclusion In this post, we described an effective algorithm to simplify Boolean expressions. We used bitsets to represent query filters, modified the Quine-McCluskey algorithm, and applied Petrick's method. Finally, we proved that the algorithm works with MQL's Boolean algebra. This journey began, as does so much at MongoDB, with a real-world customer example. The customer was struggling with a complex query, and we developed the Boolean Expression Simplifier to help them overcome their specific challenge—and ended up enhancing MongoDB's performance for any user dealing with intricate filters. As noted before, in one particularly demanding case involving large collections and selective indexes, the Simplifier helped achieve an 18,100% throughput improvement" This project reflects MongoDB’s longstanding commitment to turning customer challenges into successes, and our practice of taking what we learn from individual projects and making broad improvements. We’re always working to improve MongoDB’s performance, and the simplifier does this—it enables faster and more efficient query execution. Finally, this project is a good example of how MongoDB's culture of trust and ownership can lead to impact. Our engineers are empowered to innovate from concept to completion, transforming theoretical ideas like Boolean expression minimization into tangible gains that directly improve the customer experience. Join our MongoDB Community to learn about upcoming events, hear stories from MongoDB users, and connect with community members from around the world.

January 30, 2025