Table of contents
- Generative AI, Vector Databases, and MongoDB Atlas Vector Search
- What are vector databases?
- How do vector databases work?
- Why is vector search critical?
- Use cases for vector databases
- MongoDB Atlas Vector Search: A game-changer
- Atlas Vector Search: For intelligent applications powered by semantic search
- FAQs
Generative AI, Vector Databases, and MongoDB Atlas Vector Search
You’ve heard the hype about generative AI, the branch of artificial intelligence that creates new content such as text, images, and code. Across the economy (from healthcare to finance, retail to government agencies), organizations are looking for ways to leverage it. It seems like every CEO wants to roll out applications as fast as possible.
It’s more than just hype. According to a McKinsey report, generative AI could add trillions of dollars in value to the global economy each year.
Central to this transformational technology is the mathematical concept of the vector. Through vectorization and the power of large language models (LLMs), generative AI achieves its game-changing potential. In the era of generative AI, vector embeddings lay the groundwork, and vector databases amplify their impact.
What is a vector database? How does it work? What are some common use cases? And why is MongoDB Atlas Vector Search playing a significant role in the generative AI discussion?
What are vector databases?
To understand vector databases, you need to first understand the vector.
In math and physics, a vector is a quantity that has both magnitude (or size) and direction. A vector can be broken down into components. For example, in a two-dimensional space, a vector has an X (horizontal) and Y (vertical) component.
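To make magnitude and direction concrete, here is a minimal Python sketch for a two-dimensional vector. The function names `magnitude` and `direction_degrees` are illustrative choices, not part of any library discussed in this article.

```python
import math


def magnitude(x: float, y: float) -> float:
    """Length of the vector (x, y), from the Pythagorean theorem."""
    return math.hypot(x, y)


def direction_degrees(x: float, y: float) -> float:
    """Angle of the vector (x, y), measured counterclockwise from the X axis."""
    return math.degrees(math.atan2(y, x))


# A vector with X component 3 and Y component 4.
x, y = 3.0, 4.0
print(magnitude(x, y))                    # 5.0
print(round(direction_degrees(x, y), 2))  # 53.13
```

The X and Y components fully determine both quantities, which is why a vector can be stored as a plain list of numbers.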
In data science and machine learning, a vector is an ordered list of numbers that represents data. A vector can represent any type of data, including unstructured data (data without a predefined data model or schema), from text and images to audio and video. A vector is usually represented as an array or list of numbers, where each number represents a specific feature or attribute of that data.
For example, imagine you have a large collection of cat photos. Each image is a piece of unstructured data. But you can represent each image as a vector by extracting features, such as the following:
- Average color
- Color histogram
- Texture histogram
- The presence or absence of ears, whiskers, and a tail
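The features listed above can be combined into a single list of numbers. The following is a rough sketch under simplifying assumptions: an image is modeled as a flat list of RGB pixels, the helper names are hypothetical, and the "has whiskers" flag is supplied by hand (a real pipeline would need a detector or a trained model to produce it).

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each in the range 0-255


def average_color(pixels: List[Pixel]) -> List[float]:
    """Mean of each color channel, scaled to the range 0-1."""
    n = len(pixels)
    return [sum(p[c] for p in pixels) / (255.0 * n) for c in range(3)]


def color_histogram(pixels: List[Pixel], bins: int = 4) -> List[float]:
    """Per-channel histogram; each channel's bins sum to 1."""
    n = len(pixels)
    width = 256 / bins
    hist = [0.0] * (3 * bins)
    for p in pixels:
        for c in range(3):
            bin_index = min(int(p[c] // width), bins - 1)
            hist[c * bins + bin_index] += 1.0 / n
    return hist


def image_to_vector(pixels: List[Pixel], has_whiskers: bool) -> List[float]:
    """Concatenate the features into one vector (3 + 12 + 1 = 16 numbers)."""
    flag = [1.0 if has_whiskers else 0.0]
    return average_color(pixels) + color_histogram(pixels) + flag


# A tiny two-pixel "photo": one orange pixel and one blue pixel.
photo = [(255, 128, 0), (0, 128, 255)]
vector = image_to_vector(photo, has_whiskers=True)
print(len(vector))  # 16
```

Hand-picked features like these are only an illustration. In practice, embeddings are usually produced by a machine learning model rather than assembled feature by feature.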
Vectorization is the process of converting data such as images, words, and other content into numbers, where each data point is represented by a vector in high-dimensional space. The resulting vectors are called vector embeddings.
A vector database — also known as a vector search database or vector similarity search engine — stores, retrieves, and searches for vectors.
Instead of rows and columns typical of relational databases, vector databases represent data as points in a multi-dimensional space. Vector databases are ideal for applications that require rapid and accurate matching of data based on similarity rather than exact values.
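Matching by similarity rather than exact value can be sketched in a few lines. The example below assumes cosine similarity as the measure and scans every stored vector, which is only practical for tiny collections; a real vector database would typically rely on an index instead of a full scan. The store contents and function names are made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for a zero vector; treat as no match
    return dot / (norm_a * norm_b)


def nearest(query: List[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored items most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in store.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical three-dimensional embeddings.
store = {
    "tabby_cat": [0.9, 0.8, 0.1],
    "siamese_cat": [0.8, 0.9, 0.2],
    "pickup_truck": [0.1, 0.2, 0.9],
}
query = [0.85, 0.85, 0.15]  # does not equal any stored vector exactly

results = nearest(query, store, k=2)
for name, score in results:
    print(name, round(score, 3))
```

Although the query matches no stored vector exactly, both cat vectors score far higher than the truck vector, so they are the two results returned.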
“Imagine a vector database as a vast warehouse and the artificial intelligence as the skilled warehouse manager. In this warehouse, every item (data) is stored in a box (vector), organized neatly on shelves in multidimensional space,” writes Mark Hinkle in The New Stack.
If you’re building generative AI applications, you need a database designed to process large volumes of vectorized data efficiently. A vector database is built for exactly that, which keeps similarity queries fast as the data grows.