Building AI with MongoDB: Putting Jina AI’s Breakthrough Open Source Embedding Model To Work

Mat Keep

#genAI #Vector Search

Founded in 2020 and based in Berlin, Germany, Jina AI has swiftly risen as a leader in multimodal AI, focusing on prompt engineering and embedding models. With its commitment to open source and open research, Jina AI is bridging the gap between advanced AI theory and the real-world AI-powered applications being built by developers and data scientists. More than 400,000 users have registered on the Jina AI platform.

Dr. Han Xiao, Founder and CEO at Jina AI, describes the company’s mission:

“We envision paving the way towards the future of AI as a multimodal reality. We recognize that the existing machine learning and software ecosystems face challenges in handling multimodal AI. As a response, we're committed to developing pioneering tools and platforms that assist businesses and developers in navigating these complexities. Our vision is to play a crucial role in helping the world harness the vast potential of multimodal AI and truly revolutionize the way we interpret and interact with information."

Jina AI’s work on embedding models has attracted significant industry interest. As many developers now know, embeddings are essential to generative AI (gen AI). Embedding models are sophisticated algorithms that transform data of any structure into multi-dimensional numerical encodings called vectors. These vectors capture the patterns and relationships in the data, giving it semantic meaning: items with similar meaning end up close together in vector space. This means we can analyze and search unstructured data in the same way we’ve always been able to search structured business data. Considering that over 80% of the data we create every day is unstructured, it is easy to see how transformational embeddings are for gen AI, especially when combined with a powerful solution such as MongoDB Atlas Vector Search.
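Closeness in vector space is commonly measured with cosine similarity. The minimal Python sketch below shows how a query vector can rank documents by meaning. It uses tiny, invented three-dimensional vectors; real Jina embeddings have hundreds of dimensions, and the listing names and values here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, documents):
    """Return document names ordered from most to least similar to the query."""
    return sorted(
        documents,
        key=lambda name: cosine_similarity(query, documents[name]),
        reverse=True,
    )


# Hypothetical toy embeddings, invented for illustration only.
docs = {
    "quiet flat near the park": [0.9, 0.1, 0.0],
    "lively loft above a bar": [0.1, 0.9, 0.2],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "peaceful place to stay"

print(rank_by_similarity(query, docs))
# prints ['quiet flat near the park', 'lively loft above a bar']
```

A vector database such as Atlas Vector Search performs the same kind of comparison, but uses an index so it does not have to score every stored vector one by one.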

Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

Jina AI's jina-embeddings-v2 is the first open-source text embedding model with an 8K (8,192-token) context length. The longer context lets the model capture more of a document's meaning in a single embedding, significantly enhancing accuracy and relevance for tasks like retrieval-augmented generation (RAG) and semantic search. Jina AI’s embeddings offer enhanced data indexing and search capabilities, along with bilingual support. Each model focuses on a single language or a language pair, delivering state-of-the-art performance on language-specific benchmarks. Currently, Jina Embeddings v2 includes bilingual German-English and Chinese-English models, with other bilingual models in the works.

Jina AI’s embedding models excel in classification, reranking, retrieval, and summarization, making them suitable for diverse applications, especially those that are cross-lingual. Recent examples from multinational enterprise customers include the automation of sales sequences, skills matching in HR applications, and payment reconciliation with fraud detection.

Figure 1: Jina AI’s world-class embedding models improve search and RAG systems.

In our recently published article, Jina Embeddings v2 and MongoDB Atlas, we show developers how to get started bringing vector embeddings into their apps. The article covers:

  1. Creating a MongoDB Atlas instance and loading it with your data. (The article uses a sample Airbnb reviews data set.)

  2. Creating embeddings for the data set using the Jina Embeddings API.

  3. Storing and indexing the embeddings with Atlas Vector Search.

  4. Implementing semantic search using the embeddings.
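Steps 2 through 4 can be sketched in Python as small helper functions that build the API request, the index definition, and the search query without touching the network. This is a minimal sketch, not the tutorial's code. The endpoint `https://api.jina.ai/v1/embeddings`, the OpenAI-style response shape, the model name `jina-embeddings-v2-base-en` (assumed to produce 768-dimensional vectors), the index name `vector_index`, the `embedding` path, and the `text` field are all assumptions for illustration.

```python
import json
import urllib.request

# Assumed endpoint and model name; check Jina AI's documentation before use.
JINA_EMBEDDINGS_URL = "https://api.jina.ai/v1/embeddings"
DEFAULT_MODEL = "jina-embeddings-v2-base-en"  # assumed to produce 768-dim vectors


def build_embedding_request(texts, api_key, model=DEFAULT_MODEL):
    """Step 2: build (but do not send) a POST request asking for embeddings."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        JINA_EMBEDDINGS_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def parse_embedding_response(payload):
    """Extract vectors from an assumed {"data": [{"index", "embedding"}]} body."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def vector_index_definition(path="embedding", num_dimensions=768):
    """Step 3: an Atlas Vector Search index definition for the embedding field."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": "cosine",
            }
        ]
    }


def vector_search_pipeline(query_vector, index_name="vector_index",
                           path="embedding", limit=5):
    """Step 4: an aggregation pipeline that runs a semantic ($vectorSearch) query."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                # Consider more candidates than results to improve recall.
                "numCandidates": limit * 20,
                "limit": limit,
            }
        },
        # "text" is a hypothetical field holding the review text.
        {"$project": {"_id": 0, "text": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]
```

In practice you would send the request with `urllib.request.urlopen(req)`, decode the JSON body, store each vector on its source document, create the index from `vector_index_definition()` in the Atlas UI or with a driver, and pass `vector_search_pipeline(query_vector)` to PyMongo's `collection.aggregate(...)`. The query text must be embedded with the same model as the stored documents so the vectors are comparable. Keeping these helpers free of I/O makes them easy to test without an API key or a cluster.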

Dr. Xiao says, “Our Embedding API is natively integrated with key technologies within the gen AI developer stack including MongoDB Atlas, LangChain, LlamaIndex, Dify, and Haystack. MongoDB Atlas unifies application data and vector embeddings in a single platform, keeping both fully synced. Atlas Triggers keeps embeddings fresh by calling our Embeddings API whenever data is inserted or updated in the database. This integrated approach makes developers more productive as they build new, cutting-edge AI-powered apps for the business.”
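Atlas Triggers run JavaScript functions inside Atlas, so the following is only a Python stand-in for the idea described in the quote: react to a database change event and re-embed a document when its text changes. It assumes the MongoDB change-stream event shape (`operationType`, `fullDocument`, `documentKey`, `updateDescription.updatedFields`); the `text` and `embedding` field names and the `embed` callable are hypothetical.

```python
def embedding_update_for_event(event, embed, text_field="text",
                               vector_field="embedding"):
    """Return a (filter, update) pair that refreshes a document's embedding,
    or None when the change does not affect the embedded text.

    `embed` is a hypothetical callable mapping a string to a vector, for
    example a wrapper around an embeddings API call.
    """
    op = event.get("operationType")
    if op in ("insert", "replace"):
        text = (event.get("fullDocument") or {}).get(text_field)
    elif op == "update":
        changed = event.get("updateDescription", {}).get("updatedFields", {})
        # Ignore updates that do not touch the text, including the update
        # that writes the embedding itself (this prevents an endless loop).
        if text_field not in changed:
            return None
        text = changed[text_field]
    else:
        # Deletes and other event types need no new embedding.
        return None

    if not isinstance(text, str) or not text:
        return None
    return ({"_id": event["documentKey"]["_id"]},
            {"$set": {vector_field: embed(text)}})
```

With PyMongo, a loop over `collection.watch()` could pass each event to this function and apply a non-`None` result with `collection.update_one(*result)`. Skipping updates that only change the embedding field matters: without that check, writing the new vector would itself fire another event and re-embed the document indefinitely.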

To get started with MongoDB and Jina AI, register for MongoDB Atlas and read the tutorial. If your team is building AI apps, sign up for the AI Innovators Program. Companies accepted into the program get access to free Atlas credits and technical enablement, as well as connections into the broader AI ecosystem.