Together AI: Advancing the Frontier of AI With Open Source Embeddings, Inference, and MongoDB Atlas

Mat Keep
February 20, 2024 | Updated: January 30, 2025
#genAI

Founded in San Francisco in 2022, Together AI is on a mission to create the fastest cloud platform for building and running generative AI (gen AI). The company has so far raised over $120 million, counting Nvidia, Kleiner Perkins, Lux, and NEA as investors.

Ce Zhang, Founder & CTO at Together AI, says, “Together AI is a research-driven artificial intelligence company. We contribute leading open-source research, models, and datasets to advance the frontier of AI. Our cloud services empower developers and researchers at organizations of all sizes to train, fine-tune, and deploy generative AI models. We believe open and transparent AI systems will drive innovation and create the best outcomes for society.”

Check out our AI Learning Hub to learn more about building AI-powered apps with MongoDB.

The company has recently introduced its Together Embeddings endpoint, a new service for developers building a variety of applications, including one that is top of mind for nearly all gen AI-powered apps: retrieval-augmented generation (RAG). With the RAG pattern, developers can feed gen AI models their own up-to-date, domain-specific data. The result is more reliable gen AI output that is customized for the business and carries a lower risk of hallucination.

The Together Embeddings endpoint offers access to eight leading open-source embedding models at prices up to 12x lower than proprietary alternatives. The lineup includes top models from the Massive Text Embedding Benchmark (MTEB) leaderboard, such as UAE-Large-v1 and the BGE models, as well as state-of-the-art long-context retrieval models. Together Embeddings also offers integrations with MongoDB Atlas, LangChain, and LlamaIndex for RAG.

To demonstrate this integration, the engineering team at Together AI created a tutorial for developers exploring how to build a RAG application with MongoDB Atlas. This tutorial shows how to use Together Embeddings and Together Inference to generate embeddings and language responses. Atlas Vector Search is used to store and index embeddings and then perform semantic search to retrieve relevant data examples for natural language queries against a sample Airbnb listing dataset. With this RAG pattern, the gen AI model can recommend properties that meet the user’s criteria while adhering to factual information.
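
The retrieval step described above can be sketched roughly as follows. This is a minimal illustration and not the tutorial's own code: the index name `vector_index`, the `embedding` field, the projected `name` and `summary` fields, and the helper names are all assumptions. The embedding function is passed in as a plain callable (for example, a thin wrapper around the Together Embeddings endpoint) so that the sketch does not depend on any particular client library. The `$vectorSearch` stage and its `index`, `path`, `queryVector`, `numCandidates`, and `limit` fields come from Atlas Vector Search.

```python
from typing import Callable, Sequence


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    *,
    index: str = "vector_index",   # assumed Atlas Vector Search index name
    path: str = "embedding",       # assumed field holding each listing's vector
    limit: int = 3,
    num_candidates: int = 100,
) -> list:
    """Build an aggregation pipeline that returns the closest listings."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be at least limit")
    return [
        {
            "$vectorSearch": {
                "index": index,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Keep only the fields the prompt needs, plus the similarity score.
        {
            "$project": {
                "_id": 0,
                "name": 1,
                "summary": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


def retrieve(collection, embed: Callable[[str], Sequence[float]], question: str, **kwargs) -> list:
    """Embed the question, then run the vector search on the collection."""
    pipeline = build_vector_search_pipeline(embed(question), **kwargs)
    return list(collection.aggregate(pipeline))


def build_prompt(question: str, listings: list) -> str:
    """Ground the language model in the retrieved listings."""
    context = "\n".join(f"- {doc['name']}: {doc['summary']}" for doc in listings)
    return f"Answer using only these listings:\n{context}\n\nQuestion: {question}"
```

Here `collection` would be a PyMongo collection over the sample Airbnb listings, and the string returned by `build_prompt` would be sent to a chat model through Together Inference.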

“We prioritized integrating with MongoDB because of its relevance and importance in the AI stack.”
Vipul Ved Prakash, Founder & CEO at Together AI

“Bringing together live application data synchronized right alongside vector embeddings in a single platform, MongoDB Atlas helps developers reduce complexity and cost, and bring cutting-edge apps to market faster,” says Prakash. “This is one example, and we are looking forward to seeing many amazing applications that will be built using Together AI and MongoDB’s Atlas Vector Search.” To learn more about its RAG integrations, take a look at Together AI’s documentation.

To get started with MongoDB and Together AI, register for MongoDB Atlas and read the tutorial. If your team is building AI apps, sign up for the AI Innovators Program. Successful companies get access to free Atlas credits and technical enablement, as well as connections into the broader AI ecosystem.

← Previous

Enhanced Atlas Functionality: Introducing Resource Tagging for Projects

We are thrilled to announce that Atlas has now extended its tagging functionality to include projects in addition to deployments. This enhancement enables users to apply resource tags to projects, further enriching the way you can associate metadata with your cloud resources. With this new capability, categorizing, organizing, and tracking your projects within Atlas becomes more intuitive and effective, offering a streamlined approach to managing your resources.

Enhancing project management with resource tagging

Incorporating resource tagging into projects significantly enhances visibility and streamlines project management. By applying tags, teams can categorize resources, making it easier to understand the purpose or specific metadata associated with a project. This practice is especially beneficial in large-scale projects, where organizing resources systematically can vastly improve productivity. Tags serve as versatile markers, representing various attributes of a project such as environment, criticality, cost center, or application, thereby simplifying project organization.

Furthermore, tags lay the groundwork for supporting automation and policy enforcement within organizations. By utilizing tags, tasks related to access controls, compliance, and other policies can be automated, enhancing operational efficiency. Auditing processes also benefit from tagging, which facilitates tracking and helps ensure resources meet specific business requirements.

In environments where teamwork is essential, adding tags to projects aids in streamlined collaboration. Tags allow team members to quickly grasp the purpose or function of different resources, surfacing critical information about the project that can help reduce miscommunication and conflicts. Overall, adopting resource tagging in cloud resource management unlocks significant improvements in performance and efficiency, making it an invaluable tool for modern organizational needs.

How to add tags to projects

You can view and manage tagging on projects in multiple areas:

- Atlas UI: When creating a new project, on the Organization Project List, or within Project Settings.
- Admin API: Various operations on projects were enhanced to allow you to view, create, and manage tags applied to projects, such as CreateOneProject and ReturnAllProjects.
- Atlas CLI: Various commands on projects were enhanced to allow you to view, create, and manage tags applied to projects.

Resource tagging best practices

We recognize that the complexity of tagging use cases varies, tailored to an organization's unique structure and specific business requirements. With this in mind, we’ve designed resource tagging in Atlas to support a variety of use cases.

We suggest defining tags that should be applied across all projects to get started. This will ensure your tagging approach is reliable and consistent across all resources. If you have multiple deployments within a project, apply more granular metadata on each deployment.

In the simplified example below, an organization has three projects containing one or more deployments. Each project contains a deployment for each development environment. We’ve added common tags to the projects and more granular tags to identify the environment at the deployment level.

Given the uniqueness of each organization, we've designed a flexible system with simplicity at its heart, using key-value pairs. If you have a flatter organization structure in Atlas (e.g. with one deployment per project), consider adding all tags at the level that makes the most sense for your organization. This may vary depending on how you manage your deployments, existing tag workflows, or where you desire to view tags in the Atlas UI.

Finally, here are a few points to consider when tagging:

- Do not include any sensitive information such as Personally Identifiable Information (PII) or Protected Health Information (PHI) in your resource tag keys or values.
- Use a standard naming convention for all tags, including spelling, case, and punctuation.
- Define and communicate a strategy for enforcing mandatory tags. We recommend starting by identifying the environment and the application, service, or workload.
- Use namespaces or prefixes to easily identify tags owned by different business units.
- Use programmatic tools like Terraform or the Admin API to manage the database of your tags.

In summary

The introduction of resource tagging for projects marks an improvement in how users can intuitively categorize, organize, and track projects within Atlas, streamlining cloud resource management. We're eager to hear your thoughts and ideas on further applications of resource tagging in Atlas. Please share your feedback and suggestions at feedback.mongodb.com, as your input is invaluable in shaping the future of our platform.
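
As one way to picture the naming-convention and mandatory-tag advice above, here is a small sketch that checks a set of tags before they are sent anywhere. The convention itself (lowercase keys with an optional `unit:` prefix), the two required keys, and the helper name `build_tags` are illustrative assumptions, not Atlas rules. The output is a list of key-value objects, which is assumed here to be the shape a programmatic tool would submit.

```python
import re

# Hypothetical house convention: lowercase keys, optionally namespaced
# with a business-unit prefix such as "finance:cost-center".
KEY_PATTERN = re.compile(r"^(?:[a-z][a-z0-9]*:)?[a-z][a-z0-9-]*$")

# Hypothetical mandatory tags, following the advice to start with the
# environment and the application.
REQUIRED_KEYS = {"environment", "application"}


def build_tags(tags: dict) -> list:
    """Validate tag keys and return them as a list of key-value objects."""
    bad_keys = sorted(key for key in tags if not KEY_PATTERN.fullmatch(key))
    if bad_keys:
        raise ValueError(f"keys break the naming convention: {bad_keys}")
    missing = sorted(REQUIRED_KEYS - set(tags))
    if missing:
        raise ValueError(f"mandatory tags are missing: {missing}")
    # Sort by key so repeated runs produce the same payload.
    return [{"key": key, "value": value} for key, value in sorted(tags.items())]
```

A check like this could run in a deployment script ahead of whatever Terraform or Admin API call actually applies the tags.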

February 15, 2024

Next →

Automate Network Management Using Gen AI Ops with MongoDB

Imagine that it’s a typical Tuesday afternoon and that you’re the operations manager for a major North American telecommunications company. Suddenly, your Network Operations Center (NOC) receives an alert that web traffic in Toronto has surged by hundreds of percentage points over the last hour, far above its usual baseline. At nearly the same moment, a major Toronto-based client complains that their video streams have been buffering nonstop.

Just a few years ago, a scenario like this would trigger a frantic scramble: teams digging into logs, manually writing queries, and attempting to correlate thousands of lines of data in different formats to find a single root cause.

Today, there’s a more streamlined, AI-driven approach. By combining MongoDB’s developer data platform with large language models (LLMs) and a retrieval-augmented generation (RAG) architecture, you can move from reactive “firefighting” to proactive, data-informed diagnostics. Instead of juggling multiple monitoring dashboards or writing complicated queries by hand, you can simply ask for insights, and the system retrieves and analyzes the necessary data automatically.

Facing the unexpected traffic spike

Now let’s imagine the same situation, but this time with AI-assisted network management. Shortly after you spot a traffic surge in Toronto, your NOC chatbot pings you with a situation report: requests from one neighborhood are skyrocketing, and an unusually high percentage involve video streaming paths or caching servers.

Under the hood, MongoDB automatically ingests every log entry and telemetry event in real time, capturing IP addresses, geographic data, request paths, timestamps, router logs, and sensor data. Meanwhile, textual content (such as error messages, user complaints, and chat transcripts) is vectorized and stored in MongoDB for semantic search. This setup enables near-instant access to relevant information whenever a keyword like “buffering,” “video streams,” or “streaming lag” is mentioned, ensuring a fast, end-to-end diagnosis. Refer to this article to learn more about semantic search.

Zeroing in on the root cause

Instead of rummaging through separate logging tools, you pose a simple natural-language question to the system: “What might be causing the client’s video stream buffering problem in Toronto?”

The LLM responds by generating a custom MongoDB Aggregation Pipeline, written in Python code and tailored to your query. It might look something like this: a $match stage to filter for the last twenty-four hours of data in Toronto, a $group stage to roll up metrics by streaming services, and a $sort stage to find the largest error counts. The code is automatically served back to you, and with a quick confirmation, you execute it on your MongoDB cluster. A moment later, the chatbot returns with a summarized explanation that points to an overloaded local CDN node, along with higher-than-expected requests from older routers known to misbehave under peak load.

Next, you ask the system to explain the core issue in simpler terms so you can share it with a business stakeholder. The LLM takes the numeric results from the Aggregation Pipeline, merges them with textual logs that mention “firmware out-of-date,” and then outputs a cohesive explanation. It even suggests that many of these older routers are still running last year’s firmware release, a known contributor to buffering issues on video streams during traffic spikes.

How retrieval-augmented generation (RAG) helps

The power behind this effortless insight is a RAG architecture, which marries semantic search with generative text responses. First, the LLM uses vector search in MongoDB to retrieve only those log entries, complaint records, and knowledge base articles that directly relate to streaming. Once it has these key data chunks, the LLM can generate, and continually refine, its analysis.

Figure 1. Network chatbot architecture with MongoDB.

When the system references historical data to confirm that “similar spikes occurred during the playoffs last year” or that “users with older firmware frequently complain about buffering,” it’s not blindly guessing. Instead, it’s accessing domain-specific logs, user feedback, and diagnostic documents stored in MongoDB, and then weaving them together into a coherent explanation. This eliminates guesswork and slashes the time your team would otherwise spend on low-level data cleanup, correlation, and interpretation.

Executing automated remediation

Armed with these insights, your team can roll out a targeted fix, possibly involving an auto-update to the affected routers or load-balancing traffic to alternative CDN endpoints. MongoDB’s Change Streams can monitor for future anomalies. If a traffic spike starts to look suspiciously similar to the scenario you just solved, the system can raise a proactive alert or even initiate the fix automatically. Refer to the official documentation to learn more about change streams.

Meanwhile, the cost savings add up. You no longer need engineers manually piecing data together, nor do you endure prolonged user dissatisfaction while you try to figure out what’s happening. Everything from anomaly detection to root-cause analysis and recommended mitigation steps is fed through a single pipeline, visible and explainable in plain language.

A future of AI-driven operations

This scenario highlights how (gen) AI Ops and MongoDB complement each other to transform network management:

- Schema flexibility: MongoDB’s document-based model effortlessly stores logs, performance metrics, and user feedback in a single, consistent environment.
- Real-time performance: With horizontal scaling, you can ingest the massive volumes of data generated by network logs and user requests at any hour of the day.
- Vector search integration: By embedding textual data (such as logs, user complaints, or FAQs) and storing those vectors in MongoDB, you enable instant retrieval of semantically relevant content, making it easy for an LLM to find exactly what it needs.
- Aggregation + LLM: An LLM can auto-generate MongoDB Aggregation Pipelines to sift through numeric data with ease, while a second pass to the LLM composes a final summary that merges both numeric and textual analysis.

Once you see how much time and effort this end-to-end workflow saves, you can extend it across the entire organization. Whether it’s analyzing sudden traffic spikes in specific geographies, diagnosing a security event, or handling peak online shopping loads during a holiday sale, the concept remains the same: empower people to ask natural-language questions about complex data, rely on AI to craft the specialized queries behind the scenes, and store it all in a platform that can handle unbounded complexity.

Ready to embrace gen AI ops with MongoDB?

Network disruptions will never fully disappear, but how quickly and intelligently you respond can be a game-changer. By uniting MongoDB with LLM-based AI and a retrieval-augmented generation (RAG) strategy, you transform your network operations from a tangle of logs and dashboards into a conversational, automated, and deeply informed system.

Sign up for MongoDB Atlas to start building your own RAG-based workflows. With intelligent vector search, automated pipeline generation, and natural-language insight, you’ll be ready to tackle everything from video-stream buffering complaints to the next unexpected traffic surge, before users realize there’s a problem.

If you would like to learn more about how to build gen AI applications with MongoDB, visit the following resources:

- Learn more about MongoDB capabilities for artificial intelligence on our product page.
- Get started with MongoDB Vector Search by visiting our product page.
- Blog: Leveraging an Operational Data Layer for Telco Success

Want to learn more about why MongoDB is the best choice for supporting modern AI applications? Check out our on-demand webinar, “Comparing PostgreSQL vs. MongoDB: Which is Better for AI Workloads?” presented by MongoDB Field CTO, Rick Houlihan.
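
The generated pipeline described in the root-cause section ($match, then $group, then $sort) might look something like the sketch below. It is only an illustration under stated assumptions: the field names `city`, `timestamp`, `service`, and `status`, the choice to count HTTP 5xx responses as errors, and the function name are hypothetical, since the article does not specify the log schema.

```python
from datetime import datetime, timedelta, timezone


def build_error_rollup_pipeline(city: str, now: datetime, hours: int = 24) -> list:
    """Roll up recent request logs for one city by streaming service."""
    cutoff = now - timedelta(hours=hours)
    return [
        # Keep only the recent events for the requested city.
        {"$match": {"city": city, "timestamp": {"$gte": cutoff}}},
        # Count all requests and server errors (assumed: status >= 500) per service.
        {
            "$group": {
                "_id": "$service",
                "requests": {"$sum": 1},
                "error_count": {
                    "$sum": {"$cond": [{"$gte": ["$status", 500]}, 1, 0]}
                },
            }
        },
        # Surface the services with the most errors first.
        {"$sort": {"error_count": -1}},
    ]


pipeline = build_error_rollup_pipeline("Toronto", datetime.now(timezone.utc))
```

With PyMongo, the resulting list would be passed to `collection.aggregate(pipeline)` after the operator confirms it, matching the review-then-execute step in the scenario.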

February 5, 2025