Founded in 1923 in Denmark, Novo Nordisk is today one of the world’s leading healthcare companies. Building upon its heritage in diabetes treatments, the company’s mission is to drive change to defeat serious chronic diseases. It does this by pioneering scientific breakthroughs, expanding access to its medicines, and working to prevent — and ultimately cure — disease.
Novo Nordisk employs more than 64,000 people in 80 countries. Its products are marketed in 170 countries and generated revenues of 232 billion Danish kroner ($33.5bn) in fiscal year 2023.
Louise Lind Skov, Head of Content Digitalisation at Novo Nordisk, explains, “Our treatments today are benefiting millions of people living with diabetes, obesity, and rare blood and endocrine diseases. We produce 50% of the world’s insulin, have manufactured over 600 million insulin pens, and more than 36 million people are using our diabetes care products. From our labs to our factory floors, we are discovering and developing innovative biological medicines and making them accessible to patients throughout the world.”
By harnessing generative AI (gen AI) with Amazon Bedrock and MongoDB Atlas, Novo Nordisk is dramatically accelerating how quickly it can get new medicines approved and delivered to patients.
Figure 1: Example of a Clinical Study Report
Explaining the time and effort required to produce a clinical study report (CSR), Skov says, “A CSR usually takes around 12 weeks to compile, involving a multidisciplinary team of statisticians, scientists, and technical authors. Each day of delay means patients don’t get the treatments they need and the company cannot start to recover its R&D costs.”
The process starts with the statistical analysis of clinical trial data collected in the field, creating outputs such as tables and figures. Technical authors then extract and merge this data with report templates that are used in the regulatory submission. Extensive quality assurance (QA) processes are needed to ensure that all the data in the 100+ page report is consistent, comprehensive, and compliant with regulatory standards.
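The merge step can be pictured as filling placeholders in a report template with values from the statistical outputs. The following is a minimal illustration, not Novo Nordisk’s implementation. It assumes a hypothetical template string and a dictionary of statistics, uses Python’s `string.Template`, and flags any placeholder left unfilled, which is one simple consistency check a QA pass might include.

```python
import re
from string import Template

# Hypothetical report template; placeholder names are illustrative only.
TEMPLATE = Template(
    "In total, $n_randomised participants were randomised. "
    "The mean change in HbA1c from baseline was $hba1c_change %-points "
    "(95% CI: $ci_low to $ci_high)."
)


def render_section(template: Template, stats: dict[str, str]) -> tuple[str, list[str]]:
    """Fill the template and return the text plus any unresolved placeholder names."""
    # safe_substitute leaves unknown placeholders (e.g. "$ci_high") in the text
    # instead of raising KeyError, so we can report them afterwards.
    text = template.safe_substitute(stats)
    missing = re.findall(r"\$([A-Za-z_][A-Za-z0-9_]*)", text)
    return text, missing
```

For example, with made-up values where the upper confidence limit was never supplied:

```python
stats = {"n_randomised": "100", "hba1c_change": "-1.0", "ci_low": "-1.2"}
text, missing = render_section(TEMPLATE, stats)
print(missing)  # ['ci_high']
```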
With the arrival of gen AI, Skov’s team at Novo Nordisk saw the opportunity to drive significant efficiencies in the production of CSRs. And so NovoScribe was born.
Initiating the project in mid-2023, Skov’s team reimagined their workflow with NovoScribe. They experimented with compiling the CSR dynamically, using retrieval-augmented generation (RAG) to prompt state-of-the-art large language models (LLMs) with both the statistical outputs from the clinical trials and report template text retrieved through vector embeddings.
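As a concrete picture of the RAG step, here is a minimal sketch of how a prompt might combine a statistical output with template snippets retrieved by vector search. The `Snippet` structure, field names, and prompt wording are hypothetical; the case study does not describe NovoScribe’s actual prompts.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    """A retrieved template snippet plus where it came from (hypothetical fields)."""
    text: str
    source_doc: str


def build_prompt(section: str, stats_table: dict[str, str], snippets: list[Snippet]) -> str:
    """Assemble a grounded prompt from statistics and retrieved reference text."""
    # Render the statistical output as simple "name: value" lines.
    stats_lines = "\n".join(f"- {name}: {value}" for name, value in stats_table.items())
    # Number each retrieved snippet so the model (and reviewers) can cite sources.
    context_lines = "\n".join(
        f"[{i}] ({s.source_doc}) {s.text}" for i, s in enumerate(snippets, start=1)
    )
    return (
        f"Draft the '{section}' section of a clinical study report.\n"
        "Use only the statistics and reference text below, and cite snippets by number.\n\n"
        f"Statistics:\n{stats_lines}\n\n"
        f"Reference text:\n{context_lines}\n"
    )
```

Numbering the snippets in the prompt is one way to keep each generated sentence traceable to its source, which matters when human authors must verify the draft.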
Within a few weeks, the experiments proved successful. NovoScribe produced CSRs faster and more accurately, and required fewer resources than the previous manual methods. NovoScribe was ready for prime time.
Tobias Kröpelin, NovoScribe Tech Lead and Statistical Programming Specialist at Novo Nordisk, explains the gen AI stack powering NovoScribe. “Each foundation model has its own strengths and weaknesses, so we typically experiment with a variety of different embedding and generation models for each report we compile.”
NovoScribe uses the Claude 3 and Titan foundation models hosted by Amazon Bedrock, alongside the company’s own private instance of ChatGPT. With the LangChain development and orchestration framework, the team can switch between models quickly and easily, without having to change any application code. Through RAG, the models are supplied with report data and vector embeddings managed by MongoDB Atlas Vector Search.
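LangChain makes this possible by giving its chat-model integrations a common interface (each exposes an `invoke` method), so application code depends on that interface rather than on a specific provider. The stdlib-only sketch below imitates the pattern with stand-in model classes. The class names and registry are hypothetical and call no real LLM. In an actual LangChain setup, the registry entries would construct LangChain chat model classes instead, for example the Bedrock integration from the `langchain-aws` package.

```python
from typing import Callable, Protocol


class ChatModel(Protocol):
    """Minimal interface the application relies on, mirroring LangChain's `invoke`."""

    def invoke(self, prompt: str) -> str: ...


class StubModelA:
    """Stand-in for one provider's model (hypothetical)."""

    def invoke(self, prompt: str) -> str:
        return f"[model-a] {prompt[:40]}"


class StubModelB:
    """Stand-in for another provider's model (hypothetical)."""

    def invoke(self, prompt: str) -> str:
        return f"[model-b] {prompt[:40]}"


# Switching models means changing a configuration key, not application code.
MODEL_REGISTRY: dict[str, Callable[[], ChatModel]] = {
    "model-a": StubModelA,
    "model-b": StubModelB,
}


def draft_section(prompt: str, model_name: str) -> str:
    """Generate a draft with whichever model the configuration names."""
    model = MODEL_REGISTRY[model_name]()
    return model.invoke(prompt)
```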
NovoScribe generates validated text based on defined content rules and statistical output. Atlas Vector Search calculates the similarity of each text snippet to the relevant statistics, and the selected snippets are combined with the LLM output to draft the CSR. Using Atlas Vector Search, the relevant text is selected with a high degree of precision and accuracy. The full lineage of all sources is presented so that authors can verify accuracy, which eliminates weeks of writing and review.
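The selection step rests on vector similarity. Atlas Vector Search computes this server-side. The stdlib-only sketch below only illustrates the idea: it ranks candidate snippets against a query embedding with cosine similarity and keeps each snippet’s source so authors can trace it. The three-dimensional vectors and document IDs are toy values, and cosine is just one of the similarity functions Atlas supports (alongside Euclidean and dot product).

```python
import math
from typing import NamedTuple


class Candidate(NamedTuple):
    text: str
    source_doc: str         # lineage: which document the snippet came from
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], candidates: list[Candidate], k: int = 2) -> list[tuple[float, str, str]]:
    """Return (score, text, source_doc) for the k most similar snippets."""
    scored = [(cosine(query, c.embedding), c.text, c.source_doc) for c in candidates]
    return sorted(scored, key=lambda t: t[0], reverse=True)[:k]
```

Usage with toy data:

```python
snippets = [
    Candidate("Primary endpoint wording", "template-v3", [0.9, 0.1, 0.0]),
    Candidate("Safety summary wording", "template-v3", [0.0, 1.0, 0.0]),
    Candidate("Baseline characteristics", "study-x-draft", [0.7, 0.7, 0.0]),
]
for score, text, source in top_k([1.0, 0.0, 0.0], snippets):
    print(f"{score:.2f}  {text}  (from {source})")
```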
“What’s great about MongoDB Atlas is that we can store native vector embeddings of the report right alongside all of their associated text snippets and metadata,” says Kröpelin. “This means we can run really powerful and complex queries quickly. For each vector embedding we can filter on which source document it's coming from, who wrote it, and when.”
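In Atlas, a filtered query like this is expressed with the `$vectorSearch` aggregation stage, whose `filter` option restricts candidates by metadata before they are ranked. The sketch below builds such a pipeline as plain Python dictionaries. The index name, the field names (`embedding`, `source_doc`, `author`, `created_at`), and the candidate multiplier are assumptions. Each filtered field would also need to be declared as a `filter` field in the vector search index definition. The resulting list could be passed to PyMongo’s `Collection.aggregate`.

```python
from datetime import datetime


def build_vector_search(
    query_vector: list[float],
    source_doc: str,
    authors: list[str],
    since: datetime,
    limit: int = 5,
) -> list[dict]:
    """Build a $vectorSearch pipeline filtered by source document, author, and date.

    Index and field names are hypothetical placeholders.
    """
    return [
        {
            "$vectorSearch": {
                "index": "snippet_vector_index",
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": limit * 20,  # oversample to improve recall (assumed ratio)
                "limit": limit,
                "filter": {
                    "$and": [
                        {"source_doc": {"$eq": source_doc}},
                        {"author": {"$in": authors}},
                        {"created_at": {"$gte": since}},
                    ]
                },
            }
        },
        {
            # Return the snippet, its lineage metadata, and the similarity score.
            "$project": {
                "_id": 0,
                "text": 1,
                "source_doc": 1,
                "author": 1,
                "created_at": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With a PyMongo `collection` connected to Atlas, this would run as `collection.aggregate(build_vector_search(vec, "template-v3", ["stats-team"], datetime(2024, 1, 1)))`.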
Figure 2: NovoScribe cloud-native architecture
At the outset of the NovoScribe project, Kröpelin and the Novo Nordisk Statistics team started with the relational databases they typically used in their day-to-day work. But it quickly became obvious that the data model needed to feed both statistical outputs and report text into the LLMs was hugely complex and nowhere near flexible enough to cope with the pace of NovoScribe’s rapid feature development.
Kröpelin says, “Working with the tabular model of our traditional relational database, we would have ended up with dozens of separate tables, each with just a couple of columns. These looked nothing like the Python dictionaries my team were working with in code, which slowed down our development velocity. What also slowed us down was that we couldn’t make any changes to our application without complex schema migrations in the database. And then joining all of these tables at query time to prompt the LLMs crippled application performance and user experience.”
Beyond relational databases, Kröpelin’s team was also familiar with MongoDB and quickly recognized that its document data model would provide the ease of use, flexibility, and speed NovoScribe demanded. A single call from the MongoDB Python driver can retrieve the entire object, including the source text snippet, its vector embedding, and its metadata, without the overhead of joining data.
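To illustrate the contrast, one snippet’s text, embedding, and metadata can live in a single document that maps directly onto a Python dictionary, and PyMongo’s `find_one` returns it in one call. The document shape and field names below are hypothetical. The `FakeCollection` class is a stand-in so the sketch runs without a database; against a real deployment you would pass a PyMongo `Collection` instead.

```python
from typing import Any, Optional

# Hypothetical shape of one stored snippet: text, embedding, and metadata together.
SNIPPET_DOC: dict[str, Any] = {
    "_id": "snip-001",
    "text": "The primary endpoint was change from baseline in HbA1c.",
    "embedding": [0.12, -0.04, 0.33],  # toy 3-dimensional vector for illustration
    "metadata": {
        "source_doc": "csr-template-v3",
        "author": "technical-writing",
        "created_at": "2024-01-15",
    },
}


class FakeCollection:
    """Minimal stand-in mimicking PyMongo's Collection.find_one for _id lookups."""

    def __init__(self, docs: list[dict[str, Any]]):
        self._docs = {d["_id"]: d for d in docs}

    def find_one(self, filter: dict[str, Any]) -> Optional[dict[str, Any]]:
        return self._docs.get(filter.get("_id"))


def load_snippet(collection, snippet_id: str) -> Optional[dict[str, Any]]:
    # One round trip returns the whole object; no joins across tables are needed.
    return collection.find_one({"_id": snippet_id})
```

In a normalized relational design, the same object might be spread across snippet, embedding, and metadata tables and reassembled with joins on every read.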
In addition to programmatic access, MongoDB Compass is available for non-developer team members to view and filter data stored in MongoDB via a GUI, enabling them to review the data set’s completeness before serving it to the LLMs.
By using the fully managed MongoDB Atlas service, Novo Nordisk gets the mission-critical assurances it needs to run highly regulated applications. As Waheed Jowiya, Digitalisation Strategy Lead at Novo Nordisk, says, “Security and disaster recovery are non-negotiable. We have VPC access via Atlas’ support for AWS PrivateLink. In addition, fine-grained access controls, auditing, end-to-end data encryption, and backups are all standard Atlas features, configured with simple API calls.”
Jowiya goes on to say, “We have a small team, so the operational automation provided by MongoDB Atlas is invaluable. It also gives us optionality. NovoScribe runs on AWS today, but as a company, we also have a relationship with Azure. Through its multi-cloud support, we can run Atlas between both hyperscale platforms with complete freedom and no lock-in.”
Jowiya adds that the LLMs take just minutes to generate the CSR from the data retrieved from MongoDB Atlas. The rest of the time is spent in QA. Highly skilled team members no longer have to pull the data together or double-check that they are cutting and pasting the right statistics into the appropriate section of the report. The gen AI models now automate that process, freeing the team to focus on driving more breakthrough research and development.
For Novo Nordisk, NovoScribe is just the start. Beyond CSRs, the company is exploring many new opportunities to apply gen AI in every part of its business, with MongoDB Atlas at the core of its efforts.
To learn more about how others are innovating with AI, check out the Building AI with MongoDB case study series. You can also register for MongoDB Atlas and visit the Atlas Vector Search Quick Start guide to start building smarter searches or get started on gen AI in your next project.