A practical guide for pharma professionals, and for operational-excellence and automation engineers, deploying AI in regulated, high-precision environments.
Project managers in pharma are navigating a critical tension: harness AI's potential to accelerate processes and improve outcomes while maintaining the compliance, safety, and quality standards that define our industry.
The stakes are high—patient safety, regulatory approval, and scientific integrity all depend on getting this right.
At Acodis, we've distilled our experience deploying machine learning in regulated pharma environments into 10 essential attributes to find the right trade-offs. These aren't theoretical ideals - they're pragmatic requirements that separate successful, compliant ML deployments from those that stall in pilot purgatory or, worse, create regulatory risk.
What makes this different: we focus on operational realities, solution impact, and overall safety, beyond the specific GxP validation, regulatory, and GAMP 5 frameworks. These attributes reflect what actually works in production - balancing innovation with GxP requirements, enabling teams without overwhelming them, and building systems that auditors can understand and trust.
The Top 10 Attributes for Impactful and Reliable AI Applications
Data quality determines everything downstream. 70% of the top 20 Pharma companies place data quality as the #1 issue for successful AI implementations[1]. AI-ready data means structured, standardized datasets with proper metadata, controlled vocabularies (MedDRA, SNOMED), and clear lineage from source systems.
This requires investment before model development begins. For project managers: allocate 30-40% of your timeline to data preparation. The alternative is building models on unstable foundations that fail at validation or in production.
Key deliverables:
Model control covers specificity, update control, and versioning.
Model specificity means pre-trained models that get fine-tuned on the company's specific documents, language and use-cases. This is essential for performance.
For safety and reliability, update control is essential. Models are essentially frozen until approved team members push a model update based on pre-tested performance levels. A pre-defined set of documents, instructions, and correct answers is used to measure performance across versions.
Of course, model versioning with the ability to roll back completes the picture. Being able to push updates and rollbacks through a simple, no-code user interface is essential for day-to-day practicality.
Not all AI systems carry the same risk profile. LLMs are black-box and probabilistic by nature - they can produce different outputs for the same input. However, a machine learning model trained on a controlled data set is deterministic and reproducible.
In regulated environments, this distinction is critical. Deterministic ML provides the reliability and safety that GxP processes require: same input, same output, every time. Since inputs are rarely the same, what also matters is predictability: small changes in inputs mean small changes in outputs.
Getting the best of both worlds:
The right choice depends on your use case. Tools that enable hybrid approaches—GenAI flexibility where safe, deterministic ML where required—give you innovation without compromising compliance.
AI systems for pharma and other regulated industries must provide transparency on answer quality. This is achieved through confidence scores attached both to individual data points and to entire documents.
Confidence scores are not perfect on their own – is 97% good enough? In which cases? It takes a comparison between confidence scores and a golden-truth or reference dataset to benchmark which score should trigger which action (reprocessing, human review, blind approval).
Since confidence does not mean correctness, for the most sensitive use cases we also advise companies to run separate checks based on a predefined checklist and/or predefined triggers - for instance for unusual documents or inputs, where the model's own confidence level might not be accurate.
Practical implementation:
The result: Teams focus expert time where it's needed most, while routine cases flow through efficiently. This transforms AI from a replacement threat into a productivity multiplier.
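The routing described above can be sketched in a few lines. The thresholds and action names here are illustrative assumptions - in practice they are calibrated per use case against the golden-truth dataset:

```python
# Illustrative thresholds -- in practice these are calibrated against a
# golden-truth / reference dataset, per use case and per field type.
REVIEW_THRESHOLD = 0.90
REPROCESS_THRESHOLD = 0.60


def route(confidence: float) -> str:
    """Map a field-level confidence score to a handling action."""
    if confidence < REPROCESS_THRESHOLD:
        return "reprocess"      # model is clearly unsure; retry or escalate
    if confidence < REVIEW_THRESHOLD:
        return "human_review"   # expert-in-the-loop verification
    return "auto_approve"       # routine case flows through efficiently
```

This is how expert time concentrates on the middle band while high-confidence routine cases pass straight through.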
Human validation requirements should be adapted based on use case, model type, and confidence scores. AI systems must provide flexible expert-in-the-loop workflows that make verification easy and track validation steps for audit purposes.
This isn't one-size-fits-all. A safety signal detection system needs different oversight than a document routing tool.
Design considerations:
The goal: human-in-the-loop (HITL) steps that are pleasant for the user and efficient to use, with clear traceability that satisfies both operational needs and regulatory requirements.
See how Acodis applies these principles in practice
Get a free audit of your document processes and a tailored implementation roadmap.
Traceability is easier said than done. As documents and information move from one system to the next, it becomes unclear which version is the original source and which is the latest truth. AI systems for regulated teams must provide traceability all the way back to the source.
Users shouldn't need to dig through logs or launch separate queries to understand where information came from. Source attribution should be visible, clickable, and instantaneous. This requires AI-ready data and feeds directly into the effectiveness and reliability of the HITL step.
Implementation best practices:
When experts can verify sources in seconds, validation becomes faster, trust builds naturally, and compliance is maintained. Traceability isn't a feature - it's the foundation of responsible AI deployment.
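One way to make source attribution travel with the data, rather than live in a separate log, is to attach provenance to every extracted value. The field names below are illustrative assumptions, not a fixed schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedValue:
    """An extracted data point that always carries its provenance.

    Illustrative sketch: the point is that source attribution is part of
    the value itself, so an expert can verify it in seconds.
    """
    value: str
    source_document: str   # identifier of the original source file
    page: int              # where in that document the value was found
    snippet: str           # surrounding text shown to the reviewer

    def citation(self) -> str:
        """A human-readable, clickable-style reference back to the source."""
        return f"{self.source_document}, p. {self.page}"
```

With this shape, a review UI can render the citation and snippet next to each value instead of forcing the reviewer to query another system.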
Not all use cases carry equal risk. Based on use case risks and AI system type (see for instance the ISPE risk severity matrix for AI and ML systems[2]), pharma companies need nuanced implementation strategies.
The key questions to ask:
Smart risk-based deployment means adjusting the implementation design - and the validation effort - to the required impact, with the long-term view in mind.
Golden-truth or reference datasets must be used to track performance over time and across model versions. It is typically best to define such datasets at the start and adjust them as the use case evolves.
Without ongoing monitoring, model performance silently decays - not because the models are getting tired, but because the data they process is evolving, so the gap between the data they were trained on and the data they handle every day keeps widening. New document types or new language may be introduced that aren't reflected in the training set. This isn't a huge problem; it simply requires refreshing the training set with a sample of the new input types.
For applications that are "in production" with adequate HITL validation, this is easy: the experts, by definition, create new reference data that can be fed back into the training set.
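The monitoring loop can be sketched as scoring each model version against the fixed reference set and flagging decay beyond a tolerance. The function names, baseline, and tolerance are illustrative assumptions:

```python
def reference_accuracy(model, reference_set):
    """Score a model version against the golden-truth reference set.

    `model` is any callable mapping an input document to a prediction;
    `reference_set` is a list of (document, expected_answer) pairs.
    """
    correct = sum(1 for doc, expected in reference_set if model(doc) == expected)
    return correct / len(reference_set)


def check_drift(model, reference_set, baseline, tolerance=0.02):
    """Flag silent decay: accuracy below the baseline minus a tolerance band."""
    score = reference_accuracy(model, reference_set)
    return score, score < baseline - tolerance
```

Run on a schedule (and after every training-set refresh), this yields the trend line across versions that the attribute calls for.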
Establish monitoring framework:
Software provides better tracking than traditional paper processes—but only if done right. If everything is tracked but buried in unclear event logs, it doesn't help anyone.
Yes to time-stamped, account-based tracking. But also yes to formats that are easy to access and understand—by users, reviewers, and auditors alike.
Effective audit trail characteristics:
Good documentation serves two masters: operational needs (finding information quickly) and compliance needs (proving system integrity). Design for both from day one.
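A minimal sketch of such an audit event - time-stamped, account-based, and readable without log-diving. The schema is an illustrative assumption, not a prescribed format:

```python
import json
from datetime import datetime, timezone


def audit_event(user: str, action: str, target: str, detail: str) -> str:
    """A time-stamped, account-based audit entry in a human-readable format.

    Illustrative schema: the goal is that users, reviewers, and auditors
    can read an event directly, instead of digging through raw system logs.
    """
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "user": user,
        "action": action,
        "target": target,
        "detail": detail,
    }
    return json.dumps(event)
```

Because each entry is self-describing, the same record serves both masters: operational search and audit evidence.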
For data privacy and security, favor dedicated instances with minimal endpoints to external systems. Combine this with strong access management featuring tiered permissions and regular review of access rights.
Ring-fencing means isolation. Your AI systems operate in controlled environments, not on shared infrastructure where a breach in one area compromises everything.
Security architecture:
This controlled approach protects patient data, maintains competitive intelligence, satisfies regulatory requirements (GDPR, HIPAA), and gives auditors confidence that your systems are properly secured.
Machine learning in pharma isn't about replacing human expertise—it's about amplifying it. Done right, ML accelerates discovery, improves quality, shortens time-to-market, and strengthens compliance.
This framework is our view on how to do it right. These 10 attributes aren't a checklist to complete once—they're design principles to embed from day one and revisit as your systems scale. Teams that get this right don't just deploy AI faster; they build the organisational trust that allows them to go further.
And, of course, all of this must work in line with the latest regulations and guidance on computerized and AI systems.
[1] DISRUPT-DS Roundtable (Senderovitz T., Weatherall J., Rochon J. et al., representing Novo Nordisk, AstraZeneca, Boehringer Ingelheim, Novartis, Merck & Co, Eli Lilly, Pfizer, Sanofi, Bayer, Gilead, AbbVie, Genentech, GSK, BCG and others). Generative AI in pharmaceutical R&D: From large language models to AI agents to regulation. Drug Discovery Today, Vol. 31, Issue 1, January 2026, 104593. doi:10.1016/j.drudis.2025.104593
[2] Blumenthal R., Erdmann N., Heitmann M., Lemettinen A.-L., Stockton B. Machine Learning Risk and Control Framework.
Want to apply these principles to your specific use case?
Acodis helps pharmaceutical and life sciences companies deploy AI safely in regulated environments. Book a free 30-minute consultation and we'll map these attributes to your actual processes and compliance requirements.
Prefer email? Reach us at erik.cervilla@acodis.io