A practical guide for pharma professionals, operational excellence leads, and automation engineers deploying AI in regulated, high-precision environments.
Project managers in pharma are navigating a critical tension: harness AI's potential to accelerate processes and improve outcomes while maintaining the compliance, safety, and quality standards that define our industry.
The stakes are high—patient safety, regulatory approval, and scientific integrity all depend on getting this right.
At Acodis, we've distilled our experience deploying machine learning in regulated pharma environments into 10 essential attributes that help you find the right trade-offs. These aren't theoretical ideals - they're pragmatic requirements that separate successful, compliant ML deployments from those that stall in pilot purgatory or, worse, create regulatory risk.
What makes this different: We focus on operational realities, solution impact, and overall safety, beyond the specifics of GxP validation, regulatory, and GAMP 5 frameworks. These attributes reflect what actually works in production - balancing innovation with GxP requirements, enabling teams without overwhelming them, and building systems that auditors can understand and trust.
10 Essential Attributes
1. AI-Ready Data
Data quality determines everything downstream. 70% of the top 20 Pharma companies place data quality as the #1 issue for successful AI implementations[1]. AI-ready data means structured, standardized datasets with proper metadata, controlled vocabularies (MedDRA, SNOMED), and clear lineage from source systems.
This requires investment before model development begins. For project managers, allocate 30-40% of your timeline to data preparation. The alternative is building models on unstable foundations that fail at validation or collapse in production.
Key deliverables:
- Standardised data formats, typically XML
- Data chunked in small components with adequate context (metadata)
- Validation of data quality (see confidence scores below)
- Traceability back to the source maintained through the pipeline
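As a minimal illustration, the deliverables above can be captured in a single chunk record that carries its own context and lineage. The field names here are our own illustrative choices, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """One small unit of AI-ready data with context and lineage."""
    text: str                    # the content itself
    source_doc: str              # identifier of the original document
    page: int                    # location within the source
    section: str                 # surrounding context (metadata)
    vocabulary: dict = field(default_factory=dict)  # e.g. MedDRA codes

# Every chunk stays traceable back to its source through the pipeline
c = Chunk(text="Adverse event: headache",
          source_doc="CSR-2021-004.xml",
          page=12, section="Safety Results",
          vocabulary={"MedDRA_PT": "Headache"})
assert c.source_doc == "CSR-2021-004.xml"
```

Because lineage travels with the data itself, downstream steps (validation, review, audit) never lose the link back to the source system.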
2. Model Control
Model control comprises specificity, update control, and versioning.
Model specificity means pre-trained models that get fine-tuned on the company's specific documents, language and use-cases. This is essential for performance.
For safety and reliability, update control is essential. Models stay essentially frozen until approved team members push a model update based on pre-tested performance levels. A pre-defined set of documents, instructions, and correct answers is used to measure performance across versions.
Of course, model versioning with the ability to roll back completes the picture. Being able to push updates and rollbacks through a simple, no-code user interface is essential for day-to-day practicality.
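The update-control gate described above can be sketched in a few lines. This is a simplified illustration, assuming exact-match scoring against the frozen reference set; the threshold value is an example, not a recommendation:

```python
def passes_gate(candidate_outputs, reference_answers, threshold=0.95):
    """Promote a model version only if its accuracy on the frozen
    reference set meets the pre-agreed threshold."""
    correct = sum(p == t for p, t in zip(candidate_outputs, reference_answers))
    return correct / len(reference_answers) >= threshold

# The current model stays frozen; a candidate must be pre-tested:
refs = ["A", "B", "C", "D"]
assert passes_gate(["A", "B", "C", "D"], refs)      # 100% accuracy: promote
assert not passes_gate(["A", "B", "X", "Y"], refs)  # 50% accuracy: stay frozen
```

The same reference set is reused for every candidate version, which is what makes comparisons across versions fair.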
3. Deterministic Models
Not all AI systems carry the same risk profile. LLMs are black-box and probabilistic by nature - they can produce different outputs for the same input. By contrast, a machine learning model trained on a controlled dataset and then frozen is deterministic and reproducible.
In regulated environments, this distinction is critical. Deterministic ML provides the reliability and safety that GxP processes require: same input, same output, every time. Since inputs are rarely the same, what also matters is predictability: small changes in inputs mean small changes in outputs.
Getting the best of both worlds:
- In production for sensitive cases: stick to deterministic ML
- Enable GenAI only on non-GxP use cases with full human review or for low-risk applications where variation is acceptable
- Alternatively, use GenAI in controlled "setup phases" and then only use ML in production
The right choice depends on your use case. Tools that enable hybrid approaches—GenAI flexibility where safe, deterministic ML where required—give you innovation without compromising compliance.
4. Confidence Scores
AI systems for Pharma and regulated industries must provide transparency on answer quality. This is achieved through confidence scores that apply both to individual data points and to entire documents.
Confidence scores are not perfect on their own – is 97% good enough? In which cases? It takes a comparison between confidence scores and a golden-truth or reference dataset to benchmark which score should trigger what: reprocessing, human review, or automatic approval.
Since confidence does not mean correctness, for the most sensitive use cases we also advise companies to run separate checks based on predefined checklists and/or predefined triggers, for instance for unusual documents or inputs, where the model's own confidence level might not be accurate.
Practical implementation:
- Establish confidence thresholds based on reference data sets
- Apply scores at granular level (field-by-field) and document level
- Create risk-stratified workflows: high confidence auto-processes, low confidence requires review
- Monitor score calibration over time to ensure accuracy
- Run separate human-based quality checks on the most sensitive or out-of-the-norm cases
The result: Teams focus expert time where it's needed most, while routine cases flow through efficiently. This transforms AI from a replacement threat into a productivity multiplier.
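The risk-stratified workflow above can be sketched as a simple routing function. The thresholds here are purely illustrative; in practice they come from benchmarking scores against your golden-truth dataset, as discussed above:

```python
def route(confidence: float, high: float = 0.97, low: float = 0.80) -> str:
    """Risk-stratified routing based on benchmarked confidence thresholds."""
    if confidence >= high:
        return "auto-process"    # high confidence flows through
    if confidence >= low:
        return "human-review"    # mid confidence gets expert attention
    return "reprocess"           # low confidence goes back for rework

assert route(0.99) == "auto-process"
assert route(0.90) == "human-review"
assert route(0.50) == "reprocess"
```

Separate human-based checks for sensitive or out-of-the-norm cases would sit outside this function, triggered by predefined rules rather than by the score alone.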
5. Expert in the Loop for Validation
Human validation requirements should be adapted based on use case, model type, and confidence scores. AI systems must provide flexible expert-in-the-loop workflows that make verification easy and track validation steps for audit purposes.
This isn't one-size-fits-all. A safety signal detection system needs different oversight than a document routing tool.
Design considerations:
- Adapt review depth to risk level and confidence scores
- Interfaces should provide easy, side-by-side visual verification
- Capture rationale for approvals and rejections
- Maintain complete audit trails of who validated what, when, and why
The goal: Human-in-the-loop (HITL) steps that are pleasant for the user and efficient to run, with clear traceability that satisfies both operational needs and regulatory requirements.
See how Acodis applies these principles in practice
Get a free audit of your document processes and a tailored implementation roadmap.
6. Traceability of Content and Sources
Traceability is easier said than done. When documents and information move from one system to the next, it becomes unclear which is the original source and which version is the latest truth. AI systems for regulated teams must provide traceability all the way back to the source.
Users shouldn't need to dig through logs or launch separate queries to understand where information came from. Source attribution should be visible, clickable, and instantaneous. This requires AI-ready data and feeds directly into the effectiveness and reliability of the HITL step.
Implementation best practices:
- Highlight source text directly in documents
- Provide one-click navigation to original sources
- Show context around extracted data (surrounding paragraphs, section headers)
- Display metadata (document version, page number, extraction timestamp)
When experts can verify sources in seconds, validation becomes faster, trust builds naturally, and compliance is maintained. Traceability isn't a feature - it's the foundation of responsible AI deployment.
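As a small sketch of the metadata display described above, source attribution can be rendered as a single human-readable label. The dictionary keys here are illustrative assumptions, not a fixed schema:

```python
def attribution(meta: dict) -> str:
    """Render source metadata as an instantly visible label so experts
    can verify provenance in seconds, without digging through logs."""
    return (f"{meta['doc']} v{meta['version']}, page {meta['page']} "
            f"(extracted {meta['timestamp']})")

label = attribution({"doc": "Protocol-017", "version": "2.1",
                     "page": 4, "timestamp": "2024-05-01T09:30Z"})
assert label.startswith("Protocol-017 v2.1, page 4")
```

In a real interface the label would also be a one-click link back to the highlighted passage in the original document.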
7. Risk-Based Deployment
Not all use cases carry equal risk. Based on use case risks and AI system type (see for instance the ISPE risk severity matrix for AI and ML systems[2]), pharma companies need nuanced implementation strategies.
The key questions to ask:
- Is this use case GxP-relevant?
- What's the risk level to patient safety and data integrity?
- Does the value of the system justify the cost of GxP validation?
- How can different implementation approaches lower the risk and validation burden while still delivering most of the impact?
- Is it possible to phase the implementation, starting with basic functions and HITL steps that don't cross the GxP line, before moving on to fuller implementation and validation?
Smart risk-based deployment means weighing the implementation design against both the required impact and the validation effort, with the long-term view in mind.
8. Ongoing Performance Monitoring
Golden-truth or reference datasets must be used to track performance over time and across model versions. It is typically best to define these datasets at the start and adjust them as the use case evolves.
Without ongoing monitoring, model performance silently decays. Not because the models are getting tired, but because the data they process is evolving: the gap between the data they were trained on and the data they handle every day keeps widening. New types of documents or new language may be introduced that aren't reflected in the training set. This isn't a huge problem; it simply requires refreshing the training set with a sample of the new input types.
For applications that are in production with adequate HITL validation, this is easy: the experts, by definition, create new reference data that can be included in the training set.
Establish monitoring framework:
- Create golden truth datasets at project start—expert-validated, representative, version-controlled
- Test all model versions against the same benchmark for fair comparison
- Track performance metrics monthly (accuracy, precision, recall, domain-specific KPIs)
- Set alert thresholds for degradation (e.g., >5% drop triggers investigation)
- Update training from time to time with outputs validated by the experts in the normal course of business
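The degradation alert in the framework above reduces to a one-line check. The 5% threshold matches the example in the list; treat it as a starting point to tune per use case:

```python
def needs_investigation(baseline: float, current: float,
                        max_drop: float = 0.05) -> bool:
    """Flag silent decay: trigger when a metric falls more than
    max_drop below its golden-truth baseline."""
    return (baseline - current) > max_drop

assert not needs_investigation(0.95, 0.93)  # 2-point dip: within tolerance
assert needs_investigation(0.95, 0.88)      # 7-point drop: investigate
```

Run the same check monthly for each tracked metric (accuracy, precision, recall, domain-specific KPIs) against the same benchmark, so alerts are comparable across model versions.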
9. Audit Trails & Documentation
Software provides better tracking than traditional paper processes—but only if done right. If everything is tracked but buried in unclear event logs, it doesn't help anyone.
Yes to time-stamped, account-based tracking. But also yes to formats that are easy to access and understand—by users, reviewers, and auditors alike.
Effective audit trail characteristics:
- Comprehensive: Capture all critical system activities
- Tamper-evident: Immutable logs that can't be altered post-facto
- Searchable: Filter by user, date, document, action type
- Exportable: Generate reports for inspections and internal audits
- Human-readable: Translate technical events into business language
Good documentation serves two masters: operational needs (finding information quickly) and compliance needs (proving system integrity). Design for both from day one.
10. Ring-Fenced Deployment & Role-Based Access
For data privacy and security, favor dedicated instances with minimal endpoints to external systems. Combine this with strong access management featuring tiered permissions and regular review of access rights.
Ring-fencing means isolation. Your AI systems operate in controlled environments, not on shared infrastructure where a breach in one area compromises everything.
Security architecture:
- Deploy in dedicated instances or isolated network segments
- Minimize API endpoints and external connections
- Implement role-based access control with least-privilege principle
- Conduct quarterly access reviews to remove unused permissions
- Log all access attempts and data movements
This controlled approach protects patient data, maintains competitive intelligence, satisfies regulatory requirements (GDPR, HIPAA), and gives auditors confidence that your systems are properly secured.
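The least-privilege principle above can be sketched as a deny-by-default role map. The roles and actions are illustrative examples only:

```python
# Each role maps to the minimal set of permitted actions;
# anything not explicitly granted is denied.
ROLES = {
    "reviewer": {"read", "annotate"},
    "approver": {"read", "annotate", "approve"},
    "admin":    {"read", "annotate", "approve", "manage_users"},
}

def allowed(role: str, action: str) -> bool:
    """Deny by default: unknown roles and ungranted actions fail."""
    return action in ROLES.get(role, set())

assert allowed("approver", "approve")
assert not allowed("reviewer", "approve")  # not granted, so denied
assert not allowed("guest", "read")        # unknown role, so denied
```

Quarterly access reviews then amount to diffing this map against actual usage and removing grants nobody exercises.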
Conclusion
Machine learning in pharma isn't about replacing human expertise—it's about amplifying it. Done right, ML accelerates discovery, improves quality, shortens time-to-market, and strengthens compliance.
This framework is our view on how to do it right. These 10 attributes aren't a checklist to complete once—they're design principles to embed from day one and revisit as your systems scale. Teams that get this right don't just deploy AI faster; they build the organisational trust that allows them to go further.
And, of course, all of this sits within the latest regulations and guidance on computerised systems and AI.
References
[1] DISRUPT-DS Roundtable (Senderovitz T., Weatherall J., Rochon J. et al., representing Novo Nordisk, AstraZeneca, Boehringer Ingelheim, Novartis, Merck & Co, Eli Lilly, Pfizer, Sanofi, Bayer, Gilead, AbbVie, Genentech, GSK, BCG and others). Generative AI in pharmaceutical R&D: From large language models to AI agents to regulation. Drug Discovery Today, Vol. 31, Issue 1, January 2026, 104593. doi:10.1016/j.drudis.2025.104593
[2] Blumenthal R., Erdmann N., Heitmann M., Lemettinen A.-L., Stockton B. "Machine Learning Risk and Control Framework". ISPE.
Want to apply these principles to your specific use case?
Acodis helps pharmaceutical and life sciences companies deploy AI safely in regulated environments. Book a free 30-minute consultation and we'll map these attributes to your actual processes and compliance requirements.
Prefer email? Reach us at erik.cervilla@acodis.io