Evaluating Efficiency Gains and Security of LLM-Driven Test Generation for Computerised System Validation: A Compliance-Focused Analysis of Life Sciences Testing Processes

Authors
Vladimirov, Daniil
Issue Date
2025
Type
Thesis
Abstract

Pharmaceutical computerized system validation (CSV) remains documentation-intensive, consuming substantial project effort and impeding Pharma 4.0 adoption. The CSV market grew to $3.92B in 2024 and is projected to reach $14.02B by 2037, which indicates the scale of the optimization opportunity. This thesis addressed the tension between regulatory assurance and agility by developing and empirically evaluating a compliance-aware framework that uses Large Language Models to automate Operational Qualification (OQ) test generation from User Requirements Specifications (URS) under GAMP 5 (2nd ed.), 21 CFR Part 11, EU Annex 11, and ALCOA+ constraints. The methodology employed a five-agent, event-driven architecture (GAMP classifier, context provider, research analyst, SME consultant, OQ generator) with confidence-gated handoffs, a fail-closed no-fallback policy, and full audit trails. The evaluation used 30 synthetic URS documents spanning GAMP Categories 3–5, K=5 self-consistency, risk-based scoring aligned to ALCOA+, and predefined quantitative metrics. Results demonstrated 96.7% requirements coverage (target ≥95%), 91.3% categorization accuracy, and an average processing time of 7.4 minutes per document. Migration to the open-source DeepSeek model reduced cost by 91% while preserving performance. Security controls achieved 100% semantic preservation with zero unsafe transformations. However, end-to-end completion was 76.7%, below the 90% reliability target, indicating variance and edge-case sensitivity. This research contributes the Compliance-Aware AI Engineering paradigm, which establishes regulatory constraints as first-class design parameters, and validates a practical multi-agent architecture for auditable, GxP-aligned OQ generation. In practice, the framework offers a staged implementation path with measurable efficiency gains and clear governance (traceability, authority checks, documentation) suitable for regulated deployment. Future work should focus on variance reduction via reproducible multi-run protocols, expanded adversarial testing, and extension beyond OQ to IQ/PQ and multilingual corpora.
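
To make the combination of K=5 self-consistency, confidence-gated handoffs, and the fail-closed no-fallback policy more concrete, the following is a minimal Python sketch and not the thesis implementation. It assumes that the GAMP classifier is sampled K times, that confidence is the share of samples agreeing with the majority category, and that the gate threshold is 0.8. The names `gate_category`, `GateResult`, and `HandoffRejected`, as well as the threshold value, are hypothetical and do not come from the abstract.

```python
from collections import Counter
from dataclasses import dataclass


class HandoffRejected(Exception):
    """Raised when the gate refuses to pass a result downstream (fail-closed)."""


@dataclass(frozen=True)
class GateResult:
    category: int      # majority GAMP category across the K samples
    confidence: float  # share of samples that agree with the majority
    votes: tuple       # raw samples, retained so the decision can be audited


def gate_category(samples, threshold=0.8):
    """Majority-vote K classifier samples and gate the handoff on agreement.

    `threshold` is an assumed value for illustration only.
    There is deliberately no fallback category: low agreement raises
    instead of guessing, which is what "fail-closed" means here.
    """
    if not samples:
        raise HandoffRejected("no samples to vote on")
    category, top_count = Counter(samples).most_common(1)[0]
    confidence = top_count / len(samples)
    if confidence < threshold:
        raise HandoffRejected(
            f"agreement {confidence:.2f} is below threshold {threshold:.2f}"
        )
    return GateResult(category, confidence, tuple(samples))
```

With K=5, four agreeing samples out of five give a confidence of 0.8 and pass the assumed gate, whereas a 2-2-1 split is rejected and the document would stop at that step rather than continue with a guessed category. Under this reading, a strict gate trades end-to-end completion for assurance.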
