Monitoring of AI systems requires a multidisciplinary approach that includes technical, clinical, and safety-related aspects. The organization should identify and implement measures to enable the detection and investigation of changes in system behavior [149].
AI systems rely on robust technical infrastructure, which must be continuously monitored to protect against threats that could affect system stability and reliability. Clinical monitoring ensures that the AI system fulfills its intended clinical purpose and provides value to healthcare professionals and patients. Security incidents may have direct clinical consequences, such as misdiagnosis or incorrect treatment, which can endanger patient health and safety.
Cybersecurity, information security and data protection
AI systems may involve technologies that are newer or more complex than those the organization is accustomed to. As knowledge about AI system security is still developing, it should be expected that the understanding of security risks, and of best practices for managing them, will evolve over time. The threat landscape is dynamic, and it may be necessary to regularly update risk assessments and implement new security measures. The EU Medical Device Coordination Group (MDCG) [150] has published a guidance document that can support cybersecurity assessments for medical devices [151]. Additional resources related to AI security are referenced in phase 4 [152][153][154].
Security testing must be considered whenever a new version is introduced. This includes assessing whether the security testing procedures themselves need to be adapted, for example when new functionality is implemented. It is also essential to conduct regular security testing even if no changes have been made to the AI system, as this makes it possible to detect newly emerging risks, such as newly discovered vulnerabilities or unauthorized changes to the system. In addition, AI systems may pose distinct security challenges because of the specific requirements they place on monitoring and logging.
These risks can be managed through established routines for regular assessments and a structured plan for staying up to date on security threats associated with the type of AI technology the system relies on.
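One routine check that can run on a schedule, even when no changes have been announced, is to verify that the deployed model artifacts still match the checksums recorded when the version was approved, so that unauthorized changes are detected early. The sketch below is a minimal Python illustration of this idea, assuming file-based model artifacts; the path, digest value and response step are placeholders that would need to be adapted to the organization's own deployment and incident-handling procedures.

```python
import hashlib
from pathlib import Path

# SHA-256 digests recorded when each model version was approved.
# The path and digest below are placeholders for illustration only.
APPROVED_DIGESTS = {
    "models/decision_support_v1.2.onnx": "<sha256 digest recorded at approval>",
}

def file_sha256(path: Path) -> str:
    """Compute the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_deviations(approved: dict[str, str]) -> list[str]:
    """Return artifacts that are missing or no longer match their approved digest."""
    deviations = []
    for rel_path, expected in approved.items():
        path = Path(rel_path)
        if not path.exists():
            deviations.append(f"{rel_path} (missing)")
        elif file_sha256(path) != expected:
            deviations.append(f"{rel_path} (digest mismatch)")
    return deviations

if __name__ == "__main__":
    findings = find_deviations(APPROVED_DIGESTS)
    if findings:
        # In practice this would trigger the organization's incident-handling process.
        raise SystemExit(f"Unapproved changes detected: {findings}")
    print("All monitored model artifacts match their approved digests.")
```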
Performance monitoring
Like any other medical device or treatment, AI systems in use must be monitored. The AI Act will require continuous monitoring of high-risk AI systems, as well as mandatory reporting of serious incidents [155].
If the AI system is intended to operate with a high degree of autonomy within the organization, more extensive monitoring will be required. AI systems and their operational contexts are often complex, which can make it difficult to detect and address errors as they occur.
Model drift is particularly important to monitor. It refers to the deterioration of an AI model's performance caused by changes in the data or in the relationship between input and output variables, and it can result in poor predictions or incorrect decisions [156]. Monitoring for drift therefore includes tracking changes in input data, patient populations, and system outputs. Model drift has several possible causes, and the AI system must be monitored for them continuously.
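One common way to screen for drift in the input data is to compare the distribution of recently logged input features against a reference dataset from the model's validation. The sketch below is a minimal Python illustration using a two-sample Kolmogorov–Smirnov test from SciPy; the feature names, synthetic data and significance threshold are hypothetical, and drift monitoring in practice typically combines several statistical methods with clinical review.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_feature_drift(reference: np.ndarray, recent: np.ndarray,
                         feature_names: list[str], alpha: float = 0.01) -> list[str]:
    """Flag features whose recent distribution differs significantly from the reference.

    reference: samples logged when the model was validated (rows x features)
    recent:    samples from current clinical use (rows x features)
    """
    drifted = []
    for i, name in enumerate(feature_names):
        # Two-sample Kolmogorov-Smirnov test per numerical input feature.
        result = ks_2samp(reference[:, i], recent[:, i])
        if result.pvalue < alpha:
            drifted.append(name)
    return drifted

# Illustrative use with synthetic data; in practice the arrays would come from logged inputs.
rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=(5000, 2))
recent = np.column_stack([
    rng.normal(loc=0.0, scale=1.0, size=5000),  # stable feature
    rng.normal(loc=0.8, scale=1.0, size=5000),  # shifted feature, e.g. a changed patient population
])
print(detect_feature_drift(reference, recent, ["age_normalized", "lab_value_x"]))
```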
For a clinical decision support system based on AI, monitoring may include:
Technical measures for performance monitoring:
- AI systems are sensitive to the input data they receive, and control mechanisms should therefore be in place to ensure, for example, that data is complete and of the expected quality and format (see the sketch after this list).
- Monitoring of technical stability and robustness.
- Validating new model versions against a fixed test dataset, and updating the test dataset when needed.
- Monitoring changes in input data and workflows and assessing their impact on model performance.
- Implementing performance monitoring mechanisms, such as tracking accuracy and bias.
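A minimal sketch of the input control mentioned in the first item, assuming a structured input record with a few numerical fields, could look like the following; the field names, required fields and plausibility ranges are hypothetical and would have to follow the AI system's actual input specification.

```python
from dataclasses import dataclass

# Required fields and plausibility ranges are illustrative placeholders;
# the real checks must follow the AI system's input specification.
REQUIRED_FIELDS = {"patient_age", "systolic_bp", "lab_value_x"}
VALUE_RANGES = {
    "patient_age": (0, 120),
    "systolic_bp": (50, 260),
}

@dataclass
class ValidationResult:
    ok: bool
    issues: list[str]

def validate_input_record(record: dict) -> ValidationResult:
    """Check that an incoming record is complete and within expected ranges
    before it is passed to the model."""
    issues = []
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            issues.append(f"missing field: {field}")
    for field, (low, high) in VALUE_RANGES.items():
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            issues.append(f"out-of-range value for {field}: {value}")
    return ValidationResult(ok=not issues, issues=issues)

# Example: a record with a missing lab value and an implausible blood pressure.
print(validate_input_record({"patient_age": 74, "systolic_bp": 400}))
```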
Clinical measures for performance monitoring:
- Reporting errors, deviations and limitations in the use of the AI system. This includes both misuse of the AI system and issues related to its output.
- Conducting periodic validations against clinical datasets to detect potential errors that may be difficult to identify during routine use.
- Performing quality control by recording discrepancies between the AI model’s predictions and actual clinical outcomes, and evaluating whether the system’s recommendations or decisions align with clinical standards and professional guidelines (see the sketch below).
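As a sketch of how such recorded discrepancies could be summarized, the example below assumes a binary decision support output compared against retrospectively recorded clinical outcomes; the metrics shown are illustrative, and the choice of metrics and thresholds for acceptable agreement would need to be defined by the clinical environment.

```python
def clinical_agreement_summary(predictions: list[int], outcomes: list[int]) -> dict:
    """Summarize agreement between the AI system's binary recommendations and
    the clinical outcomes recorded during follow-up (1 = condition present)."""
    tp = sum(p == 1 and o == 1 for p, o in zip(predictions, outcomes))
    tn = sum(p == 0 and o == 0 for p, o in zip(predictions, outcomes))
    fp = sum(p == 1 and o == 0 for p, o in zip(predictions, outcomes))
    fn = sum(p == 0 and o == 1 for p, o in zip(predictions, outcomes))
    total = len(predictions)
    return {
        "agreement_rate": (tp + tn) / total if total else None,
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "discrepancies": fp + fn,
    }

# Illustrative review of a small set of cases; real data would come from the
# organization's quality registry or medical record system.
print(clinical_agreement_summary(
    predictions=[1, 0, 1, 1, 0, 0, 1, 0],
    outcomes=[1, 0, 0, 1, 0, 1, 1, 0],
))
```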
[149] The Code of Conduct's documents, including Security risks in artificial intelligence systems - what do we know and what can we do?
[155] Requirements for the organization in Article 26 of the AI Act: Obligations of deployers of high-risk AI systems (eur-lex.europa.eu)