FDA
FDA's assessments of medical devices that use generative AI
FDA has assessed medical devices that use generative artificial intelligence (AI) and points to several challenges and risks:
- Hallucinations: Generative AI can produce incorrect content ("hallucinations"). For example, a device designed to summarize a conversation between a patient and healthcare personnel may generate a diagnosis that was never actually discussed (a minimal, illustrative check for this kind of error is sketched after this list).
- Lack of transparency: Devices that use foundation models developed by third parties often give the manufacturer limited access to information about the models' architecture, training methods, and training data. This can make it difficult for manufacturers to ensure quality and safety.
- Continuous learning: Generative models can be static or change continuously ("continuous learning"). Continuous learning creates uncertainty about the model's performance over time, and as of November 2024 FDA had not approved any medical device that uses continuous learning.
- Challenges with scientific evidence: It can be difficult to determine what type of valid scientific evidence FDA should require in order to assess a device's safety and effectiveness throughout its lifecycle.
- Challenges with FDA's classification system: Generative AI can introduce new or different risks that challenge FDA's current classification system for medical devices. How a device is classified determines which regulatory measures are needed to ensure that it is safe and effective.
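To make the hallucination risk above concrete, the sketch below shows one minimal, purely illustrative way a manufacturer might flag diagnoses in a generated summary that have no support in the source conversation. The term list, example texts, and function name are assumptions chosen for illustration; FDA does not prescribe this or any particular method.

```python
# Illustrative post-hoc check: flag diagnosis terms that appear in an
# AI-generated consultation summary but never in the source transcript.
# The term list and texts are hypothetical examples.

DIAGNOSIS_TERMS = {"diabetes", "hypertension", "asthma", "depression"}

def unsupported_diagnoses(transcript: str, summary: str) -> set:
    """Return diagnosis terms found in the summary but not in the
    transcript -- candidate hallucinations for human review."""
    transcript_lower = transcript.lower()
    summary_lower = summary.lower()
    return {
        term for term in DIAGNOSIS_TERMS
        if term in summary_lower and term not in transcript_lower
    }

if __name__ == "__main__":
    transcript = "Patient reports a persistent cough; we discussed asthma and inhaler use."
    summary = "Patient has asthma and hypertension; inhaler prescribed."
    print(unsupported_diagnoses(transcript, summary))  # {'hypertension'}
```

A check like this can only catch terms from a predefined list; it does not replace human review of the summary, but it illustrates why summarization devices need safeguards against content the conversation never contained.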
FDA also highlights challenges related to the evaluation and testing of generative models:
- Premarket evaluation: Large language models are highly complex and can give different responses to small changes in wording or prompts, and it is not possible to test all possible prompts before launch. In addition, lack of transparency and the potential for unforeseen responses can make it particularly difficult to evaluate such systems before they reach the market. FDA therefore points to the importance of plans and methods for monitoring devices after they are put into use (postmarket monitoring). A small, illustrative prompt-robustness check is sketched after this list.
- Need for new evaluation methods: Current methods for quantitative performance evaluation may be insufficient to ensure safe use of generative AI. New, qualitative evaluation methods could help characterize a model's autonomy, transparency, and explainability. Which evaluation methods are required will depend on the product's specific intended use and design.
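The prompt sensitivity described under premarket evaluation can be illustrated with a small robustness check: run the same source text through several paraphrased prompts and measure how much the outputs diverge. Everything in the sketch (the placeholder model call, the paraphrases, the similarity measure) is an assumption made for illustration; it is not an FDA-endorsed evaluation method.

```python
# Illustrative prompt-robustness check for a generative summarization device.
# generate_summary is a placeholder for the device's actual model.

from difflib import SequenceMatcher

def generate_summary(prompt: str) -> str:
    """Placeholder for the device's generative model (hypothetical)."""
    raise NotImplementedError("plug in the actual model here")

# Hypothetical paraphrases of the same instruction.
PARAPHRASES = [
    "Summarize the consultation below for the patient record:\n{text}",
    "Write a brief clinical summary of this conversation:\n{text}",
    "List the key findings from the following consultation:\n{text}",
]

def prompt_agreement(transcript: str) -> float:
    """Average pairwise textual similarity of summaries produced from
    paraphrased prompts; low values flag wording-sensitive behavior."""
    outputs = [generate_summary(p.format(text=transcript)) for p in PARAPHRASES]
    scores = [
        SequenceMatcher(None, a, b).ratio()
        for i, a in enumerate(outputs)
        for b in outputs[i + 1:]
    ]
    return sum(scores) / len(scores)
```

A low agreement score on a set of test transcripts does not by itself show that a device is unsafe, but it signals the kind of variability that, according to FDA, makes exhaustive premarket testing infeasible and postmarket monitoring important.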
FDA recommends that manufacturers of medical devices consider the following questions, compared with a non-generative alternative:
- Will the inclusion of generative AI increase the risk classification of the medical device?
- Could a product with generative AI provide misinformation and thereby pose a risk to public health?
- Is generative AI appropriate for the intended use?