Establish a quality framework for large language models in the Norwegian health and care services

Proposed responsible party: Norwegian Directorate of Health, initially

In collaboration with: The health and care services, the Norwegian Institute of Public Health, the Norwegian Board of Health Supervision, DSA, R&D communities, the National Library of Norway, the Norwegian Digitalisation Agency, the Research Council of Norway, Innovation Norway, and others.

Relevant for: The health and care sector

Problem to be solved: It is highly uncertain what level of quality is sufficient for using large language models in the health and care sector, and how that quality should be measured. Evaluating language models is complex, and there are currently few generally accepted methods for evaluating and testing (benchmarks) large language models for the health and care sector. This also makes it difficult to choose the right language model to develop further into high-quality AI systems.

Proposal: The sector should develop and establish a common quality framework for testing and evaluating language models, to contribute to the safe use of generative AI models in the health and care sector.

Establishing the quality framework will require close collaboration among several stakeholders in the health and care sector, the public and private sectors, the research sector, and industry.

A good starting point would be low-risk administrative use cases with basic evaluation, which is the lowest level in Figure 7. The framework can then be expanded gradually to higher-risk use cases as the field matures and the EU clarifies the regulations and standards for AI and medical devices.

The work should include the following:

  1. Identify organizations that should participate in the work and clarify roles and responsibilities
  2. Map relevant existing tests (benchmarks), frameworks, best practices, and standards for testing and evaluation of large language models
  3. Systematize experience from similar work in the sector, including the Norwegian Directorate of Health's work on AI services such as Helsesvar and Easier Access to Information (ETI)
  4. Gradually develop and pilot the framework based on selected use cases

The framework can include both quantitative and qualitative evaluation methods. For further details, see chapters 4.3 and 4.4.
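
As an illustration only, the sketch below shows one way a quantitative measure and a qualitative measure could be reported side by side for a low-risk use case. It is not a proposed method: the names `EvalItem`, `exact_match_accuracy`, `mean_reviewer_score`, and `summarize`, the 1-5 reviewer scale, and both threshold values are assumptions made for this example and do not come from the framework described here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalItem:
    """One test case (hypothetical structure, for illustration only)."""
    prompt: str
    expected: str               # reference answer from a benchmark
    model_answer: str           # answer produced by the language model
    reviewer_scores: list[int]  # assumed 1-5 ratings from human reviewers


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def exact_match_accuracy(items: list[EvalItem]) -> float:
    """Quantitative measure: share of answers equal to the reference answer."""
    if not items:
        raise ValueError("at least one evaluation item is required")
    hits = sum(normalize(i.model_answer) == normalize(i.expected) for i in items)
    return hits / len(items)


def mean_reviewer_score(items: list[EvalItem]) -> float:
    """Qualitative measure: average of all human reviewer ratings."""
    scores = [s for i in items for s in i.reviewer_scores]
    if not scores:
        raise ValueError("at least one reviewer score is required")
    return mean(scores)


def summarize(items: list[EvalItem],
              min_accuracy: float = 0.9,   # assumed threshold, not from the source
              min_score: float = 4.0) -> dict:  # assumed threshold, not from the source
    """Report both measures and whether both assumed thresholds are met."""
    accuracy = exact_match_accuracy(items)
    score = mean_reviewer_score(items)
    return {
        "accuracy": accuracy,
        "mean_reviewer_score": score,
        "passed": accuracy >= min_accuracy and score >= min_score,
    }


if __name__ == "__main__":
    demo = [
        EvalItem("q1", "Oslo", "oslo", [5, 4]),
        EvalItem("q2", "B", "C", [2, 3]),
    ]
    print(summarize(demo))
```

Keeping the two measures separate, rather than merging them into a single score, makes it visible when a model does well on a benchmark but poorly in human review, or the reverse.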