Chapter 6.3 Establish shared data resources

Proposed responsible party: Several measures, with different responsible parties

In collaboration with: the Norwegian Directorate of Health (Hdir), the health and care sector, the regional health enterprises, the Norwegian Association of Local and Regional Authorities (KS), NIPH, the National Library, and DSA

Relevant for: the health and care sector

Problem to be solved: The health and care sector lacks a sufficient shared data foundation for developing, adapting, and testing language models easily and cost-effectively. This holds back the development of high-quality AI tools adapted to Norwegian conditions.

Proposal: The health and care sector should collaborate to make more high-quality data available for training (pre-training and post-training), knowledge grounding (e.g., retrieval-augmented generation, RAG), and testing of language models to be used in the health and care sector.

The work should include:

  1. mapping open and easily accessible data sources, such as guidelines and methodology books, for developing large language models for health and care services (Hdir, in collaboration with NIPH). An overview of freely available texts and other language resources covering healthcare knowledge and practice can be published on the Norwegian Directorate of Health's information page about AI
  2. making professional language data sources, such as terminologies and classifications, available for reuse together with other relevant data, including resources in Nynorsk and Sami, for example on helsedata.no, helsedirektoratet.no, or in the language bank at the National Library (Hdir)
  3. establishing data resources that constitute common national principles for values and ethics in language models in Norwegian health and care services
  4. assessing how sensitive data such as medical records, discharge summaries, and forms can be used to develop large language models, for example by clarifying regulations through guidance and by identifying technical solutions (Hdir, in collaboration with relevant (research) communities)
  5. making data sources protected by specific rights more accessible, for example, by considering establishing a compensation scheme for rights holders of healthcare content in line with national principles (see the Mimir project at the National Library [150])
  6. investigating the need for, and possible establishment of, a common infrastructure for sharing language data, for example through helsedata.no (NIPH, Hdir, health enterprises). See also the recommended measures on establishing infrastructure for computing power (chapter 5.2)

For further details, see chapter 4.3 and chapter 4.4.

Last updated: 29 July 2025