Training on professional language resources
Several professional language resources in the health domain, including terminologies and classifications, can be used to train language models.
SNOMED CT is an international, machine-readable terminology with nearly 370,000 clinical concepts. Approximately 130,000 of these concepts have been translated into Norwegian. They cover clinical areas such as anatomy, findings, symptoms, diagnoses, procedures, substances, and medicines. SNOMED CT is mainly used for documenting and exchanging patient information.
The translation is in Bokmål, and a growing share is also available in Nynorsk. The resource is multilingual in that each Norwegian term is linked directly to the corresponding English term through the concept they share.
The Norwegian translations are freely available from the Language Bank at the National Library. Internationally, the terms in SNOMED CT have been used to train several language models (pubmed.ncbi.nlm.nih.gov).
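Because Norwegian and English terms are attached to the same concepts, the translated terminology can in principle be turned into parallel English–Norwegian term pairs for training. The Python sketch below illustrates only that pairing step. It assumes the descriptions have already been loaded as `(concept_id, language_code, term)` tuples; the rows, the placeholder IDs `C001`–`C003`, the language codes `"en"`/`"nb"`, and the function name `parallel_pairs` are all hypothetical, and the actual SNOMED CT release files use a richer format.

```python
from collections import defaultdict

# Hypothetical toy rows: (concept_id, language_code, term).
# Placeholder IDs and language codes; this is NOT the SNOMED CT release file format.
DESCRIPTIONS = [
    ("C001", "en", "Myocardial infarction"),
    ("C001", "nb", "Hjerteinfarkt"),
    ("C002", "en", "Fracture of femur"),
    ("C002", "nb", "Lårbensbrudd"),
    ("C003", "en", "Asthma"),  # no Norwegian term in this toy data, so no pair
]


def parallel_pairs(rows, source="en", target="nb"):
    """Return (concept_id, source_term, target_term) for every term pair sharing a concept."""
    # Group terms by concept, then by language.
    by_concept = defaultdict(lambda: defaultdict(list))
    for concept_id, lang, term in rows:
        by_concept[concept_id][lang].append(term)

    pairs = []
    for concept_id in sorted(by_concept):
        terms = by_concept[concept_id]
        # Cross product: a concept may have several synonyms in each language.
        # .get() does not trigger the defaultdict factory, so missing languages yield [].
        for src in terms.get(source, []):
            for tgt in terms.get(target, []):
                pairs.append((concept_id, src, tgt))
    return pairs


if __name__ == "__main__":
    for concept_id, en, nb in parallel_pairs(DESCRIPTIONS):
        print(f"{concept_id}\t{en}\t{nb}")
```

Concepts that lack a term in either language (such as `C003` here) simply produce no pair, which keeps the output strictly parallel.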
The Norwegian Directorate of Health manages the Norwegian version of SNOMED CT and updates it regularly.
SNOMED CT also has a deeper machine-readable structure (an ontology) that includes, among other things, relationships between concepts. This ontology has been used for retrieval-augmented generation (RAG) with language models, including in Norway, but using it requires a license from the Norwegian Directorate of Health.
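As a rough illustration of how concept relationships can feed a RAG pipeline, the Python sketch below shows only the retrieval step: concepts mentioned in a question are looked up, their "is a" ancestors are collected, and the result is prepended to the question as context. The tiny hierarchy, its placeholder IDs and terms, and the function names `ancestors`, `retrieve_context` and `build_prompt` are hypothetical and contain no SNOMED CT content. Plain substring matching stands in for the entity linking a real system would need, and no language model is called.

```python
from collections import deque

# Hypothetical toy ontology: placeholder concept IDs, English preferred terms,
# and "is a" links from child to parent. Not real SNOMED CT content.
TERMS = {
    "C010": "myocardial infarction",
    "C011": "ischemic heart disease",
    "C012": "heart disease",
}
IS_A = {"C010": ["C011"], "C011": ["C012"], "C012": []}


def ancestors(concept_id, is_a):
    """Return every concept reachable via 'is a' links, nearest first (breadth-first)."""
    seen, order = set(), []
    queue = deque(is_a.get(concept_id, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue  # a concept with several parents can reach the same ancestor twice
        seen.add(parent)
        order.append(parent)
        queue.extend(is_a.get(parent, []))
    return order


def retrieve_context(question, terms, is_a):
    """Naive retrieval: match concept terms in the question, then add their ancestor chain."""
    q = question.lower()
    lines = []
    for concept_id, term in terms.items():
        if term not in q:
            continue
        parents = [terms[a] for a in ancestors(concept_id, is_a)]
        if parents:  # top-level concepts add no hierarchy information
            lines.append(f"{term} is a: {', '.join(parents)}")
    return "\n".join(lines)


def build_prompt(question, terms, is_a):
    """Prepend the retrieved ontology facts to the question for a language model."""
    context = retrieve_context(question, terms, is_a)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("What is myocardial infarction?", TERMS, IS_A))
```

A breadth-first walk is used so that the nearest, most specific parents appear first in the context, which is usually what a model benefits from most.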
Another relevant resource is the International Statistical Classification of Diseases and Related Health Problems (ICD), which is owned and managed by WHO. The Norwegian health service currently uses ICD-10 as its code system for diseases and causes of death, so the classification contains terms and concepts that are in established use, both in Norway and internationally.
The Norwegian Directorate of Health manages the Norwegian version of ICD. WHO has now released ICD-11 as the successor to ICD-10. ICD-11 has updated medical content, is considerably more comprehensive, and also includes a terminology. It is currently being translated into Norwegian and will be freely available for use.
Similarly, ICPC-2 (International Classification of Primary Care) is the international classification of health problems, diagnoses, and other reasons for contact with primary health services. ICPC-2 is used in Norway. The Norwegian Directorate of Health maintains it and also hosts the website where the updated international edition (the English version, ICPC-2e-English) is published on behalf of the Wonca International Classification Committee (WICC).
The Norwegian Directorate of Health also manages a number of other relevant classifications, for example the Norwegian Procedure Code System, the Norwegian Laboratory Code System with its associated code systems (Test Material, Anatomical Localization, Textual Result Values, and Examination Method), the Norwegian Pathology Code System (NORPAT), and the Activity Codes for pathology laboratories (APAT). Development of the Norwegian Procedure Code System was initiated by the Nordic Council, and the system still has a common Nordic-Baltic core (NCSP). This core contains, for example, terms for procedures and procedure groups used in specialist health services in the Nordic and Baltic countries.