Mímir project and training on copyright-protected material

On behalf of the Government, the National Library collaborated in 2024 with NorwAI at NTNU and the Language Technology Group at UiO to analyze how copyright-protected material affects the linguistic quality of language models.

The project compared the language produced by models trained with and without copyright-protected material such as Norwegian newspapers and books. The results showed that models trained on data that included rights-protected Norwegian material achieved better quality.

The goal of the project was to gather empirical evidence that could form the basis for possible agreements between the state and rights holders on the use of copyright-protected content for AI purposes. As of March 1, 2025, work is underway to establish principles for such agreements.

Source: https://www.nb.no/content/uploads/2024/08/Mimirprosjektet_teknisk-rapport.pdf