According to Liisa Pakosta, the Minister of Justice and Digital Affairs, it is important that large language models take the Estonian language and culture into account. "It is crucial for the sustainability of our language and culture that open data of the Estonian language corpus be available to language model developers," noted Minister Pakosta.
Sharing Estonian-language data is a precondition for large language models to understand Estonia's cultural context and to use the Estonian language more proficiently.
At the same time, this enables the development of better services for Estonian-speaking users in various AI-based applications – such as chatbots, translation systems, and other language technology solutions.
The Ministry of Justice and Digital Affairs and the Ministry of Education and Research, in cooperation with the Institute of the Estonian Language, are working to ensure the accessibility and discoverability of Estonian-language datasets. In addition, efforts are being made to expand quality data packages so that the Estonian language is represented at a higher level in large language models.
Estonia is open to cooperation and ready to share its language data with other large language model developers. The Ministry of Justice and Digital Affairs encourages both the public and private sectors to release data in order to increase the volume of quality Estonian-language data. This can be done through the Estonian open data portal.
*Press release has been modified.