News

Denmark's Strategic Effort for Artificial Intelligence

The Danish Ministry of Digitalisation has published a national strategy for artificial intelligence — Strategisk indsats for kunstig intelligens — outlining Denmark's ambitions and priorities for AI development and adoption. The strategy identifies four key initiatives, and Danish Foundation Models is directly at the heart of the third.

The Imperative of Danish Foundation Models: Bridging the Linguistic AI Divide

In recent years, the field of machine learning has experienced a transformative shift, primarily driven by the advent of foundation models. These models, pre-trained on vast amounts of data, can be fine-tuned for a wide range of downstream tasks, making them invaluable across many domains. However, the dominance of English in the development of these models poses significant challenges for smaller language communities. The Danish Foundation Models project emerges as a crucial initiative to ensure that the Danish language does not lag behind in this AI revolution.

Data Handling

Training large language models requires enormous amounts of data. Between the moment we receive raw data and the point at which it can be used for model training, the data passes through a series of transformation steps.

The following is a high-level description of this process. We continuously develop and improve it to ensure we apply state-of-the-art methods and practices.
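The exact steps are not detailed here, but data pipelines of this kind typically include quality filtering and deduplication. The following is an illustrative sketch of those two steps, not DFM's actual code; the function name, threshold, and filtering heuristic are assumptions for the example:

```python
import hashlib

def clean(documents, min_words=5):
    """Illustrative cleaning pass (hypothetical, not DFM's pipeline):
    drop near-empty documents and exact duplicates."""
    seen = set()
    for doc in documents:
        text = doc.strip()
        if len(text.split()) < min_words:
            continue  # quality filter: too short to be useful training text
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact-duplicate removal via content hashing
        seen.add(digest)
        yield text
```

Real pipelines go further (near-duplicate detection, language identification, PII handling), but the shape is the same: each step takes a stream of documents and emits a smaller, cleaner stream.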

Data Sources

The data a language model is trained on is decisive for what it can be used for. In Danish Foundation Models (DFM), our approach is to obtain certainty from data owners that we are permitted to use the data we train on, and to focus on value-creating use cases. We pursue this, among other avenues, through our collaboration with the Danish Language Model Consortium.

Releasing Munin 7B Alpha - A Danish LLM

We are excited to announce the release of the first model from the Danish Foundation Models project, nicknamed Munin 7B Alpha. This model represents the beginning of our research into Danish Large Language Models (LLMs), employing continual pre-training from the already pre-trained Mistral-7B-v0.1 model. It has been further pre-trained on the Danish Gigaword dataset, which has been instrumental in training various Danish BERT-style models.
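Continual pre-training simply means resuming the same training objective from an existing checkpoint, but on a new corpus, so the model keeps its general abilities while adapting to Danish. A toy sketch of the idea, using a character-bigram count model as a stand-in for a transformer (the corpora and model are purely illustrative):

```python
from collections import Counter

def train(counts, corpus):
    # "Training" step: accumulate character-bigram counts from the corpus.
    for a, b in zip(corpus, corpus[1:]):
        counts[(a, b)] += 1
    return counts

def prob(counts, a, b):
    # Conditional probability P(b | a) under the bigram model.
    total = sum(v for (x, _), v in counts.items() if x == a)
    return counts[(a, b)] / total if total else 0.0

# "Pre-training" on an English corpus.
counts = train(Counter(), "the quick brown fox")
# "Continual pre-training": resume from the same counts on Danish text,
# rather than starting from an empty model.
counts = train(counts, "ræven hopper over hunden")
```

After the second call, the model has absorbed Danish statistics (e.g. bigrams containing æ) without discarding what it learned from English: the same principle, at vastly larger scale, behind initializing from Mistral-7B-v0.1 instead of training from scratch.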

Why Danish Needs Its Own Foundation Models

Danish is one of the world's richest languages — but in the age of large language models, it risks becoming a digital second-class citizen. We published a position paper arguing why that matters, and what we're doing about it.